Norgorn | 27 Nov 13:01 2014

Terms vector for multiple documents

I'm working with social media data.
We have blog posts in our index: text + authors_id.
Now we need to cluster authors by their texts. We need to get a term vector
not per document, but one vector per author (for all authors).

We can't fetch all the documents and then merge them, because it would take ages.

And we can't just concatenate all posts into one mega-post per author (to have one
document per author), because our data grows every day and we receive new
posts for authors.

Can you suggest any solution?

View this message in context:
Sent from the Solr - User mailing list archive at

hhc | 27 Nov 11:33 2014

Solr mlt doesn't return documents with "exactly the same" contents

I have two documents with ids "aaa" and "bbb", and the titles of both
documents are "a black fox jumps over a red flower".  I imported both
documents, along with several other test documents, to a core "test".

I want Solr to return documents similar to document "aaa", so I submitted the


Solr returned some similar documents.  However, document "bbb", which should
be the most similar document to "aaa", was not in the list returned by mlt.
Any ideas how this could happen?  Thanks!


Nishant Kelkar | 27 Nov 09:45 2014

Re: SolrTestCaseJ4 Error: "java.lang.RuntimeException: Can't find resource..."

Hi All,

I'm trying to run a simple piece of code, to get SolrTestCaseJ4 to work.
Here's my code:

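import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyTest extends SolrTestCaseJ4 {

    @BeforeClass
    public static void init() throws Exception {
        // Initializes a test core from the given config and schema files.
        initCore("solrconfig.xml", "schema.xml");
        lrf = h.getRequestFactory("standard", 0, 20);
    }

    @Test
    public void testNothing() {
    }
}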

I have the required solrconfig.xml and schema.xml inside

However, when I run a test on testNothing() method, I get the following

*java.lang.RuntimeException: Can't find resource*
'rs_A_count_gte300k.txt' in classpath or
at __randomizedtesting.SeedInfo.seed([BB606BF344F3401]:0)
at org.apache.solr.schema.IndexSchema.<init>(

Upayavira | 26 Nov 21:05 2014

comparing feature vectors using Solr/Lucene


I've been asked how to use Solr as a component in a machine learning
system, doing document comparison based upon feature vectors.

If I have two vectors, one in the index (in some form) and one in the
query (in some form), how can I do, for example, a vector multiplication
of the two vectors in order to calculate a score?

The feature space I am being given has 100 features, with numerical
scores for each feature. In this case, it is not sparse - most features
will have a value.

I have ideas, but they seem to get me only part of the way there.

Has anyone worked with Solr in this way?
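
The scoring step asked about here is, at its simplest, a dot product. A minimal sketch in plain Java, outside of Solr: it assumes both vectors are dense double arrays of the same length (100 entries in the case above), and the class and method names are made up for illustration. It does not show how to store the vector in the index or how to plug the score into Solr.

```java
// Hypothetical helper for illustration; not part of Solr or Lucene.
class FeatureVectorScore {

    // Dot product of two dense vectors of equal length.
    static double dot(double[] indexed, double[] query) {
        if (indexed.length != query.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < indexed.length; i++) {
            sum += indexed[i] * query[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] docVector = {1.0, 2.0, 3.0};
        double[] queryVector = {4.0, 5.0, 6.0};
        // 1*4 + 2*5 + 3*6 = 32.0
        System.out.println(dot(docVector, queryVector));
    }
}
```

With 100 mostly non-zero features per document this is 100 multiplications per candidate document, so the open question is more about where the vector lives in the index than about the arithmetic.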



Thomas L. Redman | 26 Nov 20:45 2014

TrieLongField not store large longs correctly

I believe I have encountered a bug in SOLR. I have a data type defined as follows:

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>

And I have a field defined like so:

<field name="aid" type="long" indexed="true" stored="true" multiValued="false" required="true"
omitNorms="true" />

I have not been able to reproduce this problem for smaller numbers, but for some of the very large numbers,
the value that gets stored for this “aid” field is not the same as the number that gets indexed. For
example, 20140716126615474 is stored as 20140716126615470, or in any event, that is the way it is getting
reported back. When I issue a query, “aid: 20140716126615474”, the value reported back for aid is 20140716126615470!

Any suggestions?
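
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the field may hold the right value, and the last digits may be lost when a client reads the response and parses the number as a 64-bit double (JavaScript in a browser does this with JSON numbers). Doubles represent every integer exactly only up to 2^53, and near 2 x 10^16 they are spaced 4 apart. The plain Java sketch below (class and method names are made up) shows what such a round trip does to the value from the example:

```java
import java.math.BigDecimal;

// Hypothetical check for illustration; not part of Solr.
class LongPrecisionCheck {

    // Converts a long to a double and back, as a double-based number parser would.
    static long roundTripThroughDouble(long value) {
        return (long) (double) value;
    }

    public static void main(String[] args) {
        long aid = 20140716126615474L;
        double asDouble = (double) aid;
        // Exact decimal expansion of the nearest double.
        System.out.println(new BigDecimal(asDouble).toPlainString()); // 20140716126615472
        System.out.println(roundTripThroughDouble(aid) == aid);       // false
        // 2^53 itself still survives the round trip unchanged.
        long safe = 1L << 53;
        System.out.println(roundTripThroughDouble(safe) == safe);     // true
    }
}
```

The nearest double is ...472, and a printer that emits the shortest digit string that still maps back to that double can show it as ...470, which would be consistent with the report. Comparing the raw response body (for example fetched with curl, using wt=xml or wt=csv) against what the UI displays would confirm or rule this out.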

Ramkumar R. Aiyengar | 26 Nov 19:58 2014

Re: Dealing with bad apples in a SolrCloud cluster

As Eric mentions, his change to have a state where indexing happens but
querying doesn't surely helps in this case.

But these are still boolean decisions of send vs don't send. In general, it
would be nice to abstract the routing policy so that it is pluggable. You
could then do stuff like have a "least pending" policy for choosing
replicas -- instead of choosing a replica at random, you maintain a pending
response count, and you always send to the one with least pending (or
randomly amongst a set of replicas if there is a tie).
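
The "least pending" idea above could look roughly like the following. This is a hypothetical sketch in plain Java, not an existing Solr API: LeastPendingRouter, choose(), complete() and pendingCount() are names invented here, and replicas are identified by plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical routing policy for illustration; not part of Solr.
class LeastPendingRouter {
    private final Map<String, Integer> pending = new HashMap<>();
    private final Random random;

    LeastPendingRouter(List<String> replicas, Random random) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one replica is required");
        }
        for (String replica : replicas) {
            pending.put(replica, 0);
        }
        this.random = random;
    }

    // Picks the replica with the fewest pending responses, breaking ties at
    // random, and counts the new request as pending on that replica.
    synchronized String choose() {
        int min = Integer.MAX_VALUE;
        List<String> best = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            int count = entry.getValue();
            if (count < min) {
                min = count;
                best.clear();
            }
            if (count == min) {
                best.add(entry.getKey());
            }
        }
        String chosen = best.get(random.nextInt(best.size()));
        pending.merge(chosen, 1, Integer::sum);
        return chosen;
    }

    // Call when the replica's response arrives or the request fails.
    synchronized void complete(String replica) {
        pending.computeIfPresent(replica, (key, count) -> Math.max(0, count - 1));
    }

    synchronized int pendingCount(String replica) {
        return pending.getOrDefault(replica, 0);
    }
}
```

A slow replica accumulates pending requests and is chosen less often without ever being marked down, which is the difference from a boolean send/don't-send decision.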

Also, the chance that your distrib=false case will be hit is actually about 1/5 (or
something like that, I have forgotten my probability theory), because you
have two shards and so get two chances at hitting the bad apple. This was
one of the reasons we got in SOLR-6730 to use replica and host affinity.
Under good enough load, the load distribution will be more or less the same
with this change, but the chances of hitting bad apples will be lower.
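
The 1/5 figure can be checked quickly. Assuming each of the two shard requests independently picks one of the 10 replicas uniformly at random, and exactly one node is bad (both are assumptions made for the sake of the arithmetic), the chance that at least one request lands on the bad node is 1 - (9/10)^2 = 0.19. In Java, with made-up names:

```java
import java.util.Locale;

// Hypothetical calculation for illustration; not part of Solr.
class BadAppleOdds {

    // Probability that at least one of `shards` independent, uniformly random
    // replica choices lands on the single bad node out of `nodes`.
    static double chanceOfHittingBadNode(int nodes, int shards) {
        double missOnce = (nodes - 1) / (double) nodes;
        return 1.0 - Math.pow(missOnce, shards);
    }

    public static void main(String[] args) {
        // 10 nodes, 2 shards
        System.out.println(String.format(Locale.ROOT, "%.2f", chanceOfHittingBadNode(10, 2))); // 0.19
    }
}
```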
On 21 Nov 2014 18:56, "Timothy Potter" <thelabdude <at>> wrote:

Just soliciting some advice from the community ...

Let's say I have a 10-node SolrCloud cluster and have a single collection
with 2 shards with replication factor 10, so basically each shard has one
replica on each of my nodes.

Now imagine one of those nodes starts getting into a bad state and starts
to be slow about serving queries (not bad enough to crash outright though)
... I'm sure we could ponder any number of ways a box might slow down
without crashing.

From my calculations, about 2/10ths of the queries will now be affected

Andreas Hubold | 26 Nov 16:18 2014

soft commit and deletions


I've read about soft commits in Erick Erickson's excellent blog article [1]:

 > The thing to understand most about soft commits are that they will make documents visible

But I'm still not totally sure. Does a soft commit also make deleted 
documents invisible?

In a test with an EmbeddedSolrServer I triggered a soft commit and was 
still able to find a deleted document afterwards. Is this as expected?

Thank you,


Lee Carroll | 26 Nov 15:56 2014

cross site scripting

Hi All,
In solr 4.7 this query


value'":"nasty value"}]}}

This is naughty. Has this been seen before, or fixed?

Sven Schönfeldt | 26 Nov 13:09 2014

Getting multiple Result for same document, doing a dateRange query on multiple date field

Hi Solr-Users,

I would like to run a date range query on a multi-valued date field, e.g. "dateField_dts:[NOW TO NOW+7DAY]".
If the query finds a document that has more than one date in that range, it would be nice to get the
document multiple times in the result, with an indication of which date each hit matched.

Is there any way to do that, or something similar, at query time?

Regards Sven
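
If nothing at query time fits, one fallback is to expand the hits on the client. The sketch below is plain Java with invented names (DateHitExpander, Hit, expand); it assumes the client already has each document's id and all values of the date field, and it does not use any Solr API. It emits one row per date value inside the inclusive range, mirroring the [from TO to] range syntax.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side post-processing for illustration; not part of Solr.
class DateHitExpander {

    // One result row: the document id plus the date value that matched.
    static final class Hit {
        final String docId;
        final Instant matchedDate;

        Hit(String docId, Instant matchedDate) {
            this.docId = docId;
            this.matchedDate = matchedDate;
        }
    }

    // Returns one Hit per date value that lies in [from, to], both ends inclusive.
    static List<Hit> expand(String docId, List<Instant> dates, Instant from, Instant to) {
        List<Hit> hits = new ArrayList<>();
        for (Instant date : dates) {
            if (!date.isBefore(from) && !date.isAfter(to)) {
                hits.add(new Hit(docId, date));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-11-26T12:00:00Z");
        List<Instant> dates = Arrays.asList(
                now.plus(1, ChronoUnit.DAYS),
                now.plus(3, ChronoUnit.DAYS),
                now.plus(30, ChronoUnit.DAYS));
        // Two of the three dates fall within the next 7 days.
        for (Hit hit : expand("doc1", dates, now, now.plus(7, ChronoUnit.DAYS))) {
            System.out.println(hit.docId + " matched " + hit.matchedDate);
        }
    }
}
```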

yriveiro | 26 Nov 12:46 2014

Move a shard from one disk to another


I need to move some data from one disk to another. My question is whether I can
move the shard and create a symlink in the place where the shard was.

Would this work?

Best regards

Suchi Amalapurapu | 26 Nov 05:24 2014

updateNumericDocValue in solr 4.6.1

The following code changes don't seem to really update the docValue in my

IndexWriter iw = core.getSolrCoreState().getIndexWriter(core).get();

value = Long.parseLong(score);

Term term = new Term(ID, id1);

iw.updateNumericDocValue(term, "rank", value);


Schema changes:

<field name="rank" type="long" indexed="true" stored="true"
required="false" docValues="true"/>

Does anyone have a working sample of updateNumericDocValue in Solr 4.6.1?