Norgorn | 25 Oct 09:42 2014

Solr + HDFS settings

I'm trying to run Solr with HDFS.
In solrconfig.xml I've written:

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
	  <str name="solr.hdfs.home">hdfs://<PATH_TO_NAMENODE>/solr</str>
	  <bool name="solr.hdfs.blockcache.enabled">true</bool>
	  <int name="solr.hdfs.blockcache.slab.count">1</int>
	  <bool name="">true</bool>
	  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
	  <bool name="">true</bool>
	  <bool name="solr.hdfs.blockcache.write.enabled">true</bool>
	  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
	  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
	  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>

But when I try to create a collection, I get
"Caused by: Protocol message end-group tag did not match expected tag."

tcpdump shows that the server answers with "Bad request". I've tried setting
the path to a datanode instead of the namenode, but got the same result.
What am I doing wrong?

P.S. I've found that the problem can be caused by a mismatched protobuf.jar.
I've changed that jar (and hadoop-*.jar for compatibility) in my Solr libs,
but the problem didn't change.


john eipe | 25 Oct 08:48 2014

solr highlighting query


I'm trying to match keywords across 2 fields, where word order does not
matter but the keywords must be within a given distance of each other.

title:(Jobs) AND content_raw:(Jobs born)~15

This throws an error: Cannot parse '(Jobs born)~15':
Encountered " <FUZZY_SLOP> "~15 "

What's the correct way to frame this query?
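
One possible way to frame it, assuming the classic Lucene query parser is in use: the ~N slop operator applies to a quoted phrase rather than to a parenthesized group, and a sloppy phrase can match its terms in either order when the slop is large enough. The helper below is a hypothetical illustration that only builds the query string; it does not talk to Solr.

```python
def sloppy_phrase(field: str, words: list[str], slop: int) -> str:
    """Build field:"w1 w2"~slop, the quoted-phrase form of a proximity query."""
    phrase = " ".join(words)
    return f'{field}:"{phrase}"~{slop}'


# Field names and terms are taken from the query in the post.
query = f'title:(Jobs) AND {sloppy_phrase("content_raw", ["Jobs", "born"], 15)}'
print(query)
```

The quotes are what make the parser read ~15 as phrase slop instead of a fuzzy operator on a group.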

*John Eipe*

Olivier Austina | 24 Oct 21:44 2014

Remove indexes of XML file


This is a newbie question. I have indexed some documents using some XML files,
as indicated in the tutorial, with the command:

java -jar post.jar *.xml

I have seen how to delete the index entry for one document, but how do I
delete the entries for all documents that came from one XML file? For
example, if I have indexed files A, B, C, D, etc., how do I delete the
indexed documents that came from file C? Is there a command like the one
above, or another solution that does not require individual IDs? Thank you.
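
A minimal sketch of one approach, assuming the file uses the standard `<add><doc>` update format and that the unique key field is named `id` (both are assumptions, as is the sample data): read the IDs back out of the file and build a single `<delete>` message from them. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def ids_in_add_file(xml_text: str, key_field: str = "id") -> list[str]:
    """Collect the unique-key values of every <doc> in a Solr <add> message."""
    root = ET.fromstring(xml_text)
    ids = []
    for doc in root.iter("doc"):
        for field in doc.findall("field"):
            if field.get("name") == key_field and field.text:
                ids.append(field.text.strip())
    return ids


def delete_message(ids: list[str]) -> str:
    """Build one <delete> update message listing all the given IDs."""
    body = "".join(f"<id>{escape(i)}</id>" for i in ids)
    return f"<delete>{body}</delete>"


# Hypothetical stand-in for the contents of file C.
sample = """<add>
  <doc><field name="id">C-1</field><field name="name">first</field></doc>
  <doc><field name="id">C-2</field><field name="name">second</field></doc>
</add>"""

print(delete_message(ids_in_add_file(sample)))
```

The resulting message could be saved to a file and posted like any other update message, followed by a commit. The IDs are still used, but they are extracted automatically rather than typed in by hand.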

Tang, Rebecca | 24 Oct 21:18 2014

plans for CollapsingQParser to support untruncated facet count

We are using collapsingQParser to group results for collapsing possible duplicate records.

We have a signature field that the collapsingQParser acts on to group the results.  However, when we facet on
top of it, we get facet counts that are seemingly wrong.

For example, if we have 4 records:

Signature   availability
1                   pub
1                   conf
2                   conf
2                   conf

When we run a search of *:*, we get 2 results back because there are two distinct signatures.

When we facet on availability, we get pub -> 1 and conf -> 1.  But when we add fq=availability:conf to the
query (when the user clicks on the facet), the count becomes 3.  It's confusing for our users.

After reading, I understand why it behaves this
way.  collapsingQParser returns the facet count the same as group.truncate=true.

I wanted to see if there are plans (in the near future) to add support for collapsingQParser where I could get
untruncated count for facets.  I don't want to use solr group because when group.ngroups=true, the
performance is absolutely atrocious.
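
To make the two counting modes concrete, here is a toy model in plain Python. It is not Solr's implementation, and it assumes the representative document kept for signature 1 is the "pub" record. Counting facets over the collapsed group heads gives the truncated numbers; counting over all matching records gives the untruncated ones.

```python
from collections import Counter

# The four records from the example above.
records = [
    {"signature": 1, "availability": "pub"},
    {"signature": 1, "availability": "conf"},
    {"signature": 2, "availability": "conf"},
    {"signature": 2, "availability": "conf"},
]


def collapse(docs):
    """Keep the first document seen for each signature (the group head)."""
    heads = {}
    for doc in docs:
        heads.setdefault(doc["signature"], doc)
    return list(heads.values())


# Truncated: facets are counted on the collapsed result set only.
truncated = Counter(d["availability"] for d in collapse(records))
# Untruncated: facets are counted on every matching record.
untruncated = Counter(d["availability"] for d in records)

print(dict(truncated))
print(dict(untruncated))
```

In this model the truncated count for conf is 1 while the untruncated count is 3, which is the mismatch described above.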


Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library

Robust Links | 24 Oct 19:51 2014

phrase query in solr 4


We are trying to upgrade our index from 3.6.1 to 4.9.1 and I wanted to check
whether our existing indexing strategy is still valid. The statistics
of the raw corpus are:

- 4.8 billion tokens in the entire corpus.

- 13 million documents

We have 3 requirements

1) we want to index and search all tokens in a document (i.e. we do not
rely on external stores)

2) we need search time to be fast and are willing to pay for it with longer
indexing time and a larger index,

3) we want to search n-grams of 3 tokens or fewer (i.e.,
unigrams, bigrams and trigrams) as fast as possible.

To satisfy (1) we used the default <maxFieldLength>2147483647</maxFieldLength>
in the solrconfig.xml of the 3.6.1 index to specify the total
number of tokens to index in an article. In Solr 4 we are specifying it via
the tokenizer in the analyzer chain:

 <tokenizer class="solr.ClassicTokenizerFactory" maxTokenLength="2147483647"

To satisfy 2 and 3 in our 3.6.1 index we indexed using the following

David Philip | 24 Oct 19:27 2014

Spell Check for Multiple Words


I am trying to obtain multi-word spellcheck suggestions. For example, I have
a title field containing "Indefinite and fictitious large numbers". If the
user searches for "larg numberr", I want to obtain "large
number" as a suggestion from the spellchecker. Could you please tell
me what configuration I need to get this?

The field type is text_general (as defined in the example schema.xml).

nabil Kouici | 24 Oct 15:59 2014

SolrJ: Force SolrServer to send request to specific node

Hi all,

When using SolrServer we specify only the ZooKeeper ensemble, which then gives us a reference to
the Solr node to use. Is it possible to force SolrServer to use a specific node?

Karthik Nagarajan | 24 Oct 13:50 2014

Question on deleteByQuery behavior without updateLog

We are using Solr/Lucene 4.4. We noticed that deleteByQuery call commits
only on alternate commits, i.e., the first deleteByQuery changes are not
written out to the Directory but on the second commit it is reflected in
the Directory. The solrconfig.xml does NOT have the updateLog turned ON. If
we have it turned ON as per
it works fine. Do we need to have updateLog turned ON for commits to be
reflected for deleteByQuery?

We noted a few things that we would like to share here.

In DirectUpdateHandler2's deleteByQuery method, after the call to index
writer's deleteByQuery, the ulog.deleteByQuery(cmd) is called. This opens a
new IndexSearcher and in that flow the deletes are applied and
check-pointed. This is picked up by the later commit call and reflects in
the final storage.

FLOW (captured only the relevant items in this flow):
* DirectUpdateHandler2.deleteByQuery -> Indexwriter.deleteByQuery()
[followed by] updateLog.deleteByQuery() -> open SolrIndexSearcher ->
applyAllDeletes -> checkpoint() -> ...
* DirectUpdateHandler2.commit() -> get index writer on current core ->
check if there is any uncommitted changes --YES--> Indexwriter.commit() ->

When updateLog is turned OFF, DirectUpdateHandler2's deleteByQuery method
just calls deleteByQuery on the index writer and never opens an IndexSearcher

Bernd Fehling | 24 Oct 08:23 2014

Re: QueryAutoStopWordAnalyzer

On 23.10.2014 at 18:03, Alexandre Rafalovitch wrote:
> How is this different from using StopFilterFactory in Solr?

With StopFilterFactory you have to set up a file with stopwords
and maintain that file.

With QueryAutoStopWordAnalyzer the docs say "An Analyzer used primarily
at query time to wrap another analyzer and provide a layer of protection
which prevents very common words from being passed into queries."
And what I'm looking for is "... QueryAutoStopWordAnalyzer with stopwords
calculated for the given selection of fields from terms with a document
frequency greater than the given maxDocFreq", which is by default set to 0.400...
but can probably be adjusted to a "value of personal taste".
So you don't have to set up and maintain a stopword.txt file.

I might be wrong, but does this sound different to StopFilterFactory?
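
To illustrate the idea in the quoted description, here is a minimal plain-Python sketch (not the Lucene class itself): a term is treated as a stopword when the fraction of documents containing it is greater than maxDocFreq. The function name and the sample documents are made up for the example.

```python
from collections import Counter


def auto_stopwords(docs, max_doc_freq=0.4):
    """Return terms that occur in more than max_doc_freq of the documents."""
    doc_count = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term once per document, not once per occurrence.
        doc_freq.update(set(tokens))
    return {term for term, n in doc_freq.items() if n / doc_count > max_doc_freq}


# Hypothetical five-document corpus, already tokenized.
docs = [
    "the quick fox".split(),
    "the lazy dog".split(),
    "a quick test".split(),
    "the end".split(),
    "solr rocks".split(),
]

print(auto_stopwords(docs))
```

Here "the" appears in 3 of 5 documents and is dropped, while "quick" appears in exactly 2 of 5 and is kept because the comparison is strictly greater than 0.4. The list is derived from the indexed data, so there is no stopword file to maintain.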


> Lucene "wraps" analyzers, Solr has a chain instead (though analyzers
> are supported as well).
> You just configure the chain. Writing a factory for when one analyzer
> wraps another would be just duplication of the chain code.

Danesh Kuruppu | 24 Oct 08:04 2014

Synonyms Search using solr

Hi all,

I need to implement synonym search using Solr. What is the best way of
doing this? Is there any documentation to follow?

eShard | 23 Oct 23:09 2014

recip function error

Good evening,
I'm using solr 4.0 Final.
I tried using this function
but it fails with this error:
org.apache.lucene.queryparser.classic.ParseException: Expected ')' at
position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))'

I applied this patch,
rebuilt and redeployed, and I get the exact same error.
I only copied over the new jars and war file. None of the other libraries
seemed to have changed.
The patch is in Solr core, so I figured I was safe.

Does anyone know how to fix this?
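
For reference, Solr documents recip(x,m,a,b) as a / (m*x + b), usually written with the ms(...) call closed before the three constants, as in recip(ms(NOW,mydatefield),3.16e-11,1,1). In the expression from the error message the closing parenthesis of ms(...) comes after the constants, and 3.16e-11.0 is not a valid number literal. Whether fixing those two points resolves the error on 4.0 is not confirmed here. The sketch below only evaluates the formula in plain Python with the constants from the post; it does not call Solr.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function as documented for Solr: a / (m*x + b)."""
    return a / (m * x + b)


# Milliseconds in a year; 3.16e-11 is roughly 1 divided by this value.
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000

for age_years in (0, 1, 5):
    age_ms = age_years * MS_PER_YEAR
    print(age_years, recip(age_ms, 3.16e-11, 0.08, 0.05))
```

With these constants the value is highest for a document dated now and decreases as the document gets older.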

