Zisis Tachtsidis | 28 Nov 19:58 2015

Single-sharded SolrCloud vs Lucene indexing speed

I'm conducting some indexing experiments in SolrCloud and I want to confirm
my conclusions and ask for suggestions on how to improve performance.

My setup includes a single-sharded collection with 1 additional replica in
SolrCloud 5.3.1. I'm using SolrJ and the indexing speed refers to the actual
SolrJ call that adds the document. I've run some indexing tests and it seems
that Lucene indexing is equal to or better than Solr's in all cases. In all
cases the same documents are sent to both Lucene and Solr and the same analysis
is performed on the documents.

- 2 replicas, leader is a replica on a machine under heavy load => ~3x
slower than Lucene.
- 2 replicas, leader is a replica on a machine under light load => ~2x
slower than Lucene.
- 1 replica on a machine under light load => indexing speed similar to

(*) It seems that the slowest replica determines the indexing speed. 
(*) It gets even worse if the slowest replica is the leader. This would
make sense if the leader forwards the request to the remaining replicas
only after it has finished indexing.
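
The two observations above fit a very simple latency model. What follows is only a sketch of that hypothesis, not a description of Solr's actual update path: it assumes the leader indexes a document first and then forwards it to the other replicas, which index in parallel. The function name and the timings are invented for illustration, chosen only so that the ratios resemble the ones reported above.

```python
def add_latency(leader_ms, follower_ms):
    """Per-document latency under the assumed 'leader first, then forward' model."""
    # Followers are assumed to index in parallel, so only the slowest one counts.
    slowest_follower = max(follower_ms, default=0.0)
    return leader_ms + slowest_follower


# Hypothetical timings: a lightly loaded node needs 10 ms, a heavily loaded one 20 ms.
light, heavy = 10.0, 20.0

single_replica = add_latency(light, [])      # nothing to forward
light_leader = add_latency(light, [light])   # leader on the lightly loaded machine
heavy_leader = add_latency(heavy, [light])   # leader on the heavily loaded machine

print(single_replica, light_leader, heavy_leader)
```

Under these made-up numbers the single replica is the baseline, the lightly loaded leader costs twice as much per document and the heavily loaded leader three times as much.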

Regarding improvements:
(*) I'm indexing pretty big documents (0.5 MB < DocSize < 1 MB), so batch
updates do not offer a significant performance gain.
(*) Would I see an improvement if I used a multi-sharded collection?



GOURAUD Emmanuel | 28 Nov 09:04 2015

restoring an index from restore command

Hi there, 

I'm currently trying to restore a core from a snapshot, and it does not work as expected (in my case).

My context: 
I'm using Solr 5.3.0.
I used a named snapshot and a specific location to ensure collection integrity at a point in time in a single directory.

Below, I will describe the backup/restore for a specific core.

backup command: 

It has created the snapshot with all data:
[vagrant@centos7 solr-5.3.0]$ ls -lh
total 32K 
-rw-r--r-- 1 solr solr 346 Nov 28 07:36 _5.fdt 
-rw-r--r-- 1 solr solr 84 Nov 28 07:36 _5.fdx 
-rw-r--r-- 1 solr solr 431 Nov 28 07:36 _5.fnm 
-rw-r--r-- 1 solr solr 110 Nov 28 07:36 _5_Lucene50_0.doc 
-rw-r--r-- 1 solr solr 697 Nov 28 07:36 _5_Lucene50_0.tim 
-rw-r--r-- 1 solr solr 171 Nov 28 07:36 _5_Lucene50_0.tip 
-rw-r--r-- 1 solr solr 432 Nov 28 07:36 _5.si 
-rw-r--r-- 1 solr solr 165 Nov 28 07:36 segments_7 

I flush all data from the same core, restart the Solr 5 instance, and then try to restore from the previous
[vagrant@centos7 solr-5.3.0]$ curl

Josh Collins | 27 Nov 22:49 2015

Boosting Question & Parser Selection


I have a few questions related to boosting and whether my use case makes sense for Dismax vs. the standard parser.

I have created a gist of my field definitions and current query structure here: https://gist.github.com/joshdcollins/0e3f24dd23c3fc6ac8e3

With the given configuration I am attempting to:

  *   Support partial and exact matches by indexing fields twice — once with ngram and once without
  *   Boost exact matches higher than partial matches
  *   Boost matches in the entity_name (and entity_name_exact) field higher than content and content_exact fields
  *   Boost matches with an entity_type of ‘company’ and ‘insight’ higher than other result types
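
One way the objectives above could be written as edismax request parameters is sketched below. This is not the configuration from the gist: the field names come from the list above, while the boost weights and the `build_params` helper are assumptions made up for illustration. `defType`, `qf` and `bq` are the usual (e)dismax parameters for per-field weights and boost queries.

```python
from urllib.parse import urlencode


def build_params(user_query):
    """Build request parameters for a boosted edismax query (illustrative weights)."""
    # Exact fields outweigh their ngram counterparts; entity_name outweighs content.
    field_boosts = {
        "entity_name_exact": 8,
        "entity_name": 4,
        "content_exact": 2,
        "content": 1,
    }
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": qf,
        # Boost query: prefer 'company' and 'insight' documents over other types.
        "bq": "entity_type:(company OR insight)^3",
    }


params = build_params("hello wor")
query_string = urlencode(params)
print(query_string)
```

The relative order of the weights is what matters here; the absolute values would need tuning against real queries.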

1)  Do the field definitions and query approach make sense given the above objectives?

2)  I have an additional use case to support a query syntax where terms wrapped in single quotes must be exact
matches.  Example: "hello 'wor'" would NOT match a document containing "hello" and "world".

a) Using the dismax parser, can you explicitly determine which terms will be checked against which fields?
In this case I would search "hello" against my general fields and "wor" against the _exact fields.

b) Does this level of structured query better lend itself to using the standard query parser?
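
For question 2, one possible approach is to split the user input on the client side and build a fielded query for the standard parser. The sketch below assumes that approach; the helper names are hypothetical, the field lists reuse the field names mentioned earlier, and no escaping of query-syntax characters is attempted.

```python
import re

GENERAL_FIELDS = ["entity_name", "content"]
EXACT_FIELDS = ["entity_name_exact", "content_exact"]


def split_terms(raw):
    """Return (general_terms, exact_terms); single-quoted terms are exact."""
    exact = re.findall(r"'([^']+)'", raw)
    remainder = re.sub(r"'[^']+'", " ", raw)
    return remainder.split(), exact


def clause(term, fields):
    """OR one term across the given fields."""
    return "(" + " OR ".join(f"{field}:{term}" for field in fields) + ")"


def build_query(raw):
    """Require every term; quoted terms are checked only against exact fields."""
    general, exact = split_terms(raw)
    clauses = [clause(term, GENERAL_FIELDS) for term in general]
    clauses += [clause(term, EXACT_FIELDS) for term in exact]
    return " AND ".join(clauses)


query = build_query("hello 'wor'")
print(query)
```

Here "hello" is searched in the general (ngram) fields and "wor" only in the _exact fields, so a document containing only "hello" and "world" would not satisfy the second clause.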

3)  Does anyone have any experience with, or resources for, troubleshooting the fast vector highlighter?  It is
working correctly in most cases, but some search terms (shorter than the boundryScanner.maxScan)
return no content in the highlighter results, like:
<lst name="highlighting">
<lst name="d-3318"/>
<lst name="cn-29348"/>
<lst name="cn-29952"/>

Vishnu Mishra | 27 Nov 16:11 2015

Facet count mismatch between solr simple facet and Json facet API.


I am using Solr 5.3.1 in my application. I have an indexed field defined as
given below:

<field name="Title" type="string" indexed="true" stored="false"
multiValued="true" docValues="true" />

I then use the Solr JSON Facet API for faceting, but it seems that the JSON
Facet API produces lower, incorrect counts compared with simple Solr faceting.
The JSON facet request I am making is as below:

    TitleFacet: {
        type: terms,
        field: Title,
        offset: 0,
        limit: 100,
        mincount: 1,
        sort: {
            count: desc

gives, for example, a count of 63. And then the equivalent simple facet query given



Adrian Liew | 27 Nov 11:23 2015

SolrCloud Shard + Replica on Multiple servers with SolrCloud

Hi all,

I am trying to figure out how to set up a 3-shard, 3-server cluster with a replication factor of 2 on SolrCloud 5.3.0.

In particular, I am trying to follow the setup described in this blog: http://lucidworks.com/blog/2014/06/03/introducing-the-solr-scale-toolkit/

EC2 Instance 1

Shard 1 - Leader  (port 8984 separate drive with 50 GB SSD)
Shard 2 - Leader  (port 8985 separate drive with 50 GB SSD)

EC2 Instance 2

Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)

EC2 Instance 3

Shard 1 - Replica (port 8984 separate drive with 50 GB SSD)
Shard 2 - Replica (port 8985 separate drive with 50 GB SSD)

Can anyone shed some light on how these can be configured using the SolrCloud Collections API or the Solr
command line utility so that they are split across different instances?

As far as I know, there are two approaches to sharding: "Custom Sharding" and "Automatic Sharding". Which
approach suits the use case described above?

Is anyone able to provide pointers from past experience or point me to a good article that describes how this
can be set up?
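
As a rough illustration of the Collections API route, the sketch below builds a CREATE request for 3 shards with a replication factor of 2 spread over 3 nodes. The base URL, the collection name and the helper function are hypothetical; `action=CREATE`, `numShards`, `replicationFactor` and `maxShardsPerNode` are Collections API parameters, and the per-node limit is derived by dividing the total number of cores evenly across the nodes, which is an assumption about the desired layout.

```python
from math import ceil
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, node_count):
    """Build a Collections API CREATE URL for an evenly spread collection."""
    # Total cores = shards x copies per shard; spread them evenly over the nodes.
    max_per_node = ceil(num_shards * replication_factor / node_count)
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_per_node,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name.
url = create_collection_url("http://localhost:8983/solr", "mycollection", 3, 2, 3)
print(url)
```

When `numShards` is given and no router is specified, Solr's default routing distributes documents across the shards automatically, which corresponds to the "Automatic Sharding" option mentioned above.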


Midas A | 27 Nov 06:42 2015

Error on DIH log

org.apache.solr.common.SolrException: ERROR: [doc=83629504] Error adding
field 'master_id'='java.math.BigInteger:0' msg=For input string:

<field name="master_id" type="tint" indexed="true" stored="true" />

<fieldType name="tint" class="solr.TrieLongField" precisionStep="8"

How can I remove this error?

Ryan Yacyshyn | 27 Nov 04:41 2015

Spellcheck on first character

Hi all,

Is it possible to provide spelling suggestions if it's just the first
character that's wrong (or has an additional character added)?

We have users querying for "eappointment" when they should just be
searching for "appointment". I'd like to show "appointment" as a spelling
suggestion for "eappointment".

Is this possible?
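
To make the case concrete: "eappointment" and "appointment" are only one edit apart, but they share no common prefix, which matters for any spellchecker setting that requires the first character(s) of a suggestion to match the query term. The sketch below is plain Python and does not use Solr's spellchecker; both helper functions are written for illustration only.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def common_prefix_length(a, b):
    """Number of leading characters the two strings share."""
    count = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        count += 1
    return count


distance = levenshtein("eappointment", "appointment")
shared_prefix = common_prefix_length("eappointment", "appointment")
print(distance, shared_prefix)
```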

I'm using 4.10.1 and below are my configs:

<!-- the spellchecking defaults in my requestHandler -->
<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">false</str>
<str name="spellcheck.count">1</str>
<str name="spellcheck.alternativeTermCount">1</str>
<str name="spellcheck.maxResultsForSuggest">1</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">false</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">1</str>

<!-- spellchecking component -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>

Doug Turnbull | 26 Nov 17:28 2015

Re: Solr UI open source

Nope, it's more of a template. But I still think it's simpler than coding up
and deploying an API that acts as a relay to a search endpoint. Again, I
don't think this is right for every use case. But we use it for

In the nginx.conf, you need to basically update two spots

# Replace this with your Solr host, ie solr.quepid.com
server_name YOUR.SOLR.HOST;

And then copy the block for every search endpoint you want to support,
replacing with your collection name:

# Create a location block for each handler you'd like to whitelist
location /solr/collection1/select {

On Thu, Nov 26, 2015 at 11:14 AM, Alexandre Rafalovitch <arafalov <at> gmail.com>

> I am happy to be corrected, but that repository says "This repository
> gives a basic outline to creating a functional reverse proxy with
> Nginx" as well as the famous last words ("e.t.c.") . Which is why I
> feel it is not exactly a turnkey solution I can recommend to a new
> Solr user. Is there an example of a full production config anywhere?
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/

Chaushu, Shani | 26 Nov 14:59 2015

Solr UI open source

Hi all,
I want to build a UI for Solr that returns results to the user and also writes updates back to Solr (setting a specific field).
I started using ajax-solr because there is a good tutorial and it's easy to use, but I didn't see an example of
updates, and I'm also not sure the code is stable (no release in Git).
I also looked at banana, but it's more complicated and more relevant to time series (my data doesn't have a date field).

Which is better for a basic Solr UI: ajax-solr or banana?
Is there another option, something that can also update Solr rather than only issue one-way requests?


Bernd Fehling | 26 Nov 14:26 2015

OT: is Heliosearch discontinued?

It is always interesting to see what other Search Engines are doing.
So I just wanted to have a look at Heliosearch (http://heliosearch.org/)
but nothing showed up.

Is Heliosearch discontinued, or is this only a hiccup on the internet?


Salman Ansari | 26 Nov 13:38 2015

Setting up Solr on multiple machines


I have seen the guide for setting up Solr on one machine as well as for
setting it up on multiple machines on Linux. Is there a good guide on how to
set up Solr on multiple machines on Windows Server with a ZooKeeper ensemble?
My structure is as follows:

1) 3 machines will have Zookeeper to create an ensemble
2) 2 of these machines will have Solr installed (each holding a replica of
the other's data to provide high availability)
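
For a layout like this, the Solr nodes are normally pointed at the ensemble through a single ZooKeeper connection string that lists every member. A minimal sketch, assuming hypothetical hostnames zk1 to zk3 and ZooKeeper's default client port 2181; the failure-tolerance arithmetic is simply "a majority of members must stay up":

```python
def zk_connect_string(hosts, port=2181, chroot=""):
    """Join ensemble members into host:port,host:port[/chroot] form."""
    return ",".join(f"{host}:{port}" for host in hosts) + chroot


def tolerated_failures(ensemble_size):
    """A ZooKeeper ensemble stays available while a majority of members is up."""
    quorum = ensemble_size // 2 + 1
    return ensemble_size - quorum


zk_hosts = ["zk1", "zk2", "zk3"]  # hypothetical hostnames
connect = zk_connect_string(zk_hosts)
spare = tolerated_failures(len(zk_hosts))
print(connect, spare)
```

The resulting string is what each Solr node would be given as its ZooKeeper host setting when started in cloud mode; with three members the ensemble can lose one machine and keep working.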

Any link/article that provides such a guide?