Faisal Mansoor | 1 Nov 05:10 2014

How to update SOLR schema from continuous integration environment


How do people usually update Solr configuration files from a continuous
integration environment like TeamCity or Jenkins?

We have multiple development and testing environments and use WebDeploy- and
AwsDeploy-type tools to remotely deploy code multiple times a day. To
update Solr, I wrote a simple Node server which accepts a conf folder over
HTTP, updates the specified core's conf folder, and restarts the Solr service.

Is there a standard tool for this use case? I know about the Schema
REST API, but I want to update all the files in the conf folder rather
than just updating a single file or adding and removing synonyms piecemeal.

Here is the link to the Node server I mentioned, if anyone is interested.
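For SolrCloud, the usual path is to push the whole conf directory into ZooKeeper with the zkcli script that ships with Solr (under example/scripts/cloud-scripts in 4.x) and then reload the collection. A sketch; the paths, ZooKeeper address, and collection/core names below are assumptions:

```shell
# upload the local conf directory as a named configset
./zkcli.sh -cmd upconfig -zkhost zk1:2181 \
    -confdir /path/to/conf -confname mycollection_conf

# reload the collection so every node picks up the new config
curl "http://solr1:8983/solr/admin/collections?action=RELOAD&name=mycollection"

# for non-cloud Solr, copy the files onto the box and reload the core instead
curl "http://solr1:8983/solr/admin/cores?action=RELOAD&core=mycore"
```

Both steps are plain HTTP/CLI calls, so they drop into a TeamCity or Jenkins build step without extra tooling.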

Greg Solovyev | 1 Nov 00:08 2014

Consul instead of ZooKeeper anyone?

I am investigating a project to make SolrCloud run on Consul instead of ZooKeeper. So far, my research
has revealed no such efforts, but I wanted to check with this list to make sure I am not going to be reinventing
the wheel. Has anyone attempted using Consul instead of ZK to coordinate SolrCloud nodes?

hschillig | 31 Oct 16:49 2014

Only copy string up to certain character symbol?

So I have a title field that commonly looks like this:

Personal legal forms simplified : the ultimate guide to personal legal forms
/ Daniel Sitarz.

I made a copyField of type "title_only". I want to copy ONLY the
text "Personal legal forms simplified : the ultimate guide to personal legal
forms", i.e. everything before the "/" symbol. I have it like this in my schema.xml:

<fieldType name="title_only" class="solr.TextField">
    <analyzer type="index">
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(\/.+?$)" replacement=""/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="4"
                maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="(\/.+?$)" replacement=""/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

My regex seems to be off though, as the field still holds the entire value
when I reindex and restart Solr. Thanks for any help!
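Worth noting: analyzers, including charFilters, only change the *indexed* tokens. The stored value returned in search results is always the raw input, so no charFilter will change what the field appears to "hold". To actually change the stored value, the copy and the strip can be done in an update request processor chain instead of a copyField. A sketch for solrconfig.xml (chain name is an assumption, untested; use it in place of the copyField, and attach it to the update handler):

```xml
<updateRequestProcessorChain name="title-only">
  <!-- copy the raw title into title_only before indexing -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">title</str>
    <str name="dest">title_only</str>
  </processor>
  <!-- strip everything from the "/" onward from the stored value -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">title_only</str>
    <str name="pattern">\s*/.*$</str>
    <str name="replacement"></str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because the processors run before the document is indexed, both the stored and the indexed value of title_only end up without the "/ Daniel Sitarz." part.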

(Continue reading)

tedsolr | 31 Oct 15:44 2014

exporting to CSV with solrj

I am trying to invoke the CSVResponseWriter to create a CSV file of all
stored fields. There are millions of documents, so I need to write to the
file iteratively. I saw a snippet of code online that claimed it could
effectively remove the SolrDocumentList wrapper and allow the docs to be
retrieved in the actual format requested in the query. However, I get a null
pointer exception from the CSVResponseWriter.write() method.

SolrQuery qry = new SolrQuery("*:*");
qry.setParam("wt", "csv");
// set other params
SolrServer server = getSolrServer();
try {
    QueryResponse res = server.query(qry);

    CSVResponseWriter writer = new CSVResponseWriter();
    Writer w = new StringWriter();
    SolrQueryResponse solrResponse = new SolrQueryResponse();
    try {
        SolrParams list = new MapSolrParams(new HashMap<String, String>());
        writer.write(w, new LocalSolrQueryRequest(null, list), solrResponse);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
} catch (SolrServerException e) {

(Continue reading)
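The NPE is not surprising: CSVResponseWriter is a server-side component that expects a populated SolrQueryRequest/SolrQueryResponse, and the SolrQueryResponse constructed here contains no documents at all. One workaround (a sketch, untested against a live server; the collection URL, page size, and output path are assumptions) is to skip SolrJ's response parsing entirely and stream the CSV that Solr itself produces for wt=csv, paging with start/rows:

```java
import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Writer;
import java.net.URL;

public class CsvExport {

    // Build the /select URL for one page of CSV results. The header row is
    // suppressed on every page after the first, so the concatenated output
    // contains a single header line.
    static String pageUrl(String baseUrl, int page, int rows) {
        return baseUrl + "/select?q=*:*&wt=csv&fl=*"
                + "&start=" + (page * rows) + "&rows=" + rows
                + (page > 0 ? "&csv.header=false" : "");
    }

    // Stream page after page of CSV straight from Solr into a file,
    // stopping at the first page that returns no lines.
    static void export(String baseUrl, int rows, String outFile) throws IOException {
        try (Writer out = new FileWriter(outFile)) {
            for (int page = 0; ; page++) {
                int lines = 0;
                try (BufferedReader in = new BufferedReader(new InputStreamReader(
                        new URL(pageUrl(baseUrl, page, rows)).openStream(), "UTF-8"))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        out.write(line);
                        out.write('\n');
                        lines++;
                    }
                }
                if (lines == 0) break; // empty page => export finished
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java CsvExport http://localhost:8983/solr/collection1
        if (args.length > 0) export(args[0], 10000, "export.csv");
    }
}
```

Caveat: start/rows paging degrades on very deep pages. On Solr 4.7+ cursorMark is the efficient deep-paging mechanism, but it needs a response format that returns nextCursorMark (e.g. JSON), so with cursorMark you would assemble the CSV lines yourself.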

5ton3 | 31 Oct 13:11 2014

The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor


Not sure if this is a problem or if I just don't understand the debug
response, but it seems somewhat odd to me.
The "main" entity can have multiple BLOB documents. I'm using Tika Entity
Processor to retrieve the body (plaintext) from these documents and put the
result in a multivalued field, "filedata".  The data-config looks like this:

It seems to work properly, but when I debug the data import, it seems that
the query on TABLE2 on the BLOB column ("FILEDATA_BIN") gets executed 1 time
for document #1, which is correct, but 2 times for document #2, 3 times for
document #3, and so on.
I.e. for document #1:

And for document #2:

The result seems correct, i.e. it doesn't duplicate the filedata. But why
does it query the DB twice for document #2? Any ideas? Maybe something is
wrong in my config?
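The poster's data-config did not come through in the archive. For context, a typical nested setup of this kind (entirely hypothetical table, column, and field names, not the poster's actual file) looks roughly like:

```xml
<dataConfig>
  <dataSource name="db" driver="..." url="..."/>
  <dataSource name="fieldStream" type="FieldStreamDataSource"/>
  <document>
    <entity name="main" dataSource="db" query="SELECT ID FROM TABLE1">
      <entity name="files" dataSource="db"
              query="SELECT FILEDATA_BIN FROM TABLE2 WHERE DOC_ID = '${main.ID}'">
        <!-- Tika reads the BLOB via the FieldStreamDataSource -->
        <entity name="file" processor="TikaEntityProcessor"
                dataSource="fieldStream" dataField="files.FILEDATA_BIN"
                format="text">
          <field column="text" name="filedata"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
```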

View this message in context: http://lucene.472066.n3.nabble.com/The-exact-same-query-gets-executed-n-times-for-the-nth-row-when-retrieving-body-plaintext-from-BLOB-r-tp4166822.html
Sent from the Solr - User mailing list archive at Nabble.com.

ku3ia | 31 Oct 11:33 2014

Solr index corrupt question

Hi folks!
I'm interested to know: can delete operations corrupt a Solr index if the
optimize command is never performed?

View this message in context: http://lucene.472066.n3.nabble.com/Solr-index-corrupt-question-tp4166810.html

S.L | 30 Oct 23:18 2014

Master Slave set up in Solr Cloud

Hi All,

As I previously reported, because there is no overlap between the documents
in the SolrCloud replicas of the index shards, I have turned off replication
and basically have three shards with a replication factor of 1.

This obviously will not be scalable, because the same core will be indexed
and queried at the same time, and this is a long-running indexing task.

My question is: what options do I have for setting up replicas of the single
per-shard core outside of the SolrCloud replication-factor mechanism, since
that does not seem to work for me?

AJ Lemke | 30 Oct 22:27 2014

Missing Records

Hi All,

We have a SOLR cloud instance that has been humming along nicely for months.
Last week we started experiencing missing records.

Admin DIH Example:
Fetched: 903,993 (736/s), Skipped: 0, Processed: 903,993 (736/s)
A *:* search claims that there are only 903,902; this is after the first full index.
Subsequent full indexes give the following counts for the *:* search.

All the while the admin returns: Fetched: 903,993 (x/s), Skipped: 0, Processed: 903,993 (x/s) every time.
---records per second is variable

I found an item that should be in the index but is not found in a search.

Here are the referenced lines of the log file.

DEBUG - 2014-10-30 15:10:51.160; org.apache.solr.update.processor.LogUpdateProcessor;
PRE_UPDATE add{,id=750041421} {{params(debug=false&optimize=true&indent=true&commit=true&clean=true&wt=json&command=full-import&entity=ads&verbose=false),defaults(config=data-config.xml)}}
DEBUG - 2014-10-30 15:10:51.160; org.apache.solr.update.SolrCmdDistributor; sending update to retry:0 add{,id=750041421} params:update.distrib=TOLEADER&distrib.from=http%3A%2F%2F192.168.20.57%3A8983%2Fsolr%2Finventory_shard1_replica1%2F

--- there are 746 lines of log between entries ---

DEBUG - 2014-10-30 15:10:51.340; org.apache.http.impl.conn.Wire;  >>
Highmark[0xe0]/VerticalSiteIDs!2[0xe0]-ClassBinaryIDp <at> [0xe0]#lat(42.48929[0xe0]-SubClassFacet01704|Snowmobiles[0xe0](FuelType%Other[0xe0]2DivisionName_Lower,recreational[0xe0]&latlon042.4893,-96.3693[0xe0]*PhotoCount!8[0xe0](HasVideo[0x2][0xe0]"ID)750041421[0xe0]&Engine
(Continue reading)

Ian Rose | 30 Oct 21:23 2014

Ideas for debugging poor SolrCloud scalability

Howdy all -

The short version is: we are not seeing SolrCloud performance scale (even
close to) linearly as we add nodes. Can anyone suggest good diagnostics for
finding scaling bottlenecks? Are there known 'gotchas' that make SolrCloud
fail to scale?

In detail:

We have used Solr (in non-Cloud mode) for over a year and are now beginning
a transition to SolrCloud.  To this end I have been running some basic load
tests to figure out what kind of capacity we should expect to provision.
In short, I am seeing very poor scalability (increase in effective QPS) as
I add Solr nodes.  I'm hoping to get some ideas on where I should be
looking to debug this.  Apologies in advance for the length of this email;
I'm trying to be comprehensive and provide all relevant information.

Our setup:

1 load generating client
 - generates tiny, fake documents with unique IDs
 - performs only writes (no queries at all)
 - chooses a random Solr server for each ADD request (with 1 doc per ADD)

N collections spread over K solr servers
 - every collection is sharded K times (so every solr instance has 1 shard
from every collection)
 - no replicas
 - external zookeeper server (not using zkRun)
(Continue reading)

Craig Hoffman | 30 Oct 18:58 2014

Automating Solr

Simple question:
What is the best way to automate re-indexing Solr? Set up a cron job / curl script?
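A cron job driving the DataImportHandler over HTTP is the common low-tech answer. A sketch; the core name, schedule, and running user below are assumptions:

```shell
# /etc/cron.d/solr-reindex — delta import every hour
0 * * * *  solr  curl -s "http://localhost:8983/solr/mycore/dataimport?command=delta-import" > /dev/null

# full re-index nightly at 02:00
0 2 * * *  solr  curl -s "http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true" > /dev/null
```

The same pattern works for any indexing endpoint; the only moving part is the URL the script hits.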

Craig Hoffman
w: http://www.craighoffmanphotography.com
FB: www.facebook.com/CraigHoffmanPhotography
TW: https://twitter.com/craiglhoffman

Håvard Wahl Kongsgård | 30 Oct 18:47 2014

Boosting on field-not-empty

Hi, a simple question: how do I boost documents where a field is not empty?
For some reason Solr (4.6) returns rows with empty fields first (even though
the fields are not part of the search query).

I came across an old thread about this, but no solution.
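Two common ways to express this with (e)dismax request parameters, sketched with a placeholder field name:

```
# boost query: documents where myfield has any value get a multiplicative-ish bump
bq=myfield:[* TO *]^10

# or an additive function boost using exists()
bf=if(exists(myfield),10,0)
```

Both `exists()` and `if()` are standard Solr function queries in 4.x; the boost values are arbitrary and need tuning against real queries.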


Håvard Wahl Kongsgård