Michael Pitsounis | 21 Aug 01:14 2014

embedded documents

Hello everybody,

I had a requirement to store complicated JSON documents in Solr.

I have modified the JsonLoader to accept complicated JSON documents with
arrays/objects as values.

It stores the object/array, then flattens it and indexes the resulting fields.

e.g  basic example document

        "titles_json":{"FR":"This is the FR title" , "EN":"This is the EN
title"} ,
        "id": 1000003,
        "guid": "3b2f2998-85ac-4a4e-8867-beb551c0b3c6"

It will store titles_json:{"FR":"This is the FR title" , "EN":"This is the
EN title"}
and then index fields

titles.FR:"This is the FR title"
titles.EN:"This is the EN title"
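
The flattening step described above can be sketched like this (the class and method names are illustrative, not the actual modified JsonLoader code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlattenDoc {
    /** Recursively flatten nested objects into dotted field names,
        e.g. {"titles": {"FR": ...}} becomes titles.FR; scalar and list
        values are stored as-is (lists map onto multiValued fields). */
    public static void flatten(String prefix, Map<?, ?> doc, Map<String, Object> out) {
        for (Map.Entry<?, ?> e : doc.entrySet()) {
            String key = prefix.isEmpty() ? e.getKey().toString()
                                          : prefix + "." + e.getKey();
            Object value = e.getValue();
            if (value instanceof Map) {
                flatten(key, (Map<?, ?>) value, out);   // descend into objects
            } else {
                out.put(key, value);                    // scalars/lists kept whole
            }
        }
    }
}
```

The original document can be kept in a stored-only field while the flattened keys feed indexed dynamic fields.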

Do you see any problems with this approach?

Michael Pitsounis
(Continue reading)

didier deshommes | 21 Aug 00:20 2014

Re: Unload collection in SolrCloud

I added a JIRA issue here: https://issues.apache.org/jira/browse/SOLR-6399

On Thu, May 22, 2014 at 4:16 PM, Erick Erickson <erickerickson <at> gmail.com>

> "Age out" in this context is just implementing a LRU cache for open
> cores. When the cache limit is exceeded, the oldest core is closed
> automatically.
> Best,
> Erick
> On Thu, May 22, 2014 at 10:27 AM, Saumitra Srivastav
> <saumitra.srivastav7 <at> gmail.com> wrote:
> > Eric,
> >
> > Can you elaborate more on what you mean by "age out"?
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Unload-collection-in-SolrCloud-tp4135706p4137707.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
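
Erick's "age out" behavior, an LRU cache of open cores, can be sketched with an access-ordered LinkedHashMap (class and names are illustrative, not Solr's actual transient-core code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** An access-ordered LRU map: once more than maxOpen cores are cached,
    the least recently used one is evicted (where Solr would close it). */
public class CoreCache<V> extends LinkedHashMap<String, V> {
    private final int maxOpen;

    public CoreCache(int maxOpen) {
        super(16, 0.75f, true); // true = access order, so get() refreshes recency
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // a real implementation would close the evicted core before dropping it
        return size() > maxOpen;
    }
}
```
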
Ryan Josal | 20 Aug 20:36 2014

Dynamically loaded core.properties file

Hi all, I have a question about dynamically loading a core properties 
file with the new core discovery method of defining cores.  The concept 
is that I can have a dev.properties file and a prod.properties file, and 
specify which one to load with -Dsolr.env=dev.  This way I can have one 
file which specifies a bunch of runtime properties like external servers 
a plugin might use, etc.

Previously I was able to do this in solr.xml because it can do system 
property substitution when defining which properties file to use for a core.

Now I'm not sure how to do this with core discovery, since the core is 
discovered based on this file, and now the file needs to contain things 
that are specific to that core, like name, which previously were defined 
in the xml definition.

Is there a way I can plug in some code that gets run before any schema or
solrconfig is parsed?  That way I could write a property loader that adds
properties from ${solr.env}.properties to the JVM system properties.
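
A property loader along the lines Ryan describes could look like this sketch (EnvProps, the config-dir argument, and the "dev" default are illustrative names, not Solr API; whether Solr offers a hook to run it early enough is exactly the open question):

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class EnvProps {
    /** Load ${solr.env}.properties (e.g. dev.properties or prod.properties)
        from a config dir and push every entry into the JVM system properties,
        so later ${...} substitution can see them. */
    public static void load(File configDir) throws IOException {
        String env = System.getProperty("solr.env", "dev"); // -Dsolr.env=prod etc.
        File f = new File(configDir, env + ".properties");
        if (!f.isFile()) return;                            // no file for this env
        Properties p = new Properties();
        try (FileReader r = new FileReader(f)) {
            p.load(r);
        }
        copyInto(p);
    }

    /** Copy loaded properties into the JVM system properties. */
    public static void copyInto(Properties p) {
        for (String name : p.stringPropertyNames()) {
            System.setProperty(name, p.getProperty(name));
        }
    }
}
```
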


YourKit | 20 Aug 16:48 2014

YourKit Java Profiler 2014 Released


We are glad to announce immediate availability of YourKit Java Profiler 2014.

Download: http://www.yourkit.com/download/
Changes: http://www.yourkit.com/changes/



  - Profiling results are presented with exact source code line numbers for
    - CPU sampling;
    - object allocation recording;
    - stack local GC roots;
    - thread stack telemetry;
    - thread stacks in HPROF snapshots;
    - event stacks;
    - exception telemetry;
    - monitor profiling.

  - IDE integration: Tools | Open in IDE (F7) action can open exact line


  - Improved UI responsiveness when the profiler is connected to a live
    profiled application

(Continue reading)

Corey Gerhardt | 20 Aug 17:54 2014

Business Name Collation

Solr 4.8.1

Correct value: Wardell F E B Dr

Just wondering if anyone can see an issue with my spellchecker settings. There
is no collation value, and I'm hoping someone can explain why.

  <lst name="spellchecker">
      <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
      <str name="name">default</str>
      <str name="field">spell</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.8</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">1</int>
      <float name="thresholdTokenFrequency">0.0001</float>
      <float name="maxQueryFrequency">0.01</float>
      <str name="buildOnCommit">true</str>

(Continue reading)
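
One thing worth checking: in Solr 4.x, collation is switched on by spellcheck.* request parameters (usually set as handler defaults), not by anything inside the `<lst name="spellchecker">` block, so without spellcheck.collate=true you get suggestions but never a collation. A hedged sketch of handler defaults (the /select handler name and the values are illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.collate">true</str>
    <!-- try several candidate corrections when building collations -->
    <str name="spellcheck.maxCollations">3</str>
    <str name="spellcheck.maxCollationTries">10</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```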


Unsupported ContentType: application/pdf Not in: [application/xml, text/csv, text/json, application/csv, application/javabin, text/xml, application/json]


I have Solr 4.9.0 and I'm getting the above error when I try to index a PDF document via the Solr web interface.

Here are my schema and solrconfig. Am I missing something?

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="simple" version="1.1">
    <fieldtype name="string" class="solr.StrField" postingsFormat="SimpleText" />
    <fieldtype name="ignored" class="solr.TextField" />
    <fieldtype name="text" class="solr.TextField" postingsFormat="SimpleText">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/> <!-- Lowercases the letters in each token. Leaves non-letter tokens alone. -->
            <filter class="solr.TrimFilterFactory"/> <!-- Trims whitespace at either end of a token. -->
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> <!-- Discards common words. -->
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.TrimFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
(Continue reading)
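
That ContentType list belongs to the plain /update handler, which only parses XML/JSON/CSV/javabin; PDFs must go through the ExtractingRequestHandler (Solr Cell/Tika) instead. A sketch of the solrconfig.xml pieces, assuming the standard distribution layout for the contrib paths:

```xml
<!-- pull in Solr Cell and its Tika dependencies; adjust dirs to your layout -->
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.content">text</str>  <!-- extracted body goes to "text" -->
    <str name="uprefix">ignored_</str>   <!-- unknown Tika fields are ignored -->
  </lst>
</requestHandler>
```

Documents are then posted to /update/extract rather than /update.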

S.L | 20 Aug 03:23 2014

Intermittent error indexing SolrCloud 4.7.0

Hi All,

I get "No Live SolrServers available to handle this request" error
intermittently while indexing in a SolrCloud cluster with 3 shards and
replication factor of 2.

I am using Solr 4.7.0.

Please see the stack trace below.

org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request
	at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:352)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
	at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
Ethan | 20 Aug 00:59 2014

Inconsistent Solr Index Behavior

A while back we added span support for multi-valued fields and did a full
re-index of data spanning over 4 years.  It worked perfectly for a month,
and then suddenly the results were no longer reliable.  We are noticing that
the span is not working on most of the data and is returning wrong results.
 Is there anything that might corrupt or drop index data (old data)? Too
little RAM? Something else?

rks_lucene | 19 Aug 22:47 2014

Question on Solr Relevancy using Okapi BM25F

I am trying to get Okapi BM25F working over some press-release articles I am
indexing. The data has text portions spread across three fields: Title, Summary
and Full Article.

I would like to influence the standard BM25 score by giving more weight to words
in the title and summary of the article than in the full description. The
importance has to be in the order title > summary > full description.

I am unable to find schema examples online that can help me with this.

Can someone guide me toward a possible schema for this (or a link to an
article/blog that explains it)?

Thanks for your help.
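
Solr 4.x does not ship a true BM25F implementation; the usual approximation is per-field BM25 combined with edismax field boosts. A hedged sketch (the field names title, summary, and full_article are assumptions):

```xml
<!-- per-field BM25 needs the schema-aware similarity in schema.xml:
     a global <similarity class="solr.SchemaSimilarityFactory"/> plus
     <similarity class="solr.BM25SimilarityFactory"/> inside each fieldType -->
<requestHandler name="/query" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- boosts enforce title > summary > full article -->
    <str name="qf">title^3.0 summary^2.0 full_article^1.0</str>
  </lst>
</requestHandler>
```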


View this message in context: http://lucene.472066.n3.nabble.com/Question-on-Solr-Relevancy-using-Okapi-BM25F-tp4153866.html
Sent from the Solr - User mailing list archive at Nabble.com.

Tim.Cardwell | 19 Aug 22:09 2014

Index not respecting Omit Norms

Please reference the below images:

[images not preserved in the plain-text archive]

As you can see from the first image, the text field-type doesn't define the
omitNorms flag, meaning it is set to false. Also on the first image you can
see that the description field doesn't define the omitNorms flag, again
meaning it is set to false. (Default for omitNorms is false). This can all
be confirmed on the second image, where the Properties and Schema rows have
omitNorms set to checked.
I am having some issues understanding why some results have a fieldNorm set
to 1 for matches on the description field. As you can see from the third
image, the description field has a rather large number of terms in it, yet
the fieldNorm is being set to 1.0 for matching 'supply' on the description
field. My guess is that the Omit Norms flag for the 'Index' row is causing
the issue.

From the first picture, can anyone tell me what each row (Properties, Schema
and Index) refers to? I think the Properties row refers to the flags set
when defining the Field Type, which for this field is text. The Schema row
refers to the flags set when defining the field, which is description. I'm
not as sure where the Index row flags come from, but I assume they reflect
what the index actually contains.
Am I right in assuming the Omit Norms flag in the Index row of the first
picture is what is causing fieldNorm issues in the second image?  
(Continue reading)
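
If the goal is to make sure norms are kept for description, one option is to set the flag explicitly on the field instead of relying on the type default (field and type names taken from the post). Note that norms are written at index time, so a fieldNorm stuck at 1.0 will only change after the field is fully re-indexed with norms enabled:

```xml
<field name="description" type="text" indexed="true" stored="true" omitNorms="false"/>
```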

SolrUser1543 | 19 Aug 20:57 2014

Performance of Boolean query with hundreds of OR clauses.

I am using Solr to perform searches for finding similar pictures.

For this purpose, every image is indexed as a set of descriptors (a descriptor
is a string of 6 chars).
The number of descriptors per image may vary (from a few to many thousands).

When I want to search for a similar image, I extract the descriptors
from it and create a query like:
MyImage:( desc1 desc2 ...  desc n )

The number of descriptors in the query may also vary; usually it is about 1000.

Of course the performance of this query is very bad, and it may take a few
minutes to return.

Any ideas for performance improvements?

P.S. I also tried to use LIRE, but it does not fit my use case.
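
One cheap mitigation that can be sketched in code is deduplicating the descriptors and capping the clause count before building the query string (the class name and the cap are illustrative; the field name MyImage is from the post):

```java
import java.util.LinkedHashSet;
import java.util.List;

public class DescriptorQuery {
    /** Build a bounded OR query: dedupe descriptors and keep at most
        maxClauses of them, since thousands of clauses dominate query time. */
    public static String build(List<String> descriptors, int maxClauses) {
        LinkedHashSet<String> unique = new LinkedHashSet<>(descriptors);
        StringBuilder q = new StringBuilder("MyImage:(");
        int n = 0;
        for (String d : unique) {
            if (n++ == maxClauses) break;  // cap the clause count
            if (n > 1) q.append(' ');
            q.append(d);
        }
        return q.append(')').toString();
    }
}
```

Very large OR queries may also bump into the maxBooleanClauses limit in solrconfig.xml, which is worth checking alongside any capping strategy.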

View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-Boolean-query-with-hundreds-of-OR-clauses-tp4153844.html
Sent from the Solr - User mailing list archive at Nabble.com.