Tom Burton-West | 17 Sep 23:18 2014

How does KeywordRepeatFilterFactory help give a higher score to an original term vs. a stemmed term?

The Solr wiki says: "A repeated question is "how can I have the
original term contribute
more to the score than the stemmed version"? In Solr 4.3, the
KeywordRepeatFilterFactory has been added to assist this
functionality. "

https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming

(Full section reproduced below.)
I can see from the wiki example reproduced below that both the stemmed
and the original term get indexed, but I don't see how the original
term gets more weight than the stemmed term.  Wouldn't that require a
filter that gives terms carrying the keyword attribute more weight?

What am I missing?

Tom

---------------------------------------------
"A repeated question is "how can I have the original term contribute
more to the score than the stemmed version"? In Solr 4.3, the
KeywordRepeatFilterFactory has been added to assist this
functionality. This filter emits two tokens for each input token, one
of them is marked with the Keyword attribute. Stemmers that respect
keyword attributes will pass through the token so marked without
change. So the effect of this filter would be to index both the
original word and the stemmed version. The 4 stemmers listed above all
respect the keyword attribute.
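
For reference, a minimal schema.xml sketch of the kind of analyzer chain
the quoted section describes (the field type name and the particular
tokenizer/stemmer choices here are assumptions, not taken from the wiki):

  <fieldType name="text_keeporig" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- Emits each token twice; one copy is marked with the Keyword attribute. -->
      <filter class="solr.KeywordRepeatFilterFactory"/>
      <!-- Respects the Keyword attribute, so the marked copy passes through unstemmed. -->
      <filter class="solr.PorterStemFilterFactory"/>
      <!-- Removes one of the two copies when stemming did not change the token. -->
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

As the question above points out, this chain only indexes both the original
and the stemmed form; it does not by itself give the original form a higher weight.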


KNitin | 17 Sep 22:41 2014

Loading an index (generated by MapReduce) into SolrCloud

Hello

I have generated a Lucene index (with 6 shards) using MapReduce. I want
to load this into a collection in a SolrCloud cluster.

Is there any out-of-the-box way of doing this? Any ideas are much
appreciated.

Thanks
Nitin
keeblerh | 17 Sep 16:20 2014

Solr 4.8: Tika stripping out all XML tags

I'm processing a zip file containing an XML file.  The TikaEntityProcessor
opens the zip and reads the file, but it strips the XML tags even though I
have supplied the htmlMapper="identity" attribute.  It keeps any HTML that
is contained in a CDATA section but seems to strip the other XML tags.  Is
this due to the recursive nature of opening the zip file?  Is the identity
value somehow lost?  My understanding is that this should work in version
4.8.  Thanks.  Below is my config.

<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="kmlfiles" dataSource="null" rootEntity="false"
            baseDir="mydirectory" fileName=".*\.kmz$" onError="skip"
            processor="FileListEntityProcessor" recursive="false">
      <field defs........................
      />
      <entity name="kmlImport" processor="TikaEntityProcessor"
              datasource="kmlfiles" htmlMapper="identity" format="xml"
              transformer="TemplateTransformer" url="${kmlfiles.fileAbsolutePath}"
              recursive="true">
        <more field defs....
        />
        <entity name="xml" processor="XPathEntityProcessor" ForEach="/kml"
                dataSource="fds" dataField="kmlImport.text">
          <field xpath="//name" column="name" />
          ...more field defs
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

Ere Maijala | 17 Sep 15:06 2014

Ping handler during initial warmup

As far as I can see, when a Solr instance is started (whether standalone 
or SolrCloud), a PingRequestHandler will wait until index warmup is 
complete before returning (at least with useColdSearcher=false) which 
may take a while. This poses a problem in that a load balancer either 
needs to wait for the result or employ a short timeout for timely 
failover. Of course the request is eventually served, but it would be 
better to be able to switch over to another server until warmup is complete.

So, is it possible to configure a ping handler to return quickly with 
non-OK status if a search handler is not yet available? This would allow 
the load balancer to quickly fail over to another server. I couldn't 
find anything like this in the docs, but I'm still hopeful.

I'm aware of the possibility of using a health state file, but I'd 
rather have a way of doing this automatically.
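
For reference, the health-state-file approach I mentioned looks roughly like
the sketch below in solrconfig.xml (the file name is arbitrary). The handler
reports a failure whenever the file is missing, and the file is created or
removed through the handler's enable/disable actions, which is why I consider
it a manual switch rather than the automatic behaviour I'm after:

  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <!-- When this file does not exist, the handler returns an error status
         instead of running the health-check query. -->
    <str name="healthcheckFile">server-enabled.txt</str>
  </requestHandler>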

--Ere

phiroc | 17 Sep 11:51 2014

Problem deploying solr-4.10.0.war in Tomcat


Hello,

I've dropped solr-4.10.0.war into Tomcat 7's webapps directory.

When I start Tomcat, the following message appears in catalina.out:

-------------------

INFO: Starting Servlet Engine: Apache Tomcat/7.0.55
Sep 17, 2014 11:35:59 AM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deploying web application archive /archives/apache-tomcat-7.0.55_solr_8983/webapps/solr-4.10.0.war
Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error filterStart
Sep 17, 2014 11:35:59 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/solr-4.10.0] startup failed due to previous errors

------------------

Any help would be much appreciated.

Cheers,

Philippe

Clemens Wyss DEV | 17 Sep 11:28 2014

Solr(j) API for manipulating the schema(.xml)?

Is there an API to manipulate/consolidate the schema(.xml) of a Solr-core? Through SolrJ? 

Context:
We already have a generic indexing/searching framework (based on Lucene) where any component can act as a
so-called IndexDataProvider. This provider delivers the field types and also the entities to be
(converted into documents and then) indexed. Each of these IndexDataProviders has its own Lucene index.
So we more or less already have the information needed for the Solr schema.xml.

Hope the intention is clear. And yes, the manipulation of the schema.xml is basically only needed when the
field types change. That's why I am looking for a way to consolidate the schema.xml (upon boot, i.e.
initialization of the IndexDataProviders ...).
In 99.999% of cases it won't change, but I'd like to keep the possibility for an IndexDataProvider to hand in "its schema".

Also, again driven by the dynamic nature of our framework: can I easily create new cores via SolrJ or the
Solr REST API?
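
For the second question, a minimal SolrJ sketch of creating a core through the
CoreAdmin API; the core name and instance directory below are made up, and the
instance directory is assumed to already contain conf/solrconfig.xml and
conf/schema.xml:

  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;
  import org.apache.solr.client.solrj.response.CoreAdminResponse;

  public class CreateCoreExample {
      public static void main(String[] args) throws Exception {
          // Talks to the CoreAdmin handler of a standalone Solr node.
          HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
          // Registers a new core; its instanceDir must already hold the config files.
          CoreAdminResponse rsp =
              CoreAdminRequest.createCore("provider-core", "/path/to/provider-core", admin);
          System.out.println("createCore status: " + rsp.getStatus());
          admin.shutdown();
      }
  }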

vaibhav.patil123 | 17 Sep 09:03 2014

Solr suggester not working, please help

Suggester configuration in solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>   
      <str name="dictionaryImpl">DocumentDictionaryFactory</str> 
      <str name="field">content</str>
      <str name="weightField"></str>
      <str name="suggestAnalyzerFieldType">string</str>
    </lst>
  </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="suggest">true</str>
      <str name="suggest.count">10</str>
      <str name="suggest.dictionary">mySuggester</str>    
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>

--------------------------

Request: localhost:28080/solr/suggest?&q=foobat

The above request throws an exception.


William Bell | 17 Sep 05:32 2014

MaxScore

What we need is a function like scale(field,min,max) that operates only on
the documents returned by the search, i.e. the current result set.

scale() takes the min and max from the field across the whole index, not
necessarily from the documents in the results.

I cannot think of a solution. max() only looks at the fields of a single
document, not across the documents in the results.

I tried a query() but cannot think of a way to get the max value of a field
ONLY in the results...

Ideas?

-- 
Bill Bell
billnbell <at> gmail.com
cell 720-256-8076
bbarani | 17 Sep 02:15 2014

How to preserve 0 after decimal point?

I have a requirement to preserve trailing zeros after the decimal point.
Currently, with the field type below,

 <fieldType class="solr.SortableFloatField" name="sfloat" omitNorms="true"
sortMissingLast="true"/>

27.50 is stripped to 27.5
27.00 is stripped to 27.0
27.90 is stripped to 27.9

<float name="Price">27.5</float>

I also tried using a double field, but even then the trailing zeros are stripped.

<double name="Price">27.5</double>

Input data:

<field name="Price">27.50</field> 


Tom Burton-West | 17 Sep 00:15 2014

Solr 4.10 termsIndexInterval and termsIndexDivisor not supported with default PostingsFormat?

Hello,

I think the documentation and example files for Solr 4.x need to be
updated.  If someone can confirm this, I'll be happy to fix the example,
and perhaps someone with edit rights could fix the reference guide.

Due to dirty OCR and over 400 languages we have over 2 billion unique
terms in our index.  In Solr 3.6 we set termIndexInterval to 1024 (8
times the default of 128) to reduce the size of the in-memory index.
Previously we used termIndexDivisor for a similar purpose.

We suspect that in Solr 4.10 (and probably previous Solr 4.x versions)
termIndexInterval and termIndexDivisor do not apply to the default
codec and are probably unnecessary (since the default terms index now
uses a much more efficient representation).

According to the JavaDocs for IndexWriterConfig, the Lucene level
implementations of these do not apply to the default PostingsFormat
implementation.
http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/index/IndexWriterConfig.html#setReaderTermsIndexDivisor%28int%29

Despite this statement in the Lucene JavaDocs, in the
example/solrconfig.xml there is the following:

<!-- Expert: Controls how often Lucene loads terms into memory
     Default is 128 and is likely good for most everyone.
-->
<!-- <termIndexInterval>128</termIndexInterval> -->

In the 4.10 reference manual page 365 there is also an example showing

Michael Joyner | 16 Sep 21:25 2014

Access SolrCloud via SSH tunnel?

I am in a situation where I need to access a SolrCloud cluster behind a firewall.

I have a tunnel set up to one of the ZooKeeper nodes as a starting point, and
the following test code:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.SolrPingResponse;

// Connects to ZooKeeper through the local end of the SSH tunnel.
CloudSolrServer server = new CloudSolrServer("localhost:2181");
server.setDefaultCollection("test");
SolrPingResponse p = server.ping();
System.out.println(p.getRequestUrl());

Right now it just "hangs" without any errors... What additional ports need to
be forwarded, and what other configuration is needed, to access a SolrCloud
cluster over an SSH tunnel (or tunnels)?

