Alexandre Rafalovitch | 16 Apr 17:50 2014

Can I reconstruct text from tokens?

Hello,

If I use very basic tokenizers, e.g. space based and no filters, can I
reconstruct the text from the tokenized form?

So, "This is a test" -> "This", "is", "a", "test" -> "This is a test"?

I know we store enough information, but I don't know the internal API
well enough to know what I should be looking at for a reconstruction
algorithm.

Any hints?

The XY problem is that I want to store a large amount of very repetitive
text in Solr. I want the index to be as small as possible, so I
thought that if I just pre-tokenize, my dictionary will be quite small.
And I will be reconstructing some final form anyway.
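
For illustration, here is a minimal sketch of the reconstruction idea, assuming each token keeps its start/end character offsets (which Lucene term vectors can store when offsets are enabled):

```python
# Sketch only: rebuild text from tokens that carry character offsets,
# assuming whitespace tokenization and no filters that alter terms.
def reconstruct(tokens, length):
    """tokens: iterable of (term, start_offset, end_offset) tuples."""
    out = [" "] * length  # gaps between tokens default to a single space
    for term, start, end in tokens:
        out[start:end] = term
    return "".join(out)

toks = [("This", 0, 4), ("is", 5, 7), ("a", 8, 9), ("test", 10, 14)]
print(reconstruct(toks, 14))  # This is a test
```

Anything the analyzer throws away (case, punctuation, runs of whitespace) is unrecoverable, so this only round-trips if the token stream really preserves the original terms and offsets.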

The other option is to just use compression on the stored fields, but
I assume that does not take cross-document redundancy into account.
And it will be a read-only index after the build, so I don't care about
updates messing things up.

Regards,
   Alex

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


Arthur Pemberton | 16 Apr 17:47 2014

Stuck on SEVERE: Error filterStart

I am trying Solr for the first time, and I am stuck at the error "SEVERE:
Error filterStart"

My setup:
 - Centos 6.x
 - OpenJDK 1.7
 - Tomcat 7

From reading [1] I believe the issue is missing JAR files, but I have no
idea where to put them; even the wiki is a bit vague on that.

Lib directories that I am aware of:
 - /usr/share/tomcat/lib (for tomcat)
 - /opt/solr/example/solr/collection1/lib (for my instance)

This is the error I get:

Apr 15, 2014 11:35:36 PM org.apache.catalina.core.StandardContext
filterStart
SEVERE: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: Failed to initialize Apache Solr: Could not
find necessary SLF4j logging jars. If using Jetty, the SLF4j logging jars
need to go in the jetty lib/ext directory. For other containers, the
corresponding directory should be used. For more information, see:
http://wiki.apache.org/solr/SolrLogging
        at
org.apache.solr.servlet.CheckLoggingConfiguration.check(CheckLoggingConfiguration.java:28)
        at
org.apache.solr.servlet.BaseSolrFilter.<clinit>(BaseSolrFilter.java:31)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
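
For what it's worth, the usual fix for this error is copying Solr's bundled logging jars onto the container's shared classpath; a sketch assuming the directory layout above (adjust paths to your install):

```shell
# Solr 4.x ships its logging jars under example/lib/ext; Tomcat needs
# them on its shared classpath. Paths below are assumptions.
cp /opt/solr/example/lib/ext/*.jar /usr/share/tomcat/lib/
cp /opt/solr/example/resources/log4j.properties /usr/share/tomcat/lib/
# then restart Tomcat
```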

Ajay Patel | 16 Apr 14:51 2014

Deploy solr.war on glassfish server

Hi there
I am trying to deploy solr.war on a GlassFish server. While deploying
the war file I get the following error. Can someone please guide me
on how I can deploy solr.war on a GlassFish server?

Error occurred during deployment:
Exception while loading the app :
CDI deployment failure:WELD-001408 Unsatisfied dependencies for type 
[Set<Service>] with qualifiers [@Default] at injection point 
[[BackedAnnotatedParameter] Parameter 1 of [BackedAnnotatedConstructor] 
@Inject com.google.common.util.concurrent.ServiceManager(Set<Service>)]. 
Please see server.log for more details.

Thanks
Ajay Patel


Show the score in the search result

I read that if I add the string "score" to the fl field, I should be able to see the score within the returned documents.

As I understand it, "score" is a "special/reserved" word and I don't have to define it in the schema (right)?

I did so, but in the returned fields' list I see no score field...

Here is the request's URL: http://localhost:7001/solr/collection1/select?q=*%3A*&fl=*%2Cscore&wt=json&indent=true

Am I missing something?

Francesco
Mukesh Jha | 16 Apr 06:44 2014

Tipping point of solr shards (Num of docs / size)

Hi Gurus,

In my Solr cluster I have multiple shards, each containing
~500,000,000 documents, with a total index size of ~1 TB.

I was just wondering how much more I can keep adding to a shard before
we reach a tipping point and performance starts to degrade?

Also, as a best practice, what is the recommended number of docs / size per shard?
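
Not an answer to the tipping point itself, but for back-of-the-envelope growth projections, the figures above work out to roughly:

```python
# Rough per-document footprint from the numbers in this post.
docs_per_shard = 500_000_000
index_bytes = 1 * 1024**4        # ~1 TiB
bytes_per_doc = index_bytes / docs_per_shard
print(round(bytes_per_doc))      # ~2.2 KB per document on disk
```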

Txz in advance :)

-- 
Thanks & Regards,

*Mukesh Jha <me.mukesh.jha@gmail.com>*
Ed Smiley | 15 Apr 23:04 2014

Odd extra character duplicates in spell checking

Hi,
I am going to make this question pretty short, so I don’t overwhelm with technical details until  the end.
I suspect that some folks may be seeing this issue without the particular configuration we are using.

What our problem is:

  1.  Correctly spelled words are coming back as misspelled, with the original, correctly spelled
word plus a single oddball character appended, as multiple suggestions.
  2.  Incorrectly spelled words are returning correct spelling suggestions, each with a single oddball
character appended, as multiple suggestions.
  3.  We’re seeing this in Solr 4.5x and 4.7x.

Example:

The returned suggestions each append a single extra character (Unicode code point shown in square brackets).

correction=attitude[2d]
correction=attitude[2f]
correction=attitude[2026]

Spurious characters:

  *   Unicode Character 'HYPHEN-MINUS' (U+002D)
  *   Unicode Character 'SOLIDUS' (U+002F)
  *   Unicode Character 'HORIZONTAL ELLIPSIS' (U+2026)

Anybody see anything like this?  Anybody fix something like this?

Thanks!
—Ed

Jean-Sebastien Vachon | 15 Apr 21:57 2014

Transformation on a numeric field

Hi All,

I am looking for a way to index a numeric field and, in another numeric field, its value divided by 1,000.
I thought about using a copyField with a PatternReplaceFilterFactory to keep only the first few digits
(cutting the last three).

Solr complains that I can not have an analysis chain on a numeric field:

Core: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Plugin init
failure for [schema.xml] fieldType "truncated_salary": FieldType: TrieIntField
(truncated_salary) does not support specifying an analyzer. Schema file is /data/solr/solr-no-cloud/Core1/schema.xml

Is there a way to accomplish this?
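
Since Trie* numeric fields accept no analysis chain, one workaround is deriving the second field before the document reaches Solr. A client-side sketch (field names are just the ones from this thread):

```python
# Sketch: derive truncated_salary (salary with the last three digits
# dropped) at indexing time instead of via an analysis chain.
def with_truncated_salary(doc):
    out = dict(doc)
    out["truncated_salary"] = out["salary"] // 1000
    return out

print(with_truncated_salary({"id": "1", "salary": 84500}))
# {'id': '1', 'salary': 84500, 'truncated_salary': 84}
```

Server-side, a StatelessScriptUpdateProcessorFactory in the update request processor chain can do the same transformation without touching the client.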

Thanks
Peter Keegan | 15 Apr 20:11 2014

Distributed commits in CloudSolrServer

I have a SolrCloud index, 1 shard, with a leader and one replica, and 3
ZKs. The Solr indexes are behind a load balancer. There is one
CloudSolrServer client updating the indexes. The index schema includes 3
ExternalFileFields. When the CloudSolrServer client issues a hard commit, I
observe that the commits occur sequentially, not in parallel, on the leader
and replica. The duration of each commit is about a minute. Most of this
time is spent reloading the 3 ExternalFileField files. Because of the
sequential commits, there is a period of time (1 minute+) when the index
searchers will return different results, which can cause a bad user
experience. This will get worse as replicas are added to handle
auto-scaling. The goal is to keep all replicas in sync w.r.t. the user
queries.

My questions:

1. Is there a reason that the distributed commits are done in sequence, not
in parallel? Is there a way to change this behavior?

2. If instead, the commits were done in parallel by a separate client via a
GET to each Solr instance, how would this client get the host/port values
for each Solr instance from zookeeper? Are there any downsides to doing
commits this way?
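
On question 2: the replica addresses live in ZooKeeper's clusterstate.json. A sketch of pulling the base_url values out of it (the JSON below is a hand-written stand-in for what ZooKeeper would return):

```python
import json

# Assumed clusterstate.json shape: collection -> shards -> replicas,
# each replica carrying a base_url and state.
sample = json.loads("""{
  "collection1": {
    "shards": {
      "shard1": {
        "replicas": {
          "core_node1": {"base_url": "http://host1:8983/solr", "state": "active"},
          "core_node2": {"base_url": "http://host2:8983/solr", "state": "active"}
        }
      }
    }
  }
}""")

def replica_urls(clusterstate, collection):
    """Collect base_url of every active replica in a collection."""
    urls = []
    for shard in clusterstate[collection]["shards"].values():
        for replica in shard["replicas"].values():
            if replica.get("state") == "active":
                urls.append(replica["base_url"])
    return urls

print(replica_urls(sample, "collection1"))
```

In practice the client would fetch /clusterstate.json from ZooKeeper (e.g. with a ZooKeeper client library) rather than parse a literal string.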

Thanks,
Peter
Matt Kuiper | 15 Apr 19:55 2014

cache warming questions

Hello,

I have a few questions regarding how Solr caches are warmed.

My understanding is that there are two ways to warm internal Solr caches (only one way for document cache and
lucene FieldCache):

Auto warming - occurs when there is a current searcher handling requests and new searcher is being
prepared.  "When a new searcher is opened, its caches may be prepopulated or "autowarmed" with cached
object from caches in the old searcher. autowarmCount is the number of cached items that will be
regenerated in the new searcher."    http://wiki.apache.org/solr/SolrCaching#autowarmCount

Explicit warming - where the static warming queries specified in Solrconfig.xml for newSearcher and
firstSearcher listeners are executed when a new searcher is being prepared.

What does it mean that items will be regenerated or prepopulated from the current searcher's cache to the
new searcher's cache?  I doubt it means copy, as the index has likely changed with a commit and possibly
invalidated some contents of the cache.  Are the queries, or filters, that define the contents of the
current caches re-executed for the new searcher's caches?

For the case where auto warming is configured, a current searcher is active, and static warming queries are
defined how does auto warming and explicit warming work together? Or do they?  Is only one type of warming
activated to fill the caches?
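
For reference, the explicit-warming side is configured like this in solrconfig.xml (the queries shown are placeholders):

```xml
<!-- solrconfig.xml: static warming queries; q/sort values are placeholders -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some_common_query</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">*:*</str></lst>
  </arr>
</listener>
```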

Thanks,
Matt
Rich Mayfield | 15 Apr 18:15 2014

Race condition in Leader Election

I see something similar where, given ~1000 shards, both nodes spend a LOT of time sorting through the leader
election process. Roughly 30 minutes.

I too am wondering: if I force all leaders onto one node, shut both nodes down, start up the node with all of
the leaders on it first, and then start up the other node, I think I would have a much faster startup sequence.

Does that sound reasonable? And if so, is there a way to trigger the leader election process without taking
the time to unload and recreate the shards?

> Hi
> 
>   When restarting a node in solrcloud, i run into scenarios where both the
> replicas for a shard get into "recovering" state and never come up causing
> the error "No servers hosting this shard". To fix this, I either unload one
> core or restart one of the nodes again so that one of them becomes the
> leader.
> 
> Is there a way to "force" leader election for a shard for solrcloud? Is
> there a way to break ties automatically (without restarting nodes) to make
> a node as the leader for the shard?
> 
> 
> Thanks
> Nitin
Alexey Kozhemiakin | 15 Apr 17:41 2014

Empty documents in Solr\lucene 3.6

Dear Community,

We've faced a strange data corruption issue with one of our clients' old Solr setups (3.6).

When we do a query (id:X OR id:Y) we get 2 nodes; one contains normal doc data, the other is empty (<doc />).
We've looked inside the Lucene index using Luke - same story, one of the documents is empty.
When we click on the 1st document, it shows nothing.
http://snag.gy/O5Lgq.jpg

Perhaps the files for the stored data were corrupted? But Luke's index check says OK.
Any clues on how to troubleshoot the root cause?
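
One possible next step, hedged as a sketch since jar names vary by install: Lucene's CheckIndex tool inspects every segment and reports per-segment status, which can narrow down whether the stored-fields files are actually damaged.

```shell
# Adjust the jar path/version to your Solr 3.6 install.
java -cp lucene-core-3.6.2.jar org.apache.lucene.index.CheckIndex /path/to/index
# add -fix to drop unrecoverable segments (losing their docs) -- back up first
```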

Best regards,
Alexey

