onur kasimlar | 26 Apr 14:23 2015

trec_collection

Hi,
I have a very general question. Is it possible to index a TREC collection
with Solr (not Lucene Benchmark) using my own schema.xml (analyzer
definitions, etc.)? I want to index the same TREC collection with different
settings to see which setting fits best.
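
A minimal sketch of one way to do this (an editorial illustration, not from the original post): point a standalone core at whichever schema.xml variant is under test, then push the parsed TREC documents through SolrJ. The core name trec, the field names, and parseTrecFile are all assumptions; the real <DOC>/<DOCNO> parsing is left as a stub.

import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TrecIndexer {
    public static void main(String[] args) throws Exception {
        // The core's conf/ directory holds the schema.xml variant under test.
        SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/trec");
        for (String[] doc : parseTrecFile("trec.dat")) {
            SolrInputDocument d = new SolrInputDocument();
            d.addField("id", doc[0]);    // DOCNO
            d.addField("text", doc[1]);  // body; analyzed as schema.xml dictates
            solr.add(d);
        }
        solr.commit();
        solr.close();
    }

    // Stub standing in for real TREC <DOC>/<DOCNO> parsing.
    static List<String[]> parseTrecFile(String path) {
        return Collections.singletonList(
            new String[] {"FT911-1", "sample document text"});
    }
}

Re-running this against cores with different analyzer chains lets you compare the settings on the same collection.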
Tomasz Borek | 26 Apr 03:20 2015

Fuzzy phrases + weighting at query level or do I need to program?

Ave!

How do I make fuzzy search work on lengthy names such as "La Riviera
Montana de los Diablos" or "Unified Mega Corp Super Dwelling", across all
my queries?

My query has 3 levels of results:
Best results are: +title:X +place:Y -> Q1
If none such are found, +title:x -> Q2
then +place:Y -> Q3

All in all: (Q1) (Q2) (Q3)

Yonik's examples use ~ for fuzzy search on a single term; on more than one
term it becomes a proximity search.
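
One workaround, sketched here as an editorial aside (the field name title and the edit distance 2 are assumptions): since ~ on a quoted phrase means proximity, apply the fuzziness to each term individually and require all of them.

import org.apache.solr.client.solrj.SolrQuery;

public class FuzzyNameQuery {
    public static void main(String[] args) {
        // Builds +title:riviera~2 +title:montana~2 +title:diablos~2
        String[] terms = {"riviera", "montana", "diablos"};
        StringBuilder q = new StringBuilder();
        for (String t : terms) {
            q.append("+title:").append(t).append("~2 ");
        }
        SolrQuery query = new SolrQuery(q.toString().trim());
        System.out.println(query.getQuery());
    }
}

The same per-term expansion could be applied inside each of Q1, Q2, and Q3.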

pozdrawiam,
LAFK
Steven White | 24 Apr 23:30 2015

Using SolrJ to access schema.xml

Hi Everyone,

Per this link
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-ListFieldTypes
Solr supports a REST Schema API to modify the schema.  I looked at
http://lucene.apache.org/solr/4_2_1/solr-solrj/index.html?overview-summary.html
in the hope that SolrJ has a Java API that allows schema modification, but
I couldn't find any.  Is this the case, or did I not look hard enough?

My need is to manage Solr's schema.xml file using a remote API.  The REST
Schema API gets me there, but I have to write code to work with the
request/response XML, which I'd much rather avoid if it is already out there.
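
For reference: SolrJ releases of that era did not wrap the Schema API, but later 5.x releases added SchemaRequest classes that do exactly this. A sketch against that later API (the core name and field attributes are assumptions; check your SolrJ version before relying on it):

import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class SchemaApiDemo {
    public static void main(String[] args) throws Exception {
        SolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycore");

        // List the field types currently defined in the schema.
        SchemaResponse.FieldTypesResponse types =
            new SchemaRequest.FieldTypes().process(client);
        System.out.println(types.getFieldTypes().size() + " field types");

        // Add a field without hand-rolling any request/response handling.
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("name", "comments");
        field.put("type", "text_general");
        field.put("stored", true);
        new SchemaRequest.AddField(field).process(client);

        client.close();
    }
}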

Thanks

Steve
Steven White | 24 Apr 16:03 2015

Remote connection to Solr

Hi Everyone,

This may be a Jetty question, but let me start here first.

I have Solr running on my laptop, and from my desktop I have no issue
accessing it.  However, if I take my laptop home and connect it to my home
network, then the next day, when I connect the laptop to my office network,
I can no longer access Solr from my desktop.  A restart of Solr will not do;
the only fix is to restart my Windows 8.1 OS (that's what's on my laptop).

I have not been able to figure out why this is happening, and I suspect it
has something to do with Jetty, because I have Solr 3.6 running on my
laptop in a WebSphere profile and it does not run into this issue.

Any ideas what could be causing this?  Is this question for the Jetty
mailing list?

Thanks

Steve
Scott Dawson | 24 Apr 15:16 2015

ArrayIndexOutOfBoundsException in RecordingJSONParser.java

Hello,
I'm running Solr 5.1 and during indexing I get an
ArrayIndexOutOfBoundsException at line 61 of
org/apache/solr/util/RecordingJSONParser.java. Looking at the code (see
below), it seems obvious that the if-statement at line 60 should use a
greater-than sign instead of greater-than-or-equals.

  @Override
  public CharArr getStringChars() throws IOException {
    CharArr chars = super.getStringChars();
    recordStr(chars.toString());
    position = getPosition();
    // if reading a String, the getStringChars do not return the closing
    // single quote or double quote, so try to capture that
    if (chars.getArray().length >= chars.getStart() + chars.size()) {  // line 60
      char next = chars.getArray()[chars.getStart() + chars.size()];   // line 61
      if (next == '"' || next == '\'') {
        recordChar(next);
      }
    }
    return chars;
  }
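
For clarity, the fix the poster proposes (a sketch of the suggested change, not a committed patch) would make the guard strict, so the peeked index stays in bounds:

    // Use > rather than >= : index start+size is only valid when it is
    // strictly less than the array length.
    if (chars.getArray().length > chars.getStart() + chars.size()) {
      char next = chars.getArray()[chars.getStart() + chars.size()];
      if (next == '"' || next == '\'') {
        recordChar(next);
      }
    }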

Should I create a JIRA ticket? (Am I allowed to?)  I can provide more info
about my particular usage, including a stack trace, if that's helpful. I'm
using the new custom JSON indexing, which, by the way, is an excellent
feature and will be of great benefit to my project. Thanks for that.

Clemens Wyss DEV | 24 Apr 14:01 2015

o.a.s.c.SolrException: missing content stream

Context: Solr/Lucene 5.1
Adding documents to Solr core/index through SolrJ

I extract PDFs using Tika. The PDF content is one of the fields of my SolrDocuments, which are
transmitted to Solr using SolrJ.
As not all documents seem to be "coming through", I looked into the Solr logs and see the following exceptions:
org.apache.solr.common.SolrException: Exception writing document id fustusermanuals#4614 to the
index; possible analysis error.
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:170)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
	at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:931)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1085)
...
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in
field="content__s_i_suggest" (whose UTF8 encoding is longer than the max length 32766), all of which
were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term
is: '[10, 32, 10, 32, 10, 10, 70, 82, 32, 77, 111, 100, 101, 32, 100, 39, 101, 109, 112, 108, 111, 105, 32, 10,
10, 32, 10, 10, 32, 10]...', original message: bytes can be at most 32766 in length; got 186493
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:667)
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:344)
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:300)
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:231)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1349)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:242)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:166)
	... 40 more
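
One client-side workaround, offered as a hedged sketch (the analyzer-side alternative would be a length or truncate filter on the field type): cap the value sent to the suggest field so no single term can exceed Lucene's 32766-byte limit. The helper class name is hypothetical.

import java.nio.charset.StandardCharsets;

import org.apache.solr.common.SolrInputDocument;

class SuggestFieldHelper {
    // Lucene rejects any single indexed term longer than 32766 UTF-8 bytes.
    private static final int MAX_TERM_BYTES = 32766;

    static void addSuggestField(SolrInputDocument doc, String content) {
        byte[] utf8 = content.getBytes(StandardCharsets.UTF_8);
        if (utf8.length >= MAX_TERM_BYTES) {
            // Truncate at the byte limit; a split multi-byte char at the end
            // becomes a replacement char, which is acceptable for suggestions.
            content = new String(utf8, 0, MAX_TERM_BYTES - 4, StandardCharsets.UTF_8);
        }
        doc.addField("content__s_i_suggest", content);
    }
}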

Dmitry Kan | 24 Apr 13:50 2015

payload similarity

Hi,

Using the approach described at
http://lucidworks.com/blog/getting-started-with-payloads/ I have
implemented my own PayloadSimilarity class. While debugging the code I
noticed that the scorePayload method is never called. What could be wrong?

[code]

class PayloadSimilarity extends DefaultSimilarity {
    @Override
    public float scorePayload(int doc, int start, int end, BytesRef payload) {
        float payloadValue = PayloadHelper.decodeFloat(payload.bytes);
        System.out.println("payloadValue = " + payloadValue);
        return payloadValue;
    }
}

[/code]

Here is how the similarity is injected during indexing:

[code]

PayloadEncoder encoder = new FloatEncoder();
IndexWriterConfig indexWriterConfig =
    new IndexWriterConfig(Version.LUCENE_4_10_4, new PayloadAnalyzer(encoder));
payloadSimilarity = new PayloadSimilarity();
indexWriterConfig.setSimilarity(payloadSimilarity);
(Continue reading)

SolrCloud to exclude xslt files in conf from zookeeper

I am creating a SolrCloud with 4 Solr instances and 5 ZooKeeper instances. I need to make sure that
querying keeps working even when 3 of my ZooKeepers are down. But it looks like queries that go through
the XSLT templates (used for JSON transformation) break, because the templates cannot be loaded while
the ZooKeeper ensemble is unavailable.

So, is it possible to exclude files (e.g. the xslt folder) in the conf directory from being loaded into
ZooKeeper, and instead point Solr at the filesystem, so that querying the SolrCloud is not broken?

Thanks
Kumar
Paul Libbrecht | 24 Apr 11:36 2015

require diversity in results?

Hello list,

I'm wondering if there are extra parameters or query operators with which I
could relax sorting by relevance so that there is a minimum diversity in
some fields on the first page of results.

For example, I'd like the search results to contain at least three possible
types of resources in the first page, fetching things from lower down if
needed.
I know that this could be done in a search-result post-processor, but I
think that is generally a bad idea for performance.
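
One approach worth weighing (an editorial sketch, not something proposed in the thread): Solr's result grouping can cap how many documents of each type reach the first page, which approximates such a diversity constraint. The field name resource_type and the limits are assumptions.

import org.apache.solr.client.solrj.SolrQuery;

public class DiverseQuery {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("some user query");
        q.set("group", true);
        q.set("group.field", "resource_type"); // hypothetical type field
        q.set("group.limit", 3);               // at most 3 docs per type
        q.set("group.main", true);             // flatten back to a plain list
        q.setRows(10);
        System.out.println(q);
    }
}

Note this trades strict relevance order for per-type caps rather than guaranteeing a minimum of three types, so it is an approximation, not an exact fit.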

Any other idea?

thanks

Paul

Norgorn | 24 Apr 10:49 2015

Simple search low speed

We have a simple search over a 50 GB index, and it's slow.
I can't figure out why: the whole index is in RAM (and a lot of free space
is available), and the CPU is the bottleneck (100% load).

The query is simple (except for the tvrh part):

q=(text:(word1+word2)++title:(word1+word2))&tv=true&isShard=true&qt=/tvrh&fq=cat:(10+11+12)&fq=field1:(150)&fq=field2:(0)&fq=date:[2015-01-01T00:00:00Z+TO+2015-04-24T23:59:59Z]

text, title - text_general fields
cat, field1, field2 - tint fields
date - a date field (I know it's deprecated; it will be changed soon).
All fields are indexed, and some of them are stored.

And the search time is 15 seconds (for a warmed searcher; it's not the
first query).

debug=true shows timings process={time=15382.0,query={time=15282.0}

What can I check?


Vamsi Krishna Devabathini | 24 Apr 07:46 2015

Newly added json facet api returning inconsistent results in distributed mode

Hi, 
I am new to the Solr community, and I am sorry if this is not the right medium to bring this issue to
notice. I have found the issue mentioned in the subject and raised a ticket for it:
https://issues.apache.org/jira/browse/SOLR-7452

Any help is appreciated! 

Sent from my iPhone
