mmastroianni | 21 Oct 20:01 2014

SolrJ and Lucene queries

I have an existing application using raw Lucene that does some entity
extraction on a raw query and mixes in some other params to augment or
replace pieces of a large boolean query that it then constructs: a mix of
term queries, range queries, and RecursivePrefixTree queries.

I'm now switching (or at least trying to switch) to Solr for the ease of NRT
indexing and the operational benefits, but I am worried about how to do this
query processing.

I could put it in as a plugin, which seems painful, especially as I have
several different tokenizers and in general just a lot of code and
configuration that I would have to shoehorn into Solr. Not the least of my
fears may seem trivial, but it is simply the question of how I would mix
together all of the params coming from the client.

Anyway, I was hoping that I could somehow use SolrJ to send the Lucene query
straight through. This does not appear to be possible, and the last posts on
this board regarding the issue are several years old. Is there in general no
good way of serializing/deserializing Lucene queries from SolrJ through to
Solr?

Is my best option to go down the plugin route?
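
(If the plugin route wins out, the lightest-weight hook is usually a custom
QParserPlugin rather than a whole SearchComponent. Below is a minimal sketch
against the Solr 4.x API; the class name, the field name, and the stubbed-out
entity-extraction step are all placeholders, not a working implementation.)

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

// Registered in solrconfig.xml with:
//   <queryParser name="entity" class="com.example.EntityQParserPlugin"/>
// and invoked from any client (SolrJ included) with q={!entity}<raw query>
public class EntityQParserPlugin extends QParserPlugin {

  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() {
        // Build the same BooleanQuery the standalone application builds,
        // mixing extracted entities with the other client params, which
        // are all available here via params/localParams.
        BooleanQuery bq = new BooleanQuery();
        bq.add(new TermQuery(new Term("body", qstr)), Occur.MUST); // stub
        return bq;
      }
    };
  }
}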


Tassi Pierluigi | 21 Oct 16:32 2014

Some clarification needed on "migrate" command in Collections API

Ciao to all!
We're testing the Collections API on our SolrCloud test cluster (4.10.1),
managed by a standalone ZooKeeper server (3.4.6).

We're following the Collections API documentation and Yonik Seeley's blog
post about the migration feature available since 4.7.x, which you can also
read at: http://heliosearch.org/solr-4-7-features/

We have two collections with the same number of shards and nodes. Our
uniqueKey is simply the "id" field (which in fact holds a URI). We tried to
migrate the first collection to the second using the following API call:
http://10.10.80.97:9474/solr/admin/collections?action=MIGRATE&collection=web4&target.collection=web6&split.key=http
Then we committed an update on the second collection. Unfortunately, we
can't see any data there.

We didn't get any error in the response XML: http://pastebin.com/ggQ6fhfT
We're unsure about the value of the split.key argument; we chose "http"
instead of leaving it empty (it's a mandatory parameter).
Our goal is to migrate an entire collection to another one.

--
Best regards, Pierluigi Tassi
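
(A note for readers of the archive, hedged since it has not been verified on
this exact setup: MIGRATE only moves documents whose uniqueKey hashes into
the range of the given routing key under the compositeId router, so it
expects a route prefix ending in "!". A matching call would look like, for
ids of the hypothetical form a!http://example.org/1:

http://10.10.80.97:9474/solr/admin/collections?action=MIGRATE&collection=web4&target.collection=web6&split.key=a!

With plain URI ids that contain no "!", a value like "http" selects at most
a narrow hash range, which would explain the empty target collection;
MIGRATE has no mode for copying an entire collection.)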

Parvesh Garg | 21 Oct 09:09 2014

Facet counts and RankQuery

Hi All,

We have written a RankQuery plugin with a custom TopDocsCollector to
suppress documents scoring below a certain threshold w.r.t. the maxScore for
that query. It works fine, and the suppression is reflected correctly in
numFound and the start parameter.

Our problem lies with facet counts. Even though numFound comes back much
smaller, the facet counts are still computed from the unsuppressed query
results.

E.g., in a test with a threshold of 20%, we reduced the total docs found
from 46030 to 6080, but the top facet count on a field is still 20500.

The query parameter we are using looks like rq={!threshold value=0.2}

Is there a way to propagate the suppression of results to the
FacetsComponent as well? Can we send the same rq to the FacetsComponent?

Regards,
Parvesh Garg,

http://www.zettata.com
aurelien.mazoyer | 21 Oct 08:43 2014

Nested documents in Solr

Hi,

I have some questions regarding nested document queries.

For example, let’s say that I have many books, one of which is the 
following one:
Book_title: Nested documents for dummies
Chapter1_Title: Introduction
Chapter1_Content: Nested documents are fun.
Chapter2_Title: Which technology should I use?
Chapter2_Content: Lucene of course!

First, I want to find books that contain an introduction and that are about
Lucene. So I decide to flatten my data and use 3 multivalued fields
(Book_Title, Chapter_Title and Chapter_Content). I index my document and
then I get what I want when I run the following query:
"chapter_title:Introduction AND chapter_content:Lucene"
But now I want to find books that contain "fun" in a chapter whose title is
"Introduction". My model is no longer valid (Chapter2_Content is no longer
linked with Chapter2_Title). That is why I change my data model and use
nested documents:
I now have a parent with a single-valued field Book_title and several
children with single-valued fields Chapter_title and Chapter_content. Now,
when I run the query "chapter_title:Introduction AND chapter_content:fun" I
also get what I want… But what do I have to do if I want to support both
kinds of query with a single data model?
Maybe the only way is to use nested documents and to index the data both in
child documents and in a flattened form in the parent document. Then we will
be able to run the two different queries.

(Continue reading)
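
(A sketch of the nested-document variant, assuming Solr 4.5+ block-join
support; the doc_type discriminator field and the ids are made up. Each book
is indexed as one block, and any flattened copies of the chapter fields
would have to be added to the parent by the client at index time, since
copyField cannot cross documents:

<add>
  <doc>
    <field name="id">book1</field>
    <field name="doc_type">book</field>
    <field name="book_title">Nested documents for dummies</field>
    <doc>
      <field name="id">book1_ch1</field>
      <field name="chapter_title">Introduction</field>
      <field name="chapter_content">Nested documents are fun.</field>
    </doc>
    <doc>
      <field name="id">book1_ch2</field>
      <field name="chapter_title">Which technology should I use?</field>
      <field name="chapter_content">Lucene of course!</field>
    </doc>
  </doc>
</add>

The "fun in the introduction chapter" query then matches child documents and
maps them to their parent book:

q={!parent which="doc_type:book"}(+chapter_title:introduction +chapter_content:fun)
)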

Jaeyoung Yoon | 21 Oct 01:35 2014

Shared Directory for two Solr Clouds (Writer and Reader)

Hi Folks,

Here are some of my ideas for using a shared file system with two separate
Solr Clouds (a Writer Solr Cloud and a Reader Solr Cloud).

I would like to get your valuable feedback.

For a prototype, I set up two separate Solr Clouds (one for the Writer and
the other for the Reader).

The big picture of my prototype is as follows:

1. The Reader and Writer Solr Clouds share the same directory.
2. The Writer SolrCloud sends an "openSearcher" command to the Reader Solr
Cloud inside a postCommit event handler. That is, when new data is added to
the Writer Solr Cloud, the Writer sends its own openSearcher command to the
Reader Solr Cloud.
3. The Reader opens a searcher only when it receives an "openSearcher"
command from the Writer SolrCloud.
4. The Writer has its own deletionPolicy to keep old commit points that
might still be used by running queries on the Reader Solr Cloud when a new
searcher is opened on the Reader SolrCloud.
5. The Reader performs no updates and no commits. Everything on the Reader
Solr Cloud is read-only. It also creates its searcher from the directory,
not from the indexer (nrtMode=false).

That is,
in the Writer Solr Cloud I added a postCommit eventListener. Inside the
postCommit eventListener, it sends its own "openSearcher" command to the
Reader Solr Cloud's handler. Then the Reader Solr Cloud will create openSearcher
(Continue reading)
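
(A minimal sketch of the Writer-side listener, assuming the Solr 4.x
SolrEventListener API; the class name and the Reader-side handler URL are
hypothetical placeholders:)

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

// Registered inside <updateHandler> in the writer's solrconfig.xml:
//   <listener event="postCommit" class="com.example.NotifyReaderListener">
//     <str name="readerUrl">http://reader-host:8983/solr/collection1/openReaderSearcher</str>
//   </listener>
public class NotifyReaderListener implements SolrEventListener {

  private String readerUrl;

  @Override
  public void init(NamedList args) {
    readerUrl = (String) args.get("readerUrl");
  }

  @Override
  public void postCommit() {
    // Tell the reader cloud that a new commit point is visible on the
    // shared directory, so it can reopen its searcher.
    try {
      HttpURLConnection conn =
          (HttpURLConnection) new URL(readerUrl).openConnection();
      conn.setConnectTimeout(5000);
      conn.setReadTimeout(5000);
      try (InputStream in = conn.getInputStream()) {
        while (in.read() != -1) {} // drain and discard the response
      }
    } catch (Exception e) {
      // Swallowing here keeps a reader outage from failing writer commits;
      // a real implementation would log and possibly retry.
    }
  }

  @Override
  public void postSoftCommit() {}

  @Override
  public void newSearcher(SolrIndexSearcher newSearcher,
                          SolrIndexSearcher currentSearcher) {}
}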

O. Olson | 21 Oct 00:04 2014

Is there a problem with -Infinity as boost?

I am considering using a boost as follows: 

&boost=log(qty)

where qty is the quantity in stock of a given product, i.e. qty could be 0,
1, 2, 3, … etc. The problem I see is that log(0) is -Infinity. Would this be
a problem for Solr? For me it is not a problem, because
log(0) < log(1) < log(2), etc.

I'd be grateful for any thoughts. One alternative is to use max, e.g.
&boost=max(log(qty), -1)

But this would still cause Solr to compute the -Infinity and then discard
it. So can I use an expression for boost that would result in -Infinity?

Thank you
O. O.
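
(Note, assuming qty is never negative: a common workaround is to shift the
argument so the function is defined at zero, e.g.

&boost=log(sum(qty,1))

This yields log(1) = 0 for qty = 0 and, since log is monotonic, preserves
the ordering for qty = 1, 2, 3, …, so -Infinity is never computed at all.)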


LongY | 20 Oct 22:54 2014

javascript form data save to XML in server side

hello list,

The functionality I would like to add to the existing /browse request
handler is a user interface (e.g. a web form) to collect the user's input.

My approach is to add a JavaScript form to the Velocity template; below is
the code I added to the template (for example):

<form id="myForm" action="ProcessData.php">
  First name: <input type="text" name="fname" value=""><br>
  Last name: <input type="text" name="lname" value=""><br>
</form>

And I am using this ProcessData.php to process the user input and generate
an XML file on the server.

My questions are:
1) How do I make Solr run this ProcessData.php? It seems Solr does not
support PHP.
2) Where should this ProcessData.php be placed in the Solr directory tree?

I am a newbie in web programming and am trying hard to catch up.

Thank you.



Prathik Puthran | 20 Oct 18:20 2014

Verify if solr reload core is successful or not

Hi,

How do I verify whether a Solr core reload succeeded or not? I use Solr 4.6.

To reload the core I send the below request:

http://hostname:7090/solr/admin/cores?action=RELOAD&core=core0&wt=json

Also, is the above request synchronous (I mean, will the reload happen
before the response is received), or does it happen after we get the
response to the above request, so that we have to poll to see whether the
reload succeeded?

Thanks,
Prathik
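
(Hedged note: as far as I can tell from the CoreAdmin code of this era,
RELOAD is handled synchronously, i.e. the HTTP response is not written until
the reloaded core has been registered. A defensive follow-up check is a
STATUS call, comparing the core's startTime before and after the reload:

http://hostname:7090/solr/admin/cores?action=STATUS&core=core0&wt=json )
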
Zisis Tachtsidis | 20 Oct 16:59 2014

SolrCloud use of "min_rf" through SolrJ

Hi all,

I'm trying to make use of the "min_rf" (minimum replication factor) feature
described in https://issues.apache.org/jira/browse/SOLR-5468. According to
the ticket, all that is needed is to pass the "min_rf" param in the update
request and read back the "rf" param from the response, or, even easier, to
use CloudSolrServer.getMinAchievedReplicationFactor().

I'm using SolrJ's CloudSolrServer, but I couldn't find any way to pass
"min_rf" through the available add() methods when sending a document to
Solr, so I resorted to the following:

UpdateRequest req = new UpdateRequest();
req.setParam(UpdateRequest.MIN_REPFACT, "1"); // min_rf=1
req.add(doc);
UpdateResponse response = req.process(cloudSolrServer);
// getMinAchievedReplicationFactor() reads the "rf" values out of the
// (possibly per-shard) response and returns the minimum.
int rf = cloudSolrServer.getMinAchievedReplicationFactor("collection_name",
    response.getResponse());

Still, the returned "rf" value is always -1. How can I utilize "min_rf"
through SolrJ? I'm using Solr 4.10.0 with a collection that has two replicas
(the leader and one other).

Thanks



David Philip | 20 Oct 16:06 2014

Word Break Spell Checker Implementation algorithm

Hi,

    Could you please point me to a link where I can learn about the theory
behind the implementation of the word-break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what algorithm is used behind the word-break spell
checker component? How does it detect the space that is needed if it
doesn't use shingles?

Thanks - David
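
(As far as I understand it, the component is backed by Lucene's
WordBreakSpellChecker, which generates candidate splits and combinations of
the query terms and validates each candidate against the terms actually
present in the index, so no shingle fields are needed. A typical
registration in solrconfig.xml, with the field name as a placeholder:

<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>
  <str name="field">text</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">10</int>
</lst>
)
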
andreic9203 | 20 Oct 13:06 2014

Solr replicas - stop replication and start again

Hello,

I have a problem that I can't figure out how to solve.
As a small scenario, I've set up a cluster with two nodes, one shard, and
two replicas, with both nodes connected to an external ZooKeeper.
Great, but now I want to stop replication for an amount of time (or, more
precisely, to stop it and then start it again).
I need this because one of my replicas will be on a central server and the
other on a client server. A lot of documents will be inserted into the
replica on the central server, but before those documents are replicated I
want to process them, and only after that start the replication again.

Is there a way to do this?

I thought of something like suspending the clients' connections directly
from ZooKeeper; if you know anything about that, it could be a possible
solution.

If you can think of another configuration that could solve this problem,
feel free to share it with us.

Thanks in advance,
Andrei


