thakkar.aayush | 28 Jan 11:56 2015

Solr facet search improvements

I have around 1 million job titles indexed in Solr and am looking to
improve the faceted search results on job title matches.

For example, a job search for *Research Scientist Computer Architecture* is
made, and faceting on the title field, which is tokenized in Solr, gives the
following results:

1. Senior Data Scientist 
2. PARALLEL COMPUTING SOFTWARE ENGINEER 
3. Engineer/Scientist 4 
4. Data Scientist 
5. Engineer/Scientist 
6. Senior Research Scientist 
7. Research Scientist-Wireless Networks 
8. Research Scientist-Andriod Development 
9. Quantum Computing Theorist Job 
10. Data Sceintist Smart Analytics

I want to be able to improve/optimize the job titles and to make
exclusions and some normalizations. Is this possible with Solr? What is the
best way to get more granular control over the faceted search results?

For example, *Engineer/Scientist 4* is not useful because it is too specific,
and titles like *Quantum Computing Theorist* would ideally also be excluded.
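
One common approach (a sketch, not something confirmed in this thread) is to
facet on a separate normalized copy of the title, so that the analysis chain
decides which facet values can appear; every name below is hypothetical:

<!-- schema.xml sketch: facet on a normalized copy of the title
     instead of the tokenized search field -->
<fieldType name="title_facet" class="solr.TextField">
  <analyzer>
    <!-- keep each title as a single facet value -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- collapse punctuation and level suffixes such as "Engineer/Scientist 4" -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="[^a-z ]" replacement=" "/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="\s+" replacement=" "/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_norm" type="title_facet" indexed="true" stored="false"/>
<copyField source="title" dest="title_norm"/>

Because KeywordTokenizer emits each title as a single token, exclusions could
then be handled in the same chain with a solr.StopFilterFactory whose words
file lists the full (lowercased) titles to drop.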



solrk | 28 Jan 08:13 2015

Reading data from another solr core

Hi Guys,

I have multiple cores set up in my Solr server. I would like to read/import
data from one core (source) into another core (target) and index it. Is there
an easy way in Solr to do so?

I was thinking of using SolrEntityProcessor for this purpose; any other
suggestions are appreciated.

http://blog.trifork.com/2011/11/08/importing-data-from-another-solr/

For example:

<dataConfig>
  <document>
    <entity name="user" pk="id"
        url=""
        processor="XPathEntityProcessor">

      <field column="id" xpath="/user/id" />
      <entity name="sep" processor="SolrEntityProcessor" query="*:*"
          url="http://127.0.0.1:8081/solr/core2">
      </entity>

    </entity>
  </document>
</dataConfig>
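
For comparison, a sketch of the minimal form along the lines of the linked
Trifork post, with SolrEntityProcessor as the only entity (the source core
URL is the one from the example above):

<dataConfig>
  <document>
    <entity name="sep"
            processor="SolrEntityProcessor"
            url="http://127.0.0.1:8081/solr/core2"
            query="*:*"
            fl="*"/>
  </document>
</dataConfig>

Note that SolrEntityProcessor reads documents through ordinary queries against
the source core, so it can only copy stored fields.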

Please suggest if there is a better solution, or should I write a new
processor which reads the index of another core?

SolrUser1543 | 28 Jan 07:54 2015

Reindex data without creating new index.

I want to reindex my data in order to change the value of one field according
to the value of another (both fields already exist).

For this purpose I ran the "clue" utility in order to get a list of IDs.
Then I created an update processor which can set the value of field A
according to the value of field B.
I added a new request handler, like the classic update handler, but with a
new update chain containing the new update processor.

I want to run an HTTP POST request to the new handler for each ID, with the
item ID only. This will trigger my update processor, which will fetch the
existing doc from the index and apply the logic (a driving loop is sketched
below).

In this way I can do some enrichment without a full data import and
without creating a new index.
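
A minimal sketch of that loop (the handler path, core name, and ID file are
all assumptions):

#!/bin/sh
# Post each ID from clue's output to the custom update chain; the
# processor is expected to load the stored doc and set field A from
# field B. Handler path and core name are hypothetical.
while read id; do
  curl -s "http://localhost:8983/solr/core1/update/enrich" \
       -H 'Content-Type: application/json' \
       -d "[{\"id\":\"$id\"}]"
done < ids.txt

# one commit at the end instead of one per document
curl -s "http://localhost:8983/solr/core1/update?commit=true"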

What do you think about it?
Could it cause performance degradation? Can Solr handle it, or will it
rebalance the index?
Does Solr have some built-in feature which can do this?


vsriram30 | 28 Jan 01:50 2015

Solrcloud open new searcher not happening in slave for deletebyID

Hi All,

I am using SolrCloud 4.6.1. If I use CloudSolrServer to add a record to Solr,
then I see the following commit update command on both the master and the
slave node:

2015-01-27 15:20:23,625 INFO org.apache.solr.update.UpdateHandler: start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

I am also calling updateRequest.setCommitWithin(5000);

As noted, openSearcher=true, and hence after 5 seconds I am able to see the
record in the index on both the slave and the master.

Now if I trigger another UpdateRequest with only deleteById set and no
documents to add, with the same commit-within time, then

in the master log I see,

2015-01-27 15:21:46,389 INFO org.apache.solr.update.UpdateHandler: start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

and in the slave log I see,
2015-01-27 15:21:56,393 INFO org.apache.solr.update.UpdateHandler: start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

As noted, the master has openSearcher=true while the slave has
openSearcher=false. This causes inconsistency in the results: the master shows
the record as deleted while the slave still returns it.
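
The same pair of requests can be reproduced over plain HTTP (a sketch; host
and collection names are assumptions):

# add with commitWithin: both nodes log openSearcher=true
curl "http://localhost:8983/solr/collection1/update?commitWithin=5000" \
     -H 'Content-Type: application/json' \
     -d '[{"id":"doc1"}]'

# delete-by-id with the same commitWithin: the slave logs openSearcher=false
curl "http://localhost:8983/solr/collection1/update?commitWithin=5000" \
     -H 'Content-Type: application/json' \
     -d '{"delete":{"id":"doc1"}}'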


Carl Roberts | 28 Jan 01:47 2015

Running multiple full-import commands via curl in a script

Hi,

I am attempting to run all these curl commands from a script so that I
can put them in a crontab job; however, it seems that only the first one
executes and the remaining ones return an error:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2002"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2003"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2004"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2005"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2006"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2007"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2008"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2009"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2010"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2011"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2012"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2013"

Joshi, Shital | 27 Jan 22:51 2015

replica never takes leader role

Hello,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
ZooKeeper instances. We have noticed that when a leader node goes down, the
replica never takes over as leader; the cloud becomes unusable and we have to
bounce the entire cloud for a replica to assume the leader role. Is this the
default behavior? How can we change this?

Thanks. 


Carl Roberts | 27 Jan 21:42 2015

Cannot reindex to add a new field

Hi,

I have tried to reindex to add a new field named product-info, and no matter
what I do, I cannot get the new field to appear in the index after importing
via DIH.

Here is the rss-data-config.xml configuration (product-info is the new field
I added):

<dataConfig>
    <dataSource type="ZIPURLDataSource" connectionTimeout="15000"
                readTimeout="30000"/>
    <document>
        <entity name="cve-2002"
                pk="id"
                url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
                processor="XPathEntityProcessor"
                forEach="/nvd/entry"
                transformer="RegexTransformer">
            <field column="id" xpath="/nvd/entry/@id" commonField="false" />
            <field column="cve" xpath="/nvd/entry/cve-id" commonField="false" />
            <field column="cwe" xpath="/nvd/entry/cwe/@id" commonField="false" />
            <!-- new field -->
            <field column="product-info"
                   xpath="/nvd/entry/vulnerable-software-list/product"
                   commonField="false"/>
            <field column="vulnerable-configuration"
                   xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name"
                   commonField="false"/>
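
One thing worth double-checking (an assumption about the cause, not something
stated above): DIH silently drops any column that has no matching field in
schema.xml, so the new column also needs a declaration there, roughly:

<!-- schema.xml sketch; type and flags are assumptions. multiValued
     because one entry can list several products -->
<field name="product-info" type="string" indexed="true" stored="true"
       multiValued="true"/>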

vit | 27 Jan 21:07 2015

trouble running indexer with Solr spatial

I am using Solr 4.2

I added

<fieldType name="location_rpt"
           class="solr.SpatialRecursivePrefixTreeFieldType"/>

according to https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
and have spatial4j-0.3.jar in my project.

When running the indexer I started getting this error:

java.lang.NoClassDefFoundError: com/google/common/cache/CacheBuilder
	at org.apache.solr.schema.AbstractSpatialFieldType.<init>(AbstractSpatialFieldType.java:82)
	at org.apache.solr.schema.AbstractSpatialPrefixTreeFieldType.<init>(AbstractSpatialPrefixTreeFieldType.java:32)
	at org.apache.solr.schema.SpatialRecursivePrefixTreeFieldType.<init>(SpatialRecursivePrefixTreeFieldType.java:28)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at java.lang.Class.newInstance0(Class.java:357)
	at java.lang.Class.newInstance(Class.java:310)
	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:470)
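
The missing class comes from Google Guava, which the spatial field type uses
for its cache (per the first frame). One way to make the jar visible to the
core, as a sketch with an assumed path and version:

<!-- solrconfig.xml: load Guava so that
     com.google.common.cache.CacheBuilder resolves -->
<lib path="/path/to/guava-13.0.1.jar"/>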


"Contextual" sponsored results with Solr

Hi all,

Recently I got an interesting use case that I'm not sure how to implement. The
idea is that the client wants a fixed number of documents, call it N, to
appear at the top of the results. Let me explain a little: we're working with
web documents, so the idea is to promote the documents that match the user's
query from a given domain (wikipedia, for example) to the top of the list. If
I apply a boost using the boost parameter:

http://localhost:8983/solr/select?q=search&fl=url&boost=map(query($type1query),0,0,1,50)&type1query=host:wikipedia

I get *all* the documents from the desired host at the top, but there is no
way of limiting the number of documents from that host that are boosted to the
top of the result list (which could lead to several pages of content from the
same host, which is not desired; the idea is to show only N). I was thinking
of something like field collapsing/grouping, but only for the documents that
match my $type1query parameter (host:wikipedia); I don't see any way of
grouping/collapsing only one group and leaving the other results untouched.

I also thought of using two groups, group.query=host:wikipedia and
group.query=-host:wikipedia, but in this case there is no way of controlling
how many documents each group will have independently (see the sketch below).
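
For illustration, the two-group request would look like this (a sketch; note
that group.limit is a single value applied to every group, which is exactly
the limitation):

http://localhost:8983/solr/select?q=search&group=true&group.query=host:wikipedia&group.query=-host:wikipedia&group.limit=10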

In this particular case QueryElevationComponent doesn't help, because I don't
want to map all the possible queries; I just want to put some of the results
from a certain host at the top of the list, without boosting all the documents
from that host.

Any thoughts or recommendations on this? 

Thank you,

Regards,


Peter Puzanovs | 27 Jan 18:48 2015

Making SpellChecker output better with language rules from languagetool

Hello,

I am thinking of combining the output of the IndexBasedSpellChecker with the
language rules from LanguageTool (languagetool.org).

I am wondering if this has already been implemented?

Thanks,
Peter

Alexander Albrecht | 27 Jan 18:40 2015

AdminUI wrong core name

Hi,
I think I found a bug in the AdminUI.

When I create a new collection with the Collections API, the name of the core
is displayed incorrectly in the AdminUI.

This is the call:

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&collection.configName=myconfig&property.name=mycollection&numShards=1&property.instanceDir=mycollection

The core.properties file has the correct value, name=mycollection, but in the
AdminUI the new core has the name mycollection_shard1_replica1.

Only after a restart of the Solr server is the right name "mycollection"
shown in the AdminUI.
