Midas A | 12 Feb 13:27 2016

query knowledge graph

Please suggest how to create a query knowledge graph for an e-commerce
application.

Please describe it in detail. Our aim is to improve relevancy. We come from
a LAMP background.

vidya | 12 Feb 12:28 2016

Solr-Kerberos URL not accessible

Hi,

When I try to access my SolrCloud web UI page, deployed in a Cloudera
cluster, I encounter the error "DEFECTED TOKENS DETECTED". Please find
the error in the attachment. It occurs because Kerberos is installed on
the cluster.

Is there any other way I can access Solr in this scenario, with
Kerberos installed?
Would writing a Java program help in any way? Even in a Java program,
I have to give the connection to Solr as a URL with port or as a ZooKeeper
host variable. Will that Java program work?

Please help me out.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-kerbarose-URL-not-accessible-tp4256926.html
Sent from the Solr - User mailing list archive at Nabble.com.

Sebastian Geerken | 12 Feb 11:59 2016

Weird behaviour related to facetting

Hi!

I've experienced strange behaviour with several versions of Solr
(currently testing with 5.4.1, but this effect can also be reproduced
with 5.3.1). Some facet values are not returned when querying
"*:*", but only when I search for something special, say text "foo".

I've stripped down both config/schema and data as far as possible,
files are attached (hope this is ok on the list).

How to reproduce:

Set up a core with the config and schema attached to this post:

$ bin/solr start
$ bin/solr create_core -c test
$ bin/solr stop
$ cp solrconfig.xml schema.xml server/solr/test/conf/
$ bin/solr start

Upload data:

$ curl 'http://localhost:8983/solr/test/update?commit=true' -H 'Content-type:application/json' -d @data.json

Search for "*:*":

$ curl 'http://localhost:8983/solr/test/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=tags_hierarchy'

The facet value "1/tax/downloads/i/" will not be returned, but it will

Anil | 12 Feb 10:31 2016

Searching special characters

Hi,

How can we search for special characters like * and " (double quote), which
Solr itself uses for wildcard and exact (phrase) searches?

Please advise.

Regards,
Anil
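
One common approach to the question above is to backslash-escape the query-syntax characters before sending the text to Solr. The sketch below is an illustration in Python and is not part of the original message; the function name `escape_query_chars` and the exact character set are assumptions, based on the characters the Lucene classic query syntax treats as operators.

```python
# Characters treated as operators by the Lucene/Solr classic query syntax.
# The exact set is an assumption; adjust it to the query parser in use.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query_chars(text):
    """Prefix every query-syntax character in `text` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


# The asterisk and the double quotes become literal characters.
print(escape_query_chars('5*3 "exact"'))  # prints: 5\*3 \"exact\"
```

Whether an escaped character can actually match also depends on the field's analysis chain: a tokenizer that drops punctuation removes * or " at index time, so the field may need a string or whitespace-tokenized type.
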
Midas A | 12 Feb 06:13 2016

error

We upgraded our Solr version last night and are now getting the following error:

org.apache.solr.common.SolrException: Bad content Type for search handler
:application/octet-stream

What should I do to get rid of it?

Senthil | 11 Feb 23:33 2016

edismax query parser - pf field question

Clarification needed on edismax query parser "pf" field.

*SOLR Query:*
/query?q=refrigerator water filter&qf=P_NAME^1.5
CategoryName&wt=xml&debugQuery=on&pf=P_NAME
CategoryName&mm=2&fl=CategoryName P_NAME score&defType=edismax

*Parsed Query from DebugQuery results:*
<str name="parsedquery">(+((DisjunctionMaxQuery((P_NAME:refriger^1.5 |
CategoryName:refrigerator)) DisjunctionMaxQuery((P_NAME:water^1.5 |
CategoryName:water)) DisjunctionMaxQuery((P_NAME:filter^1.5 |
CategoryName:filter)))~2) DisjunctionMaxQuery((P_NAME:"refriger water
filter")))/no_coord</str>

In the Solr query given above, I am asking for phrase matches on two fields:
P_NAME and CategoryName.
But in the parsed query, the phrase match is applied only to the P_NAME
field and not to the CategoryName field. Why?

--
View this message in context: http://lucene.472066.n3.nabble.com/edismax-query-parser-pf-field-question-tp4256845.html
Sent from the Solr - User mailing list archive at Nabble.com.

Sreenivasa Kallu | 11 Feb 19:42 2016

outlook email file pst extraction problem

Hi,

I am currently indexing individual Outlook messages, and searching is
working fine.
I created the Solr core using the following command.
 ./solr create -c sreenimsg1 -d data_driven_schema_configs

I am using the following command to index individual messages.
curl "http://localhost:8983/solr/sreenimsg/update/extract?literal.id=msg9&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/home/ec2-user/msg9.msg"

This setup is working fine.

But the new requirement is to extract messages from an Outlook PST file.
I tried the following command to extract messages from an Outlook PST file.

curl "http://localhost:8983/solr/sreenimsg1/update/extract?literal.id=msg7&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/home/ec2-user/sateamc_0006.pst"

This command extracts only the high-level tags and extracts all messages
into one message. I am not getting all the tags that I get when extracting
individual messages. Is the above command correct? Is the problem that it
does not use recursion? How do I add recursion to the above command? Is it
a Tika library problem?

Please help me solve the above problem.

Thanks in advance.

--sreenivasa kallu

KNitin | 11 Feb 19:29 2016

SolrCloud shard marked as down and "reloading" collection doesn't restore it

Hi,

I noticed while running an indexing job (2M docs, but per-doc size could be
2-3 MB) that one of the shards goes down just after the commit. (Not
related to OOM or high CPU/load.) This marks the shard as "down" in ZK, and
even a reload of the collection does not recover the state.

There are no exceptions in the logs, and the stack trace shows Jetty
threads in a blocked state.

The last few lines in the logs are as follows:

trib=TOLEADER&wt=javabin&version=2} {add=[1552605 (1525453861590925312)]} 0
5
INFO  - 2016-02-06 19:17:47.658;
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2016-02-06 19:18:02.209; org.apache.solr.core.SolrDeletionPolicy;
SolrDeletionPolicy.onCommit: commits: num=2
INFO  - 2016-02-06 19:18:02.209; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 6
INFO  - 2016-02-06 19:18:02.233; org.apache.solr.search.SolrIndexSearcher;
Opening Searcher@321a0cc9 main
INFO  - 2016-02-06 19:18:02.296; org.apache.solr.core.QuerySenderListener;
QuerySenderListener sending requests to Searcher@321a0cc9
main{StandardDirectoryReader(segments_6:180:nrt
_20(4.6):C15155/216:delGen=1 _w(4.6):C1538/63:delGen=2
_16(4.6):C279/20:delGen=2 _e(4.6):C11386/514:delGen=3
_g(4.6):C4434/204:delGen=3 _p(4.6):C418/5:delGen=1 _v(4.6):C1
_x(4.6):C17583/316:delGen=2 _y(4.6):C9783/112:delGen=2

Melissa Warnkin | 11 Feb 19:23 2016

ApacheCon NA 2016 - Important Dates!!!

Hello everyone!

I hope this email finds you well.  I hope everyone is as excited about ApacheCon as I am!

I'd like to remind you all of a couple of important dates, as well as ask for your assistance in spreading the
word! Please use your social media platform(s) to get the word out! The more visibility, the better
ApacheCon will be for all!! :)

CFP Close: February 12, 2016
CFP Notifications: February 29, 2016
Schedule Announced: March 3, 2016

To submit a talk, please visit:  http://events.linuxfoundation.org/events/apache-big-data-north-america/program/cfp

Link to the main site can be found here:  http://events.linuxfoundation.org/events/apache-big-data-north-america

Apache: Big Data North America 2016 Registration Fees:

Attendee Registration Fee: US$599 through March 6, US$799 through April 10, US$999 thereafter
Committer Registration Fee: US$275 through April 10, US$375 thereafter
Student Registration Fee: US$275 through April 10, $375 thereafter

Planning to attend ApacheCon North America 2016 May 11 - 13, 2016? There is an add-on option on the
registration form to join the conference for a discounted fee of US$399, available only to Apache: Big
Data North America attendees.

So, please tweet away!!

I look forward to seeing you in Vancouver! Have a groovy day!!

~Melissa
on behalf of the ApacheCon Team

Le Zhao | 11 Feb 18:57 2016

dismax for bigrams and phrases

Hey Solr folks,

Current dismax parser behavior is different for unigrams versus bigrams.

For unigrams, scores are MAX-ed across fields (the so-called dismax), but
for bigrams they have been SUM-ed since Solr 4.10 (according to
https://issues.apache.org/jira/browse/SOLR-6062).

Given this inconsistency, the dilemma we are facing now is the following:
for a query with three terms: [A B C]
Relevant doc1: f1:[AB .. C] f2:[BC]   // here AB in field1 and BC in 
field2 are bigrams, and C is a unigram
Irrelevant doc2: f1:[AB .. C] f2:[AB] f3:[AB]  // here only bigram AB is 
present in the doc, but in three different fields.

(A B C here can be e.g. "light blue bag", and doc2 can talk about "light 
blue coat" a lot, while only mentioning a "bag" somewhere.)

Without bigram level MAX across fields, there is no way to rank doc1 
above doc2.
(doc1 is preferred because it hits two different bigrams, while doc2 
only hits one bigram in several different fields.)
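
The ranking problem described above can be worked through with toy numbers. The sketch below is only an illustration, not Solr's actual scoring: it assumes every bigram hit in a field contributes a hypothetical score of 1.0, ignores the unigram C (present in both documents), and compares SUM across fields with MAX across fields.

```python
# Hypothetical per-field bigram match scores; 1.0 per hit is an assumption.
doc1 = {"AB": {"f1": 1.0}, "BC": {"f2": 1.0}}
doc2 = {"AB": {"f1": 1.0, "f2": 1.0, "f3": 1.0}}


def score(doc, combine):
    """Combine each bigram's field scores with `combine`, then add bigrams up."""
    return sum(combine(field_scores.values()) for field_scores in doc.values())


# SUM across fields: doc2 wins just by repeating one bigram in three fields.
print("sum:", score(doc1, sum), score(doc2, sum))  # prints: sum: 2.0 3.0

# MAX across fields: doc1 wins because it matches two distinct bigrams.
print("max:", score(doc1, max), score(doc2, max))  # prints: max: 2.0 1.0
```

Under MAX, each bigram contributes at most its best field score, so the total is bounded by the number of distinct bigrams in the query.
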

Also, being a sum makes the retrieval score difficult to bound, making 
it hard to combine the retrieval score with other document level signals 
(e.g. document quality), or to trade off between unigrams and bigrams.

Are the problems clear?

Can someone offer a solution other than dismax for bigrams/phrases? i.e. 


