Chris Hostetter | 1 Oct 02:48 2011

Re: DataImportHandler using new connection on each query


: > Noble? Shalin?  what's the point of throwing away a connection that's been
: > in use for more than 10 seconds?

: Hoss, as others have noted, DIH throws away connections which have been idle
: for more than the timeout value (10 seconds). The jdbc standard way of
: checking for a valid connection is not implemented or incorrectly
: implemented by many drivers. So, either you can execute a query and get an
: exception and try to determine if the exception was a case of an invalid
: connection (which again is sometimes different from driver to driver) or
: take the easy way out and throw away connections idle for more than 10
: seconds, which is what we went for.

Hmmm...

a) at a minimum this seems like it should be a config option -- why punish 
people using "good" jdbc drivers?

b) you keep referring to this timeout in relation to connections being 
*idle* longer than 10 seconds, but unless i'm missing something that's not 
what it's doing at all.  

The only time connLastUsed is assigned to is when getConnection() is 
called -- so even if a connection has only been idle for a picosecond, it 
will still be closed/reopened if the total amount of time it was in use 
before going idle was more than 10 seconds -- that was the scenario 
described in the first message of this thread...

second 000: app starts
second 006: ResultSetIterator constructed on queryA
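If i'm reading it right, that timing logic can be simulated with a minimal sketch (hypothetical names; this is not the actual DIH source, just the behavior as described):

```python
CONN_TTL = 10.0  # seconds -- the hard-coded timeout under discussion

class JdbcDataSourceSketch:
    """Illustrative model of the reconnect behavior described above."""

    def __init__(self):
        self.conn = None
        self.conn_last_used = 0.0

    def get_connection(self, now):
        # conn_last_used is refreshed ONLY here, so a connection that stays
        # busy (no intervening get_connection call) for longer than CONN_TTL
        # is closed and reopened even though it was never actually idle.
        reopened = self.conn is None or now - self.conn_last_used > CONN_TTL
        if reopened:
            self.conn = object()  # stand-in for opening a real JDBC connection
        self.conn_last_used = now
        return self.conn, reopened

ds = JdbcDataSourceSketch()
ds.get_connection(0.0)          # opens the first connection
_, r = ds.get_connection(16.0)  # 16s of *use* since last call -> recycled
```

Note that nothing in the sketch distinguishes "busy for 16 seconds" from "idle for 16 seconds" -- which is exactly the complaint.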

roz dev | 1 Oct 02:59 2011

Re: Production Issue: SolrJ client throwing this error even though field type is not defined in schema

This issue disappeared when we reduced the number of documents which were
being returned from Solr.

It looks like an issue with Tomcat or Solr returning truncated responses.

-Saroj

On Sun, Sep 25, 2011 at 9:21 AM, <pulkitsinghal <at> gmail.com> wrote:

> If I had to give a gentle nudge, I would ask you to validate your schema
> XML file. You can do so by looking for any w3c XML validator website and
> just copy-pasting the text there to find out where it's malformed.
>
> Sent from my iPhone
>
> On Sep 24, 2011, at 2:01 PM, Erick Erickson <erickerickson <at> gmail.com>
> wrote:
>
> > You might want to review:
> >
> > http://wiki.apache.org/solr/UsingMailingLists
> >
> > There's really not much to go on here.
> >
> > Best
> > Erick
> >
> > On Wed, Sep 21, 2011 at 12:13 PM, roz dev <rozdev29 <at> gmail.com> wrote:
> >> Hi All
> >>

Chris Hostetter | 1 Oct 03:19 2011

RE: Getting facet counts for 10,000 most relevant hits


: I figured out how to do this in a kludgey way on the client side but it 
: seems this could be implemented much more efficiently at the Solr/Lucene 
: level.  I described my kludge and posted a question about this to the 

It can, and I have -- but only for the case of a single node...

In general the faceting code in solr just needs a DocSet.  The default 
impl uses the DocSet computed as a side effect when executing the main 
search, but a custom SearchComponent could pick any DocSet it wants.

A few years back I wrote a custom faceting plugin that computed a "score" 
for each constraint based on:
 * Editorially assigned weights from a config file
 * the number of matching documents (ie: normal constraint count)
 * the number of matching documents from the first N results

...where the last number was determined by internally executing the search 
with "rows" of N, to generate a DocList object, and then converting that 
DocList into a DocSet, and using that as the input to SimpleFacetCounts.

Ignoring the "Editorial weights" part of the above, the logic for 
"scoring" constraints based on the other two factors is general enough 
that it could be implemented in solr; we just need a way to configure "N" 
and what kind of function should be applied to the two counts.
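Setting aside the Solr plumbing, the constraint "scoring" reduces to a pure function of the two counts; the weights below are illustrative guesses, not the actual plugin's formula:

```python
import math

def constraint_score(total_count, topn_count, n):
    # Weight presence in the top-N hits heavily, with a mild logarithmic
    # boost for corpus-wide popularity. Both weights are assumptions.
    return (topn_count / n) + 0.1 * math.log1p(total_count)

# (total matches, matches within the first N=10 results) per constraint
facets = {"red": (500, 2), "blue": (40, 9)}
ranked = sorted(facets, key=lambda c: constraint_score(*facets[c], n=10),
                reverse=True)
# "blue" outranks "red": it appears in 9 of the 10 most relevant hits
```

The configurable parts are exactly "N" and the combining function, as described above.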

	...But...

This approach really breaks down in a distributed model.  You can't do the 
same quick and easy DocList->DocSet transformation on each node, you have 

Chris Hostetter | 1 Oct 03:25 2011

Re: Best Solr escaping?


a) It depends entirely on what QueryParser you are using.

If your input is "from a human" i would suggest using dismax or edismax 
and not escaping anything - unless you get some type of error, and then 
maybe give the user a "there was a problem with your query, would you like 
to try ____" where you suggest a new query with all meta-characters stripped out.

b) URL escaping is really a completely independent issue...

: * Should we use + or %20 - and what cases make sense:
: > * "Dr. Phil Smith" or "Dr.+Phil+Smith" or "Dr.%20Phil%20Smith" - also what is

...solr doesn't know or care whether you use "+" or "%20" when building up a 
URL.  By the time Solr sees your input, the servlet container has already 
url-decoded the query params.

in general: if you are even *thinking* about how params are getting URL 
encoded, you are probably doing something wrong.  Writing custom code to 
construct Solr query strings is one thing; writing custom code to 
construct/escape values in URLs is something else -- i don't know what 
client language you are using, but i guarantee you it has an HTTP/CGI API 
that completely eliminates any need for you to even think about such 
issues.
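For instance, a client stdlib's URL helpers already treat the two encodings interchangeably (Python shown purely as an illustration):

```python
from urllib.parse import urlencode, quote, parse_qs

# Two spellings of the same query parameter:
print(urlencode({"q": "Dr. Phil Smith"}))  # q=Dr.+Phil+Smith
print(quote("Dr. Phil Smith"))             # Dr.%20Phil%20Smith

# Both decode to the identical value on the server side:
assert parse_qs("q=Dr.+Phil+Smith") == parse_qs("q=Dr.%20Phil%20Smith")
```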

-Hoss
Chris Hostetter | 1 Oct 04:25 2011

Re: multiple dateranges/timeslots per doc: modeling openinghours.


: Another, faulty, option would be to model opening/closing hours in 2
: multivalued date-fields, i.e: open, close. and insert open/close for each
: day, e.g: 
: 
: open: 2011-11-08:1800 - close: 2011-11-09:0300
: open: 2011-11-09:1700 - close: 2011-11-10:0500
: open: 2011-11-10:1700 - close: 2011-11-11:0300
: 
: And queries would be of the form:
: 
: 'open < now && close > now+3h'
: 
: But since there is no way to indicate that 'open' and 'close' are pairwise
: related I will get a lot of false positives, e.g the above document would be
: returned for:

This isn't possible out of the box, but the general idea of "position 
linked" queries is possible using the same approach as the 
FieldMaskingSpanQuery...

https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
https://issues.apache.org/jira/browse/LUCENE-1494

...implementing something like this that would work with 
(Numeric)RangeQueries however would require some additional work, but it 
should certainly be doable -- i've suggested this before but no one has 
taken me up on it...
http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
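The false positive from the quoted example is easy to reproduce; here is a small sketch comparing the unpaired (multivalued) check with a position-linked one:

```python
from datetime import datetime, timedelta

# open/close pairs from the example above (Nov 8 18:00 - Nov 9 03:00, etc.)
opens  = [datetime(2011, 11, 8, 18), datetime(2011, 11, 9, 17),
          datetime(2011, 11, 10, 17)]
closes = [datetime(2011, 11, 9, 3), datetime(2011, 11, 10, 5),
          datetime(2011, 11, 11, 3)]

def naive_match(now, horizon):
    # Two independent multivalued range queries: ANY open <= now AND
    # ANY close >= now + horizon -- the values are not pairwise linked.
    return any(o <= now for o in opens) and \
           any(c >= now + horizon for c in closes)

def paired_match(now, horizon):
    # Position-linked check: the SAME slot must satisfy both conditions,
    # which is what a FieldMaskingSpanQuery-style approach would enforce.
    return any(o <= now and now + horizon <= c
               for o, c in zip(opens, closes))

# Nov 9, 10:00 -- the venue is actually closed:
now, h = datetime(2011, 11, 9, 10), timedelta(hours=3)
# naive_match(now, h) -> True (false positive); paired_match(now, h) -> False
```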


Chris Hostetter | 1 Oct 04:28 2011

Re: Searching multiple fields


: I have a use case where I would like to search across two fields but I do not
: want to weight a document that has a match in both fields higher than a
: document that has a match in only 1 field.

use dismax, set the "tie" param to "0.0" (so it's a true "max" with no 
score boost for matching in multiple fields)

https://wiki.apache.org/solr/DisMax
http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
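A minimal request sketch (the field names here are made up for illustration):

```python
from urllib.parse import urlencode

# tie=0.0 => a pure max over the per-field scores; tie=1.0 would sum them.
params = {
    "defType": "dismax",
    "q": "solr",
    "qf": "title body",   # hypothetical field names
    "tie": "0.0",
}
query_string = urlencode(params)  # append to /select? on your Solr host
```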

-Hoss

abhayd | 1 Oct 05:45 2011

join & sort query with key, value pair

hi 
i have a document like this
video_id,keyword:seq
for example
1,service:2 support:1
2,support:2
3,service:2
4,service:1
What I want is a query where I send a video_id and, in response, get the
video with that video_id plus all other related videos (those sharing a
keyword), sorted by seq.

say query is video=1

i would like to see (because they are related via the keywords service and
support):
1 
4
3
2

How can I achieve this? Will a polyfield help here? I'm not sure how.
Any help?
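For concreteness, the ordering I'm after can be sketched client-side like this (the tie-break between equal seq values isn't specified above, so it is arbitrary here):

```python
# keyword:seq pairs per video_id, from the example above
docs = {
    1: {"service": 2, "support": 1},
    2: {"support": 2},
    3: {"service": 2},
    4: {"service": 1},
}

def related_sorted(video_id):
    keywords = set(docs[video_id])
    related = []
    for vid, kw in docs.items():
        shared = keywords & set(kw)
        if vid != video_id and shared:
            # rank each related video by its best (lowest) seq on a shared keyword
            related.append((min(kw[k] for k in shared), vid))
    related.sort()
    return [video_id] + [vid for _, vid in related]

# related_sorted(1) puts video 4 (seq 1) before videos 2 and 3 (seq 2)
```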

--
View this message in context: http://lucene.472066.n3.nabble.com/join-sort-query-with-key-value-pair-tp3384333p3384333.html
Sent from the Solr - User mailing list archive at Nabble.com.

Gerold Glaser | 1 Oct 13:01 2011

Springbased DIH vs. Manual Document Index

Hi Listeners!

We introduced solr in one of our customer projects some time ago.
In the application the search is more useful and a lot faster than the
database soundex search ;-)

But the design of the index creation mechanism is still a little bit
confusing, and we are unsure what the "best" strategy is to set up an index
correctly.

The main framework for our webapps is spring with all submodules like
springbatch.

What is the proper way to setup an index from different datasources?

Should we create a special springbased dataimporthandler that does all the
logic and has access to three different datasources (database, webservice,
file), or should we implement the business logic outside solr and send
search documents to solr afterwards?

We prefer the first proposed version with the dataimporthandler, but I did
not find any example of integrating spring into a dataimporthandler.

Can you provide some hints on how to integrate spring?

Thank you.

Best regards.
Ahson Iqbal | 1 Oct 18:51 2011

Re: Lucene 3.4.0 Merging

Hi Steve

Thank you very much for your valued response, but adding the space as you mentioned does not solve the problem.

Regards
Ahsan

________________________________
From: Steven A Rowe <sarowe <at> syr.edu>
To: "solr-user <at> lucene.apache.org" <solr-user <at> lucene.apache.org>
Sent: Friday, September 30, 2011 10:56 AM
Subject: RE: Lucene 3.4.0 Merging

Hi Ahson,

The wiki page you got your cmdline invocation from <http://wiki.apache.org/solr/MergingSolrIndexes>
was missing a space character between the classpath and "org/apache/lucene/misc/IndexMergeTool". 
I've just updated that page.

Steve

> -----Original Message-----
> From: Ahson Iqbal [mailto:mianahson <at> yahoo.com]
> Sent: Friday, September 30, 2011 2:45 AM
> To: Solr Send Mail
> Subject: Lucene 3.4.0 Merging
> 
> Hi
> 
> I have 3 solr3.4.0 indexes i want to merge them, after searching on web i

Mikhail Khludnev | 1 Oct 19:57 2011

Re: multiple dateranges/timeslots per doc: modeling openinghours.

I agree about SpanQueries. It's a viable measure against "false-positive
matches on multivalued fields". We implemented this approach some time ago;
please find the details at
http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
and
http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html
We are going to publish a third post about implementation approaches.

--
Mikhail Khludnev

On Sat, Oct 1, 2011 at 6:25 AM, Chris Hostetter <hossman_lucene <at> fucit.org>wrote:

>
> : Another, faulty, option would be to model opening/closing hours in 2
> : multivalued date-fields, i.e: open, close. and insert open/close for each
> : day, e.g:
> :
> : open: 2011-11-08:1800 - close: 2011-11-09:0300
> : open: 2011-11-09:1700 - close: 2011-11-10:0500
> : open: 2011-11-10:1700 - close: 2011-11-11:0300
> :
> : And queries would be of the form:
> :
> : 'open < now && close > now+3h'
> :
> : But since there is no way to indicate that 'open' and 'close' are
> pairwise
> : related I will get a lot of false positives, e.g the above document would

