marquinhocb | 1 Sep 2009 08:24

Score for query-generated value


I would like to create a scorer that applies a score based on a value that is
calculated during a query. More specifically, I want to score documents by
their geographical distance from a given latitude/longitude.
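For illustration, the distance part could be computed and turned into a score in (0, 1] like this. This is a minimal sketch under stated assumptions: it uses the haversine great-circle formula with a mean Earth radius of 6371 km, plus a hypothetical reciprocal decay (`halfScoreKm`) so that nearby documents score close to 1 and distant ones approach 0. The class and method names are made up; nothing here is Lucene API.

```java
/**
 * Hypothetical helper: converts the distance between a document's location
 * and the query point into a score in (0, 1] that could later be combined
 * with the ordinary text relevance score. Not part of Lucene.
 */
public final class DistanceScore {

    /** Mean Earth radius in kilometres (assumption: spherical Earth). */
    private static final double EARTH_RADIUS_KM = 6371.0;

    private DistanceScore() {}

    /** Great-circle distance in kilometres using the haversine formula. */
    public static double haversineKm(double lat1, double lon1,
                                     double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /**
     * Reciprocal decay: 1.0 at distance 0, 0.5 at halfScoreKm, tending to 0
     * for far-away documents. halfScoreKm is an assumed tuning parameter.
     */
    public static float distanceToScore(double distanceKm, double halfScoreKm) {
        return (float) (halfScoreKm / (halfScoreKm + distanceKm));
    }

    public static void main(String[] args) {
        // London -> Paris, scored with a hypothetical 100 km half-score distance
        double d = haversineKm(51.5074, -0.1278, 48.8566, 2.3522);
        System.out.printf("distance = %.1f km, score = %.3f%n", d, distanceToScore(d, 100.0));
    }
}
```

The reciprocal curve is only one choice; an exponential or linear cutoff would work the same way as long as it maps distance to a bounded, decreasing score.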

What is the easiest way to go about this? The LocalLucene contrib uses a
SortComparatorSource, but if possible I would like to use the scoring
mechanism, so that distance is scored just like all other fields. That would
allow searches that give more or less weight to distance (and, for example,
more weight to a closer-matching search term).

Are there any good examples of how to do this? Or can someone point me in
the right direction?

Ideally I would create a CustomScoreQuery (instead of a ConstantScoreQuery)
that would take in a filter and automatically give matching documents a
score, but I don't see a straightforward way of doing this.
-- 
View this message in context: http://www.nabble.com/Score-for-query-generated-value-tp25234966p25234966.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Michael McCandless | 1 Sep 2009 10:48

Re: Score for query-generated value

Wouldn't function queries (org.apache.lucene.search.function.*) work here?

Mike
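As a rough illustration of how that could fit together (a sketch, not Mike's code): org.apache.lucene.search.function.CustomScoreQuery wraps the ordinary query plus a ValueSourceQuery, and in Lucene 2.x its `customScore(int doc, float subQueryScore, float valSrcScore)` hook can be overridden; as far as I know the default simply multiplies the two scores. Producing the per-document distance value would need a custom ValueSource, which is not shown. The pure-Java function below covers only the blending arithmetic one might put in that override, with a hypothetical `distanceWeight` exponent so a search can give distance more or less weight.

```java
/**
 * Hypothetical blend of a text relevance score and a distance score in
 * (0, 1]. The arithmetic is the kind one might place in an overridden
 * CustomScoreQuery.customScore(...); the class itself is not Lucene API.
 */
public final class BlendedScore {

    private BlendedScore() {}

    /**
     * @param textScore      score of the ordinary (sub)query, >= 0
     * @param distanceScore  distance-derived score in (0, 1]
     * @param distanceWeight 0 = ignore distance, 1 = plain product,
     *                       > 1 = favour nearby documents more strongly
     */
    public static float blend(float textScore, float distanceScore, float distanceWeight) {
        return textScore * (float) Math.pow(distanceScore, distanceWeight);
    }

    public static void main(String[] args) {
        float text = 2.0f;
        float dist = 0.25f;
        System.out.println(blend(text, dist, 0f)); // 2.0   (distance ignored)
        System.out.println(blend(text, dist, 1f)); // 0.5   (plain product)
        System.out.println(blend(text, dist, 2f)); // 0.125 (distance weighted more)
    }
}
```

Raising the distance score to a power keeps the result multiplicative, so a zero text score still yields zero, which matches the "filter plus score" behaviour asked about.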


Chris Bamford | 1 Sep 2009 11:33

RE: New "Stream closed" exception with Java 6

Hi Grant,

>>I think your code there needs to show the underlying exception, too, so
>>we can see that stack trace.

Ummm... isn't this code already doing that?  What am I missing?

    try {
        indexWriter.addDocument(doc);
    } catch (CorruptIndexException ex) {
        throw new IndexerException("CorruptIndexException on doc: " + doc.toString() +
            " - " + ex.toString());
    } catch (IOException ex) {
        throw new IndexerException("IOException on doc: " + doc.toString() +
            " - " + ex.toString());
    }
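A likely reason Grant still can't see the trace: `ex.toString()` keeps only the exception's class name and message, so the original stack trace is discarded. A minimal sketch of the usual remedy, exception chaining through a `(String, Throwable)` constructor, follows. `IndexerException` here is a hypothetical stand-in (the real class's constructors aren't shown in the thread), and `addDocument` is a fake that simply throws. Since Lucene's CorruptIndexException is a subclass of IOException, a single `catch (IOException ex)` would also cover both branches.

```java
import java.io.IOException;

public class ChainedExceptionDemo {

    /** Hypothetical stand-in for the application's IndexerException. */
    public static class IndexerException extends Exception {
        public IndexerException(String message, Throwable cause) {
            super(message, cause); // keep the original exception as the cause
        }
    }

    /** Fake for indexWriter.addDocument(doc) failing deep inside the library. */
    static void addDocument(String doc) throws IOException {
        throw new IOException("Stream closed");
    }

    public static void index(String doc) throws IndexerException {
        try {
            addDocument(doc);
        } catch (IOException ex) {
            // Passing ex as the cause preserves its full stack trace;
            // concatenating ex.toString() keeps only class name and message.
            throw new IndexerException("IOException on doc: " + doc, ex);
        }
    }

    public static void main(String[] args) {
        try {
            index("doc-1");
        } catch (IndexerException e) {
            // The trace includes a "Caused by: java.io.IOException: Stream closed" section
            e.printStackTrace();
        }
    }
}
```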

Thanks,

- Chris

Chris Bamford

Fang_Li | 1 Sep 2009 11:37

exception to open a large index Insufficient system resources exist

I hit the following exception when opening an index larger than 8 GB.
One segment is already larger than 4 GB. From searching the internet,
it seems that not using the compound file format may solve the problem.

The same exception is thrown when merging with another index.
If the problem is caused by the large segment, we can adjust Lucene's
parameters to control the segment size.
Does anyone know what causes this exception? Reproducing the problem
takes a long time, so your ideas would save us a lot of effort.

java.io.IOException: Insufficient system resources exist to complete the requested service
	at java.io.RandomAccessFile.readBytes(Native Method)
	at java.io.RandomAccessFile.read(RandomAccessFile.java:315)
	at org.apache.lucene.store.FSDirectory$FSIndexInput.readInternal(FSDirectory.java:596)
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
	at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
	at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:73)

Danil ŢORIN | 1 Sep 2009 11:56

Re: exception to open a large index Insufficient system resources exist

There should be no problem with large segments.
Please describe the OS, file system and JDK you are running on.

There might be problems with files larger than 2 GB on Win32/FAT, or on
some ancient Linux versions.
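Most of the details asked for here (and whether the JVM is 64-bit, which comes up later in the thread) can be collected from standard system properties. A minimal sketch; `EnvReport` is a hypothetical name, and `sun.arch.data.model` is a Sun/Oracle-specific property that may be null on other JVMs. The file system type and the Lucene version are not covered.

```java
/** Hypothetical helper that gathers OS and JVM details for a bug report. */
public final class EnvReport {

    private EnvReport() {}

    public static String describe() {
        return "os=" + System.getProperty("os.name") + " " + System.getProperty("os.version")
             + ", arch=" + System.getProperty("os.arch")
             + ", java=" + System.getProperty("java.version")
             + ", vm=" + System.getProperty("java.vm.name")
             // Set by Sun/Oracle JVMs to "32" or "64"; may be null on other JVMs
             + ", dataModel=" + System.getProperty("sun.arch.data.model");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```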


Fang_Li | 1 Sep 2009 12:03

RE: exception to open a large index Insufficient system resources exist

We are running Windows Server 2003 Enterprise Edition with NTFS on a local disk. The JDK version is 1.5.0_12.

This problem has been discussed before, but no solution has been confirmed.

Thanks.


Uwe Schindler | 1 Sep 2009 12:13

RE: exception to open a large index Insufficient system resources exist

Which Lucene version are you using, and is it a 64-bit JVM?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe <at> thetaphi.de


Chris Bamford | 1 Sep 2009 12:32

RE: Lucene gobbling file descriptors

Hi Erick,

>>Note that for search speed reasons, you really, really want to share your
>>readers and NOT open/close for every request.
I have often wondered about this; I hope you can help me understand it better in the context of our app, which
is an email client:

When one of our users receives email, we index and store it so that he (and only he) can search it. This means a
separate index per user, which on large customer sites can mean hundreds or thousands of indexes. Sharing
readers seems counter-intuitive here, unless I am missing something. Instead, once a user performs a search,
we keep his IndexReader open in case he searches again. At present this mechanism has no expiry, so the readers
stay open indefinitely. I'm a bit hazy on the underlying details, but we have observed that the number of open
fds jumps by around 10 each time a new user performs a search. What would be a good strategy for managing this,
in your opinion? Does it really make sense to keep the IndexReader open? Would performance suffer much if we
did an open/close for each search? Or would it be better to close open readers after a period of inactivity?

Thanks for any wisdom, thoughts or ideas.

- Chris

----- Original Message -----
From: Erick Erickson <erickerickson <at> gmail.com>
Sent: Thu, 27/8/2009 4:49pm
To: java-user <at> lucene.apache.org
Subject: Re: Lucene gobbling file descriptors

Note that for search speed reasons, you really, really want to share your
readers and NOT open/close for every request.
FWIW

Michael McCandless | 1 Sep 2009 13:03

Re: Lucene gobbling file descriptors

In this approach it's expected that you'll run out of file descriptors
when "enough" users attempt to search at the same time.

You can reduce the number of file descriptors required per IndexReader
by 1) using compound file format (it's the default for IndexWriter),
and 2) optimizing the index before opening it (though, since you have
updates trickling in, that could get costly).  Yet, if enough users
try to search, you'll run out of descriptors.

If performance is OK, I think you should in fact open the IndexReader, do
the search, and close the IndexReader, per request. Or maybe reuse
IndexReaders for the "biggest" indexes. This reduces your "max file
descriptors envelope". Still, you can run out of descriptors with the
"perfect storm" of usage.
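One way to combine Chris's "close readers after a period of inactivity" idea with a bounded "max file descriptors envelope" is to cap how many per-user readers stay open at once. A minimal sketch, assuming hypothetical names (`ReaderCache`, `Opener`): it uses a `java.util.LinkedHashMap` in access order as an LRU cache, and evicting an entry closes it. It is generic over `Closeable` so a reader could be plugged in. It bounds by count rather than idle time, and it ignores a real hazard: another thread may still be searching the reader being evicted, so production code would need some form of reference counting.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache that keeps at most maxOpen per-user readers open. */
public class ReaderCache<R extends Closeable> {

    /** Opens a reader for one user's index (hypothetical callback). */
    public interface Opener<T> {
        T open(String user) throws IOException;
    }

    private final Opener<R> opener;
    private final LinkedHashMap<String, R> cache;

    public ReaderCache(final int maxOpen, Opener<R> opener) {
        this.opener = opener;
        // accessOrder = true: iteration order runs from least to most recently used
        this.cache = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                if (size() > maxOpen) {
                    closeQuietly(eldest.getValue()); // release its file descriptors
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached reader for a user, opening (and maybe evicting) as needed. */
    public synchronized R get(String user) throws IOException {
        R reader = cache.get(user); // a hit also marks the entry as recently used
        if (reader == null) {
            reader = opener.open(user);
            cache.put(user, reader);
        }
        return reader;
    }

    public synchronized int openCount() {
        return cache.size();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful to do if closing fails during eviction
        }
    }
}
```

A time-based variant would store a last-access timestamp per entry and periodically close entries idle for longer than some threshold.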

Also make sure you're giving the JRE the max open file descriptors
allowed by the OS.
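To check how close the JVM is to that limit (and to watch the "jumps by around 10" effect Chris describes), the counts can be read at runtime. This sketch assumes a JVM that provides `com.sun.management.UnixOperatingSystemMXBean` (Sun/Oracle and OpenJDK on Unix-like systems do); elsewhere it reports -1. The limit itself comes from the OS, e.g. `ulimit -n` on Linux. `FdMonitor` is a hypothetical name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper reporting this JVM's open vs. maximum file descriptors. */
public final class FdMonitor {

    private FdMonitor() {}

    /** @return {open, max}, or {-1, -1} if the JVM does not expose the counts. */
    public static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] { unix.getOpenFileDescriptorCount(),
                                unix.getMaxFileDescriptorCount() };
        }
        return new long[] { -1L, -1L }; // e.g. on Windows
    }

    public static void main(String[] args) {
        long[] c = fdCounts();
        System.out.println("open fds: " + c[0] + " / max: " + c[1]);
    }
}
```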

A bigger change would be to aggregate multiple users into a single
index, and use filtering to apply the entitlement constraints. But
that has its own set of tradeoffs, e.g., scoring will be different,
respelling is dangerous (entitlements can "leak" through), it's less
"secure", etc.

Mike


Chris Bamford | 1 Sep 2009 14:20

RE: Lucene gobbling file descriptors

Hi Mike,

Thanks for the suggestions, very useful. I would like to adopt a combination of setUseCompoundFile on the
IndexReader and an open/close per search.
As a start, I tried to set the compound file format on the IndexSearcher's underlying IndexReader, but no
such method is available. This is on a Lucene 2.0 branch of the code... Did it come in after 2.0, or am I
doing something wrong? This is what I am attempting:

        currentSearcher = new DelayCloseIndexSearcher(directory);
        if (currentSearcher != null) {
            currentSearcher.getIndexReader().setUseCompoundFile(true);
        }

However, setUseCompoundFile() is not available. :-(

Thanks again,

- Chris

Chris Bamford
Senior Development Engineer
Scalix
chris.bamford <at> scalix.com
Tel: +44 (0)1344 381814
www.scalix.com


