Heng Mei | 1 Jun 01:57 2006

lucene memory usage, caching...

Hi experts,

Does Lucene do any caching of Document fields during a search?
If I perform a search and retrieve some fields in the Document hits,
then I repeat the same search, are those fields cached in memory?  They
don't seem to be -- I'm performing several thousand unique searches
and retrieving a large text field from each Document, but the memory
usage of the Lucene Java process seems very low compared to the total
size of all those fields.

Just curious if there are any config params or other ways to tweak
Lucene's caching strategies...

Thanks,
~Heng
Van Nguyen | 1 Jun 02:10 2006

fuzzyquery question

I have a question regarding the results I get back from a FuzzyQuery.
If I were to do a fuzzy search on:
 
Classic series
 
Should it come back with a result like:
 
Standard Series Non Vented Hat - Class E&G
 
If I do a search on:
 
Clssic Series
 
it will return the same results I get from a non-fuzzy search.
 
------
 
This is the code:
 
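import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;

// searchString is a placeholder for the raw input; split it however you parse the search string
String[] searchTerms = searchString.split("\\s+");

BooleanQuery query = new BooleanQuery();

for (int i = 0; i < searchTerms.length; i++)
{
    // one fuzzy clause per search word; every word must match
    Query temp = new FuzzyQuery(new Term("CONTENTS", searchTerms[i]));
    query.add(temp, BooleanClause.Occur.MUST);
}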
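
On whether "Classic" can plausibly match a term like "Class": a rough way to check is to compute the similarity by hand. The sketch below assumes that FuzzyQuery scores a candidate term as 1 - editDistance / length of the shorter term and accepts it above the default minimum similarity of 0.5, and that the analyzer has lowercased the indexed terms. The class and method names are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not a Lucene class: estimates whether two terms are "fuzzy" matches.
final class FuzzySimilaritySketch {

    // Classic dynamic-programming Levenshtein edit distance, using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed scoring rule: 1 - distance / length of the shorter term.
    static float similarity(String queryTerm, String indexedTerm) {
        int shorter = Math.min(queryTerm.length(), indexedTerm.length());
        if (shorter == 0) {
            return 0f;
        }
        return 1f - (float) editDistance(queryTerm, indexedTerm) / shorter;
    }

    public static void main(String[] args) {
        // "classic" -> "class" needs two deletions, so the score is 1 - 2/5.
        System.out.println(similarity("classic", "class"));
        // "clssic" -> "classic" needs one insertion, so the score is 1 - 1/6.
        System.out.println(similarity("clssic", "classic"));
    }
}
```

Under those assumptions "classic" against "class" scores 0.6, which is above 0.5, so a hit on "Class E&G" would be expected rather than surprising. Passing a higher minimum similarity to the FuzzyQuery constructor should tighten the match.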

Charles Mi | 1 Jun 05:03 2006

preloading / "warming up" the index

Is there a way to preload the index into memory when the process starts?
Basically I want to warm up the index before processing user queries. What
are some recommended ways to do this? Thanks.
Monsur Hossain | 1 Jun 05:12 2006

Re: preloading / "warming up" the index

When Lucene first issues a query, it caches a hash of sort values (one
value per document, plus a bit more if you are sorting on strings),
which takes a while.  Therefore, when our application first starts up,
we issue one query per sort type.  As I understand, it doesn't matter
what the query is or how complicated it is.
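
The idea can be modelled without Lucene at all. The following is a toy sketch, not Lucene's FieldCache: it assumes a cache that builds one value per document the first time a sort field is used, and a startup step that touches each sort field once. The class name and the "date" and "price" field names are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model only: this is NOT Lucene's FieldCache, just the same lazy-loading idea.
final class SortCacheWarmup {
    private final Map<String, int[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final int docCount;

    SortCacheWarmup(int docCount) {
        this.docCount = docCount;
    }

    // Returns one sort value per document for the field, building the array on first use.
    int[] sortValues(String field) {
        return cache.computeIfAbsent(field, this::load);
    }

    // Stand-in for the slow pass over the index that the first sorted query pays for.
    private int[] load(String field) {
        loads.incrementAndGet();
        int[] values = new int[docCount];
        for (int doc = 0; doc < docCount; doc++) {
            values[doc] = doc % 1000; // fake per-document value
        }
        return values;
    }

    // Touch every sort field once at startup so user queries find a warm cache.
    void warmUp(String... sortFields) {
        for (String field : sortFields) {
            sortValues(field);
        }
    }

    // Number of times the expensive load has run so far.
    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        SortCacheWarmup index = new SortCacheWarmup(100_000);
        index.warmUp("date", "price");
        index.sortValues("date"); // served from the cache, no second load
        System.out.println("expensive loads: " + index.loadCount());
    }
}
```

In a real application the body of warmUp would presumably be a cheap search sorted on each field against the freshly opened searcher, which matches the point that the query itself does not matter, only the sort.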

Monsur

On 5/31/06, Charles Mi <charlesmi <at> gmail.com> wrote:
> Is there a way to preload the index into memory when the process starts?
> Basically I want to warm up the index before processing user queries. What
> are some recommended ways to do this? Thanks.
>
>
Cheolgoo Kang | 1 Jun 05:18 2006

Re: preloading / "warming up" the index

Check this out.

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200512.mbox/%3c48b708490512102110s6913a4c3k1c2c152596e50e06@mail.gmail.com%3e

On 6/1/06, Monsur Hossain <monsurh <at> gmail.com> wrote:
> When Lucene first issues a query, it caches a hash of sort values (one
> value per document, plus a bit more if you are sorting on strings),
> which takes a while.  Therefore, when our application first starts up,
> we issue one query per sort type.  As I understand, it doesn't matter
> what the query is or how complicated it is.
>
> Monsur
>
>
>
> On 5/31/06, Charles Mi <charlesmi <at> gmail.com> wrote:
> > Is there a way to preload the index into memory when the process starts?
> > Basically I want to warm up the index before processing user queries. What
> > are some recommended ways to do this? Thanks.
> >
> >
>
>
>

-- 
Cheolgoo

Charles Mi | 1 Jun 05:55 2006

Re: preloading / "warming up" the index

Thanks for the advice, guys... I'm still not entirely clear on what a search
causes Lucene to do with respect to warming up/caching portions of the index
in memory.

If I warm up Lucene using a search for "apple", does Lucene load the entire
inverted index into memory, or just the part of the index that contains the
entry for "apple"? Basically I'd like to make sure that the entire
inverted index (or as much as possible) is preloaded into memory, so if I
issue a subsequent search for "microsoft", it will be fast. Does Lucene
have any mechanism for preloading the inverted index into memory? Also, is
there a way to figure out what percentage of Lucene's data storage is
occupied by the inverted index, and what percentage is occupied by the other
info, such as the documents' stored field values?

Thanks!
Charles

On 5/31/06, Monsur Hossain <monsurh <at> gmail.com> wrote:
>
> When Lucene first issues a query, it caches a hash of sort values (one
> value per document, plus a bit more if you are sorting on strings),
> which takes a while.  Therefore, when our application first starts up,
> we issue one query per sort type.  As I understand, it doesn't matter
> what the query is or how complicated it is.
>
> Monsur
>
>
>
> On 5/31/06, Charles Mi <charlesmi <at> gmail.com> wrote:

Amaresh Kumar Yadav | 1 Jun 05:56 2006

RE: how to create index with particular ID

I want to search for text in the "title" field only.

How should I specify it?

Regards..
Amaresh

-----Original Message-----
From: Alexey Sorokin [mailto:alsor.net <at> gmail.com]
Sent: Wednesday, May 31, 2006 4:21 PM
To: java-user <at> lucene.apache.org
Subject: Re: how to create index with particular ID

Actually you don't need to create a text file. Get the data from the db and
create a Document to put in the index. At the least you must store the ID of
the row in the Document. Or you may store doctitle and docpath too.

For each row you should do something like this:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// rowId, docTitle and docPath stand for the String values read from the current row
Document doc = new Document();
// stored and untokenized, so the ID can be read back from a hit and matched exactly
doc.add(new Field("ID", rowId, Field.Store.YES,
Field.Index.UN_TOKENIZED));
// indexed for full-text search but not stored
doc.add(new Field("TITLE", docTitle,
Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("PATH", docPath,
Field.Store.NO, Field.Index.TOKENIZED));
indexWriter.addDocument(doc);

Otis Gospodnetic | 1 Jun 06:34 2006

Re: how to create index with particular ID

Here's an example that will work with the query parser:

  title:FAQ

Otis

----- Original Message ----
From: Amaresh Kumar Yadav <Amaresh.Yadav <at> niit.com>
To: "java-user <at> lucene.apache.org" <java-user <at> lucene.apache.org>
Sent: Wednesday, May 31, 2006 11:56:19 PM
Subject: RE: how to create index with particular ID

I want to search for text in the "title" field only.

How should I specify it?

Regards..
Amaresh

-----Original Message-----
From: Alexey Sorokin [mailto:alsor.net <at> gmail.com]
Sent: Wednesday, May 31, 2006 4:21 PM
To: java-user <at> lucene.apache.org
Subject: Re: how to create index with particular ID

Actually you don't need to create a text file. Get the data from the db and
create a Document to put in the index. At the least you must store the ID of
the row in the Document. Or you may store doctitle and docpath too.

For each row you should do something like this:

Otis Gospodnetic | 1 Jun 06:38 2006

Re: preloading / "warming up" the index

Look in your index directory for a .tii file.  That file is read into RAM (if there is enough of it; if
there is not, you will see an OOM error).  What Monsur was talking about is related to sorting and the warming up of
FieldCache instances.  If you don't sort your results by criteria other than the default relevance, you
can ignore FieldCache.
Any query should cause Lucene to read the whole .tii into RAM.
If you do not see a .tii file in your index directory, and instead see one or more .cfs files, you are using the
compound index format.  Run IndexReader as a java app (e.g. java org.apache.lucene....IndexReader
/your/index/dir/file(?)) to get a listing of the individual index files inside a single cfs file.
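
For the question about what share of the storage each part of the index takes, one rough approach for a non-compound index is to total the files in the index directory by extension. This is a minimal sketch using only the JDK; the class name is made up for illustration. It assumes that .tii, .tis, .frq and .prx hold the term dictionary and postings (the inverted index) while .fdt and .fdx hold stored field values, which should be checked against the file-format documentation for your Lucene version. With the compound format only the .cfs files are visible, so the breakdown will not be informative there.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: sizes of index files grouped by extension.
final class IndexFileBreakdown {

    // Sums the sizes of regular files in the directory, keyed by extension (e.g. ".tii").
    static Map<String, Long> bytesByExtension(File indexDir) {
        Map<String, Long> totals = new TreeMap<>();
        File[] files = indexDir.listFiles();
        if (files == null) {
            return totals; // not a directory, or not readable
        }
        for (File f : files) {
            if (!f.isFile()) {
                continue;
            }
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            // Files without an extension (such as "segments") are keyed by their full name.
            String ext = dot >= 0 ? name.substring(dot) : name;
            totals.merge(ext, f.length(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        Map<String, Long> totals = bytesByExtension(dir);
        long all = 0;
        for (long bytes : totals.values()) {
            all += bytes;
        }
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            double pct = all == 0 ? 0.0 : 100.0 * e.getValue() / all;
            System.out.printf("%-10s %12d bytes %6.2f%%%n", e.getKey(), e.getValue(), pct);
        }
    }
}
```

Run it with the index directory as the only argument. The size of the .tii entry also gives a lower bound on the RAM needed just to open the index.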

Otis

----- Original Message ----
From: Charles Mi <charlesmi <at> gmail.com>
To: java-user <at> lucene.apache.org
Sent: Wednesday, May 31, 2006 11:55:44 PM
Subject: Re: preloading / "warming up" the index

Thanks for the advice, guys... I'm still not entirely clear on what a search
causes Lucene to do with respect to warming up/caching portions of the index
in memory.

If I warm up Lucene using a search for "apple", does Lucene load the entire
inverted index into memory, or just the part of the index that contains the
entry for "apple"? Basically I'd like to make sure that the entire
inverted index (or as much as possible) is preloaded into memory, so if I
issue a subsequent search for "microsoft", it will be fast. Does Lucene
have any mechanism for preloading the inverted index into memory? Also, is
there a way to figure out what percentage of Lucene's data storage is
occupied by the inverted index, and what percentage is occupied by the other
info, such as the documents' stored field values?

Amaresh Kumar Yadav | 1 Jun 06:49 2006

RE: how to create index with particular ID

Where?

Please send me the URL.

amaresh

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic <at> yahoo.com]
Sent: Thursday, June 01, 2006 10:04 AM
To: java-user <at> lucene.apache.org
Subject: Re: how to create index with particular ID

Here's an example that will work with the query parser:

  title:FAQ

Otis

----- Original Message ----
From: Amaresh Kumar Yadav <Amaresh.Yadav <at> niit.com>
To: "java-user <at> lucene.apache.org" <java-user <at> lucene.apache.org>
Sent: Wednesday, May 31, 2006 11:56:19 PM
Subject: RE: how to create index with particular ID

I want to search for text in the "title" field only.

How should I specify it?

Regards..
Amaresh