Omar Cal | 1 Apr 08:55 2003

Problems on massive indexing

Hello, I'm new to Lucene.

I have the following scenario:
- 450,000 XML files and text files
- 5 indexes, two stored and three unstored
- Lucene library 1.2 (also tested with 1.3RC)

When I try to index the material, I get an IndexOutOfBoundsException in the 
call to index.optimize() after two hours of indexing. I know about 
bug 14355, and I think it could be responsible for that exception.

I've also tried indexing the whole material in successive runs, but the 
problem seems to depend on the number of documents.

I've tried setting maxFieldLength to its maximum, but nothing changed.

If I split the material into chunks of about 20,000 - 30,000 documents, 
one directory each, the problem doesn't appear. Obviously I then have to 
repeat the searches for each chunk (directory).
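
The split into batches of about 20,000 to 30,000 documents per directory described above could be planned roughly as follows. This is only a minimal sketch using the standard library: the class name ChunkPlanner, the method partition, and the chunk size of 25,000 are assumptions for illustration and are not part of Lucene or of the original setup. Each resulting chunk would then be indexed into its own directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Lucene): plans which documents go into which chunk.
public class ChunkPlanner {

    /** Splits items into consecutive chunks of at most chunkSize elements. */
    public static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for the 450,000 files; real code would hold file paths instead.
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < 450000; i++) {
            docIds.add(i);
        }
        List<List<Integer>> chunks = partition(docIds, 25000); // assumed chunk size
        System.out.println(chunks.size() + " chunks"); // prints "18 chunks"
    }
}
```

With the assumed chunk size, a search has to be repeated against 18 separate directories, which is the cost mentioned above.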

Anyone out there with a similar scenario? Other solutions?

Thanks, Omar
Kristian Hermsdorf | 1 Apr 10:18 2003

Re: Problems on massive indexing

Hi
 
I also got the IndexOutOfBoundsException while optimizing the index (index 
size about 1GB, 50 docs with 25 fields each).
(Optimizing was triggered by merging a RAMDirectory into an FSDirectory.)
The problem was that the FieldsReader tried to read more fields than 
existed. I have no clue how to fix it.
 
bye
 
Kristian


Rob Outar | 1 Apr 22:05 2003

Indexing Growth

Hi all,

	Will the index grow based on queries alone?  I build my index, then run
several queries against it and afterwards I check the size of the index and
in some cases it has grown quite a bit although I did not add anything???

Anyhow please let me know the cases when the index will grow.
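
One way to confirm whether the index really grows during a search-only session is to measure the index directory before and after the queries. The following is a minimal sketch, not a Lucene API: the class name IndexSizeCheck, the method directorySize, and the default path "index" are assumptions for illustration. It only sums the sizes of the files directly inside the directory.

```java
import java.io.File;

// Hypothetical diagnostic helper (not part of Lucene).
public class IndexSizeCheck {

    /** Returns the total size in bytes of the regular files directly inside dir. */
    public static long directorySize(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0L; // not a directory, or it could not be read
        }
        long total = 0L;
        for (File f : files) {
            if (f.isFile()) {
                total += f.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The index path is an assumption; pass the real one as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : "index");
        long before = directorySize(indexDir);
        // ... run the search-only workload here ...
        long after = directorySize(indexDir);
        System.out.println("before=" + before + " bytes, after=" + after + " bytes");
    }
}
```

If the two numbers differ, listing the file names in the directory before and after would show which files appeared or grew.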

Thanks,

Rob
Otis Gospodnetic | 1 Apr 22:07 2003

Re: Indexing Growth

Only when you add new documents to it.

Otis


Rob Outar | 1 Apr 22:13 2003

RE: Indexing Growth

Dang, I must be doing something crazy, because all my client app does is search
and the index size increases.  I do not add anything.

Thanks,

Rob


Alex Murzaku | 1 Apr 22:21 2003

RE: Indexing Growth

I don't know if I remember this correctly: I think a file is created for
every query (term), but the file should disappear after the query
is completed.


Rob Outar | 1 Apr 22:32 2003

RE: Indexing Growth

I reuse the same searcher, analyzer and Query object, so I don't think that
should cause the problem.

Thanks,

Rob


David Spencer | 2 Apr 02:15 2003

Re: I need a list of the indexed words

jcrowell wrote:

>Thanks for responding.  Are you referring to the solution under the title:
>
>"How do I retrieve all the values of a particular field that exists within
>an index, across all documents" ?
>
Here's some code that might do what you want.
It also shows the frequency of each term.
Args are "-i INDEX" and optional "-f FIELD".
I haven't tested it outside my tree (to see if it's clean of 
dependencies), but it looks reasonable at a glance.

package com.tropo.lucene;

import org.apache.lucene.analysis.*;
import org.apache.lucene.index.*;
import org.apache.lucene.document.*;

import java.io.*;
import java.util.*;

/**
 * Call IndexReader.terms(), termPositions().
 */
public class DumpTerms
{
    private DumpTerms()
    {
(Continue reading)

Amit Kapur | 2 Apr 11:43 2003

Behaviour of Lucene during Stress/Scalability Test

hi everybody

I am trying to index documents using Lucene, generating about 30 MB of index (optimized), which can be raised
to about 100 MB or more (but that would be on a high-end server machine).

Description of current case:
#---Each document has four fields (one Text field and 3 other Keyword fields).
#---The analyzer is based on a StopFilter and a PorterStemFilter.
#---I am using a Compaq PIII, 128 MB RAM, 650 MHz.
#---mergeFactor is set to 25, and I am optimizing the index after adding about 20 documents.
#---Using Lucene release 1.2

Problem faced
After adding about 4000 documents, generating an index of 30 MB, I got an error saying ****
couldn't rename segments.new to segments ****, after which neither an IndexReader nor an IndexWriter
could be opened on the current index.

Then I changed a couple of settings:
#---mergeFactor=20, and optimize was called after every 10 documents.
#---Using Lucene release 1.3

Problem faced
After adding about 1500 documents, generating an index of 10 MB, I got an error saying ****
F:\Program Files\OmniDocs Server\ftstest\_3cf.fnm (Too many open files) ****, after which the
IndexWriter could not be opened on the current index.
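
To put the two configurations above in numbers, the sketch below counts how often optimize would run in each case. It assumes that "optimizing after adding about 20 documents" means after every 20 documents; the class name OptimizeSchedule and the method countOptimizeCalls are made up for illustration, and no Lucene calls are made. Whether the frequency of optimize calls is related to either error is not established here.

```java
// Hypothetical helper (not part of Lucene): counts optimize calls for a given schedule.
public class OptimizeSchedule {

    /** Counts optimize calls when optimizing after every `interval` added documents. */
    public static int countOptimizeCalls(int totalDocs, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        int calls = 0;
        for (int added = 1; added <= totalDocs; added++) {
            // A real indexer would call writer.addDocument(doc) here.
            if (added % interval == 0) {
                calls++; // a real indexer would call writer.optimize() here
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(countOptimizeCalls(4000, 20)); // first run: prints 200
        System.out.println(countOptimizeCalls(1500, 10)); // second run: prints 150
    }
}
```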

My requirement is for a much larger index in practice, and I am at the point where
these errors occur unpredictably.

Could anyone please guide me on this ASAP?

Rob Outar | 2 Apr 15:50 2003

RE: Indexing Growth

Hi all,

	This is too odd and I do not even know where to start.  We built a Windows
Explorer type tool that indexes all files in a "sandboxed" file system.
Each Lucene document contains stuff like path, parent directory, last
modified date, file_lock, etc.  When we display the files in a given
directory through the tool, we query the index about 5 times for each file in
the repository; this is done so we can display all attributes in the index
about that file.  So, for example, if there are 5 files in the directory and each
file has 6 attributes, that means about 30 term queries are executed.  The
initial index, when built, is about 10.4 MB; after accessing about 3 or 4
directories the index size increased to over 100 MB, and we did not add
anything!!  All we are doing is querying!!  Yesterday, after querying became
ungodly slow, we looked at the index size: it had grown from 10 MB to 1.5 GB
(granted, we tested the tool all morning).  But I have no idea why the index
is growing like this.  ANY help would be greatly appreciated.

Thanks,

Rob


