HUI.OUYANG | 1 Sep 2005 07:28

deleting documents from index

Hi,

To delete documents from the index more efficiently during incremental indexing, I have
implemented batch deletion at the application level. First I get the internal document IDs
for a query; then I delete those documents by internal ID only when the IndexWriter is
closed or the index is optimized, since the internal document IDs change only when the index is optimized.
Could this be an issue?
The reason for doing this is that deleting documents from the index in one thread sometimes fails when
another thread is adding new documents to the same index.

Regards,
hui
Shane O'Sullivan | 1 Sep 2005 15:36

Global Analysis possible in Lucene?

Hi,

Does anyone know if Lucene supports Global Analysis? I haven't seen any 
documentation on it.

Thanks

Shane O'Sullivan
Xiaozheng Ma | 1 Sep 2005 16:18

RE: deleting documents from index

Indexing into one index in a multithreaded environment needs to be
serialized: you need to synchronize the calls to
indexwriter.addDocument(). Otherwise Lucene will throw exceptions. After
all, Lucene uses file-based locking to ensure that only one thread can
modify the same index at a time.
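
The serialization described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DocumentSink`, `SerializedSink`, and `SerializedSinkDemo` are hypothetical names, not Lucene classes, and a plain list stands in for the shared index so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the single shared writer; NOT a Lucene class.
interface DocumentSink {
    void addDocument(String doc);
}

// Wraps a sink so that only one thread at a time can call addDocument().
class SerializedSink implements DocumentSink {
    private final DocumentSink delegate;
    private final Object lock = new Object();

    SerializedSink(DocumentSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addDocument(String doc) {
        synchronized (lock) {              // every add goes through this one lock
            delegate.addDocument(doc);
        }
    }
}

class SerializedSinkDemo {
    // Runs `threads` workers that each add `perThread` documents through one
    // serialized sink, and returns how many documents were stored.
    static int run(int threads, int perThread) {
        List<String> store = new ArrayList<>();   // not thread-safe on its own
        DocumentSink sink = new SerializedSink(doc -> store.add(doc));

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sink.addDocument("doc-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                  // wait until every worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.size();
    }

    public static void main(String[] args) {
        System.out.println("stored: " + run(4, 1000));
    }
}
```

Because every call passes through the same lock, the unsynchronized list never sees two writers at once. In a real application the delegate would be whatever object wraps the shared writer.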

In your situation, I believe that if you have multiple threads indexing
new docs into the same index, you will still have the same problem. But I
guess you probably have only one thread doing the indexing and another
one deleting from the index by querying IDs.

One solution for multithreaded indexing on the same index is to
split the indexing process into independent pieces (each using a
different index, of course): each thread indexes different docs,
and at some point the segments are merged into one index, if you will.
In the meantime, deletion can remove docs from the previously merged
index while the merge is not happening (that is, while it is not locked).

The merger code is like this:

        // Each file dir contains a complete and independent index segment.
        Directory[] inds = new Directory[fileList.length];
        for (int i = 0; i < fileList.length; i++) {
            String path = indexPath + "/" + fileList[i];
            inds[i] = FSDirectory.getDirectory(path, false);
        }
        indexPath = indexPath + "/merge";  // merge into the $(indexPath)/merge dir
        if (!(new File(indexPath).exists())) {
            boolean success = (new File(indexPath)).mkdirs();

houyang | 1 Sep 2005 18:41

RE: deleting documents from index

Thank you, Xiaozheng.
Actually, the application could have more than two threads, and each thread could add, modify, or delete documents
at any time (a document being deleted could have been added earlier by another thread), so each thread cannot work on
its own index (consider that any indexed document could be modified at any time, and you then have to delete
the previous version and add the new one). That is why I defer the actual deletion of documents, based on
the internal doc IDs, to the end, when all the threads have finished.
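
A minimal sketch of the deferral described above, under stated assumptions: worker threads only record what should be deleted, and a single thread applies the whole batch once the workers have finished. `DeferredDeletes` and its callback are hypothetical names, not Lucene API. The sketch records an application-level key per document rather than the internal document number; that substitution is an assumption made here so the recorded values stay meaningful regardless of how internal numbering behaves.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper, not part of Lucene: collects deletions requested by
// many threads and applies them later in one batch from a single thread.
class DeferredDeletes {
    private final Set<String> pending = new LinkedHashSet<>();

    // Called by any worker thread; only records the key, touches no index.
    synchronized void markForDeletion(String key) {
        pending.add(key);
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    // Called once, after all workers are done. Hands each key to `deleter`
    // (for example, a routine that removes the matching document) and
    // returns the keys that were processed, in the order first requested.
    synchronized List<String> flush(Consumer<String> deleter) {
        List<String> done = new ArrayList<>(pending);
        for (String key : done) {
            deleter.accept(key);
        }
        pending.clear();
        return done;
    }

    public static void main(String[] args) {
        DeferredDeletes deletes = new DeferredDeletes();
        deletes.markForDeletion("doc-17");
        deletes.markForDeletion("doc-42");
        deletes.markForDeletion("doc-17");   // repeated request is recorded once
        List<String> removed =
                deletes.flush(key -> System.out.println("delete " + key));
        System.out.println(removed.size() + " deletions applied");
    }
}
```

The point of the design is that adds and deletes never compete for the index at the same moment: deletes are cheap bookkeeping while workers run, and the actual index modification happens in one place.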

-----Original Message-----
From: Xiaozheng Ma [mailto:Xiaozheng.Ma <at> redwood.com] 
Sent: Thursday, September 01, 2005 7:18 AM
To: java-dev <at> lucene.apache.org
Cc: HUI.OUYANG <at> ORACLE.COM
Subject: RE: deleting documents from index

Indexing on one indexing file in a multithreaded env needs to be
serialized --you need to synchronize the call to
indexwriter.addDocument(). Otherwise Lucene will throw exceptions. After
all, Lucene uses file-based locking to ensure that only one thread can
modify the same index at the same time.  

In your situation, I believe, if you have multiple threads working on
same indexing file to index new docs, you still have same problem. But I
guess you probably only have one thread doing the indexing and another
one deletes the index by querying ids. 

One solution to multiple threaded indexing on the same index file is to
split the indexing process into independent pieces(and of course each
uses different index file): each thread works on indexing different docs
then at some point merges the segments into one index file if you will.
In the mean time, the deletion can delete the docs on the prior merged

Erik Hatcher | 1 Sep 2005 18:49

Re: Global Analysis possible in Lucene?

Shane - could you give us some details on what Global Analysis is and  
how it relates to full-text searching?

I googled for it and came up with some heavy duty mathematical stuff,  
but did not see a direct relationship with information retrieval.

     Erik

On Sep 1, 2005, at 9:36 AM, Shane O'Sullivan wrote:

> Hi,
>
> Does anyone know if Lucene supports Global Analysis? I haven't seen any
> documentation on it.
>
> Thanks
>
> Shane O'Sullivan
>
jaina (sent by Nabble.com) | 1 Sep 2005 21:39

Performance Test Failing


Hi,
   We are running into a performance problem as we use Lucene. Everything worked fine until yesterday, and today
when we started performance testing we saw an error. Here are the details of the error:

54eda4e1 SystemErr     R From Faceted SearchImpl rlinksIndexReader.open(dir) failed
[9/1/05 17:42:05:646 GMT] 54eda4e1 SystemErr     R java.io.FileNotFoundException: /gsa/torgsa/.projects/p1/gsa_cdt_ods/projects/w3perf/datapersist/sales/support/skp/production/index/relatedlinks/secondary/_0.f20 (Too many open files)
[9/1/05 17:42:05:647 GMT] 54eda4e1 SystemErr     R              at java.io.RandomAccessFile.open(Native Method)
[9/1/05 17:42:05:647 GMT] 54eda4e1 SystemErr     R              at java.io.RandomAccessFile.<init>(RandomAccessFile.java(Compiled Code))
[9/1/05 17:42:05:647 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java(Inlined Compiled Code))
[9/1/05 17:42:05:647 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.store.FSInputStream.<init>(FSDirectory.java(Inlined Compiled Code))
[9/1/05 17:42:05:648 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.store.FSDirectory.openFile(FSDirectory.java(Compiled Code))
[9/1/05 17:42:05:648 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.index.SegmentReader.openNorms(SegmentReader.java(Compiled Code))
[9/1/05 17:42:05:648 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:153)
[9/1/05 17:42:05:648 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:122)
[9/1/05 17:42:05:648 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
[9/1/05 17:42:05:649 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.store.Lock$With.run(Lock.java(Compiled Code))
[9/1/05 17:42:05:649 GMT] 54eda4e1 SystemErr     R              at org.apache.lucene.index.IndexReader.open(IndexReader.java(Compiled Code))
[9/1/05 17:42:05:649 GMT] 54eda4e1 SystemErr     R              at com.ibm.sales.support.skp.fsearch.search.FacetedSearchImpl.openRelatedLinksIndex(FacetedSearchImpl.java:1603)
[9/1/05 17:42:05:649 GMT] 54eda4e1 SystemErr     R              at com.ibm.sales.support.skp.fsearch.search.FacetedSearchImpl.getMatchingRelatedLinksDocs(FacetedSearchImpl.java:1071)
[9/1/05 17:42:05:649 GMT] 54eda4e1 SystemErr     R              at

Nrupal Akolkar | 2 Sep 2005 09:01

Re: Performance Test Failing

Actually, your problem is one of system virtual memory utilization. The
error arose because the system lacked the memory to continue indexing and
reading the files; SystemErr is the error that was generated.
One way to keep this from happening is to create a cluster of
indexes and then index the indexes.
To create the cluster of indexes, use Lucene indexing and limit each index
creation run by time, recording the name of the last file indexed.
Write a Unix script or a Java monitor program that resumes indexing from
that last file and moves further into the directory structure. Keep looping
until you reach the end of the directory you need indexed. I think this will
solve your problem.
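
As a side note, the "(Too many open files)" text in the reported exception is normally the operating system's message for a process reaching its limit on open file descriptors. One possible contributor, which is only an assumption here since the application code is not shown, is opening a fresh index reader for every search without closing it. A minimal sketch of reusing a single instance instead follows; `SharedReaderHolder` is a hypothetical helper, not a Lucene class, and a string stands in for the reader so the example runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: opens an expensive resource
// (such as an index reader) once and hands the same instance to every caller.
class SharedReaderHolder<T> {
    private final Supplier<T> opener;
    private T instance;                    // guarded by `this`

    SharedReaderHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    // Opens on first use; later calls reuse the already-open instance.
    synchronized T get() {
        if (instance == null) {
            instance = opener.get();
        }
        return instance;
    }

    public static void main(String[] args) {
        AtomicInteger opens = new AtomicInteger();
        SharedReaderHolder<String> holder = new SharedReaderHolder<>(() -> {
            opens.incrementAndGet();       // stands in for opening index files
            return "reader";
        });
        for (int i = 0; i < 1000; i++) {
            holder.get();                  // simulated search requests
        }
        System.out.println("opens: " + opens.get());
    }
}
```

With this shape the number of open files no longer grows with the number of searches. A real version would also need a way to close and reopen the instance when the index changes, which is left out here.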
Nrupal
0091-9879021334

 On 9/2/05, jaina (sent by Nabble.com <http://Nabble.com>) <lists <at> nabble.com> 
wrote: 
> 
> 
> Hi,
> We are running into a performance problem as we use Lucene. Everything worked
> fine until yesterday, and today when we started performance testing we saw an
> error. Here are the details of the error:
>
> 54eda4e1 SystemErr R From Faceted SearchImpl rlinksIndexReader.open(dir) failed
> [9/1/05 17:42:05:646 GMT] 54eda4e1 SystemErr R java.io.FileNotFoundException: /gsa/torgsa/.projects/p1/gsa_cdt_ods/projects/w3perf/datapersist/sales/support/skp/production/index/relatedlinks/secondary/_0.f20 (Too many open files)

Valmir Macário | 2 Sep 2005 19:52

Secure Server index

Hi, I'm a new Lucene developer and I would like to know whether Lucene or Nutch
supports any security configuration. I would like to keep some information
hidden from public access and make it accessible only to people who have
permission to see it. How can I do that? I would appreciate it if
someone could help me.

Thank you

Valmir
Jeff Turner | 3 Sep 2005 13:15

Migrate from Bugzilla to JIRA?

Lucene devs,

A while ago (indeed a *long* while ago), a request was made of
Infrastructure to migrate Lucene's issues from Bugzilla to JIRA:

http://issues.apache.org/jira/browse/INFRA-199

Due to importer bugs and lack of infra <at>  volunteers it has taken a long
time, but the import has now happened:

http://issues.apache.org/jira/browse/LUCENE

This mail is to give you a chance to review the import and decide if you
still want to proceed with the migration.  Currently only Lucene devs can
create issues (for testing purposes), pending a proper switchover.

If you wish to continue, please:

 - Submit an INFRA task to disable Lucene's bugzilla
 - Ask the nearest JIRA admin (me, Erik Hatcher, Doug Cutting) to re-run
   the Bugzilla importer, importing only new Bugzilla issues (there is a
   checkbox for this).
 - Grant 'jira-users' the 'Create' permission in the Lucene permission
   scheme, and associate the 'Lucene Notification Scheme' with the Lucene
   project, so emails start going to this list (the moderator will need
   to let them through).

If you want to stick with Bugzilla, let me know and I'll delete the JIRA
LUCENE project.


Erik Hatcher | 5 Sep 2005 01:49

Fwd: [jira] Commented: (INFRA-199) Convert Lucene's Bugzilla to JIRA

I haven't seen this come across the java-dev list (I could have  
missed it though).  Everyone ok with moving to JIRA?

     Erik

Begin forwarded message:

> From: "Jeff Turner (JIRA)" <jira <at> apache.org>
> Date: September 3, 2005 7:16:34 AM EDT
> To: ehatcher <at> apache.org
> Subject: [jira] Commented: (INFRA-199) Convert Lucene's Bugzilla to JIRA
>
>
>     [ http://issues.apache.org/jira/browse/INFRA-199?page=comments#action_12322579 ]
>
> Jeff Turner commented on INFRA-199:
> -----------------------------------
>
> I've done the import and mailed java-dev <at> lucene to see if they wish
> to proceed.
>
>
>> Convert Lucene's Bugzilla to JIRA
>> ---------------------------------
>>
>>          Key: INFRA-199
>>          URL: http://issues.apache.org/jira/browse/INFRA-199
>>      Project: Infrastructure

