RE: deleting documents from index
houyang <hui.ouyang <at> oracle.com>
2005-09-01 16:41:50 GMT
Thank you, Xiaozheng.
Actually, the application could have more than two threads, and each thread could add, modify, or delete
documents at any time (a document being deleted may have been added earlier by another thread), so each
thread cannot work on its own index file. Any indexed document could be modified at any time, and then you
have to delete the previous version and add the new one. That is why I moved the actual deletion of
documents, based on the internal doc IDs, to the end, when all the threads have finished.
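
To make the idea concrete, here is a minimal sketch of deferring the deletes until every thread is done. It does not use the Lucene API: the DeferredDeletes class and the DocDeleter interface are hypothetical names made up for this illustration, and DocDeleter stands in for whatever call actually removes a document by its internal doc ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class DeferredDeletes {
    // Hypothetical stand-in for the real "delete by internal doc ID" call.
    public interface DocDeleter {
        void delete(int docId);
    }

    // Thread-safe set of doc IDs that the worker threads want removed.
    private final Set<Integer> pending = ConcurrentHashMap.newKeySet();

    // Safe to call from any worker thread; duplicate IDs are collapsed.
    public void markForDeletion(int docId) {
        pending.add(docId);
    }

    // Call only after all workers have finished. Applies the recorded
    // deletions in ascending ID order and returns how many were applied.
    public int flush(DocDeleter deleter) {
        int count = 0;
        for (int id : new TreeSet<>(pending)) {
            deleter.delete(id);
            count++;
        }
        pending.clear();
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        DeferredDeletes deferred = new DeferredDeletes();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < 4; t++) {
            final int base = t * 10;
            Thread w = new Thread(() -> {
                // A real worker would add or modify documents here;
                // deletions are only recorded, not performed.
                deferred.markForDeletion(base);
                deferred.markForDeletion(base + 1);
                deferred.markForDeletion(0); // same doc requested by several threads
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join(); // wait until all threads finish
        }
        List<Integer> deleted = new ArrayList<>();
        int n = deferred.flush(deleted::add);
        System.out.println("deleted " + n + " docs: " + deleted);
    }
}
```

The workers only record IDs, so the index is touched for deletion from a single thread once everything has been joined.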
-----Original Message-----
From: Xiaozheng Ma [mailto:Xiaozheng.Ma <at> redwood.com]
Sent: Thursday, September 01, 2005 7:18 AM
To: java-dev <at> lucene.apache.org
Cc: HUI.OUYANG <at> ORACLE.COM
Subject: RE: deleting documents from index
Indexing into a single index in a multithreaded environment needs to be
serialized: you need to synchronize the calls to
IndexWriter.addDocument(). Otherwise Lucene will throw exceptions. After
all, Lucene uses file-based locking to ensure that only one thread can
modify the same index at the same time.
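
A minimal sketch of that kind of serialization, assuming (as stated above) that calls into the writer must not overlap. It does not use Lucene itself: SerializedIndexing and DocWriter are hypothetical names for this illustration, and DocWriter stands in for the real writer.

```java
import java.util.ArrayList;
import java.util.List;

public class SerializedIndexing {
    // Hypothetical stand-in for the real index writer.
    public interface DocWriter {
        void addDocument(String doc);
    }

    private final DocWriter writer;
    private final Object lock = new Object();

    public SerializedIndexing(DocWriter writer) {
        this.writer = writer;
    }

    // Every thread goes through the same lock, so only one
    // addDocument() call runs at any moment.
    public void add(String doc) {
        synchronized (lock) {
            writer.addDocument(doc);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A plain ArrayList is not thread-safe; the lock above protects it.
        List<String> index = new ArrayList<>();
        SerializedIndexing indexing = new SerializedIndexing(index::add);

        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    indexing.add("doc-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(index.size()); // 4 threads x 1000 docs each
    }
}
```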
In your situation, I believe, if you have multiple threads working on
the same index to index new docs, you still have the same problem. But I
guess you probably have only one thread doing the indexing and another
one deleting from the index by querying IDs.
One solution to multithreaded indexing on the same index is to
split the indexing process into independent pieces (each using a
different index file, of course): each thread indexes different docs,
then at some point the segments are merged into one index file, if you will.
In the meantime, the deletion can delete the docs on the prior merged