1 Jul 2004 10:53
Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL
Julien Nioche <Julien.Nioche <at> lingway.com>
2004-07-01 08:53:13 GMT
I went a little deeper in my experiments with INDEX_INTERVAL. In a previous mail to the user list I reported a 10% improvement over the regular setting (128) with one of my applications. I have refined the measurements by taking the time spent not in the whole application, but in a method that encapsulates the Lucene searches. Only the search time is measured, not the access to the Documents.

Two sets of queries are generated from a log of user queries from our application. These queries are in natural language and are expanded by our product into a Lucene boolean query. Attached is the boolean query generated for the query "Burgundy wine", just to give you an idea of what I mean by a large query (this one is particularly big).

These queries are run against an optimized index (INDEX_INTERVAL=16) and a regular index. The index used for this test is 720 MB (FSDirectory on Fedora 1). The .tii file is 3398 KB in the modified version against 488 KB in the original. Both sets of queries have the same size (783).

The xls file contains the times for both indexes, sorted in decreasing order. Each number is the time for a group of up to 4 searches, not for a single search. On average, changing the index interval to 16 yields an improvement of about 40% compared to the regular setting. I will try with a bigger sample of 40,000 queries and with smaller queries as well.

The original motivation for this feature can be found at http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04092.html

What is the best way to set up this value in IndexWriter? Maybe we could limit it to a few possible values like:
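
For anyone who wants to reproduce the summary figures, here is a minimal sketch of how the average improvement and the .tii growth could be computed. It assumes that "40% faster" means a 40% reduction in mean search time; the class name, method names and sample timings are hypothetical and are not taken from the attached xls file. Only the two .tii sizes come from the message above.

```java
import java.util.Arrays;

public class IntervalComparison {

    /** Mean of a set of timings in milliseconds (0.0 for an empty set). */
    public static double mean(long[] timesMs) {
        return Arrays.stream(timesMs).average().orElse(0.0);
    }

    /**
     * Relative reduction in mean search time, e.g. 0.40 for "40% faster".
     * Assumption: improvement is measured on the mean, not per query.
     */
    public static double improvement(long[] baselineMs, long[] modifiedMs) {
        double base = mean(baselineMs);
        return (base - mean(modifiedMs)) / base;
    }

    public static void main(String[] args) {
        // Hypothetical timings for groups of searches, NOT the real measurements.
        long[] interval128 = {1000, 500, 250};
        long[] interval16 = {600, 300, 150};

        System.out.printf("mean improvement: %.1f%%%n",
                100.0 * improvement(interval128, interval16));

        // .tii sizes reported in the message: 3398 KB (interval 16) vs 488 KB (interval 128).
        double tiiGrowth = 3398.0 / 488.0;
        System.out.printf(".tii growth factor: %.2f (interval ratio 128/16 = %d)%n",
                tiiGrowth, 128 / 16);
    }
}
```

The growth factor of the .tii file gives a rough idea of the memory cost paid for the faster term lookups, which is worth keeping in mind when choosing which values to allow.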