Julien Nioche | 1 Jul 10:53 2004

Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

I dug a little deeper into my experiments with INDEX_INTERVAL. In a
previous mail to the user list I reported a 10% improvement over the regular
setting (128) with one of my applications.
I refined the measurements by timing not the whole application, but only a
method that encapsulates the Lucene searches. Only the search time is
measured, not the access to the Documents.

Two sets of queries are generated using a log of user queries from our
application. These queries are in natural language and are expanded by our
product into a Lucene boolean query. Attached is the boolean query generated
for the query "Burgundy wine", just to give you an idea of what I mean by a
large query (this one is particularly big).

These queries are run against an optimized index (INDEX_INTERVAL=16) and a
regular index. The index used for this test is 720 MB (FSDirectory on
Fedora 1); the .tii file is 3398 KB in the modified version against 488 KB
in the original. Both sets of queries have the same size (783). The xls file
contains the times for both indexes, sorted in decreasing order. Note that
each number represents not a single search but a group of up to 4 searches.

On average, changing INDEX_INTERVAL to 16 yields an improvement of about
40% compared to the regular setting.
I will try with a bigger sample of 40,000 queries and with smaller queries
as well.
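
For readers who have not looked at the term dictionary, here is a toy
sketch of why a smaller interval can help. It assumes that every
INDEX_INTERVAL-th term is held in memory (the role of the .tii file) and
that a lookup does a binary search on that sample followed by a sequential
scan of the full dictionary. The class and method names are made up for
illustration and this is not Lucene code.

```python
from bisect import bisect_right


class TermIndex:
    """Toy model of a sorted term dictionary with a sparse in-memory index.

    Assumption: every `interval`-th term is sampled into memory, and the
    full sorted list stands in for the on-disk dictionary.
    """

    def __init__(self, sorted_terms, interval):
        self.terms = sorted_terms              # stands in for the on-disk dictionary
        self.interval = interval
        self.index = sorted_terms[::interval]  # in-memory sample of the terms

    def lookup(self, term):
        """Return (position or None, number of terms scanned sequentially)."""
        # Binary search the sample for the last indexed term <= target.
        slot = bisect_right(self.index, term) - 1
        if slot < 0:
            return None, 0
        start = slot * self.interval
        scanned = 0
        # Sequential scan of at most `interval` terms from the sampled entry.
        for pos in range(start, min(start + self.interval, len(self.terms))):
            scanned += 1
            if self.terms[pos] == term:
                return pos, scanned
            if self.terms[pos] > term:
                break
        return None, scanned


terms = [f"term{i:06d}" for i in range(10_000)]
regular = TermIndex(terms, 128)  # regular setting
dense = TermIndex(terms, 16)     # modified setting
```

In this model the worst-case scan drops from 128 terms to 16, at the price
of an in-memory sample that is 8 times denser. The .tii sizes reported
above (3398 KB against 488 KB, roughly a factor of 7) are broadly in line
with that trade-off. A query with many terms pays the lookup cost once per
term, which may be why large boolean queries benefit most.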

The original motivation for this feature can be found at
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg04092.html

What is the best way to set this value in IndexWriter? Maybe we could
limit it to a few possible values like:

Julien Nioche | 1 Jul 14:32 2004

Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

A similar experiment with 500 shorter queries shows a 20% speed improvement
(see the xls file for details).
By shorter query I mean something like this:
((titre:"burgundy wines"~3 titre:"burgundy wine"~3)) ((texte:"burgundy
wines"~3^3.0 texte:"burgundy wine"~3^3.0)) ((descr:"burgundy wines"~3^4.0
descr:"burgundy wine"~3^4.0)) ((kw:"burgundy wines"~3^4.0 kw:"burgundy
wine"~3^4.0))
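
To make the shape of such a query easier to see, here is a minimal sketch
that rebuilds the string above from a list of phrase variants and a list of
(field, boost) pairs. The function name, its parameters and the rule that a
boost of 1.0 is left out are assumptions made for this illustration; the
expansion logic of the actual product is not shown in this thread.

```python
def expand_phrase_query(variants, field_boosts, slop=3):
    """Build one group of sloppy phrase clauses per field.

    Hypothetical helper: `variants` are the phrase forms to search for,
    `field_boosts` is a list of (field name, boost) pairs.
    """
    groups = []
    for field, boost in field_boosts:
        clauses = []
        for phrase in variants:
            clause = f'{field}:"{phrase}"~{slop}'
            if boost != 1.0:
                # Assumed convention: a default boost of 1.0 is not printed.
                clause += f"^{boost}"
            clauses.append(clause)
        groups.append("((" + " ".join(clauses) + "))")
    return " ".join(groups)


query = expand_phrase_query(
    ["burgundy wines", "burgundy wine"],
    [("titre", 1.0), ("texte", 3.0), ("descr", 4.0), ("kw", 4.0)],
)
```

Each variant adds one clause per field, so the number of phrase clauses is
the number of variants times the number of fields (2 x 4 = 8 here).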

Julien Nioche | 1 Jul 14:38 2004

Re: Optimizing for long queries? >> 40% faster by changing INDEX_INTERVAL

The xls files did not get through. You can download them from the following URLs:
http://jnioche.freesurf.fr/shortQueries.xls
http://jnioche.freesurf.fr/longQueries.xls

lucene-cvs | 1 Jul 18:15 2004

[Jakarta Lucene Wiki] Updated: FrontPage

   Date: 2004-07-01T09:15:13
   Editor: 80.76.64.251 <>
   Wiki: Jakarta Lucene Wiki
   Page: FrontPage
   URL: http://wiki.apache.org/jakarta-lucene/FrontPage

   no comment

Change Log:

------------------------------------------------------------------------------
 @@ -41,3 +41,7 @@
 [http://nagoya.apache.org/eyebrowse/SummarizeList?listId=30]

 For other projects that extend, support, or in some way enhance the use of Lucene, see LuceneResources. 
+
+
+
+siete tutti dei cazzoni

lucene-cvs | 1 Jul 18:27 2004

[Jakarta Lucene Wiki] Updated: IntroductionToLucene

   Date: 2004-07-01T09:27:51
   Editor: 128.230.38.212 <>
   Wiki: Jakarta Lucene Wiki
   Page: IntroductionToLucene
   URL: http://wiki.apache.org/jakarta-lucene/IntroductionToLucene

   no comment

Change Log:

------------------------------------------------------------------------------
 @@ -6,4 +6,6 @@

 [http://conferences.oreillynet.com/presentations/os2003/hatcher_erik_lucene.pdf Introducing
Lucene] (by Erik Hatcher - Powerpoint presentation)

+[http://www.darksleep.com/lucene/ Lucene Tutorial] (by Steven J. Owens)
+
 [http://www.chedong.com/tech/lucene.html Lucene Introduction in Chinese]
Lucene&#65306;&#22522;&#20110;Java&#30340;&#20840;&#25991;&#26816;&#32034;&#24341;&#25806;&#31616;&#20171;
(by Che Dong; &#20316;&#32773;&#65306; &#36710;&#19996;)
lucene-cvs | 1 Jul 18:31 2004
Picon
Picon

[Jakarta Lucene Wiki] Updated: FrontPage

   Date: 2004-07-01T09:31:17
   Editor: 82.135.7.186 <>
   Wiki: Jakarta Lucene Wiki
   Page: FrontPage
   URL: http://wiki.apache.org/jakarta-lucene/FrontPage

   delete crap

Change Log:

------------------------------------------------------------------------------
 @@ -41,7 +41,3 @@
 [http://nagoya.apache.org/eyebrowse/SummarizeList?listId=30]

 For other projects that extend, support, or in some way enhance the use of Lucene, see LuceneResources. 
-
-
-
-siete tutti dei cazzoni

Giulio Cesare Solaroli | 1 Jul 19:15 2004

Request for Lucene mentoring

Hi all,

this message is not about a development topic, but it is meant for
developers, so I hope you won't be too upset.

I am writing here because I am evaluating the option of hiring a
Lucene expert for a couple of days to come to us and investigate with
us how to improve our Lucene usage.

At the moment we are using Lucene (mainly) internally to index
a DB of about 5 million documents, with 50,000 new documents arriving
every day (spread over the 24 hours) and about the same number of documents
deleted daily (the delete procedure is under our control and we run it
mainly at night).

The size of the index is about 50 GB, and no field (except for the
primary key) is stored in the index.

The availability of the Lucene index is becoming more important every
day, but we have many more things to develop, and we are not able to
acquire the required expertise fast enough.

For this reason we thought that a few days of mentorship from an
expert Lucene developer could help us improve our current situation
efficiently.

Our office is located in Bagnacavallo, Italy
(http://corporate.extrapola.com/aboutus/where/research/en?set_language=en&cl=en).

We don't have an exact time frame, but as we are close to the sea,

cutting | 1 Jul 19:40 2004

cvs commit: jakarta-lucene/xdocs index.xml

cutting     2004/07/01 10:40:41

  Modified:    .        CHANGES.txt build.xml
               docs     index.html
               xdocs    index.xml
  Log:
  Preparing for 1.4 final release.

  Revision  Changes    Path
  1.94      +2 -2      jakarta-lucene/CHANGES.txt

  Index: CHANGES.txt
  ===================================================================
  RCS file: /home/cvs/jakarta-lucene/CHANGES.txt,v
  retrieving revision 1.93
  retrieving revision 1.94
  diff -u -r1.93 -r1.94
  --- CHANGES.txt	9 Jun 2004 11:28:46 -0000	1.93
  +++ CHANGES.txt	1 Jul 2004 17:40:41 -0000	1.94
   @@ -2,7 +2,7 @@

   $Id$

  -1.4 RC4
  +1.4 final

    1. Added "an" to the list of stop words in StopAnalyzer, to complement
       the existing "a" there.  Fix for bug 28960

  

Doug Cutting | 1 Jul 20:01 2004

new Lucene release: 1.4 final

Version 1.4-final of Lucene is available for download from:

http://cvs.apache.org/dist/jakarta/lucene/v1.4-final/

Changes are described at:

http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.94

Doug

Christoph Goller | 2 Jul 14:10 2004

Performance of TermVectors and skipTo

Hi folks,

I have done some performance tests for TermVectors and the new
TermDocs.skipTo() implementation, both introduced with 1.4.
I am very pleased with the results. I did these tests with the
Reuters news corpus (roughly 800,000 documents).

*) I compared TermVectors with the solution of storing the
respective fields and re-analyzing the documents in order to
get their terms. According to my measurements, TermVectors speed
up access to the terms by a factor of 7!

*) For testing skipTo, I used my implementation for getting highly
correlated terms. To compute the correlation measure I have to
compare a lot of TermDocs lists with each other or with other lists of
document ids. According to my measurements on an optimized index,
skipTo speeds up my term correlation implementation by a factor of
2. And the benefit of skipTo probably increases with index size.
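
The comparison of two document id lists can be sketched as follows. This is
a toy model, not Lucene code: DocList is a hypothetical stand-in for a
TermDocs enumeration over sorted document ids, and a binary search plays
the role of the skip data. Whenever the two lists disagree, the one that is
behind jumps straight to the other's current document instead of stepping
through every entry.

```python
from bisect import bisect_left


class DocList:
    """Hypothetical stand-in for an enumeration of sorted document ids."""

    def __init__(self, doc_ids):
        self.docs = doc_ids  # sorted, without duplicates
        self.pos = -1        # positioned before the first entry

    def doc(self):
        return self.docs[self.pos]

    def next(self):
        """Move to the next entry; False when the list is exhausted."""
        self.pos += 1
        return self.pos < len(self.docs)

    def skip_to(self, target):
        """Move to the first doc id >= target; False when exhausted."""
        # Binary search from the current position stands in for skip data.
        self.pos = bisect_left(self.docs, target, max(self.pos, 0))
        return self.pos < len(self.docs)


def intersect(a, b):
    """Return the doc ids present in both lists, leapfrogging with skip_to."""
    result = []
    if not a.next() or not b.next():
        return result
    while True:
        if a.doc() == b.doc():
            result.append(a.doc())
            if not a.next() or not b.next():
                return result
        elif a.doc() < b.doc():
            if not a.skip_to(b.doc()):
                return result
        elif not b.skip_to(a.doc()):
            return result
```

For example, intersecting a short list with a long one only touches the
long list a handful of times:

```python
rare = DocList([1, 5, 9, 1000])
common = DocList(list(range(0, 2000, 5)))
common_docs = intersect(rare, common)
```

The longer and more unevenly sized the lists are, the more entries each
skip passes over, which fits the expectation that the benefit grows with
index size.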

regards,
Christoph
