Mary Smith | 1 May 01:09 2006

integrate TREC with Lucene

Hello,

This is an old question, but I have not been able to find a solution yet.

I want to integrate TREC with Lucene.
Any hint?

Mary
Marvin Humphrey | 1 May 02:28 2006

Re: Lucene search benchmark/stress test tool


On Apr 26, 2006, at 9:34 AM, Otis Gospodnetic wrote:

> I'm about to write a little command-line Lucene search benchmark  
> tool.  I'm interested in benchmarking search performance and the  
> ability to specify concurrency level (# of parallel search threads)  
> and response timing, so I can calculate min, max, average, and mean  
> times.  Something like 'ab' (Apache Benchmark) tool, but for Lucene.
>
> Has anyone already written something like this?

I'm about to.  The predecessor of the indexing benchmarker whose
results I recently published was enormously helpful in streamlining
the indexing process.  Now that I'm considering modifications to
search logic and file format which may have a substantial impact on
search-time performance, I'll need a search benchmarker to complement
the indexing benchmarker.  I'll be writing both a Perl/KinoSearch and
a Java Lucene version, and they will use the Reuters corpus.

Where are you with your app?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Chris Hostetter | 1 May 20:39 2006

Re: SpanFirstQuery and SpanNotQuery


After consulting LIA, I find the following sentence at the end of 5.4.2,
"Finding spans at the beginning of a field" ...

	The resulting span matches are the same as the original SpanQuery
	spans, ...

...which indicates that it is working as documented -- but it remains to
be seen if that's what was intended or not.

In any case, it raises the question: how would someone go about building a
SpanQuery where the intent is:

   SpanQuery X must be in the first N positions
      AND
   SpanQuery Z must not come before SpanQuery X

The best I can figure would be something like...

   SpanQuery q = new SpanNotQuery(
       new SpanFirstQuery(X, N),
       new SpanFirstQuery(new SpanNearQuery(new SpanQuery[]{Z, (SpanQuery) X.clone()},
                                            N-1, true),
                          N-1));

...but this doesn't seem to work at the moment because of LUCENE-560.
Even if it did work, it seems kind of kludgy; does anyone have a better
suggestion?

: I'm looking at SpanQueries as I work on new test cases for LUCENE-557, and
(Continue reading)

Otis Gospodnetic | 1 May 21:34 2006

Re: Lucene search benchmark/stress test tool

Marvin,
I wrote my Lucene search benchmarker, but will have to check with my employer about contributing it to
Lucene.  It's rather simple: I used the Java 1.5 concurrency package's ThreadPoolExecutor for executing N
parallel search requests, measured elapsed time for each request, and then, when all searches were done,
I calculated min/max/median/percentile/etc.
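The approach described above can be sketched with only the standard library. This is a minimal skeleton, not Otis's actual tool: the search call is stubbed out with a short sleep standing in for a real searcher.search(query), and the class name, thread count, and request count are all illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Run N "searches" on a fixed-size thread pool, record per-request
// latency, then report min/max/median/mean over the sorted list.
public class SearchBench {

    // Placeholder for a real Lucene call such as searcher.search(query).
    static long timedSearch() throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(1); // simulated search work
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        int threads = 4;    // concurrency level
        int requests = 20;  // total search requests
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Callable<Long> task = SearchBench::timedSearch;

        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            futures.add(pool.submit(task));
        }
        List<Long> latencies = new ArrayList<>();
        for (Future<Long> f : futures) {
            latencies.add(f.get()); // blocks until that request finishes
        }
        pool.shutdown();

        Collections.sort(latencies);
        long min = latencies.get(0);
        long max = latencies.get(latencies.size() - 1);
        long median = latencies.get(latencies.size() / 2);
        double mean = latencies.stream().mapToLong(Long::longValue).average().orElse(0);
        System.out.printf("min=%dns max=%dns median=%dns mean=%.0fns%n",
                          min, max, median, mean);
    }
}
```

Swapping the stub for a real IndexSearcher call and taking the thread and request counts from the command line would give roughly the 'ab'-style tool described in the original post.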

Otis

----- Original Message ----
From: Marvin Humphrey <marvin <at> rectangular.com>
To: java-user <at> lucene.apache.org
Sent: Sunday, April 30, 2006 8:28:20 PM
Subject: Re: Lucene search benchmark/stress test tool

On Apr 26, 2006, at 9:34 AM, Otis Gospodnetic wrote:

> I'm about to write a little command-line Lucene search benchmark  
> tool.  I'm interested in benchmarking search performance and the  
> ability to specify concurrency level (# of parallel search threads)  
> and response timing, so I can calculate min, max, average, and mean  
> times.  Something like 'ab' (Apache Benchmark) tool, but for Lucene.
>
> Has anyone already written something like this?

I'm about to.  The predecessor to the indexing benchmarker tests I  
recently published results for was enormously helpful while  
streamlining the indexing process.  Now that I'm considering  
modifications to search logic and file format which may have a  
substantial impact on search-time performance, I'll need a search  
benchmarker to complement the indexing benchmarker.  I'll be writing  
(Continue reading)

Michael Chan | 1 May 21:37 2006

Stemming terms in SpanQuery

Hi,

I'm trying to build a SpanQuery using word stems. Is parsing each term
with a QueryParser, constructed with an Analyzer that produces stemmed
tokens, the right approach? It just seems to me that QueryParser is
designed to parse full queries, so my hunch is that there might be a
better way.

Any help will be much appreciated.

Michael
Mariano Barcia | 2 May 12:38 2006

Kneobase: open source, enterprise search

Hi list,

I'm glad to announce that Colaborativa.net has released Kneobase, an open source "enterprise search" product based on Lucene.

Kneobase can accept many data sources as searchable elements and can provide search results in multiple formats, including SOAP. That may make it a good search engine for a service-oriented environment, because it doesn't need search indexes published to a web server.

See this article on TSS for more information: http://www.theserverside.com/news/thread.tss?thread_id=40160

Cheers!

--mariano

Lideramos la innovación ("We lead innovation")

Mariano Barcia
Founder, CEO

COLABORATIVA.NET
Córdoba 1147 Piso 6 Oficinas 3 y 4
(S2000AWO) Rosario, SF, Argentina

mariano.barcia <at> colaborativa.net
IM: mariano_barcia <at> hotmail.com
Skype: mbarcia
http://www.colaborativa.net
http://www.kneobase.com

tel: +54 (341) 528 9987
mobile: +54 9 (341) 587 6695

Ramana Jelda | 2 May 12:55 2006

OutOfMemoryError while enumerating through reader.terms(fieldName)

Hi,
I am getting an OutOfMemoryError while enumerating through a TermEnum after
invoking reader.terms(fieldName).

Just to provide you more information: I have almost 10,000 unique terms in
field A. I can successfully enumerate around 5,000 terms, but after that I
get the OutOfMemoryError.

I set the JVM max memory to 512MB; of course my index, at around 1GB-2GB,
is bigger than that.
How can I ask Lucene to clean up the loaded index and traverse the remaining
terms? It seems that while enumerating, memory always grows in steps of some MBs.

Any help would be really appreciated.

Thanks in advance,
Jelda
Lukas Vlcek | 2 May 14:52 2006

Re: Kneobase: open source, enterprise search

I had a quick look at its web page earlier today, and it looks good so
far! Good news!

However, I have one question: does Kneobase contain any kind of web crawler
functionality (like Nutch), or do I have to feed it all sources
*manually*? How much of the web-data gathering can be automated?

Maybe these questions should go to the Kneobase mailing list or directly to
the TSS site, but they require registration... and I am too lazy to give out
information about my budget and position details right now :-)

Regards,
Lukas

On 5/2/06, Mariano Barcia <mariano.barcia <at> colaborativa.net> wrote:
>
>  Hi list,
>
>
>
> I'm glad to announce Colaborativa.net has released Kneobase, an open
> source "enterprise search" product, based on Lucene.
>
>
>
> *Kneobase can accept many data sources as searchable elements, and can
> provide search results in multiple formats, including SOAP, which might make
> it a good search engine for use in a service-oriented environment, because
> it doesn't need search indexes published to a web server.*
>
>
>
> See this article in TSS<http://www.theserverside.com/news/thread.tss?thread_id=40160>for more information,
>
>
>
> Cheers!
>
>
>
> --mariano
Ramana Jelda | 2 May 15:07 2006

RE: OutOfMemoryError while enumerating through reader.terms(fieldName)


Hi,
I just debugged it more closely. Sorry: I am getting the OutOfMemoryError not
because of reader.terms(), but because of invoking the QueryFilter.bits()
method for each unique term.
I will try to explain with pseudo-code.

 while (term != null) {
     if (term.field().equals(fieldName)) {
         keys.addElement(term.text());
     } else {
         break;
     }
     if (te.next()) {
         term = te.term();
     } else {
         break;
     }
 }

 for (Iterator iter = keys.iterator(); iter.hasNext();) {
     String termText = (String) iter.next();
     TermQuery termQuery = new TermQuery(new Term(fieldName, termText));
     QueryFilter filter = new QueryFilter(termQuery);
     BitSet bits = filter.bits(ciaoReader.getIndexReader());
     cache.put(termText, bits); // cache one BitSet per unique term
 }

The second for loop, which builds a BitSet for each term using QueryFilter,
is the one throwing the OutOfMemoryError.
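A back-of-the-envelope estimate (not from the thread, apart from the 10,000-term count) suggests why this loop runs out of memory: each QueryFilter.bits() call allocates a BitSet with one bit per document in the index, and the cache keeps one such BitSet per unique term. Assuming a hypothetical index of 10 million documents:

```java
// Rough memory estimate for caching one BitSet per term.
// maxDoc is an assumed figure for illustration; the 10,000-term
// count comes from the thread above.
public class BitSetCacheEstimate {
    public static void main(String[] args) {
        long maxDoc = 10000000L;   // assumed documents in the index
        long numTerms = 10000L;    // unique terms being cached

        long bytesPerBitSet = maxDoc / 8;            // one bit per document
        long totalBytes = bytesPerBitSet * numTerms; // one BitSet per term

        System.out.println("per-term BitSet: " + (bytesPerBitSet / 1024) + " KB");
        System.out.println("total cache:     " + (totalBytes / (1024 * 1024)) + " MB");
    }
}
```

Even with a much smaller index, thousands of cached BitSets can easily exceed a 512MB heap; caching compact sorted doc-id arrays instead of full BitSets, or bounding the cache size, would be possible ways out.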

Any advice is really welcome.

Thx,
Jelda

> -----Original Message-----
> From: Ramana Jelda [mailto:ramana.jelda <at> ciao-group.com] 
> Sent: Tuesday, May 02, 2006 12:55 PM
> To: java-user <at> lucene.apache.org
> Subject: OutOfMemoryError while enumerating through 
> reader.terms(fieldName)
> 
> Hi,
> I am getting an OutOfMemoryError while enumerating through a
> TermEnum after invoking reader.terms(fieldName).
> 
> Just to provide you more information: I have almost 10,000
> unique terms in field A. I can successfully enumerate around
> 5,000 terms, but after that I get the OutOfMemoryError.
> 
> I set the JVM max memory to 512MB; of course my index, at
> around 1GB-2GB, is bigger than that.
> How can I ask Lucene to clean up the loaded index and traverse
> the remaining terms? It seems that while enumerating, memory
> always grows in steps of some MBs.
> 
> Any help would be really appreciated.
> 
> Thanks in advance,
> Jelda
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe <at> lucene.apache.org
> For additional commands, e-mail: java-user-help <at> lucene.apache.org
> 
trupti mulajkar | 2 May 15:38 2006

creating indexReader object

I am trying to create an IndexReader object that reads my index. I need
this to further generate the document and term frequency vectors.
However, when I try to print the contents of the documents
(doc.get("contents")), it shows null.
Any suggestions? If I can't read the contents, then I cannot create the
other vectors.

Any help will be appreciated.

cheers,
trupti mulajkar
MSc Advanced Computer Science
