Chris Lu | 1 Mar 2009 05:08

Re: Merging database index with fulltext index

Yes. DBSight helps to flatten database objects into Lucene documents; it's
more like Lucene-on-Rails. A custom crawler is supported via a Java API, to
crawl content outside the database. Both DBSight query syntax and Lucene
query syntax are supported, in addition to a customizable analyzer,
similarity, ranking, etc.

I think you'd better try it first. It's faster to install it, select the
content with your SQL, and get search up and running than to read the
introduction materials.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
A DBSight customer, a shopping comparison site (anonymous per request),
got 2.6 million euros in funding!

On Sat, Feb 28, 2009 at 1:33 PM, <spring <at> gmx.eu> wrote:

> > Actually you can use DBSight (disclaimer: I work on it) to
> > collect the data
> > and keep them in sync.
>
> Hm... so it full-text-indexes a database?
> Does it support document content outside the database (custom crawler)?
> What query syntax does it support?
>

spring | 1 Mar 2009 08:19

RE: Merging database index with fulltext index

> Yes. DBSight helps to flatten database objects into Lucene's 
> documents.

OK, thanks for the advice.

But back to my original question.

When I have to merge both result sets, what is the best approach?

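
One simple way to combine the two, assuming each Lucene document stores the database primary key in a field and the SQL query returns the keys that satisfy its conditions, is to walk the Lucene hits in relevance order and keep only those whose key the database also returned. This is a sketch under those assumptions; ResultMerger and its method are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: intersects full-text hits with database matches.
final class ResultMerger {
    private ResultMerger() {}

    // rankedTextHitIds: primary keys from the Lucene hits, best score first.
    // dbMatchIds: primary keys returned by the SQL query.
    // Returns the keys present in both, keeping Lucene's relevance order.
    static List<String> intersectKeepingRank(List<String> rankedTextHitIds,
                                             Set<String> dbMatchIds) {
        List<String> merged = new ArrayList<>();
        for (String id : rankedTextHitIds) {
            if (dbMatchIds.contains(id)) {
                merged.add(id);
            }
        }
        return merged;
    }
}
```

Using a Set for the database side keeps each lookup cheap, so the merge is linear in the number of Lucene hits.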
Raymond Balmès | 1 Mar 2009 13:04

N-grams with numbers and Shinglefilters

Hi,

I'm trying to index (and later search) documents that contain trigrams of
the following form:

<string> <2 digit> <2 digit>

Does ShingleFilter handle numbers in the match?
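
Lucene's ShingleFilter combines consecutive tokens from whatever tokenizer feeds it, so numbers are shingled like any other token as long as the tokenizer keeps them (WhitespaceTokenizer does; the letter-based tokenizer behind SimpleAnalyzer drops digits). The plain-Java sketch below is not ShingleFilter, only an illustration of the idea; unlike ShingleFilter, which by default also emits the single tokens, it returns only n-grams of exactly n tokens. ShingleSketch is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of word shingling (word n-grams); not Lucene's ShingleFilter.
final class ShingleSketch {
    private ShingleSketch() {}

    // Returns every run of exactly n consecutive tokens, joined by a space.
    // Digits are treated like any other token.
    static List<String> shingles(List<String> tokens, int n) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start + n <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + n)));
        }
        return out;
    }
}
```

For the tokens abc, 12, 34, def and n = 3, this yields "abc 12 34" and "12 34 def".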

Another complication: for future features I'd like to add an optional
digit, like

[<1 digit>] <string> <2 digit> <2 digit>

I suppose ShingleFilter won't do that?
Any better advice?
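
Since ShingleFilter only joins consecutive tokens, it cannot express an optional element by itself. One alternative, assuming <string> is a run of letters and the pattern is fixed, is to recognize the whole phrase with a regular expression before analysis and index each match as a single token. TrigramMatcher is a hypothetical name, and the pattern encodes those assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step for phrases of the form
// [<1 digit>] <string> <2 digit> <2 digit>, assuming <string> is letters only.
final class TrigramMatcher {
    private TrigramMatcher() {}

    // Group 1: optional single leading digit (not part of a longer number).
    // Groups 2-4: the word and the two 2-digit numbers.
    private static final Pattern PHRASE = Pattern.compile(
            "(?:(?<!\\d)(\\d)\\s+)?\\b([A-Za-z]+)\\s+(\\d{2})\\s+(\\d{2})\\b");

    // Returns each match normalized to single spaces.
    static List<String> find(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = PHRASE.matcher(text);
        while (m.find()) {
            StringBuilder sb = new StringBuilder();
            if (m.group(1) != null) {
                sb.append(m.group(1)).append(' ');
            }
            sb.append(m.group(2)).append(' ')
              .append(m.group(3)).append(' ')
              .append(m.group(4));
            out.add(sb.toString());
        }
        return out;
    }
}
```

The trailing \b keeps "abc 12 345" from matching, since 345 is not a 2-digit number.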

Any help appreciated.

-RB-
Michael McCandless | 1 Mar 2009 13:11

Re: Faceted Search using Lucene


On a quick look, I think there are a few problems with the code:

   * I don't see any synchronization: it looks like two search
     requests can enter this method at the same time, which is
     dangerous, e.g. both (or more) will wastefully reopen the
     readers.

   * You are over-incRef'ing (the reader.incRef inside the loop) -- I
     don't see a corresponding decRef.

   * You reopen and warm your searchers "live" (vs. with a background
     thread), meaning the unlucky search request that hits a reopen
     pays the cost.  This might be OK if the index is small enough that
     reopening & warming takes very little time.  But if the index gets
     large, making a random search pay that warming cost is not nice to
     the end user.  It erodes their trust in you.

   * You always make a new IndexSearcher and a new MultiSearcher even
     when nothing has changed.  This just generates unnecessary garbage
     which GC then must sweep up.

   * You are creating a new Analyzer & QueryParser every time, also
     creating unnecessary garbage; instead, they should be created once
     & reused.
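
To make the incRef/decRef point concrete, here is a plain-Java sketch of the acquire/release discipline. Snapshot is a hypothetical stand-in for the reader/searcher pair, not a Lucene API. It also covers the "nothing has changed" point: a new instance replaces the current one only when the version actually differs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an IndexReader/IndexSearcher pair; it only models
// reference counting so the incRef/decRef pairing is easy to follow.
final class Snapshot {
    final long version;
    // Starts at 1: the reference held by the manager itself.
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean closed;

    Snapshot(long version) {
        this.version = version;
    }

    void incRef() {
        refs.incrementAndGet();
    }

    // Releases one reference; the last release "closes" the snapshot.
    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // a real reader would free files and memory here
        }
    }

    boolean isClosed() {
        return closed;
    }

    int refCount() {
        return refs.get();
    }
}

// Hypothetical manager: hands out the current snapshot and swaps in a new one
// only when the version has actually changed.
final class SnapshotManager {
    private Snapshot current;

    SnapshotManager(Snapshot initial) {
        current = initial;
    }

    // Every acquire() must be matched by exactly one release().
    synchronized Snapshot acquire() {
        current.incRef();
        return current;
    }

    void release(Snapshot s) {
        s.decRef();
    }

    // Expects an already opened and warmed candidate; the expensive work
    // should happen before this call, ideally off the search path.
    synchronized void maybeSwap(Snapshot candidate) {
        if (candidate.version == current.version) {
            candidate.decRef(); // nothing changed: discard instead of replacing
            return;
        }
        Snapshot old = current;
        current = candidate;
        old.decRef(); // drop the manager's reference; in-flight searches keep theirs
    }
}
```

A search would call acquire(), run the query, and call release() in a finally block, so an old reader is only closed after the last in-flight search using it finishes.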

You should consider simply using Solr -- it handles all this logic for
you and has been well debugged with time...

Mike

Amin Mohammed-Coleman | 1 Mar 2009 13:37

Re: Faceted Search using Lucene

Hi
Thanks for your input.  I would like to have a go at doing this myself
first; Solr may be an option.

* You are creating a new Analyzer & QueryParser every time, also
   creating unnecessary garbage; instead, they should be created once
   & reused.

-- I can move the code out so that it is only created once and reused.

 * You always make a new IndexSearcher and a new MultiSearcher even
   when nothing has changed.  This just generates unnecessary garbage
   which GC then must sweep up.

-- This was something I thought about.  I could move it out so that it's
created once.  However, I presume my code needs to check whether the
IndexReaders are up to date.  This needs to be synchronized as well, I
guess(?)

 * I don't see any synchronization: it looks like two search
   requests can enter this method at the same time, which is
   dangerous, e.g. both (or more) will wastefully reopen the
   readers.
-- So I need to extract the logic for reopening and provide a
synchronisation mechanism.

OK.  So I have some work to do.  I'll refactor the code and see if I can
bring it in line with your recommendations.

On Sun, Mar 1, 2009 at 12:11 PM, Michael McCandless <

Michael McCandless | 1 Mar 2009 14:20

Re: Faceted Search using Lucene


Amin Mohammed-Coleman wrote:

> Hi
> Thanks for your input.  I would like to have a go at doing this myself
> first, Solr may be an option.
>
> * You are creating a new Analyzer & QueryParser every time, also
>   creating unnecessary garbage; instead, they should be created once
>   & reused.
>
> -- I can move the code out so that it is only created once and reused.
>
>
> * You always make a new IndexSearcher and a new MultiSearcher even
>   when nothing has changed.  This just generates unnecessary garbage
>   which GC then must sweep up.
>
> -- This was something I thought about.  I could move it out so that it's
> created once.  However, I presume my code needs to check whether the
> IndexReaders are up to date.  This needs to be synchronized as well, I
> guess(?)

Yes, you should synchronize the check for whether the IndexReader is
current.
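
A minimal plain-Java sketch of why the check has to be synchronized: the "is it current?" test and the reopen must happen as one step, otherwise several threads can all see a stale reader and all reopen it. ReopenGuard is hypothetical; indexVersion stands in for IndexReader.getCurrentVersion(dir) and readerVersion for the open reader's getVersion().

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of "is my reader current?" followed by "reopen if not".
final class ReopenGuard {
    final AtomicLong indexVersion = new AtomicLong(1);     // bumped by the writer side
    private long readerVersion = 1;                        // guarded by "this"
    final AtomicInteger reopenCount = new AtomicInteger(); // reopens actually performed

    // Check and reopen as one atomic step, so concurrent callers
    // reopen at most once per index change.
    synchronized void maybeReopen() {
        long latest = indexVersion.get();
        if (latest != readerVersion) {
            reopenCount.incrementAndGet(); // a real version would reopen + warm here
            readerVersion = latest;
        }
    }
}
```

Note that a plain synchronized method like this makes every caller wait while a reopen is in progress, so it is best kept off the search path.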


Amin Mohammed-Coleman | 1 Mar 2009 14:25

Re: Faceted Search using Lucene

Thanks.  I will rewrite it, in between feeding my baby, playing with the
other child, and doing the several other things my wife wants me to do!

Amin Mohammed-Coleman | 1 Mar 2009 14:27

Re: Faceted Search using Lucene

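Just a quick point:
public void maybeReopen() throws IOException {
  // currentSearcher, dir, warm() and swapSearcher() are defined elsewhere
  // in the class.
  long currentVersion = currentSearcher.getIndexReader().getVersion();
  // Only reopen when the index on disk has changed since then.
  if (IndexReader.getCurrentVersion(dir) != currentVersion) {
    IndexReader newReader = currentSearcher.getIndexReader().reopen();
    assert newReader != currentSearcher.getIndexReader();
    IndexSearcher newSearcher = new IndexSearcher(newReader);
    warm(newSearcher);          // warm up before any search sees it
    swapSearcher(newSearcher);  // make it the current searcher
  }
}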

Should the above be synchronised?

Michael McCandless | 1 Mar 2009 14:36

Re: Faceted Search using Lucene


I was wondering the same thing ;)

It's best to call this method from a single background "warming" thread,
in which case it would not need its own synchronization.
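
A sketch of such a background thread using only java.util.concurrent, assuming the reopen logic (for example the maybeReopen() above, with its IOException wrapped in an unchecked exception) is passed in as a Runnable. BackgroundReopener and the polling period are hypothetical:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper that runs a reopen task on one dedicated background
// thread, so search requests never pay the reopen/warm cost themselves.
final class BackgroundReopener implements AutoCloseable {
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "searcher-warmer");
                t.setDaemon(true); // don't keep the JVM alive just for reopening
                return t;
            });

    // Runs reopenTask repeatedly, waiting periodMillis between runs.
    BackgroundReopener(Runnable reopenTask, long periodMillis) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                reopenTask.run();
            } catch (RuntimeException e) {
                // An exception escaping here would suppress all later runs.
                e.printStackTrace();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Because a single thread performs every reopen, two reopens can never overlap, which is why the method then needs no synchronization of its own.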

But, to be safe, I'll add internal synchronization to it.  You can't  
simply put synchronized in front of the method, since you don't want  
this to block searching.
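
One way to add internal synchronization without blocking searches, sketched in plain Java: reopen calls serialize on a private lock object, while search threads only do a volatile read of the current instance and never touch that lock. NonBlockingHolder and its type parameter S (standing in for IndexSearcher) are hypothetical:

```java
import java.util.function.LongFunction;

// Hypothetical holder: reopening is serialized by a private lock, while
// searches only read a volatile reference and therefore never block on it.
final class NonBlockingHolder<S> {
    private final Object reopenLock = new Object(); // only reopen callers contend on this
    private volatile S current;                      // read lock-free by search threads
    private long version;                            // guarded by reopenLock

    NonBlockingHolder(S initial, long initialVersion) {
        this.current = initial;
        this.version = initialVersion;
    }

    // Called by search threads: no lock, just a volatile read.
    S get() {
        return current;
    }

    // Opens (and would warm) a new instance only if the version changed,
    // then publishes it; in-flight searches keep using the old one.
    void maybeReopen(long latestVersion, LongFunction<S> open) {
        synchronized (reopenLock) {
            if (latestVersion == version) {
                return; // already current
            }
            S fresh = open.apply(latestVersion); // expensive reopen + warm
            current = fresh;
            version = latestVersion;
        }
    }
}
```

With real readers, the old searcher still has to be closed safely once no search is using it, which is where the incRef/decRef reference counting comes in.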

Mike

Amin Mohammed-Coleman | 1 Mar 2009 15:17

Re: Faceted Search using Lucene

Hi
I've now done the following:

public Summary[] search(final SearchRequest searchRequest)
    throws SearchExecutionException {
  final String searchTerm = searchRequest.getSearchTerm();
  if (StringUtils.isBlank(searchTerm)) {
    throw new SearchExecutionException("Search string cannot be empty. "
        + "There will be too many results to process.");
  }
  List<Summary> summaryList = new ArrayList<Summary>();
  StopWatch stopWatch = new StopWatch("searchStopWatch");
  stopWatch.start();
  List<IndexSearcher> indexSearchers = new ArrayList<IndexSearcher>();
  try {
    LOGGER.debug("Ensuring all index readers are up to date...");
    maybeReopen();
    LOGGER.debug("All Index Searchers are up to date. No of index searchers '" +


Gmane