Wesley MacDonald | 1 Nov 02:40 2005

RE: Java Indexer + DotLucene + IIS question

Hi,

I think it would. 

Wes.

-----Original Message-----
From: msftblows <at> aol.com [mailto:msftblows <at> aol.com] 
Sent: October 31, 2005 4:06 PM
To: java-user <at> lucene.apache.org
Subject: Re: Java Indexer + DotLucene + IIS question

My mistake...he sent this: http://www.nsisoftware.com/ 

-----Original Message-----
From: msftblows <at> aol.com
To: java-user <at> lucene.apache.org
Sent: Mon, 31 Oct 2005 15:59:27 -0500
Subject: Re: Java Indexer + DotLucene + IIS question

My webmaster sent me this:
http://www.lyonware.co.uk/DoubleTake/DoubleTake.htm

He has used this...can I assume this would work just as well? 

-----Original Message-----
From: Wesley MacDonald <wes <at> extremesoftware.ca>
To: java-user <at> lucene.apache.org
Sent: Mon, 31 Oct 2005 15:50:46 -0500
Subject: RE: Java Indexer + DotLucene + IIS question

tcorbet | 1 Nov 07:43 2005

BooleanQuery

I have an index over the titles to .mp3 songs.
It is not unreasonable for the user to want to
see the results from:  "Show me Everything".

I understand that title:* is not a valid wildcard query.
I understand that title:[a* TO z*] is a valid wildcard query.

What I cannot understand is this behavior which
throws no exceptions:

title:[a* TO z*] returns 0 hits.

title [a* TO m*] OR [n* TO z*] returns *almost* the
correct answer -- one title [of approximately 1200] is missing.

title:[a* TO m*] OR [m* TO z*] correctly returns
all the available titles.

I arrived at  this 'bifurcation' solution because trial
and error indicated that the problem was somehow
related to the number of terms involved.

In addition to doing the bifurcation, I also setMaxClauseCount
up to 4096 from 1024, but that did not change
the observed behavior; it is still necessary to
present the Search engine with two smaller Ranges
in lieu of a single large Range to get correct
results.  [Note, just for completeness, I have
tested and confirmed that almost any bifurcation
works.  I can go from a-k and from k-z, just

Oren Shir | 1 Nov 12:03 2005

what is the best way to sort by document ids

Hi,

My documents contain a field called SORT_ID, which contains an int that
increases with every document added to the index. I want my results to be
sorted by it.

Which approach will provide the best performance?

1) Zero pad SORT_ID field and sort by it as plain text.
2) Sort using SortField for an INT.
3) Trust that using INDEXORDER will always return the same order as SORT_ID,
and use that.

Thanks,
Oren Shir

Erik Hatcher | 1 Nov 12:11 2005

Re: BooleanQuery

On 1 Nov 2005, at 01:43, tcorbet wrote:
> I have an index over the titles to .mp3 songs.
> It is not unreasonable for the user to want to
> see the results from:  "Show me Everything".
>
> I understand that title:* is not a valid wildcard query.
> I understand that title:[a* TO z*] is a valid wildcard query.

That last one is NOT a valid wildcard query.  RangeQuery does not  
also do wildcards.

> What I cannot understand is this behavior which
> throws no exceptions:
>
> title:[a* TO z*] returns 0 hits.

This is literally searching for "a*" through "z*", with the asterisk  
being literal, not a wildcard.

> title [a* TO m*] OR [n* TO z*] returns *almost* the
> correct answer -- one title [of approximately 1200] is missing.

Is that your exact query?  Maybe you're finding "title"?

> title:[a* TO m*] OR [m* TO z*] correctly returns
> all the available titles.

Maybe this has to do with your default field, if that is your exact  
query.  [m* TO z*] is going to the default field for QueryParser, not  
the title field.

Erik Hatcher | 1 Nov 12:15 2005

Re: what is the best way to sort by document ids


On 1 Nov 2005, at 06:03, Oren Shir wrote:

> Hi,
>
> My documents contain a field called SORT_ID, which contains an int  
> that
> increases with every document added to the index. I want my results  
> to be
> sorted by it.
>
> Which approach will provide the best performance?
>
> 1) Zero pad SORT_ID field and sort by it as plain text.
> 2) Sort using SortField for an INT.
> 3) Trust that using INDEXORDER will always return the same order as  
> SORT_ID,
> and use that.

#3 - to sort by order indexed, there is no need to have a custom  
incrementing field.  I'd recommend dropping SORT_ID unless you need  
it for some other purpose.

     Erik

Michael D. Curtin | 1 Nov 14:15 2005

Re: what is the best way to sort by document ids

Oren Shir wrote:

> My documents contain a field called SORT_ID, which contains an int that
> increases with every document added to the index. I want my results to be
> sorted by it.
> 
> Which approach will provide the best performance?
> 
> 1) Zero pad SORT_ID field and sort by it as plain text.
> 2) Sort using SortField for an INT.
> 3) Trust that using INDEXORDER will always return the same order as SORT_ID,
> and use that.

My experience is that #3 works very well in terms of performance, much 
better than sorting on any arbitrary field.  I haven't ever noticed a 
problem with it not working.  Good luck!
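
For comparison, option 1 from the question depends on the padding: without 
it, plain-text order and numeric order disagree.  A minimal sketch in 
plain Java (no Lucene calls), assuming non-negative ids and a hypothetical 
fixed width of 10 digits; the class and method names are made up for 
illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ZeroPadSortId {
    // Assumed width: 10 digits is enough for any non-negative int.
    static final int WIDTH = 10;

    // Left-pad with zeros so that string order equals numeric order.
    static String pad(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return String.format("%0" + WIDTH + "d", id);
    }

    // Sort a copy of the values as plain text (lexicographic order).
    static List<String> sortedAsText(List<String> values) {
        List<String> copy = new ArrayList<String>(values);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Unpadded ids sort as text in a non-numeric order.
        System.out.println(sortedAsText(Arrays.asList("9", "10", "2")));
        // Padded ids sort as text in numeric order.
        System.out.println(sortedAsText(Arrays.asList(pad(9), pad(10), pad(2))));
    }
}
```

The padded value is what would be stored in the field; the sort itself 
would still be done by the search library.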

--MDC

Michael D. Curtin | 1 Nov 14:17 2005

Re: BooleanQuery

tcorbet wrote:

> I have an index over the titles to .mp3 songs.
> It is not unreasonable for the user to want to
> see the results from:  "Show me Everything".
> 
> I understand that title:* is not a valid wildcard query.
> I understand that title:[a* TO z*] is a valid wildcard query.
> 
> What I cannot understand is this behavior which
> throws no exceptions:
> 
> title:[a* TO z*] returns 0 hits.
> 
> title [a* TO m*] OR [n* TO z*] returns *almost* the
> correct answer -- one title [of approximately 1200] is missing.
> 
> title:[a* TO m*] OR [m* TO z*] correctly returns
> all the available titles.

What I have done in a case like this is short-circuit the search for an 
empty query and just read the documents out of the index, in order. 
That is, I don't really run a search in this case.  It's only a few 
lines of extra code, and you don't have to retrieve every document in 
the index until you need them.  Using a workaround query as in your 
example causes Lucene to examine every document in the index before 
returning you any.  Good luck!
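
A rough sketch of that short-circuit, in plain Java.  DocStore is a 
hypothetical stand-in for the index reader (its method names mirror 
maxDoc / isDeleted / document, but it is not the Lucene type), and 
inMemory exists only so the sketch runs on its own:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class BrowseAll {
    // Hypothetical stand-in for an index reader; not a Lucene class.
    interface DocStore {
        int maxDoc();
        boolean isDeleted(int docId);
        String document(int docId);
    }

    // Treat a null or blank query as "show me everything".
    static boolean isShowEverything(String userQuery) {
        return userQuery == null || userQuery.trim().isEmpty();
    }

    // Walk documents lazily in index order, skipping deleted ones.
    // Nothing is fetched until the caller asks for the next document.
    static Iterator<String> allDocuments(final DocStore store) {
        return new Iterator<String>() {
            private int cursor = advance(0);

            // Move to the first non-deleted doc id at or after 'from'.
            private int advance(int from) {
                int i = from;
                while (i < store.maxDoc() && store.isDeleted(i)) {
                    i++;
                }
                return i;
            }

            public boolean hasNext() {
                return cursor < store.maxDoc();
            }

            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String doc = store.document(cursor);
                cursor = advance(cursor + 1);
                return doc;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // In-memory DocStore, only for trying the sketch out.
    static DocStore inMemory(final String[] docs, final boolean[] deleted) {
        return new DocStore() {
            public int maxDoc() { return docs.length; }
            public boolean isDeleted(int docId) { return deleted[docId]; }
            public String document(int docId) { return docs[docId]; }
        };
    }
}
```

The caller would check isShowEverything first and only hand the query to 
the searcher when it returns false.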

--MDC

Erik Hatcher | 1 Nov 14:34 2005

Re: BooleanQuery


On 1 Nov 2005, at 08:17, Michael D. Curtin wrote:
> tcorbet wrote:
>
>
>> I have an index over the titles to .mp3 songs.
>> It is not unreasonable for the user to want to
>> see the results from:  "Show me Everything".
>> I understand that title:* is not a valid wildcard query.
>> I understand that title:[a* TO z*] is a valid wildcard query.
>> What I cannot understand is this behavior which
>> throws no exceptions:
>> title:[a* TO z*] returns 0 hits.
>> title [a* TO m*] OR [n* TO z*] returns *almost* the
>> correct answer -- one title [of approximately 1200] is missing.
>> title:[a* TO m*] OR [m* TO z*] correctly returns
>> all the available titles.
>>
>
> What I have done in a case like this is short-circuit the search  
> for an empty query and just read the documents out of the index, in  
> order. That is, I don't really run a search in this case.  It's  
> only a few lines of extra code, and you don't have to retrieve  
> every document in the index until you need them.  Using a  
> workaround query as in your example causes Lucene to examine every  
> document in the index before returning you any.  Good luck!

In the trunk of Lucene's codebase (and in the upcoming 1.9/2.0  
releases), there is a new MatchAllDocsQuery.  Just FYI.


Miles Barr | 1 Nov 14:56 2005

Re: Search problems

On Thu, 2005-10-27 at 16:35 -0400, Sharma, Siddharth wrote:
> My index has 4 keyword fields and one unindexed field.
> I want to search by the 4 keyword fields and return the one unindexed field.
> 
> I can iterate over the documents via Luke.
> But when I search for the same values that I see via Luke, it does not find
> the document.
> Out of the 4 fields, 2 are alphanumeric and searching on just these two
> fields does succeed and I can find the document in question.
> 
> The other 2 fields can have numeric values. When I include these two fields
> in the search, the same document cannot be found.
> 
> I thought that the fact that these fields had numeric values might be the
> reason for the search to be unsuccessful. So I browsed for another document
> via Luke where these fields had alphanumeric values, but again could not
> find the document? Returns no result.
> 
> What could the problem be? Any ideas?
> I have added all the 4 fields with 'Field.Keyword'.

Field.Keyword requires an exact match, i.e. you should manually create a
TermQuery. Luke will analyze your query and hence tokenise it. Almost
certainly the tokens it creates won't match the values in your field,
because they have to be an exact match.

The StandardAnalyzer is the analyzer Luke uses by default. It will make
the search terms lower case, and AFAIK it also removes numbers from
the query.
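
To illustrate the mismatch, here is a toy sketch in plain Java.  The
"analyzer" below is an assumption made up for the example (it lower-cases
and splits on anything that is not a letter or digit); it is NOT
StandardAnalyzer, and the set of strings only stands in for the terms of a
keyword field:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class KeywordMatchDemo {
    // Toy analyzer (assumption, not StandardAnalyzer): lower-case the
    // text and split it on every run of non-alphanumeric characters.
    static List<String> toyAnalyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Keyword-style lookup: the whole value is one term, compared as-is.
    static boolean exactTermMatches(Set<String> indexedTerms, String queryValue) {
        return indexedTerms.contains(queryValue);
    }

    // Analyzed lookup: each token produced from the query is compared
    // against the indexed terms.
    static boolean analyzedQueryMatches(Set<String> indexedTerms, String queryValue) {
        for (String token : toyAnalyze(queryValue)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

With a stored value such as "AB-1234", the exact lookup finds it, while
the analyzed tokens ("ab" and "1234" under this toy analyzer) do not equal
the stored term.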


msftblows | 1 Nov 16:09 2005

Re: Java Indexer + DotLucene + IIS question

One more question...if I do use a tool for replication and I only have the indexer running on one
machine: say it creates 10 *.cfs files and the tool replicates all of them. The machine with the
indexer then compresses those files, so the originals are now all gone. How will they be removed
from the slave machines?

-----Original Message-----
From: Wesley MacDonald <wes <at> extremesoftware.ca>
To: java-user <at> lucene.apache.org
Sent: Mon, 31 Oct 2005 20:40:55 -0500
Subject: RE: Java Indexer + DotLucene + IIS question

Hi,

I think it would. 

Wes.

-----Original Message-----
From: msftblows <at> aol.com [mailto:msftblows <at> aol.com] 
Sent: October 31, 2005 4:06 PM
To: java-user <at> lucene.apache.org
Subject: Re: Java Indexer + DotLucene + IIS question

My mistake...he sent this: http://www.nsisoftware.com/ 

-----Original Message-----
From: msftblows <at> aol.com
To: java-user <at> lucene.apache.org
Sent: Mon, 31 Oct 2005 15:59:27 -0500
Subject: Re: Java Indexer + DotLucene + IIS question