Yousef Ourabi | 1 Jul 2005 01:43

Nested Boolean Query

Hello:
My documents have a title, a term, and a content field. I want to
programmatically create a query that returns all documents that match
in both title AND content. I created a boolean query that looks like
this. The logical result I want is any document that MUST have the
alias passed to aliasQuery, and in which the query also shows up in a
field that has that alias. Basically, filtering all documents of
alias=X with term=Y. Does this look right?

What I am getting are all documents matching the alias (aliasQuery),
ordered by the search term, including documents that don't contain the
query.

BooleanQuery termSearch = new BooleanQuery();
BooleanQuery combinedQuery = new BooleanQuery();

termSearch.add(termQuery, false, false);
termSearch.add(tittleQuery, false, false);

combinedQuery.add(aliasQuery, true, false);
combinedQuery.add(termSearch, false, false);

Best,
Yousef
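
A note on the behaviour described above: in the Lucene 1.4 API,
BooleanQuery.add(Query, boolean required, boolean prohibited) treats a
clause added with required=false as optional. When a query also has a
required clause, optional clauses only influence ranking, which would
explain alias-only documents coming back. The sketch below is a toy
model of that rule in plain Java, not Lucene code; ClauseModel and the
string "documents" are hypothetical stand-ins, and prohibited clauses
are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of required/optional boolean clauses. NOT the Lucene API. */
final class ClauseModel implements Predicate<String> {
    private final List<Predicate<String>> required = new ArrayList<>();
    private final List<Predicate<String>> optional = new ArrayList<>();

    /** Adds a clause; isRequired plays the role of Lucene's "required" flag. */
    ClauseModel add(Predicate<String> clause, boolean isRequired) {
        (isRequired ? required : optional).add(clause);
        return this;
    }

    @Override
    public boolean test(String doc) {
        // Every required clause has to match.
        for (Predicate<String> clause : required) {
            if (!clause.test(doc)) {
                return false;
            }
        }
        // With at least one required clause, optional clauses no longer
        // decide whether the document matches (they would only affect score).
        if (!required.isEmpty()) {
            return true;
        }
        // With only optional clauses, at least one of them has to match.
        for (Predicate<String> clause : optional) {
            if (clause.test(doc)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model, adding the nested termSearch to the outer query with
isRequired=true (the analogue of combinedQuery.add(termSearch, true,
false)) is what turns it into a filter on alias=X AND (term OR title).
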
Chris Lu | 1 Jul 2005 05:27

Re: Design question [too many fields?]

Mark, your suggestion will incur another trip to the database. And if 
the result set is large, filtering in the DB by pk is not really a good idea.

Erik, your original "date" field is fine when there are not many 
dates (<1024) in the database. Otherwise, RangeQuery cannot handle it.

My suggestion is to use three fields, "year" + "month" + "day", to store 
the date. When searching for, say, any date later than 2005-06-30, you 
can use this query: ( year > 2005 ) or ( year=2005 and month>6 ) or 
( year=2005 and month=6 and day > 30 ).
It's a combination of BooleanQuery, TermQuery, and RangeQuery.

This may seem cumbersome, but it can save one trip to the database and 
circumvent Lucene's limitation.
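
The three OR'ed clauses can be checked with a small plain-Java
predicate. This is only a sketch of the comparison logic, not Lucene
query construction; SplitDate and isAfter are hypothetical names. Note
that the middle clause has to be a strict comparison (month > 6),
since a June date on or before the 30th should not match.

```java
/** Plain-Java check of the split year/month/day comparison. Not Lucene code. */
final class SplitDate {
    /** True when (year, month, day) is strictly later than the reference date. */
    static boolean isAfter(int year, int month, int day,
                           int refYear, int refMonth, int refDay) {
        return year > refYear                                           // a later year
            || (year == refYear && month > refMonth)                    // same year, later month
            || (year == refYear && month == refMonth && day > refDay);  // same month, later day
    }
}
```

For example, SplitDate.isAfter(2005, 7, 1, 2005, 6, 30) is true and
SplitDate.isAfter(2005, 6, 1, 2005, 6, 30) is false. In an index the
values are terms compared as strings, so numeric fields would need
zero-padding ("06", not "6") for range comparisons to sort correctly.
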

Chris Lu
http://www.dbsight.net

Erik Hatcher wrote:

> I second Mark's suggestion over the alternative I posted.  My  
> alternative was merely to invert the field structure originally  
> described, but using a Filter for the volatile information is wiser.
>
>     Erik
>
> On Jun 29, 2005, at 9:58 AM, mark harwood wrote:
>
>> Presumably there is also a free-text element to the
>> search or you wouldn't be using Lucene.
[...]

Naimdjon Takhirov | 1 Jul 2005 08:57

Vedr. Re: Design question [too many fields?]

Hi Chris,

Isn't that still going to be too many fields? Days of the
year for the whole year ahead? The fromDate and toDate
can span two months, and the customer wants the data
to be available for one year.

Naimdjon

--- Chris Lu <chris.lu <at> gmail.com> wrote:

> Mark, your suggestion will incur another trip to the
> database. And if 
> the search results is large, filtering in DB by pk
> is not really good.
> 
> Erik, your original "date" field is good when there
> is not many 
> dates(<1024) in the database. Otherwise, Range Query
> can not handle it.
> 
> My suggestion is, use "year" + "month" + "day" three
> fields to store 
> date. And when searching, for example, any date
> that's greater than 
> 2005-06-30, you can use this query to search: ( year
> > 2005 ) or  ( 
> year=2005 and month>6) or ( year=2005 and month=6
> and day > 30 ).
> It's a combination of BooleanQuery, TermQuery, and
[...]

Chris Lu | 1 Jul 2005 09:37

Re: Vedr. Re: Design question [too many fields?]

> It is anyway going to be too many fields then? Days of
> year for the whole year ahead? Since the fromDate and
> toDate can be across two months and the customer wants
> the data be available for one year.

It won't have too many fields.
> > My suggestion is, use "year" + "month" + "day" three
> > fields to store
The "day" field means the day of the month. So the "month" and "day"
fields will have 12 and 31 values respectively.
And the "year" field depends on what data you have. I guess your data
won't span 1024 years.

-- 
Chris Lu
---------------------
Full-Text Search on Any Database
http://www.dbsight.net

On 6/30/05, Naimdjon Takhirov <tnaim1 <at> yahoo.com> wrote:
> Hi Chris,
> 
> It is anyway going to be too many fields then? Days of
> year for the whole year ahead? Since the fromDate and
> toDate can be across two months and the customer wants
> the data be available for one year.
> 
> Naimdjon
> 
> --- Chris Lu <chris.lu <at> gmail.com> wrote:
[...]

Harald Stowasser | 1 Jul 2005 10:41

[Project-advertisement] myDbSearcher.

Hello lucene list readers.

We have decided to put my MySQL-Lucene-search-engine under the GPL.
Maybe you can use it:
http://sourceforge.net/projects/mydbsearcher/

The first implementation is running here:
http://www.idowa.de/ueberblick/suche/index_html (German Newspaper)

It would be nice if anyone wanted to help me out a little, simply
because I don't have much time: the next project is already in its
early stages.

Thank you for reading,
  Harald.

Erik Hatcher | 1 Jul 2005 11:39

Re: Design question [too many fields?]


On Jun 30, 2005, at 11:27 PM, Chris Lu wrote:
> Mark, your suggestion will incur another trip to the database. And  
> if the search results is large, filtering in DB by pk is not really  
> good.

Chris - I disagree with that last comment.  It can be a great  
solution when the filter is cached.  Certainly building a filter for  
every search would be inefficient, but filters are really best when  
cached.

> Erik, your original "date" field is good when there is not many  
> dates(<1024) in the database. Otherwise, Range Query can not handle  
> it.

Not quite correct... it would not matter how many dates were in the  
index/database, only how many were within the range used by  
RangeQuery.  The original requirement was a year's worth of days,  
which at most would be 366 days and I suspect someone looking for  
hotel room availability would be narrowing things down to a week or  
month.
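
To make that distinction concrete: as far as I know, RangeQuery in
this version of Lucene expands into one boolean clause per indexed
term that falls inside the range, and BooleanQuery's default clause
limit is 1024. The plain-Java sketch below only counts terms and does
not touch Lucene; RangeTermCount, its methods, and the yyyyMMdd term
format are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

/** Counts how many date terms a range would expand to. Not Lucene code. */
final class RangeTermCount {
    /** Assumed default clause limit; stated here as an assumption, not read from Lucene. */
    static final int ASSUMED_CLAUSE_LIMIT = 1024;

    /** Builds sorted yyyyMMdd terms for `days` consecutive days starting at `first`. */
    static TreeSet<String> indexDays(LocalDate first, int days) {
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < days; i++) {
            terms.add(first.plusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE));
        }
        return terms;
    }

    /** Number of indexed terms inside [lower, upper], both bounds inclusive. */
    static int termsInRange(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }
}
```

With 3000 indexed days the index holds far more than 1024 distinct
date terms, yet a one-week range such as "20050701" to "20050707"
covers only 7 of them, and a full year covers at most 366.
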

> My suggestion is, use "year" + "month" + "day" three fields to  
> store date. And when searching, for example, any date that's  
> greater than 2005-06-30, you can use this query to search: ( year >  
> 2005 ) or  ( year=2005 and month>6) or ( year=2005 and month=6 and  
> day > 30 ).
> It's a combination of BooleanQuery, TermQuery, and RangeQuery.

How would you represent multiple dates for a Document using that  
[...]

Erik Hatcher | 1 Jul 2005 12:15

Re: Does highlighter highlight phrases only?


On Jun 30, 2005, at 4:35 PM, markharw00d wrote:

> Hi Erik,
> Yes I was thinking that code could form the basis of a new  
> highlighter.
>
> I've just attached a QuerySpansExtractor to the bugzilla entry for  
> the new highlighter. This class produces Spans from queries other  
> than SpanXxxxQueries eg phrase, term and booleans.
> I'm thinking you can throw the text to be highlighted as a single  
> doc into a MemIndex, extract the spans using the  
> QuerySpansExtractor and the  MemIndex's reader (need to expose a  
> getReader method on this - I'm working on it), then use some new  
> highlighting logic on the Spans.
>
> Sound reasonable?

I think so.

One minor issue... a SpanNearQuery is not entirely equal to a  
PhraseQuery when there is slop involved.  You have this:

     SpanNearQuery sp = new SpanNearQuery(clauses, query.getSlop(), false);

Here's a test from Lucene in Action that demonstrates:

public void testSpanNearQuery() throws Exception {
   SpanQuery[] quick_brown_dog =
[...]

BOUDOT Christian | 1 Jul 2005 14:06

free text search with numbers

Hi folks,

This is the first time that I am implementing a search with Lucene, so
please don't laugh if my question seems trivial.

When I enter some text in my free text search, the query gets built
correctly, but when I enter a number (as a string) the query parser seems
to ignore it. What am I doing wrong?

Query: org.apache.lucene.search.Query

Parser: org.apache.lucene.queryParser.MultiFieldQueryParser

Many thanks for your help

Chris

Peter Laurinc | 1 Jul 2005 14:16

Sentence and Paragraph searching

Hi,

I'm a newbie to Lucene.
I want to ask how to implement a search for a phrase that must occur
within a single sentence/paragraph.
I have seen some examples that change term positions, but I think
that this is not the way, because it breaks classic proximity search
(when one word is at the end of a sentence and the second at the
beginning of the next one).

 
Thanks for any help

Peter
Erik Hatcher | 1 Jul 2005 15:10

Re: free text search with numbers


On Jul 1, 2005, at 8:06 AM, BOUDOT Christian wrote:
> It is the first time that I implement a search with Lucene, so
> please don't laugh if my question seems trivial.
>
> When I enter some text in my free text search the query gets built
> correctly, but when I enter a number (as a string) the query parser
> seems to ignore it. What am I doing wrong?
>
> Query: org.apache.lucene.search.Query
>
> Parser: org.apache.lucene.queryParser.MultiFieldQueryParser

Chris - the answer to this dilemma is here:
http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

In short, the analyzer is eating the numbers as it was designed to do  
- you'll want to adjust the analyzer to achieve the  
indexing/searching for numbers somehow.

     Erik
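
To illustrate what "eating the numbers" can look like: the thread does
not say which analyzer is in use, so the following is a toy tokenizer
in plain Java, not a Lucene Analyzer. It assumes a letter-only
tokenizing rule (which, as far as I know, is how SimpleAnalyzer
behaves) and contrasts it with a rule that also keeps digits.
ToyTokenizer and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Toy tokenizer showing how a letter-only rule drops numbers. Not a Lucene Analyzer. */
final class ToyTokenizer {
    /** Splits text into lower-cased runs of characters accepted by isTokenChar. */
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A rejected character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Letter-only rule: digits act as separators, so numbers vanish. */
    static List<String> lettersOnly(String text) {
        return tokenize(text, Character::isLetter);
    }

    /** Letters and digits: numbers survive as tokens. */
    static List<String> lettersAndDigits(String text) {
        return tokenize(text, Character::isLetterOrDigit);
    }
}
```

Whatever analyzer ends up being chosen, the same one (or a compatible
one) generally has to be used for both indexing and query parsing, or
the number terms will not line up.
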
