liminghit | 14 Nov 04:06

Xapian Indexer problem.

Hi guys,

I have a document with a phrase as its term.

Document:  BZN[0 001A HEATHROW TAXIS] 

Term:  “0 001a heathrow taxis”

But if I search for “0” or “taxis”, the match percentage is nearly 100%.

My understanding is that only “0 001a heathrow taxis” should match at 100%.

A shorter or longer query should match at less than 100%, right?

If I want to achieve this, how should I do the indexing?

Thanks,
Ming



_______________________________________________
Xapian-devel mailing list
Xapian-devel <at> lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
Olly Betts | 14 Nov 17:12

Re: Xapian Indexer problem.

On Fri, Nov 14, 2008 at 11:06:55AM +0800, liminghit wrote:
> Only "0 001a heathrow taxis" can have 100% matching.
> 
> Shorter or longer query, should less than 100% matching, right?

A longer query would, since (unless you repeat terms) it must have
words which aren't in the document.

But otherwise no, and this behaviour is as intended.  It's not
"percentage of document text matched" it's a measure of "how well your
query matches this document".

If all the query terms match the highest scoring document, we give it
100%.  If not all the terms match the highest scoring document, we give
it a proportion of 100% based on the term weights.

And then we calculate percentage scores for all other documents based on
this assigned percentage value.
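
As a rough illustration of the scheme described above, here is a small
Python sketch. It is a simplified model of the description in this
message, not Xapian's actual implementation: the function name
percentages(), the input layout and all the numbers are made up for
the example.

```python
def percentages(query_weights, docs):
    """Toy model of the percentage scaling described in this thread.

    query_weights: dict mapping each query term to its (hypothetical) weight.
    docs: dict mapping a document id to (score, set of matching query terms).
    Returns a dict mapping each document id to a percentage.
    """
    if not docs:
        return {}

    # Find the highest scoring document.
    top_id = max(docs, key=lambda doc_id: docs[doc_id][0])
    top_score, top_matched = docs[top_id]

    # The top document gets 100% if it matches every query term, otherwise
    # a proportion of 100% based on the weights of the terms it does match.
    total_weight = sum(query_weights.values())
    matched_weight = sum(query_weights[t] for t in top_matched)
    top_percent = 100.0 * matched_weight / total_weight

    # Every other document is scaled relative to the top document.
    return {
        doc_id: top_percent * score / top_score
        for doc_id, (score, _matched) in docs.items()
    }


# A one-word query that matches the only document: 100%, however long
# the document is.
single = percentages({"taxis": 2.0}, {"D1": (0.8, {"taxis"})})

# A three-word query where the best document only matches two of the words.
partial = percentages(
    {"heathrow": 2.0, "taxis": 2.0, "airport": 4.0},
    {"D1": (3.0, {"heathrow", "taxis"}), "D2": (1.5, {"taxis"})},
)
```

In this model the single-word query "taxis" gives D1 100%, which is the
behaviour the original question was asking about.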

Your definition seems unhelpful to me - in most uses the query is quite
a lot shorter than the document, and a 3 word query would score at most
0.3% for a 1000 word document.

> If I want to archive this, how to do indexing?

You might be able to achieve something like what you describe at search
time by writing your own weighting scheme and making get_sumpart()
return 1/(unnormalised document length).
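
To show what such a weighting would compute, here is a toy Python model.
It does not use the Xapian API; it only assumes that each matching query
term contributes 1/(document length) to the score, and the function name
length_normalised_score() is hypothetical.

```python
def length_normalised_score(query_terms, doc_terms):
    """Sum 1/len(doc_terms) once for every distinct query term in the document.

    This mimics a weighting scheme whose per-term contribution is
    1/(unnormalised document length). Terms are plain strings here.
    """
    doc_len = len(doc_terms)
    if doc_len == 0:
        return 0.0
    present = set(doc_terms)
    matched = sum(1 for term in set(query_terms) if term in present)
    return matched / doc_len


doc = ["0", "001a", "heathrow", "taxis"]

# Only the full four-word query reaches 1.0 for this document.
full = length_normalised_score(["0", "001a", "heathrow", "taxis"], doc)
short = length_normalised_score(["taxis"], doc)

# A 3 word query against a 1000 word document scores 3/1000, i.e. 0.3%.
long_doc = ["w%d" % i for i in range(1000)]
tiny = length_normalised_score(["w1", "w2", "w3"], long_doc)
```

Note that in this toy model query words which are absent from the
document add nothing, so a longer query does not score lower by itself.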

Cheers,
    Olly
liminghit | 18 Nov 14:51

Re: Xapian Indexer problem.

Thanks very much for your reply!

Each document has its own term list, containing some terms.

How can I calculate the term weight for each of these terms?


For example:

D1->term1, term2, term3

I want to get the term weights of term1, term2 and term3.

I noticed that there is a function “calc_termweight”, but it’s a private function.


 

Thanks,

Ming


 
 


Henry | 26 Nov 11:08

Trying to patch xapian perl add/remove_spelling

Greets,

I'm giving a stab at patching the CPAN module to add the missing  
WritableDatabase::add_spelling and remove_spelling, but need a bit of  
guidance since I'm coming in cold, and pressed for time (aren't we all).

I've modified XS/WritableDatabase.xs and added the two necessary  
functions, and also added the two basic tests in t/index.t.

Compilation completes cleanly, but running

perl index.t

fails with

Exception: This backend doesn't implement spelling correction at  
index.t line 48.

I've obviously missed some file or something that needs to be set.   
The patch is attached.

I'd appreciate some comments wrt what I'm missing.

Thanks
Henry

Attachment (xapian-perl-spelling.patch): application/x-download, 1257 bytes
Olly Betts | 28 Nov 03:05

Re: Trying to patch xapian perl add/remove_spelling

On Wed, Nov 26, 2008 at 12:08:40PM +0200, Henry wrote:
> Exception: This backend doesn't implement spelling correction at  
> index.t line 48.
> 
> I've obviously missed some file or something that needs to be set.   
> The patch is attached.
> 
> I'd appreciate some comments wrt what I'm missing.

Much of the code in index.t is run for both inmemory and "auto"
backends, but the inmemory backend doesn't implement spelling
correction.

Also, you can't use "ok(...)" around methods which don't return
anything (if you do call "ok(...)" more, you need to increase the
expected number of calls at the top of the file).

I've tweaked the patch to compile and pass "make check".  I haven't
applied it yet as it doesn't seem to be useful without
QueryParser::get_corrected_query_string() and/or
Database::get_spelling_suggestion() also being wrapped...

http://oligarchy.co.uk/xapian/patches/xapian-perl-spelling-updated.patch

Cheers,
    Olly
Henry | 28 Nov 07:05

Re: Trying to patch xapian perl add/remove_spelling

Quoting "Olly Betts" <olly <at> survex.com>:
> Much of the code in index.t is run for both inmemory and "auto"
> backends, but the inmemory backend doesn't implement spelling
> correction.

Gotcha.

> Also, you can't use "ok(...)" around methods which don't return
> anything (if you do call "ok(...)" more, you need to increase the
> expected number of calls at the top of the file).

Thanks - 'fraid I've never got around to using Test::More...  /makes  
mental note to do so.

> I've tweaked the patch to compile and pass "make check".  I haven't
> applied it yet as it doesn't seem to be useful without
> QueryParser::get_corrected_query_string() and/or
> Database::get_spelling_suggestion() also being wrapped...

Hm, will take a few stabs there too...

Cheers
Henry
