Yann ROBIN | 28 Dec 16:01 2008

NearPostList and get_wdf

Hi,

I'm trying to implement a near search that gives a better score to
documents where the words are nearer to each other.
So I thought I could change the wdf in the NearPostList according
to the distance between the words. But it seems that the get_wdf of the
NearPostList is never called. Instead, it is the get_wdf of the
ChertPostList that is called.
I don't think this is intended. Should I open a ticket?

Thanks!

-- 
Yann
Yann ROBIN | 28 Dec 16:09 2008

Re: NearPostList and get_wdf

On Sun, Dec 28, 2008 at 4:01 PM, Yann ROBIN <me.show <at> gmail.com> wrote:
> Hi,
>
> I'm trying to implement a near search that gives a better score to
> documents where the words are nearer to each other.
> So I thought I could change the wdf in the NearPostList according
> to the distance between the words. But it seems that the get_wdf of the
> NearPostList is never called. Instead, it is the get_wdf of the
> ChertPostList that is called.
> I don't think this is intended. Should I open a ticket?
>
> Thanks!
>

OK, I now understand why it is not called:

NearPostList inherits from SelectPostList, which only forwards calls to a
given postlist (which should be the database postlist).

So when get_weight() is called on the NearPostList, it runs the
SelectPostList implementation, which calls source->get_weight().

source->get_weight() does call get_wdf(), but on the source postlist, so it
can never reach the NearPostList implementation ...
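
To make that call chain concrete, here is a minimal sketch with simplified
stand-in classes. These are not the real Xapian sources: the class names
ending in "Like", the 1.5 factor and the value 100 are all made up for
illustration. It only shows how a wrapper that forwards get_weight() to its
source ends up bypassing its own get_wdf() override.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-ins for illustration only; NOT the real Xapian classes.
struct PostList {
    virtual ~PostList() = default;
    virtual unsigned get_wdf() const = 0;
    virtual double get_weight() const = 0;
};

// Plays the role of the database ("leaf") postlist, e.g. ChertPostList.
struct LeafLike : PostList {
    unsigned wdf;
    explicit LeafLike(unsigned wdf_) : wdf(wdf_) {}
    unsigned get_wdf() const override { return wdf; }
    // get_wdf() is dispatched on *this* object, so it resolves to
    // LeafLike::get_wdf(), never to a wrapper's override.
    double get_weight() const override { return 1.5 * get_wdf(); }  // made-up formula
};

// Plays the role of SelectPostList: forwards every call to `source`.
struct SelectLike : PostList {
    std::unique_ptr<PostList> source;
    explicit SelectLike(std::unique_ptr<PostList> s) : source(std::move(s)) {
        assert(source != nullptr);
    }
    unsigned get_wdf() const override { return source->get_wdf(); }
    double get_weight() const override { return source->get_weight(); }
};

// Plays the role of NearPostList with a customised get_wdf().
struct NearLike : SelectLike {
    using SelectLike::SelectLike;
    unsigned get_wdf() const override { return 100; }  // hypothetical proximity-adjusted wdf
};
```

With a leaf wdf of 3, calling get_wdf() directly on the NearLike object
returns 100, but get_weight() still returns 1.5 * 3, because the weight is
computed entirely inside the source object.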

-- 
Yann
Richard Boulton | 29 Dec 13:50 2008

Re: NearPostList and get_wdf

On Sun, Dec 28, 2008 at 04:01:07PM +0100, Yann ROBIN wrote:
> Hi,
> 
> I'm trying to implement a near search that gives a better score to
> documents where the words are nearer to each other.

Fair enough.  This isn't what the current NearPostList is intended to do -
the current NearPostList is used to implement the OP_NEAR operator, which
returns only those documents in which the terms occur within the specified
window size, but returns a weight calculated simply by adding the weights
of the component terms.  This is sometimes what is wanted, but it would be
nice to have a way to do a NEAR search which weighted results based on how
near the terms are.
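
As a rough illustration of the behaviour described above (proximity acts
only as a filter, and the weight is the plain sum of the term weights), here
is a small self-contained sketch. The TermInDoc struct and both function
names are hypothetical, and the inclusive window boundary is an assumption
rather than a statement about how OP_NEAR counts positions.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical per-document data for one term: the positions where it
// occurs and the weight already computed for it.
struct TermInDoc {
    std::vector<int> positions;
    double weight;
};

// True if some occurrence of `a` lies within `window` positions of some
// occurrence of `b`. The inclusive "<=" boundary is an assumption.
bool within_window(const TermInDoc& a, const TermInDoc& b, int window) {
    assert(window > 0);
    for (int pa : a.positions)
        for (int pb : b.positions)
            if (std::abs(pa - pb) <= window) return true;
    return false;
}

// NEAR-style matching: proximity decides only whether the document
// matches; the resulting weight ignores how close the terms are.
bool near_match(const TermInDoc& a, const TermInDoc& b, int window,
                double& weight_out) {
    if (!within_window(a, b, window)) return false;
    weight_out = a.weight + b.weight;
    return true;
}
```

Two documents that both pass the window test get the same weight from
near_match() for the same term weights, whether the terms are adjacent or
sit right at the edge of the window.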

> So I thought I could change the wdf in the NearPostList according
> to the distance between the words. But it seems that the get_wdf of the
> NearPostList is never called. Instead, it is the get_wdf of the
> ChertPostList that is called.

Indeed; the wdf is used in the weight calculation, and the weight
calculation is performed on each "leaf" postlist.

I'm not sure that modifying the wdf is really the way to go about this - it
seems to me that you might do better to use a custom weight class, which
factored in the frequencies of the individual terms, as well as their
proximity.
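
To sketch what "factoring in proximity" could look like, the snippet below
computes the smallest gap between two sorted position lists and uses it to
scale a sum of term weights. This is not an Xapian::Weight subclass, and the
function names and the 1 + 1/distance boost are made up for illustration;
any decreasing function of the distance would serve the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

// Smallest absolute gap between any position in `a` and any position in
// `b`. Both lists must be non-empty and sorted in ascending order.
int min_distance(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!a.empty() && !b.empty());
    int best = std::numeric_limits<int>::max();
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        int d = std::abs(a[i] - b[j]);
        if (d < best) best = d;
        // Advance whichever list is currently behind.
        if (a[i] < b[j]) ++i; else ++j;
    }
    return best;
}

// Hypothetical proximity-aware score: the summed term weights, boosted
// more strongly the closer the terms are. The boost formula is made up.
double proximity_weight(double term_weight_sum, int distance) {
    assert(distance >= 1);
    return term_weight_sum * (1.0 + 1.0 / distance);
}
```

With these definitions, adjacent terms (distance 1) double the summed
weight, and the boost shrinks towards nothing as the terms move apart.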

For an example of a postlist which combines several terms together and
calculates a weight on them, take a look at the SynonymPostList (and
corresponding OP_SYNONYM operator) on the "opsynonym" branch in SVN.  This
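combines the wdfs of the terms being "synonymed" together, and passes that
into the standard weighting algorithm.  It has a few issues, though (which
is why it's not on trunk, yet).  See http://trac.xapian.org/ticket/50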
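
As a toy illustration of the idea of a postlist that combines several terms
and computes one weight for them, the sketch below adds up the wdfs of the
terms first and then applies a weighting function once. It is not taken from
the opsynonym branch: toy_weight() is a made-up saturating formula standing
in for a real weighting scheme, and all of the function names are
hypothetical.

```cpp
#include <cassert>
#include <vector>

// Made-up stand-in for a weighting scheme: grows with wdf but saturates.
// The constant k is arbitrary.
double toy_weight(unsigned wdf, double k = 1.0) {
    assert(k > 0.0);
    return static_cast<double>(wdf) / (static_cast<double>(wdf) + k);
}

// Synonym-style combination: add the wdfs together, then weight once,
// as if all the terms were a single term.
double synonym_weight(const std::vector<unsigned>& wdfs) {
    unsigned total = 0;
    for (unsigned w : wdfs) total += w;
    return toy_weight(total);
}

// For contrast: weight each term on its own and add the results.
double summed_weight(const std::vector<unsigned>& wdfs) {
    double total = 0.0;
    for (unsigned w : wdfs) total += toy_weight(w);
    return total;
}
```

With a saturating formula like this one the two approaches give different
results, which is why it matters where in the postlist tree the weight is
actually calculated.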

Yann ROBIN | 29 Dec 14:09 2008

Re: NearPostList and get_wdf

On Mon, Dec 29, 2008 at 1:50 PM, Richard Boulton
<richard <at> lemurconsulting.com> wrote:
>> So I thought I could change the wdf in the NearPostList according
>> to the distance between the words. But it seems that the get_wdf of the
>> NearPostList is never called. Instead, it is the get_wdf of the
>> ChertPostList that is called.
>
> Indeed; the wdf is used in the weight calculation, and the weight
> calculation is performed on each "leaf" postlist.
>
> I'm not sure that modifying the wdf is really the way to go about this - it
> seems to me that you might do better to use a custom weight class, which
> factored in the frequencies of the individual terms, as well as their
> proximity.
>
> For an example of a postlist which combines several terms together and
> calculates a weight on them, take a look at the SynonymPostList (and
> corresponding OP_SYNONYM operator) on the "opsynonym" branch in SVN.  This
> combines the wdfs of the terms being "synonymed" together, and passes that
> into the standard weighting algorithm.  It has a few issues, though (which
> is why it's not on trunk, yet).  See http://trac.xapian.org/ticket/50
>

OK, thanks, I'll take a look. But I just want to point out that the get_wdf
method of the NearPostList is never called, so it looks like it was
implemented for nothing?
Writing a new weight class would certainly be the best way,
but I would need access to the weight object, which is
only available as a protected member of LeafPostList ... Maybe you
addressed this issue in the SynonymPostList?

