Re: NearPostList and get_wdf
Richard Boulton <richard <at> lemurconsulting.com>
2008-12-29 12:50:51 GMT
On Sun, Dec 28, 2008 at 04:01:07PM +0100, Yann ROBIN wrote:
> I'm trying to make a near search that would give better scoring for
> documents where the words are nearer.
Fair enough. This isn't what the current NearPostList is intended to do -
the current NearPostList is used to implement the OP_NEAR operator, which
returns only those documents in which the terms occur within the specified
window size, but returns a weight calculated simply by adding the weights
of the component terms. This is sometimes what is wanted, but it would be
nice to have a way to do a NEAR search which weighted results based on how
near the terms are.
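
To make the difference concrete, here is a rough sketch in plain Python. It is
not Xapian code. The function names, the "span <= window" test and the
scaling formula are all assumptions made up for illustration, and the exact
window arithmetic Xapian uses may differ.

```python
from itertools import product


def min_span(positions_per_term):
    """Smallest number of positions covering one occurrence of every term."""
    return min(
        max(combo) - min(combo) + 1
        for combo in product(*positions_per_term)
    )


def near_weight_sum(term_weights, positions_per_term, window):
    """Filter on the window, then just add the term weights.

    Returns None when the terms do not fit inside the window.
    """
    if min_span(positions_per_term) > window:
        return None
    return sum(term_weights)


def near_weight_scaled(term_weights, positions_per_term, window):
    """Hypothetical variant: scale the summed weight by how close the terms are.

    The factor is 1.0 when the terms are adjacent and shrinks as they spread
    out. This formula is an illustration only, not a recommendation.
    """
    span = min_span(positions_per_term)
    if span > window:
        return None
    return sum(term_weights) * len(term_weights) / span
```

With two terms at positions 3 and 7 and a window of 10, the first function
returns the plain sum of the weights, while the second returns a smaller value
because the terms are five positions apart.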
> So I thought that I could change the wdf in the NearPostList according
> to the distance between words. But it seems that the get_wdf of the
> NearPostList is never called... Instead it's the get_wdf of the
> ChertPostList that is called.
Indeed; the wdf is used in the weight calculation, and the weight
calculation is performed on each "leaf" postlist.
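
A toy model of why overriding get_wdf on the combining postlist has no effect
(plain Python, not Xapian's actual classes; the class names and the
"idf * wdf" weight are placeholders I have assumed for illustration):

```python
class LeafPostList:
    """Stand-in for a per-term postlist backed by the database."""

    def __init__(self, wdf_by_doc, idf):
        self.wdf_by_doc = wdf_by_doc
        self.idf = idf  # placeholder for the term-dependent part of the weight

    def get_wdf(self, docid):
        return self.wdf_by_doc[docid]

    def get_weight(self, docid):
        # The weight is computed here, from the leaf's own wdf.
        return self.idf * self.get_wdf(docid)


class CombiningPostList:
    """Stand-in for a postlist that combines several sub-postlists."""

    def __init__(self, children):
        self.children = children
        self.wdf_calls = 0  # counts how often our own get_wdf is used

    def get_wdf(self, docid):
        self.wdf_calls += 1
        return sum(child.get_wdf(docid) for child in self.children)

    def get_weight(self, docid):
        # Only the children's weights are summed; get_wdf above is never
        # consulted, so changing it cannot change the score.
        return sum(child.get_weight(docid) for child in self.children)
```

In this model, asking the combining postlist for a weight leaves wdf_calls at
zero, which matches what you observed.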
I'm not sure that modifying the wdf is really the way to go about this - it
seems to me that you might do better to use a custom weight class, which
factored in the frequencies of the individual terms, as well as their
For an example of a postlist which combines several terms together and
calculates a weight on them, take a look at the SynonymPostList (and
corresponding OP_SYNONYM operator) on the "opsynonym" branch in SVN. This