6 Jan 2009 04:47
Re: NearPostList and get_wdf
Olly Betts <olly <at> survex.com>
2009-01-06 03:47:24 GMT
On Mon, Dec 29, 2008 at 02:09:14PM +0100, Yann ROBIN wrote:
> On Mon, Dec 29, 2008 at 1:50 PM, Richard Boulton
> <richard <at> lemurconsulting.com> wrote:
> > I'm not sure that modifying the wdf is really the way to go about this - it
> > seems to me that you might do better to use a custom weight class, which
> > factored in the frequencies of the individual terms, as well as their
> > proximity.

You have to choose a weight class for the whole query - it can't be
different for different subqueries.  So I'm not sure how this would work.

A sane approach would probably be in NewNearPostList::get_weight() to
multiply the weight returned by the AND query's get_weight() method by
a non-negative factor which varies depending how close the terms are -
largest when they're together, much smaller when they are far apart.

This will be slower to run than the current NearPostList though as it
can't stop working on a document when it finds a match within the window
size - instead it has to check all the positional data for each document
matching the AND query to find the closest match.

This factor needs to have a known upper bound, which you multiply
get_maxweight() and recalc_maxweight() from the AND query by.

> > Feel free to open a feature request ticket, describing the feature that you
> > would like to exist.  OP_NEAR as it is currently implemented is behaving as
> > intended, though.
>
> The ticket was more for the get_wdf not being called, i don't think this was [...]
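
To make the proximity-weighting suggestion in the reply concrete, here is a rough standalone sketch. It is not Xapian code: `min_span`, `proximity_factor`, `near_weight`, `near_maxweight` and `kMaxProximityFactor` are all hypothetical names, the positions are assumed to arrive as one sorted list per term, and the `1 / (1 + gap)` shape of the factor is just one possible choice. It only illustrates the two points made above: every position has to be examined to find the closest match, and the factor needs a known upper bound so the max weight can be scaled by it.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not a Xapian API): the smallest window, measured as
// max position - min position, that holds at least one position from every
// term's sorted position list. Returns -1 if there is no such window.
long min_span(const std::vector<std::vector<unsigned>>& positions) {
    if (positions.empty()) return -1;
    for (const auto& p : positions) {
        if (p.empty()) return -1;
    }
    std::vector<std::size_t> idx(positions.size(), 0);
    long best = std::numeric_limits<long>::max();
    while (true) {
        // Find the lists currently holding the lowest and highest positions.
        std::size_t lo = 0;
        unsigned lo_pos = positions[0][idx[0]];
        unsigned hi_pos = lo_pos;
        for (std::size_t i = 1; i < positions.size(); ++i) {
            unsigned p = positions[i][idx[i]];
            if (p < lo_pos) { lo_pos = p; lo = i; }
            if (p > hi_pos) hi_pos = p;
        }
        best = std::min(best, static_cast<long>(hi_pos) - static_cast<long>(lo_pos));
        // Only advancing the lowest position can shrink the window. Unlike a
        // fixed-window NEAR check there is no early exit on the first match.
        if (++idx[lo] == positions[lo].size()) break;
    }
    return best;
}

// Known upper bound of the factor (assumed shape: 1 / (1 + gap)).
constexpr double kMaxProximityFactor = 1.0;

// Non-negative factor: largest when the terms are adjacent, smaller as the
// gap between them grows.
double proximity_factor(long span, std::size_t n_terms) {
    if (span < 0 || n_terms == 0) return 0.0;
    long gap = std::max(0L, span - static_cast<long>(n_terms - 1));
    return 1.0 / (1.0 + static_cast<double>(gap));
}

// Scale the weight that the AND query would have returned for this document.
double near_weight(double and_weight,
                   const std::vector<std::vector<unsigned>>& positions) {
    return and_weight * proximity_factor(min_span(positions), positions.size());
}

// Scale the AND query's maximum weight by the factor's upper bound.
double near_maxweight(double and_maxweight) {
    return and_maxweight * kMaxProximityFactor;
}
```

With a bound of 1.0 the max weight is unchanged, which keeps the AND query's bound valid as it stands. A factor that could exceed 1.0 would need its own bound applied in the same place.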