2 Jun 2008 21:19
Re: Ordering search results and defining a custom Weight class in python
Robert Kaye <rob <at> eorbit.net>
2008-06-02 19:19:58 GMT
On May 30, 2008, at 6:21 PM, Olly Betts wrote:

> Yes, BM25Weight has several parameters which can be adjusted to change
> the emphasis of the weighting. If your documents are typically quite
> short, then you probably will get better results if you make the
> document length less important.

Awesome! Thanks for the excellent tip. With just a little tweaking, the search results have improved drastically.

I've asked for some help testing our new search service, and that has revealed that we're having problems tokenizing Chinese text properly. Our database can conceivably contain text in any language supported by Unicode, so we need to find a way to tokenize Chinese text properly.

I've seen a few posts from last year about a Chinese tokenization scheme, but I haven't found anything about it in the official docs. Is there a preferred way (in Python) to handle the tokenization of Chinese characters?

Thanks for your help!

-- 
--ruaok      Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye     --     rob <at> eorbit.net     --     http://mayhem-chaos.net
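The BM25 tip quoted above can be illustrated with the textbook Okapi BM25 term weight. The sketch below is a minimal, standalone illustration and not Xapian's own code. The helper name `bm25_term_weight` and its arguments are hypothetical, and Xapian's BM25Weight differs in details such as its IDF form and a minimum normalised length. The parameter `b` sets how strongly document length is normalised: at `b=0` length is ignored entirely, and larger values favour short documents over long ones more strongly. In Xapian's Python bindings, `b` is, as far as I know, the fourth argument of `xapian.BM25Weight(k1, k2, k3, b, min_normlen)`, applied via `Enquire.set_weighting_scheme()`. Check the API docs for your version before relying on that.

```python
import math


def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.0, b=0.5):
    """Textbook Okapi BM25 contribution of one term to one document.

    Hypothetical helper for illustration only; not Xapian's implementation.
    b in [0, 1] controls length normalisation: b=0 ignores document length,
    b=1 scales fully by doc_len / avg_doc_len.
    """
    # A common non-negative IDF variant (the +1 keeps the log >= 0).
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normaliser: reduces to k1 when b == 0, whatever the length.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    # Term-frequency saturation, scaled by IDF.
    return idf * tf * (k1 + 1) / (tf + norm)


if __name__ == "__main__":
    # Same term frequency in a short and a long document (made-up numbers).
    for b in (0.75, 0.25, 0.0):
        short = bm25_term_weight(tf=2, doc_len=10, avg_doc_len=100,
                                 n_docs=1000, doc_freq=10, b=b)
        long_ = bm25_term_weight(tf=2, doc_len=300, avg_doc_len=100,
                                 n_docs=1000, doc_freq=10, b=b)
        print(f"b={b}: short={short:.3f} long={long_:.3f}")
```

Running it shows the gap between the short and long document shrinking as `b` decreases and vanishing at `b=0`. That is the sense in which lowering `b` makes document length "less important."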