2 Jan 2006 08:26
Re: Query Scoring
Harini Raghavan <harini.raghavan <at> insideview.com>
2006-01-02 07:26:57 GMT
Yes, I was referring to how IDF is used in the Highlighter code to decide how to prioritize fragments of a document. My requirement is to show the relevant fragments of the news article for each company along with the search results, but the Highlighter API sometimes picks fragments that are not very relevant to the news article or company.

I would like to know if there is any way to modify the scoring/ranking of these fragments so that news items with the company name and keywords in the headline are assigned a very strong relevancy ranking, closely followed by a company-name mention in the first paragraph, and then multiple mentions within the entire story. Something like headline = 5 points, first paragraph = 4, etc.

Thanks,
Harini

markharw00d wrote:
> Sorry to contradict, Erik, but the Highlighter's QueryScorer will make
> use of IDF, given a reader, in order to better prioritise which are
> the "best" bits of a document.
> However, in the particular example given, the criteria include
> several non-text fields which are not useful for IDF and general
> scoring purposes - these are perhaps better expressed using a filter
> of some form. Otherwise, why should the scarcity of a particular date
> in the given range boost one matching document above others? These
> numeric-type fields are simply mandatory boolean "hygiene factors"
> and should ideally play no part in highlight selection or results
> ordering in general based on their IDF or TF.
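
For illustration, the point scheme asked about above (headline = 5, first paragraph = 4) could be sketched roughly as follows. This is plain Java and deliberately does not use the Lucene Highlighter API; the class name `PositionalScore`, its methods, the value of 3 points for multiple mentions, and the threshold of two mentions are all assumptions made for this sketch, since the post only says "etc." beyond the first two weights. It is not the Highlighter's own scoring.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper: scores a news item by where the company name appears.
final class PositionalScore {
    static final int HEADLINE_POINTS = 5;        // weight given in the post
    static final int FIRST_PARAGRAPH_POINTS = 4; // weight given in the post
    static final int MULTI_MENTION_POINTS = 3;   // assumption: the post only says "etc."

    // Counts non-overlapping, case-insensitive occurrences of term in text.
    static int countMentions(String text, String term) {
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Adds up the points for headline, first-paragraph and repeated mentions.
    static int score(String company, String headline, List<String> paragraphs) {
        int points = 0;
        if (countMentions(headline, company) > 0) {
            points += HEADLINE_POINTS;
        }
        if (!paragraphs.isEmpty() && countMentions(paragraphs.get(0), company) > 0) {
            points += FIRST_PARAGRAPH_POINTS;
        }
        int totalInBody = 0;
        for (String paragraph : paragraphs) {
            totalInBody += countMentions(paragraph, company);
        }
        if (totalInBody >= 2) { // assumption: "multiple" means at least two mentions
            points += MULTI_MENTION_POINTS;
        }
        return points;
    }
}
```

A score like this could be computed per news item and combined with, or used to re-sort, the search results; how best to feed it into fragment selection is left open here.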