New Idea on Ranking in IR
2011-04-01 09:18:28 GMT
I would like to discuss an idea on ranking in IR systems which I think could be a good extension to Xapian. If I am not too late to discuss it, please consider it. First, some brief background about me: I am a Masters student working on my thesis in Information Retrieval. Only today I received an email about GSoC, and Xapian in particular, from a professor in Europe whom I will be joining for my Ph.D.
Generally, ranking is unsupervised: the ranked list is produced from the scores assigned by a ranking function such as BM25 or TF-IDF, and the documents are returned in decreasing order of score.
Learning to rank, by contrast, involves supervised learning. If we can extract features for each pair of a query and an initially retrieved document, then we can learn which document should come above which. A search engine needs the relevant documents at the top, because users generally do not bother to click through to the next page of results; rather, they choose to modify the query.
In Learning to Rank (Letor) we prepare features that represent a query-document pair. After the initial retrieval we take, say, the first 20 or 30 documents and represent them as feature vectors; then, based on the training data, the supervised learner gives a score to each document for a particular query. For example, if the learner is a regression model, then we have to learn a weight vector 'W' which gives a score to a document's feature vector via the dot product.
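To make the dot-product scoring concrete, here is a minimal sketch in Python. It is not Xapian code: the function names, the weight values in W, and the three-element feature vectors are all made up for illustration, and in practice W would come out of a regression fit on training data.

```python
def dot(w, x):
    """Dot product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rerank(candidates, w):
    """Re-rank the initially retrieved documents.

    candidates: list of (doc_id, feature_vector) pairs, e.g. the top
                20 or 30 documents from the initial retrieval.
    w:          learned weight vector, same length as each feature vector.
    Returns (doc_id, score) pairs in decreasing order of score.
    """
    scored = [(doc_id, dot(w, feats)) for doc_id, feats in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights; a real W would be learned from training data.
W = [0.5, 0.3, 0.2]

# Hypothetical feature vectors for three retrieved documents.
candidates = [
    ("d3", [0.4, 0.4, 0.4]),
    ("d1", [1.0, 0.2, 0.1]),
    ("d2", [0.2, 0.9, 0.8]),
]

ranking = rerank(candidates, W)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Only the small candidate set is re-scored, so the cost of the learned model stays independent of the size of the whole index.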
Here the features can be term frequency, TF-IDF score, BM25 score and so on; the more, the better. For the learning step there are many machine learning techniques available.
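As a rough illustration of how such a feature vector could be built for one query-document pair, here is a small Python sketch. Everything in it is an assumption for the sake of the example: documents are plain lists of terms, the IDF is one common smoothed variant, BM25 uses the usual k1 and b parameters with default values I picked, and none of it reflects how Xapian computes its weights.

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency (one common variant, always > 0)."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def term_frequency(query_terms, doc_terms):
    """Total number of occurrences of the query terms in the document."""
    return sum(doc_terms.count(t) for t in query_terms)


def tfidf(query_terms, doc_terms, docs):
    """Sum of tf * idf over the query terms."""
    return sum(doc_terms.count(t) * idf(t, docs) for t in query_terms)


def bm25(query_terms, doc_terms, docs, k1=1.2, b=0.75):
    """BM25 score of one document; k1 and b are assumed default values."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query_terms:
        tf = doc_terms.count(t)
        norm = tf + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf(t, docs) * tf * (k1 + 1) / norm
    return score


def features(query_terms, doc_terms, docs):
    """Feature vector for one query-document pair: [TF, TF-IDF, BM25]."""
    return [
        term_frequency(query_terms, doc_terms),
        tfidf(query_terms, doc_terms, docs),
        bm25(query_terms, doc_terms, docs),
    ]


# Toy collection, purely for illustration.
docs = [
    ["xapian", "ranking", "ranking"],
    ["cooking", "recipes"],
]
query = ["ranking"]
vectors = [features(query, d, docs) for d in docs]
print(vectors)
```

Each vector produced this way is what the learned model would then score for its query.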
_______________________________________________ Xapian-devel mailing list Xapian-devel <at> lists.xapian.org http://lists.xapian.org/mailman/listinfo/xapian-devel