1 Feb 2011 10:59
Re: Token position vs. token offset - how to bring them together?
Karolina Bernat <karolina.bernat <at> googlemail.com>
2011-02-01 09:59:29 GMT
Hi,

perhaps someone else is trying to do the same thing, so I'll just write down how I got on with this problem. It is NOT the most elegant solution, but it works for me. I don't know yet how my search will perform, but the tests look OK so far.

For my phrase search I actually used a SpanQuery. I read that the QueryParser can't handle this kind of query, so I build it manually by checking whether the user entered the search text within " ". When handling a SpanQuery you have access to the query's spans, and the spans give you the start and end positions of the searched words. Furthermore, I could find the positions and the offset information of the WeightedTerms by using QueryTermExtractor.getIdfWeightedTerms(...) and TermPositionVector.

Because there is no possibility (or none that I know of) of getting the offset information when you only know the term positions, I thought of saving all the term positions and all the term offset information. Since I get the span start and end positions from the SpanQuery, I can look up at which index in the positions array I find the position I got from the SpanQuery, and then go to my array with the term offset information and take the entry at the same index.

With this information I can get the start offset of the first term in the SpanQuery and the end offset of the last term, and I can highlight that range as one continuous block. That is really not the best way to proceed, but I couldn't find a better one. Please let me know if there is any other (better) way to do it.
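The lookup described above can be sketched roughly as follows. This is only an illustration in plain Java, not the actual code from my project: the class name SpanOffsetLookup and its methods are made up for this example, and plain int arrays stand in for the positions and offsets you would read from a TermPositionVector. It also assumes that the span end is exclusive (one past the last matched position), which is how I understand Spans.end() to behave; adjust the "- 1" if your spans are inclusive. Instead of searching the positions array each time, the sketch stores the same pairs in a map keyed by position.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: maps token positions to character offsets.
public class SpanOffsetLookup {

    // position -> {startOffset, endOffset} for every term occurrence added so far
    private final Map<Integer, int[]> offsetsByPosition = new HashMap<>();

    // Register one term's parallel arrays (positions[i] belongs to
    // startOffsets[i] / endOffsets[i]), e.g. as read from a term vector.
    public void addTerm(int[] positions, int[] startOffsets, int[] endOffsets) {
        for (int i = 0; i < positions.length; i++) {
            offsetsByPosition.put(positions[i],
                    new int[] {startOffsets[i], endOffsets[i]});
        }
    }

    // Assumption: spanEnd is exclusive, so the last matched token is spanEnd - 1.
    // Returns {startOffset, endOffset} of the whole span,
    // or null if one of the two positions was never registered.
    public int[] highlightRange(int spanStart, int spanEnd) {
        int[] first = offsetsByPosition.get(spanStart);
        int[] last = offsetsByPosition.get(spanEnd - 1);
        if (first == null || last == null) {
            return null;
        }
        return new int[] {first[0], last[1]};
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        SpanOffsetLookup lookup = new SpanOffsetLookup();
        lookup.addTerm(new int[] {1}, new int[] {4}, new int[] {9});   // "quick"
        lookup.addTerm(new int[] {2}, new int[] {10}, new int[] {15}); // "brown"

        // A span covering positions 1 and 2, i.e. start = 1, end = 3 (exclusive)
        int[] range = lookup.highlightRange(1, 3);
        System.out.println(text.substring(range[0], range[1])); // prints "quick brown"
    }
}
```

Note that only the terms of the query are registered here, so a span with slop whose first or last position is not a query term would come back as null and would need separate handling.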