Re: Re: [ no subject ]
Kamal Najib <kamal.najib <at> mytum.de>
2009-05-01 09:06:29 GMT
Thanks Anshum Gupta for the reply,
"As per my knowledge, you'd have to index one of the docs and then run a
query (second doc) to get the similarity score."
Which docs do you mean? Do you mean I have to create a doc for each vector, something like this:
Vector1 = <"term1","term2","term3"> ---> doc1.add(new Field("id", "term1" + " " + "term2" + " " + "term3", Field.Store.YES, Field.Index.TOKENIZED));
Vector2 = <"term4","term5","term6"> ---> doc2.add(new Field("id", "term4" + " " + "term5" + " " + "term6", Field.Store.YES, Field.Index.TOKENIZED));
or
Vector1 = <"term1","term2","term3"> --->
doc1.add(new Field("id", "term1", Field.Store.YES, Field.Index.TOKENIZED));
doc1.add(new Field("id", "term2", Field.Store.YES, Field.Index.TOKENIZED));
doc1.add(new Field("id", "term3", Field.Store.YES, Field.Index.TOKENIZED));
Vector2 = <"term4","term5","term6"> --->
doc2.add(new Field("id", "term4", Field.Store.YES, Field.Index.TOKENIZED));
doc2.add(new Field("id", "term5", Field.Store.YES, Field.Index.TOKENIZED));
doc2.add(new Field("id", "term6", Field.Store.YES, Field.Index.TOKENIZED));
and then get the similarity score between the two docs?
Please help.
Thanks in advance.
Kamal
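
For reference, the "regular VSM" score mentioned in the quoted reply can be sketched in plain Java without any index. This is only an illustration of cosine similarity over raw term counts, not what Lucene's default similarity computes (that one takes additional factors into account). The class name CosineSketch and its methods are hypothetical helpers, not part of Lucene.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Lucene API.
class CosineSketch {

    // Count how often each term occurs in a vector of terms.
    static Map<String, Integer> termFrequencies(List<String> terms) {
        Map<String, Integer> tf = new HashMap<String, Integer>();
        for (String t : terms) {
            Integer old = tf.get(t);
            tf.put(t, old == null ? 1 : old + 1);
        }
        return tf;
    }

    // Cosine of the angle between two term vectors, weighted by raw term counts.
    // Returns 0.0 when either vector is empty.
    static double cosine(List<String> a, List<String> b) {
        Map<String, Integer> tfA = termFrequencies(a);
        Map<String, Integer> tfB = termFrequencies(b);
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : tfA.entrySet()) {
            int wa = e.getValue();
            normA += wa * wa;
            Integer wb = tfB.get(e.getKey());
            if (wb != null) {
                dot += wa * wb; // only terms shared by both vectors contribute
            }
        }
        for (int wb : tfB.values()) {
            normB += wb * wb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        List<String> vector1 = Arrays.asList("term1", "term2", "term3");
        List<String> vector2 = Arrays.asList("term4", "term5", "term6");
        System.out.println(cosine(vector1, vector2));
    }
}
```

With the two vectors from the question, which share no terms, the score is zero. A score obtained by indexing one doc and querying with the other will generally differ from this value, because of the extra factors in the default similarity.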
Original Message:
As per my knowledge, you'd have to index one of the docs and then run a
query (second doc) to get the similarity score.
Also, the default similarity would take into account more factors than the
regular VSM hence, you'd even have to look into it.
You may write code that on the fly creates a volatile index, runs a query,