1 Sep 2005 18:15
Re: Italian Stemmer with C#
Martin Porter <martin.porter <at> grapeshot.co.uk>
2005-09-01 16:15:21 GMT
Federico,

The differences you note arise because alzare is a verb with a very short stem (alz), and my Italian stemmer demands a longer stem before it takes anything off. So the difference must be in how R1 and R2 are determined. Short verb stems are a problem for the stemmers in the Romance languages: rier in French, orare in Italian, etc.

If you believe you are getting better overall results with a different measure of R1 and R2, let me know the rules you are using!

Martin

------------------
>Dear Mr. Porter,
> I've found "Snowball" page during my search in the
>internet about available stemmer softwares.
>I've created (starting from a German version program
>on http://www.codeproject.com/csharp/destemming.asp)
>an Italian version using your rules describe in the
>page on italian language.
>
>After some tests between my code and your snowball
>results on the italian languages, I noticed some
>differences. There can be a little mismatch into the
>code (in mine program or in snowball)?
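
For readers following the thread, here is a minimal sketch of how R1 and R2 are usually determined in the Snowball stemmers: R1 is the region after the first non-vowel that follows a vowel, and R2 is the same rule applied again inside R1. The function names are hypothetical, and this is a simplification rather than the Italian stemmer itself: it leaves out the preprocessing the Italian stemmer applies before computing regions, as well as the separate RV region.

```python
# Vowel set assumed for Italian (plain and grave-accented vowels).
VOWELS = set("aeiouàèìòù")


def region_start(word, start=0):
    """Return the index just past the first non-vowel that follows a vowel,
    searching from `start`. Returns len(word) (an empty region) if no such
    position exists."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the R1 and R2 regions of `word` as a pair of strings."""
    r1 = region_start(word, 0)
    r2 = region_start(word, r1)  # same rule, applied inside R1
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ("alzare", "orare"):
        print(w, r1_r2(w))
```

For alzare this gives R1 = "zare" and R2 = "e", so the ending "are" does not fit inside either region. That is consistent with the point above that a short stem leaves too little room for the suffix to be removed; an implementation that measures the regions differently would strip it and produce a different stem.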