Martin Porter | 1 Sep 2005 18:15

Re: Italian Stemmer with C#


Federico,

The differences you note are because alzare is a verb with a very short stem
- alz - and my Italian stemmer demands a longer stem length before it takes
anything off. So the difference must be in determining R1 and R2.
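
For comparison, here is a minimal Python sketch of how R1 and R2 can be computed. It assumes the usual Snowball definition: R1 is the region after the first non-vowel that follows a vowel, and R2 is the same rule applied again inside R1. The vowel set is the one listed for Italian. The function names are hypothetical, and the sketch skips the preprocessing the full Italian stemmer does (accent normalisation, and the handling of u after q and of i/u between vowels), so treat it as a way to check a port rather than as the stemmer itself.

```python
# Vowels for Italian. Assumption: the input is already lowercase and the
# full stemmer's preprocessing steps have been skipped.
ITALIAN_VOWELS = set("aeiouàèìòù")


def region_start(word, start=0):
    """Return the index just past the first non-vowel that follows a vowel,
    searching from `start`; return len(word) if there is no such non-vowel."""
    for i in range(start + 1, len(word)):
        if word[i] not in ITALIAN_VOWELS and word[i - 1] in ITALIAN_VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the R1 and R2 regions of `word` as a pair of strings."""
    r1 = region_start(word, 0)
    # R2 is found by applying the same rule again, starting inside R1.
    r2 = region_start(word, r1)
    return word[r1:], word[r2:]


if __name__ == "__main__":
    for w in ("alzare", "orare"):
        print(w, r1_r2(w))
```

Under these assumptions alzare gives R1 = "zare" and R2 = "e", so very little of the word falls inside the regions where suffixes may be removed. If a port computes the regions differently, this is the first place to look.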

Short verb stems are a problem for the stemmers in the Romance languages:
rier in French, orare in Italian, etc.

If you believe you are getting better overall results with a different
measure of R1 and R2, let me know the rules you are using! 

Martin

------------------

>Dear Mr. Porter,
>  I found the "Snowball" page while searching the
>internet for available stemmer software.
>I've created (starting from a German version program
>on http://www.codeproject.com/csharp/destemming.asp)
>an Italian version using the rules described on your
>page on the Italian language.
>
>After some tests comparing my code with your Snowball
>results for Italian, I noticed some
>differences. Could there be a small mismatch in the
>code (in my program or in Snowball)?
>
>