Spanish stop words

Hello,

I am fairly new to Snowball and to this list, so please excuse my ignorance in advance.

I am planning to use the Spanish stemming algorithm and I would like to point out some words in the Spanish
stop word list which I think may be wrong.

line 123: vosostros | correct word would be 'vosotros' (you male plural)
line 124: vosostras | correct word would be 'vosotras' (you female plural)
                    | the following words are from the verb sentir (to feel) and not from the verb ser (to be)
line 294: sintiendo | correct word would be 'siendo'
line 295: sentido   | correct word would be 'sido' (same word for male/female/singular/plural)
line 296: sentida   | correct word would be 'sido' (same word for male/female/singular/plural)
line 297: sentidos  | correct word would be 'sido' (same word for male/female/singular/plural)
line 298: sentidas  | correct word would be 'sido' (same word for male/female/singular/plural)
line 299: siente    | correct word would be 'es'
line 300: sentid    | correct word would be 'sed' (the same word as for 'thirst')

I fixed the words in my file, but I think it would be useful to everyone to fix the original file too.

What would be the right procedure for doing that?

Thank you.
Julio Fraire | 6 Jun 2008 18:48

Re: Spanish stop words

Actually:

                    | the following words are from the verb sentir (to feel) and not from the verb ser (to be)
line 294: sintiendo | correct word would be 'siendo'
line 295: sentido   | correct word would be 'sido' (same word for male/female/singular/plural)
line 296: sentida   | correct word would be 'sido' (same word for male/female/singular/plural)
line 297: sentidos  | correct word would be 'sido' (same word for male/female/singular/plural)
line 298: sentidas  | correct word would be 'sido' (same word for male/female/singular/plural)
line 299: siente    | correct word would be 'es'
line 300: sentid    | correct word would be 'sed' (the same word as for 'thirst')

Those entries are correct as they stand. They are meant to be forms of the verb "sentir", not of "ser" as your correction assumes. The verb "ser" is covered in other parts of the stop list.

The first two words you mention are indeed a mistake (vosostros and vosostras).
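
For anyone patching a local copy in the meantime, here is a minimal sketch in Python of applying just those two fixes. It assumes each stop list line holds a single word, optionally followed by a `|` comment, as in the lines quoted above. The names `CORRECTIONS`, `fix_stop_list_line` and `fix_stop_list` are made up for illustration and are not part of Snowball.

```python
# Hypothetical helper, not part of Snowball: the only corrections applied are
# the two entries confirmed as typos in this thread.
CORRECTIONS = {"vosostros": "vosotros", "vosostras": "vosotras"}


def fix_stop_list_line(line, corrections=CORRECTIONS):
    """Return the line with a misspelled stop word replaced, keeping any comment."""
    # Assumed line format: "<word>   | <comment>", where the comment is optional.
    word_part, sep, comment = line.partition("|")
    word = word_part.strip()
    if word not in corrections:
        return line  # untouched: blank lines, comment-only lines, correct words
    fixed = corrections[word]
    if not sep:
        return fixed
    # Pad the corrected word so the '|' stays in its original column.
    return fixed.ljust(len(word_part)) + sep + comment


def fix_stop_list(text, corrections=CORRECTIONS):
    """Apply the corrections to every line of a stop list given as one string."""
    return "\n".join(fix_stop_list_line(line, corrections) for line in text.split("\n"))
```

The "sentir" entries (sintiendo, sentido, and so on) pass through unchanged, since they are not in the corrections table.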

Julio Fraire

On Fri, Jun 6, 2008 at 2:14 AM, Gonzalez, Francisco (C&I Spain) <francisco.gonzalez-pascual <at> hp.com> wrote:
> [...]

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss <at> lists.tartarus.org
http://lists.tartarus.org/mailman/listinfo/snowball-discuss

Martin Porter | 9 Jun 2008 16:22

Re: Spanish stop words


Thanks to Francisco Gonzalez and Julio Fraire for this. I'll sort it out
and put something new in place in the next few days,

Martin
Martin Porter | 23 Jun 2008 15:46

Re: Spanish stop words

Francisco,

I should have explained that a new Spanish stop list has been put in
place (on 16 June), with your corrections,

Martin
