Martin Porter | 5 Apr 20:06 2011

Re: Contributing to a Yiddish stemmer


Will,

Hi. Your request is a bit unusual ... let's see what I could suggest. 

If you're not a programmer, I would not advise trying to write programs.
What you might do is to formulate a set of rules for normalising the
vocabulary of Yiddish, and present it on the internet as a "challenge" for
others to code up. The rules could be set out like one of the stemmer
definitions on the Snowball site:

http://snowball.tartarus.org/algorithms/german/stemmer.html

I think you should also try to contact others with an interest in retrieval
of texts in Yiddish. Searching Google is perhaps the best way forward here.

I did not realise the stemming algorithms might be useful in translation.
I'm so involved in IR that I tend to think of them as just an adjunct to IR work.

I take it that willhelton.com is your eponymous website. I may mail again
after thinking it over further. Meanwhile, if you don't mind, I'll post this
to snowball-discuss, which sometimes generates extra useful ideas, and send it
to Pat Miles, who helped create the German and Russian stemmers at Snowball.

Martin

At 02:58 PM 4/5/2011 +0100, Will Helton wrote:
>
>Dear Dr Porter,
>

adriano allora | 29 Apr 06:35 2011

less than zero (enriching Italian stemmer)

Hi to all,

First of all, sorry: I'm not a programmer, and although I read the manual page I didn't understand how to do what I want to do. My English isn't very good either, so I hope I can explain what I need.
Well, I'd like to do something not too complicated: simply add some morphemes to the Italian stemmer, which at the moment doesn't stem superlative adjectives correctly. For example, bello and bellissimo are just two different forms of the same word:

bello (handsome, beautiful) -> bell
bellissimo (very handsome, very beautiful) -> bellissim

should be

bello (handsome, beautiful) -> bell
bellissimo (very handsome, very beautiful) -> bell
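To make the requested behaviour concrete, here is a minimal C sketch of the rule being asked for: remove an "issim" or "errim" ending followed by a final vowel, so that bellissimo reduces to bell. The function name strip_superlative is hypothetical and not part of Snowball. The sketch assumes single-byte (ISO-8859-1 style) text and ignores the R1/R2 regions, accented forms and every other step of the real Italian algorithm, so it only illustrates the rule and is not a patch to the Snowball stemmer.

```c
#include <string.h>

/* Hypothetical helper, not part of Snowball: remove an Italian superlative
   ending "issim" or "errim" followed by a final vowel (bellissimo -> bell).
   Assumes single-byte text and skips the R1/R2 checks a real stemmer uses. */
void strip_superlative(char *word)
{
    static const char *const infixes[] = { "issim", "errim" };
    size_t len = strlen(word);

    /* Require a stem of at least 2 letters + 5-letter infix + final vowel. */
    if (len < 8)
        return;
    if (strchr("aeio", word[len - 1]) == NULL)
        return;

    char *start = word + len - 6; /* where the 5-letter infix would begin */
    for (size_t i = 0; i < sizeof infixes / sizeof infixes[0]; i++) {
        if (strncmp(start, infixes[i], 5) == 0) {
            *start = '\0'; /* truncate the word just before the infix */
            return;
        }
    }
}
```

With this rule "bellissimo" and "bellissime" both become "bell", while "bello" is left untouched for the normal stemmer, which already reduces it to "bell".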

So I opened the files stem_ISO_8859_1_italian.c and stem_UTF8_italian.c and modified this block:

static const symbol s_4_0[2] = { 'i', 'c' };
static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' };
static const symbol s_4_2[2] = { 'o', 's' };
static const symbol s_4_3[2] = { 'i', 'v' };

static const struct among a_4[4] =
{
/*  0 */ { 2, s_4_0, -1, -1, 0},
/*  1 */ { 4, s_4_1, -1, -1, 0},
/*  2 */ { 2, s_4_2, -1, -1, 0},
/*  3 */ { 2, s_4_3, -1, 1, 0}
};

this way:

static const symbol s_4_0[2] = { 'i', 'c' };
static const symbol s_4_1[4] = { 'a', 'b', 'i', 'l' };
static const symbol s_4_2[2] = { 'o', 's' };
static const symbol s_4_3[2] = { 'i', 'v' };
static const symbol s_4_4[5] = { 'i', 's', 's', 'i', 'm' };
static const symbol s_4_5[5] = { 'e', 'r', 'r', 'i', 'm' };

static const struct among a_4[6] =
{
/*  0 */ { 2, s_4_0, -1, -1, 0},
/*  1 */ { 4, s_4_1, -1, -1, 0},
/*  2 */ { 2, s_4_2, -1, -1, 0},
/*  3 */ { 2, s_4_3, -1, 1, 0},
/*  4 */ { 2, s_4_4, -1, -1, 0},
/*  5 */ { 2, s_4_5, -1, -1, 0}
};

I hoped that just adding two new morphemes would be enough. I rebuilt and reinstalled the PyStemmer module, but it doesn't work.
Can someone help me? Where did I make my mistake?
Please don't be too technical: you're writing to a guy who can just open a shell and type a few commands.

thank you!

alladr

_______________________________________________
Snowball-discuss mailing list
Snowball-discuss <at> lists.tartarus.org
http://lists.tartarus.org/mailman/listinfo/snowball-discuss
Martin Porter | 29 Apr 11:02 2011

Re: less than zero (enriching Italian stemmer)


Ciao, Adriano!

This problem has come up before. The generated tables in the C/Java code
have a structure that is determined by a clever algorithm for fast lookup
of the endings. If you add extra endings, the tables need to be completely
changed. The only way to extend the tables, therefore, is to alter the
Snowball source, download and compile the Snowball compiler, and generate
new C/Java code.
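A simplified C sketch of why a generated table cannot simply be appended to, assuming a plain binary search over endings. As far as I know, Snowball's generated struct among holds the fields visible in the quoted block (length, text, substring_i, result, function pointer), and its lookup also handles endings that are suffixes of one another via substring_i. The struct ending, cmp_from_end and find_ending below are hypothetical simplifications, not the Snowball runtime. Two points carry over. First, each entry's length field must equal the length of its text, whereas the added entries declare 2 for five-letter endings. Second, entries must stay sorted by their reversed text, so new endings have to go into the right slots and not at the end.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table entry. Snowball's generated struct among
   also has substring_i and a function pointer; they are omitted here. */
struct ending {
    int s_size;      /* length of s; must equal strlen(s) */
    const char *s;   /* the ending, e.g. "issim" */
    int result;      /* value returned when this ending matches */
};

/* Compare the reversed tail of word with the reversed ending, so that the
   ordering agrees with a table sorted by reversed ending text.
   Returns 0 if the ending matches the end of word, <0 or >0 otherwise. */
static int cmp_from_end(const char *word, size_t wlen, const struct ending *e)
{
    for (size_t i = 0; i < (size_t)e->s_size; i++) {
        if (i == wlen)
            return -1; /* word is shorter: it sorts before this ending */
        int a = (unsigned char)word[wlen - 1 - i];
        int b = (unsigned char)e->s[e->s_size - 1 - i];
        if (a != b)
            return a - b;
    }
    return 0;
}

/* Binary search for an ending of word. Assumes the table is sorted by
   reversed ending text and that no ending is a suffix of another one
   (Snowball handles that case via substring_i). Returns result or 0. */
int find_ending(const char *word, const struct ending *table, int n)
{
    size_t wlen = strlen(word);
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        int c = cmp_from_end(word, wlen, &table[mid]);
        if (c == 0)
            return table[mid].result;
        if (c < 0)
            hi = mid - 1;
        else
            lo = mid + 1;
    }
    return 0;
}

/* The four original endings plus the two new ones, sorted by reversed
   text: "ci" < "liba" < "mirre" < "missi" < "so" < "vi".
   The result codes are arbitrary labels for this sketch. */
const struct ending italian_endings[] = {
    { 2, "ic",    1 },
    { 4, "abil",  2 },
    { 5, "errim", 3 },
    { 5, "issim", 4 },
    { 2, "os",    5 },
    { 2, "iv",    6 },
};
```

With the table ordered like this, find_ending("bellissim", italian_endings, 6) finds "issim". With the same six entries in the order used in the message (issim and errim appended after iv), the search steps past them and returns 0. Generating the C from the edited .sbl source with the Snowball compiler is what builds such a table correctly.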

Martin

>....Well, I'd like to do something not so complicate: simply add some morphemes
>to the Italian stemmer ....
adriano allora | 29 Apr 16:19 2011

Re: less than zero (enriching Italian stemmer)

Hi Martin!

Wow, you've just opened up a new world for me! It's my first C script and, as you can see, I may be stupid, but I'm never a coward when I see something completely... °___°

No, OK, seriously: I downloaded a gzip archive named snowball_web_and_code, which contains all the source code for Snowball.
I opened it and saw several very interesting things (for instance, adding some stopwords, but there's no need to do everything now: there is time for further improvements), so thank you for this.
But I beg your pardon: now I'm not sure what I have to do.
1) Can I simply change the files stem_ISO_8859.sbl and stem_MS_DOS_LATIN.sbl in the directory named algorithms/italian? If not, where is the source file I have to change in order to add morphemes to the Italian algorithm?
2) After changing the algorithm, what exactly do I have to do? I assume that compiling (gcc -O -o Snowball compiler/*.c) will not by itself produce the Python module, and the guide to wrappers didn't help me. Hmm... is there a howto for cases like this?

Thank you for your patience. I imagine my questions seem silly to someone who knows all the things I don't, but for this kind of software it's probably necessary for programmers to mix with grammarians.

Thanks a lot for everything,

adriano

Neil Ghosh | 17 Apr 13:45 2011

sbStemmer not found

I am getting the following exception while running the Snowball analyzer:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: org.tartarus.snowball.ext.sbStemmer

Kindly suggest where I can download this class.


--
Thanks and Regards
Neil
http://neilghosh.com


