Martin Porter | 1 Jun 2013 09:31

Re: A stemmer for latvian

Can someone else help with or comment on this? The libstemmer driver
was put together by Richard Boulton, and I myself have never actually
used it, so I'm not best placed to assist.

Vitālijs, it does sound like a problem in modifying the make script,
or something similar, and it might be easier to get help locally in
the Department in Latvia. -- Martin

On Fri, May 31, 2013 at 10:32 AM, Vitālijs Mikeļevičs
<vitalijs.mikelevics <at> gmail.com> wrote:
> Hello,
>
> I'm currently studying Computer Science at the University of Latvia and, as
> part of my bachelor's thesis, I'm recreating Karlis Kreslin's stemmer in
> Snowball for later use in Sphinx SE and (possibly) adding it to the Snowball
> project.
>
> Alas, I'm having problems running it:
> 1. I've downloaded "Snowball, algorithms, and libstemmer library." from
> http://snowball.tartarus.org/download.php
> 2. I've written stem_UTF_8.sbl and put it into algorithms/latvian
> 3. I've added Latvian to the list of languages in GNUmakefile and also added it to
> "other_languages", because it requires UTF-8
> 4. I've added Latvian to libstemmer/modules.txt and modules_utf8.txt as
> latvian --> UTF_8 --> latvian,lv
> 5. I run "make", everything compiles, headers are created, modules are
> updated, libstemmer/mkinc.mak is updated, yet... when I run "stemwords -l
> latvian" it tells me that "language 'latvian' is not available for
> stemming". I've tried what seems to be everything, yet it still somehow
> doesn't work.
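
One way to sanity-check step 4 is to parse the module list and confirm which names end up registered. The following is a minimal sketch, not part of the Snowball build. It assumes each non-comment line of libstemmer/modules.txt holds three whitespace-separated fields (algorithm name, comma-separated encodings, comma-separated names). The helper parse_modules and the SAMPLE text are hypothetical, and the check does not by itself explain the error reported above.

```python
def parse_modules(text):
    """Parse modules.txt-style text into {name: (algorithm, [encodings])}.

    Assumed format: three whitespace-separated fields per line,
    with '#' starting a comment.
    """
    registry = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {raw!r}")
        algorithm, encodings, names = fields
        for name in names.split(","):
            registry[name] = (algorithm, encodings.split(","))
    return registry


# Hypothetical sample content, for illustration only.
SAMPLE = """\
# algorithm   encodings            names
english       UTF_8,ISO_8859_1     english,en,eng
latvian       UTF_8                latvian,lv
"""

registry = parse_modules(SAMPLE)
```

Under this assumed format, a line containing literal "-->" separators splits into five fields and is rejected, so it is worth checking that the arrows in step 4 stand for whitespace in the actual file.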

Martin Porter | 1 Jun 2013 09:21

Re: Bad Stem: university; universe; universal -> univers

Clem,

You are the first person to point that one out to me. Thanks. I will
look into it, and there may be an upgrade in the future --- Martin
jf | 24 May 2013 09:07

English stemmer and 'ian' suffix

Hello,

Is there a reason why the English stemmer does not seem to
handle an 'ian' suffix: politician, orwellian, keynesian...?

I guess this question has already come up?

Regards,

J.F. Dockès
Martin Porter | 17 Apr 2013 07:47

Re: Porter stemming not dealing well with -gist -gists

Hi Marc,

The "Porter stemmer" is now frozen in time, but this could be
something to add to the snowball "English stemmer". It's connected to
the fact that -ist is not, in general, removed. -ologist -> olog
could, however, be added.

I've put your suggestion on the snowball-discuss list,

Thanks, Martin

On Tue, Apr 16, 2013 at 6:29 PM, Marc Schipperheijn
<m.schipperheyn <at> gmail.com> wrote:
> Hi,
>
> The Oncologist reviewed the results and decided oncology was not for him.
>
> In this example, the stemming filter doesn't correctly identify the stem of
> Oncologists as oncol. Since there is a whole class of -gists in the world,
> particularly in the medical world, this seems an omission.
>
> Perhaps a next version of the algorithm can deal with this.
>
> Kind regards,
> Marc
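
The "-ologist -> olog" mapping mentioned in the reply can be sketched as a small rewrite step. This is only an illustration under stated assumptions: fold_ologist is a hypothetical helper, the plural form "-ologists" is included as a guess at what such a rule would cover, input is assumed to be lower-cased already, and nothing here is taken from the Snowball English stemmer.

```python
def fold_ologist(word):
    """Hypothetical rule: rewrite -ologist / -ologists to -olog.

    Words that merely end in -ist (e.g. "artist") are left alone,
    matching the remark that -ist is not, in general, removed.
    """
    for suffix in ("ologists", "ologist"):  # longer suffix first
        # Require at least one character before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + "olog"
    return word
```

With this rule, "oncologist" and "oncologists" both become "oncolog", while "artist" is returned unchanged.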
Hajime Senuma | 14 Apr 2013 20:16

Snowball Plugin for Pygments Syntax Highlighter

Hello all,

I wrote a Snowball lexer plugin for Pygments, a popular syntax highlighter written in Python:

For ease of implementation, the lexer has one known defect: it assumes the stringescapes characters are curly brackets. It should still work for most Snowball code.
I would be glad if you could use this to make code in your HTML documents more readable.
I'm also happy to receive any suggestions before I commit the lexer to the official Pygments repo.

Below are two output examples of Pygments Snowball Plugin.

Default(-styled) Porter Snowball:

Colorful Russian Snowball (far more vivid than colorless green):

Thanks,
Hajime
_______________________________________________
Snowball-discuss mailing list
Snowball-discuss <at> lists.tartarus.org
http://lists.tartarus.org/mailman/listinfo/snowball-discuss
Adrien Grand | 6 Apr 2013 16:54

Adding "cela" to the list of French stop words

Hi,

I am working on Lucene and Solr, which use Snowball's stop word lists
for their analyzers. One of our users, Pierre Kobylanski, suggested
adding "cela" to the list of French stop words:
  https://issues.apache.org/jira/browse/LUCENE-4911

I thought you might be interested in adding it to
http://svn.tartarus.org/snowball/trunk/website/algorithms/french/stop.txt
as well. Don't hesitate to let me know if I can help get this
included.

Best regards,

--
Adrien
Martin Porter | 2 Apr 2013 10:45

Re: JavaScript version of the stemmer

> In passing, I did not find the link below at what I thought might be the main page for this work: http://snowball.tartarus.org/index.php.

It is a bit confusing, but the snowball system is a later development.
Snowball goes back to 2002; the Porter stemmer page was put up some
time in the 1990s, I can't recall exactly when.
John Gage | 31 Mar 2013 22:38

JavaScript version of the stemmer

Is there a JavaScript version of the stemmer lurking out there?  Please note that there is a server and a client/browser version of JavaScript, the former being nodejs.

John Gage
Shrinivasan T | 27 Mar 2013 13:05

Make the list archives public

Hello all,

I am new to the list.

I am wondering why our list archives are private?
http://lists.tartarus.org/mailman/private/snowball-discuss/

Please make them public so that they can reach everyone.

Thanks.

-- 
Regards,
T.Shrinivasan

My Life with GNU/Linux : http://goinggnu.wordpress.com
Free/Open Source Jobs : http://fossjobs.in

Get CollabNet Subversion Edge :     http://www.collab.net/svnedge
Shrinivasan T | 27 Mar 2013 13:03

Fwd: How to add Tamil Support to stemmer?

Hi Martin,

Thanks for the reply.

>
> What is supplied there is a stemmer written in snowball, rather than a
> patch. To start using it, I'd just download the snowball compiler,
> compile it into C or java, and follow the instructions for running it.
> Best help would probably come from the author (R. Damodharan?).
>

The patch adding a stemmer for the Tamil language is here:
https://github.com/rdamodharan/tamil-stemmer/blob/master/snowball-tamil.patch

We apply the patch and compile the stemmer to make it work with Tamil.

How can we get the patch added to the upstream stemmer?

--
Regards,
T.Shrinivasan

My Life with GNU/Linux : http://goinggnu.wordpress.com
Free/Open Source Jobs : http://fossjobs.in

Get CollabNet Subversion Edge :     http://www.collab.net/svnedge
Shrinivasan T | 27 Mar 2013 13:03

Fwd: How to add Tamil Support to stemmer?

Forwarding Martin's reply to the list.

---------- Forwarded message ----------
From: Martin Porter <martin.f.porter <at> gmail.com>
Date: Wed, Mar 27, 2013 at 4:25 PM
Subject: Re: [Snowball-discuss] How to add Tamil Support to stemmer?
To: Shrinivasan T <tshrinivasan <at> gmail.com>

T.Shrinivasan,

Thank you for telling us about this. It is very interesting to see
snowball being used for one of the many languages of the Indian
sub-continent.

What is supplied there is a stemmer written in snowball, rather than a
patch. To start using it, I'd just download the snowball compiler,
compile it into C or java, and follow the instructions for running it.
Best help would probably come from the author (R. Damodharan?).

I don't know if anyone else on snowball-discuss wants to add to that.

But looking at the source, the long sequences

    string or string or string ....

really should be replaced by 'among' expressions. As well as looking
tidier, it will then run a zillion times faster.

Martin
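
The difference between a long chain of alternatives and a table lookup can be sketched outside Snowball. The Python below is an analogy only and does not reproduce the Snowball compiler or runtime: the suffix list is hypothetical, match_chain mimics trying "string or string or string" one alternative at a time, and match_table stands in for an 'among'-style lookup by probing a prebuilt set from the longest candidate length down.

```python
# Hypothetical suffix list, ordered longest first so that both
# strategies below pick the same (longest) match.
SUFFIXES = ["ational", "ation", "ness", "ing", "s"]


def match_chain(word, suffixes):
    """Try each suffix in turn; cost grows with the number of suffixes."""
    for suffix in suffixes:
        if word.endswith(suffix):
            return suffix
    return None


def build_table(suffixes):
    """Precompute a set of suffixes and the longest suffix length."""
    return set(suffixes), max(len(s) for s in suffixes)


def match_table(word, table, max_len):
    """Probe the table from the longest possible ending downwards.

    The number of probes depends on max_len, not on how many
    suffixes the table holds.
    """
    for n in range(min(max_len, len(word)), 0, -1):
        if word[-n:] in table:
            return word[-n:]
    return None


TABLE, MAX_LEN = build_table(SUFFIXES)
```

For "relational", both match_chain(word, SUFFIXES) and match_table(word, TABLE, MAX_LEN) return "ational". The chain's cost scales with the length of the suffix list, while the table version stays bounded by the longest suffix, which is the kind of saving the advice above is pointing at.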

On Wed, Mar 27, 2013 at 9:16 AM, Shrinivasan T <tshrinivasan <at> gmail.com> wrote:
> Hello All,
>
> Tamil is a language spoken in India.
> http://en.wikipedia.org/wiki/Tamil_language
>
> One of my friends created a patch to snowball for the Tamil language.
>
> We can get the patch from here.
> https://github.com/rdamodharan/tamil-stemmer
>
> Please guide me on how to add Tamil language support to snowball,
> so that Tamil support will be available for python-stemmer too.
>
> Thanks.
>
> --
> Regards,
> T.Shrinivasan
>

-- 
Regards,
T.Shrinivasan

My Life with GNU/Linux : http://goinggnu.wordpress.com
Free/Open Source Jobs : http://fossjobs.in

Get CollabNet Subversion Edge :     http://www.collab.net/svnedge
