Jean-Christophe Helary | 5 Feb 09:11 2007

library for tokenization of natural languages ?

I am looking for a library that would do basic to reasonably smart  
tokenization of natural language strings.

For example, given English or French text, it would create tokens from
the things between the spaces; given Japanese, it would segment the
unspaced strings in a rule-based fashion.
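
To pin down the behaviour described above, here is a rough sketch in
Python. It is not taken from Lucene, Montezuma, or any other library:
the function names, the "ja" language code, and the "break where the
script changes" rule are all assumptions made for illustration only.

```python
import re

# Runs of characters from a single script (ranges are Unicode blocks).
# Anything outside these ranges, e.g. punctuation, is simply dropped.
_SCRIPT_RUNS = re.compile(
    r"[\u4e00-\u9fff\u3005]+"   # kanji, plus the iteration mark
    r"|[\u3040-\u309f]+"        # hiragana
    r"|[\u30a0-\u30ff]+"        # katakana, including the long-vowel mark
    r"|[A-Za-z0-9]+"            # Latin letters and digits
)


def tokenize_spaced(text):
    """Spaced languages: tokens are the things between the spaces."""
    return text.split()


def tokenize_unspaced(text):
    """Unspaced text: crude rule, start a new token at each script change."""
    return _SCRIPT_RUNS.findall(text)


def tokenize(text, lang):
    """Dispatch on a (hypothetical) language code; "ja" means Japanese."""
    if lang == "ja":
        return tokenize_unspaced(text)
    return tokenize_spaced(text)
```

The script-change rule is only a first approximation: it splits a word
such as 好き into two tokens, so anything "reasonably smart" would
probably need a dictionary or a morphological analyser on top of it.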

I think Lucene can do that, so Montezuma might be a candidate, but I
wonder whether any of you have experience with such tools, especially
for languages that do not use spaces.

Jean-Christophe Helary
Ian Eslick | 5 Feb 14:29 2007

Re: library for tokenization of natural languages ?

Check out langutils, and remind me to do another release soon!

On Feb 5, 2007, at 3:11 AM, Jean-Christophe Helary wrote:

> I am looking for a library that would do basic to reasonably smart
> tokenization of natural language strings.
>
> Like, if fed something in English or French, it creates tokens for
> the things between the spaces, for Japanese, it deals with the non-
> spaced strings in a rule based fashion.
>
> I think Lucene can do that and so montezuma would be a candidate (?),
> but I wonder if any of you has experience with such tools, especially
> for languages that do not use spaces.
>
> Jean-Christophe Helary
Jean-Christophe Helary | 6 Feb 02:12 2007

Re: library for tokenization of natural languages ?


On 5 Feb 07, at 22:29, Ian Eslick wrote:

> check out langutils and remind me to do another release soon!

Thanks!

Do another release soon!

:)

JC Helary

Peter Seibel | 22 Feb 16:12 2007

Re: A batteries package for Common Lisp? [long]

On Wed, 2006-12-20 at 14:32 -0800, Andrew Philpot wrote:
> Battery people:

Hey folks, sorry about the lengthy delay--this message got held up for
being too long and I just now got it out of jail. Sorry about that.

-Peter
