Language Identification
Following up on:
http://bloc.eurion.net/archives/2010/language-identification-and-free-software/
"""
F Wolff says:
September 27, 2010 at 10:32
We already implemented our own Python code based on the n-gram
technique in the Translate Toolkit, which is currently used by Virtaal
to help users do language selection. You can see the code here:
http://translate.svn.sourceforge.net/viewvc/translate/src/trunk/translate/lang/ngram.py
http://translate.svn.sourceforge.net/viewvc/translate/src/trunk/translate/lang/identify.py
It is based on a toy Python implementation we found at the time.
Please work with us to make the best Python language detection
available. Our code works well, but the models aren’t great, and there
are some languages it struggles to identify at the moment. We tried to
remove some of the incorrect models from our copy to try to improve
the quality. Let me know if you want to discuss some possibilities for
reuse/factoring out.
"""
I would like to factor the language detection out into a separate package.
Background: language detection is useful in many situations, and the existing
packages on PyPI are not up to par with your implementation.
For example, I need to detect the language of documents uploaded to a CMS
(editors simply do not fill in the metadata).
Proposal:
1) Release the language detection as is as an independent package that can
be installed from PyPI with easy_install or pip.
2) Add bigram language detection. This is useful because many languages can
a) be identified on bigrams alone, or b) be narrowed down to a short list of
candidate languages that are then compared by trigrams.
E.g. 'bork bork' would be narrowed down to (maybe) some European languages;
to determine the language more accurately, it would then not have to be
compared to the trigrams of, say, Russian or Chinese.
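To make point 2 more concrete, here is a rough sketch of the two-stage idea. It is not the Translate Toolkit code: the function names (ngrams, similarity, shortlist, identify), the cosine scoring, and the tiny training sentences are all assumptions made up for illustration, and real models would need far more text per language.

```python
from collections import Counter


def ngrams(text, n):
    """Count character n-grams; the text is lowercased and padded with spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def similarity(sample, profile):
    """Cosine similarity between two n-gram Counters (an assumed scoring choice)."""
    dot = sum(count * profile[gram] for gram, count in sample.items())
    norm_sample = sum(c * c for c in sample.values()) ** 0.5
    norm_profile = sum(c * c for c in profile.values()) ** 0.5
    norm = norm_sample * norm_profile
    return dot / norm if norm else 0.0


# Hypothetical toy training data, far too small for real use.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt und der faule hund sieht die katze",
    "sw": "mbweha mwepesi wa kahawia anaruka juu ya mbwa mvivu na paka",
}

BIGRAMS = {lang: ngrams(text, 2) for lang, text in TRAINING.items()}
TRIGRAMS = {lang: ngrams(text, 3) for lang, text in TRAINING.items()}


def shortlist(text, keep=2):
    """Stage 1: rank all languages by bigram similarity, keep the best few."""
    sample = ngrams(text, 2)
    ranked = sorted(
        BIGRAMS,
        key=lambda lang: similarity(sample, BIGRAMS[lang]),
        reverse=True,
    )
    return ranked[:keep]


def identify(text, keep=2):
    """Stage 2: compare trigrams only against the shortlisted languages."""
    sample = ngrams(text, 3)
    candidates = shortlist(text, keep)
    return max(candidates, key=lambda lang: similarity(sample, TRIGRAMS[lang]))
```

With many languages, the cheap bigram pass would keep the more expensive trigram comparison limited to a handful of candidates. Whether the shortlist size (keep) should be fixed or based on a score threshold is something I have not worked out.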
Please let me know what you think and whether you prefer a particular
location (e.g. your svn).
--
Best Regards,
Christian Ledermann
Nairobi - Kenya
Mobile : +254 702978914
<*)))>{
If you save the living environment, the biodiversity that we have left,
you will also automatically save the physical environment, too. But if
you only save the physical environment, you will ultimately lose both.
1) Don’t drive species to extinction
2) Don’t destroy a habitat that species rely on.
3) Don’t change the climate in ways that will result in the above.
}<(((*>