1 Feb 2010 10:23
[jira] Updated: (LUCENE-2055) Fix buggy stemmers and Remove duplicate analysis functionality
Robert Muir (JIRA) <jira <at> apache.org>
2010-02-01 09:23:51 GMT
[ https://issues.apache.org/jira/browse/LUCENE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2055:
--------------------------------
Attachment: LUCENE-2055.patch
Apologies for the large patch.
This patch does the following:
* Deprecates RussianTokenizer, RussianStemmer, RussianStemFilter, DutchStemmer, DutchStemFilter, FrenchStemmer, and FrenchStemFilter.
* Uses Snowball in the above analyzers instead, depending upon version.
* Does not deprecate GermanStemmer, but uses Snowball instead (which is maintained, relevance-tested, and supports things like u+umlaut = ue). The old stemmer is kept because it is a different (alternate) algorithm.
* DutchStemmer had 'dictionary based stemming override' support; to implement this, adds StemmerOverrideFilter, which does this in a generic way with KeywordAttribute.
* Adds KeywordAttribute support to SnowballFilter.
* Deprecates SnowballAnalyzer in favor of language-specific analyzers.
* Adds Romanian and Turkish stopword lists, since Snowball is missing them.
* Implements language-specific analyzers in place of all the ones Snowball tried to handle at once before.
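To illustrate the override idea from the list above, here is a minimal plain-Java sketch. It does not use the Lucene API. `Token`, `applyOverrides`, `stem`, and the toy "strip a trailing s" rule are hypothetical stand-ins for a token carrying a KeywordAttribute, a StemmerOverrideFilter-style stage, and a keyword-aware stemming stage. It assumes the override stage marks replaced terms as keywords and that the stemmer skips keyword tokens; the patch itself may differ in the details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch only: none of these types are Lucene classes.
class StemOverrideSketch {

    // Hypothetical stand-in for a token with a term and a keyword flag.
    static final class Token {
        String term;
        boolean keyword;
        Token(String term) { this.term = term; }
    }

    // Stage 1 (assumed behavior): replace terms found in the dictionary
    // and mark them as keywords so later stages leave them alone.
    static void applyOverrides(List<Token> tokens, Map<String, String> overrides) {
        for (Token t : tokens) {
            String replacement = overrides.get(t.term);
            if (replacement != null) {
                t.term = replacement;
                t.keyword = true;
            }
        }
    }

    // Stage 2: toy stemmer (NOT Snowball) that strips one trailing "s",
    // skipping any token already marked as a keyword.
    static void stem(List<Token> tokens) {
        for (Token t : tokens) {
            if (!t.keyword && t.term.length() > 1 && t.term.endsWith("s")) {
                t.term = t.term.substring(0, t.term.length() - 1);
            }
        }
    }

    // Runs both stages in order and returns the resulting terms.
    static List<String> analyze(List<String> words, Map<String, String> overrides) {
        List<Token> tokens = new ArrayList<>();
        for (String w : words) {
            tokens.add(new Token(w));
        }
        applyOverrides(tokens, overrides);
        stem(tokens);
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            out.add(t.term);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = Map.of("mice", "mouse", "news", "news");
        // prints [cat, mouse, news]
        System.out.println(analyze(List.of("cats", "mice", "news"), overrides));
    }
}
```

Without the "news" entry in the dictionary, the toy stemmer would reduce "news" to "new". Mapping a term to itself is one way to protect it from stemming in this sketch.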
> Fix buggy stemmers and Remove duplicate analysis functionality
> --------------------------------------------------------------
>
> Key: LUCENE-2055
> URL: https://issues.apache.org/jira/browse/LUCENE-2055
> Project: Lucene - Java