1 May 2006 05:48
[jira] Commented: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
Samphan Raruenrom (JIRA) <jira <at> apache.org>
2006-05-01 03:48:47 GMT
[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12377206 ]
Samphan Raruenrom commented on LUCENE-503:
------------------------------------------
> -It uses the English stop words, does that make sense?
Yes. Thai writers often mix English words into Thai text, so English stop words should apply, though this
is arguable. I'll consult with the developer community.
> -Could you write some test cases, similar maybe to those for the French analyzer?
OK. I'm thinking of writing them.
> Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
> ---------------------------------------------------------------
>
> Key: LUCENE-503
> URL: http://issues.apache.org/jira/browse/LUCENE-503
> Project: Lucene - Java
> Type: New Feature
> Components: Analysis
> Versions: 1.4
> Reporter: Samphan Raruenrom
> Attachments: ThaiAnalyzer.java, ThaiWordFilter.java
>
> Thai text doesn't have spaces between words. Usually, a dictionary-based algorithm is used to break a string
into words. For Lucene to be usable for Thai, an Analyzer that knows how to break Thai words is needed.
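
For readers unfamiliar with the idea, here is a minimal sketch of locale-aware word breaking. It is not the attached ThaiAnalyzer code: it uses the JDK's own java.text.BreakIterator as a stand-in for the ICU4J iterator, and the class name ThaiBreakSketch and the method words are made up for illustration. How finely the Thai run is split depends on the locale data shipped with the Java runtime, so no specific segmentation is claimed.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class, not part of Lucene or of the LUCENE-503 attachments.
class ThaiBreakSketch {

    // Split text into word tokens using the JDK's locale-aware word BreakIterator.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Keep only pieces containing a letter or digit; this drops spaces and punctuation.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "phasa thai" ("Thai language") written with Unicode escapes, followed by English words.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";
        System.out.println(words(thai + " search engine", Locale.forLanguageTag("th")));
    }
}
```

In an Analyzer, a filter would take each token produced by an ordinary tokenizer and re-split any Thai run in this way, leaving the English tokens untouched.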
> I've implemented such an Analyzer, ThaiAnalyzer, using ICU4J DictionaryBasedBreakIterator for word