1 Sep 2011 15:35
Re: How to add support of Chinese & Japanese
Olly Betts <olly <at> survex.com>
2011-09-01 13:35:50 GMT
On Thu, Jul 28, 2011 at 11:52:30AM +0800, Bruce Zhang wrote:
> As online materials said, seems Xapian is going to support CJK,
> so what's current status of supporting Chinese (simplified, traditional)?
> what's current status of supporting Korean, Japanese respectively?

There's the n-gram approach (ticket #180) which should work for any of
these languages. That's now merged to trunk and the 1.2 branch, but you
currently have to set an environment variable to enable it.

There's also the segmentation code for Chinese which Dai Youli has been
working on for GSoC, which we're hoping to get merged in fairly soon too.

As far as I know, nobody has worked on adding specific support for
segmenting Japanese or Korean (there was a potential GSoC applicant who
was looking at Japanese, but they didn't apply in the end).

> I downloaded xapian-core-1.2.6, xapian-omega-1.2.6, I saw from online
> document that Chinese Segment is in separate folder named segmentation,
>
> I wonder if Chinese segment code is in 1.2.6 or still beta release?

Neither of the approaches being worked on is in a release yet.

> how should I integrate segmentation code with xapian-core-1.2.6 and
> xapian-omega-1.2.6?

It'll need a fair bit of work to integrate it. The places you'd want to
hook in are similar to where the n-gram CJK code hooks in, if you want to
look into this.
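To make the n-gram idea concrete, here is a minimal Python sketch of character n-gram tokenisation for CJK text. It is not Xapian's code: the function names `is_cjk` and `ngram_terms` and the Unicode ranges chosen are illustrative assumptions. It only shows the general technique of indexing each CJK character plus every pair of adjacent characters, which lets a search match multi-character words without needing a word dictionary or a segmenter.

```python
# Hypothetical illustration of CJK n-gram tokenisation; not Xapian's implementation.

def is_cjk(ch: str) -> bool:
    """Rough check for characters from a few common CJK blocks (assumed ranges)."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def ngram_terms(text: str) -> list[str]:
    """Turn text into index terms.

    Runs of CJK characters are emitted as single characters (unigrams)
    plus each pair of adjacent characters (bigrams); everything else is
    lower-cased and split on whitespace.
    """
    terms: list[str] = []
    run: list[str] = []   # current run of CJK characters
    word: list[str] = []  # current run of non-CJK, non-space characters

    def flush_run() -> None:
        # Emit a unigram for every character and a bigram for every adjacent pair.
        for i, ch in enumerate(run):
            terms.append(ch)
            if i + 1 < len(run):
                terms.append(ch + run[i + 1])
        run.clear()

    def flush_word() -> None:
        if word:
            terms.append("".join(word).lower())
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run.append(ch)
        elif ch.isspace():
            flush_run()
            flush_word()
        else:
            flush_run()
            word.append(ch)
    flush_run()
    flush_word()
    return terms


print(ngram_terms("Xapian 搜索引擎"))
```

For example, `ngram_terms("Xapian 搜索引擎")` yields `xapian` followed by `搜`, `搜索`, `索`, `索引`, `引`, `引擎`, `擎`. The query text would need to go through the same tokenisation so that a query such as `引擎` hits the matching bigram term. The trade-off, compared with dictionary-based segmentation, is a larger index and some spurious matches where a bigram straddles a real word boundary.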