Olly Betts | 1 Sep 15:35 2011

Re: How to add support of Chinese & Japanese

On Thu, Jul 28, 2011 at 11:52:30AM +0800, Bruce Zhang wrote:
> As online materials said, seems Xapian is going to support CJK, 
> so what's current status of supporting Chinese(simplified, traditional)?
> what's current status of supporting Korean, Japanese respectively?

There's the n-gram approach (ticket#180) which should work for any of
these languages.  That's now merged to trunk and the 1.2 branch, but
you currently have to set an environment variable to enable it.
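
To give a rough idea of what an n-gram approach does, here is a minimal
Python sketch. It is an illustration under my own assumptions, not the code
from the ticket: the function names and the (deliberately incomplete) list
of Unicode ranges are made up for the example, and the real implementation
may emit different terms. As far as I know the environment variable is
XAPIAN_CJK_NGRAM, but please check the release notes rather than relying on
that.

```python
def is_cjk(ch):
    """Rough check: True for characters in a few common CJK Unicode blocks.

    Hypothetical helper for illustration; the ranges are not exhaustive.
    """
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def cjk_ngrams(text, n=2):
    """Return overlapping n-grams for each run of CJK characters in text.

    Non-CJK text is skipped; a run shorter than n is kept as a single term.
    """
    grams = []
    run = []
    for ch in text + " ":           # trailing space flushes the final run
        if is_cjk(ch):
            run.append(ch)
            continue
        if run:
            if len(run) < n:
                grams.append("".join(run))
            else:
                grams.extend(
                    "".join(run[i:i + n]) for i in range(len(run) - n + 1)
                )
            run = []
    return grams


print(cjk_ngrams("中文分词 test 日本語"))
```

The point is that no dictionary or language-specific segmenter is needed:
every adjacent pair of characters becomes a term, so the same code works for
Chinese, Japanese and Korean, at the cost of a larger index and some false
matches.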

There's also the segmentation code for Chinese which Dai Youli has
been working on for GSoC, which we're hoping to get merged in fairly
soon too.

As far as I know, nobody has worked on adding specific support for
segmenting Japanese or Korean (there was a potential GSoC applicant
who was looking at Japanese, but they didn't apply in the end).

> I downloaded Xapian-core-1.2.6, xapian-omega-1.2.6, I saw from online
> document that Chinese Segment is in separate folder named segmentation,
>
> I wonder if Chinese segment code is in 1.2.6 or still beta release?

Neither of the approaches being worked on is in a release yet.

> how should I integrate segmentation code with xapian-core-1.2.6 and
> xapian-omega-1.2.6?

It'll need a fair bit of work to integrate it.  The places you'd
want to hook in are similar to where the n-gram CJK code hooks in
if you want to look into this.

Olly Betts | 7 Sep 08:17 2011

Re: apply for attend xapian develope !

On Thu, Aug 25, 2011 at 10:55:35PM +0800, ????????? wrote:
> apply for attend xapian develope !

Hi,

I'm not exactly clear what you're asking, but if you saw our listing at
teachingopensource.org, then there's nothing to "attend" - we are simply
offering to help mentor new contributors to Xapian.  Such mentoring
happens remotely over the internet.

Cheers,
    Olly

Samuel Williams | 7 Sep 17:27 2011

Re: Problems with /bin/cat and flintlock?

Hi,

It seems like the latest patch has fixed the issue. I'm now running with
1.2.7 and don't seem to have the same deadlock in waitpid on either of my
servers.

I'm happy that the problem has been fixed, but even though I did spend some
time working on this, I couldn't figure out how to create a synthetic test
that revealed the problem.

Kind regards,
Samuel

On 2/06/2011, at 12:58 AM, Olly Betts wrote:

> On Sat, May 28, 2011 at 02:04:32AM +1200, Samuel Williams wrote:
>> In my synthetic test, I got the following behavior when attached to /bin/cat:
>> 
>> # strace -p 2091
>> Process 2091 attached - interrupt to quit
>> read(0, 0x89e7000, 32768)               = ? ERESTARTSYS (To be restarted)
>> --- SIGHUP (Hangup) @ 0 (0) ---
>> read(0, "", 32768)                      = 0
>> close(0)                                = 0
>> close(1)                                = 0
>> close(2)                                = -1 EBADF (Bad file descriptor)
>> exit_group(0)                           = ?
>> Process 2091 detached
>> 
>> This seems like the desired behavior.
> 
> So in this case, the restarted read() is reporting EOF on stdin.

Olly Betts | 14 Sep 07:21 2011

Re: Problems with /bin/cat and flintlock?

On Thu, Sep 08, 2011 at 03:27:46AM +1200, Samuel Williams wrote:
> It seems like the latest patch has fixed the issue. I'm now running
> with 1.2.7 and don't seem to have the same deadlock in waitpid on
> either of my servers.

Cool.

> I'm happy that the problem has been fixed, but even though I did
> spend some time working on this, I couldn't figure out how to create a
> synthetic test that revealed the problem.

Thanks for trying anyway.

Cheers,
    Olly
