Olly Betts | 1 Jul 16:49

Re: Remote Backend Question

On Sun, May 20, 2007 at 05:30:23PM -0400, Ram Peters wrote:
> Can you have two indexing processes on two different machines, accessing
> the backend on a different machine?

I just noticed this old message, and it doesn't seem to have been
replied to.

Xapian only supports a single writer on a database at a time, so you
can't have two remote clients writing to the same remote server at
once.

A single writable server can allow non-concurrent connections from
multiple clients.  A client will fail to connect if another client is
already writing, so you'll need to implement a "wait and retry"
strategy, and make sure your clients close the database when they
aren't using it.
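
A minimal sketch of such a "wait and retry" loop in Python, under stated
assumptions: `open_db` is a hypothetical callable that opens the writable
remote database, and `is_busy` is a hypothetical predicate that recognises
the "another client is already writing" failure.  Neither name comes from
Xapian itself, and which exception the bindings raise in that situation
should be checked against your version.

```python
import time


def open_with_retry(open_db, is_busy, attempts=5, delay=1.0, sleep=time.sleep):
    """Call open_db() until it succeeds or the attempts run out.

    open_db  -- hypothetical callable that opens the writable database
    is_busy  -- hypothetical predicate: True if the exception means
                "another writer holds the database, try again later"
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return open_db()
        except Exception as exc:
            # Give up on unrelated errors, or once the last attempt failed.
            if not is_busy(exc) or attempt == attempts - 1:
                raise
            sleep(delay)  # wait before the next attempt
```

Each client would wrap its open call this way, do its batch of updates,
and then release the database promptly so the next client's retry can
succeed.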

Cheers,
    Olly
Jim | 2 Jul 12:53

I would like to sort by a combination of date and relevance.

I've searched and saw a reference to a SORTBAND in the archives but 
don't see anything in the docs for OMEGA referencing it.  It seems like 
I saw a comment that it was being taken out.

Anyway, I have a client who would like to bias the relevance sort by
date somehow.  I don't see any way to do that in Omega, and I don't even
know what's reasonable.  I could see taking the MSD of the relevance and
averaging it with the inverse of the document's age, so that recent docs
with lesser relevance move closer to the top.  Some kind of floor on the
age term would be needed to keep old docs from being pushed too far down.
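
A rough sketch of that kind of blend in Python, applied as a re-ranking
step outside Omega.  Everything here is an assumption made for
illustration: the function name, the parameter names, the default values,
and the choice of 1 / (1 + age / half_age_days) as the "inverse of age".
It is one possible reading of the idea, not an Omega or Xapian feature.

```python
def blended_score(relevance, age_days, max_relevance,
                  half_age_days=30.0, recency_floor=0.2, weight=0.5):
    """Blend a relevance score with a recency term.

    relevance / max_relevance is scaled to 0..1; recency is 1.0 for a
    brand-new document and 0.5 at half_age_days.  recency_floor stops
    very old documents from sinking any further on age alone.
    """
    rel = relevance / max_relevance if max_relevance > 0 else 0.0
    recency = 1.0 / (1.0 + age_days / half_age_days)
    recency = max(recency, recency_floor)  # the floor on the age term
    return (1.0 - weight) * rel + weight * recency


# Hypothetical (relevance, age in days) pairs, sorted best first.
docs = [(80, 365), (60, 0), (50, 5000)]
ranked = sorted(docs, key=lambda d: blended_score(d[0], d[1], 100),
                reverse=True)
```

With these made-up numbers the fresh document with relevance 60 ranks
above the year-old one with relevance 80, while documents past the floor
are ordered by relevance alone.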

Has anyone actually done anything like that?

Thanks,
Jim.
Yan Hao | 4 Jul 11:54

Stemming problem

Does anyone know if Xapian's stemming supports the suffix -er? I tried -s
and -ing, and both work, but -er does not.

James Aylett | 4 Jul 12:23

Re: Stemming problem

On Wed, Jul 04, 2007 at 05:54:32PM +0800, Yan Hao wrote:

> Does anyone know if Xapian's stemming supports the suffix -er? I tried -s
> and -ing, and both work, but -er does not.

Do you have an example word where -er should be removed in stemming?

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
Olly Betts | 4 Jul 13:45

Re: Stemming problem

On Wed, Jul 04, 2007 at 05:54:32PM +0800, Yan Hao wrote:
> Does anyone know if Xapian's stemming supports the suffix -er? I tried -s
> and -ing, and both work, but -er does not.

Yes, suffix -er is removed in step 4:

http://snowball.tartarus.org/algorithms/english/stemmer.html

And "computer" stems to "comput" (as do "compute", "computation", etc).

There are rules about when steps are applied, so short words like "infer"
and "over" don't have -er removed.  Certainly in these two cases, removing
-er wouldn't be beneficial.
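
To make the rule concrete: in the English Snowball algorithm, step 4 only
deletes a suffix such as -er when it lies inside the region called R2.
R1 is the part of the word after the first non-vowel that follows a
vowel, and R2 is the same construction applied again inside R1.  The
sketch below computes those regions in Python.  It is a simplified
illustration, not the real stemmer: it treats a, e, i, o, u and y as
vowels throughout, skips the algorithm's special cases and earlier steps,
and the function names are made up for this example.

```python
VOWELS = set("aeiouy")


def region_start(word, start=0):
    """Index just past the first non-vowel that follows a vowel,
    searching from `start`; len(word) if there is no such position."""
    for i in range(start + 1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def r1_r2(word):
    """Return the (R1, R2) regions of a lower-case word as strings."""
    r1 = region_start(word, 0)
    r2 = region_start(word, r1)
    return word[r1:], word[r2:]


def er_in_r2(word):
    """True if the word ends in -er and that suffix lies within R2."""
    _, r2 = r1_r2(word)
    return word.endswith("er") and r2.endswith("er")


for w in ("computer", "tester", "master", "infer", "over"):
    print(w, r1_r2(w), er_in_r2(w))
```

For "computer", R1 is "puter" and R2 is "er", so the suffix qualifies for
removal.  For "tester", "master", "infer" and "over", R2 is empty, so -er
is left alone.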

Cheers,
    Olly
Yan Hao | 4 Jul 14:38

Re: Stemming problem

I created sample files containing "test", "tester", "testing" and "tests".
A query for "test" found all of them except "tester".

>From: James Aylett <james-xapian <at> tartarus.org>
>To: xapian-discuss <at> lists.xapian.org
>Subject: Re: [Xapian-discuss] Stemming problem
>Date: Wed, 4 Jul 2007 11:23:16 +0100
>
>On Wed, Jul 04, 2007 at 05:54:32PM +0800, Yan Hao wrote:
>
> > Does anyone know if Xapian's stemming supports the suffix -er? I tried -s
> > and -ing, and both work, but -er does not.
>
>Do you have an example word where -er should be removed in stemming?
>
>J
>
>--
>/--------------------------------------------------------------------------\
>   James Aylett                                                  xapian.org
>   james <at> tartarus.org                               uncertaintydivision.org

Richard Boulton | 4 Jul 14:43

Re: Stemming problem

Yan Hao wrote:
> I created sample files with test, tester, testing, tests. A query of 
> "test" could find all of them except "tester".

I'm not convinced that "tester" should stem to "test" - the meaning is
quite different.  Also, that would have to be a special case: for
example, a rule to convert "tester" to "test" would also convert
"master" to "mast" - which is definitely wrong.

-- 
Richard
James Aylett | 4 Jul 14:59

Re: Stemming problem

On Wed, Jul 04, 2007 at 01:43:27PM +0100, Richard Boulton wrote:

> > I created sample files with test, tester, testing, tests. A query of 
> > "test" could find all of them except "tester".
> 
> I'm not convinced that "tester" should stem to "test" - the meaning is
> quite different.  Also, that would have to be a special case: for
> example, a rule to convert "tester" to "test" would also convert
> "master" to "mast" - which is definitely wrong.

Most short -er words shouldn't stem the -er off, I suspect. (In
general, verbs?) I think we're stemming if the prefix >= 6 characters?
I don't really speak Snowball...

'Computer' is an interesting one. Stemming is doing semantic
conflation with 'compute'. Not sure if 'compute' is common enough we
should care, though.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
Richard Boulton | 4 Jul 15:03

Re: Stemming problem

James Aylett wrote:
> 'Computer' is an interesting one. Stemming is doing semantic
> conflation with 'compute'. Not sure if 'compute' is common enough we
> should care, though.

Well, given that compute is uncommon, but pretty much always related to 
computers when it does occur, I'm fairly sure it doesn't matter. :)
James Aylett | 4 Jul 15:05

Re: Stemming problem

On Wed, Jul 04, 2007 at 02:03:12PM +0100, Richard Boulton wrote:

> >'Computer' is an interesting one. Stemming is doing semantic
> >conflation with 'compute'. Not sure if 'compute' is common enough we
> >should care, though.
> 
> Well, given that compute is uncommon, but pretty much always related to 
> computers when it does occur, I'm fairly sure it doesn't matter. :)

Yeah, that's kind of what I meant. :)

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org
