Lucian Nicolescu | 1 Aug 15:03

Multithreaded read access

Hi all,

I have a Xapian index set up and a multithreaded xmlrpc server serving
results.

Inside the xmlrpc server I open the Xapian database for reading only once.
This forces requests to queue up for results, and a single operation can
sometimes take up to 1 second to complete (a slow operation includes text
analysis and some other work besides the actual Xapian interaction), which
delays all the queued requests.

I read a lot of documentation about Xapian in multithreaded environments and
concluded that, even for read access, one cannot open the same database more
than once and perform operations at the same time.

Because I did not find a straight answer to this, I am hoping one of you can
clear this up once and for all.

I am using Xapian 0.9.6, Python bindings and Twisted Python XMLRPC package.

Thanks,
Lucian Nicolescu 
Richard Boulton | 1 Aug 15:17

Re: Multithreaded read access

Lucian Nicolescu wrote:
> Inside the xmlrpc server I open the Xapian database for reading only once.
> This forces requests to queue up for results, and a single operation can
> sometimes take up to 1 second to complete (a slow operation includes text
> analysis and some other work besides the actual Xapian interaction), which
> delays all the queued requests.
> 
> I read a lot of documentation about Xapian in multithreaded environments and
> concluded that, even for read access, one cannot open the same database more
> than once and perform operations at the same time.

This is a misconception - for read access, you can open the same 
database as many times as you like concurrently (well, as many as your 
operating system will allow - eventually, you'll run out of file 
handles).  However, each of the Database objects you get by opening the 
database must not be accessed concurrently.  A typical model might be to 
create N threads, and use N Database objects, one for each thread.
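
A minimal sketch of the one-object-per-thread model described above, in Python. The class name `PerThreadHandle` and the `opener` parameter are illustrative assumptions, not part of Xapian; the opener would typically be something like `lambda: xapian.Database(path)`, with the path being whatever your index uses.

```python
import threading


class PerThreadHandle:
    """Give each thread its own handle, created lazily on first use."""

    def __init__(self, opener):
        # opener: zero-argument callable returning a fresh handle, e.g.
        # lambda: xapian.Database("/path/to/index")  (hypothetical path).
        self._opener = opener
        self._local = threading.local()

    def get(self):
        # threading.local keeps a separate attribute namespace per thread,
        # so a handle created here is never visible to another thread.
        handle = getattr(self._local, "handle", None)
        if handle is None:
            handle = self._opener()
            self._local.handle = handle
        return handle
```

Each request handler would call `get()` and use only the handle it receives, so no Database object is ever touched by two threads at once.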

> I am using Xapian 0.9.6, Python bindings and Twisted Python XMLRPC package.

Just to note - Xapian 0.9.6 is rather old - the latest release is 1.0.2.
1.0.2 has the benefit that the Python bindings are set up to release
the GIL whenever a potentially long-running Xapian operation is in
progress, which will give better performance in a concurrent environment.

-- 
Richard
Lucian Nicolescu | 1 Aug 15:37

RE: Multithreaded read access

Thanks a lot, things are clear now; this will definitely solve my problem.
I am planning an update to 1.0.2, but I need to test it thoroughly first and
reindex all documents.

Lucian Nicolescu  

> -----Original Message-----
> From: Richard Boulton [mailto:richard <at> lemurconsulting.com] 
> Sent: Wednesday, August 01, 2007 4:17 PM
> To: Lucian Nicolescu
> Cc: xapian-discuss <at> lists.xapian.org
> Subject: Re: [Xapian-discuss] Multithreaded read access
> 
> Lucian Nicolescu wrote:
> > Inside the xmlrpc server I open the Xapian database for reading only once.
> > This forces requests to queue up for results, and a single operation can
> > sometimes take up to 1 second to complete (a slow operation includes text
> > analysis and some other work besides the actual Xapian interaction), which
> > delays all the queued requests.
> > 
> > I read a lot of documentation about Xapian in multithreaded environments and
> > concluded that, even for read access, one cannot open the same database more
> > than once and perform operations at the same time.
> 

alan runyan | 1 Aug 15:23

Re: Multithreaded read access

> I read a lot of documentation about Xapian in multithreaded environments and
> concluded that, even for read access, one cannot open the same database more
> than once and perform operations at the same time.

You can open up multiple (ReadOnly) instances of the database. The problem is
the concurrency between write and read operations on different connections to a
Xapian database. If your read operations need to be in sync with committed
write operations, you will need to reopen the ReadOnly instances before each
read.
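
A rough sketch of the reopen-before-read idea, in Python. The wrapper name `RefreshingReader` and its parameters are assumptions made for illustration; the only thing it expects from the wrapped object is a `reopen()` method, which Xapian's `Database` provides. With the default `max_age_seconds=0` it reopens before every read; a larger value trades freshness for fewer reopens.

```python
import time


class RefreshingReader:
    """Wrap a read-only handle and call its reopen() when it may be stale."""

    def __init__(self, handle, max_age_seconds=0.0, clock=time.monotonic):
        # handle: any object with a reopen() method, such as a xapian.Database.
        # max_age_seconds=0 reopens before every read; 3600 would reopen hourly.
        self._handle = handle
        self._max_age = max_age_seconds
        self._clock = clock
        self._last_reopen = clock()

    def fresh(self):
        # Reopen only if at least max_age_seconds have passed since the last
        # reopen (or since construction), then hand back the same handle.
        now = self._clock()
        if now - self._last_reopen >= self._max_age:
            self._handle.reopen()
            self._last_reopen = now
        return self._handle
```

Since a Database object must not be shared between threads, each thread would need its own wrapper around its own handle.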

alan
Lucian Nicolescu | 1 Aug 18:06

RE: Re: Multithreaded read access

Took care of that, the database is reopened every 60 minutes.

Thanks,
Lucian Nicolescu  

> -----Original Message-----
> From: xapian-discuss-bounces <at> lists.xapian.org 
> [mailto:xapian-discuss-bounces <at> lists.xapian.org] On Behalf Of 
> alan runyan
> Sent: Wednesday, August 01, 2007 4:23 PM
> To: xapian-discuss <at> lists.xapian.org
> Subject: [Xapian-discuss] Re: Multithreaded read access
> 
> > I read a lot of documentation about Xapian in multithreaded environments and
> > concluded that, even for read access, one cannot open the same database more
> > than once and perform operations at the same time.
> 
> You can open up multiple (ReadOnly) instances of the database. The problem is
> the concurrency between write and read operations on different connections to a
> Xapian database. If your read operations need to be in sync with committed
> write operations, you will need to reopen the ReadOnly instances before each
> read.
> 
> alan

Kevin Duraj | 1 Aug 23:09

Xapian based spam filter using Bayesian algorithm.

Hi,

I am building a Xapian-based spam filter using a Bayesian algorithm. It
uses two separate search engines, one for the spam corpus and one for the
ham corpus, to efficiently determine whether a message is spam or ham. Let
me know if there is an existing spam filter implementation using Xapian, thanks.

Bayesian algorithm ...

p = Probability of term
s = Number of occurrences in Spam Corpus
m = Number of messages in Spam Corpus
h = Number of occurrences in Ham Corpus
n = Number of messages in Ham Corpus

                 (s / m)
  p = -----------------------------
      ( (s / m) + ( (h * 2) / n ) )
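
The formula above could be computed with a small helper like the following Python sketch. The function name is illustrative, and the handling of a term that appears in neither corpus (returning 0.5) is an assumption, since the formula itself is undefined in that case.

```python
def spam_probability(s, m, h, n):
    """p = (s / m) / ((s / m) + ((h * 2) / n)), using the symbols defined above."""
    if m <= 0 or n <= 0:
        raise ValueError("both corpora must contain at least one message")
    spam_rate = s / m
    ham_rate = (h * 2) / n
    if spam_rate + ham_rate == 0:
        # Assumption: a term seen in neither corpus is treated as neutral.
        return 0.5
    return spam_rate / (spam_rate + ham_rate)
```

For example, a term with 10 occurrences in 100 spam messages and 5 occurrences in 100 ham messages gives equal spam and ham rates (the ham count is doubled), so the result sits at the midpoint.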

-- 
Cheers,
   Kevin Duraj
   http://pacificair.com
David Morris | 3 Aug 08:24

Boolean merging of DB's

Hi,

I'm wondering if there is a xapian-compact-style tool that lets me do boolean
merging of 2 Xapian DBs into one DB.

Take this scenario: I have DB 'A' with 100 docs and DB 'B' with 30 docs. Most
of the docs in B will also be in A, but with newer content, so when I merge
them, I want to make a DB 'C' with everything in A, but with anything newer
that's contained in B overwriting the corresponding A entries.

Does that make sense?

David Morris | 3 Aug 08:42

Re: Boolean merging of DB's


David Morris <dmorris <at> sirca.org.au> writes:

> 
> Hi,
> 
> I'm wondering if there is a xapian-compact-style tool that lets me do boolean
> merging of 2 Xapian DBs into one DB.
> 
> Take this scenario: I have DB 'A' with 100 docs and DB 'B' with 30 docs. Most
> of the docs in B will also be in A, but with newer content, so when I merge
> them, I want to make a DB 'C' with everything in A, but with anything newer
> that's contained in B overwriting the corresponding A entries.
> 
> Does that make sense?
> 
> (captcha: compacts)
> 

Actually, please ignore me, this doesn't quite make sense. There's no way to
know the mapping of a document in one db to the "same" document in another db...

It's all good, I can make a custom merge app that can do this...
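
One possible shape for such a custom merge, sketched in Python under the assumption that every document carries a unique identifying term shared between the two DBs (which is exactly the mapping that is otherwise missing). The function name and the `(unique_term, doc)` pair layout are hypothetical; `target` only needs a `replace_document(unique_term, doc)` method, which Xapian's `WritableDatabase` offers. How the pairs are read out of A and B is left open.

```python
def merge_into(target, docs_a, docs_b):
    """Write all of A into target, then let B overwrite matching entries.

    docs_a, docs_b: iterables of (unique_term, doc) pairs (assumed layout).
    target: any object with replace_document(unique_term, doc).
    """
    # Order matters: B is applied last, so its version wins wherever a
    # unique term occurs in both inputs; terms only in A are kept as-is.
    for source in (docs_a, docs_b):
        for unique_term, doc in source:
            target.replace_document(unique_term, doc)
```

This treats anything present in B as newer than its counterpart in A, as in the scenario described; it does not compare timestamps.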
Andreas Marienborg | 6 Aug 08:16

Re: Re: Filtering Search Results By Date in PHP


On Jul 26, 2007, at 11:27 PM, Benny Chan wrote:

> OK,
>
> I found my mistake below. I actually left out the friggin'  
> queryparser:
>
>        $query_parser = new_queryparser();
>
>        $rp = new_datevaluerangeprocessor(1, true);
>        queryparser_add_valuerangeprocessor($query_parser, $rp);
>
> So now I can search for ranges and stuff. When I use a search  
> string like
>
> "1/1/2000..1/1/2007"
>
> I print out the parsed query and get:
>
> VALUE_RANGE 1 20000101 20070101
>
> I look at the term list for a document and it shows
>
> D20050101 1 1
> M200501 1 1
> Y2005 1 1
>
> So I'm thinking the documents are being indexed correctly. BUT I'm  
> still not

Charlie Hull | 7 Aug 16:02

Re: Re: Xapian pubmeet

Charlie Hull wrote:
> Fabrice Colin wrote:
>> On 7/25/07, Yung-chung Lin <henearkrxern <at> gmail.com> wrote:
>>> Hi,
>>>
>>> A gathering in Taipei? That would be great! I'll be there.
>>>
>> I should be able to make it too ! :-) I'll confirm closer to the date.
>>
>> Fabrice
>>
> 
> I've added this to the Wiki. Please edit and add to the Wiki page if you 
> can attend one of the meetings (we still need to confirm a date for 
> London).
> 
> Cheers
> 
> Charlie
> 
I suggest we fix the details of the London Xapian pubmeet as:

Thursday 13th September
from 6pm
at the Pembury Tavern http://www.individualpubs.co.uk/pembury/ (details 
on the site of how to get there)

Hope to meet some of you there!

Cheers