JENNINGS, Ashley | 4 Aug 2010 17:02

Xapian binary download

I have been looking for an open source search engine for use on a company intranet site to search certain file
repositories. I am not a software developer so compiling myself is not an option. Is there a binary
download available for use in JavaScript? Also, is Xapian able to index the contents of office documents?

Regards

Ashley

Ashley Jennings
Fuel Management & Indication - EDYUCC
07L Module 2
Phone: +44 (0)117 93 61448
Mobile: +44 (0)7725 082030

Charlie Hull | 5 Aug 2010 10:23

Re: Xapian binary download

On 04/08/2010 16:02, JENNINGS, Ashley wrote:
> I have been looking for an open source search engine for use on a company intranet site to search certain
> file repositories.
> I am not a software developer so compiling myself is not an option. Is
> there a binary download available for use in JavaScript?

There isn't any kind of JavaScript-compatible API for Xapian, I'm afraid.

> Also, is Xapian able to index the contents of office documents?

Not itself, but the Omega application and our own Flax application can 
do so.

If you're running on Windows you could try Flax Basic:
http://www.flax.co.uk/the_software

This is a (deliberately) very simple search application, but it will let 
you index Office documents, PDFs and HTML, and gives you a basic search 
front end. If you have any questions do contact us via the Flax website.

Regards

Charlie

Joost Cassee | 9 Aug 2010 14:41

File descriptor leak (?) in Python

Hi all,

Recently I have upgraded a Python application from Xapian 1.0.7 to
1.2.2 in order to use the PostingSource class. It is a long-running
process, and I am seeing the number of open file descriptors to the
Xapian database steadily increase. I suspect what I am seeing is some
kind of resource leak.

I have no idea if it is a problem in our code or in the Xapian Python
bindings. How do I debug this problem?

Regards,
Joost

-- 
Joost Cassee
http://joost.cassee.net
Olly Betts | 9 Aug 2010 16:17

Re: File descriptor leak (?) in Python

On Mon, Aug 09, 2010 at 02:41:03PM +0200, Joost Cassee wrote:
> Recently I have upgraded a Python application from Xapian 1.0.7 to
> 1.2.2 in order to use the PostingSource class. It is a long-running
> process, and I am seeing the number of open file descriptors to the
> Xapian database steadily increase. I suspect what I am seeing is some
> kind of resource leak.

We have machinery to report leaked file descriptors in the xapian-core
testsuite on Linux and some other platforms, so if there is a leak at
the C++ level, it is in a case that isn't covered by the testsuite, or
which is specific to certain non-Linux platforms.  Or the fd leak checking
machinery doesn't work fully!

> I have no idea if it is a problem in our code or in the Xapian Python
> bindings. How do I debug this problem?

If you're on Linux (or another platform which supports it) try:

ls -l /proc/PID/fd

where PID is the process id of your Xapian-using Python process.  This
will show all the fds it has open, and which files they are, which should
provide a clue.
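The same listing can also be taken from inside the process, for example to log the descriptor count over time. A minimal Python sketch, assuming Linux's /proc filesystem; the helper name `open_fds` is hypothetical, not part of Xapian:

```python
import os


def open_fds(prefix=None):
    """Return {fd: target path} for this process's open descriptors.

    Reads /proc/self/fd (Linux). If prefix is given, keep only descriptors
    whose target starts with it, e.g. a Xapian database directory.
    """
    fd_dir = "/proc/self/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor listdir() itself used is gone by now; skip it.
            continue
        if prefix is None or target.startswith(prefix):
            result[int(name)] = target
    return result
```

Calling `open_fds("/path/to/db")` (path hypothetical) at intervals and logging `len(...)` shows whether descriptors on the database files keep accumulating.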

Cheers,
    Olly
Joost Cassee | 9 Aug 2010 16:22

Re: File descriptor leak (?) in Python

Hi Olly,

On Mon, Aug 9, 2010 at 16:17, Olly Betts <olly <at> survex.com> wrote:
> On Mon, Aug 09, 2010 at 02:41:03PM +0200, Joost Cassee wrote:
>> I have no idea if it is a problem in our code or in the Xapian Python
>> bindings. How do I debug this problem?
>
> If you're on Linux (or another platform which supports it) try:
>
> ls -l /proc/PID/fd
>
> where PID is the process id of your Xapian-using Python process.  This
> will show all the fds it has open, and which files they are, which should
> provide a clue.

lsof (and /proc/PID/fd) reports lots of descriptors pointing to
various files from the Xapian database. It's just that I don't know
where in the code (Python or otherwise) these files were opened (and
should have been closed).

Regards,
Joost

-- 
Joost Cassee
http://joost.cassee.net
Joost Cassee | 10 Aug 2010 17:35

Re: File descriptor leak (?) in Python

Hi Michel,

On Mon, Aug 9, 2010 at 18:09, Michel Pelletier
<pelletier.michel <at> gmail.com> wrote:
> Is your code creating database objects in a loop that you are holding
> references to?  If so each object will hold an open fd to various
> database files.  If you don't close or release all references to a
> database then it will not be garbage collected and its fds will
> remain open.

Your comment has set me on the right track. It is a problem with a
circular reference. I am trying to extract a minimal example, but in
the meantime, how can this happen:

>>> import xapian, weakref, gc
>>> test = Test()
>>> ref = weakref.ref(test)
>>> test = Test()
>>> ref()
<Test object at 0xa16890c>
>>> len(gc.get_referrers(ref()))
1
>>> len(gc.get_referrers(gc.get_referrers(ref())[0]))
1
>>> len(gc.get_referrers(gc.get_referrers(gc.get_referrers(ref())[0])[0]))
1
>>> len(gc.get_referrers(gc.get_referrers(gc.get_referrers(gc.get_referrers(ref())[0])[0])[0]))
1
>>> gc.get_referrers(gc.get_referrers(gc.get_referrers(gc.get_referrers(ref())[0])[0])[0])[0]
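One way an object can outlive the name that pointed to it, as in the session above, is a reference cycle: CPython's reference counting cannot free a cycle, so the object survives until the cyclic garbage collector happens to run. A minimal sketch, assuming the cycle comes from a stored bound method; the `Test` class here is a hypothetical stand-in, since the real one is not shown in the thread:

```python
import gc
import weakref


class Test:
    """Hypothetical stand-in for the Test class in the session above."""

    def __init__(self):
        # A stored bound method refers back to self, and self refers to
        # the method: a cycle that reference counting alone cannot free.
        self.callback = self.handle

    def handle(self):
        pass


def demo():
    gc.disable()  # stop the cyclic collector from running mid-demo
    try:
        test = Test()
        ref = weakref.ref(test)
        test = Test()  # rebinding drops the name, but the cycle keeps
                       # the first object alive
        alive_before = ref() is not None
        gc.collect()   # the cyclic collector finds and frees the cycle
        alive_after = ref() is not None
    finally:
        gc.enable()
    return alive_before, alive_after


print(demo())  # (True, False) in CPython
```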

Olly Betts | 11 Aug 2010 06:29

Re: File descriptor leak (?) in Python

On Tue, Aug 10, 2010 at 05:35:28PM +0200, Joost Cassee wrote:
> I realize that this may no longer be a Xapian issue, but it only seems
> to happen with instances holding Xapian objects.

I don't know about the reference counting issues, but Xapian 1.2 added
a close() method to databases - if you call this then files are closed
and any write locks released, so delays in releasing objects are much less of
an issue (which is why it was added - in some languages garbage collection
happens at unpredictable times, and inadvertent references such as you seem to
have are easy to create and hard to track down).
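The pattern Olly describes is to release resources explicitly instead of waiting for the object to be collected. A minimal sketch using only the standard library: `IndexHandle` is a hypothetical stand-in for an object wrapping a database (with the real bindings, its `close()` would correspond to the database `close()` method added in 1.2), and it is deliberately placed in a reference cycle:

```python
import contextlib
import os
import tempfile


class IndexHandle:
    """Hypothetical stand-in for an object that wraps a database.

    It owns an open file and sits in a reference cycle, so reference
    counting alone will never free it; close() releases the file at once.
    """

    def __init__(self, path):
        self._fh = open(path, "rb")
        self.owner = self  # deliberate cycle, like an inadvertent reference

    @property
    def closed(self):
        return self._fh.closed

    def close(self):
        self._fh.close()


def search_once(path):
    # contextlib.closing() calls close() on exit, even if the body raises,
    # so the descriptor is not left waiting for the cyclic collector.
    with contextlib.closing(IndexHandle(path)) as handle:
        pass  # ... run queries against the handle here ...
    return handle


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        handle = search_once(path)
        return handle.closed  # closed even though the object is still alive
    finally:
        os.remove(path)


print(demo())
```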

Cheers,
    Olly
Joost Cassee | 11 Aug 2010 12:47

Re: File descriptor leak (?) in Python

Hi all,

On Wed, Aug 11, 2010 at 06:29, Olly Betts <olly <at> survex.com> wrote:
> On Tue, Aug 10, 2010 at 05:35:28PM +0200, Joost Cassee wrote:
>> I realize that this may no longer be a Xapian issue, but it only seems
>> to happen with instances holding Xapian objects.
>
> [...] in some languages garbage collection
> happens at unpredictable times, and inadvertent references such as you seem to
> have are easy to create and hard to track down).

If I call the garbage collector myself [gc.collect()], then the
database is closed. Additionally, I cannot seem to replicate this in a
new module. I have too little time to chase down this problem, and
(although it feels like defeat) I will just break the cycle with a
weak reference...
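Breaking a cycle with a weak reference lets plain reference counting free the object as soon as its last strong reference goes. A minimal sketch; `Database` and `Searcher` are hypothetical stand-ins for an object holding a Xapian database and an owner that registers itself with it, not classes from the thread:

```python
import gc
import weakref


class Database:
    """Hypothetical stand-in for an object that holds a database handle."""

    def __init__(self):
        # A WeakSet holds its members weakly, so registering an owner
        # here does not keep that owner alive.
        self.owners = weakref.WeakSet()


class Searcher:
    """Hypothetical owner that registers itself with its database."""

    def __init__(self, db):
        self.db = db         # strong: Searcher -> Database
        db.owners.add(self)  # weak:   Database -> Searcher (no cycle)


def demo():
    gc.disable()  # show that reference counting alone is enough
    try:
        searcher = Searcher(Database())
        ref = weakref.ref(searcher)
        del searcher
        return ref() is None  # freed immediately, no gc.collect() needed
    finally:
        gc.enable()


print(demo())  # True in CPython
```

With a plain list in place of the `WeakSet`, the two objects would form a cycle and `demo()` would return `False` until `gc.collect()` ran.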

Thanks, Olly and Michel, for thinking with me!

Regards,
Joost

-- 
Joost Cassee
http://joost.cassee.net
Luca Barbieri | 12 Aug 2010 10:53

thread locked while flushing to database

On a multicore Linux platform I'm running a simple C test program to
evaluate Xapian performance and explore the advantages of indexing in parallel.
I'm starting two threads, and each thread writes to its own database.

   main thread  -> indexing thread_1 -> db1
   (dispatcher) -> indexing thread_2 -> db2

I use sched_setaffinity to bind each indexing thread to a specific core.
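The layout described above (one dispatcher, one indexing thread per database) can be sketched as follows. This is a Python illustration only, since the original C program is not shown: the per-thread lists stand in for separate Xapian databases, round-robin dispatch is an assumption, CPU pinning with sched_setaffinity is omitted, and under CPython's GIL the sketch shows the structure rather than a parallel speed-up.

```python
import queue
import threading


def index_worker(inbox, db):
    """Consume documents until a None sentinel arrives.

    `db` is a plain list standing in for a per-thread writable database;
    each worker owns its own and never touches the other's.
    """
    while True:
        doc = inbox.get()
        if doc is None:
            break
        db.append(doc.lower())  # placeholder for real indexing work
    # A real program would flush/commit its own database here.


def dispatch(docs, n_workers=2):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    dbs = [[] for _ in range(n_workers)]
    threads = [
        threading.Thread(target=index_worker, args=(inboxes[i], dbs[i]))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for i, doc in enumerate(docs):
        inboxes[i % n_workers].put(doc)  # round-robin dispatch
    for inbox in inboxes:
        inbox.put(None)  # tell each worker to finish
    for t in threads:
        t.join()
    return dbs
```

For example, `dispatch(["A", "B", "C"])` sends "A" and "C" to the first worker and "B" to the second.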

During the indexing phase I see both cores running, but when my threads try to
flush to their databases, one of them keeps working while the other stops
executing (0% CPU); stracing its pid shows that it is blocked in a futex.

Why does this happen if the two databases are different objects (with
different paths)?
Am I doing something wrong?
This seems to happen with both xapian-core 1.1.3 and xapian-core 1.2.2, and
with different gcc versions.

Here are some details: strace output, gdb output, and top results:
http://pastebin.com/udGQTi6K

Cheers.

-- 
---------------------
Luca Barbieri
William Crawford | 12 Aug 2010 11:53

Re: thread locked while flushing to database

On Thursday 12 August 2010 09:53:42 Luca Barbieri wrote:

> Here are some details: strace output, gdb output, and top results:
> http://pastebin.com/udGQTi6K

The output shows the other thread is blocked trying to call a thread library 
function (pthread_setcanceltype) from the bowels of libstdc++. While this 
could be a Xapian bug at heart, it's not obvious without seeing more code what 
could be causing this (it's basically a threading deadlock of some sort).
