Sander Pilon | 2 May 19:00 2004

Perl binding: crash & missing functions?

Hi List, 

I've been playing with Xapian the last few days, and I got a few problems
with Perl.

First of all, when I add +/- 6000 documents (small ones, averaging less
than 200 words each), it crashes.
(It just quits with "Aborted".)

When I do this in batches of 500 (add 500, quit the process, add another
500, etc.), it doesn't.
Adding a flush() every few hundred documents, or even closing and
reopening the database, doesn't help. Help?

Anyway, the problem above isn't such a big deal. I can do it in batches of
500 docs. 

However, I want to use a boolean, unweighted query sorted on date (a value
key), with the most recently added documents at the top. Because the indexer
sometimes crashes (see the problem above), the same document can be present
more than once in the database, so I'll want to use the "set_collapse_key"
feature.

The problem is that the Perl binding doesn't seem to support
set_collapse_key and set_sort_key. I can call them (without errors) but they
don't seem to do anything. Could it be that I'm doing something wrong
somewhere, or are these functions really not supported? (And if so, are they
going to be added?)
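To pin down the result being asked for, here is a small pure-Python model (not the Search::Xapian API) of an unweighted match. It is sorted on a date value, newest first, and then collapsed so only the first document per collapse key is kept. The `Doc` fields, the YYYYMMDD date format, and the use of the URL as the collapse key are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    docid: int
    date: str  # hypothetical value slot: YYYYMMDD, so text order == date order
    url: str   # hypothetical collapse key: duplicate copies share the same URL


def sorted_collapsed(docs):
    """Model of an unweighted match sorted on a value, then collapsed.

    Orders by date, newest first (ties broken by higher docid, i.e. the most
    recently added copy), and keeps only the first document per collapse key.
    """
    ordered = sorted(docs, key=lambda d: (d.date, d.docid), reverse=True)
    seen = set()
    result = []
    for doc in ordered:
        if doc.url in seen:
            continue  # a better-ranked copy of this document was already kept
        seen.add(doc.url)
        result.append(doc)
    return result


docs = [
    Doc(1, "20040501", "http://example.com/a"),
    Doc(2, "20040502", "http://example.com/b"),
    Doc(3, "20040501", "http://example.com/a"),  # duplicate added by a re-run
]
print([d.docid for d in sorted_collapsed(docs)])  # [2, 3]
```

In this model the docid tie-break keeps the newest copy of a duplicated document. That is a choice made for the sketch, not a description of how Xapian orders ties.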

My code looks like this:
[...]

Olly Betts | 4 May 15:31 2004

Re: Perl binding: crash & missing functions?

On Sun, May 02, 2004 at 07:00:37PM +0200, Sander Pilon wrote:
> I've been playing with Xapian the last few days, and I got a few problems
> with Perl.
> 
> First of all, when I add +/- 6000 documents (small ones, avg. less than 200
> words) it crashes. 

This should work - I've added millions of documents in a single run from
C++ and never had a crash.

> (It just quits with "Aborted".)

There are a couple of abort()s in the code - in cases like "this should
never happen" buffer overflows.  You might be seeing an exception which
the perl bindings aren't catching, though it's odd that the problem goes
away with smaller batches.

I think we need to see a full example indexing script (and any sample data)
to be able to track this down.

> The problem is that the Perl binding doesn't seem to support
> set_collapse_key and set_sort_key. I can call them (without errors) but they
> don't seem to do anything.

These methods aren't currently wrapped.  It's not hard to add though,
and Alex is working on the Perl bindings this week so this should be
fixed soon.

I'm surprised you don't get an error - it's bad if someone can misspell a
method name and not be told.  Do you still get no warning or error with
[...]

Alex Bowley | 4 May 17:59 2004

Re: Perl binding: crash & missing functions?

On Tue, May 04, 2004 at 02:31PM, Olly Betts wrote:
> These methods aren't currently wrapped.  It's not hard to add though,
> and Alex is working on the Perl bindings this week so this should be
> fixed soon.

A new, albeit very under-tested, version of Search::Xapian is up. If
someone else can confirm that it compiles and passes its tests, I'll
upload it to CPAN.

http://hyperspeed.org/projects/perl/modules/Search-Xapian-0.8.0.0.tar.gz

-- 
Alex Bowley                                           http://hyperspeed.org/
"I loathe people who keep dogs. They are cowards who haven't the guts to
 bite people themselves."                                - August Strindberg

Sander Pilon | 4 May 21:58 2004

RE: Perl binding: crash & missing functions?


> -----Original Message-----
> From: Alex Bowley [mailto:alex <at> ixion.tartarus.org] On Behalf 
> Of Alex Bowley
> Sent: Tuesday, May 04, 2004 14:01
> To: Sander Pilon
> Subject: Re: [Xapian-discuss] Perl binding: crash & missing functions?
> 
> On Sun, May 02, 2004 at 07:00PM, Sander Pilon wrote:
> > First of all, when I add +/- 6000 documents (small ones, avg. less
> > than 200 words) it crashes.
> > (It just quits with "Aborted".)
> > 
> > When I do this in batches of 500, it doesn't. (add 500, quit process,
> > add another 500, etc) Adding a flush() every few hundred documents or
> > even closing and opening the database doesn't help. Help?
> 
> Hmmm. Which version of Xapian are you using? 0.8.0?
> Search::Xapian is 0.0.5, I assume?
> 

Correct. 

> Any chance you could mail me some sample code / input data? 
> (I'll understand if this is confidential)

Neither the code nor the data is confidential. It's just that the data is, well,
[...]

Sander Pilon | 4 May 22:01 2004

RE: Perl binding: crash & missing functions?


> -----Original Message-----
> From: xapian-discuss-admin <at> lists.xapian.org 
> [mailto:xapian-discuss-admin <at> lists.xapian.org] On Behalf Of Olly Betts
> Sent: Tuesday, May 04, 2004 15:31
> To: Sander Pilon
> Cc: xapian-discuss <at> lists.xapian.org
> Subject: Re: [Xapian-discuss] Perl binding: crash & missing functions?
> 
> On Sun, May 02, 2004 at 07:00:37PM +0200, Sander Pilon wrote:
> > I've been playing with Xapian the last few days, and I got a few 
> > problems with Perl.
> > 
> > First of all, when I add +/- 6000 documents (small ones, avg. less
> > than 200 words) it crashes.
> 
> This should work - I've added millions of documents in a 
> single run from C++ and never had a crash.
> 

I have no doubt this should work :) 

> > (It just quits with "Aborted".)
> 
> There are a couple of abort()s in the code - in cases like 
> "this should never happen" buffer overflows.  

You know what they say.... "Expect the unexpected!".

Olly Betts | 5 May 01:55 2004

Re: Perl binding: crash & missing functions?

On Tue, May 04, 2004 at 09:58:17PM +0200, Sander Pilon wrote:
> Could it be unicode-related? (The documents I'm trying to index could
> contain unicode (UTF-8))

Xapian doesn't really care what is in terms or data (except for the
stemmers of course).  It's 8-bit clean, and should also be zero byte
clean except that zero bytes take up extra room in the internal storage
scheme for terms, so a term with zero bytes can't be as long as one
without.

> Are there certain terms Xapian doesn't like?

There's a limit on the term length - it's slightly over 240 bytes (I don't
recall the exact value offhand).  Each zero byte counts double so a term
of all zero bytes can't be more than just over 120 bytes.
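The following is a rough way to pre-check terms before indexing. It counts each zero byte twice, as described. The 240-byte limit is an assumption, since only "slightly over 240" is given. The limit is in bytes, so UTF-8 text with accented characters uses more of it than its character count suggests.

```python
# Assumed limit: the thread says "slightly over 240 bytes" without the exact
# value, so 240 is used here as a conservative stand-in.
ASSUMED_MAX_TERM_BYTES = 240


def effective_term_length(term: bytes) -> int:
    """Bytes a term uses against the limit; each zero byte counts double."""
    return len(term) + term.count(b"\x00")


def term_fits(term: bytes, limit: int = ASSUMED_MAX_TERM_BYTES) -> bool:
    """True if the term should stay under the (assumed) length limit."""
    return effective_term_length(term) <= limit


print(term_fits(b"x" * 240))                          # True
print(term_fits(b"\x00" * 120))                       # True: counts as 240
print(term_fits(b"\x00" * 121))                       # False: counts as 242
print(effective_term_length("café".encode("utf-8")))  # 5: 'é' is 2 bytes in UTF-8
```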

The limit actually comes from the keys of the posting list B-tree
inside the quartz backend - for a common term, the list is split into
chunks, and these are keyed on the termname and first document id
in the chunk.

There's currently an odd effect where the exact length limit depends on
the encoded length of this document id (this should really be fixed by
enforcing a standard limit rather than letting the B-tree catch it).
Perhaps that's what you're hitting, and why running the indexer multiple
times avoids the problem (because the documents are added in a different
order).
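This is a toy model of that effect. Each chunk key is built from the term name followed by an encoding of the chunk's first document id. The 7-bits-per-byte encoding and the 245-byte key limit are hypothetical; quartz's real key format and limit are not given here. The point is only that the same term can fit or fail depending on which docid starts the chunk, and that docid depends on the order in which documents were added.

```python
# Hypothetical key limit for this toy model (not quartz's real figure).
HYPOTHETICAL_MAX_KEY_BYTES = 245


def encode_docid(did: int) -> bytes:
    """Toy variable-length encoding: 7 bits per byte, high bit = more follow."""
    out = bytearray()
    while True:
        low = did & 0x7F
        did >>= 7
        if did:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def chunk_key(term: bytes, first_did: int) -> bytes:
    """Key for a posting-list chunk: term name plus the chunk's first docid."""
    return term + encode_docid(first_did)


def key_fits(term: bytes, first_did: int) -> bool:
    return len(chunk_key(term, first_did)) <= HYPOTHETICAL_MAX_KEY_BYTES


term = b"u" * 243
print(key_fits(term, 1))       # True: docid 1 takes 1 byte (244 in total)
print(key_fits(term, 200000))  # False: docid 200000 takes 3 bytes (246 in total)
```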

You're limiting terms with positional info to 64 characters - only URL
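terms can be longer than 240-ish.  I suspect you've got a common
URL which has length between 240 and 250 characters.  Change the URL
length check to "> 240" instead of "> 512" and all should be well.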

Sander Pilon | 5 May 09:00 2004

RE: Perl binding: crash & missing functions?


> 
> You're limiting terms with positional info to 64 characters - 
> only URL terms can be longer than 240-ish.  I suspect you've 
> got a common URL which has length between 240 and 250 
> characters.  Change the URL length check to "> 240" instead 
> of "> 512" and all should be well.

Ok, I'll try it.
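Here is a minimal version of the suggested check. It assumes the script builds URL terms as a one-letter prefix plus the URL; the "U" prefix and the helper name are hypothetical, because the original script isn't shown in full. The check measures the whole term in bytes, prefix included, since the limit applies to the term.

```python
# Limit suggested in the thread: "> 240" instead of "> 512".
MAX_URL_TERM_BYTES = 240


def url_term_or_none(url: str, prefix: str = "U"):
    """Return the URL term as bytes, or None if it is too long to index.

    The "U" prefix is a hypothetical convention for this sketch.
    """
    term = (prefix + url).encode("utf-8")
    if len(term) > MAX_URL_TERM_BYTES:
        return None  # skip it (or shorten it by hashing the tail)
    return term


print(url_term_or_none("http://example.com/"))              # b'Uhttp://example.com/'
print(url_term_or_none("http://example.com/" + "a" * 300))  # None
```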

> 
> If you want to index longer terms, look at the technique used 
> in omega's omindex.cc where the tail of the URL is hashed.
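The hashing idea can be sketched as follows. Terms under the limit pass through unchanged. Longer ones keep a prefix and have the remainder replaced by a fixed-length digest, so the result always fits. MD5, the 240-byte limit, and the function name are assumptions for this sketch, not what omindex.cc actually uses.

```python
import hashlib

MAX_TERM_BYTES = 240  # assumed limit ("slightly over 240 bytes" in the thread)
DIGEST_CHARS = 32     # length of an MD5 hex digest


def shorten_long_term(term: bytes, max_len: int = MAX_TERM_BYTES) -> bytes:
    """Keep short terms as-is; otherwise keep a prefix and hash the tail."""
    if len(term) <= max_len:
        return term
    keep = max_len - DIGEST_CHARS
    head, tail = term[:keep], term[keep:]
    digest = hashlib.md5(tail).hexdigest().encode("ascii")
    return head + digest  # exactly max_len bytes


long_url = b"Uhttp://example.com/" + b"a" * 300
print(len(shorten_long_term(long_url)))  # 240
```

Because the digest is deterministic, the same URL always yields the same term. That term can still be used to find or replace an existing document by URL. What is lost is any match on the hashed part of the URL.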
> 
> > (Still, no excuse for "Aborted" ... )
> 
> Indeed.  This case throws Xapian::InvalidArgumentError in C++ 
> (I just tested it to make sure).  It looks like the Perl 
> bindings only actually check for C++ exceptions when opening 
> a Database or WritableDatabase so it's probably not being 
> handled by anything which is why we end up with just "Aborted".
> 
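The general shape of the fix is to catch library exceptions at the binding boundary and turn them into host-language errors. Below is a Python model of that pattern. `LibraryError` stands in for a C++ exception such as Xapian::InvalidArgumentError. `guarded` and `native_add_document` are hypothetical names. The sketch does not show how Search::Xapian's XS code is actually written.

```python
import functools


class LibraryError(Exception):
    """Stand-in for a C++ exception such as Xapian::InvalidArgumentError."""


class BindingError(RuntimeError):
    """Host-language error raised by the binding instead of aborting."""


def guarded(func):
    """Translate library exceptions into host-language errors at the boundary."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except LibraryError as exc:
            # In real C++ bindings, an exception nobody catches typically
            # reaches std::terminate(), whose default handler calls abort(),
            # hence a bare "Aborted". Catching it here avoids that.
            raise BindingError(f"{func.__name__}: {exc}") from exc
    return wrapper


@guarded
def native_add_document(terms):
    """Hypothetical native call that rejects over-long terms."""
    for term in terms:
        if len(term) > 240:
            raise LibraryError("Term too long")
    return 1  # hypothetical new document id


try:
    native_add_document([b"u" * 300])
except BindingError as err:
    print("caught:", err)  # caught: native_add_document: Term too long
```

In Perl the natural equivalent is a die (croak, from XS), which the caller can trap with eval.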

Oops! Now... if those could be converted to just error codes in Perl, or
something less destructive than "Aborted". {hint, hint ;)}

Regards,

-S

Alex Bowley | 5 May 09:49 2004

Re: Perl binding: crash & missing functions?

On Tue, May 04, 2004 at 09:58PM, Sander Pilon wrote:
> Could it be unicode-related? (The documents I'm trying to index could
> contain unicode (UTF-8))
> Are there certain terms Xapian doesn't like? (Still, no excuse for "Aborted"
> ... )

One thought - which version of Perl are you using? Afaiaa versions prior
to 5.8.0 had _distinctly_ dodgy Unicode support ...

-- 
Alex Bowley                                           http://hyperspeed.org/
"The key to immortality is first to live a life worth remembering."
                                                                 - Bruce Lee

Alex Bowley | 5 May 09:57 2004

Re: Perl binding: crash & missing functions?

On Wed, May 05, 2004 at 09:00AM, Sander Pilon wrote:
> Oops! Now... if those could be converted to just error codes in Perl, or
> something less destructive than "Aborted". {hint, hint ;)}

Yeah, that's been on my todo list for a while, but it's a rather
painful, tedious task, so it remains undone.

I'll see if I can get exception checking for add_posting, and as much
else of ::Document as I can done today.

-- 
Alex Bowley                                           http://hyperspeed.org/
"Written laws are like spider's webs; they will catch, it is true, the weak
 and the poor, but would be torn in pieces by the rich and powerful."
                                                                - Anacharsis

Olly Betts | 5 May 12:23 2004

Re: Perl binding: crash & missing functions?

On Wed, May 05, 2004 at 08:57:57AM +0100, Alex Bowley wrote:
> I'll see if I can get exception checking for add_posting, and as much
> else of ::Document as I can done today.

This exception is actually thrown by Database::add_document - you don't
get the error until you try to add the document to the database since
different backends can have different term length limits - inmemory
has no limit for example.
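This is a toy model of that design choice. The hypothetical `ToyDocument` and `ToyDatabase` classes stand in for Xapian::Document and the database backends. Building the document never fails; the term length is only checked when a particular backend accepts the document. The 240-byte figure for the disk-based backend is an assumption.

```python
class TermTooLongError(ValueError):
    """Stand-in for Xapian::InvalidArgumentError in this model."""


class ToyDocument:
    def __init__(self):
        self.postings = []

    def add_posting(self, term: bytes, pos: int) -> None:
        # No length check: the document doesn't know which backend it will go to.
        self.postings.append((term, pos))


class ToyDatabase:
    max_term_bytes = None  # None means "no limit", like the inmemory backend

    def __init__(self):
        self.docs = []

    def add_document(self, doc: ToyDocument) -> int:
        limit = self.max_term_bytes
        if limit is not None:
            for term, _ in doc.postings:
                if len(term) > limit:
                    raise TermTooLongError(f"term of {len(term)} bytes exceeds {limit}")
        self.docs.append(doc)
        return len(self.docs)  # new document id


class ToyDiskDatabase(ToyDatabase):
    max_term_bytes = 240  # assumed figure for a quartz-like backend


doc = ToyDocument()
doc.add_posting(b"u" * 300, 1)          # succeeds whatever the backend
print(ToyDatabase().add_document(doc))  # 1: no limit in memory
try:
    ToyDiskDatabase().add_document(doc)
except TermTooLongError as err:
    print("rejected:", err)  # rejected: term of 300 bytes exceeds 240
```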

Cheers,
    Olly
