Jean-Francois Dockes | 11 Mar 19:15

Duplicate docids in mset

Hello,

I seem to be seeing a (very infrequent) case where I get the same document
twice in a result list (the same Xapian docid appears as the last entry of
one mset and the first entry of the next).

This was reported by a user running tests on a "CACM test collection and
evaluating the performance with trec eval software", not under normal
usage, where it would probably not be noticed.

Is there a known condition that would cause this? Is it worth investigating?

Regards,

J.F. Dockes
Olly Betts | 11 Mar 21:54

Re: Duplicate docids in mset

On Tue, Mar 11, 2008 at 07:15:07PM +0100, Jean-Francois Dockes wrote:
> I seem to be seeing a (very infrequent) case where I get the same document
> twice in a result list (the same Xapian docid appears as the last entry of
> one mset and the first entry of the next).
> 
> This was reported by a user running tests on a "CACM test collection and
> evaluating the performance with trec eval software", not under normal
> usage, where it would probably not be noticed.
> 
> Is there a known condition that would cause this? Is it worth investigating?

Assuming that the database hasn't been modified between the two
searches, this shouldn't happen: the "split" results should be the same
as the "unsplit" ones.

So it would be interesting to find out what's causing this.  Is it
repeatable with the same query on the same data?
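As a rough illustration of the invariant above, a caller can fetch the same ranking once unsplit and once as consecutive pages, then compare the two. The sketch below is a hypothetical, self-contained checker in Python: `fetch_page(first, maxitems)` is an assumed callable standing in for something like Xapian's `Enquire.get_mset(first, maxitems)` that returns docids in rank order, and the docids in the demo are made up. It does not reproduce the Recoll problem itself; it only detects the symptom described in the original report.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: fetch_page(first, maxitems) returns the docids of the
# results ranked first .. first+maxitems-1, in rank order. With the Xapian
# Python bindings it might wrap
#   [m.docid for m in enquire.get_mset(first, maxitems)]
# but nothing here depends on Xapian being installed.
FetchPage = Callable[[int, int], Sequence[int]]


def paged_docids(fetch_page: FetchPage, total: int, page_size: int) -> List[int]:
    """Fetch ranks 0..total-1 as consecutive pages and concatenate them."""
    docids: List[int] = []
    for first in range(0, total, page_size):
        docids.extend(fetch_page(first, min(page_size, total - first)))
    return docids


def check_split_vs_unsplit(
    fetch_page: FetchPage, total: int, page_size: int
) -> Tuple[bool, List[int]]:
    """Return (split == unsplit, docids seen more than once in the split list)."""
    unsplit = list(fetch_page(0, total))
    split = paged_docids(fetch_page, total, page_size)
    seen = set()
    duplicates: List[int] = []
    for docid in split:
        if docid in seen and docid not in duplicates:
            duplicates.append(docid)
        seen.add(docid)
    return split == unsplit, duplicates


if __name__ == "__main__":
    ranking = [10, 3, 7, 42, 5, 8]  # made-up docids in rank order

    def good(first: int, n: int) -> List[int]:
        return ranking[first:first + n]

    def off_by_one(first: int, n: int) -> List[int]:
        # Simulated bug: every page after the first starts one rank too early,
        # so the last docid of a page reappears at the top of the next one.
        start = first - 1 if first > 0 else 0
        return ranking[start:start + n]

    print(check_split_vs_unsplit(good, 6, 2))        # (True, [])
    print(check_split_vs_unsplit(off_by_one, 6, 2))  # (False, [3])
```

With an unmodified database, a `False` result from a check like this would suggest that the offsets passed between consecutive page requests, rather than the data, are worth looking at first.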

Cheers,
    Olly
Jean-Francois Dockes | 12 Mar 10:48

Re: Duplicate docids in mset

Olly Betts writes:
 > On Tue, Mar 11, 2008 at 07:15:07PM +0100, Jean-Francois Dockes wrote:
 > > I seem to be seeing a (very infrequent) case where I get the same document
 > > twice in a result list (the same Xapian docid appears as the last entry of
 > > one mset and the first entry of the next).
 > 
 > Assuming that the database hasn't been modified between the two
 > searches, this shouldn't happen - the "split" results should be the same
 > as the "unsplit".
 > 
 > So it would be interesting to find out what's causing this.  Is it
 > repeatable with the same query on the same data?

Yes, it is repeatable with the same query on a read-only index. Unfortunately,
I could not reproduce it with "quest"; it only shows up with Recoll (the
command line interface is enough to trigger it).

I placed the data used to reproduce the problem in:

http://www.lesbonscomptes.com/recoll/repeatDocid.tgz

The data includes a README with more instructions and explanations.

I see this as a really minor issue, and I am not sure it's worth a lot of
effort on your side. However, I am at your disposal for explaining or
tweaking how Recoll calls Xapian if needed.

Regards,
J.F. Dockes


ruby bindings

Hello. I've improved the Xapian Ruby bindings by aliasing a few
methods and making the code and examples more in line with Ruby
conventions. I also wrote a simple script to generate a skeleton so
that rdoc can automatically document some of the SWIG code to a
limited extent. I used a template I find somewhat nicer than the
default, but if you don't like it, it can easily be changed. The
generated HTML documentation is included. There is a Rakefile for
generating the documentation, which is what most Ruby programmers would
use; the identical command is also provided in generate_rdocs.sh. I have
also attached a Ruby port of Enrico Zini's apt-xapian-index, which is
written in Python. It looks like he hasn't gotten around to adding that
to the Debian SVN repository, so I've provided it here so that it might
also serve as an example. I hope to use Xapian more in the future. Thank
you for providing an excellent application.

Daniel Brumbaugh Keeney
Attachment (ruby-xapian.tar.gz): application/x-gzip, 202 KiB
_______________________________________________
Xapian-devel mailing list
Xapian-devel <at> lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-devel
Olly Betts | 30 Mar 16:29

Re: ruby bindings

On Sat, Mar 29, 2008 at 05:46:00AM -0500, Daniel Brumbaugh Keeney wrote:
> Hello. I've improved the Xapian Ruby bindings by aliasing a few
> methods and making the code and examples more in line with Ruby
> conventions.

Excellent.  I don't speak Ruby fluently, so most of the oddness is
probably my fault.

Can you supply a patch for the changed files?  It's rather fiddly to
reconstruct one from the list of filenames and SVN revisions (and I
don't know which branch to look at either).  Just "svn diff" in a
modified SVN checkout will do.

Cheers,
    Olly
