Olly Betts | 5 May 14:10 2004

buffered tables, sessions, and transactions

Quartz has a QuartzDiskTable class which is a thin wrapper for a pair of
Btree objects (or just one if the table is opened readonly):

http://www.xapian.org/docs/sourcedoc/html/classQuartzDiskTable.html

There's also a QuartzBufferedTable class which adds memory buffering of
changes to this:

http://www.xapian.org/docs/sourcedoc/html/classQuartzBufferedTable.html

However, as of 0.8.0 we buffer changes to the posting lists in
QuartzWritableDatabase (see the "Private attributes" from totlen_added
through to mod_plists):

http://www.xapian.org/docs/sourcedoc/html/classQuartzWritableDatabase.html

This probably removes the main advantage of having QuartzBufferedTable.
If we're adding new documents to a database, it probably doesn't help us
at all.  It probably still helps efficiency a little for scattered
updates to a database (as these will be applied sorted by key, which
is probably a small win).
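
As a rough illustration of the "applied sorted by key" point, here is a minimal sketch of a write buffer that collects scattered updates in memory and flushes them in key order.  The names (BufferedTable, set, delete, flush, write_order) are hypothetical, and a plain dict stands in for the on-disk Btree; this is not Quartz's actual code.

```python
class BufferedTable:
    """Toy write buffer; a sketch only, NOT Quartz's real implementation."""

    def __init__(self, disk_table):
        self.disk_table = disk_table  # stand-in for the on-disk B-tree (a dict)
        self.pending = {}             # key -> new value, or None for a deletion
        self.write_order = []         # order in which keys reach the "disk"

    def set(self, key, value):
        self.pending[key] = value

    def delete(self, key):
        self.pending[key] = None

    def get(self, key):
        # Buffered changes take precedence over what is on "disk".
        if key in self.pending:
            return self.pending[key]
        return self.disk_table.get(key)

    def flush(self):
        # Apply the buffered changes in key order, so that neighbouring
        # keys are written together rather than in arrival order.
        for key in sorted(self.pending):
            value = self.pending[key]
            if value is None:
                self.disk_table.pop(key, None)
            else:
                self.disk_table[key] = value
            self.write_order.append(key)
        self.pending.clear()


table = BufferedTable({"apple": "1"})
table.set("pear", "3")
table.set("banana", "2")
table.delete("apple")
table.flush()
print(table.write_order)
```

The cost being discussed is visible here too: every pending change sits in `pending` until the flush, which is memory that could otherwise be used elsewhere.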

However, all these changes are being buffered in memory.  We could use
that memory to cache more posting list changes, or just let the OS use
it to cache more disk blocks.

So I propose stripping out QuartzBufferedTable.  I believe any remaining
benefit it provides is small, that it uses a lot of memory which could
be better used, and that it's really just unneeded code which serves to
make quartz harder to understand and debug.

Olly Betts | 11 May 19:30 2004

Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/

On Tue, May 11, 2004 at 05:41:58PM +0100, Richard Boulton wrote:
> * If the queryserver can't parse a query, strip out all special
>   characters, and then retry.  This means that we get some kind of
>   result even if the query is broken.

That's probably a feature worth pushing down into Xapian::QueryParser...

Cheers,
    Olly

Olly Betts | 11 May 19:39 2004

Re: [Xapian-commits] Changes in xapian/xapian-bindings/ xapian/xapian-bindings/guile/ xapian/xapian-bindings/php4/ xapian/xapian-bindings/python/ xapian/xapian-bindings/tcl8/

On Tue, May 11, 2004 at 06:11:36PM +0100, Richard Boulton wrote:
>   "xapian-config --cxxflags" assumes that it doesn't need to emit
>   -I/usr/include in this situation (presumably on the basis that
>   any sane C compiler will already have that on its path).

Worse, passing -I/usr/include causes problems with some versions of GCC
on some platforms, because GCC generates "fixed" versions of
vendor-supplied include headers when GCC itself is built.  Normally
these are used instead of those in /usr/include, but if you explicitly
pass -I/usr/include you get the non-fixed versions.  The most recent GCC
versions ignore "-I/usr/include" in this situation.

So that's why xapian-config carefully excludes -I/usr/include from
--cxxflags.
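
To make the filtering concrete, here is a minimal sketch of dropping -I/usr/include from a list of compiler flags.  The real xapian-config is a shell script and its logic is not shown in this thread; the function name strip_system_includes and the SYSTEM_INCLUDE_DIRS set are assumptions made purely for illustration.

```python
import posixpath

# Assumption: only /usr/include is treated as a "system" directory here.
SYSTEM_INCLUDE_DIRS = {"/usr/include"}


def strip_system_includes(flags):
    """Return flags without any -I option that names a system include dir."""
    kept = []
    for flag in flags:
        if flag.startswith("-I"):
            # Normalise so that "-I/usr/include/" is caught as well.
            directory = posixpath.normpath(flag[2:])
            if directory in SYSTEM_INCLUDE_DIRS:
                continue
        kept.append(flag)
    return kept


cxxflags = strip_system_includes(
    ["-I/usr/include", "-I/opt/xapian/include", "-O2"]
)
print(" ".join(cxxflags))
```

Subdirectories such as -I/usr/include/xapian are left alone in this sketch, since only the top-level directory shadows GCC's fixed headers.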

I'll put the above explanation as a comment in the script...

Cheers,
    Olly

Olly Betts | 11 May 23:03 2004

Re: [Xapian-commits] Changes in xapian/xapian-bindings/

On Tue, May 11, 2004 at 08:11:23PM +0100, Richard Boulton wrote:
>   Also, change tests for non-zeroness of interpreter paths to tests
>   for existence and executability of interpreter paths: this is
>   relevant if configure is passed an interpreter path by the user
>   which doesn't exist (as my debian packaging makefile just did).

A problem is that "test -x" isn't portable, at least according to the
goat book - see the bottom of this page:

http://sources.redhat.com/autobook/autobook/autobook_216.html

I don't know which platforms it isn't supported on, but this message
suggests it's not a totally academic concern:

http://mail.gnu.org/archive/html/autoconf/2002-12/msg00098.html

With your fix to use $PYTHON instead of python, test -x isn't actually
needed AFAICS - the case statement will just get empty stdout and so the
bindings will be disabled.  I tested "./configure TCLSH=/no/such/path"
and tcl8 doesn't get put in SUBDIRS in Makefile.
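
The reasoning above (a missing interpreter simply yields empty output, so no separate existence test is needed) can be sketched as follows.  configure does this in shell with a case statement; the Python below is only an analogy, and probe_output and binding_enabled are hypothetical names.

```python
import subprocess
import sys


def probe_output(command):
    """Return the command's stdout, or "" if it cannot be run at all."""
    try:
        result = subprocess.run(
            command,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # Covers a nonexistent path and a non-executable file alike.
        return ""
    return result.stdout.decode(errors="replace").strip()


def binding_enabled(command):
    # Empty output means "disable the binding", whatever the cause.
    return probe_output(command) != ""


missing = binding_enabled(["/no/such/path", "-c", "print('ok')"])
present = binding_enabled([sys.executable, "-c", "print('ok')"])
print(missing, present)
```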

Are you OK with reverting this?

Cheers,
    Olly

Olly Betts | 13 May 18:55 2004

Reparsing queries (was Re: Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)

On Tue, May 11, 2004 at 06:30:47PM +0100, Olly Betts wrote:
> On Tue, May 11, 2004 at 05:41:58PM +0100, Richard Boulton wrote:
> > * If the queryserver can't parse a query, strip out all special
> >   characters, and then retry.  This means that we get some kind of
> >   result even if the query is broken.
> 
> That's probably a feature worth pushing down into Xapian::QueryParser...

I've now done this.  It's essentially the same as the queryserver patch,
except that "@" is also stripped.  It seems arbitrary to leave "@" but
to strip other phrase generators (especially "'" as then contractions
such as "isn't" get broken up).  As it is now it works well on the
sample of real world queries from tweakers.net, whereas not stripping
*any* phrase generators seems to do slightly less well.
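
Here is a minimal sketch of the strip-and-retry idea, assuming a strict parser that raises an error on malformed input.  strict_parse is a hypothetical stand-in (it only checks for balanced quotes and brackets) and is not Xapian::QueryParser; the set of characters stripped (everything that is neither a word character nor whitespace) is likewise an assumption, as the exact set is not given here.

```python
import re


def strict_parse(query):
    """Hypothetical strict parser: rejects unbalanced quotes or brackets."""
    if query.count('"') % 2 != 0:
        raise ValueError("unmatched double quote")
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unmatched ')'")
    if depth != 0:
        raise ValueError("unmatched '('")
    # Stand-in for building a real query: just split on whitespace.
    return query.split()


def parse_with_fallback(query):
    """Try the strict parse; on failure strip special characters and retry."""
    try:
        return strict_parse(query)
    except ValueError:
        # Replace anything that is not a word character or whitespace,
        # which includes '"', "'", "@", "(", ")", "+" and "-".
        stripped = re.sub(r"[^\w\s]", " ", query)
        return strict_parse(stripped)


print(parse_with_fallback('isn\'t "broken'))
```

Note that the fallback splits "isn't" into "isn" and "t", which is the contraction effect mentioned above.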

I think this is something to revisit after this is addressed:

http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=22

Thoughts?

Cheers,
    Olly

Alex Bowley | 13 May 19:13 2004

Re: Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/

On Tue, May 11, 2004 at 06:30PM, Olly Betts wrote:
> That's probably a feature worth pushing down into Xapian::QueryParser...

What is ::QueryParser? The name would suggest it takes strings (say of
the form 'hotel AND (beach OR "swimming pool")') and converts them into
::Query objects. Is this correct (I can't see any reference to the class
in the API docs)?

I'm doing a fair amount of hacking on the Perl bindings atm, and it
occurs to me that this would be a useful feature, even if I have to do
it in the Perl wrapper. IIRC, the Python bindings do something similar -
do they just use ::QueryParser under the hood?

-- 
Alex Bowley                                           http://hyperspeed.org/
"Written laws are like spider's webs; they will catch, it is true, the weak
 and the poor, but would be torn in pieces by the rich and powerful."
                                                               - Anacharsis

Olly Betts | 13 May 19:21 2004

Xapian::QueryParser (was Re: Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)

On Thu, May 13, 2004 at 06:13:35PM +0100, Alex Bowley wrote:
> On Tue, May 11, 2004 at 06:30PM, Olly Betts wrote:
> > That's probably a feature worth pushing down into Xapian::QueryParser...
> 
> What is ::QueryParser? The name would suggest it takes strings (say of
> the form 'hotel AND (beach OR "swimming pool")') and converts them into
> ::Query objects. Is this correct

Exactly.
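
To show the string-to-query-tree idea on the example string, here is a toy recursive-descent parser.  It is a sketch only and is not Xapian::QueryParser: nested tuples stand in for Xapian::Query objects, only AND, OR, brackets and quoted phrases are handled, and the precedence (AND binding tighter than OR) is an assumption of the sketch.

```python
import re

# Brackets, quoted phrases, or runs of other non-space characters.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


class ToyParser:
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self):
        tree = self.parse_or()
        if self.peek() is not None:
            raise ValueError("unexpected token: %r" % self.peek())
        return tree

    def parse_or(self):
        left = self.parse_and()
        while self.peek() == "OR":
            self.take()
            left = ("OR", left, self.parse_and())
        return left

    def parse_and(self):
        left = self.parse_atom()
        while self.peek() == "AND":
            self.take()
            left = ("AND", left, self.parse_atom())
        return left

    def parse_atom(self):
        token = self.take()
        if token == "(":
            tree = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return tree
        if token in (")", "AND", "OR"):
            raise ValueError("unexpected token: %r" % token)
        if token.startswith('"'):
            # A quoted phrase becomes a PHRASE node of its words.
            return ("PHRASE",) + tuple(token.strip('"').split())
        return ("TERM", token)


tree = ToyParser('hotel AND (beach OR "swimming pool")').parse()
print(tree)
```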

> (I can't see any reference to the class in the API docs)?

Doxygen doesn't seem to be picking it up correctly (I'll investigate).
It's in the "internal" documentation though:

http://www.xapian.org/docs/sourcedoc/html/classXapian_1_1QueryParser.html

And the query syntax is documented here:

http://www.xapian.org/docs/queryparser.html

> I'm doing a fair amount of hacking on the Perl bindings atm, and it
> occurs to me that this would be a useful feature, even if I have to do
> it in the perl wrapper.

It would be nice to have it wrapped too.  Having it in a separate directory
is rather an arbitrary historical thing.

> IIrc, the python bindings do something similar - do they just use
> ::QueryParser under the hood?

James Aylett | 13 May 19:33 2004

Re: Xapian::QueryParser (was Re: Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)

On Thu, May 13, 2004 at 06:21:37PM +0100, Olly Betts wrote:

> > I'm doing a fair amount of hacking on the Perl bindings atm, and it
> > occurs to me that this would be a useful feature, even if I have to do
> > it in the perl wrapper.
> 
> It would be nice to have it wrapped too.  Having it in a separate directory
> is rather an arbitrary historical thing.

Well, it should be in a separate directory still. It's just that it
probably shouldn't be in 'extra', but in 'queryparser' or something.

> > IIrc, the python bindings do something similar - do they just use
> > ::QueryParser under the hood?
> 
> I'd imagine so.

Indeed, as do any other SWIG bindings. They're in the same module as
the rest of Xapian (and as the stemming class), which is perhaps not
quite right. But SWIG is a pain to get working across multiple
modules, so I'm not even going to think about trying.

Java doesn't have wrappers for QueryParser. I guess SWIG just makes it
an awful lot easier.

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org

Olly Betts | 13 May 21:59 2004

Re: Xapian::QueryParser (was Re: Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)

On Thu, May 13, 2004 at 06:33:58PM +0100, James Aylett wrote:
> On Thu, May 13, 2004 at 06:21:37PM +0100, Olly Betts wrote:
> 
> > > I'm doing a fair amount of hacking on the Perl bindings atm, and it
> > > occurs to me that this would be a useful feature, even if I have to do
> > > it in the perl wrapper.
> > 
> > It would be nice to have it wrapped too.  Having it in a separate directory
> > is rather an arbitrary historical thing.
> 
> Well, it should be in a separate directory still. It's just it
> probably shouldn't be in 'extra', but 'queryparser' or something.

I mean the way it's a separate header and library stuck away by
themselves in the source tree.

Although having it as a separate library isn't so bad.  It would be nice
if the library could be split a bit more, at least from a conceptual
point of view - if you wanted to create a search frontend on an embedded
device, it would be useful to be able to exclude indexing functionality
completely.

Under Unix, the linker should do a good job of excluding unused code
when using static libraries, and the VM should do a good job of not
loading unused code with shared libraries.  Hopefully Windows is the
same.

> > > IIrc, the python bindings do something similar - do they just use
> > > ::QueryParser under the hood?
> > 

Richard Boulton | 13 May 23:12 2004

Re: Reparsing queries (was Re: Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)

Olly Betts wrote:
> I've now done this.  It's essentially the same as the queryserver patch,
> except that "@" is also stripped.

I'll pull the queryserver patch out again when I have time.

> It seems arbitrary to leave "@" but
> to strip other phrase generators (especially "'" as then contractions
> such as "isn't" get broken up).  As it is now it works well on the
> sample of real world queries from tweakers.net, whereas not stripping
> *any* phrase generators seems to do slightly less well.
> 
> I think this is something to revisit after this is addressed:
> 
> http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=22

Agreed.

The thinking behind my patch was that if the queryparser fails to parse 
the query entered, it's probably because the query is actually simply a 
piece of text pasted into the search box (or generated by some other 
application).  In this case, we might well have unmatched '"' 
characters, which cause the queryserver to fail.

What we actually want to do in this situation is probably to pass the 
query to a slightly different, more tolerant, parser.  In particular, we 
probably do still want to do phrase searches on things like "Olly's" and 
"e-mail", but we don't want to pay attention to double quotes, and we 
also don't want to exclude terms which are prefixed by a '-' (or require 
terms which are prefixed by a '+').  And we always want to try and 