Olly Betts | 7 Oct 2007 10:30

Re: [Xapian-commits] 9389: trunk/xapian-core/ trunk/xapian-core/net/

On Mon, Oct 01, 2007 at 01:13:07PM +0100, richard wrote:
> Log message (3 lines):
> net/remoteconnection.cc: After calling read(), check for received
> == 0 after checking for errors, so that if an EINTR occurs in
> read, we don't report EOF instead of retrying.
> [...]
> http://xapian.org/C?9389?trunk/xapian-core/net/remoteconnection.cc

I don't think this change is correct - if read() sets errno then it
returns -1, so if read() returns 0 then it didn't set errno and any
errno value we see comes from some earlier system or library call.

Does this change fix a bug?  If so, there's probably a missing errno
check elsewhere (and also, there should really be a regression test!)
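
A minimal Python sketch of the check order described above (it is not
the code from net/remoteconnection.cc).  Python reports a failed read()
as an exception rather than a -1 return, and since Python 3.5 os.read()
already retries on EINTR by itself, so the explicit retry here is only
illustrative.  The name read_chunk and its "read" parameter are
hypothetical.

```python
import os


def read_chunk(fd, max_bytes, read=os.read):
    """Read up to max_bytes from fd; return b"" only on a genuine EOF."""
    while True:
        try:
            data = read(fd, max_bytes)
        except InterruptedError:
            # Error case (in C: read() returned -1 with errno == EINTR).
            # Nothing was read, so retry instead of reporting EOF.
            continue
        # Success case: an empty result means EOF.  No errno value is
        # consulted here, because a successful read() does not set it.
        return data
```

The error path and the EOF path never overlap: an interrupted call is
handled before the result is inspected at all.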

Cheers,
    Olly

Olly Betts | 7 Oct 2007 11:03

Re: [Xapian-commits] 9389: trunk/xapian-core/ trunk/xapian-core/net/

On Sun, Oct 07, 2007 at 09:30:05AM +0100, Olly Betts wrote:
> On Mon, Oct 01, 2007 at 01:13:07PM +0100, richard wrote:
> > http://xapian.org/C?9389?trunk/xapian-core/net/remoteconnection.cc
> 
> I don't think this change is correct [...]

I've now noticed this change has since been backed out - sorry for the
noise.

Cheers,
    Olly

Richard Boulton | 10 Oct 2007 03:09

Something to think about

I'm planning to add multiple-database support for searches to my "Xappy" 
python wrapper (more on this wrapper later, but for now, see 
http://code.google.com/p/xappy for details).  This is reasonably 
straightforward, because Xapian supports this nicely: except that 
"Xappy" generates a "fieldname->prefix" mapping automatically.  The 
prefix which corresponds to a particular field is therefore hidden from 
the user, and crucially, it may be different in different databases.

My current plan is to add a "databaseID" term to each document, and 
construct a composite query.  For example, the search "author:foo" 
across databases with ids "db1" and "db2", where the prefix for author 
in db1 is "A" and the prefix for author in db2 is "B", would become:

  (Afoo FILTER db1) OR (Bfoo FILTER db2)
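
As a rough sketch of that construction, the snippet below builds the
composite query as a plain string from a per-database prefix map.  The
function name and the shape of prefix_maps are hypothetical; a real
implementation would presumably build Xapian Query objects combined
with OP_FILTER and OP_OR rather than strings.

```python
def composite_query(field, term, prefix_maps):
    """Build one "(<prefix><term> FILTER <db id>)" clause per database."""
    clauses = []
    for db_id, prefixes in prefix_maps.items():
        # Each database may use a different prefix for the same field.
        clauses.append("(%s%s FILTER %s)" % (prefixes[field], term, db_id))
    return " OR ".join(clauses)


# Hypothetical prefix maps matching the example above.
prefix_maps = {"db1": {"author": "A"}, "db2": {"author": "B"}}
print(composite_query("author", "foo", prefix_maps))
# prints: (Afoo FILTER db1) OR (Bfoo FILTER db2)
```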

This should give the right sort of results, but the statistics for the 
terms will be a bit broken.  (Actually, I'm not totally convinced 
they'll be broken in a harmful way, because if the term is more frequent 
in one collection than another, this could correspond to it being more 
significant when it occurs in the collection in which it is less 
frequent.)  At some point it would be nice to add the ability to have a 
mapping from "human-readable field name" to "prefix code" inside xapian, 
so the multidatabase stuff could be aware of this issue and generate the 
prefixes correctly for each database.  However, that's not urgent, and 
not what I'm thinking about right now.

It would also be nice to have a "virtual" posting list, which 
effectively returned a list of all the document IDs in a particular 
database, so I didn't have to explicitly store the "databaseID" terms. 
But that's also not what I'm thinking about right now.

Kevin Duraj | 11 Oct 2007 09:21

Xapian 1.0.3 installation issues.

Xapian 1.0.3 installation issues,

I installed Xapian 1.0.3 and the search would not execute when run as
Apache user. I could run the search fine inside ssh. I rolled Xapian
to previous version 1.0.2 and the search still does not work even when
I put back the old index made by Xapian 1.0.2

... my search engine is out of work ...

Kevin Duraj
http://myhealthcare.com

Olly Betts | 11 Oct 2007 13:25

Re: Xapian 1.0.3 installation issues.

On Thu, Oct 11, 2007 at 12:21:15AM -0700, Kevin Duraj wrote:
> I installed Xapian 1.0.3 and the search would not execute when run as
> Apache user. I could run the search fine inside ssh. I rolled Xapian
> to previous version 1.0.2 and the search still does not work even when
> I put back the old index made by Xapian 1.0.2

All I can suggest is to reinstall Xapian 1.0.3.  Then check if the
search is working from the web interface.  If it still doesn't, run
the "delve" utility on each of your databases from the command-line as a
user which can write to the database directory.  Then retry the search
from the web interface.

If that doesn't help, I'm afraid you'll have to write a bug report which
actually contains some useful details.

Cheers,
    Olly

Olly Betts | 11 Oct 2007 17:48

Re: Xapian 1.0.3 installation issues.

On Thu, Oct 11, 2007 at 12:21:15AM -0700, Kevin Duraj wrote:
> I installed Xapian 1.0.3 and the search would not execute when run as
> Apache user. I could run the search fine inside ssh. I rolled Xapian
> to previous version 1.0.2 and the search still does not work even when
> I put back the old index made by Xapian 1.0.2

Try this patch: http://oligarchy.co.uk/flint-1.0.3-readonly-db.patch

Cheers,
    Olly

Olly Betts | 12 Oct 2007 15:22

Re: Something to think about

On Wed, Oct 10, 2007 at 02:09:51AM +0100, Richard Boulton wrote:
> I'm planning to add multiple-database support for searches to my "Xappy" 
> python wrapper (more on this wrapper later, but for now, see 
> http://code.google.com/p/xappy for details).  This is reasonably 
> straightforward, because Xapian supports this nicely: except that 
> "Xappy" generates a "fieldname->prefix" mapping automatically.  The 
> prefix which corresponds to a particular field is therefore hidden from 
> the user, and crucially, it may be different in different databases.

I think the simplest solution here would be to just use the user's
fieldname as the prefix.  So the "shoe_size" field could be mapped to
"XSHOE_SIZE".  You could add special handling for standard prefixes
if you wish.

If you want case sensitivity of field names, you could either just
eschew the usual Xapian scheme, or provide some sort of encoding
for the case.
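
A small sketch of such a mapping, assuming field names are simply
upper-cased and given an "X" prefix.  The function name and the
contents of the standard-prefix table are illustrative assumptions,
not part of Xappy.

```python
# Assumed table of fields which get a conventional short prefix.
STANDARD_PREFIXES = {"author": "A", "title": "S"}


def field_to_prefix(fieldname, standard=STANDARD_PREFIXES):
    """Derive a term prefix from the user's field name alone."""
    if fieldname in standard:
        return standard[fieldname]
    # Upper-casing discards case, so "Shoe_Size" and "shoe_size" share a
    # prefix; case-sensitive field names would need some extra encoding.
    return "X" + fieldname.upper()


print(field_to_prefix("shoe_size"))
# prints: XSHOE_SIZE
```

Because the prefix depends only on the field name, every database
derives the same prefix for the same field without storing a mapping.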

> One way to fix this would be to add a flag (or similar mechanism) 
> telling a multiple database to generate composite IDs by sequentially 
> combining the databases; so DB1 might have IDs from 1 to 13498 and DB2 
> might have IDs from 13499 onwards.  [...]
> Of course, this scheme relies on the document IDs used by each database 
> being relatively compact, and would result in the document IDs in a 
> multidatabase changing each time the highest document ID in the first 
> database changed, so isn't a perfect scheme by any means.

I think it would be useful to support this in some way anyway.
Interleaving isn't a perfect solution either.  Really its main benefit
is simply that it does provide stable merged document ids even if the
constituent databases are updated.  [...]
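
To make the trade-off concrete, here is a hedged sketch of the two
numbering schemes.  Both function names are hypothetical, and the
interleaving formula reflects my understanding of how Xapian numbers
documents in a combined database rather than anything stated in this
thread.

```python
def interleaved_docid(local_docid, db_index, num_dbs):
    """Merged docid when docids from num_dbs databases are interleaved.

    db_index is 0-based; docids start at 1.
    """
    return (local_docid - 1) * num_dbs + db_index + 1


def sequential_docid(local_docid, db_index, last_docids):
    """Merged docid when databases are laid end to end.

    last_docids[i] is the highest docid currently used in database i.
    """
    return sum(last_docids[:db_index]) + local_docid


# Document 1 of the second database, before and after the first
# database's highest docid grows from 13498 to 13499:
print(sequential_docid(1, 1, [13498, 2000]))   # prints: 13499
print(sequential_docid(1, 1, [13499, 2000]))   # prints: 13500
print(interleaved_docid(1, 1, 2))              # prints: 2
```

The interleaved id depends only on the number of databases, which is
why it stays stable when the constituent databases are updated.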

James Aylett | 12 Oct 2007 17:45

Re: Something to think about

On Fri, Oct 12, 2007 at 02:22:32PM +0100, Olly Betts wrote:

> > One way to fix this would be to add a flag (or similar mechanism) 
> > telling a multiple database to generate composite IDs by sequentially 
> > combining the databases; so DB1 might have IDs from 1 to 13498 and DB2 
> > might have IDs from 13499 onwards.  [...]
> 
> I think it would be useful to support this in some way anyway.
> Interleaving isn't a perfect solution either.  Really its main benefit
> is simply that it does provide stable merged document ids even if the
> constituent databases are updated.

Could we have a mechanism where the size of each opened database is
taken into account, perhaps doubled to provide padding, and if any one
overflows its padding a new exception is raised? If that were an
optional strategy, it would work in the majority of cases (could even
be the default).
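
One possible reading of this suggestion, as a sketch only: each
database is given a contiguous docid range sized from its current
document count times a padding factor, and mapping a docid beyond that
range raises an exception.  All names here, including the exception
class, are hypothetical.

```python
class DocidRangeOverflow(Exception):
    """Raised when a database outgrows its padded docid range."""


def allocate_ranges(sizes, padding_factor=2):
    """Give each database a contiguous (first, last) merged docid range."""
    ranges = []
    start = 1
    for size in sizes:
        capacity = size * padding_factor
        ranges.append((start, start + capacity - 1))
        start += capacity
    return ranges


def merged_docid(local_docid, db_index, ranges):
    """Map a per-database docid into that database's padded range."""
    first, last = ranges[db_index]
    merged = first + local_docid - 1
    if merged > last:
        raise DocidRangeOverflow(
            "database %d has overflowed its padding" % db_index)
    return merged


ranges = allocate_ranges([100, 50])
print(ranges)                       # prints: [(1, 200), (201, 300)]
print(merged_docid(1, 1, ranges))   # prints: 201
```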

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james <at> tartarus.org                               uncertaintydivision.org

Olly Betts | 12 Oct 2007 18:27

Re: Something to think about

On Fri, Oct 12, 2007 at 04:45:13PM +0100, James Aylett wrote:
> Could we have a mechanism where the size of each opened database is
> taken into account, perhaps doubled to provide padding, and if any one
> overflows its padding a new exception is raised? If that were an
> optional strategy, it would work in the majority of cases (could even
> be the default).

Obviously we could, but it wouldn't help in the common case where the
database is opened afresh by each search process.

One approach (rather a long term one) is to allow arbitrary docids -
then searching over multiple databases can simply prefix the docids from
each.  Probably the main challenge there is to maintain the ability to
store them compactly.

Cheers,
    Olly

Olly Betts | 13 Oct 2007 05:19

Re: Xapian 1.0.3 installation issues.

[Please keep discussion on list]

On Thu, Oct 11, 2007 at 08:26:17PM -0700, Kevin Duraj wrote:
> How do I apply this patch ? do I just run it from console?
> ./flint-1.0.3-readonly-db.patch

It's a standard unified diff which can be applied with GNU patch.  If
you don't know how, it's not hard to find out:

http://www.google.com/search?q=applying+patches

Cheers,
    Olly
