Mark Hammond | 1 May 04:36
Picon
Favicon
Gravatar

multiple writers and remotetcp backends

Hi guys,
  We've been in discussion with Richard and Olly on this issue, in various
different forums, but as the correct answer isn't immediately obvious, I'm
opening it up for wider discussion and comment.

The problem is that a xapian tcp-server in 'writable' mode makes no attempt
to ensure only one 'active' connection at a time is trying to modify the
database.  If multiple connections are made to a writable server, the
behaviour is undefined (or even it is was defined, it is unlikely to be
defined in a way that would make it useful).  While some applications can
ensure internally that only a single connection is made to such a server,
some applications are architected such that multiple processes, possibly
even multiple machines, must coordinate this "single writer" approach.  This
becomes quite difficult without support inside xapian itself.

It seems there are 2 general solutions we can implement:

* Only ever allow a single connection to the writable server.  When a second
connection is attempted, we either refuse the connection, or allow the
connection just to send back an authorative 'writer already connected'
response, and then close the connection.

* Implement a kind of 'queue' or some other way to block the incoming
connection.  In this case we would accept the connection, respond with a
message indicating they are in a queue (your call is important!) and then
block until the first writer is complete.  The client side of the connection
then has a choice regarding waiting in the queue, or hanging up and trying
again later.

In my opinion, the second option sounds the "best", but the first option
(Continue reading)

Richard Boulton | 1 May 10:27

Re: multiple writers and remotetcp backends

Mark Hammond wrote:
> It seems there are 2 general solutions we can implement:
> 
> * Only ever allow a single connection to the writable server.  When a second
> connection is attempted, we either refuse the connection, or allow the
> connection just to send back an authorative 'writer already connected'
> response, and then close the connection.
> 
> * Implement a kind of 'queue' or some other way to block the incoming
> connection.  In this case we would accept the connection, respond with a
> message indicating they are in a queue (your call is important!) and then
> block until the first writer is complete.  The client side of the connection
> then has a choice regarding waiting in the queue, or hanging up and trying
> again later.
 >
> In my opinion, the second option sounds the "best", but the first option
> seems "good enough" and easier to implement.

I agree. If we implement the first solution, I imagine that many clients 
will need to emulate the behaviour of the second solution by repeatedly 
opening a connection to poll for the lock.  While the first option is 
slightly simpler to implement, I think that the majority of the 
implementation work is likely to be in enforcing the single-writer 
constraint.  I recommend attempting to implement the first solution 
first anyway, since the second solution requires everything that it 
does: once we've got reliable checking of the locks on the connection, 
we can implement a queue on top of that.

One thing which both of these will probably need is a change to the 
remote protocol to allow a connection to specify whether it is writable 
(Continue reading)

Mark Hammond | 1 May 11:29
Picon
Favicon
Gravatar

RE: multiple writers and remotetcp backends

> One thing which both of these will probably need is a change to the
> remote protocol to allow a connection to specify whether it
> is writable
> or readonly at the time the connection is opened.  This would
> allow the
> lock on the database to be checked and obtained at this point for
> writable database, rather than waiting for the connection to
> attempt a
> write operation.

Actually, there is yet another shortcut we can probably take: run 2 servers,
with one being writable.  The writable server can then enforce a simple "one
connection only" instead of "one writable connection".  Is there any reason
that will not work?

> I'm not sure what happens for the windows variant of the
> server (which
> is a threaded implementation), but I imagine that there the same
> instance of the database is accessed by each connection: if this is
> correct, there are likely to be other problems since the
> database is not safe for concurrent access.  A threaded implementation
needs
> to create a new instance of the database for each thread.

Yes, that is likely a problem - I'll open a bug on that too.

> Opening a bug sooner than later would probably be advisable,
> so that the
> history of this discussion is easy to find in future.
>
(Continue reading)

Richard Boulton | 1 May 11:31

Re: multiple writers and remotetcp backends

Mark Hammond wrote:
> Actually, there is yet another shortcut we can probably take: run 2 servers,
> with one being writable.  The writable server can then enforce a simple "one
> connection only" instead of "one writable connection".  Is there any reason
> that will not work?

I can't see any reason.

>> In particular, discussion of this bug may make it clear whether it
>> should block the 1.0.0 release, and if so we
> 
> I assume we lost something at the end there - but yet, I'll open the bugs in
> the next hour or so

I meant to finish by saying "if so, we can add the bugs as blockers for 
bug #118".  Thanks.

--

-- 
Richard
Olly Betts | 1 May 13:49
Favicon
Gravatar

Re: multiple writers and remotetcp backends

On Tue, May 01, 2007 at 12:36:54PM +1000, Mark Hammond wrote:
> The problem is that a xapian tcp-server in 'writable' mode makes no attempt
> to ensure only one 'active' connection at a time is trying to modify the
> database.  If multiple connections are made to a writable server, the
> behaviour is undefined (or even it is was defined, it is unlikely to be
> defined in a way that would make it useful).

I'd not appreciated this happened from the previous discussion - this
is certainly a bug.  I understood the issue was just that of trying to
marshal multiple processes wanting to write to the same remote server
in a sane way.

Looking at the code, I believe it's also wrong that we open the database
and then fork multiple processes which can make use of it, even for a
read-only Database.  We certainly don't promise that you can use the same
Xapian object from different threads.  I think similar rules ought to
apply over fork.

But this matters much more for writers - with the current backends, it
happens to work OK for readers I think.

So we should probably leave the reader issue for now, as it can be fixed
without API or ABI changes, but fix the writer issue.

Cheers,
    Olly
Richard Boulton | 1 May 14:42

Re: multiple writers and remotetcp backends

Olly Betts wrote:
> On Tue, May 01, 2007 at 12:36:54PM +1000, Mark Hammond wrote:
>> The problem is that a xapian tcp-server in 'writable' mode makes no attempt
>> to ensure only one 'active' connection at a time is trying to modify the
>> database.  If multiple connections are made to a writable server, the
>> behaviour is undefined (or even it is was defined, it is unlikely to be
>> defined in a way that would make it useful).
> 
> I'd not appreciated this happened from the previous discussion - this
> is certainly a bug.  I understood the issue was just that of trying to
> marshal multiple processes wanting to write to the same remote server
> in a sane way.

I thought that was the problem too until looking at the code this morning.

> Looking at the code, I believe it's also wrong that we open the database
> and then fork multiple processes which can make use of it, even for a
> read-only Database.  We certainly don't promise that you can use the same
> Xapian object from different threads.  I think similar rules ought to
> apply over fork.

Yes - I would imagine that the behaviour over fork() is system dependent 
- it works okay for readers on Linux, but may well not on other platforms.

> But this matters much more for writers - with the current backends, it
> happens to work OK for readers I think.

I'm dubious that it will work correctly under load even with just 
readers on Windows, having looked at the code, due to it being a 
threaded rather than a forking model.
(Continue reading)

Olly Betts | 1 May 15:30
Favicon
Gravatar

Re: multiple writers and remotetcp backends

On Tue, May 01, 2007 at 01:42:08PM +0100, Richard Boulton wrote:
> Olly Betts wrote:
> >Looking at the code, I believe it's also wrong that we open the database
> >and then fork multiple processes which can make use of it, even for a
> >read-only Database.  We certainly don't promise that you can use the same
> >Xapian object from different threads.  I think similar rules ought to
> >apply over fork.
> 
> Yes - I would imagine that the behaviour over fork() is system dependent 
> - it works okay for readers on Linux, but may well not on other platforms.

I believe it should be OK currently.  But (for example) it will be a
problem when we start trying to lock the revisions which readers are
currently using, so we want to avoid people relying on it working.

> >But this matters much more for writers - with the current backends, it
> >happens to work OK for readers I think.
> 
> I'm dubious that it will work correctly under load even with just 
> readers on Windows, having looked at the code, due to it being a 
> threaded rather than a forking model.

Yes, that could well be a problem already.

> >So we should probably leave the reader issue for now, as it can be fixed
> >without API or ABI changes, but fix the writer issue.
> 
> I imagine that a simple fix would be to open the database when a new 
> connection comes in, instead of opening the database when the server is 
> started and passing it to each sub-process (or sub-thread on windows). 
(Continue reading)

Richard Boulton | 4 May 14:35

Last minute feature for 1.0.0

I'd like to draw people's attention to bug report #143 that I've just 
submitted.  This is a proposal (and patch) to add the ability to store 
arbitrary metadata associated with a database (rather than with an 
individual document in the database).  The rationale for this feature is 
explained more fully in the bug report, but briefly I've come across 
several situations where I need to store information about how a 
database has been indexed, or which version of an external database 
corresponds to a xapian database, and it needs to be possible to ensure 
that modifications to this information are transactionally modified 
together with the Xapian database contents.

If at all possible, I'd like to get this into version 1.0.0, although I 
appreciate that it's very late in the day for such changes.  Partly, my 
desire to do this is because I want to use it now, but it's also because 
the feature requires a minor change to the flint database format, so now 
seems a good time to apply it (since people are going to have to rebuild 
their databases to use 1.0.0 anyway).

The patch only adds the feature to the flint and inmemory databases, but 
I think that adding it to remote databases wouldn't be hard at all.  I 
don't think it's worth adding it to quartz databases.

It's probably best to add further comments to the bug report at:
http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=143

--

-- 
Richard
Olly Betts | 4 May 16:20
Favicon
Gravatar

Re: Last minute feature for 1.0.0

On Fri, May 04, 2007 at 01:35:27PM +0100, Richard Boulton wrote:
> If at all possible, I'd like to get this into version 1.0.0, although I 
> appreciate that it's very late in the day for such changes.  Partly, my 
> desire to do this is because I want to use it now, but it's also because 
> the feature requires a minor change to the flint database format, so now 
> seems a good time to apply it (since people are going to have to rebuild 
> their databases to use 1.0.0 anyway).

I understand your motivation for wanting this in 1.0.0, but I'm afraid
the time really has passed for crowbarring random new features into the
release.  I've already said "not until after 1.0.0" to patches submitted
weeks or months ago!

We were hoping to release 1.0.0 in April, and it's already May 4th!
Each new feature diverts us from addressing the issues we've already
identified as release blockers, and has the risk of introducing new
bugs which we'll have to address before the release.  Or identify the
problematic change and back it out, which also takes time.

In fact, after a quick look at the previous change in this direction
(rev 8426) I'm a little dubious about including even that in the
release.  It's mostly a step in the right direction, but it doesn't
appear to fix any actual bugs, and for a WritableDatabase, you're now
carefully setting the total_length and lastdocid members of the
database_ro member when the database is opened, but AFAICS they'll never
get updated.  I hope they never get used, but if they do that's a new
bug!

Also, we now always retrieve and parse the metainfo record, even if we
never use the information - if you're using a weighting scheme which
(Continue reading)

Richard Boulton | 4 May 16:38

Re: Last minute feature for 1.0.0

Olly Betts wrote:
> I understand your motivation for wanting this in 1.0.0, but I'm afraid
> the time really has passed for crowbarring random new features into the
> release.  I've already said "not until after 1.0.0" to patches submitted
> weeks or months ago!

Fair enough - I just thought I'd ask, since the feature requires an 
incompatible database change.

> We were hoping to release 1.0.0 in April, and it's already May 4th!

Indeed.

> In fact, after a quick look at the previous change in this direction
> (rev 8426) I'm a little dubious about including even that in the
> release.  It's mostly a step in the right direction, but it doesn't
> appear to fix any actual bugs, and for a WritableDatabase, you're now
> carefully setting the total_length and lastdocid members of the
> database_ro member when the database is opened, but AFAICS they'll never
> get updated.  I hope they never get used, but if they do that's a new
> bug!

I don't _think_ there's a bug here, but I understand your worries.  I 
think I'll pull the patch back out and merge it with the patch in bug 
#143 for later application.

> Also, we now always retrieve and parse the metainfo record, even if we
> never use the information - if you're using a weighting scheme which
> ignores document length, I think that means that the first search on a
> cold database will now be slower.  At least that's not the default
(Continue reading)


Gmane