tridge | 1 Feb 02:11 2010
Picon

Re: Bugfix for tdb transactions

Hi Volker,

That looks better, but I'm still not sure we've completely fixed
this. Your patch will fix the case where a 2nd process opens the tdb
during the commit of the first process, but I don't think it fixes the
following case:

  - process 1 and 2 are both connected to the tdb
  - process 1 starts a transaction
  - process 1 starts a commit
  - admin uses kill -9 on process 1, while transaction is partially
    written (or segv or similar)
  - process 2 starts a transaction, and uses corrupt data from process 1

The problem is that the transaction code assumes that in the case of a
crash that all processes accessing the tdb die (such as when the
machine as a whole crashes). That is not always true, especially if
you have an admin who likes to use kill -9.

Fixing this completely could be quite expensive (check for recover
flag on every IO??), but perhaps we could do the following:

  - when we start a transaction, check the recovery flag. If set then
    db needs recovery.

  - in tdb_transaction_recover() get a write lock from FREELIST_TOP to
    EOF if recovery is needed.

That will at least mean that in databases where writes are wrapped in
transactions that we don't compound the problem caused by the kill -9.
(Continue reading)

simo | 1 Feb 02:31 2010
Picon

Re: Bugfix for tdb transactions

On Mon, 2010-02-01 at 12:11 +1100, tridge <at> samba.org wrote:
> Hi Volker,
> 
> That looks better, but I'm still not sure we've completely fixed
> this. Your patch will fix the case where a 2nd process opens the tdb
> during the commit of the first process, but I don't think it fixes the
> following case:
> 
>   - process 1 and 2 are both connected to the tdb
>   - process 1 starts a transaction
>   - process 1 starts a commit
>   - admin uses kill -9 on process 1, while transaction is partially
>     written (or segv or similar)
>   - process 2 starts a transaction, and uses corrupt data from process 1
> 
> The problem is that the transaction code assumes that in the case of a
> crash that all processes accessing the tdb die (such as when the
> machine as a whole crashes). That is not always true, especially if
> you have an admin who likes to use kill -9.
> 
> Fixing this completely could be quite expensive (check for recover
> flag on every IO??), but perhaps we could do the following:
> 
>   - when we start a transaction, check the recovery flag. If set then
>     db needs recovery.
> 
>   - in tdb_transaction_recover() get a write lock from FREELIST_TOP to
>     EOF if recovery is needed.
> 
> That will at least mean that in databases where writes are wrapped in
(Continue reading)

tridge | 1 Feb 03:49 2010
Picon

Re: Bugfix for tdb transactions

Hi Simo,

 > For example we could make the first process to open a tdb file to do a
 > lock on a specific bit of data, so that any other opener will know the
 > file is shared.

That is what TDB_CLEAR_IF_FIRST does, for non-persistent databases.

I removed the use TDB_CLEAR_IF_FIRST in s4 quite a while ago as it is
terrible for scalability. Many OSes (including Linux) use a linear
list of locks per inode. So each time you lock/unlock a single range
in a file, it needs to walk the entire list.

If you look at http://samba.org/ftp/unpacked/junkcode/lock_scaling.c
you'll see a demo of this. On my Linux laptop I see a 1000x slowdown
in locking between having 1 lock on a file and having 10000 locks on a
file. The slowdown is much worse on some other platforms. On systems
like zLinux the VM context switching makes the cost of this list
extremely high (I've seen zLinux print servers where this locking cost
completely dominated the CPU usage of the system).

s3 still uses TDB_CLEAR_IF_FIRST for non-persistent databases, but I
wouldn't want to enshrine this in the code forever. 

(there was some discussion of changing the Linux kernel to use
red-black trees for locking to solve this scaling problem, but it
hasn't happened).

 > At the same time we set up an inotify watch to know from
 > the first process if any other process opens the tdb.
(Continue reading)

tridge | 1 Feb 05:55 2010
Picon

Re: Bugfix for tdb transactions

Hi Volker,

Rusty and I have discussed the kill -9 problem a bit more today, and
we think that we can fix that one properly and the bug you've found
at the same time.

The trick will be to read the recover offset whenever we
go from having zero total locks to one total lock. At that point we
can safely read the recovery magic and check if recovery is needed.

If it is needed then we will grab a write lock over the whole db (from
FREELIST_TOP to EOF), and re-check the recovery magic. If still set we
do a recovery, but don't do the ftruncate().

The cost of this will be 2 mmap deferences per tdb operation. We'll
need to benchmark it, but I think it will be quite cheap. It will be
more expensive if mmap is disabled of course.

Rusty is going to try working up a patch for this.

Cheers, Tridge

Volker Lendecke | 1 Feb 08:31 2010
Picon

Re: Bugfix for tdb transactions

On Mon, Feb 01, 2010 at 12:11:56PM +1100, tridge <at> samba.org wrote:
> Hi Volker,
> 
> That looks better, but I'm still not sure we've completely fixed
> this. Your patch will fix the case where a 2nd process opens the tdb
> during the commit of the first process, but I don't think it fixes the
> following case:
> 
>   - process 1 and 2 are both connected to the tdb
>   - process 1 starts a transaction
>   - process 1 starts a commit
>   - admin uses kill -9 on process 1, while transaction is partially
>     written (or segv or similar)
>   - process 2 starts a transaction, and uses corrupt data from process 1

Before that is fixed, should we commit my patch to fix the
problem that happens without the kill -9?

Volker
Rusty Russell | 1 Feb 09:19 2010
Picon

Re: Bugfix for tdb transactions

On Mon, 1 Feb 2010 06:01:12 pm Volker Lendecke wrote:
> On Mon, Feb 01, 2010 at 12:11:56PM +1100, tridge <at> samba.org wrote:
> > Hi Volker,
> > 
> > That looks better, but I'm still not sure we've completely fixed
> > this. Your patch will fix the case where a 2nd process opens the tdb
> > during the commit of the first process, but I don't think it fixes the
> > following case:
> > 
> >   - process 1 and 2 are both connected to the tdb
> >   - process 1 starts a transaction
> >   - process 1 starts a commit
> >   - admin uses kill -9 on process 1, while transaction is partially
> >     written (or segv or similar)
> >   - process 2 starts a transaction, and uses corrupt data from process 1
> 
> Before that is fixed, should we commit my patch to fix the
> problem that happens without the kill -9?

I prefer that.  I like it from a simplicity point of view, even though the
larger fix will revert it.

Cheers,
Rusty.

Volker Lendecke | 1 Feb 09:47 2010
Picon

Re: Bugfix for tdb transactions

On Mon, Feb 01, 2010 at 06:49:27PM +1030, Rusty Russell wrote:
> On Mon, 1 Feb 2010 06:01:12 pm Volker Lendecke wrote:
> > On Mon, Feb 01, 2010 at 12:11:56PM +1100, tridge <at> samba.org wrote:
> > > Hi Volker,
> > > 
> > > That looks better, but I'm still not sure we've completely fixed
> > > this. Your patch will fix the case where a 2nd process opens the tdb
> > > during the commit of the first process, but I don't think it fixes the
> > > following case:
> > > 
> > >   - process 1 and 2 are both connected to the tdb
> > >   - process 1 starts a transaction
> > >   - process 1 starts a commit
> > >   - admin uses kill -9 on process 1, while transaction is partially
> > >     written (or segv or similar)
> > >   - process 2 starts a transaction, and uses corrupt data from process 1
> > 
> > Before that is fixed, should we commit my patch to fix the
> > problem that happens without the kill -9?
> 
> I prefer that.  I like it from a simplicity point of view, even though the
> larger fix will revert it.

Can you as tdb maintainer push it then? Or shall we wait for
Tridge to give is okay?

Thanks,

Volker
(Continue reading)

Volker Lendecke | 1 Feb 10:04 2010
Picon

Re: regcomp() / regexec() available on all supported samba platforms?

On Sun, Jan 31, 2010 at 04:34:56PM +0100, Olivier Sessink wrote:
> I got a feature request to hide some files in the vfs_scannedonly  
> module, but because of the stackable feature of the VFS module I thought  
> it would be better to create a separate VFS module (vfs_hide ?).

Doesn't the "veto files" parameter fulfill your requirements?

>
> But: it requires some kind of regex support. I was wondering if  
> regexec() and regcomp() are available on all platforms?

The canonical answer to that is a configure test.

Volker
Rusty Russell | 1 Feb 10:54 2010
Picon

Re: Bugfix for tdb transactions

On Mon, 1 Feb 2010 06:49:27 pm Rusty Russell wrote:
> On Mon, 1 Feb 2010 06:01:12 pm Volker Lendecke wrote:
> > Before that is fixed, should we commit my patch to fix the
> > problem that happens without the kill -9?
> 
> I prefer that.  I like it from a simplicity point of view, even though the
> larger fix will revert it.

Actually, I changed my mind.  Here is the simplest fix:

tdb: grab entire db lock during recovery

Volker tracked down an open vs commit race which was causing corruption,
as the commit code was dropping the GLOBAL_LOCK before finishing.

And despite the comment above tdb_transaction_recover(), the open path calls
it with only GLOBAL_LOCK rather than the entire db locked.

The simple fix is to lock all the records before invoking recovery: for
this we introduce a new internal function tdb_check_recovery() which
optimistically checks the recovery area before obtaining the lock and
rechecking, and if necessary, performs recovery.

(This method avoids adding a slowdown to tdb_open, as well as allowing us
to call it in future on all write operations).

Signed-off-by: Rusty Russell <rusty <at> rustcorp.com.au>

diff --git a/lib/tdb/common/open.c b/lib/tdb/common/open.c
index 4d4f95a..cffdf55 100644
(Continue reading)

Volker Lendecke | 1 Feb 11:09 2010
Picon

Re: Bugfix for tdb transactions

On Mon, Feb 01, 2010 at 08:24:44PM +1030, Rusty Russell wrote:
> On Mon, 1 Feb 2010 06:49:27 pm Rusty Russell wrote:
> > On Mon, 1 Feb 2010 06:01:12 pm Volker Lendecke wrote:
> > > Before that is fixed, should we commit my patch to fix the
> > > problem that happens without the kill -9?
> > 
> > I prefer that.  I like it from a simplicity point of view, even though the
> > larger fix will revert it.
> 
> Actually, I changed my mind.  Here is the simplest fix:

Well, to be honest, I disagree. This changes the rules which
locks are to be taken when a recovery is done. A
semantically minimal fix would not change this.

But then, I am not the tdb author nor its maintainer.

Volker

Gmane