David Arendt | 3 May 2009 00:55

nilfs_cpfile_delete_checkpoints: cannot delete block

Hi,

Until now, nilfs-2.0.12 has run very stably, without data corruption.
However, on one partition (600G) I got the following errors while
running the cleaner:

nilfs_cpfile_delete_checkpoints: cannot delete block
NILFS: GC failed during preparation: cannot delete checkpoints: err=-2

This partition mainly holds large temporary render files (up to
25 GB per file). There are currently 132702 snapshots.

As this partition will not be used for the next few days, I will leave
it in the failed state, so if you would like me to test anything
further, please let me know.

Bye,
David Arendt
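
For reference, the err=-2 in the log above is a negated errno value; on Linux, errno 2 is ENOENT. A minimal user-space sketch for decoding such a value follows. The helper names decode_kernel_err() and print_kernel_err() are hypothetical and are not part of NILFS or its tools.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: kernel logs report errors as negative errno
 * values, so flip the sign before asking libc for the description. */
const char *decode_kernel_err(int err)
{
	return strerror(err < 0 ? -err : err);
}

/* Print the raw value next to its textual description. */
void print_kernel_err(int err)
{
	printf("err=%d: %s\n", err, decode_kernel_err(err));
}
```

Calling print_kernel_err(-2) prints the description libc associates with ENOENT, which suggests the cleaner asked for something that could not be found.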

Ryusuke Konishi | 3 May 2009 10:08

Re: nilfs_cpfile_delete_checkpoints: cannot delete block

Hi David,
On Sun, 03 May 2009 00:55:43 +0200, David Arendt wrote:
> Hi,
> 
> Until now, nilfs-2.0.12 has run very stably, without data corruption.
> However, on one partition (600G) I got the following errors while
> running the cleaner:
> 
> nilfs_cpfile_delete_checkpoints: cannot delete block
> NILFS: GC failed during preparation: cannot delete checkpoints: err=-2
> 
> This partition mainly holds large temporary render files (up to
> 25 GB per file). There are currently 132702 snapshots.
> 
> As this partition will not be used for the next few days, I will leave
> it in the failed state, so if you would like me to test anything
> further, please let me know.
> 
> Bye,
> David Arendt

I have reviewed the function in question, but could not find any
likely problems.

Could you try the following patch?

It's applicable to v2.0.12.

I have some pending patches newer than 2.0.12, but they seem to be
unrelated to your problem.

David Arendt | 3 May 2009 11:26

Re: nilfs_cpfile_delete_checkpoints: cannot delete block

Hi,

I have tried your patch.

The more verbose error message is:

nilfs_cpfile_delete_checkpoints: cannot delete block: cno=1407, range = [11, 75990)
NILFS: GC failed during preparation: cannot delete checkpoints: err=-2

Bye,
David Arendt


Ryusuke Konishi | 3 May 2009 11:44

Re: nilfs_cpfile_delete_checkpoints: cannot delete block

Hi!
On Sun, 03 May 2009 11:26:49 +0200, David Arendt wrote:
> Hi,
> 
> I have tried your patch.
> 
> The more verbose error message is:
> 
> nilfs_cpfile_delete_checkpoints: cannot delete block: cno=1407, range = [11, 75990)
> NILFS: GC failed during preparation: cannot delete checkpoints: err=-2

You didn't see any DAT warnings?

If so, does the range of checkpoints being deleted
(i.e. 11 through 75990 - 1) look correct to you?

What does the output of lscp look like?

Ryusuke Konishi


David Arendt | 3 May 2009 12:06

Re: nilfs_cpfile_delete_checkpoints: cannot delete block

Hi,

I didn't see any DAT warnings.

Using lscp I see that the first entry is

1428  2009-03-30 02:13:06   cp    -        259      74436

The last entry is

134128  2009-05-03 00:04:28   cp    i      81813        876

If you want the full output of lscp, please tell me and I will send it
to you directly rather than to the mailing list, as the file is 10 MB.

Bye,
David Arendt
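
The numbers reported so far can be cross-checked with a small sketch. It treats the logged range as half-open, as the [11, 75990) notation suggests, and compares the failing cno against the first and last lscp entries quoted above. The struct and function names are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open checkpoint range [start, end), as printed by the debug patch. */
struct cno_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* True if cno falls inside the half-open range. */
bool cno_in_range(struct cno_range r, uint64_t cno)
{
	return cno >= r.start && cno < r.end;
}

/* True if cno lies between the first and last entries listed by lscp
 * (inclusive). This does not prove the checkpoint exists, only that it
 * is not outside the listed span. */
bool cno_within_lscp_span(uint64_t first, uint64_t last, uint64_t cno)
{
	return cno >= first && cno <= last;
}
```

With the range {11, 75990} and the lscp span 1428 through 134128, cno 1407 is inside the deletion range but below the oldest checkpoint that lscp still lists.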


David Arendt | 4 May 2009 06:16

Re: nilfs_cpfile_delete_checkpoints: cannot delete block

Hi,

Last night I got lots of:

nilfs_btree_propagate: key = 67, level == 0

on the partition where cleanerd failed.

An attempt to umount it resulted in a hang with the following message:

NILFS warning (device sda10): nilfs_segctor_destroy: dirty file(s) after the final construction

Bye,
David Arendt


Ryusuke Konishi | 5 May 2009 13:23

Re: nilfs_cpfile_delete_checkpoints: cannot delete block

Hi David,
On Mon, 04 May 2009 06:16:24 +0200, David Arendt wrote:
> Hi,
> 
> Last night I got lots of:
> 
> nilfs_btree_propagate: key = 67, level == 0
> 
> on the partition where cleanerd failed.

This error is related to the GC failure.

Both logs indicate that the btree lookup of the 67th block of the
checkpoint file failed.

I suspect an inconsistency between the page cache and the btree: the
block was removed from the btree but remained in the page cache.

Could you try the following bugfix patch?

The patch ensures that the dirty state of the page and buffer is
cleared after a block is removed, which should prevent the inconsistency.
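
The link between cno=1407 in the GC log and key = 67 in the btree log can be checked with a little arithmetic. The sketch below is a rough illustration only: it assumes 4096-byte blocks, 192-byte checkpoint entries, and a checkpoint-file header that occupies the first entry slot. None of these values is stated in this thread, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk parameters; not stated anywhere in this thread. */
#define ASSUMED_BLOCK_SIZE	4096u	/* bytes per block */
#define ASSUMED_CP_SIZE		192u	/* bytes per checkpoint entry */
#define ASSUMED_HEADER_ENTRIES	1u	/* entry slots used by the file header */

/* Number of whole checkpoint entries that fit in one block. */
unsigned int cps_per_block(void)
{
	return ASSUMED_BLOCK_SIZE / ASSUMED_CP_SIZE;
}

/* Block offset within the checkpoint file that would hold checkpoint
 * number cno (checkpoint numbers start at 1). */
uint64_t cno_to_blkoff(uint64_t cno)
{
	return (cno + ASSUMED_HEADER_ENTRIES - 1) / cps_per_block();
}
```

Under these assumptions a block holds 21 entries, so cno 1407 maps to block 67, and block 67 covers cnos 1407 through 1427. That span ends just before 1428, the first checkpoint in the lscp output.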

Thanks in advance,
Ryusuke Konishi
--
diff --git a/fs/btnode.c b/fs/btnode.c
index 5e83c60..11a7305 100644
--- a/fs/btnode.c
+++ b/fs/btnode.c

Ryusuke Konishi | 5 May 2009 14:33

[PATCH 1/2] nilfs2: fix possible recovery failure due to block creation without writer

Some function calls in nilfs_prepare_segment_for_recovery() may fail
because they can create blocks on meta data files without configuring
a writable FS-instance.  Concretely, the nilfs_mdt_create_block()
routine of meta data files will fail in that case.

This fixes the problem by temporarily attaching a writable FS-instance
while the function is called.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@...>
---
 fs/nilfs2/recovery.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/recovery.c b/fs/nilfs2/recovery.c
index 4fc081e..57afa9d 100644
--- a/fs/nilfs2/recovery.c
+++ b/fs/nilfs2/recovery.c
@@ -407,6 +407,7 @@ void nilfs_dispose_segment_list(struct list_head *head)
 }

 static int nilfs_prepare_segment_for_recovery(struct the_nilfs *nilfs,
+					      struct nilfs_sb_info *sbi,
 					      struct nilfs_recovery_info *ri)
 {
 	struct list_head *head = &ri->ri_used_segments;
@@ -421,6 +422,7 @@ static int nilfs_prepare_segment_for_recovery(struct the_nilfs *nilfs,
 	segnum[2] = ri->ri_segnum;
 	segnum[3] = ri->ri_nextnum;

+	nilfs_attach_writer(nilfs, sbi);

Ryusuke Konishi | 5 May 2009 14:33

[PATCH 2/2] nilfs2: fix circular locking dependency of writer mutex

This fixes the following circular locking dependency problem:

 =======================================================
 [ INFO: possible circular locking dependency detected ]
 2.6.30-rc3 #5
 -------------------------------------------------------
 segctord/3895 is trying to acquire lock:
  (&nilfs->ns_writer_mutex){+.+...}, at: [<d0d02172>]
   nilfs_mdt_get_block+0x89/0x20f [nilfs2]

 but task is already holding lock:
  (&bmap->b_sem){++++..}, at: [<d0d02d99>]
   nilfs_bmap_propagate+0x14/0x2e [nilfs2]

 which lock already depends on the new lock.

The fix replaces call sites of nilfs_get_writer() that are never
reached from a read-only context with a direct dereference of the
pointer to the writable FS-instance.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@...>
---
 fs/nilfs2/ioctl.c |    8 +++++---
 fs/nilfs2/mdt.c   |   13 +++++++------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 108d281..be387c6 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c

Ryusuke Konishi | 5 May 2009 14:33

[PATCH 0/2] nilfs2 bug-fixes against 2.6.30-rc4

This series fixes two known problems.

The "fix possible recovery failure due to block creation without writer"
    fixes possible recovery failure on mount.

The "fix circular locking dependency of writer mutex"
    fixes a possible circular locking problem detected by lockdep check
    on the latest kernel.

Lockdep has detected more circular locking problems, but they are
not solved yet.  I will send these patches to linux-next for compile
testing before working on the remaining problems.

This post is for record and review.  These patches are not applicable
to the standalone package (i.e. nilfs2-module), and users do not have
to apply them piece by piece; I will include them in the next release
of nilfs2-module after they are merged upstream.

Thanks,
Ryusuke Konishi
--
Ryusuke Konishi (2):
      nilfs2: fix possible recovery failure due to block creation without writer
      nilfs2: fix circular locking dependency of writer mutex

 fs/nilfs2/ioctl.c    |    8 +++++---
 fs/nilfs2/mdt.c      |   13 +++++++------
 fs/nilfs2/recovery.c |    6 ++++--
 3 files changed, 16 insertions(+), 11 deletions(-)