Andrew Morton | 1 Dec 2002 09:11

data corrupting bug in 2.4.20 ext3, data=journal


In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
which can very easily cause file data corruption at unmount time.  This
was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
released, and three months after the bug was merged.  Unfortunate timing)

This only affects filesystems which were mounted with the `data=journal'
option, or files which have been set to journal their data with
`chattr +j'.  So most people are unaffected.  The problem is not present
in 2.5 kernels.

The symptoms are that any file data which was written within the thirty
seconds prior to the unmount may not make it to disk.   A workaround is
to run `sync' before unmounting.
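
For anyone who unmounts from a program rather than from the shell, the
same workaround can be sketched in C.  This is a minimal, Linux-specific
sketch and not part of any patch in this thread: `sync_then_unmount' is a
hypothetical helper name, and all it does is call sync(2) before umount(2).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>  /* umount(2), Linux-specific */
#include <unistd.h>     /* sync(2) */

/*
 * Hypothetical helper (not from the original mail): flush dirty data
 * with sync(2) and only then ask the kernel to unmount `target`.
 * Returns 0 on success, or -1 with errno set by umount(2).
 */
int sync_then_unmount(const char *target)
{
    assert(target != NULL);
    sync();                 /* push dirty buffers out first */
    return umount(target);  /* requires sufficient privileges */
}
```

A caller would check the return value and report errno (for example with
perror()) when the unmount is refused.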

The optimisation was intended to avoid writing out and waiting on the
inode's buffers when the subsequent commit would do that anyway. This
optimisation was applied to both data=journal and data=ordered modes.
But it is only valid for data=ordered mode.

In data=journal mode the data is left dirty in memory and the unmount
will silently discard it.

The fix is to only apply the optimisation to inodes which are operating
under data=ordered.
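
The intent of that rule can be written down as a small stand-alone model.
It is only a sketch of the decision described above, not the kernel code:
the enum and the function name `may_skip_buffer_flush' are made up for
illustration, and treating data=writeback the same as data=journal is an
assumption (the text above only says the shortcut is valid for
data=ordered).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the three ext3 data journalling modes. */
enum data_mode { DATA_JOURNAL, DATA_ORDERED, DATA_WRITEBACK };

/*
 * Model of the rule stated above: fsync may leave the inode's buffers
 * to the subsequent commit only when the inode is in data=ordered mode.
 * Returning false means "write out and wait on the buffers explicitly".
 */
bool may_skip_buffer_flush(enum data_mode mode)
{
    assert(mode == DATA_JOURNAL || mode == DATA_ORDERED ||
           mode == DATA_WRITEBACK);
    return mode == DATA_ORDERED;
}
```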

--- linux-akpm/fs/ext3/fsync.c~ext3-fsync-fix	Sat Nov 30 23:37:33 2002
+++ linux-akpm-akpm/fs/ext3/fsync.c	Sat Nov 30 23:39:30 2002
@@ -63,10 +63,12 @@ int ext3_sync_file(struct file * file, s
 	 */
 	ret = fsync_inode_buffers(inode);

Andrew Morton | 1 Dec 2002 09:52

Re: data corrupting bug in 2.4.20 ext3, data=journal

Andrew Morton wrote:
> 
> ...
> The fix is to only apply the optimisation to inodes which are operating
> under data=ordered.
> 

That "fix" didn't fix it.  Sorry about that.

Please avoid ext3/data=journal until it is sorted out.
Nick Piggin | 1 Dec 2002 13:41

Re: data corrupting bug in 2.4.20 ext3, data=journal

Andrew Morton wrote:

>In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
>which can very easily cause file data corruption at unmount time.  This
>was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
>released, and three months after the bug was merged.  Unfortunate timing)
>
In fact it was reported on lkml on 18th July IIRC before 2.4.19 was
released if that is any help to you. 2.4.19 and 2.4.20 are affected
and I haven't tested previous releases. I was going to re-report it
sometime, but Alan brought it to light just the other day.

Nick

Norm Hanson | 1 Dec 2002 22:48

another idiot and ext3 - The inode is from a bad block in the inode table

i too am an idiot.. and request help
here's the scoop
 
i originally formatted the drive with 3 partitions and installed rh8
/boot
/
swap

everything worked.
my plan was to have this drive only be a data drive so i installed rh8 on
another drive and mounted this one on /share
so /share looked like
 /share
    /bin
    /sbin
    /mp3
    /usr
    /etc
    /dev
    and so on

so i deleted all the dirs that i did not need on /share (bin, dev, sbin,
...)
still fine.

then i think rh8 ran fsck on it and it failed. (the console said something
like interrupt failed)
since then i cannot mount it.

the disk has about 10G of mp3s on it so i'd like to get it working.
fsck reveals this...
 
[root@pro180 root]# fsck.ext2 -vf /dev/hdf2
e2fsck 1.27 (8-Mar-2002)
Pass 1: Checking inodes, blocks, and sizes
Error while scanning inodes (0): The inode is from a bad block in the inode table
 
  105597 inodes used (2%)
       0 non-contiguous inodes (0.0%)
         # of inodes with ind/dind/tind blocks: 0/0/0
 2538385 blocks used (26%)
       0 bad blocks
       0 large files
 
       0 regular files
       0 directories
       0 character device files
       0 block device files
       0 fifos
       1 link
       0 symbolic links (0 fast symbolic links)
       0 sockets
--------
       1 file
 
tune2fs reveals this...
[root@pro180 root]#  tune2fs -l /dev/hdf2
tune2fs 1.27 (8-Mar-2002)
Filesystem volume name:   /
Last mounted on:          <not available>
Filesystem UUID:          7084cbb4-053c-4763-b340-a2232265cd36
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      filetype sparse_super
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              4800512
Block count:              9600845
Reserved block count:     480042
Free blocks:              7062460
Free inodes:              4694915
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Last mount time:          Sun Nov 10 23:10:09 2002
Last write time:          Sun Dec  1 15:40:31 2002
Mount count:              0
Maximum mount count:      -1
Last checked:             Sun Dec  1 15:40:31 2002
Check interval:           0 (<none>)
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
[root@pro180 root]#
any help would be much appreciated

 
 
Norm Hanson | 2 Dec 2002 03:27

Re: another idiot and ext3 - The inode is from a bad block in the inode table

some more info...
the drive is connected to a maxtor ata 100 pci card
here is the /var/log/ from when things went wrong...
 
Nov 26 04:02:25 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: error=0x00 { }
Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: error=0x80 { BadSector }, LBAsect=77019210, sector=3600
Nov 26 04:02:36 pro180 kernel: end_request: I/O error, dev 21:43 (hdf), sector 3600
Nov 26 04:02:36 pro180 kernel: EXT3-fs error (device ide2(33,67)): ext3_readdir: directory #2 contains a hole at offset 0
Nov 26 04:02:36 pro180 kernel: hdf: status error: status=0x11 { SeekComplete Error }
Nov 26 04:02:36 pro180 kernel: hdf: status error: error=0x00 { }
Nov 26 04:02:36 pro180 kernel: hdf: drive not ready for command
Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: error=0x80 { BadSector }, LBAsect=77015610, sector=0
Nov 26 04:02:36 pro180 kernel: end_request: I/O error, dev 21:43 (hdf), sector 0
Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:02:57 pro180 kernel: hdf: dma_intr: error=0x80 { BadSector }, LBAsect=5451773, sector=5242928
Nov 26 04:03:07 pro180 kernel: end_request: I/O error, dev 21:42 (hdf), sector 5242928
Nov 26 04:04:08 pro180 kernel: EXT3-fs error (device ide2(33,66)): ext3_get_inode_loc: unable to read inode block - inode=327681, block=655366
Nov 26 04:04:08 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:04:18 pro180 kernel: hdf: dma_intr: error=0x80 { BadSector }, LBAsect=208845, sector=0
Nov 26 04:04:18 pro180 kernel: end_request: I/O error, dev 21:42 (hdf), sector 0
Nov 26 04:04:29 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:04:29 pro180 kernel: hdf: dma_intr: error=0x00 { }
Nov 26 04:04:39 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:04:49 pro180 kernel: hdf: dma_intr: error=0x00 { }
Nov 26 04:08:50 pro180 kernel: hdf: recal_intr: status=0x10 { SeekComplete }
Nov 26 04:08:51 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 26 04:08:51 pro180 kernel: hdf: dma_intr: error=0x00 { }
Nov 26 04:08:51 pro180 kernel: hde: DMA disabled
Nov 26 04:08:51 pro180 kernel: hdf: DMA disabled
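
As a cross-check, the inode and block numbers in the ext3_get_inode_loc
line can be related to the tune2fs figures posted earlier (16384 inodes
per group, 32768 blocks per group, 128-byte inodes, 4096-byte blocks,
first block 0) with the usual ext2/ext3 layout arithmetic.  A minimal
sketch, assuming ide2(33,66) is the same /dev/hdf2 that tune2fs was run
on; the struct and function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Where an inode lives: its block group, its slot in that group, and the
 * block offset into the group's inode table that holds it. */
struct inode_loc {
    uint32_t group;
    uint32_t index;
    uint32_t table_block;
};

/* Inode numbers start at 1, so subtract one before dividing. */
struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
                              uint32_t inode_size, uint32_t block_size)
{
    struct inode_loc loc;

    assert(ino >= 1 && inodes_per_group > 0);
    assert(inode_size > 0 && block_size >= inode_size);

    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;
    loc.table_block = loc.index / (block_size / inode_size);
    return loc;
}

/* Block group that contains a given filesystem block. */
uint32_t group_of_block(uint32_t block, uint32_t first_block,
                        uint32_t blocks_per_group)
{
    assert(blocks_per_group > 0 && block >= first_block);
    return (block - first_block) / blocks_per_group;
}
```

With the values above, locate_inode(327681, 16384, 128, 4096) gives group
20, index 0, and group_of_block(655366, 0, 32768) also gives 20, so the
inode and the unreadable block belong to the same block group.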
Andrew Morton | 2 Dec 2002 08:17

Re: data corrupting bug in 2.4.20 ext3, data=journal

Nick Piggin wrote:
> 
> Andrew Morton wrote:
> 
> >In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
> >which can very easily cause file data corruption at unmount time.  This
> >was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
> >released, and three months after the bug was merged.  Unfortunate timing)
> >
> In fact it was reported on lkml on 18th July IIRC before 2.4.19 was
> released if that is any help to you. 2.4.19 and 2.4.20 are affected
> and I haven't tested previous releases. I was going to re-report it
> sometime, but Alan brought it to light just the other day.
> 

Are you sure?  I can't make it happen on 2.4.19.  And disabling the new
BH_Freed logic (which went into 2.4.20-pre5) makes it go away.

--- linux-akpm/fs/jbd/commit.c~a	Sun Dec  1 23:10:12 2002
+++ linux-akpm-akpm/fs/jbd/commit.c	Sun Dec  1 23:10:27 2002
@@ -695,7 +695,7 @@ skip_commit: /* The journal should be un
 		 * use in a different page. */
 		if (__buffer_state(bh, Freed)) {
 			clear_bit(BH_Freed, &bh->b_state);
-			clear_bit(BH_JBDDirty, &bh->b_state);
+//			clear_bit(BH_JBDDirty, &bh->b_state);
 		}
 			
 		if (buffer_jdirty(bh)) {

_
Ralf Hildebrandt | 2 Dec 2002 08:56

Re: how often to 'fsck -D' ?

* Stephen C. Tweedie <sct@redhat.com>:

> That said, I expect 2.4 htree patches to continue to be maintained.

I'd like to try them -- where are they?

-- 
Ralf Hildebrandt (Im Auftrag des Referat V a)   Ralf.Hildebrandt@charite.de
Charite Campus Mitte                            Tel.  +49 (0)30-450 570-155
Referat V a - Kommunikationsnetze -             Fax.  +49 (0)30-450 570-916
Why you can't find your system administrators:
(S)he's sitting under the desk, hysterical at what the (l)user just asked. 
Nick Piggin | 2 Dec 2002 09:26

Re: data corrupting bug in 2.4.20 ext3, data=journal

Andrew Morton wrote:

>Nick Piggin wrote:
>> 
>> Andrew Morton wrote:
>> 
>> >In 2.4.20-pre5 an optimisation was made to the ext3 fsync function
>> >which can very easily cause file data corruption at unmount time.  This
>> >was first reported by Nick Piggin on November 29th (one day after 2.4.20 was
>> >released, and three months after the bug was merged.  Unfortunate timing)
>> >
>> In fact it was reported on lkml on 18th July IIRC before 2.4.19 was
>> released if that is any help to you. 2.4.19 and 2.4.20 are affected
>> and I haven't tested previous releases. I was going to re-report it
>> sometime, but Alan brought it to light just the other day.
>> 
>
>Are you sure?  I can't make it happen on 2.4.19.  And disabling the new
>BH_Freed logic (which went into 2.4.20-pre5) makes it go away.
>
>
>--- linux-akpm/fs/jbd/commit.c~a	Sun Dec  1 23:10:12 2002
>+++ linux-akpm-akpm/fs/jbd/commit.c	Sun Dec  1 23:10:27 2002
>@@ -695,7 +695,7 @@ skip_commit: /* The journal should be un
> 		 * use in a different page. */
> 		if (__buffer_state(bh, Freed)) {
> 			clear_bit(BH_Freed, &bh->b_state);
>-			clear_bit(BH_JBDDirty, &bh->b_state);
>+//			clear_bit(BH_JBDDirty, &bh->b_state);
> 		}
> 			
> 		if (buffer_jdirty(bh)) {
>
I reported the bug for 2.4.19-rc1 and -rc2, but I can't remember if I tested
2.4.19 when it was released. It has an external journal on a separate disk.
I can't really do any testing with the machine, unfortunately.

Regards,
Nick

Stephen C. Tweedie | 2 Dec 2002 11:54

Re: how often to 'fsck -D' ?

On Mon, 2002-12-02 at 07:56, Ralf Hildebrandt wrote:

> > That said, I expect 2.4 htree patches to continue to be maintained.
> 
> I'd like to try them -- where are they?

Ted posted 2.4 patches a while back, and I've got a set of ext3 patches
including those plus a few bug-fixes and other stuff, available at

	http://people.redhat.com/sct/patches/ext3-2.4/

In particular,

	http://people.redhat.com/sct/patches/ext3-2.4/dev-20021115/2.4.20-rc1.allpatches.patch

contains all of the non-debug patches in one easy-to-swallow capsule,
although it also includes some slightly experimental
performance-improving code.

Cheers,
 Stephen
Stephen C. Tweedie | 2 Dec 2002 12:08

Re: another idiot and ext3 - The inode is from a bad block in the inode table

On Mon, 2002-12-02 at 02:27, Norm Hanson wrote:
> some more info...
> the drive is connected to a maxtor ata 100 pci card
> here is the /var/log/ from when things went wrong...
> 
> Nov 26 04:02:25 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: error=0x00 { }
> Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Nov 26 04:02:36 pro180 kernel: hdf: dma_intr: error=0x80 { BadSector }, LBAsect=77019210, sector=3600
> Nov 26 04:02:36 pro180 kernel: end_request: I/O error, dev 21:43 (hdf), sector 3600

Yes, things _have_ gone wrong --- the disk is dying.  Ted posted a
message not long ago about how to start recovering from major disk
death, and I've put up a copy at

	http://people.redhat.com/sct/notes/recovery.txt

to get you started.

Cheers,
 Stephen
