Vincent McIntyre | 1 Nov 02:48 2005

Re: ext3 + fs > 2Tbyte

Thanks for your response, Andreas.

> It sounds like you have overflowed the end of the 2TB device limit and
> clobbered the beginning of your filesystem.  This can happen if the
> SCSI driver, kernel, or even ext3 isn't handling offsets > 2^31 properly.
> I know RH has only recently started supporting ext3 filesystems > 2TB,
> and it isn't clear that all drivers handle this properly yet.

This box is using the fusion mpt drivers as in 2.6.7 - mptbase, mptscsih,
etc. Do you recall any >2TB issues being fixed in later kernels?

When the machine was last in a good state, the filesystem had 1.5 TB
used, i.e. as far as I can tell nothing would have been written past
2 TB, although I suppose there is no guarantee the space is used in
order of increasing offset.

The filesystem was exported over NFS, and was being written to by
client machines. It is using NFSv3 (nfs-kernel-server 1.0-2woody3).
Worked great for several months.

> Please update your e2fsprogs to the latest.  You also need to use
> "e2fsck -b 32768" (or multiple thereof) for such large filesystems.
> I think newer e2fsprogs will print this message properly in that case.
>
I downloaded 1.38 from sourceforge and built it. No change in behaviour.
I tried e2fsck with block offsets from 1025 to 4194305 in steps of 1024.
I also tried dumpe2fs with the same range of offsets, also nothing.

I've attached an strace of dumpe2fs, perhaps it is helpful?


Theodore Ts'o | 1 Nov 05:46 2005

Re: What is the history of CONFIG_EXT{2,3}_CHECK?

On Mon, Oct 31, 2005 at 02:25:03PM -0700, Andreas Dilger wrote:
> On Oct 31, 2005  01:13 +0100, Adrian Bunk wrote:
> > Can anyone tell me the history of CONFIG_EXT{2,3}_CHECK?
> > 
> > There is code for a "check" option for mount if these options are 
> > enabled, but there's no way to enable them.
> 
> These are expensive debugging options, which walk the inode/block bitmaps
> for getting the group inode/block usage instead of using the group
> summary data.  Not used very often but I suspect occasionally useful for
> developers mucking with ext[23] internals.  Since it is developer-only
> code it needs to be enabled with #define CONFIG_EXT[23]_CHECK in a
> header or compile option.

It's basically a stripped-down version of e2fsck pass #5, though.  Is
there any reason why this needs to be in the kernel?  If it would be
useful I could easily make a userspace implementation of these checks.

						- Ted
Andreas Dilger | 1 Nov 07:08 2005

Re: ext3 + fs > 2Tbyte

On Nov 01, 2005  12:45 +1100, Vincent.McIntyre <at> csiro.au wrote:
> >It sounds like you have overflowed the end of the 2TB device limit and
> >clobbered the beginning of your filesystem.  This can happen if the
> >SCSI driver, kernel, or even ext3 isn't handling offsets > 2^31 properly.
> >I know RH has only recently started supporting ext3 filesystems > 2TB,
> >and it isn't clear that all drivers handle this properly yet.
> 
> This box is using the fusion mpt drivers as in 2.6.7 - mptbase,mptscsih
> etc. Do you recall any >2Tb issue being fixed in later kernels?

Sorry, I don't know, I've just heard of occasional problems in this area
and very few people reporting success.

> When the machine was last in a good state, the filesystem had 1.5Tbyte
> used, ie as far as I can tell nothing would have written past 2Tb,
> although I suppose there is no guarantee the space is used up in order
> of increasing offset.

No, it is "kind of" used in increasing offset, but not strictly so.

> >Please update your e2fsprogs to the latest.  You also need to use
> >"e2fsck -b 32768" (or multiple thereof) for such large filesystems.
> >I think newer e2fsprogs will print this message properly in that case.

You might also need to add "-B 4096".
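
Rather than stepping through every multiple of 1024, the sparse-superblock
backups sit in only a handful of groups.  A rough sketch (the
`backup_blocks` helper name and the block size are illustrative, not from
this thread) that prints candidate "-b" values:

```shell
# Print likely backup-superblock block numbers for a given block size.
# With the (default) sparse_super layout, backups live in groups
# 1, 3^n, 5^n and 7^n, and a group spans 8 * blocksize blocks (one
# block bitmap's worth of bits).  For 1k-block filesystems the boot
# block shifts everything by one; not handled here.
backup_blocks() {
    bs=$1                      # filesystem block size in bytes
    bpg=$((8 * bs))            # blocks per group
    for g in 1 3 5 7 9 25 27 49; do
        echo $((g * bpg))
    done
}
backup_blocks 4096             # first candidate printed is 32768
```

Each candidate can then be probed read-only, e.g.
"e2fsck -n -b 32768 -B 4096 /dev/sdb1".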

> I downloaded 1.38 from sourceforge and built it. No change in behaviour.
> I tried e2fsck with block offsets from 1025 to 4194305 in steps of 1024.
> I also tried dumpe2fs with the same range of offsets, also nothing.
> 

Vincent McIntyre | 1 Nov 14:38 2005

Re: ext3 + fs > 2Tbyte


>>> Please update your e2fsprogs to the latest.  You also need to use
>>> "e2fsck -b 32768" (or multiple thereof) for such large filesystems.
>>> I think newer e2fsprogs will print this message properly in that case.
>
> You might also need to add "-B 4096".

I gave that a try as well (and -B 8192), with the same results.

I tried to make a copy of the first part of the filesystem with dd:

    # dd if=/dev/sdb1 of=/tmp/sdb1.dd bs=1 count=16384 \
        conv=noerror,sync,notrunc

This returned a file supposedly 16384 bytes long, but it didn't make
much sense: looking at it with 'od' or 'hexdump' I get only 17 lines
of output, not the roughly 178 I get for the same exercise with a good
ext3 filesystem. (The /tmp filesystem has 128-byte inodes.)

The output appears to be just the EFI GPT partition label.

I'm starting to suspect something in the raid device is in a strange
state. Or that the whole filesystem has just totally disappeared. :(

A bit more digging in the logs found this, from the first boot after
power was reapplied:
   sdb : very big device. try to use READ CAPACITY(16).
   kernel: SCSI device sdb: 4688461824 512-byte hdwr sectors (2400492 MB)
   kernel: SCSI device sdb: drive cache: write back
   kernel:  /dev/scsi/host2/bus0/target0/lun0: p1
   kernel: Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0

bloch | 1 Nov 16:59 2005

Re: Recover original superblock on corrupted filesystem?

On Tue, 25 Oct 2005, bloch <at> verdurin.com wrote:

> On Tue, 25 Oct 2005, Stephen C. Tweedie wrote:
> 
> > Hi,
> > 
> > On Fri, 2005-10-21 at 15:51 +0100, bloch <at> verdurin.com wrote:
> > 
> > > It appears the original superblock is corrupted too, as it has an inode
> > > count of 0.  When I start fsck with -b 32760, it uses the alternate
> > > superblock and proceeds.  However, it restarts from the beginning a
> > > couple of times and after the second restart it doesn't use the
> > > alternate superblock, stopping instead as it can't find the original
> > > one.
> > 
> > Do you have a log of the fsck output, and which e2fsprogs version is
> > this?  Sounds like it may be an e2fsck bug if we don't honour the backup
> > superblock flag on subsequent passes.
> > 
> 
> I do have a log, yes.  It's rather large...
> 
> It's version 1.38
> 
> > > Is there a way around this, such as using one of the alternate
> > > superblocks to replace the broken one
> > 
> > Yes, "dd" of the appropriate block should work... but do this with
> > extreme care, as getting it slightly wrong will cause major havoc.
> > 

Andreas Dilger | 1 Nov 19:09 2005

Re: ext3 + fs > 2Tbyte

On Nov 02, 2005  00:37 +1100, Vincent.McIntyre <at> csiro.au wrote:
> I tried to make a copy of the first part of the filesystem with dd;
> 
>   # dd if=/dev/sdb1 of=/tmp/sdb1.dd bs=1 count=16384 \
>       conv=noerror,sync,notrunc
> 
> This returned a file supposedly 16384 bytes long, but it didn't make
> much sense - looking at it with 'od' or 'hexdump' I get only 17 lines
> of output, not the roughly 178 I get for the same exercise with a good
> ext3 filesystem. (The /tmp filesystem has 128-byte inodes.)

"od" will compress lines that are identical (usually all-zero) as "*".
If you want all the output, use -v.
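
The folding is easy to see on synthetic data; a quick sketch (sizes
arbitrary):

```shell
# od collapses runs of identical 16-byte lines into a single "*", so
# 2 KiB of zeros prints as only 3 lines; -v disables the folding and
# prints every line (128 data lines plus the final offset line).
dd if=/dev/zero bs=512 count=4 2>/dev/null | od -Ax -tx4 | wc -l     # 3
dd if=/dev/zero bs=512 count=4 2>/dev/null | od -v -Ax -tx4 | wc -l  # 129
```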

> The output appears to be just the EFI GPT partition label.

The EFI GPT label can be restored from the backup copy (which is located
at the end of the device), so that might be what happened.

> I'm starting to suspect something in the raid device is in a strange
> state. Or that the whole filesystem has just totally disappeared. :(

od -Ax -tx4 /dev/sdb1 | grep "^[0-9a-f]*30 [0-9a-f]* [0-9a-f]* 000[1-3]ef53 "

should locate the ext2 superblock magic number(s) eventually.  There is
also a utility in the e2fsprogs source tree (misc/findsuper), not
installed by default, that you could build to do this more efficiently.
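
The grep can be sanity-checked against a synthetic image.  A sketch (the
image path is illustrative; the address field of the pattern is written
as "[0-9a-f]*30" because "od -Ax" pads offsets to six or more digits):

```shell
# Build a 2 KiB zero file and plant s_magic = 0xEF53 (little-endian)
# and s_state = 0x0001 ("clean") at offset 1080 (= 1024 + 0x38, the
# magic's location within the superblock).  The bytes 0x53 0xEF 0x01
# are written as octal escapes to keep printf portable.
img=/tmp/fake_sb.img
dd if=/dev/zero of="$img" bs=1024 count=2 2>/dev/null
printf '\123\357\001' | dd of="$img" bs=1 seek=1080 conv=notrunc 2>/dev/null
# On a little-endian host the word at offset 0x438 now reads 0001ef53:
od -Ax -tx4 "$img" | grep "^[0-9a-f]*30 [0-9a-f]* [0-9a-f]* 000[1-3]ef53 "
```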

If those don't appear anywhere, then something dramatically bad has
happened to your filesystem.  Aliasing would only damage at most (if

bloch | 2 Nov 14:09 2005

Re: Recover original superblock on corrupted filesystem?

On Tue, 01 Nov 2005, bloch <at> verdurin.com wrote:

> 
> As an update to this, the problem seems to have re-occurred.  Here are
> the relevant error messages:
> 
> EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system
> zone - block = 41484288
> Aborting journal on device sdb1.
> EXT3-fs error (device sdb1) in ext3_new_block: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted
> journal
> Remounting filesystem read-only
> EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has
> aborted
> __journal_remove_journal_head: freeing b_committed_data
> 

Another update - exactly the same problem has occurred on an identical
machine.  The disks are on a Megaraid RAID1 array.

Two other machines which only differ from the problem ones in that they
have 4G RAM instead of 8G have not shown any such symptoms.

Adam
Kent Tong | 3 Nov 02:35 2005

filesystem remounted as read only

Hi,

I'm running kernel 2.6.8-15, lvm2 v2.01.04-5 and acl v2.2.23-1 on a
Sunblade 100 (sparc). Over the past few months we have seen, several
times, an ext3 filesystem remounted read-only (due to the
"errors=remount-ro" option in /etc/fstab). Sometimes there is no error
in the log files, but sometimes we see:

kernel: init_special_inode: bogus i_mode (3016)
kernel: init_special_inode: bogus i_mode (3125)
kernel: init_special_inode: bogus i_mode (3144)
kernel: init_special_inode: bogus i_mode (3231)
kernel: init_special_inode: bogus i_mode (3423)
kernel: init_special_inode: bogus i_mode (3452)

In the former case (no error in the logs), running fsck finds no
errors. In the latter case, it may find some errors and fix them.

I've run smartmontools to check the disks but no errors are found.

I've run "fsck -c" to look up bad blocks but nothing is found.

What else can I do to troubleshoot the problem? The strangest part is
that if the filesystem is being remounted read-only, why is there no
error in the logs? Could the read-only remount itself prevent the
errors from being written to the logs?
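
One thing that can be read straight off the log: init_special_inode
prints i_mode in octal, and none of the logged values contain any
file-type bits at all.  A small decode, using the standard S_IFMT type
mask (octal 0170000):

```shell
# Mask each logged i_mode (octal) with S_IFMT (0170000).  A result of 0
# means no file-type bits (regular/dir/symlink/device/...) are set, so
# the inode really is garbage rather than a mis-typed special file.
for mode in 3016 3125 3144 3231 3423 3452; do
    printf 'i_mode %s -> type bits %o\n' "$mode" $(( 0$mode & 0170000 ))
done
```

Every value decodes to type bits 0, which points at on-disk inode
corruption rather than, say, an odd but valid device node.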

Thanks!

Gmane