Swapana Ghosh | 1 Oct 15:18 2007

Re: ext3 file system becoming read only

Thanks Jordi,

Yes, we are checking everything first; only then will we proceed to update the
kernel.

Thanks again

--- Jordi Prats <jprats <at> cesca.es> wrote:

> Hi Swapana,
> An update is always a good idea. On RHEL, updates usually go smoothly, but 
> have you checked your FC switch for errors on each port? You could 
> also check your SAN controllers, or run some diagnostics to be sure it's 
> not a problem on your SAN. If your active controller reboots suddenly, it 
> can cause I/O errors that lead to journal corruption.
> 
> regards,
> Jordi
> 
> 
> 
> Swapana Ghosh wrote:
> > Hi,
> >
> > As I explained in my first posting, the 'read-only' issue is not limited to
> > one server; it is happening on a few servers, which are generally 'oracle'
> > database oriented. Very recently it happened to an 'oracle' application
> > server. As a temporary measure, we are re-mounting the file system and also
> > running fsck.

Thomas Watt | 2 Oct 21:38 2007

Re: How are alternate superblocks repaired?

Hi Ted,

Ok, I think I understand now.  I was assuming the backup superblocks played a role without the intervention
of using e2fsck and were ready to be used in a standby mode when the primary superblock gets corrupted.  But,
of course, there is a very real reason to be cautious when the kernel may do things unknown to users.

My point-of-view was more flavored by something like the Multics structure marking that kept backup data
structures free from damage.  It is clear there is another strategy at work here, but one that is workable
and sufficient for the ext2/ext3 filesystem.

In case you are interested, here is a link to a web page on Structure Marking:
http://www.multicians.org/thvv/marking.html

I'm so happy you sent the tip on using e2label to correct my problem.

I've attached my script which I wrote more out of curiosity than anything else:
ca18e1eb99c1279e0298db56f43b1ab1  genallsbs.sh

Regards,

-- Tom

From: Theodore Tso <tytso <at> mit.edu>
To: Thomas Watt <tango <at> tiac.net>
Cc: Andreas Dilger <adilger <at> clusterfs.com>, ext3-users <at> redhat.com
Subject: Re: How are alternate superblocks repaired?
Date: Sep 29, 2007 9:01 AM

On Sat, Sep 29, 2007 at 03:29:13AM -0400, Thomas Watt wrote:
> The only field not updated was the Filesystem state field. So, all

Nickel Cadmium | 2 Oct 23:27 2007

Bad magic number in super-block

Hi,

After a power failure, I can't mount one of my partitions anymore. Here is what I get from fsck:

--
fsck.ext3 /dev/sdb1
e2fsck 1.39 (29-May-2006)
Couldn't find ext2 superblock, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdb1

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
--

I tried giving the suggested superblock as a parameter, but I get the same error message. The same happens with dumpe2fs and tune2fs.
Since I can't get the backup superblock positions with dumpe2fs, I assumed a block size of 1K and tried all the expected backup superblock locations, but it does not help.
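
For reference, a minimal sketch of how the candidate backup superblock locations can be computed. It assumes the filesystem was created with mke2fs defaults: the sparse_super feature enabled (backups only in block groups 1 and powers of 3, 5, and 7) and 8 * block size blocks per group. The function names and the total-block-count argument are made up for this example and are not part of e2fsprogs.

```shell
#!/bin/sh
# is_backup_group GROUP: succeed if GROUP holds a superblock copy under
# sparse_super (groups 0 and 1, and powers of 3, 5 and 7).
is_backup_group() {
    g=$1
    [ "$g" -le 1 ] && return 0
    for base in 3 5 7; do
        n=$g
        while [ $((n % base)) -eq 0 ]; do
            n=$((n / base))
        done
        [ "$n" -eq 1 ] && return 0
    done
    return 1
}

# backup_superblocks BLOCK_SIZE TOTAL_BLOCKS: print one candidate backup
# superblock location (in filesystem blocks) per line.
backup_superblocks() {
    block_size=$1
    total_blocks=$2
    # Assumption: default layout, one bitmap block covers 8 * block_size blocks.
    blocks_per_group=$((block_size * 8))
    # With 1K blocks the first data block is block 1, otherwise block 0.
    if [ "$block_size" -eq 1024 ]; then first_data_block=1; else first_data_block=0; fi
    group=1
    while :; do
        start=$((group * blocks_per_group + first_data_block))
        [ "$start" -ge "$total_blocks" ] && break
        if is_backup_group "$group"; then
            echo "$start"
        fi
        group=$((group + 1))
    done
}
```

With a block size other than 1K, the block size has to be passed to e2fsck as well, for example `e2fsck -b 32768 -B 4096 /dev/sdXX`. If none of the candidates yields a valid superblock for any plausible block size, the device quite possibly does not hold an ext2/ext3 filesystem at all.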

Is there anything I can try to mount the partition again?

Cheers,
NiCd

_______________________________________________
Ext3-users mailing list
Ext3-users <at> redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
Theodore Tso | 2 Oct 23:59 2007

Re: How are alternate superblocks repaired?

On Tue, Oct 02, 2007 at 03:38:47PM -0400, Thomas Watt wrote:
> In case you are interested, here is link to a web page on Structure Marking:
> http://www.multicians.org/thvv/marking.html

I actually have used a Multics system way back when (I was actually
logged into MIT Multics when it was finally shut down[1]).  The com_err
library and the ss library in e2fsprogs were largely inspired by
Multics, and I do use structure magic numbers in memory to protect
against programming errors, which is basically a very simple structure
marking technique.

I'm a bit dubious about how useful simple structure matching would be
for modern Linux systems, since a large number of errors really are
silent bit flips in the data that wouldn't be detected simply by
checking the expected structure ID at the beginning of the on-disk
object.  We are planning on adding checksums to metadata for ext4,
which will help a lot in terms of detecting bad metadata.
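
To illustrate the difference, here is a small sketch that uses an ordinary file as a stand-in for an on-disk object. The 4-byte marker "SBLK", the file layout, and the function names are invented for the example, and the checksum is plain POSIX cksum (a CRC), which says nothing about what ext4 will actually use.

```shell
#!/bin/sh
# magic_ok FILE: succeed if the object still starts with the expected marker.
magic_ok() {
    [ "$(dd if="$1" bs=1 count=4 2>/dev/null)" = "SBLK" ]
}

# checksum_of FILE: print the CRC of the whole object.
checksum_of() {
    cksum < "$1" | cut -d' ' -f1
}

obj=$(mktemp)
printf 'SBLKinode_count=1000' > "$obj"
before=$(checksum_of "$obj")

# Simulate a silent bit flip in the payload: '1' (0x31) becomes '9' (0x39).
printf '9' | dd of="$obj" bs=1 seek=16 conv=notrunc 2>/dev/null

# The marker check cannot see the damage, the checksum comparison can.
magic_ok "$obj" && echo "marker still valid"
[ "$(checksum_of "$obj")" != "$before" ] && echo "checksum mismatch"
rm -f "$obj"
```

The marker only proves that the object is of the expected type; the checksum covers every byte of it, which is why it catches damage that leaves the header intact.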

Regards,   ("You are protected from preemption"  :-)

[1]  http://stuff.mit.edu/afs/sipb/project/eichin/sipbscan/

					- Ted
Thomas Watt | 3 Oct 05:30 2007

Re: Bad magic number in super-block

Hi Nickel Cadmium,

First, try running the command (as root): fdisk -l

That should confirm whether /dev/sdb1 is a valid filesystem partition and not a 
swap partition.  Look for an ID of 83 which identifies valid filesystem partitions.  A partition with ID of 82
is usually swap and won't have a superblock.

That said, if /dev/sdb1 is not a valid filesystem partition, then choose one
with an ID of 83 that looks like it holds the majority of the space.  Then you
should be able to use: dumpe2fs -h /dev/sdb2, for example, and see if you get 
any other errors or can then successfully mount the partition.
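
As a quick cross-check of what is actually on a partition, something like the sketch below can help. It relies on the ext2/ext3 layout, where the primary superblock starts 1024 bytes into the device and carries the magic number 0xEF53 at offset 56 within it, stored little-endian (so the bytes on disk are 53 ef). The function name is made up for this example; blkid or file -s should give similar information.

```shell
#!/bin/sh
# has_ext_magic DEVICE_OR_IMAGE: succeed if the ext2/ext3 magic number is
# present at byte offset 1080 (1024-byte superblock offset + 56).
has_ext_magic() {
    magic=$(dd if="$1" bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "53ef" ]
}
```

Run as root, for example: `has_ext_magic /dev/sdb1 && echo "looks like ext2/ext3"`. This only inspects the primary superblock, so a negative result on a damaged filesystem is not conclusive by itself.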

Sometimes after a reboot, the fdisk -l command reports partitions not in
partition table order and will assign different partition names than the ones
you may normally see to the disk/partition of interest.

-- Tom
Nickel Cadmium | 3 Oct 08:48 2007

Re: Bad magic number in super-block

Hi!

Tom, thanks a lot: you solved my problem!

With fdisk -l I discovered that the partition I was trying to mount was a Windows partition. The weird thing is that /dev/sdb1 used to be a Linux partition. Thinking about it again, I had to pull apart my computer after the crash and I probably shuffled the disks around (or could the renumbering / device reassignment occur even without hardware change?).
But in short, the partition I was looking for is now in /dev/sdc1 and updating the partition table solved it all.

Thanks & cheers,
NiCd



Bryan Kadzban | 3 Oct 13:01 2007

Re: Bad magic number in super-block


Nickel Cadmium wrote:
> (or could the renumbering / device reassignment occur even without
> hardware change?)

For SCSI, yes, it could have changed (depending on your hardware setup).
SCSI disk scanning happens in parallel, and has ever since kernel 2.6.18
or .19 or somewhere around there.  I believe it still depends on your
low-level SCSI driver though.

In any case, the sdX device names are no longer necessarily stable.
That's why udev now creates the /dev/disk/by-* trees of symlinks, whose
names are supposed to be stable.  (I'd recommend by-id myself, but it
depends on how your disks are set up.)
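
To see which stable name currently points at which sdX device, something along these lines should work on a udev-based system. The function name is only illustrative, and readlink -f is assumed to be available (it is part of GNU coreutils).

```shell
#!/bin/sh
# list_stable_names [DIR]: for every symlink in DIR (default /dev/disk/by-id)
# print "stable-name -> resolved device path".
list_stable_names() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Calling `list_stable_names /dev/disk/by-uuid` gives the same kind of listing keyed by filesystem UUID instead of hardware ID.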
Theodore Tso | 3 Oct 16:52 2007

Re: Bad magic number in super-block

On Wed, Oct 03, 2007 at 07:01:08AM -0400, Bryan Kadzban wrote:
> 
> In any case, the sdX device names are no longer necessarily stable.
> That's why udev now creates the /dev/disk/by-* trees of symlinks, whose
> names are supposed to be stable.  (I'd recommend by-id myself, but it
> depends on how your disks are set up.)

The recommended way of dealing with this is to put something like
this in your /etc/fstab:

UUID=57299143-64a5-45f3-8c3d-9b68e38247bd / ext3 defaults,errors=remount-ro 0 1

or 

LABEL=root / ext3 defaults,errors=remount-ro 0 1

Mount and fsck will automatically find the appropriate device, and
this will work even if udev changes in the future.  This approach also
will work on much older systems, including ones that are pre-udev
(e.g., RHEL4).
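
A small sketch of how such a UUID= line could be generated from the output of dumpe2fs -h, which prints a "Filesystem UUID:" line. The function name and its arguments are hypothetical, and the mount options are simply the ones from the example above.

```shell
#!/bin/sh
# fstab_line_from_dumpe2fs MOUNTPOINT [FSCK_PASS]
# Reads "dumpe2fs -h" output on stdin and prints a UUID-based fstab entry.
# FSCK_PASS defaults to 2; the root filesystem conventionally uses 1.
fstab_line_from_dumpe2fs() {
    mountpoint=$1
    pass=${2:-2}
    uuid=$(sed -n 's/^Filesystem UUID:[[:space:]]*//p')
    # Fail rather than emit an entry with an empty UUID.
    [ -n "$uuid" ] || return 1
    printf 'UUID=%s %s ext3 defaults,errors=remount-ro 0 %s\n' "$uuid" "$mountpoint" "$pass"
}
```

For example, as root: `dumpe2fs -h /dev/sdXX 2>/dev/null | fstab_line_from_dumpe2fs / 1`. Review the printed line before appending it to /etc/fstab.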

Note that you can get yourself in trouble with either approach if you
have multiple filesystems with the same label or UUID.  With
UUID's, that shouldn't ever happen unless you provision systems via
partition images or use dd to copy filesystems around.  If you do
this, a *really* good idea is to use the command:

      tune2fs -U random /dev/sdXX

... after you copy a filesystem image, and then use dumpe2fs -h to
determine the new UUID.  That way, each filesystem will have its own
unique UUID.  This is especially important if you have a large
cluster of machines which access their root filesystem across a SAN
network to some large enterprise storage array.  It is a really,
really good idea to keep each filesystem image separate with its own
universally unique ID.
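
One way to spot such duplicates, sketched here with a hypothetical helper that reads "DEVICE UUID" pairs on stdin and reports every UUID that appears on more than one device:

```shell
#!/bin/sh
# find_duplicate_uuids: stdin is one "DEVICE UUID" pair per line; prints
# "UUID: dev1 dev2 ..." for each UUID shared by several devices.
find_duplicate_uuids() {
    awk 'NF == 2 {
             # Lines without a UUID (NF != 2) are ignored.
             devs[$2] = devs[$2] " " $1
             count[$2]++
         }
         END {
             for (u in count)
                 if (count[u] > 1)
                     print u ":" devs[u]
         }'
}
```

The pairs can be collected with blkid, for instance: `for dev in /dev/sd[a-z][0-9]*; do printf '%s %s\n' "$dev" "$(blkid -s UUID -o value "$dev")"; done | find_duplicate_uuids`. Any device it reports is a candidate for the tune2fs -U random command above.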

						- Ted
Thomas Watt | 3 Oct 19:16 2007

Re: How are alternate superblocks repaired?

Hi Ted,

That was pretty funny being "protected from preemption"!

It turns out I did discover a bug in the script that I previously sent, and have
fixed it.  Only the filesystem block size of 2048 still needs testing/verification.

Sorry for the resend - it appears my mailer decided I needed to loosen the
privileges to send the script.

Here is the reworked script attached:
003a2b57b7d0c798b6d1044506634c3c  genallsbs.sh

Cheers,

-- Tom

Attachment (genallsbs.sh): application/x-shellscript, 13 KiB
Theodore Tso | 3 Oct 20:44 2007

Re: How are alternate superblocks repaired?

On Tue, Oct 02, 2007 at 05:59:11PM -0400, Theodore Tso wrote:
> I'm a bit dubious about how useful simply structure matching would be
> for modern Linux systems, since a large number of errors really are
    sorry, I meant to say "filesystems", not "systems" above
> silent bit flips in the data, that wouldn't be detected simply by
> checking the expected structure ID at the beginning of the on-disk
> object.  We are planning on adding checksum to metadata for ext4,
> which will help a lot in terms of detected bad metadata.

  	     	    	   	 - Ted
