Alex Deucher | 2 Aug 2005 16:38

Re: Re: strange jfs problem: disappearing/reappearing files.

On 8/1/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> On Fri, 2005-07-29 at 17:02 -0400, Arshavir Grigorian wrote:
> > GDB:
> >
> > (gdb) run /dev/vg00/lvol0
> > Starting program: /sbin/fsck.jfs /dev/vg00/lvol0
> > (no debugging symbols found)
> > (no debugging symbols found)
> > (no debugging symbols found)
> > /sbin/fsck.jfs version 1.1.7, 22-Jul-2004
> > processing started: 7/29/2005 16.59.27
> > Using default parameter: -p
> > The current device is:  /dev/vg00/lvol0
> > Block size in bytes:  4096
> > Filesystem size in blocks:  1855565824
> > **Phase 0 - Replay Journal Log
> > **Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
> >
> > Program received signal SIGBUS, Bus error.
> > 0x0002f198 in ?? ()
> > (gdb) bt
> > #0  0x0002f198 in ?? ()
> > #1  0x0002f178 in ?? ()
> > Previous frame identical to this frame (corrupt stack?)
> > (gdb)
> 
> Without the debugging symbols, this isn't very helpful.  I believe if
> you build jfsutils from the source
> (http://jfs.sourceforge.net/project/pub/jfsutils-1.1.8.tar.gz),
> fsck/jfs_fsck will be unstripped, and you may get something useful from

Dave Kleikamp | 2 Aug 2005 16:34

Re: Re: strange jfs problem: disappearing/reappearing files.

On Mon, 2005-08-01 at 16:41 -0400, Alex Deucher wrote:
> On 7/29/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> > I think there's something about sparc64 that jfs isn't handling
> > correctly.  I've run a lot on ppc64, so I don't know what the difference
> > would be.
> > 
> 
> Just wondering if you had any ideas about this behavior.  I'm willing
> to run any tests or try any patches.  For reference it seems to be a
> sparc64 thing since we have the exact same set up on AMD64 and it
> works flawlessly (7 TB JFS volume).

I have a theory that jfs's use of 24-bit structures may be causing
alignment problems not seen on other architectures.  I have made these
patches to both the kernel and utilities to get rid of bit-fields from
our structures.  Can you give them a try?  Even if it doesn't fix the
problem, I kind of like it from a cleanup point of view.
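
As a rough illustration of what removing bit-fields can look like (a sketch only, not the contents of 24bit.patch): a 24-bit length and the top 8 bits of an address are kept in one plain 32-bit word and unpacked with shifts and masks, so the compiler never has to generate a bit-field access. The struct and function names are hypothetical, and on-disk byte order is ignored here.

```c
#include <stdint.h>

/* Hypothetical extent descriptor. The layout is an assumption made for
 * illustration: bits 0-23 of len_addr hold a 24-bit length, bits 24-31
 * hold the high byte of a 40-bit address. */
struct extent_desc {
    uint32_t len_addr;
    uint32_t addr_low;   /* low 32 bits of the address */
};

static inline uint32_t extent_length(const struct extent_desc *e)
{
    return e->len_addr & 0x00ffffffu;
}

static inline void extent_set_length(struct extent_desc *e, uint32_t len)
{
    /* Keep the address byte, replace only the low 24 bits. */
    e->len_addr = (e->len_addr & 0xff000000u) | (len & 0x00ffffffu);
}

static inline uint64_t extent_address(const struct extent_desc *e)
{
    return ((uint64_t)(e->len_addr >> 24) << 32) | e->addr_low;
}

static inline void extent_set_address(struct extent_desc *e, uint64_t addr)
{
    /* Keep the length, replace only the top 8 bits. */
    e->len_addr = (e->len_addr & 0x00ffffffu)
                | ((uint32_t)((addr >> 32) & 0xffu) << 24);
    e->addr_low = (uint32_t)addr;
}
```

Every access in this form is an ordinary aligned 32-bit load or store, which is the property that matters on strict-alignment machines such as sparc64.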
-- 
David Kleikamp
IBM Linux Technology Center
Attachment (24bit.patch): text/x-patch, 7942 bytes
Attachment (utils.patch): text/x-patch, 20 KiB

Alex Deucher | 2 Aug 2005 16:43

Re: Re: strange jfs problem: disappearing/reappearing files.

On 8/2/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> On Mon, 2005-08-01 at 16:41 -0400, Alex Deucher wrote:
> > On 7/29/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> > > I think there's something about sparc64 that jfs isn't handling
> > > correctly.  I've run a lot on ppc64, so I don't know what the difference
> > > would be.
> > >
> >
> > Just wondering if you had any ideas about this behavior.  I'm willing
> > to run any tests or try any patches.  For reference it seems to be a
> > sparc64 thing since we have the exact same set up on AMD64 and it
> > works flawlessly (7 TB JFS volume).
> 
> I have a theory that jfs's use of 24-bit structures may be causing
> alignment problems not seen on other architectures.  I have made these
> patches to both the kernel and utilities to get rid of bit-fields from
> our structures.  Can you give them a try?  Even if it doesn't fix the
> problem, I kind of like it from a cleanup point of view.

I'll give them a try and get back to you.  Thanks for looking into this.

Alex

> --
> David Kleikamp
> IBM Linux Technology Center
> 
> 
>


Dave Kleikamp | 2 Aug 2005 16:46

Re: Re: strange jfs problem: disappearing/reappearing files.

On Tue, 2005-08-02 at 10:43 -0400, Alex Deucher wrote:
> On 8/2/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> > I have a theory that jfs's use of 24-bit structures may be causing
> > alignment problems not seen on other architectures.  I have made these
> > patches to both the kernel and utilities to get rid of bit-fields from
> > our structures.  Can you give them a try?  Even if it doesn't fix the
> > problem, I kind of like it from a cleanup point of view.
> 
> I'll give them a try and get back to you.  Thanks for looking into this.

The gdb output you sent me tells me something else is amiss.  I'll let
you know what I find out.

Thanks,
Shaggy
-- 
David Kleikamp
IBM Linux Technology Center

-------------------------------------------------------
SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
from IBM. Find simple to follow Roadmaps, straightforward articles,
informative Webcasts and more! Get everything you need to get up to
speed, fast. http://ads.osdn.com/?ad_id=7477&alloc_id=16492&op=click

Alex Deucher | 2 Aug 2005 16:51

Re: Re: strange jfs problem: disappearing/reappearing files.

On 8/2/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> On Tue, 2005-08-02 at 10:43 -0400, Alex Deucher wrote:
> > On 8/2/05, Dave Kleikamp <shaggy <at> austin.ibm.com> wrote:
> > > I have a theory that jfs's use of 24-bit structures may be causing
> > > alignment problems not seen on other architectures.  I have made these
> > > patches to both the kernel and utilities to get rid of bit-fields from
> > > our structures.  Can you give them a try?  Even if it doesn't fix the
> > > problem, I kind of like it from a cleanup point of view.
> >
> > I'll give them a try and get back to you.  Thanks for looking into this.
> 
> The gdb output you sent me tells me something else is amiss.  I'll let
> you know what I find out.
> 

Sounds good.  I'll hold off on the patches then.

Alex

> Thanks,
> Shaggy
> --
> David Kleikamp
> IBM Linux Technology Center
> 
>


Will | 3 Aug 2005 00:38

Recover/Repair JFS raid setup

Recently when adding another few drives to my raid
card I carelessly moved the position of a number of
drives in a 5 drive raid 5 array. I spent quite a bit
of time determining the proper order for the drives,
and I am 95% sure I have it correct. I know for
certain the first drive is in the correct position
because I can again see the partition table. I can
not, however, access the file system.

I figured if I ran a read-only JFS fsck I could
determine if all of the drives are in the proper
order. However, when I try to run the fsck I get an
error saying that if the device described contains a
jfs filesystem my primary and secondary superblocks
must be corrupted.

I did a little research (I am fairly new to linux) and
found the jfs_debugfs command and used it to
investigate the file system. I am not really sure what
a proper jfs file system superblock should look like,
so I was hoping someone might be able to tell me what
could be wrong and how I could fix it. The output is
below:

jfs_debugfs /dev/sda1
jfs_debugfs version 1.1.7, 22-Jul-2004

Aggregate Block Size: 4352

> su

evilninja | 3 Aug 2005 15:22

Re: Recover/Repair JFS raid setup


Will wrote:
> order. However, when I try to run the fsck I get an
> error saying that if the device described contains a
> jfs filesystem my primary and secondary superblocks
> must be corrupted.

so, does it (fsck) offer fixing it?
if so, did you try to let jfs_fsck fix it, did it work?

and, perhaps most important: is the raid in sync again?

Christian.
--
BOFH excuse #157:

Incorrect time synchronization

Dave Kleikamp | 3 Aug 2005 16:02

Re: Recover/Repair JFS raid setup

On Tue, 2005-08-02 at 15:38 -0700, Will wrote:
> Recently when adding another few drives to my raid
> card I carelessly moved the position of a number of
> drives in a 5 drive raid 5 array. I spent quite a bit
> of time determining the proper order for the drives,
> and I am 95% sure I have it correct. I know for
> certain the first drive is in the correct position
> because I can again see the partition table. I can
> not, however, access the file system.
> 
> I figured if I ran a read-only JFS fsck I could
> determine if all of the drives are in the proper
> order. However, when I try to run the fsck I get an
> error saying that if the device described contains a
> jfs filesystem my primary and secondary superblocks
> must be corrupted.
> 
> I did a little research (I am fairly new to linux) and
> found the jfs_debugfs command and used it to
> investigate the file system. I am not really sure what
> a proper jfs file system superblock should look like,
> so I was hoping someone might be able to tell me what
> could be wrong and how I could fix it. 

The superblock looks very bad.  It almost has no resemblance to the jfs
superblock, except that the s_magic and version fields are almost
correct.  What's strange is that the s_magic field should be 'JFS1' (all
capitals).  Nearly everything else looks completely wrong.
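
A minimal sketch of the check described above, assuming a buffer that begins at the s_magic field. The enum and function names are made up for illustration; only the expected value 'JFS1' comes from this message.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define EXPECTED_MAGIC "JFS1"
#define MAGIC_LEN 4

enum magic_status { MAGIC_OK, MAGIC_CASE_DIFFERS, MAGIC_BAD };

/* Classify the first MAGIC_LEN bytes of buf:
 *   MAGIC_OK            exact match with "JFS1"
 *   MAGIC_CASE_DIFFERS  same letters, but not all capitals
 *   MAGIC_BAD           anything else, or a buffer that is too short */
static inline enum magic_status classify_magic(const unsigned char *buf,
                                               size_t len)
{
    if (len < MAGIC_LEN)
        return MAGIC_BAD;
    if (memcmp(buf, EXPECTED_MAGIC, MAGIC_LEN) == 0)
        return MAGIC_OK;
    for (size_t i = 0; i < MAGIC_LEN; i++) {
        if (toupper(buf[i]) != (unsigned char)EXPECTED_MAGIC[i])
            return MAGIC_BAD;
    }
    return MAGIC_CASE_DIFFERS;
}
```

A result of MAGIC_CASE_DIFFERS corresponds to the "almost correct" case: the right bytes are probably being read, but something has altered them.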

> The output is

Will | 3 Aug 2005 22:12

Re: Recover/Repair JFS raid setup

> On Tue, 2005-08-02 at 15:38 -0700, Will wrote:
> > Recently when adding another few drives to my raid
> > card I carelessly moved the position of a number of
> > drives in a 5 drive raid 5 array. I spent quite a bit
> > of time determining the proper order for the drives,
> > and I am 95% sure I have it correct. I know for
> > certain the first drive is in the correct position
> > because I can again see the partition table. I can
> > not, however, access the file system.
> > 
> > I figured if I ran a read-only JFS fsck I could
> > determine if all of the drives are in the proper
> > order. However, when I try to run the fsck I get an
> > error saying that if the device described contains a
> > jfs filesystem my primary and secondary superblocks
> > must be corrupted.
> > 
> > I did a little research (I am fairly new to linux) and
> > found the jfs_debugfs command and used it to
> > investigate the file system. I am not really sure what
> > a proper jfs file system superblock should look like,

Dave Kleikamp | 3 Aug 2005 22:39

Re: Recover/Repair JFS raid setup

On Wed, 2005-08-03 at 13:12 -0700, Will wrote:

> > The beginning of the superblock has enough resemblance to a correct
> > superblock to make it look like you have the right blocks, but there is
> > too much wrong.  I know very little about raid.  Could you be using
> > raid3 or raid7?  I think the use of byte-size striping may explain the
> > problem if all 5 disks aren't ordered properly.
> > 
> 
> Alright, if everything looks pretty much wrong, then I
> imagine the disks are not in the correct order. I know
> for certain that the drives were a raid 5, and they
> have the proper stripe size (128K), and I am certain I
> have the proper first disk (the partition table only
> shows up when one specific disk is in a certain
> position). I guess what I will do is try each of the
> 24 different combinations and record the primary and
> secondary superblock data for each configuration.
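
The count of 24 follows from fixing the first disk and permuting the other four (4! = 24). A small sketch of that enumeration, with disks numbered from 0 and slot 0 held fixed. All names are hypothetical; the visit callback is where one would reassemble the array in that order and dump the superblocks.

```c
#include <stddef.h>

#define MAX_DISKS 16

typedef void (*order_fn)(const int *order, size_t n, void *ctx);

struct tally {
    size_t count;          /* number of orderings visited */
    int first_slot_moved;  /* set if slot 0 ever held a disk other than 0 */
};

static void swap_int(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Visit every ordering of order[start..n-1]; order[0..start-1] stays put. */
static void permute(int *order, size_t start, size_t n,
                    order_fn visit, void *ctx)
{
    if (start + 1 >= n) {
        visit(order, n, ctx);
        return;
    }
    for (size_t i = start; i < n; i++) {
        swap_int(&order[start], &order[i]);
        permute(order, start + 1, n, visit, ctx);
        swap_int(&order[start], &order[i]);  /* restore before next choice */
    }
}

/* Example callback: only counts, and checks that disk 0 never moves. */
static void tally_order(const int *order, size_t n, void *ctx)
{
    struct tally *t = ctx;
    (void)n;
    t->count++;
    if (order[0] != 0)
        t->first_slot_moved = 1;
}

/* Enumerate candidate orders for n_disks disks with the first one known. */
static struct tally enumerate_orders(size_t n_disks)
{
    struct tally t = {0, 0};
    int order[MAX_DISKS];

    if (n_disks == 0 || n_disks > MAX_DISKS)
        return t;
    for (size_t i = 0; i < n_disks; i++)
        order[i] = (int)i;
    permute(order, 1, n_disks, tally_order, &t);
    return t;
}
```

For the 5-drive array in this thread, enumerate_orders(5) visits 24 orderings and never moves disk 0.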

I'm trying to learn a little about raid and figure out why some of the
data in the superblock seems close to the right data.  What if instead
of the proper block from disk x, you are looking at the parity block on
disk y instead?  If the other 3 blocks which determine the parity are
mostly zeroed, some fields may be close enough to the 4th block to be
recognizable, but there is enough data there to mess up the rest of the
block.  This is just a silly theory, but it seems possible.
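
The theory can be tried out on toy data. RAID 5 parity is the bytewise XOR of the data blocks in a stripe, so if three of the four data blocks are nearly all zeroes, the parity block is nearly a copy of the fourth. The sketch below uses hypothetical helper names and would normally be run on tiny blocks rather than the 128K stripe mentioned in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* parity[i] = blocks[0][i] ^ blocks[1][i] ^ ... ^ blocks[nblocks-1][i] */
static inline void xor_parity(const uint8_t *const blocks[], size_t nblocks,
                              size_t len, uint8_t *parity)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t p = 0;
        for (size_t b = 0; b < nblocks; b++)
            p ^= blocks[b][i];
        parity[i] = p;
    }
}

/* Count positions where two equally sized blocks hold the same byte. */
static inline size_t count_matching_bytes(const uint8_t *a, const uint8_t *b,
                                          size_t len)
{
    size_t same = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] == b[i])
            same++;
    }
    return same;
}
```

With one real data block and three others that are zero except for a single stray byte, the parity block matches the real block everywhere except at that one position, which is the "close but wrong" pattern described above.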

