Kevin P. Fleming | 1 Dec 2003 15:06

Reproducable OOPS with MD RAID-5 on 2.6.0-test11

I've got a new system here with six SATA disks set up in a RAID-5 array 
(no partition tables, using the whole disks). I then used LVM2 tools to 
make the RAID array a physical volume, created a logical volume and 
formatted that volume with an XFS filesystem.

Mounting the filesystem and copying over the 2.6 kernel source tree 
produces this OOPS (and is pretty reproducible):

kernel BUG at fs/bio.c:177!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c014db9a>]    Not tainted
EFLAGS: 00010246
EIP is at bio_put+0x2c/0x36
eax: 00000000   ebx: f6221080   ecx: c1182180   edx: edcbf780
esi: c577b998   edi: 00000002   ebp: edcbf780   esp: f78ffeb0
ds: 007b   es: 007b   ss: 0068
Process md0_raid5 (pid: 65, threadinfo=f78fe000 task=f7924080)
Stack: c71e2640 c021d88d edcbf780 00000000 00000001 c1182180 00000009 0001000
        edcbf780 00000000 00000000 00000000 c014e2fc edcbf780 00000000 00000000
        f23a0ff0 f23a0ff0 edcbf7c0 c02ca51d edcbf780 00000000 00000000 00000000
Call Trace:
  [<c021d88d>] bio_end_io_pagebuf+0x9a/0x138
  [<c014e2fc>] bio_endio+0x59/0x7e
  [<c02ca51d>] clone_endio+0x82/0xb5
  [<c02c0dc3>] handle_stripe+0x8f2/0xec0
  [<c02c17d1>] raid5d+0x71/0x105

Jens Axboe | 1 Dec 2003 15:11

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11

On Mon, Dec 01 2003, Kevin P. Fleming wrote:
> I've got a new system here with six SATA disks set up in a RAID-5 array 
> (no partition tables, using the whole disks). I then used LVM2 tools to 
> make the RAID array a physical volume, created a logical volume and 
> formatted that volume with an XFS filesystem.
> 
> Mounting the filesystem and copying over the 2.6 kernel source tree 
> produces this OOPS (and is pretty reproducible):
> 
> kernel BUG at fs/bio.c:177!

It's doing a put on an already freed bio, that's really bad.

> invalid operand: 0000 [#1]
> CPU:    0
> EIP:    0060:[<c014db9a>]    Not tainted
> EFLAGS: 00010246
> EIP is at bio_put+0x2c/0x36
> eax: 00000000   ebx: f6221080   ecx: c1182180   edx: edcbf780
> esi: c577b998   edi: 00000002   ebp: edcbf780   esp: f78ffeb0
> ds: 007b   es: 007b   ss: 0068
> Process md0_raid5 (pid: 65, threadinfo=f78fe000 task=f7924080)
> Stack: c71e2640 c021d88d edcbf780 00000000 00000001 c1182180 00000009 0001000
>        edcbf780 00000000 00000000 00000000 c014e2fc edcbf780 00000000 00000000
>        f23a0ff0 f23a0ff0 edcbf7c0 c02ca51d edcbf780 00000000 00000000 00000000
> Call Trace:
>  [<c021d88d>] bio_end_io_pagebuf+0x9a/0x138

Kevin P. Fleming | 1 Dec 2003 15:15

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11

Jens Axboe wrote:

>>Hardware is a 2.6CGHz P4, 1G of RAM (4G highmem enabled), SMP kernel but 
>>no preemption. Kernel config is at:
> 
> 
> Are you using ide or libata as the backing for the sata drives?
> 

libata, two of the disks are on an ICH5 and the other four are on a 
Promise SATA150 TX4.

Jens Axboe | 1 Dec 2003 16:51

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11

On Mon, Dec 01 2003, Kevin P. Fleming wrote:
> Jens Axboe wrote:
> 
> >>Hardware is a 2.6CGHz P4, 1G of RAM (4G highmem enabled), SMP kernel but 
> >>no preemption. Kernel config is at:
> >
> >
> >Are you using ide or libata as the backing for the sata drives?
> >
> 
> libata, two of the disks are on an ICH5 and the other four are on a 
> Promise SATA150 TX4.

Alright, so no bouncing should be happening. Could you boot with
mem=800m (and reproduce) just to rule it out completely?

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bailey, Scott | 1 Dec 2003 22:49

RE: What raid to build on that drives?

>>    I have 3 * 36GB discs and 5 * 18GB discs on two u160 scsi channels?
>>    What possibilities do I have to build a raid on them? I want to be
>>    able to survive one failed disc, but get as much space and
>>    performance out of the discs as possible.

> How about two RAID5 arrays, one with the 3*36GB=72GB and the other using
> the 5*18GB=72GB.  Then throw both of these under a RAID0 which will get
> you 144GB and still be able to survive 1 disk failure.

Here's a twisted thought, which would increase your available space, but
probably at the expense of some amount of performance degradation:

1. Partition your 36GB disks so first partition is the same size as one of
your 18GB disks, and the second partition is the remainder of the disk (also
about 18GB).

2. Create a RAID5 array containing 8*18GB disks (5 18GB disks + 3 18GB
partitions) = approx 126GB usable.

3. Create a RAID5 array containing 3*18GB partitions = approx 36 GB usable.

4. At this point, I would *NOT* recommend building a RAID0 array from (2)
and (3), as you will end up generating a lot of contention on the 36GB disks
by doing so. Either use them as separate devices, or concatenate them (using
LVM perhaps) if you really need all of the space in "one place".

This will get you a total of about 162GB usable while still being able to
survive a single disk failure.
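
The capacity figures in the steps above can be checked with a short calculation. This is only a sketch: the helper name raid5_usable_gb is made up for illustration, and it assumes every member of an array has the same nominal size and that RAID5 gives up one member's worth of space to parity.

```python
def raid5_usable_gb(members: int, member_size_gb: int) -> int:
    """Usable space of a RAID5 array: one member's worth goes to parity."""
    if members < 3:
        raise ValueError("RAID5 needs at least three members")
    return (members - 1) * member_size_gb


# Layout from the steps above: five whole 18GB disks plus the first 18GB
# partition of each 36GB disk, then the three leftover 18GB partitions.
big_array = raid5_usable_gb(8, 18)
small_array = raid5_usable_gb(3, 18)
total = big_array + small_array

# Earlier suggestion: RAID5 over 3 x 36GB and RAID5 over 5 x 18GB,
# striped together with RAID0 (both halves are the same size).
earlier = raid5_usable_gb(3, 36) + raid5_usable_gb(5, 18)

print(big_array, small_array, total, earlier)
```

The split layout gains one 18GB member's worth of space over the earlier suggestion because the larger array spreads a single parity member across eight devices instead of five.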

I imagine simultaneous rebuilds of both RAID5 devices would not be a
pleasant experience. :-)

Neil Brown | 1 Dec 2003 23:25

RE: What raid to build on that drives?

On Monday December 1, scott.bailey <at> eds.com wrote:
> 
> 4. At this point, I would *NOT* recommend building a RAID0 array from (2)
> and (3), as you will end up generating a lot of contention on the 36GB disks
> by doing so. Either use them as separate devices, or concatenate them (using
> LVM perhaps) if you really need all of the space in "one place".

or md/linear.

> 
> I imagine simultaneous rebuilds of both RAID5 devices would not be a
> pleasant experience. :-)
> 

It is never nice to have to rebuild an array (though it is nice that
it just works), however this case is no worse than any other.  md will
notice that the two raid5 arrays share some physical drives and will
serialise the rebuilds.
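
As a rough illustration of the condition md is checking for (this is not md's actual code), the sketch below finds pairs of arrays that have a physical drive in common. The device names sda through sdh, the array names, and the function arrays_sharing_drives are all hypothetical; members are written as (drive, partition) tuples, with None standing for a whole disk.

```python
from itertools import combinations


def arrays_sharing_drives(arrays):
    """Return (array_a, array_b, shared_drives) for every overlapping pair."""
    pairs = []
    for (a, a_members), (b, b_members) in combinations(sorted(arrays.items()), 2):
        # Compare by physical drive only; the partition number is ignored.
        a_drives = {drive for drive, _part in a_members}
        b_drives = {drive for drive, _part in b_members}
        common = a_drives & b_drives
        if common:
            pairs.append((a, b, sorted(common)))
    return pairs


# Hypothetical naming for the layout discussed in this thread.
layout = {
    "md0": [(d, None) for d in ("sda", "sdb", "sdc", "sdd", "sde")]
           + [(d, 1) for d in ("sdf", "sdg", "sdh")],
    "md1": [(d, 2) for d in ("sdf", "sdg", "sdh")],
}

conflicts = arrays_sharing_drives(layout)
print(conflicts)
```

Any pair reported here is one whose rebuilds would compete for the same spindles if run in parallel, which is the situation the serialisation avoids.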

NeilBrown

Neil Brown | 2 Dec 2003 00:06

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11 - with XFS

On Monday December 1, axboe <at> suse.de wrote:
> On Mon, Dec 01 2003, Kevin P. Fleming wrote:
> > I've got a new system here with six SATA disks set up in a RAID-5 array 
> > (no partition tables, using the whole disks). I then used LVM2 tools to 
> > make the RAID array a physical volume, created a logical volume and 
> > formatted that volume with an XFS filesystem.
> > 
> > Mounting the filesystem and copying over the 2.6 kernel source tree 
> > produces this OOPS (and is pretty reproducible):
> > 
> > kernel BUG at fs/bio.c:177!
> 
> It's doing a put on an already freed bio, that's really bad.
> 

That makes 2 bug reports that seem to suggest that raid5 is calling
bi_end_io twice on the one bio. 

The other one was from Eric Jensen <ej <at> xmission.com>
with Subject: PROBLEM: 2.6.0-test10 BUG/panic in mpage_end_io_read
on 26 Nov 2003.

Both involve xfs and raid5.
I, of course, am tempted to blame xfs.....

In this case, I don't think that raid5 calling bi_end_io twice would
cause the problem, as the bi_end_io that raid5 calls is clone_endio,
and that has an atomic_t to make sure it only calls its bi_end_io
(bio_end_io_pagebuf) once, even if it were called multiple times itself.
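
To make that argument concrete, here is a toy, single-threaded model of such a guard. It is not the device-mapper code: it assumes the atomic_t is a count of outstanding clones that is decremented on each completion, with the upstream callback fired only when the count lands on exactly zero. The class and method names are invented for illustration.

```python
class CloneIo:
    """Toy stand-in for the per-request bookkeeping described above."""

    def __init__(self, clones, upstream_end_io):
        self.pending = clones              # plays the role of the atomic_t
        self.upstream_end_io = upstream_end_io

    def clone_done(self):
        # Decrement, then fire only on the transition to exactly zero.
        self.pending -= 1
        if self.pending == 0:
            self.upstream_end_io()


calls = []
io = CloneIo(clones=1, upstream_end_io=lambda: calls.append("end_io"))

io.clone_done()   # legitimate completion: 1 -> 0, upstream callback fires
io.clone_done()   # spurious second completion: 0 -> -1, nothing fires

print(calls)
```

Under that assumption a duplicate completion from the lower layer leaves the counter negative but never reaches the upstream callback a second time, which is why a double call from raid5 alone would not obviously explain the double put.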


Kevin P. Fleming | 2 Dec 2003 05:02

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11

Jens Axboe wrote:

> Alright, so no bouncing should be happening. Could you boot with
> mem=800m (and reproduce) just to rule it out completely?

Tested with mem=800m, problem still occurs. Additional test was done 
without device-mapper in place, though, and I could not reproduce the 
problem! I copied > 500MB of stuff to the XFS filesystem created using 
the entire /dev/md/0 device without a single unusual message. I then 
unmounted the filesystem and used pvcreate/vgcreate/lvcreate to make a 
3G volume on the array, made an XFS filesystem on it, mounted it, and 
tried copying data over. The oops message came back.

I'm copying this message to linux-lvm; the original oops message is 
repeated below for the benefit of those list readers. I've got one more 
round of testing to do (after the array resyncs itself), which is to try 
a filesystem other than XFS.

----

kernel BUG at fs/bio.c:177!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c014db9a>]    Not tainted
EFLAGS: 00010246
EIP is at bio_put+0x2c/0x36
eax: 00000000   ebx: f6221080   ecx: c1182180   edx: edcbf780
esi: c577b998   edi: 00000002   ebp: edcbf780   esp: f78ffeb0
ds: 007b   es: 007b   ss: 0068
Process md0_raid5 (pid: 65, threadinfo=f78fe000 task=f7924080)

Mike Fedyk | 2 Dec 2003 05:15

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11

On Mon, Dec 01, 2003 at 09:02:40PM -0700, Kevin P. Fleming wrote:
> Tested with mem=800m, problem still occurs. Additional test was done 
> without device-mapper in place, though, and I could not reproduce the 
> problem! I copied > 500MB of stuff to the XFS filesystem created using 
> the entire /dev/md/0 device without a single unusual message. I then 
> unmounted the filesystem and used pvcreate/vgcreate/lvcreate to make a 
> 3G volume on the array, made an XFS filesystem on it, mounted it, and 
> tried copying data over. The oops message came back.

Can you try with DM on a regular disk, instead of sw raid?

Jens Axboe | 2 Dec 2003 09:27

Re: Reproducable OOPS with MD RAID-5 on 2.6.0-test11

On Mon, Dec 01 2003, Kevin P. Fleming wrote:
> Jens Axboe wrote:
> 
> >Alright, so no bouncing should be happening. Could you boot with
> >mem=800m (and reproduce) just to rule it out completely?
> 
> Tested with mem=800m, problem still occurs. Additional test was done 

Suspected as much, just wanted to make sure.

> without device-mapper in place, though, and I could not reproduce the 
> problem! I copied > 500MB of stuff to the XFS filesystem created using 
> the entire /dev/md/0 device without a single unusual message. I then 
> unmounted the filesystem and used pvcreate/vgcreate/lvcreate to make a 
> 3G volume on the array, made an XFS filesystem on it, mounted it, and 
> tried copying data over. The oops message came back.

Smells like a bio stacking problem in raid/dm then. I'll take a quick
look and see if anything obvious pops up, otherwise the maintainers of
those areas should take a closer look.

> I'm copying this message to linux-lvm; the original oops message is 
> repeated below for the benefit of those list readers. I've got one more 
> round of testing to do (after the array resyncs itself), which is to try 
> a filesystem other than XFS.

That might be a good idea, although it's not very likely to be an XFS
problem as it happens further down the io stack. It should trigger just
as happily on IDE or SCSI if that was the case.


