Kristleifur Daðason | 1 Nov 08:17 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

On Sat, Oct 31, 2009 at 11:55 PM, Jon Nelson
<jnelson-linux-raid <at> jamponi.net> wrote:
>
> I have a 4 disk raid6. The disks are individually capable of (at
> least) 75MB/s on average.
> [...]
>
> While rsyncing a file from an ext3 filesystem to a jfs filesystem, I
> am observing speeds in the 10-15MB/s range.
> That seems really really slow.

Hi,
Is the system unresponsive and laggy while you're doing this copy?
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Fjellstrom | 1 Nov 20:16 2009

raid disk failure, options?

My main raid array just had a disk failure. I tried to hot-remove the 
device and use the SCSI bus rescan sysfs entries, but it seems to fail on 
IDENTIFY.

Can I assume my disk is dead?

[5015721.851044] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[5015721.851089] ata3.00: irq_stat 0x40000001
[5015721.851124] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[5015721.851125]          res 71/04:03:80:01:32/00:00:00:00:00/e0 Emask 0x1 (device error)
[5015721.851193] ata3.00: status: { DRDY DF ERR }
[5015721.851225] ata3.00: error: { ABRT }
[5015726.848684] ata3.00: qc timeout (cmd 0xec)
[5015726.848729] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[5015726.848763] ata3.00: revalidation failed (errno=-5)
[5015726.848798] ata3: hard resetting link
[5015734.501527] ata3: softreset failed (device not ready)
[5015734.501565] ata3: failed due to HW bug, retry pmp=0
[5015734.665530] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[5015734.707085] ata3.00: both IDENTIFYs aborted, assuming NODEV
[5015734.707089] ata3.00: revalidation failed (errno=-2)
[5015739.664923] ata3: hard resetting link
[5015740.148277] ata3: softreset failed (device not ready)
[5015740.148314] ata3: failed due to HW bug, retry pmp=0
[5015740.313532] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[5015740.337129] ata3.00: both IDENTIFYs aborted, assuming NODEV
[5015740.337132] ata3.00: revalidation failed (errno=-2)
[5015740.337167] ata3.00: disabled
[5015740.337231] ata3: EH complete
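
For reference, a typical sequence for forcing a dead member out of an md array and re-probing the port is sketched below. This is only an illustration: the array name (/dev/md0), member partition (/dev/sdc1), and SCSI host number (host2) are assumptions, not values taken from the log above.

```shell
# Mark the dead member faulty and pull it out of the array
# (device names are placeholders -- check /proc/mdstat first)
mdadm /dev/md0 --fail /dev/sdc1
mdadm /dev/md0 --remove /dev/sdc1

# Tell the kernel to forget the dead device...
echo 1 > /sys/block/sdc/device/delete

# ...then rescan the host so a replacement drive is re-probed
echo "- - -" > /sys/class/scsi_host/host2/scan

# Once the replacement appears, add it back and let md resync
mdadm /dev/md0 --add /dev/sdc1
```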

Andrew Dunn | 1 Nov 20:37 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

Are we to expect some resolution in newer kernels?

I am going to rebuild my array (back up the data and re-create it) to modify
the chunk size this week. I hope to get much higher performance by
increasing the chunk size from 64k to 1024k.

Is there a way to modify chunk size in place or does the array need to
be re-created?

Thomas Fjellstrom wrote:
> On Sat October 31 2009, Jon Nelson wrote:
>   
>> I have a 4 disk raid6. The disks are individually capable of (at
>> least) 75MB/s on average.
>> The raid6 looks like this:
>>
>> md0 : active raid6 sda4[0] sdc4[5] sdd4[4] sdb4[6]
>>       613409536 blocks super 1.1 level 6, 64k chunk, algorithm 2 [4/4]
>>  [UUUU]
>>
>> The raid serves basically as an lvm physical volume.
>>
>> While rsyncing a file from an ext3 filesystem to a jfs filesystem, I
>> am observing speeds in the 10-15MB/s range.
>> That seems really really slow.
>>
>> Using vmstat, I see similar numbers (I'm averaging a bit, I'll see
>> lows of 6MB/s and highs of 18-20MB/s, but these are infrequent.)
>> The system is, for the most part, otherwise unloaded.
>>

Thomas Fjellstrom | 1 Nov 20:41 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

On Sun November 1 2009, Andrew Dunn wrote:
> Are we to expect some resolution in newer kernels?

I assume all of the new per-bdi-writeback work going on in .33+ will have a 
large impact. At least I'm hoping.

> I am going to rebuild my array (backup data and re-create) to modify the
> chunk size this week. I hope to get a much higher performance when
> increasing from 64k chunk size to 1024k.
> 
> Is there a way to modify chunk size in place or does the array need to
> be re-created?

This I'm not sure about. I'd like to be able to reshape to a new chunk size 
for testing.

> Thomas Fjellstrom wrote:
> > On Sat October 31 2009, Jon Nelson wrote:
> >> I have a 4 disk raid6. The disks are individually capable of (at
> >> least) 75MB/s on average.
> >> The raid6 looks like this:
> >>
> >> md0 : active raid6 sda4[0] sdc4[5] sdd4[4] sdb4[6]
> >>       613409536 blocks super 1.1 level 6, 64k chunk, algorithm 2 [4/4]
> >>  [UUUU]
> >>
> >> The raid serves basically as an lvm physical volume.
> >>
> >> While rsyncing a file from an ext3 filesystem to a jfs filesystem, I
> >> am observing speeds in the 10-15MB/s range.

Justin Piszcz | 2 Nov 00:19 2009

Re: raid disk failure, options?


On Sun, 1 Nov 2009, Thomas Fjellstrom wrote:

> My main raid array just had a disk failure. I tried to hot remove the
> device, and use the scsi bus rescan sysfs entries, but it seems to fail on
> IDENTIFY.
>
> Can I assume my disk is dead?
>
>
> [5015721.851044] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [5015721.851089] ata3.00: irq_stat 0x40000001
> [5015721.851124] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> [5015721.851125]          res 71/04:03:80:01:32/00:00:00:00:00/e0 Emask 0x1
> (device error)
> [5015721.851193] ata3.00: status: { DRDY DF ERR }
> [5015721.851225] ata3.00: error: { ABRT }
> [5015726.848684] ata3.00: qc timeout (cmd 0xec)
> [5015726.848729] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
> [5015726.848763] ata3.00: revalidation failed (errno=-5)
> [5015726.848798] ata3: hard resetting link
> [5015734.501527] ata3: softreset failed (device not ready)
> [5015734.501565] ata3: failed due to HW bug, retry pmp=0
> [5015734.665530] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [5015734.707085] ata3.00: both IDENTIFYs aborted, assuming NODEV
> [5015734.707089] ata3.00: revalidation failed (errno=-2)
> [5015739.664923] ata3: hard resetting link
> [5015740.148277] ata3: softreset failed (device not ready)
> [5015740.148314] ata3: failed due to HW bug, retry pmp=0
> [5015740.313532] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

NeilBrown | 2 Nov 00:43 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

On Mon, November 2, 2009 6:41 am, Thomas Fjellstrom wrote:
> On Sun November 1 2009, Andrew Dunn wrote:
>> Are we to expect some resolution in newer kernels?
>
> I assume all of the new per-bdi-writeback work going on in .33+ will have
> a
> large impact. At least I'm hoping.
>
>> I am going to rebuild my array (backup data and re-create) to modify the
>> chunk size this week. I hope to get a much higher performance when
>> increasing from 64k chunk size to 1024k.
>>
>> Is there a way to modify chunk size in place or does the array need to
>> be re-created?
>
> This I'm not sure about. I'd like to be able to reshape to a new chunk
> size
> for testing.

Reshaping to a new chunk size is possible with the latest mdadm and kernel,
but I would recommend waiting for mdadm-3.1.1 and 2.6.32.
With the current code, a device failure during a reshape followed by an
unclean shutdown can lead to unrecoverable data loss.  Even a clean
shutdown before the reshape finishes might be a problem in that case.

NeilBrown
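
For anyone finding this thread in the archive later: with a new enough mdadm (3.1+) and kernel, the in-place chunk-size change discussed above is a single --grow invocation. A sketch, with the device name and backup-file path as examples only:

```shell
# Reshape an existing array from 64k to 1024k chunks in place.
# The backup file protects the stripes being rewritten; keep it on
# a device that is NOT part of the array being reshaped.
mdadm --grow /dev/md0 --chunk=1024 --backup-file=/root/md0-reshape.backup

# The reshape runs in the background; monitor its progress with:
cat /proc/mdstat
```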


Thomas Fjellstrom | 2 Nov 00:45 2009

Re: raid disk failure, options?

On Sun November 1 2009, Justin Piszcz wrote:
> On Sun, 1 Nov 2009, Thomas Fjellstrom wrote:
> > My main raid array just had a disk failure. I tried to hot remove the
> > device, and use the scsi bus rescan sysfs entries, but it seems to fail
> > on IDENTIFY.
> >
> > Can I assume my disk is dead?
> >
> >
> > [5015721.851044] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action
> > 0x0 [5015721.851089] ata3.00: irq_stat 0x40000001
> > [5015721.851124] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
> > [5015721.851125]          res 71/04:03:80:01:32/00:00:00:00:00/e0 Emask
> > 0x1 (device error)
> > [5015721.851193] ata3.00: status: { DRDY DF ERR }
> > [5015721.851225] ata3.00: error: { ABRT }
> > [5015726.848684] ata3.00: qc timeout (cmd 0xec)
> > [5015726.848729] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x5)
> > [5015726.848763] ata3.00: revalidation failed (errno=-5)
> > [5015726.848798] ata3: hard resetting link
> > [5015734.501527] ata3: softreset failed (device not ready)
> > [5015734.501565] ata3: failed due to HW bug, retry pmp=0
> > [5015734.665530] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [5015734.707085] ata3.00: both IDENTIFYs aborted, assuming NODEV
> > [5015734.707089] ata3.00: revalidation failed (errno=-2)
> > [5015739.664923] ata3: hard resetting link
> > [5015740.148277] ata3: softreset failed (device not ready)
> > [5015740.148314] ata3: failed due to HW bug, retry pmp=0
> > [5015740.313532] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > [5015740.337129] ata3.00: both IDENTIFYs aborted, assuming NODEV

Thomas Fjellstrom | 2 Nov 00:47 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

On Sun November 1 2009, NeilBrown wrote:
> On Mon, November 2, 2009 6:41 am, Thomas Fjellstrom wrote:
> > On Sun November 1 2009, Andrew Dunn wrote:
> >> Are we to expect some resolution in newer kernels?
> >
> > I assume all of the new per-bdi-writeback work going on in .33+ will
> > have a
> > large impact. At least I'm hoping.
> >
> >> I am going to rebuild my array (backup data and re-create) to modify
> >> the chunk size this week. I hope to get a much higher performance when
> >> increasing from 64k chunk size to 1024k.
> >>
> >> Is there a way to modify chunk size in place or does the array need to
> >> be re-created?
> >
> > This I'm not sure about. I'd like to be able to reshape to a new chunk
> > size
> > for testing.
> 
> Reshaping to a new chunksize is possible with the latest mdadm and
>  kernel, but I would recommend waiting for mdadm-3.1.1 and 2.6.32.
> With the current code, a device failure during reshape followed by an
> unclean shutdown while reshape is happening can lead to unrecoverable
> data loss.  Even a clean shutdown before the shape finishes in that case
> might be a problem.

That's good to know. Though I'm stuck with 2.6.26 until the performance 
regressions in the I/O and scheduling subsystems are solved.


Jon Nelson | 2 Nov 00:53 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

On Sun, Nov 1, 2009 at 5:47 PM, Thomas Fjellstrom <tfjellstrom <at> shaw.ca> wrote:
> On Sun November 1 2009, NeilBrown wrote:

>> Reshaping to a new chunksize is possible with the latest mdadm and
>>  kernel, but I would recommend waiting for mdadm-3.1.1 and 2.6.32.
>> With the current code, a device failure during reshape followed by an
>> unclean shutdown while reshape is happening can lead to unrecoverable
>> data loss.  Even a clean shutdown before the shape finishes in that case
>> might be a problem.

Do you know if the stable series 2.6.31.XX incorporates the appropriate fixes?

-- 
Jon

Andrew Dunn | 2 Nov 00:55 2009

Re: unbelievably bad performance: 2.6.27.37 and raid6

Thanks for the update, Neil. It's good to have something to look forward to.

I am using Ubuntu 9.10; hopefully the new kernel will be incorporated
sometime in the near future. In the meantime I will back everything up
and re-create the array.

Thomas Fjellstrom wrote:
> On Sun November 1 2009, NeilBrown wrote:
>   
>> On Mon, November 2, 2009 6:41 am, Thomas Fjellstrom wrote:
>>     
>>> On Sun November 1 2009, Andrew Dunn wrote:
>>>       
>>>> Are we to expect some resolution in newer kernels?
>>>>         
>>> I assume all of the new per-bdi-writeback work going on in .33+ will
>>> have a
>>> large impact. At least I'm hoping.
>>>
>>>       
>>>> I am going to rebuild my array (backup data and re-create) to modify
>>>> the chunk size this week. I hope to get a much higher performance when
>>>> increasing from 64k chunk size to 1024k.
>>>>
>>>> Is there a way to modify chunk size in place or does the array need to
>>>> be re-created?
>>>>         
>>> This I'm not sure about. I'd like to be able to reshape to a new chunk
>>> size
>>> for testing.

