Alexander Lyakas | 1 Jul 2012 09:08

Question about raid5 disk recovery logic

Hi everybody,
I am trying to understand what happens when raid5 is recovering a
disk, and a write comes to a stripe that has not been recovered yet.
Does md first reconstruct the missing chunk and then apply the
write, or is the write applied first, as if the array were still degraded
(and not recovering), with the missing chunk reconstructed only
later (when the md_do_sync() loop gets to this area)?
I am looking at the stripe handling logic (kernel 2.6.38), can anybody
pls point me at the path that handle_stripe5() takes in that case?

Thanks,
Alex.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

NeilBrown | 1 Jul 2012 10:00

Re: Question about raid5 disk recovery logic

On Sun, 1 Jul 2012 10:08:40 +0300 Alexander Lyakas <alex.bolshoy <at> gmail.com>
wrote:

> Hi everybody,
> I am trying to understand what happens when raid5 is recovering a
> disk, and a write comes to a stripe that has not been recovered yet.
> Does md first reconstruct the missing chunk and then apply the
> write, or is the write applied first, as if the array were still degraded
> (and not recovering), with the missing chunk reconstructed only
> later (when the md_do_sync() loop gets to this area)?
> I am looking at the stripe handling logic (kernel 2.6.38), can anybody
> pls point me at the path that handle_stripe5() takes in that case?
> 
>

Hi Alex,

 The stripe is still degraded, so md/raid5 treats it like a write to a
 degraded array.
 Exactly what happens depends on which block is being written.
 If the block being written would be stored on the recovering device, then
 md will perform a reconstruct-write.  It will read the other data blocks,
 calculate the parity, and write out the parity and the changed data.
 Similarly if the parity block is on the recovering device a
 reconstruct-write will be needed.
 If some other block is being written, md will do a read-modify-write to
 calculate the new parity and then write out the parity and data.  In this
 case the block on the recovering device will not be written.

 I hope that clarifies the situation.
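
 The two write paths described above can be illustrated with plain XOR
 parity.  This is only a minimal sketch in Python, not md's code: the helper
 names (xor_blocks, reconstruct_write, read_modify_write) and the 3+1 stripe
 layout are assumptions made for illustration.  It shows that after a
 read-modify-write to another block, the block belonging to the recovering
 device can still be rebuilt later from the remaining blocks and the parity.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_write(data_blocks):
    """Parity computed from scratch over every data block in the stripe."""
    return xor_blocks(*data_blocks)


def read_modify_write(old_parity, old_block, new_block):
    """Parity updated using only the old parity and the block being changed."""
    return xor_blocks(old_parity, old_block, new_block)


# Hypothetical stripe: three data blocks plus parity.  The device that
# should hold d2 is the one being recovered, so d2 exists only logically.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\x0f\xf0"
parity = xor_blocks(d0, d1, d2)

# Case A: the write targets the block on the recovering device.
# Read the other data blocks, compute parity, write parity and new data.
new_d2 = b"\xaa\x55"
parity_a = reconstruct_write([d0, d1, new_d2])

# Case B: the write targets some other block (d0 here).
# Only the parity and d0 are rewritten; the recovering device is untouched.
new_d0 = b"\x33\x44"
parity_b = read_modify_write(parity, d0, new_d0)

# When recovery reaches this stripe, d2 is rebuilt from the other blocks.
recovered_d2 = xor_blocks(new_d0, d1, parity_b)
```

 In case B the parity stays consistent with the logical content of d2, which
 is why the later recovery pass can still rebuild that block correctly.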

Jonathan Tripathy | 1 Jul 2012 13:20

Resync Every Sunday

Hi Everyone,

We have a few servers that use md raid with mdadm. Each server has 4 
arrays (md0,md1,md2,md3). md0,1,2 are small and md3 is very large. Every 
Sunday at 4:22am, the servers will start to resync. Here is some text 
from /var/log/messages for one of the servers:

Jul  1 04:22:01 server1 kernel: md: syncing RAID array md0
Jul  1 04:22:01 server1 kernel: md: minimum _guaranteed_ reconstruction 
speed: 1000 KB/sec/disc.
Jul  1 04:22:01 server1 kernel: md: using maximum available idle IO 
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul  1 04:22:01 server1 kernel: md: using 128k window, over a total of 
104320 blocks.
Jul  1 04:22:01 server1 kernel: md: delaying resync of md2 until md0 has 
finished resync (they share one or more physical units)
Jul  1 04:22:01 server1 kernel: md: delaying resync of md3 until md0 has 
finished resync (they share one or more physical units)
Jul  1 04:22:05 server1 kernel: md: md0: sync done.
Jul  1 04:22:05 server1 kernel: md: delaying resync of md3 until md2 has 
finished resync (they share one or more physical units)
Jul  1 04:22:05 server1 kernel: md: delaying resync of md2 until md3 has 
finished resync (they share one or more physical units)
Jul  1 04:22:05 server1 kernel: md: syncing RAID array md3
Jul  1 04:22:05 server1 kernel: md: minimum _guaranteed_ reconstruction 
speed: 1000 KB/sec/disc.
Jul  1 04:22:05 server1 kernel: md: using maximum available idle IO 
bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul  1 04:22:05 server1 kernel: md: using 128k window, over a total of 
1888295936 blocks.

Jonathan Tripathy | 1 Jul 2012 14:04

Re: Resync Every Sunday


On 01/07/2012 12:20, Jonathan Tripathy wrote:
> Hi Everyone,
>
> We have a few servers that use md raid with mdadm. Each server has 4 
> arrays (md0,md1,md2,md3). md0,1,2 are small and md3 is very large. 
> Every Sunday at 4:22am, the servers will start to resync. Here is some 
> text from /var/log/messages for one of the servers:
>
> Jul  1 04:22:01 server1 kernel: md: syncing RAID array md0
> Jul  1 04:22:01 server1 kernel: md: minimum _guaranteed_ 
> reconstruction speed: 1000 KB/sec/disc.
> Jul  1 04:22:01 server1 kernel: md: using maximum available idle IO 
> bandwidth (but not more than 200000 KB/sec) for reconstruction.
> Jul  1 04:22:01 server1 kernel: md: using 128k window, over a total of 
> 104320 blocks.
> Jul  1 04:22:01 server1 kernel: md: delaying resync of md2 until md0 
> has finished resync (they share one or more physical units)
> Jul  1 04:22:01 server1 kernel: md: delaying resync of md3 until md0 
> has finished resync (they share one or more physical units)
> Jul  1 04:22:05 server1 kernel: md: md0: sync done.
> Jul  1 04:22:05 server1 kernel: md: delaying resync of md3 until md2 
> has finished resync (they share one or more physical units)
> Jul  1 04:22:05 server1 kernel: md: delaying resync of md2 until md3 
> has finished resync (they share one or more physical units)
> Jul  1 04:22:05 server1 kernel: md: syncing RAID array md3
> Jul  1 04:22:05 server1 kernel: md: minimum _guaranteed_ 
> reconstruction speed: 1000 KB/sec/disc.
> Jul  1 04:22:05 server1 kernel: md: using maximum available idle IO 
> bandwidth (but not more than 200000 KB/sec) for reconstruction.

Mikael Abrahamsson | 1 Jul 2012 14:44

Re: Resync Every Sunday

On Sun, 1 Jul 2012, Jonathan Tripathy wrote:

> - Is it safe to disable these checks? Would monitoring the SMART status of 
> the disks serve as a good substitute?

Well, that's a decision you will have to make for yourself. The rationale 
behind it is to find latent read errors and correct them while you still 
have parity. Another term for this is "data scrubbing"; you'll find 
quite a lot of discussion on that topic.

Personally, I make sure all my data are read at least once a month. I 
have lost data in the past because of a lack of scrubbing.
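
For reference, a scrub can also be requested by hand through md's sysfs 
interface, by writing "check" to /sys/block/<array>/md/sync_action. The 
following is only a rough sketch in Python: the function names and the 
sysfs_root parameter are made up for illustration, and it needs root on a 
real system.

```python
from pathlib import Path


def sync_action_path(array: str, sysfs_root: str = "/sys") -> Path:
    """Path of the md sync_action control file for an array such as 'md0'."""
    return Path(sysfs_root) / "block" / array / "md" / "sync_action"


def current_action(array: str, sysfs_root: str = "/sys") -> str:
    """Return what the array is doing now, e.g. 'idle', 'check' or 'resync'."""
    return sync_action_path(array, sysfs_root).read_text().strip()


def request_check(array: str, sysfs_root: str = "/sys") -> bool:
    """Ask md to scrub the array, but only if it is currently idle.

    Returns True if the request was written, False if the array was busy.
    """
    if current_action(array, sysfs_root) != "idle":
        return False
    sync_action_path(array, sysfs_root).write_text("check\n")
    return True
```

Calling request_check("md3") once a month from cron would be one way to get 
the monthly read-through described above; the sysfs_root argument exists 
only so the sketch can be tried against a scratch directory.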

-- 
Mikael Abrahamsson    email: swmike <at> swm.pp.se

Jonathan Tripathy | 1 Jul 2012 14:53

Re: Resync Every Sunday


On 01/07/2012 13:44, Mikael Abrahamsson wrote:
> On Sun, 1 Jul 2012, Jonathan Tripathy wrote:
>
>> - Is it safe to disable these checks? Would monitoring the SMART 
>> status of the disks serve as a good substitute?
>
> Well, that's a decision you will have to make for yourself. The 
> rationale behind it is to find latent read errors and correct them 
> while you still have parity. Another term for this is "data 
> scrubbing"; you'll find quite a lot of discussion on that topic.
>
> Personally, I make sure all my data are read at least once a month. 
> I have lost data in the past because of a lack of scrubbing.
>
Thanks, I think I'll change it to monthly as well. That seems like a 
good compromise.

I'm still very interested in the other questions, though, especially the 
one about why not all arrays are checked.

Thanks

Alexander Lyakas | 1 Jul 2012 15:36

Re: Question about raid5 disk recovery logic

Thanks, Neil!
That clarifies.

Does this also mean that when md_do_sync() gets to such an
already-reconstructed stripe, it might reconstruct it once again,
unless the stripe is still in the stripe cache?

Thanks for helping,
Alex.

On Sun, Jul 1, 2012 at 11:00 AM, NeilBrown <neilb <at> suse.de> wrote:
> On Sun, 1 Jul 2012 10:08:40 +0300 Alexander Lyakas <alex.bolshoy <at> gmail.com>
> wrote:
>
>> Hi everybody,
>> I am trying to understand what happens when raid5 is recovering a
>> disk, and a write comes to a stripe that has not been recovered yet.
>> Does md first reconstruct the missing chunk and then apply the
>> write, or is the write applied first, as if the array were still degraded
>> (and not recovering), with the missing chunk reconstructed only
>> later (when the md_do_sync() loop gets to this area)?
>> I am looking at the stripe handling logic (kernel 2.6.38), can anybody
>> pls point me at the path that handle_stripe5() takes in that case?
>>
>>
>
> Hi Alex,
>
>  The stripe is still degraded, so md/raid5 treats it like a write to a
>  degraded array.

Keith Keller | 1 Jul 2012 22:41

Re: Resync Every Sunday

On 2012-07-01, Jonathan Tripathy <jonnyt <at> abpni.co.uk> wrote:
>
> What's going on? Am I missing something here? Is data on the arrays at 
> risk? We're using CentOS 5 with mdadm v2.6.9. Kernel version is 
> 2.6.18-274.18.1.el5

As you are running CentOS, check /etc/sysconfig/raid-check.  Someone may
have configured certain arrays not to be checked.

--keith

-- 
kkeller <at> wombat.san-francisco.ca.us


