Dick Snippe | 1 May 2006 01:17

try to write back redundant data before failing disk in raid5 setup

Hello,

Suppose a read action on a disk that is a member of a raid5 (or raid1 or any
other raid where there's data redundancy) fails.
What happens next is that the entire disk is marked as "failed" and a raid5
rebuild is initiated.

However, that seems like overkill to me. If only one sector on one disk
failed, that sector could be re-calculated (using parity calculations)
AND written back to the original disk (i.e. the disk with the bad sector).
Any modern disk will do sector remapping, so the bad sector will simply be
replaced by a good one and there's no need to fail the entire disk.
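
The re-calculation described above can be sketched in a few lines of C. This
is only an illustration of the XOR arithmetic, not the md driver's code; the
function name raid5_reconstruct and its parameters are made up for the
example. It assumes single parity, where the parity block is the XOR of the
data blocks in a stripe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Rebuild the block of one member from the blocks of all other members.
 * With single parity, XOR-ing every surviving block of a stripe (data and
 * parity alike) yields the missing block.
 *
 * blocks:    ndisks pointers, each to block_len bytes; entry `missing`
 *            is never read
 * out:       receives the reconstructed block_len bytes
 */
static void raid5_reconstruct(const uint8_t *const *blocks, size_t ndisks,
                              size_t missing, size_t block_len, uint8_t *out)
{
    assert(missing < ndisks);
    memset(out, 0, block_len);
    for (size_t d = 0; d < ndisks; d++) {
        if (d == missing)
            continue;               /* skip the unreadable member */
        for (size_t i = 0; i < block_len; i++)
            out[i] ^= blocks[d][i];
    }
}
```

The buffer in `out` is what would be written back to the member that
reported the read error, giving the drive a chance to remap the sector.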

The reason I bring this up is that I think raid5 rebuilds are 'scary'
things. Suppose a raid5 rebuild is initiated while other members of the
raid5 set have bad but as yet undetected sectors scattered around the disk
(Current_Pending_Sector in smartd speak). The rebuild would then fail,
losing the entire raid5 set, even though each and every bit in the set might
still be salvageable! (I've seen this happen on 5x250GB raid5 sets.)
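
To make the scenario concrete, here is a small C sketch that compares the two
policies on a map of unreadable chunks. It is a toy model under stated
assumptions, not md's logic: the names and the flat `unreadable` array are
invented for the example, and "whole-disk" means that any read error fails
the member outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * unreadable[s * ndisks + d] is true when member d cannot read its chunk
 * of stripe s.
 */

/* Per-stripe view: a single-parity stripe survives one unreadable chunk. */
static bool recoverable_per_stripe(const bool *unreadable,
                                   size_t nstripes, size_t ndisks)
{
    assert(unreadable != NULL);
    for (size_t s = 0; s < nstripes; s++) {
        size_t bad = 0;
        for (size_t d = 0; d < ndisks; d++)
            if (unreadable[s * ndisks + d])
                bad++;
        if (bad > 1)
            return false;           /* two losses in one stripe: data gone */
    }
    return true;
}

/* Whole-disk view: one read error fails a member; two failed members
 * lose the set. */
static bool recoverable_whole_disk(const bool *unreadable,
                                   size_t nstripes, size_t ndisks)
{
    assert(unreadable != NULL);
    size_t failed = 0;
    for (size_t d = 0; d < ndisks; d++) {
        for (size_t s = 0; s < nstripes; s++) {
            if (unreadable[s * ndisks + d]) {
                failed++;
                break;              /* this member is already counted */
            }
        }
    }
    return failed <= 1;
}
```

With one bad chunk on disk 1 in stripe 0 and another on disk 3 in stripe 2,
the per-stripe check passes while the whole-disk check reports the set as
lost, which is exactly the case described above.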

Does anyone on this list have any opinions about this issue?

-- 
Dick Snippe - Publieke Omroep Internet Services
Gebouw 12.401 (peperbus) Sumatralaan 45 Hilversum  \ fight war
tel +31 35 6774252, email beheer <at> omroep.nl     \ not wars
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

gearewayne | 1 May 2006 01:40

(unknown)


Dick Snippe | 1 May 2006 01:55

Re: try to write back redundant data before failing disk in raid5 setup

On Mon, May 01, 2006 at 01:17:42AM +0200, Dick Snippe wrote:

> Suppose a read action on a disk which is member of a raid5 (or raid1 or any
> other raid where there's data redundancy) fails.
> What happens next is that the entire disk is marked as "failed" and a raid5
> rebuild is initiated.
>
> However, that seems like overkill to me. If only one sector on one disk
> failed, that sector could be re-calculated  (using parity calculations)
> AND written back to the original disk (i.e. the disk with the bad sector).
> Any modern disk will do sector remapping, so the bad sector will simply be
> replaced by a good one and there's no need to fail the entire disk.

Sigh. Just checked the kernel source. Recent 2.6 kernels (>= 2.6.15) appear
to have support for this (See raid5_end_read_request in drivers/md/raid5.c).
Earlier versions don't.

-- 
Dick Snippe - Publieke Omroep Internet Services
Gebouw 12.401 (peperbus) Sumatralaan 45 Hilversum  \ fight war
tel +31 35 6774252, email beheer <at> omroep.nl     \ not wars

Neil Brown | 1 May 2006 01:59

Re: try to write back redundant data before failing disk in raid5 setup

On Monday May 1, Dick.Snippe <at> tech.omroep.nl wrote:
> Hello,
> 
> Suppose a read action on a disk which is member of a raid5 (or raid1 or any
> other raid where there's data redundancy) fails.
> What happens next is that the entire disk is marked as "failed" and a raid5
> rebuild is initiated.
> 
> However, that seems like overkill to me. If only one sector on one disk
> failed, that sector could be re-calculated  (using parity calculations)
> AND written back to the original disk (i.e. the disk with the bad sector).
> Any modern disk will do sector remapping, so the bad sector will simply be
> replaced by a good one and there's no need to fail the entire disk.
> 

... and any modern Linux kernel (since about 2.6.15) will do exactly
what you suggest.

> The reason I bring this up is that I think raid5 rebuilds are 'scary'
> things. Suppose a raid5 rebuild is initiated while other members of the
> raid5 set have bad -but yet undetected- sectors scattered around the disc
> (Current_Pending_Sector in smartd speak). Now this raid5 rebuild would fail,
> losing the entire raid5 set. While each and every bit in the raid5 set might
> still be salvageable!  (I've seen this happen on 5x250GB raid5 sets.)
> 

For this reason it is good to regularly do a background read check of
the entire array.
  echo check > /sys/block/mdX/md/sync_action
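
For anyone who would rather trigger the check from a monitoring program than
from a shell, a minimal C sketch of the same write follows. The function
name md_request_check is made up for the example; the only thing taken from
the command above is that the string "check" is written to the array's
sync_action file, whose path the caller supplies.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "check" to an md sync_action file, as the echo command does.
 * sync_action_path is e.g. /sys/block/mdX/md/sync_action with mdX
 * replaced by the real array name.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int md_request_check(const char *sync_action_path)
{
    assert(sync_action_path != NULL);

    FILE *f = fopen(sync_action_path, "w");
    if (f == NULL)
        return -1;

    int rc = (fputs("check\n", f) == EOF) ? -1 : 0;
    if (fclose(f) == EOF)           /* buffered write errors surface here */
        rc = -1;
    return rc;
}
```

Writing to sysfs normally requires root, so a caller should expect -1 when
run unprivileged.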


CaT | 1 May 2006 07:23

raid5 resizing

Hey folks.

There's no point in using LVM on a raid5 setup if all you intend to do
in the future is resize the filesystem on it, is there? The new raid5
resizing code takes care of providing the extra space, and as long as
the filesystem (say, ext3) is created with resize_inode, all should be
sweet. Right? Or have I missed something crucial here? :)

-- 
    "To the extent that we overreact, we proffer the terrorists the
    greatest tribute."
    	- High Court Judge Michael Kirby

NeilBrown | 1 May 2006 07:29

[PATCH 000 of 11] md: Introduction - assorted md enhancements for 2.6.18

The following 11 patches are assorted tidy-ups and functionality
enhancements suitable for 2.6.18 when that opens up.

The more interesting patches are:
  5 - merge raid5 and raid6 code into a single module.  There is a lot
      of common code here, and there are advantages in uniting it all.
  8 - allow a linear array to be expanded while online by adding an
      extra drive.
  9 - A new flavour of 'raid10' which matches one of the layouts
      supported by 'DDF' - an industry standard raid metadata format
      which might be supported one day.  This will need an updated
      mdadm to experiment with.

Thanks,
NeilBrown

 [PATCH 001 of 11] md: Reformat code in raid1_end_write_request to avoid goto
 [PATCH 002 of 11] md: Remove arbitrary limit on chunk size.
 [PATCH 003 of 11] md: Remove useless ioctl warning.
 [PATCH 004 of 11] md: Increase the delay before marking metadata clean, and make it configurable.
 [PATCH 005 of 11] md: Merge raid5 and raid6 code
 [PATCH 006 of 11] md: Remove nuisance message at shutdown
 [PATCH 007 of 11] md: Allow checkpoint of recovery with version-1 superblock.
 [PATCH 008 of 11] md: Allow a linear array to have drives added while active.
 [PATCH 009 of 11] md: Support stripe/offset mode in raid10
 [PATCH 010 of 11] md: make md_print_devices() static
 [PATCH 011 of 11] md: Split reshape portion of raid5 sync_request into a separate function.

NeilBrown | 1 May 2006 07:30

[PATCH 002 of 11] md: Remove arbitrary limit on chunk size.


The largest chunk size the code can support without substantial
surgery is 2^30 bytes, so make that the limit instead of an arbitrary
4Meg.
Some day, the 'chunksize' should change to a sector-shift
instead of a byte-count.  Then no limit would be needed.
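
A note on why the raid10 hunk of this patch adds parentheses: once the chunk
size may reach 2^30 bytes, multiplying it by the number of disks before
dividing can exceed what a 32-bit int holds. The C sketch below illustrates
that arithmetic only. It assumes the chunk size is kept in an int (as the
`int stripe` in the hunk suggests) and a 4096-byte page; SKETCH_PAGE_SIZE and
both function names are invented for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size, for illustration */

/*
 * Old order: raid_disks * chunk_size / PAGE_SIZE.
 * Reports whether the intermediate product still fits in an int,
 * computed in 64 bits so the check itself cannot overflow.
 */
static bool old_order_fits_int(int raid_disks, int chunk_size)
{
    int64_t product = (int64_t)raid_disks * chunk_size;
    return product <= INT_MAX;
}

/*
 * New order: raid_disks * (chunk_size / PAGE_SIZE).
 * Dividing first keeps the intermediate value small.
 */
static int stripe_pages(int raid_disks, int chunk_size)
{
    assert(raid_disks > 0 && chunk_size > 0);
    return raid_disks * (chunk_size / SKETCH_PAGE_SIZE);
}
```

With 4 disks, the old order is fine at the previous 4 MiB limit but overflows
at 2^30 bytes, while the new order yields 4 * 2^18 pages without trouble.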

Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./drivers/md/raid10.c       |    2 +-
 ./drivers/md/raid5.c        |    4 ++--
 ./drivers/md/raid6main.c    |    4 ++--
 ./include/linux/raid/md_k.h |    3 ++-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff ./drivers/md/raid10.c~current~ ./drivers/md/raid10.c
--- ./drivers/md/raid10.c~current~	2006-05-01 15:09:20.000000000 +1000
+++ ./drivers/md/raid10.c	2006-05-01 15:10:17.000000000 +1000
@@ -2050,7 +2050,7 @@ static int run(mddev_t *mddev)
 	 * maybe...
 	 */
 	{
-		int stripe = conf->raid_disks * mddev->chunk_size / PAGE_SIZE;
+		int stripe = conf->raid_disks * (mddev->chunk_size / PAGE_SIZE);
 		stripe /= conf->near_copies;
 		if (mddev->queue->backing_dev_info.ra_pages < 2* stripe)
 			mddev->queue->backing_dev_info.ra_pages = 2* stripe;

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c

NeilBrown | 1 May 2006 07:30

[PATCH 001 of 11] md: Reformat code in raid1_end_write_request to avoid goto


A recent change made this goto unnecessary, so reformat the
code to make it clearer what is happening.

Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./drivers/md/raid1.c |   34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~	2006-05-01 15:09:20.000000000 +1000
+++ ./drivers/md/raid1.c	2006-05-01 15:10:00.000000000 +1000
@@ -374,26 +374,26 @@ static int raid1_end_write_request(struc
 	 * already.
 	 */
 	if (atomic_dec_and_test(&r1_bio->remaining)) {
-		if (test_bit(R1BIO_BarrierRetry, &r1_bio->state)) {
+		if (test_bit(R1BIO_BarrierRetry, &r1_bio->state))
 			reschedule_retry(r1_bio);
-			goto out;
+		else {
+			/* it really is the end of this request */
+			if (test_bit(R1BIO_BehindIO, &r1_bio->state)) {
+				/* free extra copy of the data pages */
+				int i = bio->bi_vcnt;
+				while (i--)
+					safe_put_page(bio->bi_io_vec[i].bv_page);
+			}
+			/* clear the bitmap if all writes complete successfully */

NeilBrown | 1 May 2006 07:31

[PATCH 010 of 11] md: make md_print_devices() static


From: Adrian Bunk <bunk <at> stusta.de>

This patch makes the needlessly global md_print_devices() static.

Signed-off-by: Adrian Bunk <bunk <at> stusta.de>
Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./drivers/md/md.c         |    7 +++++--
 ./include/linux/raid/md.h |    4 ----
 2 files changed, 5 insertions(+), 6 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-05-01 15:13:14.000000000 +1000
+++ ./drivers/md/md.c	2006-05-01 15:13:29.000000000 +1000
@@ -72,6 +72,10 @@ static void autostart_arrays (int part);
 static LIST_HEAD(pers_list);
 static DEFINE_SPINLOCK(pers_lock);

+static void md_print_devices(void);
+
+#define MD_BUG(x...) { printk("md: bug in file %s, line %d\n", __FILE__, __LINE__); md_print_devices(); }
+
 /*
  * Current RAID-1,4,5 parallel reconstruction 'guaranteed speed limit'
  * is 1000 KB/sec, so the extra system load does not show up that much.
@@ -1512,7 +1516,7 @@ static void print_rdev(mdk_rdev_t *rdev)
 		printk(KERN_INFO "md: no rdev superblock!\n");

NeilBrown | 1 May 2006 07:30

[PATCH 007 of 11] md: Allow checkpoint of recovery with version-1 superblock.


For a while we have had checkpointing of resync.
The version-1 superblock allows recovery to be checkpointed
as well, and this patch implements that.

Due to early carelessness we need to add a feature flag
to signal that the recovery_offset field is in use, otherwise
older kernels would assume that a partially recovered array
is in fact fully recovered.

Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./drivers/md/md.c           |  115 +++++++++++++++++++++++++++++++++++---------
 ./drivers/md/raid1.c        |    3 -
 ./drivers/md/raid10.c       |    3 -
 ./drivers/md/raid5.c        |    1 
 ./include/linux/raid/md_k.h |    6 ++
 ./include/linux/raid/md_p.h |    5 +
 6 files changed, 109 insertions(+), 24 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-05-01 15:10:18.000000000 +1000
+++ ./drivers/md/md.c	2006-05-01 15:12:34.000000000 +1000
@@ -1165,7 +1165,11 @@ static int super_1_validate(mddev_t *mdd
 			set_bit(Faulty, &rdev->flags);
 			break;
 		default:
-			set_bit(In_sync, &rdev->flags);
+			if ((le32_to_cpu(sb->feature_map) &