Asdo | 1 Feb 2012 01:51

Found data loss report with write-mostly

I came across this page
http://superuser.com/questions/379472/how-does-one-enable-write-mostly-with-linux-raid
(a data loss report using write-mostly in linux 3.1)
I thought I would report it here
Regards
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

NeilBrown | 1 Feb 2012 02:14

Re: Found data loss report with write-mostly

On Wed, 01 Feb 2012 01:51:14 +0100 Asdo <asdo <at> shiftmail.org> wrote:

> I came across this page
> http://superuser.com/questions/379472/how-does-one-enable-write-mostly-with-linux-raid
> (a data loss report using write-mostly in linux 3.1)
> I thought I would report it here

Thanks.
This is fixed in 3.1.10, 3.2.2, and 3.3-rc1.

NeilBrown

Keith Keller | 1 Feb 2012 05:42

Re: rebuild raid6 after two failures

On 2012-01-31, Keith Keller <kkeller <at> wombat.san-francisco.ca.us> wrote:
>
> I recently had a RAID6 lose two drives in quick succession, with one
> spare already in place.  The rebuild started fine with the spare, but
> now that I've replaced the failed disks, should I expect the current
> rebuild to finish, then rebuild on another spare?

[snip]

Well, for better or worse, this is now a moot question--I had another
drive kicked out of the array, I believe prematurely by the controller.
I was able to --assemble --force the array, and it is now rebuilding
two spares instead of one.  AFAIR there was no activity on the
filesystem at the time, so I am optimistic that the filesystem should be
fine after an fsck.  Thanks to the advice from last time which suggested
--assemble --force instead of --assume-clean in this situation.

Could it have been the older version of mdadm that didn't tell the
kernel to start rebuilding the added spare?  I have made 3.2.3 my
default mdadm, which I hope alleviates some of the issues I've had with
rebuilds not starting.  (As an aside, I've also bitten the bullet and
decided to swap out all the WD-EARS drives for real RAID drives; ideally
I'd replace the controller, but I don't want to invest the time needed
to replace and test all the components properly.)

--keith

-- 
kkeller <at> wombat.san-francisco.ca.us

NeilBrown | 1 Feb 2012 06:31

Re: rebuild raid6 after two failures

On Tue, 31 Jan 2012 20:42:28 -0800 Keith Keller
<kkeller <at> wombat.san-francisco.ca.us> wrote:

> On 2012-01-31, Keith Keller <kkeller <at> wombat.san-francisco.ca.us> wrote:
> >
> > I recently had a RAID6 lose two drives in quick succession, with one
> > spare already in place.  The rebuild started fine with the spare, but
> > now that I've replaced the failed disks, should I expect the current
> > rebuild to finish, then rebuild on another spare?
> 
> [snip]
> 
> Well, for better or worse, this is now a moot question--I had another
> drive kicked out of the array, I believe prematurely by the controller.
> I was able to --assemble --force the array, and it is now rebuilding
> two spares instead of one.  AFAIR there was no activity on the
> filesystem at the time, so I am optimistic that the filesystem should be
> fine after an fsck.  Thanks to the advice from last time which suggested
> --assemble --force instead of --assume-clean in this situation.
> 
> Could it have been the older version of mdadm that didn't tell the
> kernel to start rebuilding the added spare?  I have made 3.2.3 my
> default mdadm, which I hope alleviates some of the issues I've had with
> rebuilds not starting.  (As an aside, I've also bitten the bullet and
> decided to swap out all the WD-EARS drives for real RAID drives; ideally
> I'd replace the controller, but I don't want to invest the time needed
> to replace and test all the components properly.)

If a spare is being rebuilt when another spare is added, it keeps with the
first rebuild rather than restarting from the beginning.

Keith Keller | 1 Feb 2012 06:48

Re: rebuild raid6 after two failures

On 2012-02-01, NeilBrown <neilb <at> suse.de> wrote:
>
> If a spare is being rebuilt when another spare is added, it keeps with the
> first rebuild rather than restarting from the beginning.
>
> This means that you get some redundancy sooner, which is probably a good
> thing.

Great, thanks for the info.  I just wanted to check that the
behavior I saw earlier was expected.  (Yes, it's a good thing!)

--keith

-- 
kkeller <at> wombat.san-francisco.ca.us

John Robinson | 1 Feb 2012 15:38

GRUB doesn't like ServeRAID M1015/LSI 9220-8i (was Re: mvsas with 3.1)

On 01/02/2012 14:03, Thomas Fjellstrom wrote:
[...]
>  I picked up a refurb IBM
> ServeRaid M1015 (aka: LSI 9220-8i) card. It arrived yesterday. Just have to
> wait and see why grub doesn't like it.

When you find out, please let me know. I bought one via eBay; its
firmware was elderly and wouldn't let me put it in JBOD mode. After a
bit of googling I flashed it with IBM's latest firmware, and then GRUB
thought my machine had no RAM in it. The card has been sitting in a box
since.

Cheers,

John.

Alexander Lyakas | 1 Feb 2012 21:56

Re: sb->resync_offset value after resync failure

Hi Neil,
based on your hints I dug some more into resync failure cases, and
handling of sb->resync_offset and mddev->recovery_cp. Here are some
observations:

The only cases in which recovery_cp is set (apart from setting it via
sysfs, setting bitmap bits via sysfs, etc.) are:
# on array creation, recovery_cp is set to 0 (and the bitmap is fully set)
# recovery_cp becomes MaxSector only if MD_RECOVERY_SYNC (with/without
REQUESTED) completes successfully. On raid6 this may happen when a
singly-degraded array completes resync.
# if resync does not complete successfully (MD_RECOVERY_INTR or
crash), then recovery_cp remains valid (not MaxSector).

The only real influence that recovery_cp seems to have is:
1) abort the assembly if ok_start_degraded is not set
2) when loading the superblock, it looks like recovery_cp may cause
the beginning of the bitmap not to be loaded. I did not dig further
into the bitmap at this point.
3) resume the resync in md_check_recovery() if recovery_cp is valid.

Are these observations valid?

With this scheme I saw several interesting issues:
# After resync is aborted/interrupted, recovery_cp is updated (either
to MaxSector or another value). However, the superblock is not updated
at this point. If there is no additional activity on the array, the
superblock will not be updated. I saw cases when resync completes
fine, recovery_cp is set to MaxSector, but not persisted in the
superblock. If I crash the machine at this point, then after reboot,
[...]

NeilBrown | 1 Feb 2012 23:19

Re: sb->resync_offset value after resync failure

On Wed, 1 Feb 2012 22:56:56 +0200 Alexander Lyakas <alex.bolshoy <at> gmail.com>
wrote:

> Hi Neil,
> based on your hints I dug some more into resync failure cases, and
> handling of sb->resync_offset and mddev->recovery_cp. Here are some
> observations:
> 
> The only cases in which recovery_cp is set (apart from setting it via
> sysfs, setting bitmap bits via sysfs, etc.) are:
> # on array creation, recovery_cp is set to 0 (and the bitmap is fully set)
> # recovery_cp becomes MaxSector only if MD_RECOVERY_SYNC (with/without
> REQUESTED) completes successfully. On raid6 this may happen when a
> singly-degraded array completes resync.
> # if resync does not complete successfully (MD_RECOVERY_INTR or
> crash), then recovery_cp remains valid (not MaxSector).
> 
> The only real influence that recovery_cp seems to have is:
> 1) abort the assembly if ok_start_degraded is not set
> 2) when loading the superblock, it looks like recovery_cp may cause
> the beginning of the bitmap not to be loaded. I did not dig further
> into the bitmap at this point.
> 3) resume the resync in md_check_recovery() if recovery_cp is valid.
> 
> Are these observations valid?

Looks about right.  'recovery_cp' reflects the 'dirty' flag in the superblock.
'0' means the whole array is dirty.
'MaxSector' means none of the array is dirty.
other numbers mean that part of the array is dirty, part isn't.
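
As a rough illustration of the three cases above, the mapping from a
recovery_cp value to the array's dirty state could be sketched in C as
follows. The sector_t typedef, the MaxSector definition, the enum, and
classify_recovery_cp() are stand-ins written for this example; they are
assumptions, not the kernel's own code.

```c
#include <stdint.h>

/* Assumption: stand-ins for the kernel's sector_t and MaxSector. */
typedef uint64_t sector_t;
#define MaxSector (~(sector_t)0)

/* Hypothetical labels for the three cases described in the text. */
enum dirty_state {
	ARRAY_ALL_DIRTY,	/* recovery_cp == 0 */
	ARRAY_PARTLY_DIRTY,	/* any value in between */
	ARRAY_CLEAN		/* recovery_cp == MaxSector */
};

/* Hypothetical helper: classify a recovery_cp value. */
static enum dirty_state classify_recovery_cp(sector_t recovery_cp)
{
	if (recovery_cp == 0)
		return ARRAY_ALL_DIRTY;
	if (recovery_cp == MaxSector)
		return ARRAY_CLEAN;
	return ARRAY_PARTLY_DIRTY;
}
```

Under these assumptions, classify_recovery_cp(0) yields ARRAY_ALL_DIRTY
and classify_recovery_cp(MaxSector) yields ARRAY_CLEAN; any other value
is treated as a partly dirty array.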

Thomas Fjellstrom | 2 Feb 2012 05:04

Re: GRUB doesn't like ServeRAID M1015/LSI 9220-8i (was Re: mvsas with 3.1)

On Wed Feb 1, 2012, Brad Campbell wrote:
> On 01/02/12 22:42, Thomas Fjellstrom wrote:
> > Yeah, GRUB immediately throws an "out of memory. Aborted" error. It's
> > annoying, but I reported the issue on GRUB's bug tracker after talking
> > to a guy in #grub on freenode. Hopefully someone looks at it soon. It
> > appears it happens so early in GRUB's startup that it can't even output
> > any debug info (I was asked to run: `grub-install --debug-image=all
> > /dev/sdX`, and did so; I get debug output when the card is not
> > installed, but absolutely nothing when it is installed).
> 
> Just another data point. I've seen this issue also. It does not appear to
> be GRUB's fault. It appears the BIOS on the card does something funky as
> it also prevents syslinux/isolinux/pxelinux and its friends from properly
> detecting/reporting the machine's memory. Machine has 16GB but a PXE boot
> believes it has 3GB.
> 
> I "worked around" the problem by swapping out the mainboard for a later
> model with a different BIOS and the problem went away. Not pretty, but it
> was the only way I could get the machine to boot with those cards in it. I
> have 4 IBM cards in 2 machines and they work fine here. I still have the
> dodgy mainboard and can set it up if you need help testing/reproducing the
> issue.

I meant to do some testing on it tonight, but that doesn't seem like it's going
to happen. Basically I'm going to try that grub change and see if it helps. If
you have the time and desire to help, that would be nice, but I'm most likely
going to try it out tomorrow sometime regardless.

> Regards,
> Brad

Jes.Sorensen | 2 Feb 2012 12:45

[PATCH] Use MDMON_DIR for pid files created in Monitor.c

From: Jes Sorensen <Jes.Sorensen <at> redhat.com>

Other parts of mdadm/mdmon place .pid/.sock files in MDMON_DIR. This
makes Monitor.c consistent with the rest.

Signed-off-by: Jes Sorensen <Jes.Sorensen <at> redhat.com>
---
 Monitor.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/Monitor.c b/Monitor.c
index 77f22aa..7ed5282 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -294,8 +294,10 @@ static int check_one_sharer(int scan)
 	int pid, rv;
 	FILE *fp;
 	char dir[20];
+	char path[100];
 	struct stat buf;
-	fp = fopen("/var/run/mdadm/autorebuild.pid", "r");
+	sprintf(path, "%s/autorebuild.pid", MDMON_DIR);
+	fp = fopen(path, "r");
 	if (fp) {
 		if (fscanf(fp, "%d", &pid) != 1)
 			pid = -1;
@@ -317,12 +319,12 @@ static int check_one_sharer(int scan)
 		fclose(fp);
 	}
 	if (scan) {
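
The change in this hunk builds the pid file path from MDMON_DIR at run
time. A minimal standalone sketch of the same idea, using snprintf() with
an explicit truncation check instead of sprintf(), might look like this.
The helper name mdmon_path() and the fallback value given to MDMON_DIR
are assumptions made for illustration; they are not taken from the patch
or from mdadm.

```c
#include <stdio.h>

/* Assumption: placeholder value for illustration only. In mdadm the
 * real MDMON_DIR comes from the build configuration. */
#ifndef MDMON_DIR
#define MDMON_DIR "/run/mdadm"
#endif

/* Hypothetical helper (not part of mdadm): write "<MDMON_DIR>/<name>"
 * into buf. Returns 0 on success, -1 if the result would not fit. */
static int mdmon_path(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "%s/%s", MDMON_DIR, name);

	/* snprintf() returns the length it wanted to write, so a value
	 * of len or more means the output was truncated. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

A caller could declare char path[100], call
mdmon_path(path, sizeof(path), "autorebuild.pid"), and pass path to
fopen() only when the helper returns 0. The bounds check matters because
MDMON_DIR is set at build time and its length is not known to this code.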