Troy Cauble | 1 Feb 2009 06:34

problem growing raid-1

I want to replace both disks in a RAID-1 with larger ones using the
instructions here:
http://linux-raid.osdl.org/index.php/Growing#Extending_an_existing_RAID_array

It looks straightforward: mdadm -f, mdadm -r, swap a drive, etc., etc.

Except if I -f & -r a drive, I won't know which physical drive to pull.
(And I have no spare SATA ports.)
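
For reference, the sequence on that page can be sketched roughly as follows. This is a dry run that only prints the commands and executes nothing. The device names /dev/md0 and /dev/sdb1 and the helper name print_grow_steps are placeholders for illustration, not taken from the wiki or from this system.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) the replace-and-grow sequence for one
# member of a two-disk RAID-1. All device names are hypothetical.
print_grow_steps() {
    md=$1; old=$2; new=$3
    echo "mdadm $md --fail $old"
    echo "mdadm $md --remove $old"
    echo "# power down, swap the physical disk, partition it, boot"
    echo "mdadm $md --add $new"
    echo "mdadm --wait $md"
    echo "# repeat the steps above for the second disk, then:"
    echo "mdadm --grow $md --size=max"
    echo "resize2fs $md"
}

# With no spare SATA port the new disk is assumed to reuse the old name.
print_grow_steps /dev/md0 /dev/sdb1 /dev/sdb1
```

The filesystem resize at the end assumes ext3 directly on the md device, as in this thread.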

So I figure I'll pull a drive first, then -f & -r whatever mdadm tells
me is missing.
But when I pull the drive, connect a new one and boot, I get dropped
into a repair shell with:

fsck.ext3: Unable to resolve 'UUID=806153bf-6917-440d-ae48-553418cfbbeb'

which is the UUID of the raid filesystem.

I put the drive back in and reboot, and everything is fine.

1)  So why doesn't my RAID-1 survive pulling a drive?
Seems like a standard failure mode.

2)  How do I proceed with the upgrade?

Thanks,
-troy
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in

Guy Watkins | 1 Feb 2009 07:58

RE: problem growing raid-1

Years ago I had a problem booting with a failed disk.  I was told to add
this to the kernel command line.

md-mod.start_dirty_degraded=1

For me the file to edit is this:  /boot/grub/grub.conf

I think the above should be the default.  During a failure is not the time
to learn this!  Or it should be an attribute of the array, so you can have
some arrays that can start degraded and some that can't.
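
To confirm after a reboot that the parameter actually reached the kernel, the running command line can be checked. A minimal sketch; the helper name has_start_dirty_degraded is made up for illustration, and it accepts both the md-mod and md_mod spellings on the assumption that the kernel treats dashes and underscores in parameter names alike.

```shell
#!/bin/sh
# Return success if the given kernel command line string contains
# the start_dirty_degraded=1 parameter for the md module.
has_start_dirty_degraded() {
    case " $1 " in
        *" md-mod.start_dirty_degraded=1 "*) return 0 ;;
        *" md_mod.start_dirty_degraded=1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a running system:
#   has_start_dirty_degraded "$(cat /proc/cmdline)" && echo "parameter is set"
```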

Guy

} -----Original Message-----
} From: linux-raid-owner <at> vger.kernel.org [mailto:linux-raid-
} owner <at> vger.kernel.org] On Behalf Of Troy Cauble
} Sent: Sunday, February 01, 2009 12:35 AM
} To: linux-raid <at> vger.kernel.org
} Subject: problem growing raid-1
} 
} I want to replace both disks in a RAID-1 with larger ones using the
} instructions here:
} http://linux-raid.osdl.org/index.php/Growing#Extending_an_existing_RAID_array
} 
} It looks straightforward: mdadm -f, mdadm -r, swap a drive, etc., etc.
} 
} Except if I -f & -r a drive, I won't know which physical drive to pull.
} (And I have no spare SATA ports.)
} 

Jan Ceuleers | 1 Feb 2009 13:10

Spares not automatically added on boot

All,

I've been having a long-standing problem with spares not being 
automatically re-added to my raid sets when the system is booted. I 
reported this problem here:

https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/252365

The problem appears to be that if spare drives are served by a disk 
controller that is of another type than the one serving the active 
volumes, then the modules for both types of disk controller need to be 
included in the initrd image for things to work properly. This is on 
Ubuntu 7.10 and 8.04.

I'd be interested in views here as to the real root cause (since the fix 
that works for me and for one other user may not be a generic fix).
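
On Ubuntu, one way to get an extra controller module into the initrd is to list it in /etc/initramfs-tools/modules and rebuild the image with update-initramfs -u. A minimal sketch, assuming that is the kind of fix meant here; the module name sata_sil and the helper name ensure_initrd_module are placeholders, not taken from the bug report.

```shell
#!/bin/sh
# Append a module name to an initramfs-tools modules file unless it is
# already listed on a line of its own. $1 = module name, $2 = modules file.
ensure_initrd_module() {
    mod=$1; file=$2
    grep -qxF "$mod" "$file" 2>/dev/null || echo "$mod" >> "$file"
}

# Usage (as root), followed by an initrd rebuild:
#   ensure_initrd_module sata_sil /etc/initramfs-tools/modules
#   update-initramfs -u
```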

Thanks, Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Troy Cauble | 1 Feb 2009 15:33

Re: problem growing raid-1

Thanks, but...

I had another problem recently and the system DID boot with the
RAID degraded.  (In fact, I posted a Q here 12 days ago about
putting my degraded raid back together.)

Also, the documentation I can find on md-mod.start_dirty_degraded=1
says it's for RAID 5 or 6 and root filesystems.  Mine is RAID-1 and /home.

Any other ideas?
-troy

On Sun, Feb 1, 2009 at 1:58 AM, Guy Watkins <linux-raid <at> watkins-home.com> wrote:
> Years ago I had a problem booting with a failed disk.  I was told to add
> this to the kernel command line.
>
> md-mod.start_dirty_degraded=1
>
> For me the file to edit is this:  /boot/grub/grub.conf
>
> I think the above should be the default.  During a failure is not the time
> to learn this!  Or it should be an attribute of the array, so you can have
> some arrays that can start degraded and some that can't.
>
> Guy
>
> } -----Original Message-----
> } From: linux-raid-owner <at> vger.kernel.org [mailto:linux-raid-
> } owner <at> vger.kernel.org] On Behalf Of Troy Cauble
> } Sent: Sunday, February 01, 2009 12:35 AM

GeneralNMX | 1 Feb 2009 18:23

Raid starts dirty every boot on Ubuntu


On my Ubuntu server, ALL my raids start dirty on every single boot: md kicks
one drive (any one of the four), re-adds it, then reconstructs the raid
array. For the example below, this is a raid1 with two spares.  When it
finishes booting, the device that was kicked is still in the array, as a
spare. This makes the boot literally take hours, and I don’t know exactly
what it’s doing since it just sits there without a progress indicator. I
don’t think it’s fsck because the boot doesn’t even get as far as fsck in the
runlevel. It goes like this:

md: md7 stopped.
md: unbind<sda9>
md: export_rdev(sda9)
md: unbind<sdb9>
md: export_rdev(sdb9)
md: bind<sdc9>
md: bind<sdb9>
md: bind<sdd9>
md: bind<sda9>
md: Kicking non-fresh sdd9 from array!
md: unbind<sdd9>
md: export_rdev(sdd9)
raid1: raid set md7 active with 2 out of 2 mirrors
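
To see which members get kicked across several boots, the relevant kernel messages can be filtered out of the log. A minimal sketch; the helper name kicked_devices is made up for illustration, and it only matches the message wording shown above.

```shell
#!/bin/sh
# Print the member devices named in "Kicking non-fresh" kernel messages
# read from standard input, one device per line.
kicked_devices() {
    sed -n 's/^.*md: Kicking non-fresh \([^ ]*\) from array!.*$/\1/p'
}

# Usage: dmesg | kicked_devices
```

The event counter of each member, shown on the "Events" line of mdadm --examine output, may then help show why md considers a device non-fresh.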


Richard Scobie | 1 Feb 2009 19:56

Re: problem growing raid-1

Troy Cauble wrote:
> I want to replace both disks in a RAID-1 with larger ones using the
> instructions here:
> http://linux-raid.osdl.org/index.php/Growing#Extending_an_existing_RAID_array
> 
> It looks straightforward: mdadm -f, mdadm -r, swap a drive, etc., etc.
> 
> Except if I -f & -r a drive, I won't know which physical drive to pull.
> (And I have no spare SATA ports.)

smartctl -a -d ata /dev/sdx

will tell you the serial number of the drive, which then lets you 
shut down, identify and remove the correct one.
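
A minimal sketch of pulling just the serial number out of that output, assuming the usual "Serial Number:" line in the smartctl information section; the helper name serial_from_smartctl is made up for illustration.

```shell
#!/bin/sh
# Extract the serial number from smartctl -i (or -a) output on stdin.
serial_from_smartctl() {
    sed -n 's/^Serial [Nn]umber:[[:space:]]*//p'
}

# Usage (as root), matching each device name to the label on the drive:
#   for d in /dev/sd?; do
#       echo "$d $(smartctl -i "$d" | serial_from_smartctl)"
#   done
```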

Regards,

Richard

Troy Cauble | 1 Feb 2009 20:12

Re: problem growing raid-1

On Sun, Feb 1, 2009 at 1:56 PM, Richard Scobie <richard <at> sauce.co.nz> wrote:
> Troy Cauble wrote:
>>
>> I want to replace both disks in a RAID-1 with larger ones using the
>> instructions here:
>>
>> http://linux-raid.osdl.org/index.php/Growing#Extending_an_existing_RAID_array
>>
>> It looks straightforward: mdadm -f, mdadm -r, swap a drive, etc., etc.
>>
>> Except if I -f & -r a drive, I won't know which physical drive to pull.
>> (And I have no spare SATA ports.)
>
> smartctl -a -d ata /dev/sdx
>
> will tell you the serial number of the drive, which then lets you shutdown,
> identify and remove the correct one.
>
> Regards,
>
> Richard

Thanks,

Since my original post, I found out about getting the serial number
via smartctl or hdparm.

But before I start failing and removing drives, I'd like to understand
why my set up didn't handle a physically removed drive.  Because
if that happens after the -f & -r, then I'm really stuck.

Bill Davidsen | 1 Feb 2009 20:41

Re: some ?? re failed disk and resyncing of array

whollygoat <at> letterboxes.org wrote:
> On Sat, 31 Jan 2009 10:38:22 +0000, "David Greaves" <david <at> dgreaves.com>
> said:
>   
>> whollygoat <at> letterboxes.org wrote:
>>     
>>> On a boot a couple of days ago, mdadm failed a disk and
>>> started resyncing to spare (raid5, 6 drives, 5 active, 1
>>> spare).  smartctl -H <disk> returned info (can't remember
>>> the exact text) that made me suspect the drive was
>>> fine, but the data connection was bad.  Sure enough the
>>> data cable was damaged.  Replaced the cable and smartctl
>>> sees the disk just fine and reports no errors.
>>>
>>> - I'd like to readd the drive as a spare.  Is it enough
>>> to "mdadm --add /dev/hdk" or do I need to prep the drive to
>>> remove any data that said where it previously belonged
>>> in the array?
>>>       
>> That should work.
>> Any issues and you can zero the superblock (man mdadm)
>> No need to zero the disk.
>>     
>
> Would --re-add be better?
>
>   
I don't think so. And I would zero the superblock. The more detail you 
put into preventing unwanted autodetection the fewer learning 
experiences you will have.
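
A dry-run sketch of that approach, printing the commands rather than running them. /dev/hdk is the disk from the question; the array name /dev/md0 and the helper name print_readd_steps are placeholders for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the commands for wiping stale md metadata from a
# former member and adding it back as a spare. Nothing is executed.
print_readd_steps() {
    md=$1; dev=$2
    echo "mdadm --zero-superblock $dev"
    echo "mdadm $md --add $dev"
}

print_readd_steps /dev/md0 /dev/hdk
```

The device should not be a member of any running array when its superblock is zeroed.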

Troy Cauble | 1 Feb 2009 23:18

Can't boot with drive pulled from RAID-1 /home (was: problem growing raid-1)

OK.  Forget about the "growing" part of my question.
I'll re-state things:

Why doesn't my system boot when I pull a drive that's
part of the RAID-1 /home?

Recent history:
I discovered a couple of weeks ago that I had been running
this RAID degraded for an unknown amount of time.  So it
could boot and run degraded then.
I did a (fail, add, remove) pattern and was up and running.

Later, I figured out that my partition types for the raid drives
shouldn't be 83 and I changed them to 0xDA with fdisk.  I
did this while the raid was mounted, if it matters.

NOW I find out that if I shut down, pull a disk and boot, I get
dropped into a repair shell with:

fsck.ext3: Unable to resolve 'UUID=806153bf-6917-440d-ae48-553418cfbbeb'

which is the UUID of the raid filesystem.

But when I put the drive back in and reboot, everything is fine.
I've repeated this with both disks of the raid.

In the repair shell I captured the following:

root <at> mastershake:/root# mdadm -D --scan
mdadm: md device /dev/md0 does not appear to be active.

Neil Brown | 2 Feb 2009 00:04

Re: [PATCH] mdadm: Fix the used device size in mdadm -D output.

On Tuesday January 27, maan <at> systemlinux.org wrote:
> On 19:33, Andre Noll wrote:
> > This is twice as much as it should be due to a bug in mdadm which bites
> > only for version1 superblocks. The patch below should fix it. However,
> > this might not be the most elegant solution because the real bug
> > is IMHO that get_component_size() multiplies the value from sysfs
> > (which is always in 1K units) by two, so it returns 2K units which
> > looks a bit weird.
> 
> This was of course a braino: Multiplying a value in 1K units by two
> yields 512byte units rather than 2K units, which is exactly what the
> comment to get_component_size() says ;)
> 
> That being said, I think my patch is correct and dividing the result
> by two in Detail() seems the best way to deal with the situation.
> So here's the patch again, this time with proper log message.

Thanks Andre.
Your patch is good... but I really like to use "sectors" as often as
possible.
So I've changed it to leave 'dsize' as sectors, but divide it by 2, or
shift by 9, as appropriate.
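
The arithmetic behind that choice: a sector is 512 bytes, so dividing a sector count by 2 gives 1K units and shifting it left by 9 gives bytes. A minimal sketch in shell arithmetic purely for illustration; the function names are made up and are not mdadm's, whose code is C.

```shell
#!/bin/sh
# Keep the size in 512-byte sectors and convert only at the point of use.
sectors_to_kib()   { echo $(( $1 / 2 )); }    # 2 sectors per 1K unit
sectors_to_bytes() { echo $(( $1 << 9 )); }   # 512 = 2^9 bytes per sector

sectors_to_kib 2048      # prints 1024
sectors_to_bytes 2048    # prints 1048576
```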

Thanks,
NeilBrown

> 
> Andre
> 
> commit 2e9fd78bd09bf332ac86f2d288a0e7c3c1c3df4f

