Neil Brown | 1 Aug 2006 03:19

Re: let md auto-detect 128+ raid members, fix potential race condition

On Monday July 31, aoliva <at> redhat.com wrote:
> On Jul 30, 2006, Neil Brown <neilb <at> suse.de> wrote:
> 
> >  1/
> >     It just isn't "right".  We don't mount filesystems from partitions
> >     just because they have type 'Linux'.  We don't enable swap on
> >     partitions just because they have type 'Linux swap'.  So why do we
> >     assemble md/raid from partitions that have type 'Linux raid
> >     autodetect'? 
> 
> Similar reason to why vgscan finds and attempts to use any partitions
> that have the appropriate type/signature (difference being that raid
> auto-detect looks at the actual partition type, whereas vgscan looks
> at the actual data, just like mdadm, IIRC): when you have to bootstrap
> from an initrd, you don't want to be forced to have the correct data
> in the initrd image, since then any reconfiguration requires the info
> to be introduced in the initrd image before the machine goes down.
> Sometimes, especially in case of disk failures, you just can't do
> that.

The initrd needs to 'know' how to find the root filesystem, whether by
devnum or uuid or whatever.
In exactly the same way it needs to know how to find the components
for the root md array - uuid is the best.  There is no need to
reconfigure this in the case of a disk failure.

Current mdadm will assemble arrays for you given only a hostname.  You
still need to get the hostname into the initrd, but that is no
different from a root device number.

[...]

Alexandre Oliva | 1 Aug 2006 04:20

Re: let md auto-detect 128+ raid members, fix potential race condition

On Jul 31, 2006, David Greaves <david <at> dgreaves.com> wrote:

> Alexandre Oliva wrote:

>> in the initrd image, since then any reconfiguration requires the info
>> to be introduced in the initrd image before the machine goes down.
>> Sometimes, especially in case of disk failures, you just can't do
>> that.

> Your example supports Neil's case - the proposal is to use initrd to run
> mdadm which then (kinda) does what vgscan does.

If mdadm can indeed scan all partitions to bring up all raid devices
in them, like nash's raidautorun does, great.  I'll give that a try,
since Neil suggested it should already work in the version of mdadm
that I got here.  I didn't get that impression while skimming through
the man page, but upon closer inspection now I see it's all there.
Oops :-)

>> I wouldn't have a problem with that, since then distros would probably
>> switch to a more recommended mechanism that works just as well, i.e.,
>> ideally without requiring initrd-regeneration after reconfigurations
>> such as adding one more raid device to the logical volume group
>> containing the root filesystem.

> That's supported in today's mdadm.

> look at --uuid and --name

--uuid and --name won't help at all.  I'm talking about adding raid
[...]

Alexandre Oliva | 1 Aug 2006 04:35

Re: let md auto-detect 128+ raid members, fix potential race condition

On Jul 31, 2006, Neil Brown <neilb <at> suse.de> wrote:

> The initrd needs to 'know' how to find the root filesystem, whether by
> devnum or uuid or whatever.

Yeah, the tricky bit is the `whatever' alternative, when / is a
logical volume, and you need to bring up all of the physical volumes
in order for vgscan to bring up the volume group in a usable way.

> In exactly the same way it needs to know how to find the components
> for the root md array - uuid is the best.  There is no need to
> reconfigure this in the case of a disk failure.

When you add physical volumes to the volume group, you'd have to
reconfigure initrd if it wasn't for mdadm's ability to scan all
partitions.

> Current mdadm will assemble arrays for you given only a hostname.  You
> still need to get the hostname into the initrd, but that is no
> different from a root device number.

Yep, this should work, at least until someone changes the hostname,
creates a new array with the new option and then gets puzzled because
only that array isn't brought up.

Or, worse, does all of the above and then rebuilds initrd ``just in
case'', and then ends up unable to reboot because the root device
won't be brought up.  Oops :-)

>> 
[...]

Alexandre Oliva | 1 Aug 2006 05:33

Re: let md auto-detect 128+ raid members, fix potential race condition

On Jul 31, 2006, Alexandre Oliva <aoliva <at> redhat.com> wrote:

>> mdadm --assemble --scan --homehost='<system>' --auto-update-homehost \
>> --auto=yes --run

>> in your initrd, having set the hostname correctly first.  It might do
>> exactly what you want.

> I'll give it a try some time tomorrow, since I won't turn on that
> noisy box today any more; my daughter is already asleep :-)

But then, I could use my own desktop to test it :-)

FWIW, here's the patch for Fedora rawhide's mkinitrd that worked for
me.  I found that it worked fine even without --homehost, and even
without HOMEHOST set in mdadm.conf.

I hope copying mdadm.conf to initrd won't ever hurt, can you think of
any case in which it would?

Attachment (mkinitrd-mdadm.patch): text/x-patch, 650 bytes

-- 
Alexandre Oliva         http://www.lsd.ic.unicamp.br/~oliva/
Secretary for FSF Latin America        http://www.fsfla.org/
Red Hat Compiler Engineer   aoliva <at> {redhat.com, gcc.gnu.org}
Free Software Evangelist  oliva <at> {lsd.ic.unicamp.br, gnu.org}

NeilBrown | 1 Aug 2006 06:08

[PATCH] md: Fix a bug that recently crept into md/linear

This patch fixes a bug in 2.6.18-rc3.  It would be good if it could get
into -final.

Thanks,
NeilBrown

### Comments for Changeset

A recent patch that allowed linear arrays to be reconfigured on-line
introduced a bug which results in a divide by zero: not all uses of
mddev->array_size were converted to conf->array_size.

This patch finishes the conversion and fixes the bug.

The offending patch was commit 7c7546ccf6463edbeee8d9aac6de7be1cd80d08a.

Thanks to Simon Kirby <sim <at> netnation.com> for the bug report.

Cc: Simon Kirby <sim <at> netnation.com>
Signed-off-by: Neil Brown <neilb <at> suse.de>

### Diffstat output
 ./drivers/md/linear.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff .prev/drivers/md/linear.c ./drivers/md/linear.c
--- .prev/drivers/md/linear.c	2006-08-01 11:06:15.000000000 +1000
+++ ./drivers/md/linear.c	2006-08-01 11:06:23.000000000 +1000
@@ -162,7 +162,7 @@ static linear_conf_t *linear_conf(mddev_
 		goto out;
[...]

Michael Tokarev | 1 Aug 2006 10:28

Re: let md auto-detect 128+ raid members, fix potential race condition

Alexandre Oliva wrote:
[]
> If mdadm can indeed scan all partitions to bring up all raid devices
> in them, like nash's raidautorun does, great.  I'll give that a try,

Never, ever, try to do that (again).  Mdadm (or vgscan, or whatever)
should NOT assemble ALL arrays found, but only those which it has
been told to assemble.  This is it again: you bring another disk into
a system (disk which comes from another machine), and mdadm finds
FOREIGN arrays and brings them up as /dev/md0, where YOUR root
filesystem should be.  That's what the 'homehost' option is for, for
example.
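
The "only assemble what belongs to this host" rule can be sketched in C
roughly as follows.  This is not mdadm's code: struct found_array, its
fields and both function names are hypothetical, and the sketch simply
assumes that each discovered array carries the name of the host it was
created on (or none at all).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for an array found while scanning partitions. */
struct found_array {
    const char *uuid;      /* array identity, opaque here */
    const char *homehost;  /* host recorded at creation, NULL if none */
};

/* True only if the array was explicitly recorded as belonging to
 * local_host.  Arrays with no recorded host are treated as foreign. */
static bool belongs_to_host(const struct found_array *a,
                            const char *local_host)
{
    assert(a != NULL && local_host != NULL);
    return a->homehost != NULL && strcmp(a->homehost, local_host) == 0;
}

/* Count the arrays a scan would be allowed to assemble on local_host. */
static size_t count_assemblable(const struct found_array *arrays, size_t n,
                                const char *local_host)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (belongs_to_host(&arrays[i], local_host))
            count++;
    }
    return count;
}
```

With a filter of this shape, a disk brought in from another machine is
skipped rather than claiming /dev/md0.  Treating "no recorded host" as
foreign is a design choice of the sketch, picked because it is the
conservative option.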

If initrd should be reconfigured after some changes (be it raid
arrays, LVM volumes, hostname, whatever), -- I for one am fine
with that.  Hopefully no one will argue that if you forgot to
install an MBR into your replacement drive, it was entirely your
own fault that your system became unbootable, after all ;)

/mjt
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bill Davidsen | 1 Aug 2006 19:12

Re: [PATCH 005 of 9] md: Replace magic numbers in sb_dirty with well defined bit flags

Ingo Oeser wrote:

>Hi Neil,
>
>I think the names in this patch don't match the description at all.
>May I suggest different ones?
>
>On Monday, 31. July 2006 09:32, NeilBrown wrote:
>  
>
>>Instead of magic numbers (0,1,2,3) in sb_dirty, we have
>>some flags instead:
>>MD_CHANGE_DEVS
>>   Some device state has changed requiring superblock update
>>   on all devices.
>>    
>>
>
>MD_SB_STALE or MD_SB_NEED_UPDATE
>  
>
I think STALE is better, it is unambiguous.

>  
>
>>MD_CHANGE_CLEAN
>>   The array has transitioned from 'clean' to 'dirty' or back,
>>   requiring a superblock update on active devices, but possibly
>>   not on spares
>>    
[...]
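
For readers unfamiliar with the idiom under discussion, replacing magic
numbers with named bit flags looks roughly like this in C.  The names
MD_CHANGE_DEVS and MD_CHANGE_CLEAN come from the patch description
above; the numeric bit positions, the SB_CHANGE_NBITS sentinel and the
sb_flag_* helpers are assumptions made for illustration, not the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions in a flags word.  Values are illustrative assumptions. */
enum sb_change_bit {
    MD_CHANGE_DEVS  = 0, /* device state changed: update all devices */
    MD_CHANGE_CLEAN = 1, /* clean/dirty transition: update active devices */
    SB_CHANGE_NBITS      /* hypothetical sentinel: number of defined bits */
};

/* Hypothetical helpers; they return the new flags word instead of
 * modifying shared state, so they are not atomic. */
static unsigned long sb_flag_set(unsigned long flags, enum sb_change_bit bit)
{
    assert(bit < SB_CHANGE_NBITS);
    return flags | (1UL << bit);
}

static unsigned long sb_flag_clear(unsigned long flags, enum sb_change_bit bit)
{
    assert(bit < SB_CHANGE_NBITS);
    return flags & ~(1UL << bit);
}

static bool sb_flag_test(unsigned long flags, enum sb_change_bit bit)
{
    assert(bit < SB_CHANGE_NBITS);
    return (flags & (1UL << bit)) != 0;
}
```

Unlike a single counter taking the values 0 to 3, independent bits let
two pending reasons for a superblock update coexist and be cleared
separately.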

Bill Davidsen | 1 Aug 2006 19:15

Re: [PATCH 004 of 9] md: Factor out part of raid10d into a separate function.

don't think this is better, NeilBrown wrote:

>raid10d has toooo many nested blocks, so take the fix_read_error
>functionality out into a separate function.
>  
>

Definite improvement in readability. Will all versions of the compiler 
do something appropriate WRT inlining or not?

-- 
bill davidsen <davidsen <at> tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


Bill Davidsen | 1 Aug 2006 19:40

Re: let md auto-detect 128+ raid members, fix potential race condition

Neil Brown wrote:

>[linux-raid added to cc.
> Background: patch was submitted to remove the current hard limit
> of 127 partitions that can be auto-detected - limit set by 
> 'detected_devices' array in md.c.
>]
>
>My first inclination is not to fix this problem.
>
>I consider md auto-detect to be a legacy feature.
>I don't use it and I recommend that other people don't use it.
>However I cannot justify removing it, so it stays there.
>Having this limitation could be seen as a good motivation for some
>more users to stop using it.
>
>Why not use auto-detect?
>I have three issues with it.
>
> 1/
>    It just isn't "right".  We don't mount filesystems from partitions
>    just because they have type 'Linux'.  We don't enable swap on
>    partitions just because they have type 'Linux swap'.  So why do we
>    assemble md/raid from partitions that have type 'Linux raid
>    autodetect'? 
>  
>

I rarely think you are totally wrong about anything RAID, but I do 
believe you have missed the point of autodetect. It is intended to work 
[...]

Neil Brown | 1 Aug 2006 22:27

Re: [PATCH 004 of 9] md: Factor out part of raid10d into a separate function.

On Tuesday August 1, davidsen <at> tmr.com wrote:
> don't think this is better, NeilBrown wrote:
> 
> >raid10d has toooo many nested blocks, so take the fix_read_error
> >functionality out into a separate function.
> >  
> >
> 
> Definite improvement in readability. Will all versions of the compiler 
> do something appropriate WRT inlining or not?

As the separated function is called about once in a blue moon, it
hardly matters.  I'd probably rather it wasn't inlined so as to be
sure it doesn't clutter the L-1 cache when it isn't needed, but that's
the sort of thing I really want to leave to the compiler.

Maybe it would be good to stick an 'unlikely' or 'likely' in raid10d
to tell the compiler how likely a read error is...

NeilBrown
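
The 'unlikely' hint mentioned above can be sketched in user-space C as
follows.  The macros have the same shape as the kernel's
likely()/unlikely() and rely on the GCC/Clang __builtin_expect builtin,
with a no-op fallback for other compilers.  The functions
note_read_error() and handle_reads() are hypothetical stand-ins, not
code from raid10d, and the convention that a nonzero status means a
read error is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical stand-in for the rarely called, factored-out handler. */
static void note_read_error(int *errors_handled)
{
    ++*errors_handled;
}

/* Walk n completion statuses; nonzero is assumed to mean a read error.
 * Returns how many errors were handed to the slow path. */
static int handle_reads(const int *status, size_t n)
{
    int errors_handled = 0;

    assert(status != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Hint that the error branch is the cold one. */
        if (unlikely(status[i] != 0))
            note_read_error(&errors_handled);
    }
    return errors_handled;
}
```

The hint only influences code layout and branch prediction heuristics;
whether the handler is inlined is still left to the compiler, as
suggested above.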

