Neil Brown | 1 Jul 01:58 2009

Re: md: Use new topology calls to indicate alignment and I/O sizes

On Tuesday June 30, snitzer <at> wrote:
> Neil,
> Any chance you could make a formal pull request for this MD topology
> support patch for 2.6.31-rc2?

Yes.. I want to include a couple of other patches in the same batch
and I've been a bit distracted lately (by a cold among other things).
I'll try to get this sorted out today.

> NOTE: all the disk_stack_limits() calls must pass the 3rd 'offset' arg
> in terms of bytes and _not_ sectors.  So all of MD's calls to
> disk_stack_limits() should pass: rdev->data_offset << 9

Noted, thanks.
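For reference, the "<< 9" convention being discussed is just the sector-to-byte conversion (one sector = 512 bytes). A trivial userspace sketch of the idea, not the actual md code:

```c
#include <assert.h>
#include <stdint.h>

/* md stores rdev->data_offset in 512-byte sectors, while
 * disk_stack_limits() expects its offset argument in bytes --
 * hence the "rdev->data_offset << 9" in the callers. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors << 9;   /* multiply by 512 */
}
```

So a data_offset of 8 sectors corresponds to a byte offset of 4096.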

To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at>
More majordomo info at

Neil Brown | 1 Jul 02:29 2009

Re: [dm-devel] REQUEST for new 'topology' metrics to be moved out of the 'queue' sysfs directory.

On Tuesday June 30, adilger <at> wrote:
> On Jun 29, 2009  13:41 +0200, Jens Axboe wrote:
> > ... externally it just makes the API worse since tools then have to know
> > which device type they are talking to.
> > 
> > So I still see absolutely zero point in making such a change, quite the
> > opposite.
> Exactly correct.  Changing these tunables just for the sake of giving
> them a slightly different name is madness.  Making all block devices
> appear more uniform to userspace (even if they don't strictly need all
> of the semantics) is very sensible.  The whole point of the kernel is
> to abstract away the underlying details so that userspace doesn't need
> to understand it all again.

Uniformity is certainly desirable.  But we shouldn't take it so far
as to make apples look like oranges.

We wouldn't want a SATA disk drive to have 'chunk_size' and 'raid_disks'.
Nor would we want a software RAID array to have 'scheduler' or
'iosched' attributes.

> In order to get good throughput on RAID arrays we need to tune the
> queue/max_* values to ensure the IO requests don't get split.
> It would be great if the MD queue/max_* values would pass these tunings
> down to the underlying disk devices as well.  As it stands now, we have
> to follow the /sys/block/*/slaves tree to set all of these ourselves,
> and before "slaves/" was introduced it was nigh impossible to automatically
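The traversal Andreas describes can be sketched in userspace C. The helper name set_max_sectors_kb() is hypothetical, and a real tool would also need to map partition names found under slaves/ back to their parent disk's queue/ directory:

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: write <kb> to queue/max_sectors_kb for every
 * device listed under /sys/block/<md>/slaves.  Returns the number of
 * slaves updated, or -1 if the slaves/ directory is missing. */
static int set_max_sectors_kb(const char *md, int kb)
{
    char path[256];
    snprintf(path, sizeof(path), "/sys/block/%s/slaves", md);
    DIR *d = opendir(path);
    if (!d)
        return -1;

    struct dirent *de;
    int updated = 0;
    while ((de = readdir(d)) != NULL) {
        if (de->d_name[0] == '.')
            continue;
        char attr[512];
        snprintf(attr, sizeof(attr),
                 "/sys/block/%s/queue/max_sectors_kb", de->d_name);
        FILE *f = fopen(attr, "w");
        if (f) {
            fprintf(f, "%d\n", kb);
            fclose(f);
            updated++;
        }
    }
    closedir(d);
    return updated;
}
```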

Neil Brown | 1 Jul 03:46 2009

Re: [PATCH] MD: md, fix lock imbalance

On Sunday June 21, jirislaby <at> wrote:
> Add unlock and put to one of fail paths in md_alloc.

Hi Jiri,
 thanks for finding this.
 I have split it up into two patches.  One just fixes the bug as
 simply as possible.  This will be tagged for -stable.
 The other tidies up the exit paths (a little differently to the way
 you did).  That one won't go to -stable.

 See below.


commit d7a0dc02b59b8656d7817bf2da3822df64fe4886
Author: NeilBrown <neilb <at>>
Date:   Wed Jul 1 11:45:14 2009 +1000

    md: fix error path when duplicate name is found on md device creation.

    When an md device is created by name (rather than number) we need to
    check that the name is not already in use.  If this check finds a
    duplicate, we return an error without dropping the lock or freeing
    the newly created mddev.
    This patch fixes that.

    Cc: stable <at>
    Found-by: Jiri Slaby <jirislaby <at>>
    Signed-off-by: NeilBrown <neilb <at>>

Roger Heflin | 1 Jul 04:43 2009

Re: device with newer data added as spare - data now gone?

Molinero wrote:
> Hi all
> I've lost quite a lot of data on my /home raid partition and I'm wondering
> what exactly I did to make it happen. I'd like to know so something similar
> won't happen in the future.
> I'm pretty much a raid newbie. I setup raid1 on my home server and I'm
> guessing that something like this happened. Please tell me if it's possible.
> * Some time ago I did something to make one device fail, which resulted
> in md3 having only 1 device.
> * Time went by without me noticing (because I suck)
> * An update broke my raid setup and gave me a kernel panic (because I suck).
> Didn't put the mdadm and raid hooks in mkinitcpio.conf
> * Booted a live-cd, mounted the drives and chrooted back into the system and
> fixed the mkinitcpio.conf
> * Rebooted and noticed that md3 was running with only 1 device
> * Added sdb4 to md3 and it then read 1 device with 1 spare
> * cat /proc/mdstat started to say "recovery"
> * All data from approx. 1 year is gone
> I'm guessing that the old (not updated) device was set as "master" and the
> data on the drive (containing newer data) was overwritten by data on the old
> device - is this plausible?

If the old device was brought up as md3 and had dropped out months 
ago, the data would now be the data that existed when that disk 
dropped off.   And when a device drops out, there is no mark on that 
device marking it as bad since the typical reasons for the device 

Neil Brown | 1 Jul 05:20 2009

[PULL REQUEST] md updates for 2.6.31-rc

Hi Linus,
 please pull these md updates:  They add the appropriate call for the
 new 'topology' numbers, and fix a small assortment of bugs.


The following changes since commit 5a4f13fad1ab5bd08dea78fc55321e429d83cddf:
  Linus Torvalds (1):
        Merge git://

are available in the git repository at:

  git:// for-linus

Martin K. Petersen (1):
      md: Use new topology calls to indicate alignment and I/O sizes

NeilBrown (5):
      md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.
      md: fix error path when duplicate name is found on md device creation.
      md: tidy up error paths in md_alloc
      md/raid5: suspend shouldn't affect read requests.
      md: use interruptible wait when duration is controlled by userspace.

 drivers/md/linear.c    |    4 +-
 drivers/md/md.c        |   56 +++++++++++++++++++++++++++--------------------
 drivers/md/multipath.c |    7 +++--
 drivers/md/raid0.c     |    9 ++++++-

Leslie Rhorer | 1 Jul 06:12 2009

RE: Adding a smaller drive

> >>>>> "Mikael" == Mikael Abrahamsson <swmike <at>> writes:
> Mikael> The 2TB WD GP drives have fewer sectors than 2x their 1TB
> Mikael> drives.
> That's how it's supposed to be.

	You mean that's how IDEMA specs read.  Unless there is some legal
agreement signed by the drive manufacturer (or whomever) requiring adherence
to a certain spec, they can do just about anything they want.  Compliance
with a published spec is great, but unless some licensing agreement is in
place, it isn't enforceable.

> Mikael> The 2TB ones adhere to the IDEMA standard (per your other
> Mikael> email), but the 1TB ones do not. (2TB ones have 3907029168
> Mikael> sectors, the 1TB ones have 1953103872).
> That, on the other hand, is pretty weird.  Call the capacity police!

	That's pretty much the point.  Not adhering to IDEMA specs is not


Leslie Rhorer | 1 Jul 06:16 2009

RE: Adding a smaller drive

> On 28/06/2009 22:22, Leslie Rhorer wrote:
> > 	I'm not confident of that presumption.  I would not be surprised in
> > the least if some manufacturer produced a 1T drive with an actual 999.8G
> If they did that, they'd be lying in describing it as a 1TB drive. I

	Not as long as they do publish the actual size.

> wish they'd be more honest in the first place and sell them as 931GiB
> drives, or make real 1TiB drives, but the marketing literature does at
> least explain their definition of TB, GB etc.

	Which illustrates the point, I think.  In any case, it won't be the
end of the world, but it does make just one more thing to nag me.


Martin K. Petersen | 1 Jul 07:25 2009

Re: Adding a smaller drive

>>>>> "Leslie" == Leslie Rhorer <lrhorer <at>> writes:

>> That's how it's supposed to be.

Leslie> You mean that's how IDEMA specs read.  Unless there is some
Leslie> legal agreement signed by the drive manufacturer (or whomever)
Leslie> requiring adherence to a certain spec, they can do just about
Leslie> anything they want.  Compliance with a published spec is great,
Leslie> but unless some licensing agreement is in place, it isn't
Leslie> enforceable.

Oracle is not a member so I'm not sure what (if any) leverage is
available as part of the IDEMA membership agreement.

I do think, however, that you are underestimating the power of industry
associations and standards bodies.  System manufacturers, enterprise
customers and governments absolutely refuse to buy things that are not
compliant.  So this is not about whether you can legally cut corners.
It is about being able to sell your product in the first place.

In this particular case IDEMA is an organization founded and run by the
drive manufacturers themselves.  They collaborated on the LBA spec and
have all publicly stated that they'll adhere to it.  It is not a
requirement that was forced upon them by an external entity.  Although
it was, of course, motivated by customers unhappy with the annoying
variation in LBA count between brands and even drive models...
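The sector counts quoted earlier in the thread can be checked against the published rule. Assuming the IDEMA LBA count formula (sector count = 97,696,368 + 1,953,504 x (capacity_in_GB - 50), with GB meaning 10^9 bytes):

```c
#include <assert.h>
#include <stdint.h>

/* IDEMA LBA count for a drive marketed as <gb> GB (gb >= 50),
 * per the formula the drive vendors agreed on. */
static uint64_t idema_sectors(uint64_t gb)
{
    return 97696368ULL + 1953504ULL * (gb - 50);
}
```

idema_sectors(2000) gives 3907029168, matching the 2TB figure Mikael quoted, while idema_sectors(1000) gives 1953525168 -- not the 1953103872 reported for the 1TB drives, which is exactly the discrepancy under discussion.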


Martin K. Petersen	Oracle Linux Engineering

Goswin von Brederlow | 1 Jul 08:57 2009

Re: What will happen with spares in this scenario?

John McNulty <johnmcn1 <at>> writes:

> On 30 Jun 2009, at 18:54, Goswin von Brederlow wrote:
>> In case of grub2: It does support both raid and lvm. The raid
>> superblocks are parsed to construct mdX devices and the lvm metadata
>> is parsed to locate lvm logical volumes. So you could say it does
>> construct the LV and mounts the FS.
> Interesting, thanks.  I've not been following grub2 development.
> Given the production nature of these systems though (and the customer)
> I'm stuck with grub legacy until Redhat update it and support grub2,
> which judging from the chatter on the Fedora Project Portal about the
> possibility of including it in Fedora 12, could be some years off.
> Looks like Ubuntu will be putting it into 9.10 though, so I'll have a
> play with it on one of my dev boxes at home.
>> In case of lilo: Lilo only stores a list of blocks where the
>> kernel/initrd are on the device. Afaik each component device of a raid
>> stores the block numbers for the kernel/initrd on that component
>> device. So no matter which component device boots it will find the
>> kernel/initrd on the same device. The raid1 and lvm are completely
>> circumvented.
> Have not touched lilo since grub went main stream, but will have
> another look at that.
> I've gone over all this with the customer.  They want belt and braces
> protection (they are to be very critical systems in a hospital) so
> we've decided to hardware mirror c0d0 and c0d1 for the disks on one

Andre Noll | 1 Jul 10:38 2009

[PATCH/Resend] md: Push down data integrity code to personalities.

Hi Neil,

here's again the patch that reduces the knowledge about specific
raid levels from md.c by moving the data integrity code to the
personalities. The patch was tested and acked by Martin.

Please review.


commit 51295532895ffe532a5d8401fc32073100268b29
Author: Andre Noll <maan <at>>
Date:   Fri Jun 19 14:40:46 2009 +0200

    [PATCH/RFC] md: Push down data integrity code to personalities.

    This patch replaces md_integrity_check() by two new functions:
    md_integrity_register() and md_integrity_add_rdev() which are both

    md_integrity_register() is a public function which is called from
    the ->run method of all personalities that support data integrity.
    The function iterates over the component devices of the array and
    determines if all active devices are integrity capable and if their
    profiles match. If this is the case, the common profile is registered
    for the mddev via blk_integrity_register().

    The second new function, md_integrity_add_rdev(), is internal to
    md.c and is called by bind_rdev_to_array(), i.e. whenever a new
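The profile-matching check the commit message describes can be sketched in plain C. The struct and function names below are illustrative stand-ins, not the kernel's blk_integrity API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct blk_integrity: just the fields that must agree. */
struct profile {
    const char *name;       /* e.g. "T10-DIF-TYPE1-CRC" */
    unsigned tuple_size;
};

/* Return the common profile if every active device has one and they
 * all match, else NULL -- mirroring what md_integrity_register() is
 * described as checking before calling blk_integrity_register(). */
static const struct profile *common_profile(const struct profile **devs,
                                            size_t n)
{
    if (n == 0 || devs[0] == NULL)
        return NULL;
    for (size_t i = 1; i < n; i++) {
        if (devs[i] == NULL ||
            strcmp(devs[i]->name, devs[0]->name) != 0 ||
            devs[i]->tuple_size != devs[0]->tuple_size)
            return NULL;
    }
    return devs[0];
}
```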