Maciej Patelczyk | 29 Aug 17:49 2011

[PATCH] mdadm: close parent file descriptors when starting mdmon.

When mdadm is invoked by fork-and-exec it inherits all open file
descriptors, and when mdadm forks to exec mdmon those file descriptors
are passed on to mdmon. mdmon closes only the first 97 file descriptors,
which in some cases is not enough.
This commit adds a function which looks at the '/proc/<pid>/fd' directory
and closes all inherited file descriptors except the standard ones (0-2).

Signed-off-by: Maciej Patelczyk <maciej.patelczyk <at> intel.com>
---
 util.c |   41 +++++++++++++++++++++++++++++++++++++----
 1 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/util.c b/util.c
index ce03239..b844447 100644
--- a/util.c
+++ b/util.c
@@ -30,7 +30,12 @@
 #include	<sys/un.h>
 #include	<ctype.h>
 #include	<dirent.h>
+#include	<sys/types.h>
 #include	<signal.h>
+#include	<stdlib.h>
+#include	<string.h>
+#include	<errno.h>
+#include	<unistd.h>

 /*
  * following taken from linux/blkpg.h because they aren't
@@ -1571,11 +1576,41 @@ int mdmon_running(int devnum)
(Continue reading)
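
For reference, a minimal sketch of the approach the commit message describes:
scan the fd directory and close everything above stderr. This is not the
patch's actual code (the diff above is truncated); the function name, the use
of /proc/self/fd and the dirfd() guard are illustrative assumptions.

  #include <dirent.h>
  #include <stdlib.h>
  #include <unistd.h>

  /* Walk /proc/self/fd and close every inherited descriptor above stderr.
   * Skipping the directory's own fd keeps the scan itself alive. */
  static void close_inherited_fds(void)
  {
      DIR *dir = opendir("/proc/self/fd");
      struct dirent *ent;

      if (!dir)
          return;
      while ((ent = readdir(dir)) != NULL) {
          char *end;
          long fd = strtol(ent->d_name, &end, 10);

          if (*end != '\0')
              continue;                  /* skips "." and ".." */
          if (fd > 2 && fd != dirfd(dir))
              close((int)fd);
      }
      closedir(dir);
  }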

Chris Pearson | 29 Aug 18:30 2011

Re: dirty chunks on bitmap not clearing (RAID1)

I have the same problem.  3 chunks are always dirty.

I'm using 2.6.38-8-generic and mdadm - v3.1.4 - 31st August 2010

If that's not normal, then maybe what I've done differently is that I
created the array (raid 1) with one live disk and one missing, then
added the second disk later, after writing a lot of data.

Also, though probably not the cause, I continued writing data while the
array was syncing, and a couple of times during the sync both drives
stopped responding and I had to power off.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md127 : active raid1 sdd1[0] sdc1[2]
      1904568184 blocks super 1.2 [2/2] [UU]
      bitmap: 3/15 pages [12KB], 65536KB chunk

unused devices: <none>

# mdadm -X /dev/sd[dc]1
        Filename : /dev/sdc1
           Magic : 6d746962
         Version : 4
            UUID : 43761dc5:4383cf0f:41ef2dab:43e6d74e
          Events : 40013
  Events Cleared : 40013
           State : OK
       Chunksize : 64 MB
(Continue reading)

Alexander Lyakas | 29 Aug 19:17 2011

MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'

Greetings everybody,

I issue
mdadm --stop /dev/md0
and I want to reliably determine that the MD devnode (/dev/md0) is gone.
So I look for the udev 'remove' event for that devnode.
However, in some cases even after I see the udev event, I issue
mdadm --detail /dev/md0
and I get:
mdadm: md device /dev/md0 does not appear to be active

According to Detail.c, this means that mdadm can successfully do
open("/dev/md0") and receive a valid fd.
But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
from the kernel.
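
A minimal sketch of that check (hypothetical helper, not mdadm's Detail.c
code; the function name and error handling are reconstructed from the
description above):

  #include <errno.h>
  #include <fcntl.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <linux/raid/md_u.h>           /* GET_ARRAY_INFO, mdu_array_info_t */

  /* 1: active array behind devnode; 0: devnode gone or array not active;
   * -1: unexpected error. */
  static int md_is_active(const char *devnode)
  {
      mdu_array_info_t array;
      int fd = open(devnode, O_RDONLY);

      if (fd < 0)
          return 0;                      /* devnode already removed */
      if (ioctl(fd, GET_ARRAY_INFO, &array) < 0) {
          int err = errno;

          close(fd);
          return err == ENODEV ? 0 : -1; /* ENODEV: node left, no active array */
      }
      close(fd);
      return 1;
  }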

Can somebody suggest an explanation for this behavior? Is there a
reliable way to know when a MD devnode is gone?

Thanks,
  Alex.

NeilBrown | 29 Aug 23:25 2011

Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'

On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy <at> gmail.com>
wrote:

> Greetings everybody,
> 
> I issue
> mdadm --stop /dev/md0
> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> So I look for the udev 'remove' event for that devnode.
> However, in some cases even after I see the udev event, I issue
> mdadm --detail /dev/md0
> and I get:
> mdadm: md device /dev/md0 does not appear to be active
> 
> According to Detail.c, this means that mdadm can successfully do
> open("/dev/md0") and receive a valid fd.
> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> from the kernel.
> 
> Can somebody suggest an explanation for this behavior? Is there a
> reliable way to know when a MD devnode is gone?

run "udevadm settle" after stopping /dev/md0  is most likely to work.

I suspect that udev removes the node *after* you see the 'remove' event.
Sometimes so soon after that you don't see the lag - sometimes a bit later.

NeilBrown

> 
(Continue reading)

NeilBrown | 30 Aug 01:31 2011

Re: [PATCH] mdadm: close parent file descriptors when starting mdmon.

On Mon, 29 Aug 2011 17:49:49 +0200 Maciej Patelczyk
<maciej.patelczyk <at> intel.com> wrote:

> When mdadm is invoked by fork-and-exec it inherits all open file
> descriptors, and when mdadm forks to exec mdmon those file descriptors
> are passed on to mdmon. mdmon closes only the first 97 file descriptors,
> which in some cases is not enough.

Can you describe an actual case when it is not enough?  Maybe there is some
other problem where mdadm is not closing things as it should.

> This commit adds a function which looks at the '/proc/<pid>/fd' directory
> and closes all inherited file descriptors except the standard ones (0-2).

I'm not thrilled by this approach.
For a start, the loop could close the fd that is being used to
read /proc/<pid>/fd, so subsequent fds won't get seen or closed.

I would much rather do as the comment suggests and use O_CLOEXEC on all opens
so that everything gets closed when we do an exec.
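
A brief illustration of that suggestion (hypothetical helper, not a line
from mdadm):

  #include <fcntl.h>

  /* Any fd opened with O_CLOEXEC is closed automatically across execve(),
   * so no manual cleanup loop is needed before exec'ing mdmon. */
  static int open_cloexec(const char *path)
  {
      return open(path, O_RDONLY | O_CLOEXEC);
  }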

About the most I would be willing to do in closing more fds before the exec
is to keep closing until we get too many failures.
e.g.
  fd = 3;
  failed = 0;
  while (failed < 20) {
      if (close(fd) < 0)
          failed ++;
      else
(Continue reading)
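
A self-contained sketch of that idea, since the example above is truncated
here; the function name and the choice to reset the failure counter after a
successful close() are assumptions, not necessarily the intended code:

  #include <unistd.h>

  /* Close fds from 3 upwards until close() has failed "too many" times,
   * on the assumption that a long run of failures means we are past the
   * highest open descriptor. */
  static void close_extra_fds(void)
  {
      int fd = 3;
      int failed = 0;

      while (failed < 20) {
          if (close(fd) < 0)
              failed++;
          else
              failed = 0;                /* assumed: reset on success */
          fd++;
      }
  }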

Mathias Burén | 30 Aug 01:40 2011

Re: (solved) RAID1, changed disk, 2nd has errors ...

On 29 August 2011 15:34, Stefan G. Weichinger <lists <at> xunil.at> wrote:
> On 29.08.2011 10:25, Stefan G. Weichinger wrote:
>
>> I get
>>
>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always
>>       -       0
>> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age
>> Offline      -       0
>>
>>
>> Sounds good to me! Right?
>>
>> So now I could re-add /dev/sdb4 to retry syncing that array, correct?
>
> Did that.
>
> I failed/removed/re-added /dev/sdb4 and waited for some hours of resyncing.
>
> Now /dev/md2 is in sync again, still with no bad sectors in SMART
> (attached,  <at> Mathias ;-))
>
> Thanks to Robin and Mathias for your feedback; it helped me to get the
> picture and choose the next steps!
>
> For now I'll leave the arrays as they are and wait for the second new hdd.
> As soon as I have it here I will swap /dev/sdb as well.
>
> (a new server, maybe with RAID6, is coming there soon ...)
>
(Continue reading)

NeilBrown | 30 Aug 04:20 2011

Re: [PATCH 5/9] imsm: fix reserved sectors for spares

On Fri, 26 Aug 2011 12:51:18 -0700 "Williams, Dan J"
<dan.j.williams <at> intel.com> wrote:

> On Thu, Aug 25, 2011 at 7:14 PM, Dan Williams <dan.j.williams <at> intel.com> wrote:
> > Different OROMs reserve different amounts of space for the migration area.
> > When activating a spare, minimize the reserved space; otherwise a valid spare
> > can be prevented from joining an array with a migration area smaller than
> > IMSM_RESERVED_SECTORS.
> >
> > This may result in an array that cannot be reshaped, but that is less
> > surprising than not being able to rebuild a degraded array.
> > imsm_reserved_sectors() already reports the minimal value, which adds to
> > the confusion when trying to rebuild an array, because mdadm -E indicates
> > that the device has enough space.
> >
> > Cc: Anna Czarnowska <anna.czarnowska <at> intel.com>
> > Signed-off-by: Dan Williams <dan.j.williams <at> intel.com>
> > ---
> >  super-intel.c |   11 ++++++++++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/super-intel.c b/super-intel.c
> > index 0347183..193e0d0 100644
> > --- a/super-intel.c
> > +++ b/super-intel.c
> > @@ -833,7 +833,16 @@ static struct extent *get_extents(struct intel_super *super, struct dl *dl)
> >        struct extent *rv, *e;
> >        int i;
> >        int memberships = count_memberships(dl, super);
> > -       __u32 reservation = MPB_SECTOR_CNT + IMSM_RESERVED_SECTORS;
(Continue reading)
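
A hedged sketch of the kind of change the description implies; the real hunk
is truncated above, so the spare test, helper name and constant values below
are illustrative assumptions only:

  #include <stdint.h>

  /* Placeholder values; the real definitions live in super-intel.c. */
  #define MPB_SECTOR_CNT        418
  #define IMSM_RESERVED_SECTORS 4096

  /* A disk with no array memberships is a bare spare: reserve only the
   * MPB area for it, so it is not rejected by arrays whose OROM reserved
   * less migration space than IMSM_RESERVED_SECTORS. */
  static uint32_t disk_reservation(int memberships)
  {
      if (memberships == 0)
          return MPB_SECTOR_CNT;
      return MPB_SECTOR_CNT + IMSM_RESERVED_SECTORS;
  }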

NeilBrown | 30 Aug 04:26 2011

Re: [PATCH 7/9] imsm: support 'missing' devices at Create

On Thu, 25 Aug 2011 19:14:35 -0700 Dan Williams <dan.j.williams <at> intel.com>
wrote:

> Specifying missing devices at create is very useful for array recovery.
> 
> For imsm, create dummy disk entries at init_super_imsm time, and then use
> them to fill in unoccupied slots in the final array (if the container is
> unpopulated).
> 
> If the container is already populated (has a subarray), 'missing' disks
> must refer to missing devices already recorded in the metadata.

It would appear you have also implemented --assume-clean:

> -	vol->dirty = 0;
> +	vol->dirty = !info->state;

Thanks - that has been bothering me.

I'll update the commit-log to record this.

Thanks,
NeilBrown

(Continue reading)

NeilBrown | 30 Aug 04:58 2011

Re: [PATCH 9/9] mdadm: 'dump' support

On Thu, 25 Aug 2011 19:14:45 -0700 Dan Williams <dan.j.williams <at> intel.com>
wrote:

>   mdadm -E /dev/sda --dump=foo
> 
> Creates a sparse file image of /dev/sda named foo with a copy of the
> metadata instance found by -E.
> 
> When used in the opposite direction:
> 
>   mdadm -E foo --dump=/dev/sda
> 
> ...can restore metadata to a given block device, assuming it is
> identical in size to the dump file.

I like this functionality - I've thought about doing something like this
before but never quite done it.

I'm not convinced of the interface yet - especially the lack of checks.  If
it can be misused it will be misused and we should try to avoid that.

Using a sparse file is probably a good idea - it removes the need to try to
invent a format for storing non-consecutive metadata and recording the offset.
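
A small illustration of that point (hypothetical helper, not the patch's
code): size the image like the source device and pwrite() each metadata
block at its native offset, so unwritten ranges remain holes and offsets are
preserved without any extra container format.

  #include <fcntl.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Write one metadata block into a sparse image of the source device. */
  static int dump_block(const char *img, off_t dev_size,
                        const void *buf, size_t len, off_t offset)
  {
      int fd = open(img, O_WRONLY | O_CREAT, 0644);

      if (fd < 0)
          return -1;
      if (ftruncate(fd, dev_size) < 0 ||   /* same apparent size as the device */
          pwrite(fd, buf, len, offset) != (ssize_t)len) {
          close(fd);
          return -1;
      }
      return close(fd);
  }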

dmraid has a --dump-metadata (-D) option which creates a directory and stores
one file per device there.  I like that as it makes it easy to get everything
in the right place.

Maybe the arg to --dump could be a directory name to store dump files in ...
but then we lose the pleasing symmetry of using the same command to dump and
(Continue reading)

NeilBrown | 30 Aug 05:13 2011

Re: [PATCH 0/9] recovering an imsm raid5 array

On Thu, 25 Aug 2011 19:13:59 -0700 Dan Williams <dan.j.williams <at> intel.com>
wrote:

> I run imsm raid5 and raid1 arrays on my personal systems and was
> recently bitten by the bug that was fixed in commit 1a2487c2 "FIX: imsm:
> OROM does not recognize degraded arrays (V2)".  In the process of
> investigating that I pulled the wrong disk and ended up in a dual
> degraded situation.
> 
> These patches (ordered in roughly increasing order of required review)
> are the features needed to get the array up and running again, as well
> as some random fixes spotted along the way.
> 
> The most important patch for recovery is patch 7, "imsm: support
> 'missing' devices at Create", which allows the mdadm raid5 recovery path of
> recreating the raid array with what one thinks are the good disks /
> order, and then attempting to mount the filesystem to see if the guess was
> correct.
> 
> Example: create a degraded raid5 with slot 1 missing.
> 
> mdadm --create /dev/md0 /dev/sd[ac] -n 2 -e imsm
> mdadm --create /dev/md1 /dev/sda missing /dev/sdc -n 3 -l 5
> 
> However, during this process I wanted to make sure I could get back to
> the exact same metadata that was on the disks, to root-cause what went
> wrong, examine the metadata offline, or back up a step if I made a
> mistake in the recovery process.  Patch 9 implements --dump support; the
> fact that something like this has not been implemented already is maybe
> a clue that it is not such a great idea?  I can imagine someone messing
(Continue reading)

