Stan Hoeppner | 1 Jun 2012 03:31

Re: Can extremely high load cause disks to be kicked?

On 5/31/2012 3:31 AM, Andy Smith wrote:

> Now, is this sort of behaviour expected when under incredible load?
> Or is it indicative of a bug somewhere in kernel, mpt driver, or
> even flaky SAS controller/disks?

It is expected that people know what RAID is and how it is supposed to
be used.  RAID is to be used for protecting data in the event of a disk
failure and secondarily to increase performance.  That is not how you
seem to be using RAID.  BTW, I can't fully discern from your log
snippets...are you running md RAID inside of virtual machines or only on
the host hypervisor?  If the former, problems like this are expected and
normal, which is why it is recommended to NEVER run md RAID inside a VM.

> Controller: LSISAS1068E B3, FwRev=011a0000h
> Motherboard: Supermicro X7DCL-3
> Disks: 4x SEAGATE  ST9300603SS      Version: 0006
> 
> While I'm familiar with the occasional big DDoS causing extreme CPU
> load, hung tasks, CPU soft lockups etc., I've never had it kick
> disks before. 

The md RAID driver didn't kick disks.  It kicked partitions, as this is
what you built your many arrays with.
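
[Editor's sketch, not part of the original message.] The point that md tracks members per partition, not per physical disk, can be seen in /proc/mdstat, where a member that has been kicked is flagged with (F). The Python below is a minimal sketch that parses such text and lists the faulty members of each array. The sample text, the array names and the partition names are all made up for illustration; none of them come from Andy's logs.

```python
import re

# Hypothetical /proc/mdstat excerpt; array and partition names are invented.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid10]
md1 : active raid10 sdd2[3] sdc2[2](F) sdb2[1] sda2[0]
      1953260544 blocks super 1.2 512K chunks 2 near-copies [4/3] [UU_U]
md0 : active raid1 sdb1[1] sda1[0]
      524224 blocks [2/2] [UU]
unused devices: <none>
"""

# A member looks like "sdc2[2]" with an optional state flag such as "(F)".
MEMBER = re.compile(r"(\w+)\[\d+\](\([A-Z]\))?")


def failed_members(mdstat_text):
    """Map each md array to the member devices flagged (F) as faulty."""
    failed = {}
    for line in mdstat_text.splitlines():
        head = re.match(r"(md\d+) : (.*)", line)
        if not head:
            continue  # skip the Personalities, block-count and unused lines
        array, rest = head.groups()
        bad = [name for name, flag in MEMBER.findall(rest) if flag == "(F)"]
        if bad:
            failed[array] = bad
    return failed


print(failed_members(SAMPLE_MDSTAT))
```

In this made-up sample only the partition sdc2 is marked faulty in md1, while md0 is untouched; the members listed are partitions, not whole disks.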

> But I only have this one server with SAS and mdadm
> whereas all the others are SATA and 3ware with BBU.

Fancy that.

Igor M Podlesny | 1 Jun 2012 05:15

Re: Can extremely high load cause disks to be kicked?

On 1 June 2012 09:31, Stan Hoeppner <stan <at> hardwarefreak.com> wrote:
[…]
> You could probably expect it to be more reliable if you used RAID as
> it's meant to be used, which in this case would be a single RAID10 array
> using none, or only one partition per disk, instead of creating 4 or 5
> different md RAID arrays from 4-5 partitions on each disk.  This is
> simply silly, and it's dangerous if doing so inside VMs.

   — How do you know those RAIDs are inside VMs?

--
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Stan Hoeppner | 1 Jun 2012 16:12

Re: Can extremely high load cause disks to be kicked?

On 5/31/2012 10:15 PM, Igor M Podlesny wrote:
> On 1 June 2012 09:31, Stan Hoeppner <stan <at> hardwarefreak.com> wrote:
> […]
>> You could probably expect it to be more reliable if you used RAID as
>> it's meant to be used, which in this case would be a single RAID10 array
>> using none, or only one partition per disk, instead of creating 4 or 5
>> different md RAID arrays from 4-5 partitions on each disk.  This is
>> simply silly, and it's dangerous if doing so inside VMs.
> 
>    — How do you know those RAIDs are inside VMs?

Those who speak English as a first language likely understood my use of
"if".  Had I used "when" instead, that would have implied certainty of
knowledge.  "If" conveys a possibility, a hypothetical.

For the English challenged, maybe reversing the sentence is more
comprehensible:

"If doing so inside VMs it is dangerous."

--

-- 
Stan

Igor M Podlesny | 1 Jun 2012 17:19

Re: Can extremely high load cause disks to be kicked?

On 1 June 2012 22:12, Stan Hoeppner <stan <at> hardwarefreak.com> wrote:
> On 5/31/2012 10:15 PM, Igor M Podlesny wrote:
>> On 1 June 2012 09:31, Stan Hoeppner <stan <at> hardwarefreak.com> wrote:
>> […]
>>> You could probably expect it to be more reliable if you used RAID as
>>> it's meant to be used, which in this case would be a single RAID10 array
>>> using none, or only one partition per disk, instead of creating 4 or 5
>>> different md RAID arrays from 4-5 partitions on each disk.  This is
>>> simply silly, and it's dangerous if doing so inside VMs.
>>
>>    — How do you know those RAIDs are inside VMs?
>
> Those who speak English as a first language likely understood my use of

   So you don't. Well, lemme remind some words of wisdom to you:
"Assumption is the mother of all f*ckups". (Feel free to reverse it as
you like), Stan.


Andy Smith | 1 Jun 2012 21:25

Re: Can extremely high load cause disks to be kicked?

Hi Stan,

On Thu, May 31, 2012 at 08:31:49PM -0500, Stan Hoeppner wrote:
> On 5/31/2012 3:31 AM, Andy Smith wrote:
> > Now, is this sort of behaviour expected when under incredible load?
> > Or is it indicative of a bug somewhere in kernel, mpt driver, or
> > even flaky SAS controller/disks?
> 
> It is expected that people know what RAID is and how it is supposed to
> be used.  RAID is to be used for protecting data in the event of a disk
> failure and secondarily to increase performance.  That is not how you
> seem to be using RAID.

Just to clarify, this was the hypervisor host. The VMs on it don't
use RAID themselves as that would indeed be silly.

> There are a number of scenarios where md RAID is better than hardware
> RAID and vice versa.  Yours is a case where hardware RAID is superior,
> as no matter the host CPU load, drives won't get kicked offline as a
> result, as they're under the control of a dedicated IO processor (same
> for SAN RAID).

Fair enough, thanks.

Cheers,
Andy

freeone3000 | 2 Jun 2012 01:22

Re: Data Offset

Hello. I have an issue concerning a broken RAID of unsure pedigree.
Examining the drives tells me the block sizes are not the same, as
listed in the email.

> It certainly won't be easy.  Though if someone did find themselves in that
> situation it might motivate me to enhance mdadm in some way to make it easily
> fixable.

I seem to be your motivation for making this situation fixable.
Somehow I managed to get drives with an invalid block size. All worked
fine until a drive dropped out of the RAID5. When attempting to
replace, I can re-create the RAID, but it cannot be of the same size
because the 1024-sector drives are "too small" when changed to
2048-sector, exactly as described. Are there any recovery options I
could try, including simply editing the header?

mdadm --examine of all drives in the RAID:

/dev/sdb3:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 9759ad94:75e30b6b:8a726b4d:177a6eda
           Name : leyline:1  (local to host leyline)
  Creation Time : Mon Sep 12 13:19:00 2011
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906525098 (1862.78 GiB 2000.14 GB)
     Array Size : 15626096640 (7451.10 GiB 8000.56 GB)
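
[Editor's sketch, not part of the original message.] The "too small" complaint can be checked with arithmetic on the two sizes shown above. The Python below is a rough sketch under one loud assumption: that the listed Avail Dev Size of /dev/sdb3 was measured with a 1024-sector data offset. The excerpt does not show the Data Offset line, so that is a guess. The parenthesised GB figures are consistent with both sizes being counted in 512-byte sectors.

```python
SECTOR_BYTES = 512  # consistent with the GB figures printed by --examine

# Figures copied from the --examine output above.
avail_dev_size = 3906525098  # Avail Dev Size of /dev/sdb3, in sectors
array_size = 15626096640     # Array Size, in sectors
raid_devices = 5

# RAID5 holds data on (n - 1) members, so each member must supply this much.
per_member_needed = array_size // (raid_devices - 1)

# ASSUMPTION: the Avail Dev Size above corresponds to a 1024-sector data offset.
old_offset, new_offset = 1024, 2048
avail_after_move = avail_dev_size - (new_offset - old_offset)

# Spare room with the old offset, and the deficit with the new one.
spare_before = avail_dev_size - per_member_needed
shortfall = per_member_needed - avail_after_move

print(f"needed per member : {per_member_needed} sectors")
print(f"spare at 1024     : {spare_before} sectors")
print(f"available at 2048 : {avail_after_move} sectors")
print(f"shortfall at 2048 : {shortfall} sectors ({shortfall * SECTOR_BYTES} bytes)")
```

Under that assumption each member has 938 sectors to spare at the old offset, so moving the data start by another 1024 sectors leaves it 86 sectors short, which would match the "too small" behaviour described.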

NeilBrown | 2 Jun 2012 01:52

Re: Data Offset

On Fri, 1 Jun 2012 18:22:33 -0500 freeone3000 <freeone3000 <at> gmail.com> wrote:

> Hello. I have an issue concerning a broken RAID of unsure pedigree.
> Examining the drives tells me the block sizes are not the same, as
> listed in the email.
> 
> > It certainly won't be easy.  Though if someone did find themselves in that
> > situation it might motivate me to enhance mdadm in some way to make it easily
> > fixable.
> 
> I seem to be your motivation for making this situation fixable.
> Somehow I managed to get drives with an invalid block size. All worked
> fine until a drive dropped out of the RAID5. When attempting to
> replace, I can re-create the RAID, but it cannot be of the same size
> because the 1024-sector drives are "too small" when changed to
> 2048-sector, exactly as described. Are there any recovery options I
> could try, including simply editing the header?

You seem to be leaving out some important information.
The "mdadm --examine" of all the drives is good - thanks - but what exactly
is your problem, and what were you trying to do?

You appear to have a 5-device RAID5 of which one device (sde3) fell out of
the array on or shortly after 23rd May, 3 drives are working fine, and one -
sdf (not sdf3??) - is a confused spare....

What exactly did you do to sdf?

NeilBrown

freeone3000 | 2 Jun 2012 02:48

Re: Data Offset

Sorry.

/dev/sde fell out of the array, so I replaced the physical drive with
what is now /dev/sdf. udev may have relabelled the drive - smartctl
states that the drive that is now /dev/sde works fine.
/dev/sdf is a new drive. /dev/sdf has a single, whole-disk partition
with type marked as raid. It is physically larger than the others.

/dev/sdf1 doesn't have a mdadm superblock. /dev/sdf seems to, so I
gave output of that device instead of /dev/sdf1, despite the
partition. Whole-drive RAID is fine, if it gets it working.

What I'm attempting to do is rebuild the RAID from the data from the
other four drives, and bring the RAID back up without losing any of
the data. /dev/sdb3, /dev/sdc3, /dev/sdd3, and what is now /dev/sde3
should be used to rebuild the array, with /dev/sdf as a new drive. If
I can get the array back up with all my data and all five drives in
use, I'll be very happy.

On Fri, Jun 1, 2012 at 6:52 PM, NeilBrown <neilb <at> suse.de> wrote:
> On Fri, 1 Jun 2012 18:22:33 -0500 freeone3000 <freeone3000 <at> gmail.com> wrote:
>
>> Hello. I have an issue concerning a broken RAID of unsure pedigree.
>> Examining the drives tells me the block sizes are not the same, as
>> listed in the email.
>>
>> > It certainly won't be easy.  Though if someone did find themselves in that
>> > situation it might motivate me to enhance mdadm in some way to make it easily
>> > fixable.
>>

Stan Hoeppner | 2 Jun 2012 06:45

Re: Can extremely high load cause disks to be kicked?

On 6/1/2012 10:19 AM, Igor M Podlesny wrote:
> On 1 June 2012 22:12, Stan Hoeppner <stan <at> hardwarefreak.com> wrote:
>> On 5/31/2012 10:15 PM, Igor M Podlesny wrote:
>>> On 1 June 2012 09:31, Stan Hoeppner <stan <at> hardwarefreak.com> wrote:
>>> […]
>>>> You could probably expect it to be more reliable if you used RAID as
>>>> it's meant to be used, which in this case would be a single RAID10 array
>>>> using none, or only one partition per disk, instead of creating 4 or 5
>>>> different md RAID arrays from 4-5 partitions on each disk.  This is
>>>> simply silly, and it's dangerous if doing so inside VMs.
>>>
>>>    — How do you know those RAIDs are inside VMs?
>>
>> Those who speak English as a first language likely understood my use of
> 
>    So you don't. Well, lemme remind some words of wisdom to you:
> "Assumption is the mother of all f*ckups". (Feel free to reverse it as
> you like), Stan.

Igor, you simply misunderstood what I stated.  I explained what likely
caused you to misunderstand.  I wasn't trying to insult you, or anyone
else who is not a native English speaker.

There's no reason for animosity over a simple misinterpretation of
language syntax.  Be proud that you speak and write/read English very
well.  I, on the other hand, cannot read/write/speak any Cyrillic
language, though I can recognize the characters of the alphabet as
Cyrillic.  In fact, I don't know any languages other than English, but
for some basic phrases in Spanish.  If I had to call for a doctor in
anything other than English I'd be in big trouble.

Stan Hoeppner | 2 Jun 2012 07:47

Re: Can extremely high load cause disks to be kicked?

On 6/1/2012 2:25 PM, Andy Smith wrote:
> Hi Stan,
> 
> On Thu, May 31, 2012 at 08:31:49PM -0500, Stan Hoeppner wrote:
>> On 5/31/2012 3:31 AM, Andy Smith wrote:
>>> Now, is this sort of behaviour expected when under incredible load?
>>> Or is it indicative of a bug somewhere in kernel, mpt driver, or
>>> even flaky SAS controller/disks?
>>
>> It is expected that people know what RAID is and how it is supposed to
>> be used.  RAID is to be used for protecting data in the event of a disk
>> failure and secondarily to increase performance.  That is not how you
>> seem to be using RAID.
> 
> Just to clarify, this was the hypervisor host. The VMs on it don't
> use RAID themselves as that would indeed be silly.

Cool.  I only mentioned this as I've seen it in the wild more than once.

>> There are a number of scenarios where md RAID is better than hardware
>> RAID and vice versa.  Yours is a case where hardware RAID is superior,
>> as no matter the host CPU load, drives won't get kicked offline as a
>> result, as they're under the control of a dedicated IO processor (same
>> for SAN RAID).
> 
> Fair enough, thanks.

You could still use md RAID in your scenario.  But instead of having
multiple md arrays built of disk partitions and passing each array up to
a VM guest, the proper way to do this thin provisioning is to create one