Dan Williams | 1 Nov 2007 04:01

Re: Bug in processing dependencies by async_tx_submit() ?

On Wed, 2007-10-31 at 09:21 -0700, Yuri Tikhonov wrote:
> 
>  Hello Dan,
> 
>  I've run into a problem with the h/w accelerated RAID-5 driver (on the
> ppc440spe-based board). After some investigations I've come to the conclusion
> that the issue is with the async_tx_submit() implementation in ASYNC_TX.
> 
Unfortunately this is correct: async_tx_submit() will let the third
operation pass the second in the scenario you describe.  I propose the
(untested) fix below and will test it tomorrow when I am back in the
office.

---
async_tx: fix successive dependent operation submission

From: Dan Williams <dan.j.williams <at> intel.com>

async_tx_submit() tried to use the hardware descriptor chain to maintain
transaction ordering.  However, before falling back to hardware-channel
dependency ordering, async_tx_submit() must first check whether the entire
chain is waiting on another channel.

OP1 (DMA0) <--- OP2 (DMA1) <--- OP3 (DMA1)

OP3 must be submitted as an OP2 dependency if it is submitted before OP1
completes.  Otherwise, if OP1 is complete, OP3 can rely on the natural
ordering of DMA1's hardware chain to guarantee that it runs after OP2.

The fix is to check if the ->parent field of the dependency is non-NULL.
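
The ordering rule described above can be illustrated with a small toy model.
This is not the kernel code or the proposed patch, only a sketch under stated
assumptions: the Tx class, submit(), complete() and the hw_queue dictionary
are hypothetical names, and a per-channel Python list stands in for a DMA
engine's hardware descriptor chain. The only idea taken from the message is
the check on ->parent: if the dependency is itself still waiting on another
operation, the new operation is attached to it instead of being handed
straight to the hardware channel.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Tx:
    """Toy stand-in for an async_tx descriptor (hypothetical, not the kernel struct)."""
    name: str
    chan: str
    parent: Optional["Tx"] = None   # non-None while waiting on another operation
    complete: bool = False
    dependents: List["Tx"] = field(default_factory=list)


def _issue(tx: Tx, hw_queue: Dict[str, List[str]]) -> None:
    """Hand tx to its channel, then queue its same-channel dependents behind it."""
    tx.parent = None
    hw_queue.setdefault(tx.chan, []).append(tx.name)
    same_chan = [d for d in tx.dependents if d.chan == tx.chan]
    # Dependents on other channels keep waiting until tx actually completes.
    tx.dependents = [d for d in tx.dependents if d.chan != tx.chan]
    for dep in same_chan:
        _issue(dep, hw_queue)


def submit(tx: Tx, depend_tx: Optional[Tx], hw_queue: Dict[str, List[str]]) -> str:
    """Return 'dependency' if tx was deferred, 'hardware' if it was queued."""
    if depend_tx is not None and not depend_tx.complete:
        # The check from the message: is the dependency itself still waiting?
        dep_is_waiting = depend_tx.parent is not None
        if dep_is_waiting or depend_tx.chan != tx.chan:
            tx.parent = depend_tx
            depend_tx.dependents.append(tx)
            return "dependency"
    # Same channel and the dependency is already queued: the channel's own
    # ordering is enough in this model.
    _issue(tx, hw_queue)
    return "hardware"


def complete(tx: Tx, hw_queue: Dict[str, List[str]]) -> None:
    """Mark tx finished and release everything that was waiting on it."""
    tx.complete = True
    waiting, tx.dependents = tx.dependents, []
    for dep in waiting:
        _issue(dep, hw_queue)
```

With OP1 on DMA0 and OP2, OP3 on DMA1, submitting all three before OP1
completes leaves DMA1's queue empty in this model; once complete() is called
on OP1 the queue holds OP2 followed by OP3. If OP1 completes before OP3 is
submitted, OP3 goes straight to DMA1 behind OP2.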

Alberto Alonso | 1 Nov 2007 06:08

Re: Implementing low level timeouts within MD

On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
> 
> Really, you've only been bitten by three so far.  Serverworks PATA
> (which I tend to agree with the other person, I would probably chalk

Three types of bugs is too many; they affected basically all of my customers
with multi-terabyte arrays. Heck, we could also oversimplify things and
say that it is really just one type, and define everything as kernel-type
problems (or, as some other kernel used to say, a general protection
error).

I am sorry for not having hundreds of RAID servers from which to draw
statistical analysis. As I have clearly stated in the past I am trying
to come up with a list of known combinations that work. I think my
data points are worth something to some people, especially those
considering SATA drives and software RAID for their file servers. If
you don't consider them important for you that's fine, but please don't
belittle them just because they don't match your needs.

> this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
> is arranged similar to the SCSI stack with a core library that all the
> drivers use, and then hardware dependent driver modules...I suspect that
> since you got bit on three different hardware versions that you were in
> fact hitting a core library bug, but that's just a suspicion and I could
> well be wrong).  What you haven't tried is any of the SCSI/SAS/FC stuff,
> and generally that's what I've always used and had good things to say
> about.  I've only used SATA for my home systems or workstations, not any
> production servers.

The USB array was never meant to be a full production system, just to 

Janek Kozicki | 1 Nov 2007 10:10

stride / stripe alignment on LVM ?

Hello,

I have raid5 /dev/md1, --chunk=128 --metadata=1.1. On it I have
created LVM volume called 'raid5', and finally a logical volume
'backup'.

Then I formatted it with command:

   mkfs.ext3 -b 4096 -E stride=32 -E resize=550292480 /dev/raid5/backup

Because LVM puts its own metadata on /dev/md1, the ext3 filesystem is
shifted by some number of bytes (unknown to me) from the beginning of
/dev/md1.

I was wondering, how big is the shift, and would it hurt the
performance/safety if the `ext3 stride=32` didn't align perfectly
with the physical stripes on HDD?

PS: the resize option is to make sure that I can grow this fs
in the future.

PPS: I looked in the archive but didn't find this question asked
before. I'm sorry if it really was asked.
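
The stride value in the command above follows from the array geometry:
stride is the RAID chunk size divided by the filesystem block size, here
128 KiB / 4 KiB = 32. Whether the filesystem lines up with the chunks then
depends on where LVM places the start of its data area on /dev/md1. A
minimal sketch of that arithmetic, assuming the offset is known; the function
names are hypothetical, and the 192 KiB offset used in the note below is an
illustrative value, not a measurement of this array.

```python
def ext3_stride(chunk_kib: int, block_bytes: int) -> int:
    """Number of filesystem blocks per RAID chunk (the mkfs stride value)."""
    chunk_bytes = chunk_kib * 1024
    if chunk_bytes % block_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    return chunk_bytes // block_bytes


def misalignment_bytes(data_offset_bytes: int, chunk_kib: int) -> int:
    """How far into a chunk the data area starts; 0 means chunk-aligned."""
    return data_offset_bytes % (chunk_kib * 1024)
```

ext3_stride(128, 4096) gives 32, matching the stride=32 passed to mkfs.ext3.
For a hypothetical data offset of 192 KiB, misalignment_bytes(192 * 1024, 128)
gives 65536, i.e. the filesystem would start half a chunk into a chunk; any
offset that is a multiple of 128 KiB gives 0. On LVM2 versions that expose
the field, `pvs -o +pe_start` should report the actual offset to plug in.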

-- 
Janek Kozicki
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

BERTRAND Joël | 1 Nov 2007 10:25

Re: Strange CPU occupation... and system hangs

BERTRAND Joël wrote:

<snip>

> and some processes are in D state:
> Root gershwin:[/etc] > ps auwx | grep D
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> root       270  0.0  0.0      0     0 ?        D    Oct27   1:17 [pdflush]
> root      3676  0.9  0.0      0     0 ?        D    Oct27  56:03 [nfsd]
> root      5435  0.0  0.0      0     0 ?        D<   Oct27   3:16 [md7_raid1]
> root      5438  0.0  0.0      0     0 ?        D<   Oct27   1:01 [kjournald]
> root      5440  0.0  0.0      0     0 ?        D<   Oct27   0:33 [loop0]
> root      5441  0.0  0.0      0     0 ?        D<   Oct27   0:05 [kjournald]
> root     16442  0.0  0.0  20032  1208 pts/2    D+   13:23   0:00 iftop -i eth2
> 
> 	Why is md7_raid1 in D state? Same question for iftop.

	Some bad news: after ten or eleven hours, the kernel crashed on this 
server. The last top screen was:

top - 04:59:46 up 4 days, 16:24,  3 users,  load average: 19.72, 19.22, 19.05
Tasks: 285 total,   5 running, 279 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us,  4.2%sy,  0.0%ni, 68.5%id, 27.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4139024k total,  4130800k used,     8224k free,    38984k buffers
Swap:  7815536k total,      304k used,  7815232k free,    79056k cached
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 


Yuri Tikhonov | 1 Nov 2007 11:00

Re: Bug in processing dependencies by async_tx_submit() ?


 Hi Dan,

  Honestly, I tried to fix this quickly using an approach similar to the one
 you propose, with one addition (in fact, a deletion: removing BUG_ON(chan ==
 tx->chan) from async_tx_run_dependencies()). This led to a "Kernel stack
 overflow", caused by async_tx_submit() and async_trigger_callback() calling
 each other recursively.

  So I then made async_tx_submit() schedule the interrupt only in the
 cases where it is really needed, i.e. when dependent operations are to be run
 on different channels.

  The resulting kernel locked up while processing the mkfs command on
 top of the RAID array. It spins in the dma_sync_wait() function.

  This happens because of how dma_wait_for_async_tx() is implemented.
  The "iter" we end up waiting on there corresponds to the last allocated
 but not-yet-submitted descriptor. If that "iter" depends on another
 descriptor which has cookie > 0 but has not yet been submitted to the h/w
 channel because the threshold has not been reached, then we may wait in
 dma_wait_for_async_tx() forever. I think it makes more sense to take the
 first descriptor which was submitted to the channel but possibly not yet put
 into the h/w chain, i.e. one with cookie > 0, and do dma_sync_wait() on that
 descriptor.

  When I modified dma_wait_for_async_tx() in this way, the kernel lock-up
 disappeared. Nevertheless, the mkfs process hangs after

Bill Davidsen | 1 Nov 2007 14:56

Re: Raid-10 mount at startup always has problem

Daniel L. Miller wrote:
> Doug Ledford wrote:
>> Nah.  Even if we had concluded that udev was to blame here, I'm not
>> entirely certain that we hadn't left Daniel with the impression that we
>> suspected it versus blamed it, so reiterating it doesn't hurt.  And I'm
>> sure no one has given him a fix for the problem (although Neil did
>> request a change that will give debug output, but not solve the
>> problem), so not dropping it entirely would seem appropriate as well.
>>   
> I've opened a bug report on Ubuntu's Launchpad.net.  Scott James 
> Remnant asked me to cc him on Neil's incremental reference - we'll see 
> what happens from here.
>
> Thanks for the help guys.  At the moment, I've changed my mdadm.conf 
> to explicitly list the drives, instead of the auto=partition 
> parameter.  We'll see what happens on the next reboot.
>
> I don't know if it means anything, but I'm using a self-compiled 
> 2.6.22 kernel - with initrd.  At least I THINK I'm using initrd - I 
> have an image, but I don't see an initrd line in my grub config.  
> Hmm....I'm going to add a stanza that includes the initrd and see what 
> happens also.
>
What did that do?

-- 
bill davidsen <davidsen <at> tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


Bill Davidsen | 1 Nov 2007 15:14

Re: Implementing low level timeouts within MD

Alberto Alonso wrote:
> On Tue, 2007-10-30 at 13:39 -0400, Doug Ledford wrote:
>   
>> Really, you've only been bitten by three so far.  Serverworks PATA
>> (which I tend to agree with the other person, I would probably chalk
>>     
>
> Three types of bugs is too many; they affected basically all of my customers
> with multi-terabyte arrays. Heck, we could also oversimplify things and
> say that it is really just one type, and define everything as kernel-type
> problems (or, as some other kernel used to say, a general protection
> error).
>
> I am sorry for not having hundreds of RAID servers from which to draw
> statistical analysis. As I have clearly stated in the past I am trying
> to come up with a list of known combinations that work. I think my
> data points are worth something to some people, especially those
> considering SATA drives and software RAID for their file servers. If
> you don't consider them important for you that's fine, but please don't
> belittle them just because they don't match your needs.
>
>   
>> this up to Serverworks, not PATA), USB storage, and SATA (the SATA stack
>> is arranged similar to the SCSI stack with a core library that all the
>> drivers use, and then hardware dependent driver modules...I suspect that
>> since you got bit on three different hardware versions that you were in
>> fact hitting a core library bug, but that's just a suspicion and I could
>> well be wrong).  What you haven't tried is any of the SCSI/SAS/FC stuff,
>> and generally that's what I've always used and had good things to say
>> about.  I've only used SATA for my home systems or workstations, not any

Bill Davidsen | 1 Nov 2007 15:19

Re: Implementing low level timeouts within MD

Alberto Alonso wrote:
> On Mon, 2007-10-29 at 13:22 -0400, Doug Ledford wrote:
>
>   
>> What kernels were these under?
>>     
>
>
> Yes, these 3 were all SATA. The kernels (in the same order as above) 
> are:
>
> * 2.4.21-4.ELsmp #1 (Basically RHEL v3)
> * 2.6.18-4-686 #1 SMP on a Fedora Core release 2
> * 2.6.17.13 (compiled from vanilla sources)
>   

*Old* kernels. If you are going to build your own kernel, get a new one!
> The RocketRAID was configured for all drives as legacy/normal and
> software RAID5 across all drives. I wasn't using hardware raid on
> the last described system when it crashed.
>   
-- 

bill davidsen <davidsen <at> tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


Bill Davidsen | 1 Nov 2007 15:51

Re: Requesting migrate device options for raid5/6

Goswin von Brederlow wrote:
> Hi,
>
> I would welcome it if someone could work on a new feature for raid5/6
> that would allow replacing a disk in a raid5/6 with a new one without
> having to degrade the array.
>
> Consider the following situation:
>
> raid5 md0 : sda sdb sdc
>
> Now sda gives a "SMART - failure imminent" warning and you want to
> replace it with sdd.
>
> % mdadm --fail /dev/md0 /dev/sda
> % mdadm --remove /dev/md0 /dev/sda
> % mdadm --add /dev/md0 /dev/sdd
>
> Further consider that drive sdb will give an I/O error during resync
> of the array or fail completely. The array is in degraded mode so you
> experience data loss.
>
>   
That's a two drive failure, so you will lose data.
> But that is completely avoidable and some hardware raids support disk
> migration too. Loosely speaking the kernel should do the following:
>
>   
No, it's not "completely avoidable", because you have described sda as ready 
to fail and sdb as "will give an I/O error", so if both happen at once 

H. Peter Anvin | 1 Nov 2007 18:31

Re: switching root fs '/' to boot from RAID1 with grub

Doug Ledford wrote:
> 
> device /dev/sda (hd0)
> root (hd0,0)
> install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) /boot/grub/e2fs_stage1_5 p /boot/grub/stage2 /boot/grub/menu.lst
> device /dev/hdc (hd0)
> root (hd0,0)
> install --stage2=/boot/grub/stage2 /boot/grub/stage1 (hd0) /boot/grub/e2fs_stage1_5 p /boot/grub/stage2 /boot/grub/menu.lst
> 
> That will install grub on the master boot record of hdc and sda, and in
> both cases grub will look to whatever drive it is running on for the
> files to boot instead of going to a specific drive.
> 

No, it won't... it'll look for the first drive in the system (BIOS drive 
80h).  This means that if the BIOS can see the bad drive, but it doesn't 
work, you're still screwed.

	-hpa

