Philip Cameron | 1 Jan 2003 16:54

Raid1 fast resync

Do any of you know of work that is going on to improve the raid 1 resync 
speed? I am starting to evaluate approaches to reducing the time to 
resync the disks.

I am involved in a project on a fault-tolerant x86-based server that has
hot-pluggable SCSI disks and PCI busses. There are two PCI busses and three
mirrored sets of two disks. When a PCI bus is pulled, all of the mirrors
are broken, so when the PCI bus is reinserted, all of the disks
resync, which takes hours. Also, during development, the resync after a
crash really slows down debugging.

Thanks,
Phil Cameron

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo <at> vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Scott Mcdermott | 1 Jan 2003 17:09

Re: Raid1 fast resync

Philip Cameron on Wed  1/01 10:54 -0500:
> improve the raid 1 resync speed?

perhaps /proc/sys/dev/raid/ settings will help?
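
As a rough sketch of what checking those settings might look like: as far as I know, the md driver exposes speed_limit_min and speed_limit_max under that directory, with values in KB/s per device. Those two file names and the helper names below are not given in the message above, so treat them as assumptions.

```python
from pathlib import Path

# Directory named in the message above.
RAID_SYSCTL_DIR = Path("/proc/sys/dev/raid")

# Assumed file names; not stated in the thread.
LIMIT_FILES = ("speed_limit_min", "speed_limit_max")


def read_resync_limits(base=RAID_SYSCTL_DIR):
    """Return {file name: integer value} for each limit file found under base."""
    limits = {}
    for name in LIMIT_FILES:
        path = Path(base) / name
        if path.exists():
            limits[name] = int(path.read_text().strip())
    return limits


def set_resync_limit(name, value, base=RAID_SYSCTL_DIR):
    """Write a new limit value; on a real system this needs root."""
    if name not in LIMIT_FILES:
        raise ValueError(f"unknown limit file: {name}")
    (Path(base) / name).write_text(f"{int(value)}\n")


if __name__ == "__main__":
    # Prints an empty dict on machines without the md driver loaded.
    print(read_resync_limits())
```

Raising the minimum is probably the setting to try first when a resync is being throttled by other I/O, but it only changes how fast the full copy runs; it does not reduce how much data has to be copied.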

Neil Brown | 2 Jan 2003 01:07

Re: RAID 1+0 makes BUG in raid1.c, but 0+1 works?

On Wednesday December 25, smcdermott <at> questra.com wrote:
> 
> is there something I'm doing wrong or is this a bug? Should I not be
> using RAID1+0 ? I just tried it with RAID0+1 instead and it seems to
> work fine (although it's somewhat slower than I expected, and initial
> sync goes 250K/s for some reason until I turn up the minimum).  This
> makes no sense to me as I thought that RAID devices were a block-level
> abstraction...so why would 0+1 work but not 1+0 ?? I really dislike the
> additional probability of second-disk failure in RAID0+1 over RAID1+0,
> and the ridiculous resync times, and I don't like the slow write speed
> of RAID5.

It is a bug.  Arguably, the bug is that the test and BUG() call are
wrong, but the patch below is probably the preferred fix in a stable
kernel.

I always recommend raid1 or raid5 at the bottom, and raid0/linear/lvm
on top of that.

NeilBrown

 ----------- Diffstat output ------------
 ./drivers/md/md.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2002-12-17 15:20:57.000000000 +1100
+++ ./drivers/md/md.c	2003-01-02 11:05:20.000000000 +1100
@@ -489,7 +489,7 @@ static int sync_page_io(kdev_t dev, unsi
 	init_buffer(&bh, bh_complete, &event);

Alvin Oga | 2 Jan 2003 10:44

Re: Raid1 fast resync


hi ya philip

On Wed, 1 Jan 2003, Philip Cameron wrote:

> Do any of you know of work that is going on to improve the raid 1 resync 
> speed? I am starting to evaluate approaches to reducing the time to 
> resync the disks.
> 
> I am involved in a project on a fault tolerant x86 based server that has 
> hot pluggable scsi disks and PCI busses. There are two PCI busses and 3 
> mirrored sets of two disks. When a PCI bus is pulled all of the mirrors 
> are broken so when the PCI bus is again inserted, all of the disks 
> resync which takes hours. Also, during development the resync after 
> crash really slows down debug.

without knowing more details ... here are some comments

- when a pci card is pulled out, you stand a good chance that the cpu bios
  will need to be reset ( re-saved )
	- the machine goes into bios mode, whether you like it or not

- since you are using scsi disks..
	- when you pull out a scsi disk, the next time you reboot,
	the scsi drives will be in different order if you lost sda
	or lost sdb and you had sdc  which is now your new sdb

- if you have 2 pci buses...
	- put sda  on pci #1
	and put sdb ( mirrored ) onto pci #2

Stephan van Hienen | 2 Jan 2003 16:12

Re: mdadm -D shows incorrect working devices ?

Is there no one who can tell me what is going wrong?

On Sun, 22 Dec 2002, Stephan van Hienen wrote:

> On Wed, 18 Dec 2002 raid <at> ddx.a2000.nu wrote:
>
> > Today rebuilt raid5 (8/8 disks)
> > mdadm -D is still showing incorrect info
> > (9 devices total/8 active/7 working/2 failed)
> > can someone explain this ?
>
> Hi again, still have this 'incorrect' information here
> I'm a bit worried that maybe my raid config is 'a bit corrupted'
> or is this just a bug in mdadm ?

Shaw, Marco | 2 Jan 2003 16:23

RE: mdadm -D shows incorrect working devices ?

Pretty much everyone is on vacation, and likely not checking their messages lately.  Now, if it's something
nobody has encountered before then that's a different thing.

Marco

-----Original Message-----
From: Stephan van Hienen [mailto:raid <at> a2000.nu] 
Sent: Thursday, January 02, 2003 11:12 AM
To: linux-raid <at> vger.kernel.org
Subject: Re: mdadm -D shows incorrect working devices ?

There is no one that can tell me what is going wrong ?

On Sun, 22 Dec 2002, Stephan van Hienen wrote:

> On Wed, 18 Dec 2002 raid <at> ddx.a2000.nu wrote:
>
> > Today rebuilt raid5 (8/8 disks)
> > mdadm -D is still showing incorrect info
> > (9 devices total/8 active/7 working/2 failed)
> > can someone explain this ?
>
> Hi again, still have this 'incorrect' information here
> I'm a bit worried that maybe my raid config is 'a bit corrupted' or is 
> this just a bug in mdadm ?

David Caldwell | 2 Jan 2003 23:50

RAID 5 recover locks up with 2.4.20

Hi,

  I have a 4 disk RAID 5 array. I upgraded to the 2.4.20 kernel and one of 
my disks went kaput (loose power connector). I fixed the disk and tried to 
rebuild the raid but it locked up hard (no magic SysRq key, no oops, 
nothing!). So I rebooted and it locked up every time it started the recovery 
process. So then I tried kernel 2.4.19-rc3 (my old faithful kernel) and it 
rebuilt just fine.

More details:
  The 4 disks are connected to 2 Promise chips. One is on a PCI card and is 
just an ATA-100 host adapter. The other is on the motherboard and is a RAID 
adapter that I run in straight ATA host adapter mode. All the disks are 
masters. Here are the lines from lspci:

00:0f.0 RAID bus controller: Promise Technology, Inc. PDC20276 IDE (rev 01)
00:09.0 Unknown mass storage controller: Promise Technology, Inc. 20267 (rev 02)

  Before it locked up the second time it did an ext3 journal recovery on 
the disk and that worked ok. I mention this because it shows that the 
read/write path to the drives is at least partially working.

  Since it locks up completely, I'm not sure how to debug any further. Does 
anyone know what the problem might be?

Thanks,
  David

ps. I am not subscribed to the list so if you could cc me that would be 

Philip Cameron | 3 Jan 2003 02:22

Re: Fast RAID 1 Resync

Hi Neil,

(Sorry if this is a repost. I had an error with returned mail)

Thanks for your comments.

I also don't see a need to synchronize disks at mkraid time. It's nice to have
identical disks, but not necessary as long as the result of reading a sector that
has never been written is undefined. An option to select either approach could be
added if there is a real need, though adding an option increases complexity,
especially during testing.

I have been thinking of tracking writes to each chunk with a counter. The
counters would be organized into a vector indexed by chunk number. Before a
write starts, the counter is incremented by the number of mirrors, including any
currently unavailable mirrors. It is decremented as each mirror's write
completes, so when all writes in a chunk are complete the counter returns to
zero. If there is a missing mirror, the counter will not return to zero (since
one of the needed writes was not done).

When a disk is pulled, the counters increment but don't return to zero. When the
disk is reinserted, the resync needs to copy chunks where the counter doesn't go
to zero. When the resync of the chunk is complete, set the counter to zero.
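
The counter scheme described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: the class and method names are hypothetical, the counters live in memory only, and nothing here covers crash recovery.

```python
class ChunkWriteTracker:
    """In-memory model of the per-chunk write counters (hypothetical names)."""

    def __init__(self, num_chunks, num_mirrors):
        self.num_mirrors = num_mirrors
        self.mirror_available = [True] * num_mirrors
        # One counter per chunk, indexed by chunk number.
        self.counters = [0] * num_chunks

    def pull_mirror(self, mirror):
        self.mirror_available[mirror] = False

    def reinsert_mirror(self, mirror):
        self.mirror_available[mirror] = True

    def start_write(self, chunk):
        """Charge the chunk for every mirror, including unavailable ones.

        Returns the mirrors that will actually receive the write.
        """
        self.counters[chunk] += self.num_mirrors
        return [m for m in range(self.num_mirrors) if self.mirror_available[m]]

    def complete_write(self, chunk):
        """Called once for each mirror whose write has finished."""
        self.counters[chunk] -= 1

    def chunks_to_resync(self):
        """Chunks whose counter never returned to zero."""
        return [c for c, n in enumerate(self.counters) if n != 0]

    def resync_done(self, chunk):
        """The chunk has been copied to the reinserted mirror."""
        self.counters[chunk] = 0
```

With two mirrors, a write issued while one mirror is pulled leaves the chunk's counter at 1, so only that chunk is copied on reinsertion. Note that chunks_to_resync() is only meaningful once in-flight writes have drained, and that with more than two mirrors a single counter does not record which mirror missed the write.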

To deal with a recovery after crash, I am thinking about using your approach.
Use a bit per counter and set the bit when the counter is non-zero. When a bit
goes from 0 to 1, the updated bit vector is written before starting the write to
the chunk. On reboot after a crash, the bit vector from the selected mirror is
used (the current mechanism is used to select the base disk). The counter is
incremented for each chunk that has a bit that is set. After this, the resync in

Philip Cameron | 3 Jan 2003 02:40

Re: Raid1 fast resync

Hi Alvin,

Thanks for the comments. The hardware includes two hot-swappable IO 
assemblies, each of which includes a PCI bus, a SCSI HBA and 3 hot-swappable 
SCSI disk slots. All of the issues related to hot-swap have been 
resolved so I don't need to deal with SCSI and PCI bus issues (thankfully).

Because of the redundant IO assemblies, the system is not considered 
fully operational until either assembly can fail or be pulled at random. 
This means the resync must be complete. So the real deal is reducing the 
resync so that the system can be considered fully operational sooner. I 
am looking to reduce the multiple hour resync times to a few minutes.

Thanks again,
Phil Cameron
---------------------------

Alvin Oga wrote:

>hi ya philip
>
>On Wed, 1 Jan 2003, Philip Cameron wrote:
>
>  
>
>>Do any of you know of work that is going on to improve the raid 1 resync 
>>speed? I am starting to evaluate approaches to reducing the time to 
>>resync the disks.
>>
>>I am involved in a project on a fault tolerant x86 based server that has 

Oliver Elphick | 3 Jan 2003 18:26

Linux dpti driver in 2.4 kernel with Adaptec 2400A card

I'm trying to install Debian Linux with kernel 2.4.20 on a machine with an
Adaptec 2400A RAID card with three 40 GB IDE disks attached.  The machine has
no disks other than those connected to this card. 

I am able to boot the system, and the RAID card is recognised (as
/dev/sda) but it cannot be accessed.  I get these messages recorded by
dmesg:

SCSI subsystem driver Revision: 1.00
Loading Adaptec I20 RAID: Version 2.4 Build 5
Detecting Adaptec RAID controllers...
Adaptec I20 RAID controller 0 at f8898000 size=100000 irq=11
dpti: If you have a lot of devices this could take a few minutes.
dpti0: Reading the hardware resource table.
TID 008  Vendor: HIGHPOINT    Device: IDEhpt370    Rev: 00000001
TID 009  Vendor: HIGHPOINT    Device: IDEhpt370    Rev: 00000001
TID 010  Vendor: HIGHPOINT    Device: IDEhpt370    Rev: 00000001
TID 011  Vendor: HIGHPOINT    Device: IDEhpt370    Rev: 00000001
TID 519  Vendor: ADAPTEC      Device: RAID-5       Rev: 370L
scsi0 : Vendor: Adaptec  Model: 2400A            FW:370L
  Vendor: ADAPTEC   Model: RAID-5            Rev: 370L
  Type:   Direct-Access                      ANSI SCSI revision: 02
[scsi1: Adaptec SCSI card]
[scsi2: IDE-SCSI]
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: 156327936 512-byte hdwr sectors (80040 MB)
Partition check:
 sda:<4>dpti0: SCSI Data Protect-Device (0,0,0) hba_status=0x0, dev_status=0x2, cmd=0x28
dpti0: Trying to reset device
dpti0: Device reset not supported