Andrei Levin | 1 Sep 2006 11:46

Re: AoE vs iSCSI

Sam Hopkins wrote:
> 
> You're performing two different tests here: a single file dd using a
> large block size and a multi file cat using an unknown blocksize.
> Comparing the two for throughput is only marginally useful.  Your
> writes are likely faster than your reads due to filesystem caching.
> 
Do you think this script would be better? It does the same thing, 
transferring 1000 files of 1MB, 100 files of 10MB, 10 files of 100MB and 
one file of 1GB.
------------------------------------------------------
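#!/bin/bash
# Usage: throughput.sh <destination>
# Each run writes 1000MB in total, split into files of the given size.
# bash is needed because "time" is applied to a shell function.

dd_test() {
     # $1 = destination directory, $2 = size of each file in MB
     k=0
     echo
     echo "Writing ${2}MB file..."
     LIMIT=$((1000 / $2))
     while [ "$k" -lt "$LIMIT" ]; do
         dd if=/dev/zero of="$1/${2}MB_file_$k" count="$2" bs=1M > /dev/null 2>&1
         k=$((k + 1))
     done
}

for SIZE in 1 10 100; do
     time dd_test "$1" "$SIZE"
done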


Andrei Levin | 4 Sep 2006 16:06

Real life example - 95GB in 2h21m

Hi,

I hope this information is useful to anyone else wondering whether 
AoE is fast enough.

I've finished doing a backup of my server and this is the result:

Number of files: 561724
Number of files transferred: 424008
Total file size: 100155617799 bytes
Total transferred file size: 100155584686 bytes
Literal data: 100155586250 bytes
Matched data: 0 bytes
File list size: 12092783
Total bytes written: 100198215644
Total bytes read: 8480180

wrote 100198215644 bytes  read 8480180 bytes  11813344.63 bytes/sec
total size is 100155617799  speedup is 1.00
rsync warning: some files vanished before they could be transfered (code 
24) at main.c(633)

real    141m24.095s
user    15m3.208s
sys     12m53.868s

/dev/etherd/e0.0      144G   95G   49G  66% /mnt/backup
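
For reference, the average rate can be checked from the figures above. This is only a small sketch using awk: the helper name rate_mb_per_s is made up for illustration, and it counts 1 MB as 10^6 bytes.

```sh
#!/bin/sh
# Hypothetical helper: average rate in MB/s (1 MB = 10^6 bytes)
# from a byte count and an elapsed time given as minutes and seconds.
rate_mb_per_s() {
    awk -v bytes="$1" -v min="$2" -v sec="$3" \
        'BEGIN { printf "%.2f\n", bytes / (min * 60 + sec) / 1000000 }'
}

# "Total bytes written" and the "real" time from the rsync run above.
rate_mb_per_s 100198215644 141 24.095   # prints 11.81
```

This is close to the 11813344.63 bytes/sec that rsync reports itself; the small difference comes from rsync using its own elapsed time rather than the "real" time.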

The backup was made with rsync:
/usr/bin/rsync --verbose --progress --stats --recursive --times --perms

Ste | 4 Sep 2006 19:04

Re: kernel wait to write if a "etherd" device is no

Sam Hopkins wrote:
> Hello Stefano,
>
>>
>> the kernel does not understand that the device e0.0 is no longer 
>> available, so if I try to write to the raid1 composed of e0.0 and e0.1 
>> the kernel waits to write to e0.0 and everything freezes.
>
> Absent any bugs we haven't yet discovered ...
>
> Md doesn't know the aoe device has vanished until the aoe driver fails
> the outstanding I/O. The driver won't do that until the device is
> unresponsive for 3 minutes (by default). You can set this timeout in
> the latest driver using the aoe_deadsecs load time parameter. What
> driver are you using?
Hi, I was away for a few days.

Thanks a lot, you are right. After the 3 minutes the drive became 
unavailable, the raid driver marked it as faulty and everything carried 
on. (I attach some shell output at the end of this email in case someone 
needs it.)

I use the driver that comes with the 2.6.16.16 kernel, compiled into 
the kernel, not as a module.
So I think I have to download the latest driver, compile it as a 
module and then load the module with "modprobe aoe aoe_deadsecs=5". 
Am I right?

Two more small questions: is there a way to tell aoe the timeout if 
the driver is compiled into the kernel? Is kernel 2.6.16.16 new 
enough to include an aoe version that supports setting the timeout?

Ed L. Cashin | 4 Sep 2006 20:25

Re: kernel wait to write if a "etherd" device is no

On Mon, Sep 04, 2006 at 07:04:25PM +0200, Ste wrote:
...
> I use the driver that comes with the 2.6.16.16 kernel, compiled into 
> the kernel, not as a module.

That is aoe driver version 14.  You can verify that by doing this:

  find /sys/module/aoe -name version | xargs grep -H .

You should see a file with 14 in it.

> So I think I have to download the latest driver, compile it as a 
> module and then load the module with "modprobe aoe aoe_deadsecs=5". 
> Am I right?

You can load the module and set the parameter or you can set it via
sysfs, like this:

  echo 5 > /sys/module/aoe/parameters/aoe_deadsecs

... or you can build the driver into the kernel itself and set the
parameter with a boot argument.
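
The three options above can be sketched roughly as follows. The sysfs path is the one given in this message. The boot argument form aoe.aoe_deadsecs=5 follows the kernel's usual module.parameter=value convention for built-in drivers and is an assumption here, as is the helper name set_deadsecs. All of this needs a driver version that has the parameter.

```sh
#!/bin/sh
# Sketch only. set_deadsecs is a hypothetical helper; its optional second
# argument exists just so the function can be tried on an ordinary file.
set_deadsecs() {
    secs=$1
    param=${2:-/sys/module/aoe/parameters/aoe_deadsecs}
    if [ ! -w "$param" ]; then
        echo "cannot write $param (driver not loaded, too old, or not root?)" >&2
        return 1
    fi
    # Write the new timeout, then read it back to confirm.
    echo "$secs" > "$param" && cat "$param"
}

# 1. Driver built as a module: set the parameter at load time.
#      modprobe aoe aoe_deadsecs=5
# 2. Driver already loaded: set it through sysfs.
#      set_deadsecs 5
# 3. Driver built into the kernel: assumed boot argument form.
#      aoe.aoe_deadsecs=5
```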

> Two more small questions: is there a way to tell aoe the timeout if 
> the driver is compiled into the kernel? 

Yes, when you haven't set it, you can do this:

  cat /sys/module/aoe/parameters/aoe_deadsecs


Ste | 5 Sep 2006 15:55

down,closewait problem with raid1

Hi, I have successfully created a storage cluster using AoE and raid1. 
Everything works fine, except when a vblade server goes down.

This is what happens:

e0.0 and e0.1 are part of a raid1.

The vblade server corresponding to e0.0 crashes (e.g. a hard disk fails 
or the kernel panics), so after 3 minutes (the default timeout) the aoe 
client should set e0.0 as down.

But instead of being set as down, e0.0 freezes in this situation:
root@data:/var/www# aoe-stat
      e0.0         0.000GB   eth1 down,closewait
      e0.1        40.057GB   eth1 up

So if the vblade server corresponding to e0.0 comes back up, the device 
e0.0 remains in the state "down,closewait".

So I decided to try stopping the raid1 on e0.0 and e0.1: the result is 
that e0.0 goes to the state "down", and when the vblade server for e0.0 
comes back online e0.0 goes to the state "up". All okay.

I think that raid1 prevents the aoe client from setting the device e0.0 
as "down".

Now I am asking whether there is a way to solve this problem. Could I 
change a little bit of the driver code? Can someone point me to the place 
to change? Or is this a bug?


Ed L. Cashin | 5 Sep 2006 18:11

Re: down,closewait problem with raid1

Hi.  I don't think it's a bug but just the normal behavior.  As long
as something is holding the aoe device open, it stays around,
returning an error for any I/O requests.

As you probably noticed, md will start using the "up" device instead
of the "down,closewait" device.

If mdadm is running in monitor mode it can hold devices open.

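To see which md array still has the device open, /proc/mdstat can be searched for the member name. The sketch below assumes that md lists AoE members under names like etherd/e0.0; the helper name md_holders is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the md arrays whose member list contains
# the given device name. Reads /proc/mdstat unless another file is given.
md_holders() {
    awk -v dev="$1" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)   # strip "[0]", "[0](F)" and so on
                if (member == dev) { print $1; break }
            }
        }' "${2:-/proc/mdstat}"
}

# Example: md_holders etherd/e0.0
```

If an array is listed, removing the faulty member from it (for example with mdadm's --fail and --remove options) is one way to make md let go of the device; that step is not shown here.
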
-- 
  Ed L Cashin <ecashin@...>

roland | 6 Sep 2006 00:57

Re: down,closewait problem with raid1

is it md-raid?
what does "cat /proc/mdstat" say about the raid then?

> But instead of being set as down, e0.0 freezes in this situation:
can you explain "e0.0 freezes"?
does your raid1 volume freeze, too?

maybe http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO-5.html#ss5.5 
gives some hints?

regards
roland

----- Original Message ----- 
From: "Ste" <ste@...>
To: <aoetools-discuss@...>
Sent: Tuesday, September 05, 2006 3:55 PM
Subject: [Aoetools-discuss] down,closewait problem with raid1

> Hi, I have successfully created a storage cluster using AoE and raid1.
> Everything works fine, except when a vblade server goes down.
>
> This is what happens:
>
> e0.0 and e0.1 are part of a raid1.
>
> The vblade server corresponding to e0.0 crashes (e.g. a hard disk fails
> or the kernel panics), so after 3 minutes (the default timeout) the aoe
> client should set e0.0 as down.
>

roland | 6 Sep 2006 03:03

Re: down,closewait problem with raid1

there seems to be one more person having a problem with this:

http://www.redhat.com/archives/linux-cluster/2006-January/msg00015.html

"  When a node goes down/is rebooted, how do you
restore the "down, closewait" state on the remaining
nodes that refer to that vblade/vblade-kernel?

The "solution" appears to be stop lvm (to release open file
handles to the /dev/etherd/e?.? devices), unload "aoe", and
reload "aoe". On the remaining "good" nodes.
This particular problem has me looking at gnbd devices again.

If aoe were truly stateless, and the aoe clients could recover
seamlessly on the restore of a vblade server, I'd have no
issues.
- Ian C. Blenke <ian blenke com> http://ian.blenke.com/  "

roland

----- Original Message ----- 
From: "roland" <devzero@...>
To: "Ste" <ste@...>; <aoetools-discuss@...>
Sent: Wednesday, September 06, 2006 12:57 AM
Subject: Re: [Aoetools-discuss] down,closewait problem with raid1

> is it md-raid?
> what does "cat /proc/mdstat" say about the raid then?
>
>> But instead of being set as down, e0.0 freezes in this situation:

roland | 6 Sep 2006 02:54

patch for kvblade-alpha2

hello,

since i found some compile issues with the aoe driver (reported to ed l. 
cashin), i also tried to compile kvblade on the same platform (it's an old 
suse 9.1 with kernel 2.6.4-52).

this failed with errors:

/tmp/kvblade-alpha-2/kvblade.c: In function `ata_io_complete':
/tmp/kvblade-alpha-2/kvblade.c:489: error: `ATA_DRDY' undeclared (first use 
in this function)
/tmp/kvblade-alpha-2/kvblade.c:489: error: (Each undeclared identifier is 
reported only once
/tmp/kvblade-alpha-2/kvblade.c:489: error: for each function it appears in.)
/tmp/kvblade-alpha-2/kvblade.c:494: error: `ATA_DF' undeclared (first use in 
this function)
/tmp/kvblade-alpha-2/kvblade.c: In function `ata':
/tmp/kvblade-alpha-2/kvblade.c:594: error: `ATA_CMD_FLUSH' undeclared (first 
use in this function)
/tmp/kvblade-alpha-2/kvblade.c:595: error: `ATA_DRDY' undeclared (first use 
in this function)
/tmp/kvblade-alpha-2/kvblade.c: At top level:
/tmp/kvblade-alpha-2/kvblade.c:811: warning: `exit' was declared `extern' 
and later `static'
/tmp/kvblade-alpha-2/kvblade.c: In function `exit':
/tmp/kvblade-alpha-2/kvblade.c:822: warning: implicit declaration of 
function `msleep'
make[2]: *** [/tmp/kvblade-alpha-2/kvblade.o] Error 1
make[1]: *** [/tmp/kvblade-alpha-2] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.4-52'

Andrei Levin | 6 Sep 2006 11:46

smartctl and hdparm

Hi,

I saw that there was a patch (in December 2004) written by Sam Hopkins 
to make smartctl work with AoE, but it seems that it has not been 
integrated yet. Is there a way to get S.M.A.R.T. information from disks 
attached via AoE?

I've tried to use hdparm and I get:
hdparm -Tt /dev/etherd/e0.0

/dev/etherd/e0.0:
  Timing cached reads:   1224 MB in  2.00 seconds = 611.77 MB/sec
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate 
ioctl for device
  Timing buffered disk reads:   68 MB in  3.03 seconds =  22.48 MB/sec
HDIO_DRIVE_CMD(null) (wait for flush complete) failed: Inappropriate 
ioctl for device

Is this a problem with hdparm, or just a misconfiguration of something?

Thanks

Andrei
-- 
Lan.Art s.r.l.

via Co' del Panico 36/1
35028 Piove di Sacco (PD)

tel. 049-7966424

