it | 3 Jun 23:27 2006

AoE stalling while trying to copy medium-large files

Hello,

We are doing a fairly regular setup, "darchive" is our initiator and
"darcbee1" our target.

Problem: the transfer goes quite fast and then stalls for an unknown reason
before continuing; sometimes it doesn't start for one to three minutes.

Below are the logs - last log shows the problem in detail.

1 - regular copy tests showing that the computer works fine - three
     500MB and 1G transfers
2 - AoE transfer
  a) 500MB transfer @ 139MB/s - works well
  b) 500MB transfer @ 142MB/s - also good
  c) 1G transfer @ 123MB/s - about a minute's wait before it started
  d) 1G transfer @ 3MB/s - stalled at 95% (in detail)

The interesting thing is that both network cards seem to be working at
full speed while stalled: darchive's output is at 19 Mbps and darcbee1's
input is at 12 Mbps.

Thanks,
Andrey.

###############################################################
# 1. Local transfer - computer is working fine.

Tested 500MB and 1G files, very consistent, no stalls; average speeds:
500MB:

devzero | 5 Jun 15:37 2006

Re: AoE stalling while trying to copy medium-large files

hello !

could you provide some more information?

what os/kernel on "darchive" - what on "darcbee1" ?
aoe driver version?
target (vblade?) version? 
hardware? (cpu/ram/disk...)

if you use vblade from aoetools on the target side, maybe you can do some analysis with "strace" !?

> Problem: the transfer goes quite fast and then stalls for an unknown reason
> before continuing; sometimes it doesn't start for one to three minutes
maybe some effects due to caching/buffering?
try the synchronous i/o options of spew and compare.

thanks for the hint about "spew" - didn't know that tool. seems to be really nice!

regards
roland

> -----Original Message-----
> From: it <itech@...>
> Sent: 03.06.06 23:27:38
> To: aoetools-discuss@...
> Subject: [Aoetools-discuss] AoE stalling while trying to copy medium-large files

> Hello,
> 
> We are doing a fairly regular setup, "darchive" is our initiator and

Ed L. Cashin | 5 Jun 15:31 2006

Re: AoE stalling while trying to copy medium-large files

On Sat, Jun 03, 2006 at 02:27:03PM -0700, it wrote:
> Hello,
> 
> We are doing a fairly regular setup, "darchive" is our initiator and
> "darcbee1" our target.
...
> Problem: the transfer goes quite fast and then stalls for an unknown reason
> before continuing; sometimes it doesn't start for one to three minutes
...
> The interesting thing is that both network cards seem to be working at
> full speed while stalled: darchive's output is at 19 Mbps and darcbee1's
> input is at 12 Mbps.

That just sounds like caching.  I/O often is postponed until RAM is
nearly full of requested I/O.  At that point, real I/O is needed, and
you have to wait for that to occur.  Before that point, since the I/O
is being deferred it looks like I/O is occurring very quickly.
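
One rough way to see the deferred-I/O effect described above is to time the
same write twice, once leaving the data in the page cache and once forcing it
out with fsync. This is only a sketch, not a benchmark: the function name
timed_write, the 64 MiB size and the temporary file location are assumptions
made for illustration, and the file would need to live on a filesystem backed
by the AoE device to say anything about AoE itself.

```python
import os
import tempfile
import time


def timed_write(path: str, total_bytes: int, sync: bool, chunk: int = 1 << 20) -> float:
    """Write total_bytes of zeros to path and return the elapsed seconds.

    With sync=False the writes may complete while the data is still only
    in the page cache. With sync=True, os.fsync() waits until the kernel
    has flushed the file to the underlying device.
    """
    buf = bytes(chunk)
    start = time.perf_counter()
    with open(path, "wb") as f:
        remaining = total_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(buf[:n])
            remaining -= n
        if sync:
            f.flush()
            os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # 64 MiB, an arbitrary illustrative size
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "testfile")
        cached = timed_write(target, size, sync=False)
        synced = timed_write(target, size, sync=True)
    print(f"buffered: {size / cached / 1e6:.0f} MB/s (apparent)")
    print(f"fsync'd:  {size / synced / 1e6:.0f} MB/s (includes flush)")
```

If the buffered figure is much higher than the fsync'd one, the fast phase of
a copy is mostly the cache filling, and the pause is the real I/O catching up.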

-- 
  Ed L Cashin <ecashin@...>

it | 5 Jun 17:32 2006

Re: AoE stalling while trying to copy medium-large files

Hello,

> That just sounds like caching.  I/O often is postponed until RAM is
> nearly full of requested I/O.  At that point, real I/O is needed, and
> you have to wait for that to occur.  Before that point, since the I/O
> is being deferred it looks like I/O is occurring very quickly.
It seems to be the vblade-kernel module. I just tested the userspace
version 10 and it works fine. It does exactly what you just described: the
CPU on the other side is working and the pause is consistent, about
7-15 seconds. What happens with vblade-kernel is different: it stalls for
minutes at a time, both computers show CPU activity at 0%, and nothing is
happening except the network cards working at 19Mbps (initiator) and
12Mbps (target).

> could you provide some more information?
>   
sure
> what os/kernel on "darchive" - what on "darcbee1" ?
>   
root@darchive:~# cat /proc/version
Linux version 2.6.16.16 (root@darchive) (gcc version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3)) #1 SMP Fri Jun 2 13:07:56 PDT 2006

root@darcbee1:~# cat /proc/version
Linux version 2.6.16.16 (root@darcbee2) (gcc version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3)) #1 Mon May 29 18:11:52 PDT 2006
> aoe driver version?
>   
root@darchive:~# wajig status-match aoetools
Package                 Installed       Previous        Now             

devzero | 5 Jun 18:37 2006

Re: AoE stalling while trying to copy medium-large files

hello  andrey,

ah - you are using vblade-kernel - this sheds some light on it!

vblade-kernel is known to be "beta" and definitely not ready for any "real world" use.
it seems to be basically a one-man project, done by lelik p. korchagin, the author of the module.
it's not related to coraid, nor to the aoetools project - so if you have problems with that module,
lelik is the only one who will be able to help.
(unfortunately this "project" seems to have been rather inactive for the last weeks/months.)
try contacting lelik directly, or hope that he is subscribed to this list and responds to your message.

btw: there is another kernel-based vblade under development (called kvblade), but it hasn't even been
released to the public yet.

so the only way to go for now is buying hardware from coraid or using userspace vblade. otherwise, you
should take some time for a prayer that vblade-kernel or kvblade matures in the near future... (i'm
of the opinion that the existence of a kernel-based AoE target is essential for the success of AoE in general)

regards
roland

> -----Original Message-----
> From: it <itech@...>
> Sent: 05.06.06 17:32:51
> To: aoetools-discuss@...
> Subject: Re: [Aoetools-discuss] AoE stalling while trying to copy medium-large files

> Hello,
> 
> > That just sounds like caching.  I/O often is postponed until RAM is

it | 5 Jun 19:58 2006

Re: AoE stalling while trying to copy medium-large files

Roland,

Thanks for filling me in, I didn't know these were separate projects.

All the Best,
Andrey.

devzero@... wrote:
> hello  andrey,
>
> ah - you are using vblade-kernel - this sheds some light on it!
>
> vblade-kernel is known to be "beta" and definitely not ready for any "real world" use.
> it seems to be basically a one-man project, done by lelik p. korchagin, the author of the module.
> it's not related to coraid, nor to the aoetools project - so if you have problems with that module,
> lelik is the only one who will be able to help.
> (unfortunately this "project" seems to have been rather inactive for the last weeks/months.)
> try contacting lelik directly, or hope that he is subscribed to this list and responds to your message.
>
> btw: there is another kernel-based vblade under development (called kvblade), but it hasn't even been
> released to the public yet.
>
> so the only way to go for now is buying hardware from coraid or using userspace vblade. otherwise, you
> should take some time for a prayer that vblade-kernel or kvblade matures in the near future... (i'm
> of the opinion that the existence of a kernel-based AoE target is essential for the success of AoE in general)
>
> regards
> roland

it | 5 Jun 22:32 2006

How to stop Vblade

Hello,

How do I safely stop a running session of Vblade?

I want to stop a certain AoE exported drive, do something to it locally 
and then export it again. Is there a way to do this?

Thanks,
Andrey.

Ed L. Cashin | 5 Jun 22:50 2006

Re: How to stop Vblade

On Mon, Jun 05, 2006 at 01:32:03PM -0700, it wrote:
> Hello,
> 
> How do I safely stop a running session of Vblade?

If it's in the foreground, you can hit control-c to stop it.  That's
safe if the remote AoE initiator isn't doing anything.  

In fact, if you start it up again before the initiator has timed out
any packets, the initiator shouldn't even care much.

If it's not in the foreground you can kill it using its PID.
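
As a small illustration of the "kill it using its PID" route, here is a
sketch that sends SIGTERM to a process by PID, which is what a plain
`kill PID` does. The helper name stop_by_pid is made up for this example, and
the sleeping child process is only a stand-in for a backgrounded vblade. It
assumes the target process does not trap SIGTERM and so exits on it. The same
caveat as for control-c applies: it is only safe while the initiator is idle.

```python
import os
import signal
import subprocess
import sys


def stop_by_pid(pid: int) -> None:
    """Ask the process with this PID to terminate (like a plain `kill PID`)."""
    os.kill(pid, signal.SIGTERM)


if __name__ == "__main__":
    # Stand-in for a backgrounded vblade: a child process that just sleeps.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    stop_by_pid(proc.pid)
    proc.wait(timeout=5)  # reap the child so it does not linger as a zombie
    print("exit status:", proc.returncode)
```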

> I want to stop a certain AoE exported drive, do something to it locally 
> and then export it again. Is there a way to do this?

Sure, just keep in mind that the host running the AoE initiator will
have its own ideas about the way things are.  If you change data
around on a block device while a remote host still has a filesystem
mounted, for example, you're asking for trouble.

-- 
  Ed L Cashin <ecashin@...>

roland | 6 Jun 23:56 2006

AoE driver problem on debian

Hello !

i tried debian 3.1 today. i installed a base system and updated the kernel to
2.6 (2.6.8-2-386).

i export /dev/sdb (empty disk, only a disklabel written) from another system.

when loading aoe at the initiator side, i get

aoe: aoe_init: AoE v30 initialised.
aoe: aoecmd_cfg_rsp: e0.0: setting 1024 byte data frames on eth0
aoe: ataid_complete: 000c2921d002 e0.0 v400a has 8388608 sectors
devfs_mk_dir: invalid argument.<4>devfs_mk_dev: could not append to parent for /disc
 etherd/e0.0:

when doing a "rmmod aoe" i get:

_devfs_find_entry(): too short
devfs_remove: /disc not found, cannot remove
 [<c0181c0a>] devfs_remove+0x50/0x7d
 [<c01dae41>] blk_unregister_region+0x13/0x17
 [<c01710eb>] devfs_remove_disk+0x39/0x72
 [<c0170e2c>] del_gendisk+0x70/0xc0
 [<c49e3061>] aoedev_freedev+0x1b/0x9d [aoe]
 [<c49e3138>] aoedev_exit+0x55/0x65 [aoe]
 [<c49e329b>] aoe_exit+0x25/0x2e [aoe]
 [<c01296b2>] sys_delete_module+0x12d/0x15f
 [<c013b1fe>] unmap_vma_list+0x14/0x1f
 [<c013b533>] do_munmap+0x136/0x142

roland | 7 Jun 01:01 2006

Re: AoE driver problem on debian

hello ed!

> When you apply the patch below to the aoe6-30 driver, do you still see
> error messages?  These additions are based on Jason McMullan's aoeroot
> patch.

perfect - this seems to fix the issue!

Jun  6 19:38:45 aoe-client kernel: aoe: aoe_init: AoE v30 initialised.
Jun  6 19:38:45 aoe-client kernel: aoe: aoecmd_cfg_rsp: e0.0: setting 1024 byte data frames on eth0
Jun  6 19:38:45 aoe-client kernel: aoe: ataid_complete: 000c2921d002 e0.0 v400a has 8388608 sectors
Jun  6 19:38:45 aoe-client kernel:  /dev/etherd/e0.0:

> The devfs feature has been going away for a very long time, but it
> still has a few users.
what i don't understand here:
aoe6-30 doesn't seem to use devfs - and the patch is re-adding that.
so why does the debian kernel complain at all?

to make this work without patching, shouldn't this go into aoe6-31 (via a
configure/compile switch)?

thanks very much!

regards
roland
