bin dong | 6 Nov 14:17 2009

How to disable or change the default timeout for ncache and acache on the client?

Hi,

    I am using PVFS-2-7-1 to run some benchmarks, and I want to compare the performance of PVFS with and without ncache and acache.

   How can I change the default timeout of ncache and acache on the client, or even disable them?

 Would it work to set
      TCACHE_TIMEOUT_MSECS = 0

Thanks for the help.
-Bin

_______________________________________________
Pvfs2-users mailing list
Pvfs2-users <at> beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
Jim Kusznir | 7 Nov 00:15 2009

Another Crash

Hi all:

Well, it happened again today...Another pvfs2 client crash on my head
node.  This one was worse than many I've experienced.  When my users
informed me, I found the system load at 14 with the cpu utilization %
fairly low.  pvfs2-client-core was in state "r" at 100% utilization, but
was actually not getting anything done (the I/O that was underway had
very definitely stopped according to the user).  I also noticed that
the process had only been alive for about 2 hours at that time (last
time pvfs2-client had restarted pvfs2-client-core), and the problems
started getting noticeably worse at that point in time.

I tried a kill -9 on the process, again hoping to get a responsive
pvfs2-client-core process, but nothing happened...it would not die.
It would not respond to anything.  Even when I tried to reboot the
server, it wouldn't give way (or let the server reboot..I finally had
to go down to the machine room and hard power cycle the system).

The pvfs2-client.log file had this to say:

[D 23:37:29.802697] [INFO]: Mapping pointer 0x2b44497ef000 for I/O.
[D 23:37:29.818953] [INFO]: Mapping pointer 0xaaf7000 for I/O.
[E 23:47:37.825740] PVFS2 client: signal 11, faulty address is 0x7b9, from 0x425120
[E 23:47:37.826245] [bt] pvfs2-client-core(PINT_client_io_cancel+0x1f0) [0x425120]
[E 23:47:37.826260] [bt] pvfs2-client-core(PINT_client_io_cancel+0x1f0) [0x425120]
[E 23:47:37.826271] [bt] pvfs2-client-core [0x40ee95]
[E 23:47:37.826282] [bt] pvfs2-client-core [0x41248e]
[E 23:47:37.826292] [bt] pvfs2-client-core [0x4133df]
[E 23:47:37.826302] [bt] pvfs2-client-core(main+0xc60) [0x414780]
[E 23:47:37.826312] [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3694c1d8b4]
[E 23:47:37.826322] [bt] pvfs2-client-core [0x40d989]
[E 23:47:37.831308] Child process with pid 31589 was killed by an uncaught signal 6
[E 23:47:37.835138] PVFS Client Daemon Started.  Version 2.8.1
[D 23:47:37.835358] [INFO]: Mapping pointer 0x2b46cd653000 for I/O.
[D 23:47:37.851828] [INFO]: Mapping pointer 0x1a4e1000 for I/O.
[E 23:51:34.274309] Child process with pid 32537 was killed by an uncaught signal 6
[E 23:51:34.278184] PVFS Client Daemon Started.  Version 2.8.1
[D 23:51:34.278404] [INFO]: Mapping pointer 0x2ba89091f000 for I/O.
[D 23:51:34.294720] [INFO]: Mapping pointer 0x131a6000 for I/O.
[E 23:58:20.185034] PVFS2 client: signal 11, faulty address is 0x78c, from 0x425120
[E 23:58:20.185553] [bt] pvfs2-client-core(PINT_client_io_cancel+0x1f0) [0x425120]
[E 23:58:20.185568] [bt] pvfs2-client-core(PINT_client_io_cancel+0x1f0) [0x425120]
[E 23:58:20.185579] [bt] pvfs2-client-core [0x40ee95]
[E 23:58:20.185590] [bt] pvfs2-client-core [0x41248e]
[E 23:58:20.185600] [bt] pvfs2-client-core [0x4133df]
[E 23:58:20.185610] [bt] pvfs2-client-core(main+0xc60) [0x414780]
[E 23:58:20.185620] [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3694c1d8b4]
[E 23:58:20.185630] [bt] pvfs2-client-core [0x40d989]
[E 23:58:20.190414] Child process with pid 32677 was killed by an uncaught signal 6
[E 23:58:20.194285] PVFS Client Daemon Started.  Version 2.8.1
[D 23:58:20.194506] [INFO]: Mapping pointer 0x2acb010a2000 for I/O.
[D 23:58:20.211131] [INFO]: Mapping pointer 0x5417000 for I/O.
[E 09:50:42.173029] fp_multiqueue_cancel: flow proto cancel called on 0x587c078
[E 09:50:42.173091] fp_multiqueue_cancel: I/O error occurred
[E 09:50:42.173107] handle_io_error: flow proto error cleanup started on 0x587c078: Operation cancelled (possibly due to timeout)
[E 09:50:42.173161] handle_io_error: flow proto 0x587c078 canceled 1 operations, will clean up.
[E 09:50:42.173209] bmi_to_mem_callback_fn: I/O error occurred
[E 09:50:42.173223] handle_io_error: flow proto 0x587c078 error cleanup finished: Operation cancelled (possibly due to timeout)
[E 09:52:42.125385] PVFS2 client: signal 11, faulty address is 0x41d5, from 0x413301
[E 09:52:42.125962] [bt] pvfs2-client-core [0x413301]
[E 09:52:42.125977] [bt] pvfs2-client-core [0x413301]
[E 09:52:42.125988] [bt] pvfs2-client-core(main+0xc60) [0x414780]
[E 09:52:42.125999] [bt] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3694c1d8b4]
[E 09:52:42.126009] [bt] pvfs2-client-core [0x40d989]
[E 09:52:42.131014] Child process with pid 377 was killed by an uncaught signal 6
[E 09:52:42.134941] PVFS Client Daemon Started.  Version 2.8.1
[D 09:52:42.135166] [INFO]: Mapping pointer 0x2b416a037000 for I/O.
[D 09:52:42.151602] [INFO]: Mapping pointer 0x9f7e000 for I/O.
[E 10:30:49.813443] Got an unrecognized/unimplemented vfs operation of type ff000000.
[E 10:30:49.813524] Post of op: PVFS_VFS_OP_INVALID failed!

I was not running this under anything that could give me a trace or other
debugging information, and with users lining up, I couldn't take
any more time debugging, so I restarted to get it back online.  It's
currently running without valgrind or other tools.  I have asked the user
who was most active when this happened to pay extra attention and let
me know if she can reproduce the problem.
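
For anyone triaging a similar pvfs2-client.log, a quick summary of how often the client segfaulted and restarted can help when reporting the problem. The following is only a minimal sketch: the function name summarize_client_log is hypothetical, the log path is whatever you pass in, and the match strings are taken from the log messages quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: summarize crash activity in a pvfs2-client.log.
summarize_client_log() {
    log="$1"
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    segv=$(grep -c 'PVFS2 client: signal 11' "$log" || true)
    restarts=$(grep -c 'PVFS Client Daemon Started' "$log" || true)
    echo "segfaults=$segv restarts=$restarts"
    # Tally the named frames seen in backtraces, for example
    # pvfs2-client-core(PINT_client_io_cancel+0x1f0).
    # -o is a grep extension found in both GNU and BSD grep.
    grep -o 'pvfs2-client-core([^)]*)' "$log" | sort | uniq -c | sort -rn || true
}
```

It would be called as `summarize_client_log /path/to/pvfs2-client.log`, where the path is an assumption about your installation. The first output line carries the two counts, and the remaining lines list the named backtrace frames from most to least frequent.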

--Jim
Jim Kusznir | 7 Nov 00:18 2009

pvfs2 configuration questions

Hi again:

The same user later came to me with a document for a toolkit she's
trying to run on the cluster.  It said that for those running Lustre,
please ensure that the Lustre settings are as follows:

stripe size 0 (default, typically 1MB)
stripe offset -1 (default, typically round robin)
stripe count 1 (do not split the file onto multiple OSTs)

Apparently this toolkit does a lot of IO on a lot of very small files.
 Is there something I should/could do to pvfs, at least to her
directory, to enhance the performance of this?  Better yet, is there a
command she can run to set this up on her directory?

Thanks!
--Jim
bin dong | 8 Nov 09:17 2009

How to disable acache and ncache for pvfs2-cp

Hi,

    How can I disable acache and ncache for pvfs2-cp?

Thanks.
Bin

Jim Kusznir | 9 Nov 17:52 2009

Re: Another Crash

Followup:  It looks like more stuff is broken.  My logwatch report has these lines in it:

WARNING:  Kernel Errors Present
    pvfs2_file_read: error in vectored read ...:  2151 Time(s)

And I got an e-mail from my main user:

---------------------
I issued the command from lar-transfer

cp /mnt/pvfs2/schung/amt_toolkit/trunk/MILAGRO/Obs/aircraft/c130/mrg60_c130_20060318_r4/* .

and got error messages such as

cp: reading `/mnt/pvfs2/schung/amt_toolkit/trunk/MILAGRO/Obs/aircraft/c130/mrg60_c130_20060318_r4/wind_speed_obs.txt': No such file or directory

But as far as I can tell, the file exists and was copied over correctly.  I get
the error message for all the files that were copied, but all the files were
copied correctly (as far as I can tell).
-----------------------

I've had other people complain about issues with relative paths
failing, but absolute paths succeeding.

I'm getting a really bad feeling about this.....

--Jim

Kevin Harms | 9 Nov 18:03 2009

Re: How to disable acache and ncache for pvfs2-cp

Bin,

   I don't believe pvfs2-cp supports an option for changing these  
parameters. The code test/client/mpi-io/multi-md-test-size-sweep.c  
does have an example of how to do this from the client using  
PVFS_sys_set_info.

kevin

On Nov 8, 2009, at 2:17 AM, bin dong wrote:

> Hi,
>
>     How to disable acache and ncache for pvfs2-cp
>
> Thanks.
> Bin
Kevin Harms | 9 Nov 18:24 2009

Re: pvfs2 configuration questions

Jim,

   Here are settings that can be applied on a per-directory basis.

# set files to use only one i/o server
setfattr -n "user.pvfs2.num_dfiles" -v "1" ./dir

# set strip size to 1MB for simple_stripe distribution.
setfattr -n "user.pvfs2.dist_params" -v "strip_size:1048576" ./dir

   I have no idea if using these settings would actually be an  
improvement in performance.
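
The two commands above can be wrapped so that they are applied together and then read back for confirmation. This is only a minimal sketch: the function name set_small_file_layout and the DRY_RUN variable are hypothetical, and it assumes getfattr (from the same attr package as setfattr) is installed on the client.

```shell
#!/bin/sh
# Hypothetical helper: apply the per-directory hints shown above, then read them back.
# Set DRY_RUN=1 to print the commands instead of running them.
set_small_file_layout() {
    dir="$1"
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    # Use a single I/O server per file.
    $run setfattr -n user.pvfs2.num_dfiles -v 1 "$dir" &&
    # Use a 1 MB strip size for the simple_stripe distribution.
    $run setfattr -n user.pvfs2.dist_params -v strip_size:1048576 "$dir" &&
    # Read both attributes back to confirm they were stored.
    $run getfattr -n user.pvfs2.num_dfiles "$dir" &&
    $run getfattr -n user.pvfs2.dist_params "$dir"
}
```

Running `DRY_RUN=1 set_small_file_layout ./dir` prints the four commands without touching the directory, which is a cheap way to check them before a user runs the function for real on her own directory.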

kevin

On Nov 6, 2009, at 5:18 PM, Jim Kusznir wrote:

> Hi again:
>
> The same user later came to me with a document for a toolkit she's
> trying to run on the cluster.  It said that for those running Lustre,
> please ensure that the Lustre settings are as follows:
>
> stripe size 0 (default, typically 1MB)
> stripe offset -1 (default, typically round robin)
> stripe count 1 (do not split the file onto multiple OSTs)
>
> Apparently this toolkit does a lot of IO on a lot of very small files.
> Is there something I should/could do to pvfs, at least to her
> directory, to enhance the performance of this?  Better yet, is there a
> command she can run to set this up on her directory?
>
> Thanks!
> --Jim
Sam Lang | 10 Nov 18:05 2009

Re: How to disable acache and ncache for pvfs2-cp


Bin,

Kevin is right, pvfs2-cp doesn't support disabling ncache and acache.   
The caches don't really get used for pvfs2-cp.  They may get  
initialized when the application starts, but since pvfs2-cp only  
copies a single file, any entries placed in the caches are gone once  
the process exits and starts again.

-sam

On Nov 9, 2009, at 11:03 AM, Kevin Harms wrote:

> Bin,
>
>  I don't believe pvfs2-cp supports an option for changing these  
> parameters. The code test/client/mpi-io/multi-md-test-size-sweep.c  
> does have an example of how to do this from the client using  
> PVFS_sys_set_info.
>
> kevin
>
> On Nov 8, 2009, at 2:17 AM, bin dong wrote:
>
>> Hi,
>>
>>    How to disable acache and ncache for pvfs2-cp
>>
>> Thanks.
>> Bin
Phil Carns | 11 Nov 18:38 2009

Re: pvfs2-client unstable on kernel 2.6.31.4 + open-mx

Hi Ryuta,

I don't know if you are still interested at this point, but there is a 
workaround for this problem.  You just need to set the MX_IMM_ACK 
environment variable to 1 ("export MX_IMM_ACK=1") before starting 
pvfs2-client.

I'm not sure what the performance ramifications are, but it should work 
fine for testing for now.
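
If you would rather not export the variable for the whole shell session, it can be set for the client process alone. A minimal sketch follows; the wrapper name run_with_imm_ack is hypothetical, and the paths in the usage line are assumptions about where the client binaries are installed.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given command with MX_IMM_ACK=1 in its environment only.
run_with_imm_ack() {
    MX_IMM_ACK=1 "$@"
}
```

For example, `run_with_imm_ack /usr/sbin/pvfs2-client -p /usr/sbin/pvfs2-client-core` starts the client with the workaround applied while leaving the variable unset for anything else started from that shell.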

-Phil

ryuuta wrote:
> Hi,
> 
> I'm testing pvfs2 + open-mx on my laptop.
> I was finally able to configure pvfs2 with open-mx bmi to confirm that 
> pvfs2-ping works
> and pvfs2-server is functioning. So far so good.
> 
> Bad news is that pvfs2-client behaves erratically.
> The client starts but it dies sporadically.
> 
> [ryuta@oroppas]$ sudo more /tmp/pvfs2-client.log
> [E 14:03:20.917081] PVFS Client Daemon Started.  Version 2.8.1pre1-2009-10-20-045035
> [D 14:03:20.917680] [INFO]: Mapping pointer 0xb60c3000 for I/O.
> [D 14:03:20.935398] [INFO]: Mapping pointer 0xb6021000 for I/O.
> [E 14:04:22.635267] pvfs2-client-core with pid 4502 exited with value 255
> 
> dmesg output is
> 
> pvfs2: module version 2.8.1pre1-2009-10-20-045035 loaded
> pvfs2: pvfs2_statfs -- wait timed out; aborting attempt.
> 
> Any advice will be greatly appreciated.
> 
> Thanks,
> -ryuta
Huang, Bin | 12 Nov 19:14 2009

Running pvfs2 in ramdisk, can I direct local disk drive to pvfs2 storage pool?

Hi,

I have a toy cluster consisting of one
metadata server and fifteen I/O servers
which have a total storage of 15*250GB.

In my design, I put pvfs2.ko into a
Linux ramdisk, on the assumption that the metadata server
would perform better that way.

My question is, can I still run my pvfs2
module in a ramdisk on the I/O server and
direct my local disk drive to pvfs2? By
directing, I mean I can create a new mount
point for disk drive in /mnt/pvfs2.

I have read through the user guide but found no
direct answer to that. Can someone give me
a hint?

Thanks!

Ben
