Tony Kew | 2 Mar 2009 18:57

Re: PVFS v2.8.0 compile problems

Dear Phil,

Thanks for the info. For what it's worth, the option worked for me with
the 7.2.1 codebase.  The 7.8.0 code with my patch works so far (in the
very limited tests I have done).  I'll be testing over several nodes this
afternoon or tomorrow.

Here is a trivial patch to fix installation for RPM builds that use:
make install DESTDIR=/var/tmp/pvfs-buildroot

--- pvfs-2.8.0/Makefile.in.orig 2009-03-02 11:40:23.000000000 -0500
+++ pvfs-2.8.0/Makefile.in      2009-03-02 11:41:03.000000000 -0500
@@ -1094,7 +1094,7 @@ endif
        install -d $(bindir)
        install -m 755 $(ADMINTOOLS) $(bindir)
        # for compatibility in case anyone really wants "lsplus"
-       ln -s $(bindir)/pvfs2-ls $(bindir)/pvfs2-lsplus
+       ln -s @bindir@/pvfs2-ls $(bindir)/pvfs2-lsplus
        install -m 755 src/apps/admin/pvfs2-config $(bindir)
        @# if we ever auto-generate genconfig, remove the $(srcdir)
        install -m 755 $(srcdir)/src/apps/admin/pvfs2-genconfig $(bindir)
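
For anyone wondering why the link target matters, here is a minimal sketch of the failure mode in Python. It assumes (this is not stated in the patch itself) that $(bindir) in the generated Makefile already carries the DESTDIR prefix, while @bindir@ is the final configured path. The helper name install_symlink and the /usr/bin prefix are made up for illustration.

```python
import os
import tempfile


def install_symlink(destdir, final_bindir, use_final_target):
    """Stage pvfs2-lsplus -> pvfs2-ls under destdir and return the stored link target.

    Hypothetical helper: it only mimics the `ln -s` line touched by the patch.
    """
    # Assumption: with DESTDIR set, $(bindir) expands to DESTDIR + configured bindir.
    staged_bindir = destdir + final_bindir
    os.makedirs(staged_bindir, exist_ok=True)
    open(os.path.join(staged_bindir, "pvfs2-ls"), "w").close()

    # Old rule: target is the staged path. Patched rule: target is the final path.
    target_dir = final_bindir if use_final_target else staged_bindir
    link = os.path.join(staged_bindir, "pvfs2-lsplus")
    os.symlink(os.path.join(target_dir, "pvfs2-ls"), link)
    return os.readlink(link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as buildroot:
        old = install_symlink(os.path.join(buildroot, "old"), "/usr/bin", False)
        new = install_symlink(os.path.join(buildroot, "new"), "/usr/bin", True)
        print("old rule target:", old)   # points into the build root
        print("patched target: ", new)   # points at the final install path
```

With the staged path as the target, the link that ends up in the package points into the build root, which no longer exists once the RPM is installed on another machine.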

Thanks,
Tony

Tony Kew
SAN Administrator
The Center for Computational Research
New York State Center of Excellence

Phil Carns | 2 Mar 2009 20:50

Re: PVFS v2.8.0 compile problems

Thanks Tony.  We applied a variant of your install symlink patch to cvs.

-Phil


Myron Cheung | 2 Mar 2009 21:38

pvfs2-fsck crashes pvfs2-server

I tested pvfs2 on Debian lenny kernel 2.6.26-1-686 #1 SMP Sat Jan 10 18:29:31 UTC 2009 i686 GNU/Linux.  When I ran pvfs2-fsck, these error messages came up:

 pvfs2-fsck -m /mnt/pvfs2/
# Current FSID is 1644169005.
[E 15:11:42.724517] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Broken pipe
[E 15:11:44.730287] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:46.736016] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:48.739135] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:50.744279] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:52.750153] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:52.750207] *** msgpairarray_completion_fn: msgpair to server tcp://oyster.jatheon.com:3334 failed: Connection refused
[E 15:11:52.750228] *** Out of retries.
PVFS_mgmt_iterate_handles_list: Connection refused (error class: 128)
[E 15:11:52.751002] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:54.756904] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:56.763124] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:11:58.769307] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:00.775364] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:02.782929] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:02.782984] *** msgpairarray_completion_fn: msgpair to server tcp://oyster.jatheon.com:3334 failed: Connection refused
[E 15:12:02.783008] *** Out of retries.
[E 15:12:02.783745] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:04.790847] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:06.797528] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:08.804913] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:10.807222] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:12.816268] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
[E 15:12:12.816325] *** msgpairarray_completion_fn: msgpair to server tcp://oyster.jatheon.com:3334 failed: Connection refused
[E 15:12:12.816348] *** Out of retries.
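
"Connection refused" generally means nothing was accepting connections on that port at the time, which would be consistent with the server process having exited. A minimal Python sketch for checking this from the client side follows; the function name server_reachable and the 2-second timeout are illustrative choices, not part of PVFS.

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Calling server_reachable("oyster.jatheon.com", 3334) before and after running pvfs2-fsck would show whether the listener disappears during the run. It only tests that the port accepts connections, not that the server is healthy.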


So I ran strace on pvfs2-server and got this output when it crashed:

clock_gettime(CLOCK_REALTIME, {1236026198, 349308706}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 235, {0, 99942294}) = -1 ETIMEDOUT (Connection timed out)
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1236026198, 449537}, NULL) = 0
gettimeofday({1236026198, 449612}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026198, 449677827}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 237, {0, 99934173}) = -1 ETIMEDOUT (Connection timed out)
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1236026198, 549881}, NULL) = 0
gettimeofday({1236026198, 549942}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026198, 549997189}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 239, {0, 99944811}) = -1 ETIMEDOUT (Connection timed out)
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1236026198, 650129}, NULL) = 0
gettimeofday({1236026198, 650194}, NULL) = 0
gettimeofday({1236026198, 650252}, NULL) = 0
gettimeofday({1236026198, 650311}, NULL) = 0
gettimeofday({1236026198, 650369}, NULL) = 0
gettimeofday({1236026198, 650423}, NULL) = 0
gettimeofday({1236026198, 650474}, NULL) = 0
gettimeofday({1236026198, 650524}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026198, 650577709}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 241, {0, 99946291}) = -1 ETIMEDOUT (Connection timed out)
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1236026198, 750722}, NULL) = 0
gettimeofday({1236026198, 750774}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026198, 750831391}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 243, {0, 99942609}) = -1 ETIMEDOUT (Connection timed out)
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1236026198, 850951}, NULL) = 0
gettimeofday({1236026198, 851003}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026198, 851059553}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 245, {0, 99943447}) = -1 ETIMEDOUT (Connection timed out)
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
gettimeofday({1236026198, 951183}, NULL) = 0
gettimeofday({1236026198, 951240}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026198, 951292556}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 247, {0, 99947444}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
brk(0x8590000)                          = 0x8590000
writev(10, [{"\277\312\0\0\4\0\0\0\1\0\0\0\0\0\0\0\310\3\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\r\0\0\0\0\0\0\0\246\3\0\0\0\0\0\0\245\3\0\0<Defa"..., 968}], 2) = 992
gettimeofday({1236026199, 41736}, NULL) = 0
gettimeofday({1236026199, 41796}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 41857591}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 249, {0, 99938409}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1236026199, 51424}, NULL) = 0
gettimeofday({1236026199, 51487}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 51559118}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 251, {0, 99927882}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1236026199, 51915}, NULL) = 0
gettimeofday({1236026199, 51977}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 52049314}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 253, {0, 99927686}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1236026199, 52359}, NULL) = 0
gettimeofday({1236026199, 52421}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 52492391}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 255, {0, 99928609}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1236026199, 52817}, NULL) = 0
gettimeofday({1236026199, 52879}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 52948988}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 257, {0, 99930012}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
writev(10, [{"\277\312\0\0\4\0\0\0\2\0\0\0\0\0\0\0X\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\20\0\0\0\0\0\0"..., 88}], 2) = 112
gettimeofday({1236026199, 53416}, NULL) = 0
gettimeofday({1236026199, 53479}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 53550703}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 259, {0, 99928297}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
writev(10, [{"\277\312\0\0\4\0\0\0\3\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\20\0\0\0\0\0\0\0"..., 16}], 2) = 40
gettimeofday({1236026199, 54116}, NULL) = 0
gettimeofday({1236026199, 54179}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 54248338}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 261, {0, 99930662}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
statfs64("//pvfs2-storage-space", 84, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12901535, f_bfree=10286937, f_bavail=9631577, f_files=6553600, f_ffree=6240866, f_fsid={1892488181, 343772667}, f_namelen=255, f_frsize=4096}) = 0
sysinfo({uptime=1288517, loads=[6848, 18752, 21536] totalram=746405, freeram=68070, sharedram=0, bufferram=86232} totalswap=524286, freeswap=524286, procs=195}) = 0
writev(10, [{"\277\312\0\0\4\0\0\0\4\0\0\0\0\0\0\0h\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0-\7\0b\0\220u/\t\0\0\0\0"..., 104}], 2) = 128
gettimeofday({1236026199, 55206}, NULL) = 0
gettimeofday({1236026199, 55267}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 55328370}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 263, {0, 99938630}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
gettimeofday({1236026199, 60157}, NULL) = 0
gettimeofday({1236026199, 60216}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 60277052}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 265, {0, 99938948}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
writev(10, [{"\277\312\0\0\4\0\0\0\5\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\25\0\0\0\0\0\0\0\375\377\377\177\0\0\0\0\0\0\0\0\4\0\0\0\1"..., 64}], 2) = 88
gettimeofday({1236026199, 60777}, NULL) = 0
gettimeofday({1236026199, 60838}, NULL) = 0
clock_gettime(CLOCK_REALTIME, {1236026199, 60903807}) = 0
futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 267, {0, 99934193}) = 0
futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
write(2, "pvfs2-server: src/io/job/job.c:61"..., 96pvfs2-server: src/io/job/job.c:6165: job_precreate_pool_iterate_handles: Assertion `fs' failed.
) = 96
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(13557, 13557, SIGABRT)           = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++


Any suggestions or help would be much appreciated.

_______________________________________________
Pvfs2-users mailing list
Pvfs2-users@beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
Sam Lang | 3 Mar 2009 17:40

PVFS Release: 2.8.1


Announcing version 2.8.1 of PVFS
===========================

The PVFS project has a new major release, version 2.8.1. This release
includes important bug fixes since the 2.8.0 release.  Details can be
found in the ChangeLog and at:  http://www.pvfs.org/news.php#31.

You can download the release from the website, or directly with the
following link:

ftp://ftp.parl.clemson.edu/pub/pvfs2/pvfs-2.8.1.tar.gz

The checksums for this release are:

SHA1SUM: 73732ffd9008305dae83c5f156569c36836b3500  pvfs-2.8.1.tar.gz
MD5SUM:   69b6d40ed725e2a802c7624e34e841a0  pvfs-2.8.1.tar.gz
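
To check a downloaded tarball against the checksums above, something along these lines should work. This is a minimal Python sketch: the function names and the chunk size are illustrative, and the two digests are copied from this announcement.

```python
import hashlib

# Digests as published in this announcement.
EXPECTED = {
    "sha1": "73732ffd9008305dae83c5f156569c36836b3500",
    "md5": "69b6d40ed725e2a802c7624e34e841a0",
}


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs are not read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED):
    """Return a dict mapping each algorithm to True if the file's digest matches."""
    return {alg: file_digest(path, alg) == digest for alg, digest in expected.items()}
```

verify("pvfs-2.8.1.tar.gz") should return True for both entries if the download is intact; a False for either one suggests a corrupted or different file.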

Thanks,

-PVFS v2 Development Team
Phil Carns | 3 Mar 2009 18:20

Re: pvfs2-fsck crashes pvfs2-server

Hi Myron,

Sorry you ran into that, but fortunately we have fixed it in the 2.8.1 
release that Sam just posted a few minutes ago.  Could you try that and 
let us know if it solves the problem for you?

thanks,
-Phil

onyx.peridot | 3 Mar 2009 19:25

Re: Re: pvfs2-fsck crashes pvfs2-server

Hi Phil,

Many thanks for the quick response. I'll try 2.8.1.

Myron

On Mar 3, 2009 12:20pm, Phil Carns <carns@mcs.anl.gov> wrote:
> Hi Myron,
>
> Sorry you ran into that, but fortunately we have fixed it in the 2.8.1
> release that Sam just posted a few minutes ago.  Could you try that and
> let us know if it solves the problem for you?
>
> thanks,
> -Phil
>
> writev(10, [{"\277\312\0\0\4\0\0\0\4\0\0\0\0\0\0\0h\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0-\7\0b\0\220u/\t\0\0\0\0"..., 104}], 2) = 128
>
> gettimeofday({1236026199, 55206}, NULL) = 0
>
> gettimeofday({1236026199, 55267}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 55328370}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 263, {0, 99938630}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>
> gettimeofday({1236026199, 60157}, NULL) = 0
>
> gettimeofday({1236026199, 60216}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 60277052}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 265, {0, 99938948}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> writev(10, [{"\277\312\0\0\4\0\0\0\5\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\25\0\0\0\0\0\0\0\375\377\377\177\0\0\0\0\0\0\0\0\4\0\0\0\1"..., 64}], 2) = 88
>
> gettimeofday({1236026199, 60777}, NULL) = 0
>
> gettimeofday({1236026199, 60838}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 60903807}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 267, {0, 99934193}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> write(2, "pvfs2-server: src/io/job/job.c:61"..., 96pvfs2-server: src/io/job/job.c:6165: job_precreate_pool_iterate_handles: Assertion `fs' failed.
>
> ) = 96
>
> rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
>
> tgkill(13557, 13557, SIGABRT)           = 0
>
> --- SIGABRT (Aborted) @ 0 (0) ---
>
> +++ killed by SIGABRT +++
>
>
>
>
>
> Any suggestion or help will be much appreciated.
>
>
>
>
>
> ------------------------------------------------------------------------
>
>
>
> _______________________________________________
>
> Pvfs2-users mailing list
>
> Pvfs2-users <at> beowulf-underground.org
>
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
>
>
>

_______________________________________________
Pvfs2-users mailing list
Pvfs2-users <at> beowulf-underground.org
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
Myron Cheung | 4 Mar 2009 02:03

Re: Re: pvfs2-fsck crashes pvfs2-server

Hi Phil,

2.8.1 works great!  Only one more minor thing: there is a typo at src/apps/fuse/pvfs2fuse.c:1050, where a semicolon is missing.

Thanks again for your great efforts on PVFS2.

Myron

On Tue, Mar 3, 2009 at 1:25 PM, <onyx.peridot <at> gmail.com> wrote:
Hi Phil,

Many thanks for the quick response. I'll try 2.8.1.

Myron


On Mar 3, 2009 12:20pm, Phil Carns <carns <at> mcs.anl.gov> wrote:
> Hi Myron,
>
>
>
> Sorry you ran into that, but fortunately we have fixed it in the 2.8.1 release that Sam just posted a few minutes ago.  Could you try that and let us know if it solves the problem for you?
>
>
>
> thanks,
>
> -Phil
>
>
>
> Myron Cheung wrote:
>
>
> I tested pvfs2 on Debian lenny kernel 2.6.26-1-686 #1 SMP Sat Jan 10 18:29:31 UTC 2009 i686 GNU/Linux.  When I ran pvfs2-fsck, these error messages came up:
>
>
>
>  pvfs2-fsck -m /mnt/pvfs2/
>
> # Current FSID is 1644169005.
>
> [E 15:11:42.724517] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Broken pipe
>
> [E 15:11:44.730287] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:46.736016] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:48.739135] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:50.744279] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:52.750153] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:52.750207] *** msgpairarray_completion_fn: msgpair to server tcp://oyster.jatheon.com:3334 failed: Connection refused
>
> [E 15:11:52.750228] *** Out of retries.
>
> PVFS_mgmt_iterate_handles_list: Connection refused (error class: 128)
>
> [E 15:11:52.751002] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:54.756904] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:56.763124] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:11:58.769307] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:00.775364] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:02.782929] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:02.782984] *** msgpairarray_completion_fn: msgpair to server tcp://oyster.jatheon.com:3334 failed: Connection refused
>
> [E 15:12:02.783008] *** Out of retries.
>
> [E 15:12:02.783745] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:04.790847] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:06.797528] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:08.804913] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:10.807222] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:12.816268] Warning: msgpair failed to tcp://oyster.jatheon.com:3334, will retry: Connection refused
>
> [E 15:12:12.816325] *** msgpairarray_completion_fn: msgpair to server tcp://oyster.jatheon.com:3334 failed: Connection refused
>
> [E 15:12:12.816348] *** Out of retries.
>
>
>
>
>
> So I strace pvfs2-server and got this output when it crashed:
>
>
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 349308706}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 235, {0, 99942294}) = -1 ETIMEDOUT (Connection timed out)
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> gettimeofday({1236026198, 449537}, NULL) = 0
>
> gettimeofday({1236026198, 449612}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 449677827}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 237, {0, 99934173}) = -1 ETIMEDOUT (Connection timed out)
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> gettimeofday({1236026198, 549881}, NULL) = 0
>
> gettimeofday({1236026198, 549942}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 549997189}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 239, {0, 99944811}) = -1 ETIMEDOUT (Connection timed out)
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> gettimeofday({1236026198, 650129}, NULL) = 0
>
> gettimeofday({1236026198, 650194}, NULL) = 0
>
> gettimeofday({1236026198, 650252}, NULL) = 0
>
> gettimeofday({1236026198, 650311}, NULL) = 0
>
> gettimeofday({1236026198, 650369}, NULL) = 0
>
> gettimeofday({1236026198, 650423}, NULL) = 0
>
> gettimeofday({1236026198, 650474}, NULL) = 0
>
> gettimeofday({1236026198, 650524}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 650577709}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 241, {0, 99946291}) = -1 ETIMEDOUT (Connection timed out)
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> gettimeofday({1236026198, 750722}, NULL) = 0
>
> gettimeofday({1236026198, 750774}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 750831391}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 243, {0, 99942609}) = -1 ETIMEDOUT (Connection timed out)
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> gettimeofday({1236026198, 850951}, NULL) = 0
>
> gettimeofday({1236026198, 851003}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 851059553}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 245, {0, 99943447}) = -1 ETIMEDOUT (Connection timed out)
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> gettimeofday({1236026198, 951183}, NULL) = 0
>
> gettimeofday({1236026198, 951240}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026198, 951292556}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 247, {0, 99947444}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> brk(0x8590000)                          = 0x8590000
>
> writev(10, [{"\277\312\0\0\4\0\0\0\1\0\0\0\0\0\0\0\310\3\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\r\0\0\0\0\0\0\0\246\3\0\0\0\0\0\0\245\3\0\0
> gettimeofday({1236026199, 41736}, NULL) = 0
>
> gettimeofday({1236026199, 41796}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 41857591}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 249, {0, 99938409}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>
> gettimeofday({1236026199, 51424}, NULL) = 0
>
> gettimeofday({1236026199, 51487}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 51559118}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 251, {0, 99927882}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>
> gettimeofday({1236026199, 51915}, NULL) = 0
>
> gettimeofday({1236026199, 51977}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 52049314}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 253, {0, 99927686}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>
> gettimeofday({1236026199, 52359}, NULL) = 0
>
> gettimeofday({1236026199, 52421}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 52492391}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 255, {0, 99928609}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>
> gettimeofday({1236026199, 52817}, NULL) = 0
>
> gettimeofday({1236026199, 52879}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 52948988}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 257, {0, 99930012}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> writev(10, [{"\277\312\0\0\4\0\0\0\2\0\0\0\0\0\0\0X\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\20\0\0\0\0\0\0"..., 88}], 2) = 112
>
> gettimeofday({1236026199, 53416}, NULL) = 0
>
> gettimeofday({1236026199, 53479}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 53550703}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 259, {0, 99928297}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> writev(10, [{"\277\312\0\0\4\0\0\0\3\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\20\0\0\0\0\0\0\0"..., 16}], 2) = 40
>
> gettimeofday({1236026199, 54116}, NULL) = 0
>
> gettimeofday({1236026199, 54179}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 54248338}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 261, {0, 99930662}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> statfs64("//pvfs2-storage-space", 84, {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=12901535, f_bfree=10286937, f_bavail=9631577, f_files=6553600, f_ffree=6240866, f_fsid={1892488181, 343772667}, f_namelen=255, f_frsize=4096}) = 0
>
> sysinfo({uptime=1288517, loads=[6848, 18752, 21536] totalram=746405, freeram=68070, sharedram=0, bufferram=86232} totalswap=524286, freeswap=524286, procs=195}) = 0
>
> writev(10, [{"\277\312\0\0\4\0\0\0\4\0\0\0\0\0\0\0h\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0-\7\0b\0\220u/\t\0\0\0\0"..., 104}], 2) = 128
>
> gettimeofday({1236026199, 55206}, NULL) = 0
>
> gettimeofday({1236026199, 55267}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 55328370}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 263, {0, 99938630}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> futex(0x80f3464, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x80f3460, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
>
> gettimeofday({1236026199, 60157}, NULL) = 0
>
> gettimeofday({1236026199, 60216}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 60277052}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 265, {0, 99938948}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> writev(10, [{"\277\312\0\0\4\0\0\0\5\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 24}, {"p\27\0\0\2\0\0\0\25\0\0\0\0\0\0\0\375\377\377\177\0\0\0\0\0\0\0\0\4\0\0\0\1"..., 64}], 2) = 88
>
> gettimeofday({1236026199, 60777}, NULL) = 0
>
> gettimeofday({1236026199, 60838}, NULL) = 0
>
> clock_gettime(CLOCK_REALTIME, {1236026199, 60903807}) = 0
>
> futex(0x80ea6a4, FUTEX_WAIT_PRIVATE, 267, {0, 99934193}) = 0
>
> futex(0x80ea638, FUTEX_WAKE_PRIVATE, 1) = 0
>
> write(2, "pvfs2-server: src/io/job/job.c:61"..., 96pvfs2-server: src/io/job/job.c:6165: job_precreate_pool_iterate_handles: Assertion `fs' failed.
>
> ) = 96
>
> rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
>
> tgkill(13557, 13557, SIGABRT)           = 0
>
> --- SIGABRT (Aborted) @ 0 (0) ---
>
> +++ killed by SIGABRT +++
>
>
>
>
>
> Any suggestion or help will be much appreciated.
>
>
>
>
>
> ------------------------------------------------------------------------
>
>
>
> _______________________________________________
>
> Pvfs2-users mailing list
>
> Pvfs2-users <at> beowulf-underground.org
>
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
>
>
>

belcampo | 4 Mar 2009 12:25

kmod building problem with 2.8.1 and 2.6.28 kernel

I downloaded 2.8.1 yesterday; everything builds fine except for the
kernel module.

./configure --prefix=/usr --kernel=/path_to_current_kernel
make just_kmod results in:

  CC [M]  /spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.o
In file included from /spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.c:8:
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-kernel.h:1272: error: conflicting types for ‘kzalloc’
include/linux/slab.h:304: error: previous definition of ‘kzalloc’ was here
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.c: In function ‘pvfs2_inode_removexattr’:
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.c:1019: error: ‘XATTR_REPLACE’ undeclared (first use in this function)
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.c:1019: error: (Each undeclared identifier is reported only once
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.c:1019: error: for each function it appears in.)
make[3]: *** [/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-utils.o] Error 1
make[2]: *** [_module_/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6] Error 2
make[1]: *** [default] Error 2
make: *** [just_kmod] Error 2

I searched the list for the conflicting 'kzalloc' error and followed
the mail with
Subject: Re: [Pvfs2-users] building kmod on linux-2.6.22
which says that pvfs2-config.h and src/kernel/linux-2.6/Makefile have to be
changed. I made those changes and exported CPATH for my configuration, but
to no avail.
After these changes I get a lot of errors like:
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/upcall.h:47: error: expected specifier-qualifier-list before ‘PVFS_object_ref’
.....
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/downcall.h:35: error: expected specifier-qualifier-list before ‘PVFS_object_ref’
.....
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-dev-proto.h:89: error: expected specifier-qualifier-list before ‘PVFS_offset’
.....
/spvfs/sources/pvfs2/pvfs-2.8.1/src/kernel/linux-2.6/pvfs2-kernel.h:120:25: error: pvfs2-types.h: No such file or directory
but pvfs2-types.h can be found at ./include/pvfs2-types.h

And a whole lot more.

Could someone please help me with this?
Kernel version is 2.6.28
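
When a build log runs to "a whole lot more" errors like this, it can help to collapse it into a count of distinct messages before digging in. The following is only a rough sketch in Python, not anything shipped with PVFS: the function name `summarize_errors` and the regular expression are assumptions about what gcc output of the form `path:line: error: message` looks like. It also rejoins diagnostics that a mail client has wrapped after the `path:line:` prefix.

```python
import re
from collections import Counter

# Assumed shape of a gcc diagnostic: "path:line[:col]: error: message".
# This pattern is a guess at the log format, not part of PVFS or gcc.
ERROR_RE = re.compile(
    r"(?P<path>[^:\s]+):(?P<line>\d+)(?::\d+)?:\s*error:\s*(?P<msg>.+)$"
)


def summarize_errors(log_text):
    """Return a Counter mapping each distinct error message to its count.

    A line ending in ':' is treated as the first half of a wrapped
    diagnostic and is joined with the line that follows it.
    """
    joined = []
    pending = ""
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if pending:
            line = pending + " " + line
        if line.endswith(":"):
            # Still incomplete; keep accumulating.
            pending = line
            continue
        pending = ""
        joined.append(line)

    counts = Counter()
    for line in joined:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): make just_kmod 2>&1 | python summarize.py
    for message, count in summarize_errors(sys.stdin.read()).most_common():
        print(f"{count:5d}  {message}")
```

Sorting by frequency tends to surface the root cause first: a single missing header such as pvfs2-types.h usually accounts for most of the follow-on "expected specifier-qualifier-list" errors.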

Henk Schoneveld
Bradley Settlemyer | 5 Mar 2009 01:15

MX help

Hello

  I am trying to use PAV to run PVFS with the MX protocol.  I've
updated pav so that servers start and ping correctly.  But when I try
to run an MPI code, I get client timeouts, as if the client
cannot contact the servers:

Lots of this stuff:

[E 19:11:02.573509] job_time_mgr_expire: job time out: cancelling bmi
operation, job_id: 3.
[E 19:11:02.583659] msgpair failed, will retry: Operation cancelled
(possibly due to timeout)

I have no problem acknowledging that I've done something wrong, but I
don't know how to debug MX at all.  Any pointers to at least get me
started?

Cheers,
brad
Robert Latham | 5 Mar 2009 14:46

Re: MX help

On Wed, Mar 04, 2009 at 07:15:24PM -0500, Bradley Settlemyer wrote:
> Hello
> 
>   I am trying to use PAV to run pvfs with the MX protocol.  I've
> updated pav so that servers start and ping correctly.  But when I try
> and run an mpi code, I'm getting client timeouts like the client
> cannot contact the servers:
> 
> Lots of this stuff:
> 
> [E 19:11:02.573509] job_time_mgr_expire: job time out: cancelling bmi
> operation, job_id: 3.
> [E 19:11:02.583659] msgpair failed, will retry: Operation cancelled
> (possibly due to timeout)

OK, so the PVFS utilities are all hunky-dory? Not just pvfs2-ping but
also pvfs2-cp and pvfs2-ls?

On Jazz, I usually configure MPICH2 to communicate over TCP and have
the PVFS system interface communicate over MX.  This keeps the
situation fairly simple, but of course you get awful MPI performance.

Does MX still have the "ports" restriction that GM has?  I wonder if
MPI communication is getting in the way of PVFS communication...

In short, I don't exactly know what's wrong myself.  I'm just tossing out
some theories.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
