Fernando Lemos | 1 Mar 2010 03:11

[OMPI users] Segfault in ompi-restart (ft-enable-cr)

Hello,

I'm trying to come up with a fault tolerant OpenMPI setup for research
purposes. I'm doing some tests now, but I'm stuck with a segfault when
I try to restart my test program from a checkpoint.

My test program is the "ring" program, where messages are sent to the
next node in the ring N times. It's pretty simple; I can supply the
source code if needed. I'm running it like this:

# mpirun -np 4 -am ft-enable-cr ring
...
>>> Process 1 sending 703 to 2
>>> Process 3 received 704
>>> Process 3 sending 704 to 0
>>> Process 3 received 703
>>> Process 3 sending 703 to 0
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 18358 on node debian1
exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
4 total processes killed (some possibly by mpirun during cleanup)

That's the output when I ompi-checkpoint the mpirun PID from another terminal.

The checkpoint is taken just fine in maybe 1.5 seconds. I can see the
checkpoint directory has been created in $HOME.

This is what I get when I try to run ompi-restart


David Turner | 1 Mar 2010 09:42

[OMPI users] sm btl choices

Hi all,

Running on a large cluster of 8-core nodes.  I understand
that the SM BTL is a "good thing".  But I'm curious about
its use of memory-mapped files.  I believe these files will
be in $TMPDIR, which defaults to /tmp.

In our cluster, the compute nodes are stateless, so /tmp
is actually in RAM.  Keeping memory-mapped "files" in
memory seems kind of circular, although I know little
about these things.  A bigger problem is that it appears
OMPI does not remove the files upon completion.

Another option is to redefine $TMPDIR to point to a
"real" file system.  In our cluster, all the available
file systems are accessed over the IB fabric.  So it
seems that there will be IB traffic, even though the
point of the SM BTL is to avoid this traffic.

Given the above two constraints, might it just be
better to disable the SM BTL entirely, and use the
IB BTL even within a node?  Of course, the "self"
BTL should still be used if appropriate.

Any thoughts clarifying these issues would be
greatly appreciated.  Thanks!

-- 
Best regards,


Ralph Castain | 1 Mar 2010 10:51

Re: [OMPI users] sm btl choices

Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session
directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or
1.4 series, I would definitely like to know about it.

When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some
performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency
wasn't all that different either). I set the default configuration for users to include sm as 10% isn't
something to sneer at, but you could disable it without an enormous impact.

Another option would be to run an epilog that hammers the session directory. That's what LANL does, even
though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users
stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy
of their epilog script.

On Mar 1, 2010, at 1:42 AM, David Turner wrote:

> Hi all,
> 
> Running on a large cluster of 8-core nodes.  I understand
> that the SM BTL is a "good thing".  But I'm curious about
> its use of memory-mapped files.  I believe these files will
> be in $TMPDIR, which defaults to /tmp.
> 
> In our cluster, the compute nodes are stateless, so /tmp
> is actually in RAM.  Keeping memory-mapped "files" in
> memory seems kind of circular, although I know little
> about these things.  A bigger problem is that it appears
> OMPI does not remove the files upon completion.
> 
> Another option is to redefine $TMPDIR to point to a

Federico Golfrè Andreasi | 1 Mar 2010 11:51

Re: [OMPI users] Number of processes and spawn

Ok, thank you !

Where can I find instructions for downloading the developer's copy of OpenMPI, if that is possible?

I'd like to test it just to be sure that the problem is solved, with that patch.

Can you let me know where that patch is available?

Thank you very much,

Federico




2010/2/27 Ralph Castain <rhc <at> open-mpi.org>
Okay, thanks. It's the same problem as the other person encountered. Basically, it looks to OMPI as if you are launching > 128 independent app contexts, and our arrays were limited to 128.

He has provided a patch that I'll review (couple of things I'd rather change) and then apply to our developer's trunk. I would expect it to migrate over to the 1.4 release series at some point (can't guarantee which one).
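
To illustrate the kind of limit described above, here is a minimal C sketch. It is not Open MPI's actual code: the names `fixed_table`, `growable_table`, `fixed_add`, `growable_add` and `growable_free` are hypothetical, and only the figure 128 comes from this thread. It contrasts a fixed-size array, which rejects the 129th entry, with one that grows on demand.

```c
#include <stdlib.h>

#define FIXED_CAPACITY 128  /* the array limit mentioned in the thread */

/* Hypothetical fixed-size table: entry number 129 cannot be stored. */
struct fixed_table {
    int ids[FIXED_CAPACITY];
    size_t count;
};

/* Returns 0 on success, -1 once the table is full. */
int fixed_add(struct fixed_table *t, int id)
{
    if (t->count >= FIXED_CAPACITY) {
        return -1;
    }
    t->ids[t->count++] = id;
    return 0;
}

/* Hypothetical growable table: capacity doubles whenever it runs out. */
struct growable_table {
    int *ids;
    size_t count;
    size_t capacity;
};

/* Returns 0 on success, -1 if memory cannot be allocated. */
int growable_add(struct growable_table *t, int id)
{
    if (t->count == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 16;
        int *p = realloc(t->ids, new_cap * sizeof *p);
        if (p == NULL) {
            return -1;
        }
        t->ids = p;
        t->capacity = new_cap;
    }
    t->ids[t->count++] = id;
    return 0;
}

/* Releases the storage and resets the table to its empty state. */
void growable_free(struct growable_table *t)
{
    free(t->ids);
    t->ids = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

Both tables are meant to start zero-initialized (for example `struct growable_table g = {0};`). In this sketch the fixed table reports the overflow; code that does not check for it could stall or misbehave instead, which may be closer to the hang reported here.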


On Feb 27, 2010, at 6:47 AM, Federico Golfrè Andreasi wrote:

Hi,

The program is executed as one application on 129 CPUs defined by the hostfile.
Then rank 0, inside the code, executes another program with 129 CPUs, with a one-to-one relation: rank 0 of the spawned process runs on the same host as rank 0 of the spawning one, and so on...
Executing the spawning program does not give any problem,
but at the moment of spawning (with more than 128 CPUs) it hangs.

Thank you!

Federico




2010/2/27 Ralph Castain <rhc <at> open-mpi.org>
Since another user was doing something that caused a similar problem, perhaps we are missing a key piece of info here. Are you launching one app_context across 128 nodes? Or are you launching 128 app_contexts, each on a separate node?


On Feb 26, 2010, at 10:23 AM, Federico Golfrè Andreasi wrote:

I'm doing some tests and it seems that it is not able to do a spawn multiple with more than 128 nodes.

It just hangs, with no error message.

What do you think? What can I try in order to understand the problem?

Thanks,

Federico




2010/2/26 Ralph Castain <rhc <at> open-mpi.org>
No known limitations of which we are aware...the variables are all set to int32_t, so INT32_MAX would be the only limit I can imagine. In which case, you'll run out of memory long before you hit it.


2010/2/26 Federico Golfrè Andreasi <federico.golfre <at> gmail.com>
Hi!

Have you ever done any analysis to understand whether there is a limit on the number of nodes usable with OpenMPI-v1.4?
Also when using the functions MPI_Comm_spawn or MPI_Comm_spawn_multiple.

Thanks,
   Federico

_______________________________________________
users mailing list
users <at> open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Timur Magomedov | 1 Mar 2010 11:55

Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)

Hello.
It looks like you allocate memory in every loop iteration on process #0
and don't free it, so malloc fails on some iteration.
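
A minimal sketch of the suggested change, assuming the receive can be abstracted behind a callback so that the loop compiles without MPI: `recv_fn`, `receive_from_all` and `fill_with_source` are hypothetical names, and in the real program the callback would wrap `MPI_Recv(d, n, MPI_DOUBLE, pe, 999, MPI_COMM_WORLD, &status)`. The buffer is allocated once, checked, reused for every sender, and freed at the end. This only removes the leak; it is not claimed to explain the segfault.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the MPI_Recv call in the posted code.
 * Returns 0 on success, non-zero on failure. */
typedef int (*recv_fn)(double *buf, int n, int source);

/* Receive n doubles from each of ranks 1..nprocs-1 into one reused buffer.
 * Returns 0 on success, -1 if the allocation or any receive fails. */
int receive_from_all(int nprocs, int n, recv_fn do_recv)
{
    /* Allocate once, outside the loop, and check the result. */
    double *d = malloc(sizeof *d * (size_t)n);
    if (d == NULL) {
        return -1;
    }

    int rc = 0;
    for (int pe = 1; pe < nprocs; pe++) {
        if (do_recv(d, n, pe) != 0) {
            rc = -1;
            break;
        }
    }

    free(d);  /* released on both the success and the failure path */
    return rc;
}

/* Dummy receiver used only to exercise the loop without MPI. */
int fill_with_source(double *buf, int n, int source)
{
    for (int i = 0; i < n; i++) {
        buf[i] = (double)source;
    }
    return 0;
}
```

Calling `receive_from_all(3, 10000, fill_with_source)` runs the same loop shape as the posted code for three processes, with a single allocation instead of one per iteration.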

On Sun, 28/02/2010 at 19:22 +0100, TRINH Minh Hieu wrote:
> Hello,
> 
> I have some problems running MPI on my heterogeneous cluster. More
> precisely, I get a segmentation fault when sending a large array (about
> 10000) of doubles from an i686 machine to an x86_64 machine. It does not
> happen with small arrays. Here is the send/recv source code (the complete
> source is in the attached file):
> ========code ================
>     if (me == 0 ) {
> 	for (int pe=1; pe<nprocs; pe++)
> 	{
> 		printf("Receiving from proc %d : ",pe); fflush(stdout);
> 	    d=(double *)malloc(sizeof(double)*n);
> 	    MPI_Recv(d,n,MPI_DOUBLE,pe,999,MPI_COMM_WORLD,&status);
> 	    printf("OK\n"); fflush(stdout);
> 	}
> 	printf("All done.\n");
>     }
>     else {
>       d=(double *)malloc(sizeof(double)*n);
>       MPI_Send(d,n,MPI_DOUBLE,0,999,MPI_COMM_WORLD);
>     }
> ======== code ================
> 
> I got segmentation fault with n=10000 but no error with n=1000
> I have 2 machines :
> sbtn155 : Intel Xeon,         x86_64
> sbtn211 : Intel Pentium 4, i686
> 
> The code is compiled on both the x86_64 and i686 machines, using OpenMPI 1.4.1,
> installed in /tmp/openmpi :
> [mhtrinh <at> sbtn211 heterogenous]$ make hetero
> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.i686.o
> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
> hetero.i686.o -o hetero.i686 -lm
> 
> [mhtrinh <at> sbtn155 heterogenous]$ make hetero
> gcc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include -c hetero.c -o hetero.x86_64.o
> /tmp/openmpi/bin/mpicc -Wall -I. -std=c99 -O3 -I/tmp/openmpi/include
> hetero.x86_64.o -o hetero.x86_64 -lm
> 
> I run the code using an appfile and get these errors:
> $ cat appfile
> --host sbtn155 -np 1 hetero.x86_64
> --host sbtn155 -np 1 hetero.x86_64
> --host sbtn211 -np 1 hetero.i686
> 
> $ mpirun -hetero --app appfile
> Input array length :
> 10000
> Receiving from proc 1 : OK
> Receiving from proc 2 : [sbtn155:26386] *** Process received signal ***
> [sbtn155:26386] Signal: Segmentation fault (11)
> [sbtn155:26386] Signal code: Address not mapped (1)
> [sbtn155:26386] Failing at address: 0x200627bd8
> [sbtn155:26386] [ 0] /lib64/libpthread.so.0 [0x3fa4e0e540]
> [sbtn155:26386] [ 1] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2aaaad8d7908]
> [sbtn155:26386] [ 2] /tmp/openmpi/lib/openmpi/mca_btl_tcp.so [0x2aaaae2fc6e3]
> [sbtn155:26386] [ 3] /tmp/openmpi/lib/libopen-pal.so.0 [0x2aaaaafe39db]
> [sbtn155:26386] [ 4]
> /tmp/openmpi/lib/libopen-pal.so.0(opal_progress+0x9e) [0x2aaaaafd8b9e]
> [sbtn155:26386] [ 5] /tmp/openmpi/lib/openmpi/mca_pml_ob1.so [0x2aaaad8d4b25]
> [sbtn155:26386] [ 6] /tmp/openmpi/lib/libmpi.so.0(MPI_Recv+0x13b)
> [0x2aaaaab30f9b]
> [sbtn155:26386] [ 7] hetero.x86_64(main+0xde) [0x400cbe]
> [sbtn155:26386] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3fa421e074]
> [sbtn155:26386] [ 9] hetero.x86_64 [0x400b29]
> [sbtn155:26386] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 26386 on node sbtn155
> exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> Am I missing an option needed to run on a heterogeneous cluster?
> Do MPI_Send/Recv have an array size limit when using a heterogeneous cluster?
> Thanks for your help. Regards
> 

-- 
Kind regards,
Timur Magomedov
Senior C++ Developer
DevelopOnBox LLC / Zodiac Interactive
http://www.zodiac.tv/

TRINH Minh Hieu | 1 Mar 2010 14:55

Re: [OMPI users] Segmentation fault when Send/Recv on heterogeneous cluster (32/64 bit machines)

Hi,

The problem is not there.
I added a "free" and a check of malloc's return value, but I still get the
segfault (updated source code attached).

I discovered that the size of the array I can send is limited to 64 kB. If I send
8192 doubles, it's OK, but more causes a segfault. I also changed the code
to send floats instead of doubles: in that case, I can send an array
of 16384 floats (64 kB) but no more.
Is there a parameter about packet size when building OpenMPI?
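
The arithmetic behind these numbers can be checked with a short C sketch. The 64 kB threshold is only the value observed in this message, not a documented Open MPI limit, and `REPORTED_LIMIT_BYTES`, `max_elems_within` and `exceeds_reported_limit` are hypothetical names introduced for illustration. The sketch assumes the usual 8-byte double and 4-byte float.

```c
#include <stddef.h>

/* Threshold observed in this thread: 64 kB = 64 * 1024 bytes. */
#define REPORTED_LIMIT_BYTES ((size_t)64 * 1024)

/* Largest element count whose total size stays within limit_bytes. */
size_t max_elems_within(size_t limit_bytes, size_t elem_size)
{
    return elem_size ? limit_bytes / elem_size : 0;
}

/* Non-zero if a message of `count` elements is larger than the threshold. */
int exceeds_reported_limit(size_t count, size_t elem_size)
{
    return count * elem_size > REPORTED_LIMIT_BYTES;
}
```

With 8-byte doubles, `max_elems_within(REPORTED_LIMIT_BYTES, sizeof(double))` gives 8192, and with 4-byte floats it gives 16384, matching the counts above. The same threshold is consistent with the original report: 10000 doubles (80000 bytes) are above it, 1000 doubles (8000 bytes) are below it.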

Regards,

> From: Timur Magomedov (timur.magomedov_at_[hidden])
> Date: 2010-03-01 05:55:44
>
> Hello.
> It looks like you allocate memory in every loop iteration on process #0
> and don't free it, so malloc fails on some iteration.

-- 
============================================
   M. TRINH Minh Hieu
   CEA, IBEB, SBTN/LIRM,
   F-30207 Bagnols-sur-Cèze, FRANCE
============================================
Attachment (hetero.c): text/x-csrc, 1504 bytes
Ralph Castain | 1 Mar 2010 15:41

Re: [OMPI users] Number of processes and spawn

http://www.open-mpi.org/nightly/trunk/

I'm not sure this patch will solve your problem, but it is worth a try.


On Mar 1, 2010, at 3:51 AM, Federico Golfrè Andreasi wrote:

Ok, thank you !

Where can I find instructions for downloading the developer's copy of OpenMPI, if that is possible?

I'd like to test it just to be sure that the problem is solved, with that patch.

Can you let me know where that patch is available?

Thank you very much,

Federico




2010/2/27 Ralph Castain <rhc <at> open-mpi.org>
Okay, thanks. It's the same problem as the other person encountered. Basically, it looks to OMPI as if you are launching > 128 independent app contexts, and our arrays were limited to 128.

He has provided a patch that I'll review (couple of things I'd rather change) and then apply to our developer's trunk. I would expect it to migrate over to the 1.4 release series at some point (can't guarantee which one).


On Feb 27, 2010, at 6:47 AM, Federico Golfrè Andreasi wrote:

Hi,

The program is executed as one application on 129 CPUs defined by the hostfile.
Then rank 0, inside the code, executes another program with 129 CPUs, with a one-to-one relation: rank 0 of the spawned process runs on the same host as rank 0 of the spawning one, and so on...
Executing the spawning program does not give any problem,
but at the moment of spawning (with more than 128 CPUs) it hangs.

Thank you!

Federico




2010/2/27 Ralph Castain <rhc <at> open-mpi.org>
Since another user was doing something that caused a similar problem, perhaps we are missing a key piece of info here. Are you launching one app_context across 128 nodes? Or are you launching 128 app_contexts, each on a separate node?


On Feb 26, 2010, at 10:23 AM, Federico Golfrè Andreasi wrote:

I'm doing some tests and it seems that it is not able to do a spawn multiple with more than 128 nodes.

It just hangs, with no error message.

What do you think? What can I try in order to understand the problem?

Thanks,

Federico




2010/2/26 Ralph Castain <rhc <at> open-mpi.org>
No known limitations of which we are aware...the variables are all set to int32_t, so INT32_MAX would be the only limit I can imagine. In which case, you'll run out of memory long before you hit it.


2010/2/26 Federico Golfrè Andreasi <federico.golfre <at> gmail.com>
Hi!

Have you ever done any analysis to understand whether there is a limit on the number of nodes usable with OpenMPI-v1.4?
Also when using the functions MPI_Comm_spawn or MPI_Comm_spawn_multiple.

Thanks,
   Federico

David Turner | 1 Mar 2010 16:41

Re: [OMPI users] sm btl choices

On 3/1/10 1:51 AM, Ralph Castain wrote:
> Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session
> directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or
> 1.4 series, I would definitely like to know about it.

Oops; sorry!  OMPI 1.4.1, compiled with PGI 10.0 compilers,
running on Scientific Linux 5.4, ofed 1.4.2.

The session directories are *frequently* left behind.  I have
not really tried to characterize under what circumstances they
are removed. But please confirm:  they *should* be removed by
OMPI.

> When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some
> performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency
> wasn't all that different either). I set the default configuration for users to include sm as 10% isn't
> something to sneer at, but you could disable it without an enormous impact.

I'd prefer to provide as much performance as possible, also.

> Another option would be to run an epilog that hammers the session directory. That's what LANL does, even
> though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users
> stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy
> of their epilog script.

Yes, we are already planning our prologues and epilogues; we just
haven't implemented them yet.  Even if I can find and fix a
reason why OMPI is currently not doing this, we will probably
do it in an epilogue anyway.

Thanks for your help!

> On Mar 1, 2010, at 1:42 AM, David Turner wrote:
> 
>> Hi all,
>>
>> Running on a large cluster of 8-core nodes.  I understand
>> that the SM BTL is a "good thing".  But I'm curious about
>> its use of memory-mapped files.  I believe these files will
>> be in $TMPDIR, which defaults to /tmp.
>>
>> In our cluster, the compute nodes are stateless, so /tmp
>> is actually in RAM.  Keeping memory-mapped "files" in
>> memory seems kind of circular, although I know little
>> about these things.  A bigger problem is that it appears
>> OMPI does not remove the files upon completion.
>>
>> Another option is to redefine $TMPDIR to point to a
>> "real" file system.  In our cluster, all the available
>> file systems are accessed over the IB fabric.  So it
>> seems that there will be IB traffic, even though the
>> point of the SM BTL is to avoid this traffic.
>>
>> Given the above two constraints, might it just be
>> better to disable the SM BTL entirely, and use the
>> IB BTL even within a node?  Of course, the "self"
>> BTL should still be used if appropriate.
>>
>> Any thoughts clarifying these issues would be
>> greatly appreciated.  Thanks!
>>
>> -- 
>> Best regards,
>>
>> David Turner
>> User Services Group        email: dpturner <at> lbl.gov
>> NERSC Division             phone: (510) 486-4027
>> Lawrence Berkeley Lab        fax: (510) 486-4316

-- 
Best regards,

David Turner
User Services Group        email: dpturner <at> lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316
Ralph Castain | 1 Mar 2010 17:51

Re: [OMPI users] sm btl choices


On Mar 1, 2010, at 8:41 AM, David Turner wrote:

> On 3/1/10 1:51 AM, Ralph Castain wrote:
>> Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session
>> directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or
>> 1.4 series, I would definitely like to know about it.
> 
> Oops; sorry!  OMPI 1.4.1, compiled with PGI 10.0 compilers,
> running on Scientific Linux 5.4, ofed 1.4.2.
> 
> The session directories are *frequently* left behind.  I have
> not really tried to characterize under what circumstances they
> are removed. But please confirm:  they *should* be removed by
> OMPI.

Most definitely - they should always be removed by OMPI. This is the first report we have had of them -not-
being removed in the 1.4 series, so it is disturbing.

What environment are you running under? Does this happen under normal termination, or under abnormal
failures (the more you can tell us, the better)?

> 
>> When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some
>> performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency
>> wasn't all that different either). I set the default configuration for users to include sm as 10% isn't
>> something to sneer at, but you could disable it without an enormous impact.
> 
> I'd prefer to provide as much performance as possible, also.
> 
>> Another option would be to run an epilog that hammers the session directory. That's what LANL does, even
>> though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users
>> stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy
>> of their epilog script.
> 
> Yes, we are already planning our prologues and epilogues; we just
> haven't implemented them yet.  Even if I can find and fix a
> reason why OMPI is currently not doing this, we will probably
> do it in an epilogue anyway.
> 
> Thanks for your help!
> 
>> On Mar 1, 2010, at 1:42 AM, David Turner wrote:
>>> Hi all,
>>> 
>>> Running on a large cluster of 8-core nodes.  I understand
>>> that the SM BTL is a "good thing".  But I'm curious about
>>> its use of memory-mapped files.  I believe these files will
>>> be in $TMPDIR, which defaults to /tmp.
>>> 
>>> In our cluster, the compute nodes are stateless, so /tmp
>>> is actually in RAM.  Keeping memory-mapped "files" in
>>> memory seems kind of circular, although I know little
>>> about these things.  A bigger problem is that it appears
>>> OMPI does not remove the files upon completion.
>>> 
>>> Another option is to redefine $TMPDIR to point to a
>>> "real" file system.  In our cluster, all the available
>>> file systems are accessed over the IB fabric.  So it
>>> seems that there will be IB traffic, even though the
>>> point of the SM BTL is to avoid this traffic.
>>> 
>>> Given the above two constraints, might it just be
>>> better to disable the SM BTL entirely, and use the
>>> IB BTL even within a node?  Of course, the "self"
>>> BTL should still be used if appropriate.
>>> 
>>> Any thoughts clarifying these issues would be
>>> greatly appreciated.  Thanks!
>>> 
>>> -- 
>>> Best regards,
>>> 
>>> David Turner
>>> User Services Group        email: dpturner <at> lbl.gov
>>> NERSC Division             phone: (510) 486-4027
>>> Lawrence Berkeley Lab        fax: (510) 486-4316
> 
> 
> -- 
> Best regards,
> 
> David Turner
> User Services Group        email: dpturner <at> lbl.gov
> NERSC Division             phone: (510) 486-4027
> Lawrence Berkeley Lab        fax: (510) 486-4316
David Turner | 1 Mar 2010 18:04

Re: [OMPI users] sm btl choices

Hi Ralph,

> Which version of OMPI are you using? We know that the 1.2 series was unreliable about removing the session
> directories, but 1.3 and above appear to be quite good about it. If you are having problems with the 1.3 or
> 1.4 series, I would definitely like to know about it.
> 
> When I was at LANL, I ran a number of tests in exactly this configuration. While the sm btl did provide some
> performance advantage, it wasn't very much (the bandwidth was only about 10% greater, and the latency
> wasn't all that different either). I set the default configuration for users to include sm as 10% isn't
> something to sneer at, but you could disable it without an enormous impact.

I realize I have another question about this.  When you say "exactly"
this configuration, do you mean the mmap files were backed to /tmp
via ramdisk, or to a remote file system over the communications fabric?

We have historically redefined TMPDIR to point somewhere other than
/tmp, and have told our users *never* to use /tmp (if possible).
I suppose that if OMPI cleans up after itself, and we use a
prologue/epilogue, and regular scrubbing, we can keep /tmp under
control.

> Another option would be to run an epilog that hammers the session directory. That's what LANL does, even
> though we didn't see much trouble with cleanup starting with the 1.3 series (still have a bunch of users
> stuck on 1.2). Depending on what environment you are running, you might contact folks there and get a copy
> of their epilog script.
> 
> 
> On Mar 1, 2010, at 1:42 AM, David Turner wrote:
> 
>> Hi all,
>>
>> Running on a large cluster of 8-core nodes.  I understand
>> that the SM BTL is a "good thing".  But I'm curious about
>> its use of memory-mapped files.  I believe these files will
>> be in $TMPDIR, which defaults to /tmp.
>>
>> In our cluster, the compute nodes are stateless, so /tmp
>> is actually in RAM.  Keeping memory-mapped "files" in
>> memory seems kind of circular, although I know little
>> about these things.  A bigger problem is that it appears
>> OMPI does not remove the files upon completion.
>>
>> Another option is to redefine $TMPDIR to point to a
>> "real" file system.  In our cluster, all the available
>> file systems are accessed over the IB fabric.  So it
>> seems that there will be IB traffic, even though the
>> point of the SM BTL is to avoid this traffic.
>>
>> Given the above two constraints, might it just be
>> better to disable the SM BTL entirely, and use the
>> IB BTL even within a node?  Of course, the "self"
>> BTL should still be used if appropriate.
>>
>> Any thoughts clarifying these issues would be
>> greatly appreciated.  Thanks!
>>
>> -- 
>> Best regards,
>>
>> David Turner
>> User Services Group        email: dpturner <at> lbl.gov
>> NERSC Division             phone: (510) 486-4027
>> Lawrence Berkeley Lab        fax: (510) 486-4316

-- 
Best regards,

David Turner
User Services Group        email: dpturner <at> lbl.gov
NERSC Division             phone: (510) 486-4027
Lawrence Berkeley Lab        fax: (510) 486-4316
