- Make it so the SLURM ras can handle different nodelist configurations
- Some code cleanup and better/more informative error messages and error handling
This commit was SVN r13271.
The following Trac tickets were found above:
Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801
then exec the "srun..." from there. But somewhere along the line, we
switched to having a copy of environ and modifying that. It looks
like we forgot to update the stuff for --prefix behavior. So this
commit fixes the setenv's for PATH and LD_LIBRARY_PATH to modify the
environ copy (not environ itself) so that the values properly get
passed down to the srun environment via execve().
This restores --prefix behavior in the SLURM pls.
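
As a rough illustration of the fix described above (not the actual SLURM pls code), the sketch below edits a private copy of the environment instead of calling setenv() on the process's own environ. The helper names env_copy_dup, env_copy_get, env_copy_set and env_copy_prepend_path are hypothetical, as is the choice to prepend the prefix directory with a ':' separator.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: deep-copy a NULL-terminated environment array. */
char **env_copy_dup(char **src)
{
    size_t n = 0;
    while (src[n] != NULL) n++;
    char **dst = calloc(n + 1, sizeof(*dst));
    if (dst == NULL) return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        dst[i] = malloc(len);
        if (dst[i] == NULL) {            /* undo the partial copy */
            while (i > 0) free(dst[--i]);
            free(dst);
            return NULL;
        }
        memcpy(dst[i], src[i], len);
    }
    dst[n] = NULL;
    return dst;
}

/* Hypothetical helper: return the value of NAME in env, or NULL. */
const char *env_copy_get(char **env, const char *name)
{
    size_t nlen = strlen(name);
    for (size_t i = 0; env[i] != NULL; i++) {
        if (strncmp(env[i], name, nlen) == 0 && env[i][nlen] == '=') {
            return env[i] + nlen + 1;
        }
    }
    return NULL;
}

/* Hypothetical helper: set NAME=VALUE in the copy, replacing an existing
 * entry or appending a new one. Returns the (possibly reallocated) array;
 * on failure returns NULL and leaves env untouched. */
char **env_copy_set(char **env, const char *name, const char *value)
{
    size_t nlen = strlen(name), count = 0;
    char *entry = malloc(nlen + strlen(value) + 2);
    if (entry == NULL) return NULL;
    sprintf(entry, "%s=%s", name, value);

    for (; env[count] != NULL; count++) {
        if (strncmp(env[count], name, nlen) == 0 && env[count][nlen] == '=') {
            free(env[count]);
            env[count] = entry;
            return env;
        }
    }
    char **grown = realloc(env, (count + 2) * sizeof(*env));
    if (grown == NULL) { free(entry); return NULL; }
    grown[count] = entry;
    grown[count + 1] = NULL;
    return grown;
}

/* Hypothetical helper: put DIR in front of a ':'-separated variable such
 * as PATH or LD_LIBRARY_PATH, touching only the copy. */
char **env_copy_prepend_path(char **env, const char *name, const char *dir)
{
    const char *old = env_copy_get(env, name);
    if (old == NULL || *old == '\0') return env_copy_set(env, name, dir);

    char *joined = malloc(strlen(dir) + strlen(old) + 2);
    if (joined == NULL) return NULL;
    sprintf(joined, "%s:%s", dir, old);   /* built before the old entry is freed */
    char **result = env_copy_set(env, name, joined);
    free(joined);
    return result;
}
```

The resulting array would then be handed to execve() as its third argument, so the launched srun sees the adjusted PATH and LD_LIBRARY_PATH while the parent's environ stays unchanged.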
This commit was SVN r13239.
memcpy() instead of assigning the structs by value.
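
A minimal sketch of the idea, assuming a made-up struct (example_opts is hypothetical and does not appear in the code base): the copy goes through memcpy() rather than a by-value assignment such as `*dst = *src;`.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical struct standing in for the ones touched by this change. */
struct example_opts {
    bool verbose;
    int  count;
};

/* Copy with memcpy() instead of struct assignment, so a compiler that
 * mishandles by-value copies of structs containing bool is never asked
 * to generate one. */
void example_opts_copy(struct example_opts *dst,
                       const struct example_opts *src)
{
    memcpy(dst, src, sizeof(*dst));
}
```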
Fixes trac:739.
This commit was SVN r13081.
The following Trac tickets were found above:
Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
Sorry for the configure change -- hopefully it's early enough in the
morning that it won't affect people... (new approach won't have a
configure change).
Refs trac:739.
This commit was SVN r13080.
The following Trac tickets were found above:
Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
Let's minimize the disturbances and say that the configure system is right.
From now on it's OPAL_BOOL_STRUCT_COPY. This one is related to r13076 and
has to follow when r13076 goes in the 1.2.
This commit was SVN r13077.
The following SVN revision numbers were found above:
r13076 --> open-mpi/ompi@f0932a0701
been fixed in the 7.0 PGI series, but is unlikely to be fixed in the
6.2 series:
* Add a configure test looking for the bad behavior (the PGI compiler
chokes on C code where structs containing bools are copied by
value)
* Set OMPI_BOOL_STRUCT_COPY to 1 if it's ok, 0 if it's not (i.e., PGI
6.2 series will have this value set to 0)
* In two places in the code base -- orte-clean and btl_openib_ini.h,
we have a struct that contains a bool that is copied by value. In
these two places, check OMPI_BOOL_STRUCT_COPY and if it's 0, use
the "int" type instead of "bool".
Fixes trac:739
This commit was SVN r13076.
The following Trac tickets were found above:
Ticket 739 --> https://svn.open-mpi.org/trac/ompi/ticket/739
function prototype lives. Without this, we get compile
warnings. In addition, for 64-bit Solaris, we get a
segmentation fault from orterun without this include.
This commit was SVN r13065.
the connect() timeout, so that we'll use that rather than our own timeout by
default. The timeout was set low for Big Red, but causes problems for very
large clusters, as there's no way to wire them up in 10 seconds most of the
time.
This commit was SVN r13062.
to effect the following:
* The first time the user hits ctrl-c, we go into the process of
killing the ORTE job (this is not new).
* While waiting for the job to actually terminate, if the user hits
ctrl-c a second time, we print a warning saying "Hey, I'm still
trying to kill the job. If you *really* want me to die
immediately, hit ctrl-c again within 1 second."
* If the user hits ctrl-c a third time within 1 second, orterun quits with a
warning about how the job may not have actually been killed.
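
The three steps above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the orterun source: every name below is hypothetical, and the choice that a late ctrl-c simply re-arms the warning is a guess at the intended behavior.

```c
#include <time.h>

/* All names here are hypothetical. */
enum interrupt_action {
    ACTION_TERMINATE_JOB,   /* first ctrl-c: start killing the job */
    ACTION_WARN,            /* ctrl-c while still terminating: print warning */
    ACTION_FORCE_EXIT       /* ctrl-c again within the window: quit now */
};

struct interrupt_state {
    int    terminating;     /* nonzero once job termination has begun */
    int    warned;          /* nonzero once a warning has been printed */
    time_t warned_at;       /* when the most recent warning was printed */
};

#define FORCE_EXIT_WINDOW_SEC 1

/* Decide what to do for one ctrl-c arriving at time `now`. */
enum interrupt_action on_interrupt(struct interrupt_state *st, time_t now)
{
    if (!st->terminating) {
        st->terminating = 1;
        return ACTION_TERMINATE_JOB;
    }
    if (st->warned &&
        difftime(now, st->warned_at) <= FORCE_EXIT_WINDOW_SEC) {
        return ACTION_FORCE_EXIT;
    }
    /* Assumption: a ctrl-c that arrives too late just re-arms the warning. */
    st->warned = 1;
    st->warned_at = now;
    return ACTION_WARN;
}
```

Keeping the decision in a pure function of the state and the current time keeps it separate from signal delivery, so the caller can print the warning or exit outside any signal handler.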
Note that none of this logic will really work until the second part
of the fix for #726 is also committed (i.e., make pls.terminate_job()
non-blocking). So I'm now throwing the ticket over to Ralph for the
second part of the fix...
Refs trac:726
This commit was SVN r13040.
The following Trac tickets were found above:
Ticket 726 --> https://svn.open-mpi.org/trac/ompi/ticket/726
components that use configure.m4 for configuration or are always built.
The macro has not been needed since moving to configure types other than
configure.stub
Fixes trac:590
This commit was SVN r13031.
The following Trac tickets were found above:
Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
but remove them also. This current set of changes will affect
nothing as no one is making use of this ability. However, orte-clean
will be changed soon to utilize this new feature.
This commit was SVN r12996.
I know it's just a technicality, but it is time to address such things rather than just letting them continue to propagate. :-)
This commit was SVN r12954.
This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it".
This change will need to migrate across to 1.2 after it "soaks" the appropriate time.
This commit was SVN r12952.
is allocated on a per comm_world instance, with the lowest rank
in comm_world on the given host creating and initializing the file,
and then notifying the remaining processes via the OOB.
Reviewed: Ralph Castain, Brian Barrett
Addressing ticket #674.
This commit was SVN r12949.
rc (which is -1 or 4 if we hit this case) resulted in an odd error that a
signal killed the proc (instead of a startup error, as is reality).
Instead, use the W_EXITCODE macro (if available) to build up an exit
code that has an error code for exit status, but does not make it look
like the process died from a signal.
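
A hedged sketch of that approach follows. The helper name startup_failure_status and the EXAMPLE_EXITCODE wrapper are hypothetical; the fallback encoding and the mapping of rc into the 1..255 range are assumptions based on the common Linux/BSD status layout, not taken from the commit.

```c
#include <sys/wait.h>

/* Use W_EXITCODE when the platform provides it. The fallback is an
 * assumption matching the common layout: exit status in bits 8-15,
 * terminating signal in the low bits. */
#ifdef W_EXITCODE
#define EXAMPLE_EXITCODE(ret, sig) W_EXITCODE((ret), (sig))
#else
#define EXAMPLE_EXITCODE(ret, sig) (((ret) << 8) | (sig))
#endif

/* Hypothetical helper: turn a startup error code into a wait()-style
 * status that reads as "exited with an error code", not "killed by a
 * signal". */
int startup_failure_status(int rc)
{
    /* Assumption: squeeze rc into 1..255 so it fits the 8-bit exit
     * status field and is never mistaken for success. */
    int code = rc & 0xff;
    if (code == 0) {
        code = 1;
    }
    return EXAMPLE_EXITCODE(code, 0);
}
```

With a status built this way, WIFEXITED() is true and WEXITSTATUS() returns the error code, whereas passing a raw value such as 4 as the status can be decoded as a signal number.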
This commit was SVN r12890.