and George on these refinements):
* Rename the static OBJ initializer macro to be
OPAL_OBJ_STATIC_INIT(class)
* Ensure that all static OBJ initializations get a refcount of 1
(doesn't ''really'' matter, since they're static, it should never
get to the point where the OBJ is DESTRUCTed, but more correct
nonetheless)
* Add a "magic number" to the OBJ when compiling with debug support.
The magic number provides a rudimentary check that you're
operating on a valid OBJ (and fails an assertion if you're
not). Check that the memory contains the magic number
when performing actions on OBJs. Also remove the magic number
when DESTRUCTing OBJs, so that if, for example, an OBJ is
DESTRUCTed more than once, we'll fail the magic number assert.
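A minimal sketch of the magic-number idea described above, not the actual OPAL object code. Every name here (demo_obj_t, DEMO_OBJ_MAGIC, DEMO_OBJ_STATIC_INIT, and the helper functions) is hypothetical, and the magic value is arbitrary. It assumes that "debug support" maps to NDEBUG being undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the OPAL object system. */
#define DEMO_OBJ_MAGIC UINT64_C(0xdeafbeeddeafbeed)

typedef struct demo_obj {
#ifndef NDEBUG
    uint64_t magic;    /* only present in debug builds */
#endif
    int refcount;
} demo_obj_t;

#ifndef NDEBUG
/* Static initializer: valid magic number and a refcount of 1. */
#define DEMO_OBJ_STATIC_INIT       { DEMO_OBJ_MAGIC, 1 }
#define DEMO_OBJ_CHECK(obj)        assert((obj)->magic == DEMO_OBJ_MAGIC)
#define DEMO_OBJ_SET_MAGIC(obj, v) ((obj)->magic = (v))
#else
#define DEMO_OBJ_STATIC_INIT       { 1 }
#define DEMO_OBJ_CHECK(obj)        ((void)0)
#define DEMO_OBJ_SET_MAGIC(obj, v) ((void)0)
#endif

/* Stamp the magic number and start with a refcount of 1. */
void demo_obj_construct(demo_obj_t *obj)
{
    DEMO_OBJ_SET_MAGIC(obj, DEMO_OBJ_MAGIC);
    obj->refcount = 1;
}

/* Every operation first checks that it was handed a live object. */
void demo_obj_retain(demo_obj_t *obj)
{
    DEMO_OBJ_CHECK(obj);
    obj->refcount++;
}

/* Clear the magic number so a second destruct trips the assertion
 * in debug builds. */
void demo_obj_destruct(demo_obj_t *obj)
{
    DEMO_OBJ_CHECK(obj);
    DEMO_OBJ_SET_MAGIC(obj, 0);
    obj->refcount = 0;
}
```

In a debug build, calling demo_obj_destruct() twice on the same object aborts at the second DEMO_OBJ_CHECK; in a non-debug build the checks compile away.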
This commit was SVN r13338.
The following SVN revision numbers were found above:
r13227 --> open-mpi/ompi@96030de97b
r13228 --> open-mpi/ompi@c2e9075d29
then we modify the argv, forcing the reallocation of the array. With luck,
the saved pointer still has a meaning ... without it, execve returns with error
14 (EFAULT).
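A sketch of the hazard described above, under the assumption that the argv array is grown with realloc(). The helper demo_argv_append() is hypothetical and is not the Open MPI argv API. The point it illustrates is to remember a slot by index rather than by pointer, since any pointer into the old array may be stale after the array grows.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append a copy of arg to a NULL-terminated argv.
 * Returns 0 on success, -1 on allocation failure. */
int demo_argv_append(char ***argv, int *argc, const char *arg)
{
    size_t len = strlen(arg) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, arg, len);

    /* realloc() may move the array: any char ** saved before this
     * call can now point into freed memory. */
    char **grown = realloc(*argv, (size_t)(*argc + 2) * sizeof(char *));
    if (grown == NULL) {
        free(copy);
        return -1;
    }
    grown[*argc] = copy;
    grown[*argc + 1] = NULL;
    *argv = grown;
    *argc += 1;
    return 0;
}
```

A caller that needs to revisit an entry later can save `int slot = argc - 1;` right after appending it and use `argv[slot]` after further appends, instead of keeping `&argv[slot]` across them.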
This commit was SVN r13321.
The following SVN revision numbers were found above:
r12059 --> open-mpi/ompi@ae79894bad
1. add a "cancel_operation" API to the pls components that allows orterun to demand that an orted operation (e.g., terminate_job) be immediately cancelled and abandoned.
2. changes the pls orted commands from blocking to non-blocking. This allows us to interrupt those operations should an orted be non-responsive. The change also adds an orte_abort_timeout that limits how long orterun will automatically wait for the orteds to respond - if the terminate command, for example, doesn't see an orted response within that time, then we print out an appropriate error message and just give up.
3. modifies orterun to allow multiple ctrl-c's to simply abort the program even if the orteds have not responded
4. does some cleanup on the orte-level mca params so that their implementation looks a lot more like that of ompi - makes it easier to maintain. This change also includes the definition of an orte_abort_timeout struct and associated MCA param (can't have too many!) so you can set the time after which orterun gives up on waiting for orteds to respond
This needs more testing before migrating to 1.2.
This commit was SVN r13304.
This fix includes two parts: (a) we now initialize the keyval pointer locations to NULL after the malloc, and (b) we now OBJ_NEW the keyvals prior to storing info in them.
BTW, in case anyone reads this and wonders why we don't just OBJ_NEW the keyvals in create_value, the reason is simply that some places in the code use static keyvals and simply assign those addresses into the value object's array. So not everyone wants to OBJ_NEW keyvals - by not forcing it here in create_value, we give the user the flexibility to do whatever they want.
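A sketch of the two-part fix and the design choice described above, using hypothetical stand-in types (demo_keyval_t, demo_value_t) rather than the real ORTE structures and plain malloc() in place of OBJ_NEW. The creation routine only NULLs the slots; the caller decides whether each slot gets a heap-allocated keyval or the address of a static one.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the keyval and value objects. */
typedef struct demo_keyval {
    const char *key;
    int value;
} demo_keyval_t;

typedef struct demo_value {
    size_t cnt;
    demo_keyval_t **keyvals;
} demo_value_t;

/* Allocate the pointer array and set every slot to NULL, so that no
 * slot holds garbage. The keyvals themselves are NOT allocated here.
 * Returns 0 on success, -1 on failure. */
int demo_create_value(demo_value_t *val, size_t cnt)
{
    if (cnt == 0) {
        return -1;
    }
    val->keyvals = malloc(cnt * sizeof(*val->keyvals));
    if (val->keyvals == NULL) {
        return -1;
    }
    for (size_t i = 0; i < cnt; i++) {
        val->keyvals[i] = NULL;
    }
    val->cnt = cnt;
    return 0;
}

/* Caller-side choice 1: allocate a keyval before storing data in it. */
int demo_fill_dynamic(demo_value_t *val, size_t i, const char *key, int v)
{
    if (i >= val->cnt) {
        return -1;
    }
    val->keyvals[i] = malloc(sizeof(demo_keyval_t));
    if (val->keyvals[i] == NULL) {
        return -1;
    }
    val->keyvals[i]->key = key;
    val->keyvals[i]->value = v;
    return 0;
}
```

Caller-side choice 2 is simply `val.keyvals[i] = &some_static_keyval;`, which is why the creation routine does not allocate the keyvals itself.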
This commit was SVN r13300.
- Make it so the SLURM ras can handle different nodelist configurations
- Some code cleanup and better/more informative error messages and error handling
This commit was SVN r13271.
The following Trac tickets were found above:
Ticket 801 --> https://svn.open-mpi.org/trac/ompi/ticket/801
then exec the "srun..." from there. But somewhere along the line, we
switched to having a copy of environ and modifying that. It looks
like we forgot to update the stuff for --prefix behavior. So this
commit fixes the setenv's for PATH and LD_LIBRARY_PATH to modify the
environ copy (not environ itself) so that the values properly get
passed down to the srun environment via execve().
This restores --prefix behavior in the SLURM pls.
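A sketch of the pattern described above: set variables in a private copy of the environment and hand that copy to execve(), instead of changing the process-wide environ. The helper demo_env_set() is hypothetical and is not the function used in the SLURM pls. It assumes every entry in the copy is heap-allocated, so a replaced entry can be freed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: set NAME=VALUE in a heap-allocated,
 * NULL-terminated environment copy. An existing entry for NAME is
 * replaced; otherwise the entry is appended.
 * Returns 0 on success, -1 on allocation failure. */
int demo_env_set(char ***envp, const char *name, const char *value)
{
    size_t nlen = strlen(name);
    size_t len = nlen + 1 + strlen(value) + 1;
    char *entry = malloc(len);
    if (entry == NULL) {
        return -1;
    }
    snprintf(entry, len, "%s=%s", name, value);

    size_t n = 0;
    if (*envp != NULL) {
        for (; (*envp)[n] != NULL; n++) {
            if (strncmp((*envp)[n], name, nlen) == 0 &&
                (*envp)[n][nlen] == '=') {
                free((*envp)[n]);      /* assumes heap-allocated entries */
                (*envp)[n] = entry;
                return 0;
            }
        }
    }

    /* Not found: grow the array by one slot plus the NULL terminator. */
    char **grown = realloc(*envp, (n + 2) * sizeof(char *));
    if (grown == NULL) {
        free(entry);
        return -1;
    }
    grown[n] = entry;
    grown[n + 1] = NULL;
    *envp = grown;
    return 0;
}
```

The resulting array is what would be passed as the third argument of execve(), so the child sees the modified PATH and LD_LIBRARY_PATH while the parent's own environ stays untouched.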
This commit was SVN r13239.
function prototype lives. Without this, we get compile
warnings. In addition, for 64-bit Solaris, we get a
segmentation fault from orterun without this include.
This commit was SVN r13065.
the connect() timeout, so that we'll use that rather than our own timeout by
default. The timeout was set low for Big Red, but that causes problems for very
large clusters, as there's no way to wire them up in 10 seconds most of the
time.
This commit was SVN r13062.
components that use configure.m4 for configuration or are always built.
The macro has not been needed since moving to configure types other than
configure.stub
Fixes trac:590
This commit was SVN r13031.
The following Trac tickets were found above:
Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
I know it's just a technicality, but it is time to address such things rather than just letting them continue to propagate. :-)
This commit was SVN r12954.
This has now been corrected. The singleton startup will dutifully call the mapper framework so that the proper data storage locations get initialized. Unfortunately, we then had to instruct the RMAPS not to allocate a vpid range for this job - otherwise, it would make a mistake and think there were two processes in it. Hence, a change was required to RMAPS to tell it "map this job, but don't allocate a vpid range for it".
This change will need to migrate across to 1.2 after it "soaks" the appropriate time.
This commit was SVN r12952.
is allocated on a per comm_world instance, with the lowest rank
in comm_world on the given host creating and initializing the file,
and then notifying the remaining processes via the OOB.
Reviewed: Ralph Castain, Brian Barrett
Addressing ticket #674.
This commit was SVN r12949.
rc (which is -1 or 4 if we hit this case) resulted in an odd error that a
signal killed the proc (instead of a startup error, as is reality).
Instead, use the W_EXITCODE macro (if available) to build up an exit
code that has an error code for exit status, but does not make it look
like the process died from a signal.
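A sketch of building such a status value, assuming a POSIX system. The function name demo_startup_failure_status() is hypothetical. W_EXITCODE is not available on every platform (or under every feature-test setting), so the fallback branch assumes the traditional encoding with the exit code in bits 8-15 of the status word.

```c
#include <sys/wait.h>

/* Hypothetical helper: turn a startup error code into a wait()-style
 * status that reads as "exited normally with this code" rather than
 * "killed by a signal". */
int demo_startup_failure_status(int exit_code)
{
#ifdef W_EXITCODE
    /* Arguments are (exit code, terminating signal); signal 0 = none. */
    return W_EXITCODE(exit_code, 0);
#else
    /* Assumption: traditional layout, exit code in bits 8-15. */
    return (exit_code & 0xff) << 8;
#endif
}
```

Code that later inspects the status with WIFEXITED() and WEXITSTATUS() then sees a normal exit carrying the error code, whereas a raw small integer such as 4 passed as the status can be decoded as a signal death.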
This commit was SVN r12890.