initialized. For example, there is a period of time during
ompi_mpi_init when orte_initialized==true, but
ompi_mpi_initialized==false (and therefore communicators are not setup
yet, etc.).
This commit was SVN r17937.
Only one place used the user name field - session_dir, when formulating the name of the top-level directory. Accordingly, the code for getting the user's id has been moved to the session_dir code.
This commit was SVN r17926.
* Fix an error message to correctly display if we were before
MPI_INIT or after MPI_FINALIZE (refs trac:1243)
This commit was SVN r17873.
The following Trac tickets were found above:
Ticket 1243 --> https://svn.open-mpi.org/trac/ompi/ticket/1243
completed successfully, Bad Things(tm) could happen.
* Now we explicitly check orte_initialized (a new global in ORTE
indicating whether we are between orte_init() and orte_finalize()
or not), and if so, react accordingly.
* If ORTE is initialized, use orte_system_info.nodename; otherwise,
use gethostname().
* Add loop protection to ensure that ompi_mpi_abort() is not invoked
multiple times recursively.
This commit was SVN r13354.
- Fix some fpritnf's in ompi_mpi_abort() that incorrectly assumed that
we were always being invoked from MPI_ABORT (ompi_mpi_abort() may be
invoked from a bunch of different places)
- Also try to opal_backtrace_print() if opal_bactrace_buffer() is not
supported.
- Print a message in MPI_ABORT if we're aborting.
This commit was SVN r12998.
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
should die, according to the MPI standard. It's possible that the
ORTE layer may kill additional processes, but that's beyond our
control and seems to be allowed by the standard (ie, it might also
end up killing all the procs in all the jobs covered by the
communicator).
* update the stack trace printing code to use the framework rather
than calling execinfo directly, so that we should be able to get
stack traces on all the platforms we support stack tracing on
(if the user wants stack traces on abort, of course)
This commit was SVN r11753.
MPI_ABORT. From the ompi_info output:
MCA mpi: parameter "mpi_abort_delay" (current value: "0")
If nonzero, print out an identifying message when
MPI_ABORT is invoked (hostname, PID of the process
that called MPI_ABORT) and delay for that many seconds
before exiting (a negative delay value means to never
abort). This allows attaching of a debugger before
quitting the job.
MCA mpi: parameter "mpi_abort_print_stack" (current value: "0")
If nonzero, print out a stack trace when MPI_ABORT is
invoked
This commit was SVN r9487.
- move files out of toplevel include/ and etc/, moving it into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h
mpi.h, and mpif.h)
This commit was SVN r8985.
- If we have an NS error, don't return an error -- this function's
purpose is to abort :-)
- s/abort()/exit(1)/ so that we don't drop massive corefiles
This commit was SVN r7524.