openmpi/orte/tools/orterun
Ralph Castain b456fb2d42 Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time.
Also detect orted failed-to-start by setting timeout on launch. Currently only used in TM launcher.

Neither detection is enabled by default; each is active only if the heartrate and/or the launch timeout is set. SLURM is the exception, as orted failure is always detected and reported there.

More info to come on devel list.

This commit was SVN r18555.
2008-06-02 21:46:34 +00:00
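
The heartbeat rule in the commit message above (check every 2*heartrate, declare failure if no beat was seen in the last 2*heartrate) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in orterun.c: `daemon_state_t`, `record_heartbeat`, and `check_heartbeats` are hypothetical names, and time is passed in as plain seconds rather than taken from any ORTE event or timer API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-daemon record; not an actual ORTE type. */
typedef struct {
    double last_beat;   /* time (seconds) of the most recent heartbeat */
    bool   failed;      /* set once the daemon has been declared failed */
} daemon_state_t;

/* Record that a heartbeat from this daemon arrived at time `now`. */
static void record_heartbeat(daemon_state_t *d, double now)
{
    d->last_beat = now;
}

/*
 * Intended to be called by the checker once every 2*heartrate seconds.
 * Marks a daemon as failed if no heartbeat arrived within the last
 * 2*heartrate seconds. Returns the number of daemons newly marked failed.
 */
static size_t check_heartbeats(daemon_state_t *daemons, size_t n,
                               double now, double heartrate)
{
    assert(heartrate > 0.0);   /* detection is only meaningful if a rate is set */

    const double window = 2.0 * heartrate;
    size_t newly_failed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!daemons[i].failed && (now - daemons[i].last_beat) > window) {
            daemons[i].failed = true;
            newly_failed++;
        }
    }
    return newly_failed;
}
```

Using a window of twice the heart rate means that a single delayed or lost beat does not by itself trigger a failure report, at the cost of slower detection.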
help-orterun.txt Bring some sanity to the exit code returned by mpirun. Ensure that we provide a non-zero code if something goes wrong, including someone exiting after calling mpi_init without calling mpi_finalize. 2008-03-19 19:00:51 +00:00
main.c Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
Makefile.am Per recent off-list discussions about the build system, I have done 2008-03-22 02:04:05 +00:00
orterun.1 Formatting fixes from Peter Breitenlohner. 2008-01-18 23:21:31 +00:00
orterun.c Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
orterun.h Add the -xml option to mpirun to indicate that xml output is desired 2008-05-29 14:11:31 +00:00
totalview.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
totalview.h Be nice with parallel debugger, export this required symbol. 2008-02-28 05:59:07 +00:00