openmpi/orte/mca/plm/base
2008-06-03 14:23:04 +00:00
..
base.h Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message. 2008-02-28 19:58:32 +00:00
help-plm-base.txt Cleanly handle the failed start of an orted, or its unexpected failure after start. This commit will allow mpirun to exit cleanly when this occurs, and does a best-effort attempt to cleanup the mess. However, it still has two unresolved issues that need to be eventually addressed: 2008-05-29 13:38:27 +00:00
Makefile.am Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_close.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_heartbeat.c Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_jobid.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_launch_support.c Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_open.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_orted_cmds.c Fix single-node operations so that the HNP correctly exits when the job completes 2008-06-03 14:23:04 +00:00
plm_base_proxy.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_receive.c Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_select.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_private.h Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
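
The heartbeat commit message above describes the detection rule only in words: the HNP checks every 2*heartrate and declares an orted failed if no heartbeat was seen within the last 2*heartrate. A minimal sketch of that rule in C follows. It is not the code in plm_base_heartbeat.c; the type daemon_entry_t and the functions record_heartbeat, check_interval, and check_heartbeats are hypothetical names introduced for illustration, and the use of seconds stored in a double is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-daemon record; NOT the actual ORTE data structure. */
typedef struct {
    int    daemon_id;       /* illustrative identifier for one orted */
    double last_beat_time;  /* time (assumed seconds) of the last heartbeat */
    bool   failed;          /* set once the daemon has been declared failed */
} daemon_entry_t;

/* Record that a heartbeat from this daemon arrived at time `now`. */
void record_heartbeat(daemon_entry_t *d, double now)
{
    d->last_beat_time = now;
}

/* Per the commit message, the HNP checks every 2*heartrate.
 * `heartrate` stands in for the value that the real code reads
 * from an MCA parameter. */
double check_interval(double heartrate)
{
    assert(heartrate > 0.0);
    return 2.0 * heartrate;
}

/* Scan all daemons and mark as failed any daemon whose last heartbeat
 * is older than 2*heartrate. Returns the number of daemons newly
 * declared failed by this scan. */
size_t check_heartbeats(daemon_entry_t *daemons, size_t n,
                        double heartrate, double now)
{
    const double window = check_interval(heartrate);
    size_t newly_failed = 0;

    for (size_t i = 0; i < n; ++i) {
        if (!daemons[i].failed &&
            (now - daemons[i].last_beat_time) > window) {
            daemons[i].failed = true;
            ++newly_failed;
        }
    }
    return newly_failed;
}
```

In this sketch the caller would invoke check_heartbeats from a timer that fires every check_interval(heartrate). Using the same 2*heartrate value for both the check period and the staleness window means a daemon can miss one heartbeat without being declared failed.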