openmpi/orte/mca/plm/base
Ralph Castain b65eb54ea2 Cut out a new iof pull - that capability isn't ready yet for the trunk, but will be coming shortly
Thanks to Pak for letting me know...

This commit was SVN r18614.
2008-06-06 21:24:15 +00:00
..
base.h Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message. 2008-02-28 19:58:32 +00:00
help-plm-base.txt Cleanly handle the failed start of an orted, or its unexpected failure after start. This commit will allow mpirun to exit cleanly when this occurs, and does a best-effort attempt to clean up the mess. However, it still has two unresolved issues that need to be eventually addressed: 2008-05-29 13:38:27 +00:00
Makefile.am Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_close.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_heartbeat.c Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNPs, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. 2008-06-06 19:36:27 +00:00
plm_base_jobid.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_launch_support.c Cut out a new iof pull - that capability isn't ready yet for the trunk, but will be coming shortly 2008-06-06 21:24:15 +00:00
plm_base_open.c Remove the tags from orte_output_open and the filtering operation from orte_output - this will be handled differently to improve the XML output interface 2008-06-03 14:24:01 +00:00
plm_base_orted_cmds.c Fix single-node operations so that the HNP correctly exits when the job completes 2008-06-03 14:23:04 +00:00
plm_base_proxy.c This commit represents a bunch of work on a Mercurial side branch. As 2008-05-13 20:00:55 +00:00
plm_base_receive.c Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_select.c Fix some Coverity 'Event set_but_not_used' highlights. 2008-06-06 14:38:41 +00:00
plm_private.h Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNPs, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. 2008-06-06 19:36:27 +00:00
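
The heartbeat rule described in the Makefile.am and plm_base_receive.c commit message (the HNP checks every 2*heartrate and declares an orted failed if it has not been seen within the last 2*heartrate) can be illustrated with a minimal sketch. This is not the code in plm_base_heartbeat.c: the daemon_t type and the record_heartbeat and check_heartbeats functions are hypothetical names invented for illustration, and time is passed in as plain seconds rather than read from an event library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one daemon (orted) as tracked by the HNP. */
typedef struct {
    int    vpid;       /* daemon rank; illustrative only */
    double last_beat;  /* time in seconds when the last heartbeat arrived */
    bool   failed;     /* set once the daemon has been declared failed */
} daemon_t;

/* Record that a heartbeat from this daemon arrived at time `now`. */
static void record_heartbeat(daemon_t *d, double now)
{
    d->last_beat = now;
}

/* One check pass, meant to be run every 2*heartrate seconds.
 * A daemon not heard from within the last 2*heartrate seconds is
 * marked failed. Returns the number of daemons newly marked failed. */
static size_t check_heartbeats(daemon_t *daemons, size_t n,
                               double now, double heartrate)
{
    assert(heartrate > 0.0);
    const double window = 2.0 * heartrate;
    size_t newly_failed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!daemons[i].failed && now - daemons[i].last_beat > window) {
            daemons[i].failed = true;
            newly_failed++;
        }
    }
    return newly_failed;
}
```

In this sketch the check interval and the timeout window are both 2*heartrate, matching the commit message, so a daemon that misses two consecutive beats is reported on the next pass. Environments such as SLURM that report orted failure natively would not need this path.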