1
1
openmpi/orte/mca/plm/base
Ralph Castain c13cadc3c7 Refs trac:1255
This commit repairs the debugger initialization procedure. I am not closing the ticket, however, pending Jeff's review of how it interfaces to the ompi_debugger code he implemented. There were duplicate symbols being created in that code, but not used anywhere. I replaced them with the ORTE-created symbols instead. However, since they aren't used anywhere, I have no way of checking to ensure I didn't break something.

So the ticket can be checked by Jeff when he returns from vacation... :-)

This commit was SVN r18625.

The following Trac tickets were found above:
  Ticket 1255 --> https://svn.open-mpi.org/trac/ompi/ticket/1255
2008-06-09 20:34:14 +00:00
..
base.h Cleanup recursions in ORTE caused by processing recv'd messages that can cause the system to take action resulting in receipt of another message. 2008-02-28 19:58:32 +00:00
help-plm-base.txt Cleanly handle the failed start of an orted, or its unexpected failure after start. This commit will allow mpirun to exit cleanly when this occurs, and does a best-effort attempt to cleanup the mess. However, it still has two unresolved issues that need to be eventually addressed: 2008-05-29 13:38:27 +00:00
Makefile.am Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. Heart rate is set via mca param. The HNP checks for callback every 2*heartrate, declares orted failure if not seen in last 2*heartrate time. 2008-06-02 21:46:34 +00:00
plm_base_close.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_heartbeat.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_jobid.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_launch_support.c Refs trac:1255 2008-06-09 20:34:14 +00:00
plm_base_open.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_orted_cmds.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_proxy.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_receive.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_base_select.c Effectively revert the orte_output system and return to direct use of opal_output at all levels. Retain the orte_show_help subsystem to allow aggregation of show_help messages at the HNP. 2008-06-09 14:53:58 +00:00
plm_private.h Fix a potential, albeit perhaps esoteric, race condition that can occur for fast HNP's, slow orteds, and fast apps. Under those conditions, it is possible for the orted to be caught in its original send of contact info back to the HNP, and thus for the progress stack never to recover back to a high level. In those circumstances, the orted can "hang" when trying to exit. 2008-06-06 19:36:27 +00:00