openmpi/orte/mca
Ralph Castain 3815bfbba6 Provide a better error message when the oob cannot send a message after exhausting retries, and then have the proc abort so the job doesn't just hang forever.
Since it could be a daemon that needs to abort, cleanup the abort sequence so the daemon can exit as cleanly as possible.

This commit was SVN r21361.
2009-06-02 23:57:12 +00:00
errmgr Allow CM's to select the default errmgr component. Add support for error function callbacks 2009-05-30 20:43:42 +00:00
ess Provide a better error message when the oob cannot send a message after exhausting retries, and then have the proc abort so the job doesn't just hang forever. 2009-06-02 23:57:12 +00:00
filem This is a very large change to rename several #define values from 2009-05-06 20:11:28 +00:00
grpcomm Take the next step towards fully utilizing static ports for the daemons to eliminate the initial "phone home" to mpirun by modifying the orted termination procedure to eliminate the need for a full barrier-like operation. Instead, we add a "onesided" barrier to the grpcomm framework API that releases the orted once it has completed its own contribution to the barrier - i.e., the orteds now exit as the "ack" message rolls up towards mpirun instead of sending the "ack" directly to mpirun. 2009-05-11 14:11:44 +00:00
iof Minor mod per Greg Watson, plus some cleanups to make George smile...or at least grimace a little less! :-) 2009-05-28 00:55:01 +00:00
notifier - Eliminate icc warning w/ regard to __attribute__((__format__)) on 2009-05-20 00:39:22 +00:00
odls This change does two things. First, do not emit error 2009-05-22 14:59:27 +00:00
oob Provide a better error message when the oob cannot send a message after exhausting retries, and then have the proc abort so the job doesn't just hang forever. 2009-06-02 23:57:12 +00:00
plm Provide a "progress meter" for launch that outputs progress as we are launching, especially on large jobs. Also, provide a timeout mechanism so that we cleanly abort if we don't get a response from the next daemon in a specified time. 2009-06-02 23:52:02 +00:00
ras Completely remove ltdl support for Windows build. 2009-05-05 18:59:13 +00:00
rmaps Add a resilient mapping capability - currently maps by fault groups (if provided), still need to add the remapping capability for failed procs. 2009-06-02 03:23:20 +00:00
rml - Similar to r21229, check for return code from 2009-05-14 00:36:51 +00:00
routed I'm (temporarily?) removing this entry because there's no .window file 2009-06-01 14:07:08 +00:00
snapc A small fix, the right one to use in orte. 2009-05-12 09:53:34 +00:00