openmpi/orte/mca
Ralph Castain 4bd25f587c Begin handling the case of lost connections by having the OOB report it to the errmgr instead of the routed framework. Add an "app" component to the errmgr framework so that it can decide how to respond - which for now at least is just to check for lifeline and abort if so.

Add a new error constant to indicate that the error is "unrecoverable" so the oob can know it needs to abort.

This commit was SVN r23112.
2010-05-11 00:34:12 +00:00
..
db Add a new test for the db framework, fix some minor bugs in the daemon module 2010-05-04 02:38:11 +00:00
errmgr Begin handling the case of lost connections by having the OOB report it to the errmgr instead of the routed framework. Add an "app" component to t 2010-05-11 00:34:12 +00:00
ess Begin handling the case of lost connections by having the OOB report it to the errmgr instead of the routed framework. Add an "app" component to t 2010-05-11 00:34:12 +00:00
filem - Replace combinations of 2009-08-20 11:42:18 +00:00
grpcomm Remove an errant '$' in the configure.m4 files. Was causing problems with configure. 2010-03-12 20:08:22 +00:00
iof IOF components should not assume they will be selected when queried - thus, they should not perform init functions until after selection. Create init/finalize entry points for that purpose, and have select init the module after it has been selected. 2010-04-16 18:51:27 +00:00
notifier Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
odls Little more cleanup on paffinity. Provide a specific error code for affinity not supported so we can better report the problem. Move the error reporting to orterun so we only get one error message. Update the darwin paffinity module to return the correct new error codes. 2010-05-07 14:04:55 +00:00
oob Begin handling the case of lost connections by having the OOB report it to the errmgr instead of the routed framework. Add an "app" component to t 2010-05-11 00:34:12 +00:00
plm Fix "make dist" breakage 2010-05-06 18:47:20 +00:00
ras IBM has approved the release of the LoadLeveler sample code under the 2010-04-08 19:41:44 +00:00
rmaps Continue developing support for distributed virtual machines - minor changes to ensure correct jobid gets used and that dvm's can communicate with tools 2010-04-12 22:33:09 +00:00
rmcast Add some contributed examples of how to start and configure the spread library. Do a little more cleanup on the spread module, and ensure that it isn't selected if spread isn't running. 2010-05-04 23:44:00 +00:00
rml Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
routed Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
sensor Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
snapc r22885 missed a few symbol updates when it changed ompi_want_ft to opal_want_ft 2010-03-30 16:47:39 +00:00