openmpi/orte/mca
Ralph Castain 319758e3e0 Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params:
1. orte_enable_recovery - default recovery policy, can be overridden on a per-job basis

2. orte_max_local_restarts - default max number of local restarts, can be overridden

3. orte_max_global_restarts - default max number of relocations, can be overridden
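The commit message only names the three parameters and says each is a default that a job can override. The C sketch below shows one way such a policy could be resolved. It is illustrative only: the struct, enum, and function names (`recovery_defaults_t`, `job_recovery_t`, `decide_recovery`) are hypothetical and do not come from the ORTE errmgr or ODLS code. It also assumes that a negative per-job value means "inherit the default", and that local restarts are tried before relocation.

```c
#include <stdbool.h>

/* Hypothetical holder for the system-wide defaults named after the MCA params. */
typedef struct {
    bool enable_recovery;      /* stands in for orte_enable_recovery */
    int  max_local_restarts;   /* stands in for orte_max_local_restarts */
    int  max_global_restarts;  /* stands in for orte_max_global_restarts */
} recovery_defaults_t;

/* Hypothetical per-job overrides; a negative value means "use the default". */
typedef struct {
    int enable_recovery;       /* -1 = default, 0 = off, 1 = on */
    int max_local_restarts;
    int max_global_restarts;
} job_recovery_t;

typedef enum {
    ACTION_ABORT,
    ACTION_RESTART_LOCAL,
    ACTION_RELOCATE
} recovery_action_t;

/* Return the job's value if it was set, otherwise the default. */
static int pick(int job_value, int default_value)
{
    return job_value >= 0 ? job_value : default_value;
}

/* Decide what to do with a failed process, given how many times it has
 * already been restarted locally and relocated (assumed policy: exhaust
 * local restarts first, then relocations, then abort). */
recovery_action_t decide_recovery(const recovery_defaults_t *defs,
                                  const job_recovery_t *job,
                                  int local_restarts_so_far,
                                  int relocations_so_far)
{
    bool enabled = job->enable_recovery >= 0 ? job->enable_recovery != 0
                                             : defs->enable_recovery;
    if (!enabled) {
        return ACTION_ABORT;
    }
    if (local_restarts_so_far <
        pick(job->max_local_restarts, defs->max_local_restarts)) {
        return ACTION_RESTART_LOCAL;
    }
    if (relocations_so_far <
        pick(job->max_global_restarts, defs->max_global_restarts)) {
        return ACTION_RELOCATE;
    }
    return ACTION_ABORT;
}
```

With defaults of recovery enabled, 2 local restarts, and 1 relocation, a job that overrides nothing is restarted in place twice, then relocated once, then aborted. A job that sets `enable_recovery = 0` aborts on the first failure.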

Implement the restart_proc API for the ODLS framework, and reorganize the default functions slightly to avoid duplicating code.

This commit was SVN r23057.
2010-04-28 04:06:57 +00:00
errmgr Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params: 2010-04-28 04:06:57 +00:00
ess After hours spent chasing the stupid "abort" file, it became clear that we were always going to be plagued by that idiot contraption when trying to be good citizens and properly clean up. So get rid of it by instead doing a messaging handshake with the local daemon. 2010-04-27 03:39:32 +00:00
filem - Replace combinations of 2009-08-20 11:42:18 +00:00
grpcomm Remove an errant '$' in the configure.m4 files. Was causing problems with configure. 2010-03-12 20:08:22 +00:00
iof IOF components should not assume they will be selected when queried - thus, they should not perform init functions until after selection. Create init/finalize entry points for that purpose, and have select init the module after it has been selected. 2010-04-16 18:51:27 +00:00
notifier Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
odls Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params: 2010-04-28 04:06:57 +00:00
oob ErrMgr Framework redesign to better support fault tolerance development activities. 2010-03-23 21:28:02 +00:00
plm Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params: 2010-04-28 04:06:57 +00:00
ras IBM has approved the release of the LoadLeveler sample code under the 2010-04-08 19:41:44 +00:00
rmaps Continue developing support for distributed virtual machines - minor changes to ensure the correct jobid gets used and that DVMs can communicate with tools 2010-04-12 22:33:09 +00:00
rmcast Additional diag output 2010-04-16 14:48:37 +00:00
rml After hours spent chasing the stupid "abort" file, it became clear that we were always going to be plagued by that idiot contraption when trying to be good citizens and properly clean up. So get rid of it by instead doing a messaging handshake with the local daemon. 2010-04-27 03:39:32 +00:00
routed Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
sensor Add a sensor framework to ORTE that monitors applications and notifies the errmgr when they exceed specified boundaries. Two modules are included here: 2010-04-26 22:15:57 +00:00
snapc r22885 missed a few symbol updates when it changed ompi_want_ft to opal_want_ft 2010-03-30 16:47:39 +00:00
state Enable these to build so others can more easily begin implementation 2010-04-23 22:46:33 +00:00