openmpi/orte/mca/plm
Abhishek Kulkarni afbe3e99c6 * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be an
 SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes an SOS error and returns
 the native error code.

* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
  (OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
  decode 'ret' to get the native error code.

This commit was SVN r23162.
2010-05-17 23:08:56 +00:00
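
The two rewrites described in the commit message above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the `demo_*` names, the `DEMO_*` constants, and the bit layout are stand-ins invented for illustration and are not the real OPAL SOS definitions. The only properties taken from the commit message are that an encoded return value may differ from the native error code, that a decode step recovers the native code, and that the success value is preserved unchanged.

```c
#include <assert.h>

/* Hypothetical stand-ins for native error codes (not the real OPAL values). */
#define DEMO_SUCCESS        0
#define DEMO_ERR_NOT_FOUND  (-13)

/* Assumed encoding, for illustration only: the magnitude of the native code
 * sits in the low 8 bits and extra context sits above it. Success is never
 * encoded, so it compares equal to DEMO_SUCCESS as-is. */
static int demo_sos_encode(int native, int context)
{
    if (DEMO_SUCCESS == native) {
        return DEMO_SUCCESS;
    }
    return -((context << 8) | (-native));
}

/* Hypothetical analogue of the decode step: strip the context and return
 * the native error code. A value that was never encoded passes through. */
static int demo_sos_get_err_code(int ret)
{
    if (ret >= 0) {
        return ret;
    }
    return -((-ret) & 0xFF);
}

/* First rewrite: compare against the decoded value, not the raw one. */
static int demo_is_not_found(int ret)
{
    return DEMO_ERR_NOT_FOUND == demo_sos_get_err_code(ret);
}

/* Second rewrite: test for "not success", which needs no decoding. */
static int demo_succeeded(int ret)
{
    return DEMO_SUCCESS == ret;
}

/* Small self-check tying the pieces together. */
static void demo_self_check(void)
{
    int ret = demo_sos_encode(DEMO_ERR_NOT_FOUND, 5);

    assert(DEMO_ERR_NOT_FOUND != ret);   /* a direct comparison would miss it */
    assert(demo_is_not_found(ret));      /* the decoded comparison matches */
    assert(!demo_succeeded(ret));        /* the failure check needs no decode */
}
```

Under these assumptions, a direct `DEMO_ERR_NOT_FOUND == ret` check silently fails once `ret` carries extra context, which is the failure mode the wrapped comparisons avoid.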
..
alps Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
base * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with 2010-05-17 23:08:56 +00:00
bproc Courtesy of Ralph and Jeff: 2009-10-24 01:04:35 +00:00
ccp Update a few module configuration for Windows. 2010-05-05 12:22:04 +00:00
lsf Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
process Update a few module configuration for Windows. 2010-05-05 12:22:04 +00:00
rsh * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with 2010-05-17 23:08:56 +00:00
rshd * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with 2010-05-17 23:08:56 +00:00
slurm * Wrap all the direct error-code checks of the form (OMPI_ERR_* == ret) with 2010-05-17 23:08:56 +00:00
submit Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
tm Cleanup the debugger daemon co-launch code and add an ability to test it. Implement ability to co-launch debugger daemons upon attach to a running job for jobs launched under rsh, slurm, and tm environments (others can easily be added if desired). 2010-05-14 18:44:49 +00:00
tmd Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
xgrid Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
Makefile.am Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
plm_types.h Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set. 2010-05-05 00:48:43 +00:00
plm.h Restore the original API to terminate individual processes instead of the entire job. This was originally removed as we didn't at that time know how to take advantage of it. Some of us are now working on proactive resilience methods that move procs prior to node failure, so this is now a required API. Modify the odls, plm, and orted functions to support this new functionality. 2009-07-13 02:29:17 +00:00