openmpi/orte/runtime
Ralph Castain 319758e3e0 Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params:
1. orte_enable_recovery - default recovery policy, can be overridden on a per-job basis

2. orte_max_local_restarts - default max number of local restarts, can be overridden

3. orte_max_global_restarts - default max number of relocates, can be overridden

Implement the restart_proc API for the ODLS framework, reorganize the default fns a little to avoid copying code.

This commit was SVN r23057.
2010-04-28 04:06:57 +00:00
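
The three parameters named in the commit message above could be set like any other Open MPI MCA parameter, either through `OMPI_MCA_`-prefixed environment variables or with `--mca` on the `mpirun` command line. The following is a minimal sketch, not taken from the commit itself: the values are illustrative, the value types (a 0/1 flag and integer counts) are assumptions, and `./my_app` is a placeholder application.

```shell
# Illustrative values only; the commit message does not state the defaults.
export OMPI_MCA_orte_enable_recovery=1      # assumed boolean-style flag (1 = on)
export OMPI_MCA_orte_max_local_restarts=2   # assumed integer count
export OMPI_MCA_orte_max_global_restarts=1  # assumed integer count

# Equivalent per-run form on the command line (./my_app is a placeholder):
# mpirun --mca orte_enable_recovery 1 \
#        --mca orte_max_local_restarts 2 \
#        --mca orte_max_global_restarts 1 \
#        -np 4 ./my_app
```

Environment variables apply to every later `mpirun` started from that shell, while the `--mca` form applies to a single launch only.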
data_type_support Fix a warning 2010-04-27 15:59:24 +00:00
help-orte-runtime.txt Reorder the nidmap encoding function. Add a check to make sure we don't write 2009-07-15 19:36:53 +00:00
Makefile.am Nysal noticed some repeated header files; removed. 2009-05-28 12:05:42 +00:00
orte_cr.c Make sure to initialize orte_process_info.proc_type properly on restart. Otherwise the application will have type 'NONE' instead of 'APP'. 2009-05-12 14:14:05 +00:00
orte_cr.h - As long as a header declares _DECLSPEC functionality 2009-03-17 01:45:19 +00:00
orte_data_server.c ... Delayed due to notifier commits earlier this day ... 2009-04-29 01:32:14 +00:00
orte_data_server.h Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
orte_finalize.c allow trunk to compile on red storm 2009-04-08 20:53:54 +00:00
orte_globals.c Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params: 2010-04-28 04:06:57 +00:00
orte_globals.h Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params: 2010-04-28 04:06:57 +00:00
orte_init.c Revamp the errmgr framework to provide a greater range of optional behaviors, including different behaviors for daemons, and remove several looping messages across the code base: 2010-04-23 04:44:41 +00:00
orte_locks.c Some minor updates to the locking system changes. Remove obsolete locks. Ensure the trigger event objects do not get deconstructed until the very end to avoid possible problems due to race conditions. Route all orted abnormal term tests through the trigger. 2008-08-06 11:31:06 +00:00
orte_locks.h ... Delayed due to notifier commits earlier this day ... 2009-04-29 01:32:14 +00:00
orte_mca_params.c Restore process recovery for procs local to mpirun (first step towards restoring full capability). Define three new MCA params: 2010-04-28 04:06:57 +00:00
orte_wait.c Have the trigger event return the event itself in the callback function so it can be reset, if desired 2009-10-15 02:35:53 +00:00
orte_wait.h This is a very large change to rename several #define values from 2009-05-06 20:11:28 +00:00
runtime_internals.h Modify the accounting system to recycle jobids. Properly recover resources from nodes and jobs upon completion. Adjustments in several places were required to deal with sparsely populated job, node, and proc arrays as a result of this change. 2009-03-03 16:39:13 +00:00
runtime.h A patch from UTK to allow orte_init(), opal_init(), and associated 2009-12-04 00:51:15 +00:00