Ralph Castain
|
b97f885c00
|
Restore the original API to terminate individual processes instead of the entire job. It was removed because, at the time, we did not know how to take advantage of it. Some of us are now working on proactive resilience methods that move procs before a node fails, so the API is now required. Modify the odls, plm, and orted functions to support this new functionality.
Continue work on the resilient mapper, completing support for fault groups.
This commit was SVN r21639.
|
2009-07-13 02:29:17 +00:00 |
|
George Bosilca
|
7339530061
|
Remove the prototype of a nonexistent function.
This commit was SVN r21500.
|
2009-06-23 19:50:23 +00:00 |
|
Ralph Castain
|
0336460b0a
|
Continue implementation of resilient operations by supporting reuse of jobids for restarted procs. Ensure that restarted processes have valid node and local ranks, and that node rank values are passed to direct-launched processes.
This commit was SVN r21385.
|
2009-06-06 01:08:47 +00:00 |
|
Ralph Castain
|
a95731fc68
|
Minor update to let apps set their own component selections if desired, while preserving slave behavior.
This commit was SVN r21332.
|
2009-05-30 20:42:23 +00:00 |
|
Ralph Castain
|
4e0223a638
|
Add the ability to directly launch procs via rsh/ssh. Collect common functions in plm/base. Create a new global param to set assume_same_shell, aliased back to plm_rsh_assume_same_shell (not deprecated).
This commit was SVN r21328.
|
2009-05-30 01:10:25 +00:00 |
|