1
1
openmpi/orte/mca/plm/rsh
Ralph Castain 88f5217a12 Cleanup the debugger daemon co-launch code and add an ability to test it. Implement ability to co-launch debugger daemons upon attach to a running job for jobs launched under rsh, slurm, and tm environments (others can easily be added if desired).
Add new mca params to test:

orte_debugger_test_daemon: Name of the executable to be used to simulate a debugger colaunch
orte_debugger_test_attach: Test debugger colaunch after debugger attachment

To test co-launch at job start, just set the orte_debugger_test_daemon param.

To test co-launch upon attach:
set orte_debugger_test_daemon
set orte_debugger_test_attach=1
set orte_enable_debug_cospawn_while_running=1
set orte_debugger_check_rate=<N> - defines the number of seconds to wait before "checking" for a debugger attaching

Added a "debugger" program to orte/test/mpi that just spins to simulate a debugger daemon.

This commit was SVN r23144.
2010-05-14 18:44:49 +00:00
..
configure.m4 Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
configure.params Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
help-plm-rsh.txt Add some error output to the plm rsh 2008-09-10 01:59:49 +00:00
Makefile.am Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
plm_rsh_component.c Add the ability to directly launch procs via rsh/ssh. Collect common functions in plm/base. Create a new global param to set assume_same_shell, alias'd back to plm_rsh_assume_same_shell (not deprecated). 2009-05-30 01:10:25 +00:00
plm_rsh_module.c Cleanup the debugger daemon co-launch code and add an ability to test it. Implement ability to co-launch debugger daemons upon attach to a running job for jobs launched under rsh, slurm, and tm environments (others can easily be added if desired). 2010-05-14 18:44:49 +00:00
plm_rsh.h Restore the original API to terminate individual processes instead of the entire job. This was originally removed as we didn't at that time know how to take advantage of it. Some of us are now working on proactive resilience methods that move procs prior to node failure, so this is now a required API. Modify the odls, plm, and orted functions to support this new functionality. 2009-07-13 02:29:17 +00:00