Ralph Castain
b475421c16
As promised, rationalize the rsh support. Remove rshbase and the base rsh support, centralizing all rsh support into the rsh component. Remove the "slave" launch support as that experiment is complete. Fix tree spawn and make that the default method for rsh launch, turning it "off" for qrsh as that system does not support tree spawn.
This commit was SVN r25507.
2011-11-26 02:33:05 +00:00
Ralph Castain
30fb002524
Take the first small step towards rationalizing rsh support. Create a new "rshbase" component that contains a simple rsh module - no tree spawn, uses all the base functions for launch support. Extend the base rsh support functions to include the functions common to all rsh modules.
Only a minor change was made to the current rsh module, to avoid a naming conflict. Otherwise, left it alone to avoid creating conflicts with other external work. The current rsh module remains the default for rsh/ssh support, and continues to contain the support for SGE and LoadLeveler.
This commit was SVN r24593.
2011-03-30 01:15:07 +00:00
Jeff Squyres
477201e161
Fix "make dist" breakage
This commit was SVN r23105.
2010-05-06 18:47:20 +00:00
Ralph Castain
2ff1ae13e1
Create a new "heartbeat" module in the sensor framework and move the plm_base heartbeat code there. Add new proc and job states for heartbeat_failed. Remove the "heartbeat" cmd line option for orted as this is now done automatically if the --enable-heartbeat configure option is set.
This commit was SVN r23102.
2010-05-05 00:48:43 +00:00
Shiqing Fan
3d4e0472d6
Add Windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux.
This commit was SVN r21069.
2009-04-24 16:39:33 +00:00
Ralph Castain
f0af389910
Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create a new orte_rsh_agent param so users can specify the rsh agent from outside of the rsh plm, and symlink that to the old plm_rsh_agent and pls_rsh_agent options.
Modify the orte-bootproxy to pass the prefix for the remote slave, to support hetero/hybrid scenarios.
This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
0532d799d6
Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.
This commit was SVN r18664.
2008-06-18 03:15:56 +00:00
Ralph Castain
b456fb2d42
Upgrade the node/orted failure detection code to cover all environments. Use the native environment's capabilities where possible - e.g., SLURM detects orted failure and can report it. Elsewhere, use a heartbeat system to detect orted failure - e.g., for TM and rsh. The heart rate is set via an MCA param. The HNP checks for a callback every 2*heartrate and declares orted failure if none was seen within the last 2*heartrate.
Also detect orted failed-to-start by setting a timeout on launch. Currently only used in the TM launcher.
Neither detection is enabled by default; each is active only if the heartrate and/or the launch timeout is set. SLURM is the exception, as orted failure is always detected and reported there.
More info to come on devel list.
This commit was SVN r18555.
2008-06-02 21:46:34 +00:00
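The heartbeat check described in r18555 above can be illustrated with a minimal sketch. This is not the ORTE implementation: the `daemon_rec_t` type and the `record_beat` and `check_heartbeats` functions are hypothetical names invented for illustration, and the only behavior taken from the commit message is that a daemon is declared failed when nothing has been heard from it within the last 2*heartrate, and that detection is off unless a heartrate is set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-daemon record; not an ORTE type. */
typedef struct {
    int    id;         /* illustrative daemon identifier */
    time_t last_beat;  /* time the most recent heartbeat arrived */
    bool   failed;     /* set once the daemon has been declared failed */
} daemon_rec_t;

/* Record that a heartbeat from this daemon arrived at time `now`. */
static void record_beat(daemon_rec_t *d, time_t now)
{
    d->last_beat = now;
}

/*
 * One periodic check, intended to be run every 2*heartrate seconds.
 * Marks every daemon whose last heartbeat is older than 2*heartrate
 * as failed and returns how many were newly marked.
 * A heartrate <= 0 is treated as "detection disabled" (assumption
 * mirroring "only active if heartrate is set").
 */
static int check_heartbeats(daemon_rec_t *daemons, size_t n,
                            int heartrate, time_t now)
{
    int newly_failed = 0;

    if (heartrate <= 0) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (daemons[i].failed) {
            continue;  /* already reported */
        }
        if (difftime(now, daemons[i].last_beat) > 2.0 * heartrate) {
            daemons[i].failed = true;
            newly_failed++;
        }
    }
    return newly_failed;
}
```

In this sketch the caller owns the timer: it would invoke `check_heartbeats` once per 2*heartrate interval and call `record_beat` whenever a daemon reports in. Passing `now` explicitly keeps the logic independent of the clock source.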
Ralph Castain
d70e2e8c2b
Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
Remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer.
This commit was SVN r17632.
2008-02-28 01:57:57 +00:00