Ralph Castain
bd8b4f7f1e
Sorry for mid-day commit, but I had promised on the call to do this upon my return.
...
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.
Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.
This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Ralph Castain
3e7ab1212a
Since this has come up a number of times, have the rsh launcher add MCA params from the environment by default. If it finds that the cmd line is too long, error out with a message directing the user to set a param to ignore the environmental MCA params.
...
This commit was SVN r25581.
2011-12-07 01:24:36 +00:00
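The behavior described in r25581 above (forward MCA params found in the environment, then error out if the resulting command line is too long) can be sketched roughly as follows. This is an illustration only, not the rsh component's code: the `OMPI_MCA_` prefix and `-mca name value` syntax are assumed to match Open MPI's usual conventions, and `build_rsh_argv`, `max_len`, and `forward_env_params` are hypothetical names standing in for the real logic and the "ignore environmental MCA params" setting.

```python
import os

# Assumption: environment-supplied MCA params use this prefix.
MCA_ENV_PREFIX = "OMPI_MCA_"


class CommandLineTooLong(Exception):
    """Raised when the assembled remote command exceeds the allowed length."""


def build_rsh_argv(base_argv, environ=None, max_len=4096, forward_env_params=True):
    """Return base_argv extended with MCA params taken from the environment.

    max_len and forward_env_params are hypothetical knobs for this sketch.
    """
    environ = os.environ if environ is None else environ
    argv = list(base_argv)
    if forward_env_params:
        # Sort for a deterministic command line.
        for key in sorted(environ):
            if key.startswith(MCA_ENV_PREFIX):
                argv += ["-mca", key[len(MCA_ENV_PREFIX):], environ[key]]
    cmd_len = len(" ".join(argv))
    if cmd_len > max_len:
        raise CommandLineTooLong(
            "command line is %d chars (limit %d); "
            "disable forwarding of environment MCA params" % (cmd_len, max_len)
        )
    return argv
```

Passing `forward_env_params=False` in this sketch corresponds to the user opting out of environment forwarding after hitting the length error.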
Ralph Castain
70ab8422b1
Per the internal comments, the delay between ssh invocations is not there for debugging purposes, but rather to allow for NIS authentication times. We have seen that problem in the past, so don't restrict the delay to debugging runs; use it for its intended purpose. Also, allow sub-second delays, as the delay doesn't always have to be so long.
...
This commit was SVN r25510.
2011-11-27 01:49:42 +00:00
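As a rough illustration of the r25510 change above, the sketch below pauses between successive launches using a fractional-second delay. It is not the rsh launcher's code: `launch_with_delay`, `launch_one`, and `delay_seconds` are hypothetical names, and the injectable `sleep` argument exists only so the pacing can be checked without actually waiting.

```python
import time


def launch_with_delay(hosts, launch_one, delay_seconds=0.0, sleep=time.sleep):
    """Call launch_one(host) for each host, pausing between invocations.

    delay_seconds may be fractional (for example 0.25), so the pause is not
    limited to whole seconds. No pause is taken before the first launch.
    """
    launched = []
    for index, host in enumerate(hosts):
        if index > 0 and delay_seconds > 0:
            sleep(delay_seconds)
        launch_one(host)
        launched.append(host)
    return launched
```

For example, `launch_with_delay(["n1", "n2"], start_ssh, delay_seconds=0.1)` would wait a tenth of a second between the two launches, where `start_ssh` is whatever callable starts the remote daemon.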
Ralph Castain
b475421c16
As promised, rationalize the rsh support. Remove rshbase and the base rsh support, centralizing all rsh support into the rsh component. Remove the "slave" launch support as that experiment is complete. Fix tree spawn and make that the default method for rsh launch, turning it "off" for qrsh as that system does not support tree spawn.
...
This commit was SVN r25507.
2011-11-26 02:33:05 +00:00
Nysal Jan
c8c6b0edab
Improve LoadLeveler integration with Open MPI. Add support for LL native rsh agent - llspawn
...
This commit was SVN r24579.
2011-03-29 07:46:59 +00:00
Ralph Castain
b97f885c00
Restore the original API to terminate individual processes instead of the entire job. This was originally removed as we didn't at that time know how to take advantage of it. Some of us are now working on proactive resilience methods that move procs prior to node failure, so this is now a required API. Modify the odls, plm, and orted functions to support this new functionality.
...
Continue work on the resilient mapper, completing support for fault groups.
This commit was SVN r21639.
2009-07-13 02:29:17 +00:00
George Bosilca
7339530061
Remove the prototype of a nonexistent function.
...
This commit was SVN r21500.
2009-06-23 19:50:23 +00:00
Ralph Castain
4e0223a638
Add the ability to directly launch procs via rsh/ssh. Collect common functions in plm/base. Create a new global param to set assume_same_shell, alias'd back to plm_rsh_assume_same_shell (not deprecated).
...
This commit was SVN r21328.
2009-05-30 01:10:25 +00:00
Rolf vandeVaart
515b99b357
Under SGE, the orted should not daemonize by default.
...
Also create mca parameter to force daemonization (previous
behavior) which might be needed on larger clusters or
to make use of the -notify flag with qsub.
This fixes trac:1783.
This commit was SVN r20582.
The following Trac tickets were found above:
Ticket 1783 --> https://svn.open-mpi.org/trac/ompi/ticket/1783
2009-02-18 18:02:38 +00:00
Ralph Castain
f0af389910
Enable comm_spawn of slave processes, currently only active for the rsh, slurm, and tm environments. Establish support for local rsh environments in the plm/base so that rsh of local slaves can be done by any environment that supports it. Create new orte_rsh_agent param so users can specify rsh agent from outside of rsh plm, and sym link that to the old plm_rsh_agent and pls_rsh_agent options.
...
Modify the orte-bootproxy to pass prefix for the remote slave to support hetero/hybrid scenarios
This commit was SVN r20492.
2009-02-09 20:44:44 +00:00
Ralph Castain
21cd4b9df8
Add pls_rsh_agent synonym to the PLM rsh component
...
This commit was SVN r19119.
2008-08-01 20:15:42 +00:00
Pak Lui
7b3d7dcac4
This commit closes trac:1300.
...
This commit was SVN r18473.
The following Trac tickets were found above:
Ticket 1300 --> https://svn.open-mpi.org/trac/ompi/ticket/1300
2008-05-21 22:35:04 +00:00
Josh Hursey
9971bc9d95
Merge in the mca_base_select changes per RFC:
...
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php
{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}
Any components not in the trunk, but in one of the affected frameworks, *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.
Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.
This commit was SVN r18381.
2008-05-06 18:08:45 +00:00
Ralph Castain
7c7304466c
Add a binomial tree-based launch to ssh, turned "on" only when the plm_rsh_tree_spawned mca param is set to a non-zero value. This probably isn't a very optimized capability, but it does execute a tree-based launch that may scale better than linear at high node counts.
...
Add the daemon map capability to the ODLS to create and save a map of daemon vpid vs nodename from the launch message.
Clean up a few places in the base plm launch support where we didn't adequately protect rml recvs from potentially executing sends.
This commit was SVN r18143.
2008-04-14 18:26:08 +00:00
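To make the binomial tree launch mentioned in r18143 above concrete, here is a generic sketch of how each daemon could compute which daemons it launches next. It is the textbook binomial-tree layout rooted at rank 0, not necessarily the layout the ssh launcher uses; `binomial_children` and `num_daemons` are hypothetical names.

```python
def binomial_children(rank, num_daemons):
    """Return the ranks that `rank` launches in a binomial tree rooted at 0.

    A node's children are rank + 2**k for every k above the node's highest
    set bit, as long as the result is still a valid rank.
    """
    children = []
    mask = 1 << rank.bit_length()  # first power of two strictly above rank
    while rank + mask < num_daemons:
        children.append(rank + mask)
        mask <<= 1
    return children
```

With 8 daemons, rank 0 launches 1, 2, and 4; rank 1 launches 3 and 5; rank 2 launches 6; and rank 3 launches 7. The launch depth then grows with log2 of the node count rather than linearly, which is the scaling benefit the commit message refers to.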
Ralph Castain
d70e2e8c2b
Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately.
...
This remains to be tested to ensure everything came over cleanly, so please continue to withhold commits a little longer.
This commit was SVN r17632.
2008-02-28 01:57:57 +00:00