* General TCP cleanup for OPAL / ORTE
* Simplifying the OOB by moving much of the logic into the RML
* Allowing the OOB RML component to do routing of messages
* Adding a component framework for handling routing tables
* Moving the xcast functionality from the OOB base to its own framework
Includes merge from tmp/bwb-oob-rml-merge revisions:
r15506, r15507, r15508, r15510, r15511, r15512, r15513
This commit was SVN r15528.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r15506
r15507
r15508
r15510
r15511
r15512
r15513
Cleanup ALL instances of output involving the printing of orte_process_name_t structures using the ORTE_NAME_ARGS macro so that the number of fields and type of data match. Replace those values with a new macro/function pair ORTE_NAME_PRINT that outputs a string (using the new thread safe data capability) so that any future changes to the printing of those structures can be accomplished with a change to a single point.
Note that I could not possibly find outputs that directly print the orte_process_name_t fields, but only dealt with those that used ORTE_NAME_ARGS. Hence, you may still have a few outputs that bark during compilation. Also, I could only verify those that fall within environments I can compile on, so other environments may yield some minor warnings.
This commit was SVN r15517.
build it's possible that we have to process an ack before this function
returns. If we don't release the lock here we cause a deadlock later
in ack processing function.
This commit was SVN r15441.
Fix the blasted iof null component so it only is selected if/when directed.
This commit was SVN r15437.
The following SVN revision numbers were found above:
r15390 --> open-mpi/ompi@bd65f8ba88
You will not see any impact from this change unless you use the syntax described in ticket #1023. I've tried as many of the RAS components as possible and saw no problem - there may be issues with other RAS components that would not compile on any of my systems. Anything that appears should be trivial to fix.
This commit was SVN r15427.
We no longer store whether we are a singleton in a MCA parameter, we now use a global constant. So all references to the MCA parameter must be removed.
This commit was SVN r15408.
The following SVN revision numbers were found above:
r15390 --> open-mpi/ompi@bd65f8ba88
The problem stemmed from no longer launching a local orted on the same node as mpirun. The orted would save and reuse the base environment. Mpirun didn't do that, and the odls was using the orted's globally saved environment (which wasn't being set).
This fix establishes a globally accessible base launch environment that both the orted and mpirun can utilize. Since we now use that, we don't need to pass it to the odls_launch_proc function, so remove that param from the API (and modify all components to handle the change).
This commit was SVN r15405.
Short description: major changes include -
1. singletons now fork/exec a local daemon to manage their operations.
2. the orte daemon code now resides in libopen-rte
3. daemons no longer use the orte triggering system during startup. Instead, they directly call back to their parent pls component to report ready to operate. A base function to count the callbacks has been provided.
I have modified all the pls components except xcpu and poe (don't understand either well enough to do it). Full functionality has been verified for rsh, SLURM, and TM systems. Compile has been verified for xgrid and gridengine.
This commit was SVN r15390.
VxWorks. Still some issues remaining, I'm sure.
Refs trac:1010
This commit was SVN r15320.
The following Trac tickets were found above:
Ticket 1010 --> https://svn.open-mpi.org/trac/ompi/ticket/1010
types and add STOP_AT_FIRST_PRIORITY type for framework configuration,
which allows all components at the highest priority that succeeds to
succeed
* Use STOP_AT_FIRST_PRIORITY type for gpr framework, so that the null
component isn't built when the replica and proxy components are
available.
This commit was SVN r15286.
The default odls has been updated and works fine. The process odls has been updated, but I could not verify its operation. The bproc ODLS has not been updated yet. Ralph will look at it soon.
This commit was SVN r15257.
* Remove the 'opal_mca_base_param_use_amca_sets' global variable
* Harness the fact that you can (read should) call the cmd_line functions
before initializing opal_init_util(). This pushes the MCA/GMCA/AMCA
command line options into the environment before OPAL inits and starts
to use these values. By putting the cmd_line parse before opal_init_util
in orterun and orted we only parse the *MCA parameter files once, and
correctly (alleviating the need to 'recache' the files on init.)
* Small bits of cleanup.
This commit was SVN r15219.
Per suggestion, if we don't find a valid shell via getpwuid(), also
check the $SHELL environment variable. Also perform a few minor
cleanups along the way.
This commit was SVN r15156.
The following Trac tickets were found above:
Ticket 1060 --> https://svn.open-mpi.org/trac/ompi/ticket/1060
The SnapC Full local Coordinator used this argument to attach to the job the
daemon would be launching. So once this option was removed C/R support broke.
This commit has the local coordinator attach to the job just before it is
launched by the ODLS module. This is a much cleaner solution, and will
eventually allow the SnapC modules to attach to multiple jobs launched
on a single machine.
This commit fixes the C/R regression introduced in r15007.
This commit was SVN r15121.
The following SVN revision numbers were found above:
r15007 --> open-mpi/ompi@85df3bd92f
the multiple threads accessing the OOB/registry asynchronously via the
callbacks. The quickest solution (but definitively not the cleanest) is
to serialize these callbacks in such a way that at any given time
only one thread can execute a callbacks.
This commit was SVN r15086.
through the win dll using multiple threads, we have to insure that
the oob callbacks happens only in a synchronous way or really bad
things happens with the current design (blocking messages from a receive
callback).
This commit was SVN r15069.