1. repair of the linear and direct routed modules
2. repair of the ompi/pubsub/orte module to correctly init routes to the ompi-server, and correctly handle failure to correctly parse the provided ompi-server URI
3. modification of orterun to accept both "file" and "FILE" for designating where the ompi-server URI is to be found - purely a convenience feature
4. resolution of a message ordering problem during the connect/accept handshake that allowed the "send-first" proc to attempt to send to the "recv-first" proc before the HNP had actually updated its routes.
Let this be a further reminder to all - message ordering is NOT guaranteed in the OOB
5. Repair the ompi/dpm/orte module to correctly init routes during connect/accept.
Reminder to all: messages sent to procs in another job family (i.e., started by a different mpirun) are ALWAYS routed through the respective HNPs. As per the comments in orte/routed, this is REQUIRED to maintain connect/accept (where only the root proc on each side is capable of init'ing the routes), allow communication between mpirun's using different routing modules, and to minimize connections on tools such as ompi-server. It is all taken care of "under the covers" by the OOB to ensure that a route back to the sender is maintained, even when the different mpirun's are using different routed modules.
6. corrections in the orte/odls to ensure proper identification of daemons participating in a dynamic launch
7. corrections in build/nidmap to support update of an existing nidmap during dynamic launch
8. corrected implementation of the update_arch function in the ESS, along with consolidation of a number of ESS operations into base functions for easier maintenance. The ability to support info from multiple jobs was added, although we don't currently do so - this will come later to support further fault recovery strategies
9. minor updates to several functions to remove unnecessary and/or no longer used variables and envar's, add some debugging output, etc.
10. addition of a new macro ORTE_PROC_IS_DAEMON that resolves to true if the provided proc is a daemon
There is still more cleanup to be done for efficiency, but this at least works.
Tested on single-node Mac, multi-node SLURM via odin. Tests included connect/accept, publish/lookup/unpublish, comm_spawn, comm_spawn_multiple, and singleton comm_spawn.
Fixes ticket #1256
This commit was SVN r18804.
- Change the arguments for launch failed function according to changeset r18611.
This commit was SVN r18795.
The following SVN revision numbers were found above:
r18611 --> open-mpi/ompi@7bee71aa59
is properly initialized and available in all cases (like ompi_info, where
the ess is never actually initialized). Fixes trac:1364.
This commit was SVN r18733.
The following Trac tickets were found above:
Ticket 1364 --> https://svn.open-mpi.org/trac/ompi/ticket/1364
Add ability for sys admins to prohibit putting session directories under specified locations. Thus, they can now protect parallel file systems from foolish user mistakes.
This commit was SVN r18721.
This commit ensures that attempts to access information that is unknowable or undefined returns appropriate invalid or not_supported values to avoid unexpected behavior and/or segfaults.
This commit was SVN r18692.
individual orte/tools/*/Makefile.am files. This causes "make" to
travese into every directory, even if it's not going to build anything
in that directory (which is a good thing). It also helps cleanup and
dist issues.
This also affects orte-checkpoint and orte-restart, but I couldn't get
--with-ft to compile properly; I'll pass along a heads-up to Josh to
ensure that I didn't break anything.
This commit was SVN r18680.
Ensure that routes to remote procs are set on the HNP before completing launch so that the debugger message can be sent. Solves a race condition that can exist in those environments where the HNP does not have local procs.
This commit was SVN r18674.
Update the ESS API so we can update the stored arch's should the modex include that info. Update ompi/proc to check/set the arch for remote procs, and add that function call to mpi_init right after the modex is done.
Setup to allow other grpcomm modules to decide whether or not to add the arch to the modex, and to detect if other entries have been made. If not, then the modex can just fall through. Begin setting up some logic in the "basic" module to handle different arch situations.
For now, default to the "bad" module so we will work in all situations, even though we may be sending around more info than we really require.
This fixes ticket #1340
This commit was SVN r18673.
Some minor changes to help facilitate debugger support so that both mpirun and yod can operate with it. Still to be completed.
This commit was SVN r18664.
is still to use the C based wrapper compilers (which have many more features
and are more well tested). The Perl compilers are enabled with the option
--enable-script-wrapper-compilers, which also ignores the option
--disable-binaries (ie --enable-script-wrapper-compilers --disable-binaries
will result in perl-based wrapper compilers being installed, but no other
binaries being installed).
This commit was SVN r18655.
This allows r18645 to fix the memory corruption issue, but also allows us to resolve the memory leaks cited by CID 1039
This commit was SVN r18646.
The following SVN revision numbers were found above:
r18645 --> open-mpi/ompi@53d83ba1c5