Update PMIx/PRRTE to ensure we pickup the default system and user MCA
param definitions during PMIx_server_setup_application so they get
propagated. Protect OPAL's MCA var processing so it doesn't try to
process a NULL filename when PMIx provides the params for it.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Remove pmix_config.h from the tarball. Deal with the case of no local
procs when register_nspace is called.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Consolidate the ompi_process_info and opal_process_info structs to
remove duplicate storage and conversion issues. Unwind some interweaving
of include files using opal.h. Silence a couple of warnings.
For now, set the arch to local if PMIX_ARCH is not found.
Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx:
- restore OPA support
PRRTE:
Restore support for several options
* -N for ppr:N:node
* INHERIT modifier for --map-by option, indicating that
the spawned job should inherit the placement options
of its parent. Only applicable to dynamically spawned
jobs
Signed-off-by: Ralph Castain <rhc@pmix.org>
This reverts commit 1aabbe456d5de8fc3c3d6f91eabb8851db7d63eb.
Update PMIx and PRRTE, plus PRRTE config integration
Cleanup how we pass the extra libs and LDFLAGS for linking against
external libevent, hwloc, and pmix installs.
Catch the flag indicating that PMIx provided the user-level default MCA
params so we don't go looking for them ourselves.
Cleanup misc config warnings
Signed-off-by: Ralph Castain <rhc@pmix.org>
Properly mark/detect that a daemon sourced the event broadcast to avoid
reinjecting it into the PMIx server library. Correct the source field
for the event notify call on launcher ready.
Update event notification for tool support
Deal with a variety of race conditions related to tool reconnection to a
different server.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Deprecate --am and --amca options
Avoid default param files on backend nodes
Any parameters in the PRRTE default or user param files will have been
picked up by prte and included in the environment sent to the prted, so
don't open those files on the backend.
Avoid picking up MCA param file info on backend
Avoid the scaling problem at PRRTE startup by only reading the system
and user param files on the frontend.
Complete revisions to cmd line parser for OMPI
Per specification, enforce following precedence order:
1. system-level default parameter file
1. user-level default parameter file
1. Anything found in the environment
1. "--tune" files. Note that "--amca" goes away and becomes equivalent to "--tune". Okay if it is provided more than once on a cmd line (we will aggregate the list of files, retaining order), but an error if a parameter is referenced in more than one file with a different value
1. "--mca" options. Again, error if the same option appears more than once with a different value. Allowed to override a parameter referenced in a "tune" file
1. "-x" options. Allowed to overwrite options given in a "tune" file, but cannot conflict with an explicit "--mca" option
1. all other options
Fix special handling of "-np"
Get agreement on jobid across the layers
Need all three pieces (PRRTE, PMIx, and OPAL) to agree on the nspace
conversion to jobid method
Ensure prte show_help messages get output
Print abnormal termination messages
Cleanup error reporting in persistent operations
Signed-off-by: Ralph Castain <rhc@pmix.org>
dd
Signed-off-by: Ralph Castain <rhc@pmix.org>
Use "prte" instead of "prun" for proxy execution of cmds like mpirun.
This avoids the fork/exec-rendezvous complexities and should result in
more reliable operation.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.
Provide full support for info values via PMIx
Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.
PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list
PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support
Silence a warning from btl/vader
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Avoid modifying single-dash options of applications
- Fix fetch of node/app-level info
- Return correct status code
Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx:
- Ensure that launchers open all required frameworks
- Pass back the tool's ID
- Fix race condition in IOF
PRRTE:
- Begin conversion to use of nspace in place of numeric jobid
- Restore support:
--report-bindings
--display-map
--display-devel-map
--display-topo
--do-not-launch
--xml-output
--display-allocation
Signed-off-by: Ralph Castain <rhc@pmix.org>
Restrict the search to the "immediate" range so at worst we check with
our local server and don't go up to the host daemon.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Deal with deprecated cmd line options and rndz files
- Protect against DVM collisions
- Update atomics
- Fix fork/exec and provide better tool support (PMIx)
- Ensure we cleanup completely upon terminating a tool/server that has
dropped rendezvous files
- Fix multithreaded launch race on remote node
- Fix bug where multiple threads can be modifying app->env
- Add a --no-ready-msg option to prte
- Utilize the PMIx_Spawn ability to do a fork/exec on our behalf to better
setup the "prte" DVM when running in proxy mode
- Provide better isolation between DVM instances
- Fix race condition in shutdown of PMIx fork/exec framework
- Ensure ompi personality gets added in proxy scenarios
Signed-off-by: Ralph Castain <rhc@pmix.org>
- remove stale s390 and MIPS atomics
- ensure envars from spawn are propagated
- fix make tarball
- ensure cleanup of default hostfile
Signed-off-by: Ralph Castain <rhc@pmix.org>
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.
Update PRRTE and PMIx to deal with v3/v4 differences.
Signed-off-by: Ralph Castain <rhc@pmix.org>
When a job fails very quickly, it is possible that the spawning tool
won't get notified that the spawn completed prior to be told that the
job terminated. This can cause the tool to "hang" in PMIx_Spawn. Ensure
that PRRTE handles this case by guaranteeing we notify the spawner.
Track both PRRTE and PMIx masters as both have changed, though only the
PRRTE one is involved in this particular fix.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Ensure we accurately handle node name aliases
- Apply the local launch environ to apps prior to spawn
- Add error check if PMIx_Spawn fails
- Fix compiler warning
- Fix PGI vendor check
- Prevent mpirun from hanging if prte segfaults
- Fix absolute/relative path names to ensure "prte" and
"prted" are taken from same distribution as "mpirun"
Signed-off-by: Ralph Castain <rhc@pmix.org>
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.
remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy
`--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule
pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>