Restrict the search to the "immediate" range so at worst we check with
our local server and don't go up to the host daemon.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Deal with deprecated cmd line options and rndz files
- Protect against DVM collisions
- Update atomics
- Fix fork/exec and provide better tool support (PMIx)
- Ensure we cleanup completely upon terminating a tool/server that has
dropped rendezvous files
- Fix multithreaded launch race on remote node
- Fix bug where multiple threads can be modifying app->env
- Add a --no-ready-msg option to prte
- Utilize the PMIx_Spawn ability to do a fork/exec on our behalf to better
setup the "prte" DVM when running in proxy mode
- Provide better isolation between DVM instances
- Fix race condition in shutdown of PMIx fork/exec framework
- Ensure ompi personality gets added in proxy scenarios
Signed-off-by: Ralph Castain <rhc@pmix.org>
External pmix installs are frequently in non-standard locations and
the path to their shared libraries are not ldconfig'd in because
there may be multiple pmix installs.
The way the configury was set up prior to this patch, the configuration
would fail soon after the PMIX config stuff was called because it
added some pmix lib stuff to the LDFLAGS, resulting in configury tests
for things that require running the configure test to fail.
This patch avoids this problem by resetting the LDFLAGS and LIBS back
to what they were prior to the run of the external PMIX detection.
The CFLAGS setting is left because there are many places in the ompi
and opal source code where pmix_common.h needs to be included.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
Probably not the right long-term solution, but hopefully helps get us
thru for a bit.
Includes update of PRRTE pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
- remove stale s390 and MIPS atomics
- ensure envars from spawn are propagated
- fix make tarball
- ensure cleanup of default hostfile
Signed-off-by: Ralph Castain <rhc@pmix.org>
If you autogen.pl --without-prrte, we wouldn't configure or build PRRTE
support. However, configuring with --disable-internal-rte wasn't working
as it was being ignored. This led to some false errors when compiling
with an earlier PMIx v2.2 release.
That said, there were a couple of places that needed protection against
PMIx v2.2.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.
Update PRRTE and PMIx to deal with v3/v4 differences.
Signed-off-by: Ralph Castain <rhc@pmix.org>
When a job fails very quickly, it is possible that the spawning tool
won't get notified that the spawn completed prior to be told that the
job terminated. This can cause the tool to "hang" in PMIx_Spawn. Ensure
that PRRTE handles this case by guaranteeing we notify the spawner.
Track both PRRTE and PMIx masters as both have changed, though only the
PRRTE one is involved in this particular fix.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Ensure we accurately handle node name aliases
- Apply the local launch environ to apps prior to spawn
- Add error check if PMIx_Spawn fails
- Fix compiler warning
- Fix PGI vendor check
- Prevent mpirun from hanging if prte segfaults
- Fix absolute/relative path names to ensure "prte" and
"prted" are taken from same distribution as "mpirun"
Signed-off-by: Ralph Castain <rhc@pmix.org>
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.
remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy
`--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule
pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Update PMIx to latest master to get supporting updates. For
connect/accept (part of comm_spawn as well), lookup locality for all
participating procs on the node and compute the relative locality so it
can be used for MPI operations.
Signed-off-by: Ralph Castain <rhc@pmix.org>
This link-back seems to be breaking OMPI for some reason. I'm not sure we need it in PMIx anyway, but we'll investigate over there.
Signed-off-by: Ralph Castain <rhc@pmix.org>
When Slurm is built against PMIx, some installations place a copy of the
PMIx library that Slurm is linking against in the Slurm PMI location.
Current configury ignores that location. The desired behavior is to look
for a PMIx lib in that location when --with-pmi is given. If the user
also specifies --with-pmix and gives a different location, then override
anything previously found and look for it where the user directed.
Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx is removing the --enable-embedded-libevent and
--enable-embedded-hwloc flags as they are confusing users. Instead, we
will use the --enable-embedded-mode to handle both of these options.
Update the embedded configury to handle it.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Update the OPAL glue configure code to correctly link the opal/pmix4
component to the hwloc used by OMPI instead of defaulting to the
system-level hwloc. Required a corresponding update to the PMIx hwloc
configure code so we treat hwloc the same way we handle libevent in
embedded scenarios.
Signed-off-by: Ralph Castain <rhc@pmix.org>
The PMIX_MODEX and PMIX_INFO_ARRAY macros were removed from the PMIx 3.1 standard.
Open MPI does not really need them (they are only used to be reported as not supported),
so smply #ifdef protect them to support an external PMIx v3.1
Refs. open-mpi/ompi#6247
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Add the --pset option for app_contexts so the user can provide a string
name for each app_context. Use the new PMIx pset attribute to store the
names in the PMIx local storage for retrieval
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Looks like a filename was missed when pmix sucked in the installdirs
framework. Fixing the typo fixes "make ctags" and "make cscope".
Signed-off-by: Brian Barrett <bbarrett@amazon.com>