Extend the PMIx modex recv macros to cover the full set of
immediate/optional combinations. If PMIx_Init cannot reach a server,
then declare the MPI proc to be a singleton.
Provide full support for info values via PMIx
Catch all the values used in the "info" area of OMPI using data
available from PMIx instead of via envars. Update PMIx and PRRTE to sync
with their capabilities.
PMIx
- ensure cleanup of fork/exec children
- fix bug in gds/hash that left app info off of list
PRRTE
- fix multi-app bugs
- port setup_child logic from orte
- OMPI env changes
- set app->first_rank
- ensure common hostname across prun, prte, and pmix
- Fix "nolocal" support
Silence a warning from btl/vader
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Avoid modifying single-dash options of applications
- Fix fetch of node/app-level info
- Return correct status code
Signed-off-by: Ralph Castain <rhc@pmix.org>
PMIx:
- Ensure that launchers open all required frameworks
- Pass back the tool's ID
- Fix race condition in IOF
PRRTE:
- Begin conversion to use of nspace in place of numeric jobid
- Restore support:
--report-bindings
--display-map
--display-devel-map
--display-topo
--do-not-launch
--xml-output
--display-allocation
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Port memchecker call from a1d502c.
- Remove unused memcheck macro variables.
- Some code readability improvements.
- Remove some stray +1's in dynamic comm cleanup.
- Re-add OPAL_ENABLE_DEBUG macro to osc header.
- Cleanup some printf's, and includes.
- Refactor cleanup of dpm_disconnect_objs.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Keep track of the connected procs in vader_add_procs().
Otherwise, the same rank will reconnect the same shmem
segment (rank 0+...) multiple times instead of the next
one as intended.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
Temporary solution for the PML inconsistency issue discussed in #7475.
This patch address 2 things: first it make the PMIx key optional so that
if we are not in a full modex mode we don't do a direct modex, and
second it get the PML info from the vpid 0 instead of from the local
rank.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Restrict the search to the "immediate" range so at worst we check with
our local server and don't go up to the host daemon.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Deal with deprecated cmd line options and rndz files
- Protect against DVM collisions
- Update atomics
- Fix fork/exec and provide better tool support (PMIx)
- Ensure we cleanup completely upon terminating a tool/server that has
dropped rendezvous files
- Fix multithreaded launch race on remote node
- Fix bug where multiple threads can be modifying app->env
- Add a --no-ready-msg option to prte
- Utilize the PMIx_Spawn ability to do a fork/exec on our behalf to better
setup the "prte" DVM when running in proxy mode
- Provide better isolation between DVM instances
- Fix race condition in shutdown of PMIx fork/exec framework
- Ensure ompi personality gets added in proxy scenarios
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit removes the specialized support for Sparc v9 as the
architecture is unsupported. The architecture will continue to
work without CMA and using the GCC built-in atomic support.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit removes the unsupported __sync built-in atomics in
favor of the GCC built-ins. The priority order (if not modified
by configure flags) is: C11, custom atomics
(opal/include/opal/sys/*), then GCC built-ins.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
IA64 atomic support was deleted some time ago. Some of the references
to the architecture were not removed when the atomic support was. This
commit removes those lingering references. IA64 will continue to work
unsupported with the built-in atomics.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and the `opal_init` functions respectively.
In other words, they were incremented before fully initialized.
This causes the following program to abort by SIGFPE if
`--enable-timing` is enabled on `configure`.
```c
// need -lm option on link
int main(int argc, char *argv[])
{
// raise SIGFPE on division-by-zero
feenableexcept(FE_DIVBYZERO);
MPI_Init(&argc, &argv);
MPI_Finalize();
return 0;
}
```
The logic of the SIGFPE is:
1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
`OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
`get_ts_cycle` instead of `get_ts_gettimeofday` because
`opal_initialized` to 1.
(This is the problem)
5. `opal_init_util` calls `get_ts_cycle` through
`OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` executes
`opal_timer_base_get_cycles()) / opal_timer_base_get_freq()`
and it raises SIGFPE (division-by-zero) because the OPAL TIMER
framework is not initialized yet and `opal_timer_base_get_freq`
returns 0.
This commit changes the increment timing of `opal_util_initialized`
and `opal_initialized` to the end of `opal_init_util` and the
`opal_init` functions respectively.
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
The SM BTL was effectively removed a long time ago. All that was left
was a shell that warned people if they tried to use the SM BTL. For
v5.0, we plan to finally remove this ancient shell (and possibly
replace it with vader).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
External pmix installs are frequently in non-standard locations and
the path to their shared libraries are not ldconfig'd in because
there may be multiple pmix installs.
The way the configury was set up prior to this patch, the configuration
would fail soon after the PMIX config stuff was called because it
added some pmix lib stuff to the LDFLAGS, resulting in configury tests
for things that require running the configure test to fail.
This patch avoids this problem by resetting the LDFLAGS and LIBS back
to what they were prior to the run of the external PMIX detection.
The CFLAGS setting is left because there are many places in the ompi
and opal source code where pmix_common.h needs to be included.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Hwloc upstream has fixed a problem with embedded "make distcheck" that
was breaking that surfaced when you ran autogen in an Open MPI
tarball.
This submodule update takes in the upstream hwloc fixes for this
issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
Probably not the right long-term solution, but hopefully helps get us
thru for a bit.
Includes update of PRRTE pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
- remove stale s390 and MIPS atomics
- ensure envars from spawn are propagated
- fix make tarball
- ensure cleanup of default hostfile
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit removes the code specific to MIPS. This architecture
has been unsupported for some time. Open MPI will continue to work
on MIPS with C11 and __atomic but will not longer use CMA for
shared memory.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit removes the CMA support for s390 and s390x. These
architectures have been unsupported for awhile and no one has
verified that CMA actually works with Open MPI on these systems.
s390 and s390x will continue to work with Open MPI without CMA.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>