This commit removes the specialized support for Sparc v9 as the
architecture is unsupported. The architecture will continue to
work, without CMA, using the GCC built-in atomic support.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit removes the unsupported __sync built-in atomics in
favor of the GCC __atomic built-ins. The priority order (if not
modified by configure flags) is: C11, custom atomics
(opal/include/opal/sys/*), then GCC __atomic built-ins.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
IA64 atomic support was deleted some time ago. Some of the references
to the architecture were not removed when the atomic support was. This
commit removes those lingering references. IA64 will continue to work,
unsupported, with the built-in atomics.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and `opal_init` functions, respectively.
In other words, they were incremented before initialization was complete.
This causes the following program to abort with SIGFPE if Open MPI
was configured with `--enable-timing`.
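```c
// link with -lm (feenableexcept() is provided by libm)
#define _GNU_SOURCE   // feenableexcept() is a GNU extension
#include <fenv.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // raise SIGFPE on floating-point division by zero
    feenableexcept(FE_DIVBYZERO);

    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```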
The logic of the SIGFPE is:
1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
`OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
`get_ts_cycle` instead of `get_ts_gettimeofday` because
`opal_initialized` is already 1.
(This is the problem.)
5. `opal_init_util` calls `get_ts_cycle` through
`OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` evaluates
`opal_timer_base_get_cycles() / opal_timer_base_get_freq()`,
which raises SIGFPE (division by zero) because the OPAL TIMER
framework is not yet initialized and `opal_timer_base_get_freq`
returns 0.
This commit moves the increments of `opal_util_initialized`
and `opal_initialized` to the end of `opal_init_util` and
`opal_init`, respectively.
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
The SM BTL was effectively removed a long time ago. All that was left
was a shell that warned people if they tried to use the SM BTL. For
v5.0, we plan to finally remove this ancient shell (and possibly
replace it with vader).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Hwloc upstream has fixed a problem with embedded "make distcheck" that
surfaced when you ran autogen in an Open MPI tarball.
This submodule update takes in the upstream hwloc fixes for this
issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
Probably not the right long-term solution, but hopefully helps get us
through for a bit.
Includes an update of the PRRTE submodule pointer.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- remove stale s390 and MIPS atomics
- ensure envars from spawn are propagated
- fix make tarball
- ensure cleanup of default hostfile
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit removes the code specific to MIPS. This architecture
has been unsupported for some time. Open MPI will continue to work
on MIPS with C11 and __atomic but will no longer use CMA for
shared memory.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit removes the CMA support for s390 and s390x. These
architectures have been unsupported for a while and no one has
verified that CMA actually works with Open MPI on these systems.
s390 and s390x will continue to work with Open MPI without CMA.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
This commit fixes an issue in the MCA base variable system. The
code was retrieving the user home directory (from HOME) and
attempting to use it to build a search path for config files.
If HOME is not set while user-level configuration directories are
enabled, there is no valid path to build, so the appropriate thing
to do is to print an error message and return. This commit makes
that change. It does not ensure that HOME is set correctly.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
If you ran autogen.pl with --without-prrte, we wouldn't configure or build
PRRTE support. However, configuring with --disable-internal-rte wasn't
working because the option was being ignored. This led to some false
errors when compiling against an earlier PMIx v2.2 release.
That said, a couple of places did need protection against
PMIx v2.2.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.
Update PRRTE and PMIx to deal with v3/v4 differences.
Signed-off-by: Ralph Castain <rhc@pmix.org>
When a job fails very quickly, it is possible that the spawning tool
won't be notified that the spawn completed before being told that the
job terminated. This can cause the tool to "hang" in PMIx_Spawn. Ensure
that PRRTE handles this case by guaranteeing we notify the spawner.
Track both PRRTE and PMIx masters as both have changed, though only the
PRRTE one is involved in this particular fix.
Signed-off-by: Ralph Castain <rhc@pmix.org>
- Ensure we accurately handle node name aliases
- Apply the local launch environ to apps prior to spawn
- Add error check if PMIx_Spawn fails
- Fix compiler warning
- Fix PGI vendor check
- Prevent mpirun from hanging if prte segfaults
- Fix absolute/relative path names to ensure "prte" and
"prted" are taken from same distribution as "mpirun"
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit fixes a potential deadlock that can occur between the
memory hooks and region registration. The deadlock is a hold-and-wait
cycle between two locks: the VMA lock, used to protect internal
rcache/grdma structures, and the reader/writer lock in the interval
tree.
In the case of the memory hooks, a reader lock is obtained on the
interval tree, then the VMA lock is obtained to remove the
registration from the LRU. In the case of LRU evictions, the VMA
lock is obtained, then the writer lock on the interval tree is
obtained. Because the two paths take the locks in opposite order,
they can deadlock.
To fix the issue, the code that evicts from the LRU has been
updated to only invalidate the registration while the VMA lock
is held, then remove the registration from the VMA after the
lock is released. This should completely eliminate the above
deadlock.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Set up the opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for an
"external" pmix component as well as the internal v4 one.
remove orte: misc fixes
- UCX fixes
- VPATH issue
- oshmem fixes
- remove useless definition
- Add PRRTE submodule
- Get autogen.pl to traverse PRRTE submodule
- Remove stale orcm reference
- Configure embedded PRRTE
- Correctly pass the prefix to PRRTE
- Correctly set the OMPI_WANT_PRRTE am_conditional
- Move prrte configuration to the end of OMPI's configure.ac
- Make mpirun a symlink to prun, when available
- Fix makedist with --no-orte/--no-prrte option
- Add a `--no-prrte` option which is the same as the legacy
`--no-orte` option.
- Remove embedded PMIx tarball. Replace it with new submodule
pointing to OpenPMIx master repo's master branch
- Some cleanup in PRRTE integration and add config summary entry
- Correctly set the hostname
- Fix locality
- Fix singleton operations
- Fix support for "tune" and "am" options
Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>