Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and the `opal_init` functions respectively.
In other words, they were incremented before initialization had completed.
This causes the following program to abort with SIGFPE if
Open MPI was configured with `--enable-timing`.
```c
// link with -lm
#define _GNU_SOURCE   // feenableexcept() is a GNU extension
#include <fenv.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // raise SIGFPE on division-by-zero
    feenableexcept(FE_DIVBYZERO);
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```
The sequence leading to the SIGFPE is:
1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
`OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
`get_ts_cycle` instead of `get_ts_gettimeofday` because
`opal_initialized` is already 1.
(This is the problem.)
5. `opal_init_util` calls `get_ts_cycle` through
`OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` executes
`opal_timer_base_get_cycles() / opal_timer_base_get_freq()`
and it raises SIGFPE (division-by-zero) because the OPAL TIMER
framework is not initialized yet and `opal_timer_base_get_freq`
returns 0.
This commit moves the increments of `opal_util_initialized`
and `opal_initialized` to the end of the `opal_init_util` and
`opal_init` functions respectively.
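The ordering issue can be illustrated with a minimal sketch. Every name below (`sketch_initialized`, `sketch_ts_func`, `sketch_init`, and so on) is a hypothetical stand-in, not the real OPAL code; the sketch only assumes that the clock selection depends on the counter, as described in step 4.

```c
/* Hypothetical stand-ins for the OPAL symbols named above. */
static int sketch_initialized = 0;      /* plays the role of opal_initialized */
static double sketch_timer_freq = 0.0;  /* stays 0 until the "timer framework" is set up */

typedef double (*sketch_ts_fn)(void);

/* Fallback clock that is always safe to call. */
static double sketch_ts_fallback(void) { return 1.0; }

/* Cycle-based clock: only valid once the frequency is non-zero. */
static double sketch_ts_cycle(void) { return 1000.0 / sketch_timer_freq; }

/* Selects a clock based on the counter, as in step 4. */
static sketch_ts_fn sketch_ts_func(void)
{
    return sketch_initialized ? sketch_ts_cycle : sketch_ts_fallback;
}

/* Returns 1 if timing code that runs during init picked the safe fallback. */
int sketch_init(int increment_first)
{
    int used_fallback;

    sketch_initialized = 0;
    sketch_timer_freq = 0.0;

    if (increment_first) {
        sketch_initialized++;           /* old ordering: counter set too early */
    }

    /* Timing setup runs here, before the timer framework exists. */
    used_fallback = (sketch_ts_func() == sketch_ts_fallback);

    sketch_timer_freq = 2.0e9;          /* "timer framework" is now ready */

    if (!increment_first) {
        sketch_initialized++;           /* new ordering: counter set last */
    }
    return used_fallback;
}
```

With `sketch_init(1)` the early timing call selects the cycle-based clock while the frequency is still 0; with `sketch_init(0)` it selects the fallback.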
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
1- Remove the common symbols issue: global variable not initialized. (#7424)
Move the variables to local scope within the set_info function.
2- Remove GPFS hints using datashipping: not used anymore
3- Redirect output stream to corresponding fs framework.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
The communicator might not exist yet when mca_fs_gpfs_component_file_query() is called.
Therefore, we need to check it before calling the broadcast function.
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
The SM BTL was effectively removed a long time ago. All that was left
was a shell that warned people if they tried to use the SM BTL. For
v5.0, we plan to finally remove this ancient shell (and possibly
replace it with vader).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
External pmix installs are frequently in non-standard locations, and
the paths to their shared libraries are not registered with ldconfig because
there may be multiple pmix installs.
The way the configury was set up prior to this patch, the configuration
would fail soon after the PMIX config stuff was called because it
added some pmix lib stuff to the LDFLAGS, causing later configury tests
that need to run a compiled test program to fail.
This patch avoids this problem by resetting the LDFLAGS and LIBS back
to what they were prior to the run of the external PMIX detection.
The CFLAGS setting is left because there are many places in the ompi
and opal source code where pmix_common.h needs to be included.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
Hwloc upstream has fixed a problem with embedded "make distcheck" that
surfaced when you ran autogen in an Open MPI tarball.
This submodule update takes in the upstream hwloc fixes for this
issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is raised at that point. This allows applications that do not use shared
file pointer operations to continue execution without any issue.
If the user subsequently calls MPI_File_write_shared or a similar operation, however, an error
will be raised.
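A minimal sketch of this deferred-error pattern, assuming a hypothetical per-file state struct. The names `sketch_sharedfp`, `sketch_file_open`, and `sketch_write_shared` are illustrative only and are not the component's real data structures or entry points:

```c
#define SKETCH_SUCCESS   0
#define SKETCH_ERR_FILE  (-1)

/* Hypothetical per-file state; not the real sharedfp structures. */
struct sketch_sharedfp {
    int usable;  /* 1 only if both helper files were created at open time */
};

/* Open does not fail on helper-file errors; it only records the outcome. */
int sketch_file_open(struct sketch_sharedfp *sh, int datafile_ok, int metafile_ok)
{
    sh->usable = (datafile_ok && metafile_ok);
    return SKETCH_SUCCESS;
}

/* Shared file pointer operations report the deferred failure. */
int sketch_write_shared(const struct sketch_sharedfp *sh)
{
    if (!sh->usable) {
        return SKETCH_ERR_FILE;
    }
    return SKETCH_SUCCESS;
}
```

An application that never calls the shared-pointer operation never sees the error, which is the behavior described above.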
Fixes issue #7429
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Mark the --enable-orterun-prefix-by-default and
--enable-mpirun-prefix-by-default options as deprecated, but continue to
honor them by translating them to the new
--enable-prte-prefix-by-default option.
Signed-off-by: Ralph Castain <rhc@pmix.org>
We're dealing with CLI options that have been deleted, not
deprecated.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5aea4446288db9207fbc60f23ee903b149217121)
Useful when debugging RTE-related issues
Not for inclusion in the tarball - just added to git repo for use by
developers.
Signed-off-by: Ralph Castain <rhc@pmix.org>
If someone gives us a namespace that doesn't easily translate to an
integer, we have to create a mechanism for working around the
disconnect. PRRTE has been updated to give us a flag so we know we were
"natively" launched. If we don't see it, then fall back to generating a
hash of the nspace as our jobid. We then have to translate back/forth
between nspace and jobid using a lookup table.
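A rough sketch of the hash-plus-lookup-table idea follows. The hash function (FNV-1a), the fixed-size table, and all `sketch_*` names are assumptions made for illustration; the actual implementation may differ, and this sketch does not handle hash collisions.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_JOBS   16
#define SKETCH_NSPACE_LEN 64

/* Hypothetical table entry pairing an nspace string with its jobid. */
struct sketch_entry {
    char nspace[SKETCH_NSPACE_LEN];
    uint32_t jobid;
};

static struct sketch_entry sketch_table[SKETCH_MAX_JOBS];
static int sketch_count = 0;

/* FNV-1a string hash, chosen only for illustration. */
static uint32_t sketch_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* nspace -> jobid: reuse a recorded mapping or create one. Returns 0 on success. */
int sketch_nspace_to_jobid(const char *nspace, uint32_t *jobid)
{
    for (int i = 0; i < sketch_count; ++i) {
        if (0 == strcmp(sketch_table[i].nspace, nspace)) {
            *jobid = sketch_table[i].jobid;
            return 0;
        }
    }
    if (sketch_count == SKETCH_MAX_JOBS || strlen(nspace) >= SKETCH_NSPACE_LEN) {
        return -1;  /* table full or name too long */
    }
    strcpy(sketch_table[sketch_count].nspace, nspace);
    sketch_table[sketch_count].jobid = sketch_hash(nspace);
    *jobid = sketch_table[sketch_count].jobid;
    sketch_count++;
    return 0;
}

/* jobid -> nspace: returns NULL if the jobid was never recorded. */
const char *sketch_jobid_to_nspace(uint32_t jobid)
{
    for (int i = 0; i < sketch_count; ++i) {
        if (sketch_table[i].jobid == jobid) {
            return sketch_table[i].nspace;
        }
    }
    return NULL;
}
```

The table is what makes the reverse direction possible, since the hash alone cannot be inverted back to the nspace string.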
Probably not the right long-term solution, but hopefully helps get us
through for a bit.
Includes update of PRRTE pointer
Signed-off-by: Ralph Castain <rhc@pmix.org>
- remove stale s390 and MIPS atomics
- ensure envars from spawn are propagated
- fix make tarball
- ensure cleanup of default hostfile
Signed-off-by: Ralph Castain <rhc@pmix.org>
Per the developers' meeting, add detection of the deprecated --with-pmi
(and its associated --with-pmi-libdir) configure option and error out
with a polite note about the change in support.
Since "--with-pmi" now shows in the configure help output, mark the help
string with a giant *DEPRECATED* to warn users not to use it.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Ma
Signed-off-by: Ralph Castain <rhc@pmix.org>
* `AUTOMAKE_JOBS` can improve the performance of `autogen.pl`
* The user can set this envar in the environment before calling
`autogen.pl` or use the new `-j #` option to set it.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>