openmpi/opal/runtime
Tsubasa Yanagibashi 7d5fbcfd76 opal: Fix opal_initialized reference counter
Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and `opal_init` functions, respectively.
In other words, they were incremented before initialization had
actually completed.

This causes the following program to abort with SIGFPE if Open MPI
is configured with `--enable-timing`.

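```c
// Link with -lm (on glibc, feenableexcept is provided by libm).
#define _GNU_SOURCE  // feenableexcept is a GNU extension
#include <fenv.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // Raise SIGFPE on floating-point division by zero
    feenableexcept(FE_DIVBYZERO);
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```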

The sequence of events that leads to the SIGFPE is:

1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
   `OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
   `get_ts_cycle` instead of `get_ts_gettimeofday` because
   `opal_initialized` is already 1.
   (This is the problem.)
5. `opal_init_util` calls `get_ts_cycle` through
   `OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` evaluates
   `opal_timer_base_get_cycles() / opal_timer_base_get_freq()`,
   which raises SIGFPE (division by zero): the OPAL timer framework
   has not been initialized yet, so `opal_timer_base_get_freq()`
   returns 0.

This commit moves the increments of `opal_util_initialized` and
`opal_initialized` to the end of `opal_init_util` and `opal_init`,
respectively, so that each counter becomes nonzero only after the
corresponding initialization has completed.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-26 14:09:19 +09:00
| File | Last commit | Date |
|---|---|---|
| help-opal_info.txt | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| help-opal-runtime.txt | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| Makefile.am | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_cr.c | Handle asprintf errors with opal_asprintf wrapper | 2018-10-08 16:43:53 -07:00 |
| opal_cr.h | scripted symbol name change (ompi_ prefix) | 2017-07-11 02:13:23 -04:00 |
| opal_finalize.c | opal: fix compiler warning | 2018-12-23 12:10:23 -08:00 |
| opal_info_support.c | Handle asprintf errors with opal_asprintf wrapper | 2018-10-08 16:43:53 -07:00 |
| opal_info_support.h | opal_info: Add ability to report load failures | 2017-04-12 16:06:21 -05:00 |
| opal_init.c | opal: Fix opal_initialized reference counter | 2020-02-26 14:09:19 +09:00 |
| opal_params.c | opal: clean up init/finalize | 2018-12-18 14:37:04 -07:00 |
| opal_params.h | cuda: add option to remove warning about missing libcuda. | 2018-05-24 14:56:46 -07:00 |
| opal_progress_threads.c | opal_progress_threads: fix double RELEASE | 2015-08-12 05:11:40 -07:00 |
| opal_progress_threads.h | opal_progress_thread: fix stale comment | 2015-10-14 18:25:31 -07:00 |
| opal_progress.c | opal: clean up init/finalize | 2018-12-18 14:37:04 -07:00 |
| opal_progress.h | opal: clean up init/finalize | 2018-12-18 14:37:04 -07:00 |
| opal.h | opal: ensure opal_gethostname() always returns a value | 2020-01-21 09:56:52 -08:00 |