7d5fbcfd76
Before this change, the reference counters `opal_util_initialized` and `opal_initialized` were incremented at the beginning of the `opal_init_util` and `opal_init` functions respectively. In other words, they were incremented before initialization was complete. This causes the following program to abort with SIGFPE if `--enable-timing` is passed to `configure`.

```c
// need -lm option on link
#define _GNU_SOURCE  // feenableexcept() is a GNU extension declared in <fenv.h>
#include <fenv.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // raise SIGFPE on division-by-zero
    feenableexcept(FE_DIVBYZERO);
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```

The logic of the SIGFPE is:

1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through `OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns `get_ts_cycle` instead of `get_ts_gettimeofday` because `opal_initialized` is already 1. (This is the problem.)
5. `opal_init_util` calls `get_ts_cycle` through `OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` executes `opal_timer_base_get_cycles() / opal_timer_base_get_freq()`, which raises SIGFPE (division by zero) because the OPAL TIMER framework is not initialized yet and `opal_timer_base_get_freq` returns 0.

This commit moves the increments of `opal_util_initialized` and `opal_initialized` to the end of `opal_init_util` and `opal_init` respectively.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
help-opal_info.txt
help-opal-runtime.txt
Makefile.am
opal_cr.c
opal_cr.h
opal_finalize.c
opal_info_support.c
opal_info_support.h
opal_init.c
opal_params.c
opal_params.h
opal_progress_threads.c
opal_progress_threads.h
opal_progress.c
opal_progress.h
opal.h