remove improper use of hwloc_bitmap_free
When using the native aprun launcher, frequent memory corruption errors were observed, occurring either during a PMI kvs-fence operation or at MPI termination during opal cleanup of allocated objects. This was especially bad when using aprun --c none. In some cases, the application would even just hang in finalize if using ptmalloc, owing to some kind of infinite loop in cleanup of small blocks, etc.

It turns out that the problem was in orte_ess_base_proc_binding's improper use of opal_hwloc_base_get_available_cpus. The cpuset (bitmap) returned from that function is not meant to be freed by the caller.

This problem is likely never observed when using the mpirun launcher, as there's an early exit if the OMPI_MCA_orte_bound_at_launch environment variable is set.

This commit was SVN r32809.
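The ownership rule described above can be illustrated with a minimal, self-contained sketch. It does not use hwloc or Open MPI: `cpuset_t`, `get_available_cpus`, `cpuset_to_list`, `topology_finalize`, and `describe_binding` are hypothetical stand-ins, and the assumption that the returned set lives in a cache owned by another layer is made for illustration only. The point is that the caller formats the borrowed set and leaves freeing it to its owner.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cpuset; real code would use hwloc_bitmap_t. */
typedef struct {
    unsigned long mask;
} cpuset_t;

/* Cache owned by the (hypothetical) topology layer, not by callers. */
static cpuset_t *cached_available = NULL;

/* Returns a BORROWED pointer: the cache keeps ownership of the set. */
static const cpuset_t *get_available_cpus(void)
{
    if (cached_available == NULL) {
        cached_available = malloc(sizeof(*cached_available));
        if (cached_available == NULL) {
            return NULL;
        }
        cached_available->mask = 0x0fUL; /* pretend CPUs 0-3 are available */
    }
    return cached_available;
}

/* The owner releases the cache exactly once, at shutdown. */
static void topology_finalize(void)
{
    free(cached_available);
    cached_available = NULL;
}

/* Formats the set as a comma-separated CPU list in a new string
 * that the caller owns and must free. */
static char *cpuset_to_list(const cpuset_t *set)
{
    char buf[512] = "";
    size_t len = 0;

    assert(set != NULL);
    for (unsigned i = 0; i < 8 * sizeof(set->mask); i++) {
        if (set->mask & (1UL << i)) {
            len += (size_t)snprintf(buf + len, sizeof(buf) - len,
                                    "%s%u", len ? "," : "", i);
        }
    }
    char *out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, buf, len + 1);
    }
    return out;
}

/* Caller pattern after the fix: use the borrowed set, do not free it. */
static char *describe_binding(void)
{
    const cpuset_t *cpus = get_available_cpus();
    if (cpus == NULL) {
        return NULL;
    }
    /* No free(cpus) here: freeing it would leave the cache dangling and
     * lead to a double free when topology_finalize() runs. */
    return cpuset_to_list(cpus);
}
```

In this sketch, calling `describe_binding()` repeatedly is safe because the cached set stays valid until `topology_finalize()`. Had the caller freed the borrowed pointer, the second call would read freed memory, which is the kind of heap corruption that tends to surface later, far from the faulty call.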
Parent: 16d6e82ed2
Commit: f8ac8bb6b0
@@ -186,7 +186,6 @@ int orte_ess_base_proc_binding(void)
                 goto error;
             }
             hwloc_bitmap_list_asprintf(&orte_process_info.cpuset, cpus);
-            hwloc_bitmap_free(cpus);
             OPAL_OUTPUT_VERBOSE((5, orte_ess_base_framework.framework_output,
                                  "%s Process bound to core",
                                  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
@@ -231,12 +230,11 @@ int orte_ess_base_proc_binding(void)
                     goto error;
                 }
                 hwloc_bitmap_list_asprintf(&orte_process_info.cpuset, cpus);
-                hwloc_bitmap_free(cpus);
                 orte_proc_is_bound = true;
                 OPAL_OUTPUT_VERBOSE((5, orte_ess_base_framework.framework_output,
-                                     "%s Process bound to %s",
+                                     "%s Process bound to %p %s",
                                      ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
-                                     hwloc_obj_type_string(target)));
+                                     cpus, hwloc_obj_type_string(target)));
                 break;
             }
         }