remove improper use of hwloc_bitmap_free

When using the native aprun launcher, frequent memory corruption
errors were observed, occurring either during a PMI kvs-fence
operation or at MPI termination during opal cleanup of allocated
objects.  This was especially bad when using

aprun --c none

In some cases, the application would even just hang in finalize
if using ptmalloc, owing to some kind of infinite loop in
cleanup of small blocks, etc.

It turns out that the problem was in orte_ess_base_proc_binding's
improper use of opal_hwloc_base_get_available_cpus.  The cpuset
(bitmap) returned from that function is not meant to be freed
by the caller.
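
The ownership rule can be sketched in plain C. This is a minimal
illustration, not the Open MPI code: it assumes the lookup returns a
pointer to a mask cached on an object, and every name in it
(demo_obj_t, demo_get_available_cpus, demo_obj_destroy, demo_caller)
is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object that caches its available-CPU mask. */
typedef struct {
    unsigned long *cached_cpus;  /* owned by the object, never by callers */
} demo_obj_t;

/* Returns a borrowed pointer: allocated on first use, then reused. */
static unsigned long *demo_get_available_cpus(demo_obj_t *obj)
{
    if (NULL == obj->cached_cpus) {
        obj->cached_cpus = malloc(sizeof(*obj->cached_cpus));
        if (NULL == obj->cached_cpus) {
            return NULL;
        }
        *obj->cached_cpus = 0xfUL;  /* pretend CPUs 0-3 are available */
    }
    return obj->cached_cpus;
}

/* Only the owner releases the cached mask, once, at teardown. */
static void demo_obj_destroy(demo_obj_t *obj)
{
    free(obj->cached_cpus);
    obj->cached_cpus = NULL;
}

/* A well-behaved caller: reads the mask and leaves ownership alone. */
static unsigned long demo_caller(demo_obj_t *obj)
{
    unsigned long *cpus = demo_get_available_cpus(obj);
    assert(NULL != cpus);
    /* No free(cpus) here: freeing would leave obj->cached_cpus dangling,
     * and the next lookup or the final destroy would touch freed memory. */
    return *cpus;
}
```

If demo_caller freed the pointer, nothing would fail at that moment;
the damage would only show up at a later lookup or at teardown, which
is consistent with corruption surfacing during cleanup.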

This problem is likely never observed when using the mpirun launcher
as there's an early exit if the OMPI_MCA_orte_bound_at_launch
environment variable is set.
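
Why the early exit hides the bug can be sketched as follows. Only the
environment variable name comes from this commit message; the function
names and the did_query_cpus flag are hypothetical, and the real
function's control flow may differ.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true when the launcher says it already bound us. */
static bool demo_bound_at_launch(void)
{
    return NULL != getenv("OMPI_MCA_orte_bound_at_launch");
}

/* Hypothetical outline of a binding routine with an early exit.
 * did_query_cpus records whether the cpuset code path was reached. */
static int demo_proc_binding(bool *did_query_cpus)
{
    *did_query_cpus = false;
    if (demo_bound_at_launch()) {
        /* Early exit: the cpuset lookup (and the bad free) never runs. */
        return 0;
    }
    /* Without the variable, execution continues into the cpuset code. */
    *did_query_cpus = true;
    return 0;
}
```

In this sketch, a launcher that does not set the variable always
reaches the cpuset path, so that is where the improper free would be
exercised.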

This commit was SVN r32809.
Howard Pritchard 2014-09-29 16:10:37 +00:00
parent 16d6e82ed2
commit f8ac8bb6b0

@@ -186,7 +186,6 @@ int orte_ess_base_proc_binding(void)
                     goto error;
                 }
                 hwloc_bitmap_list_asprintf(&orte_process_info.cpuset, cpus);
-                hwloc_bitmap_free(cpus);
                 OPAL_OUTPUT_VERBOSE((5, orte_ess_base_framework.framework_output,
                                      "%s Process bound to core",
                                      ORTE_NAME_PRINT(ORTE_PROC_MY_NAME)));
@@ -231,12 +230,11 @@ int orte_ess_base_proc_binding(void)
                     goto error;
                 }
                 hwloc_bitmap_list_asprintf(&orte_process_info.cpuset, cpus);
-                hwloc_bitmap_free(cpus);
                 orte_proc_is_bound = true;
                 OPAL_OUTPUT_VERBOSE((5, orte_ess_base_framework.framework_output,
-                                     "%s Process bound to %s",
+                                     "%s Process bound to %p %s",
                                      ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
-                                     hwloc_obj_type_string(target)));
+                                     cpus, hwloc_obj_type_string(target)));
                 break;
             }
         }