
Fix oob_tcp tcp_component_close segfault with active listeners

oob_tcp in non-HNP mode shares the libevent event_base with oob_base [1].
orte_oob_base_close calls:
(1) oob_tcp component_shutdown, then
(2) opal_progress_thread_finalize, then
(3) oob_tcp tcp_component_close [2].
opal_progress_thread_finalize calls tracker_destructor [3], which frees the
event_base [4]. If any oob_tcp event listeners are still active at that point,
oob_tcp crashes when it tries to delete them at [5] [6].
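The ordering hazard in steps (1) to (3) can be replayed without libevent. This is an illustrative sketch only. `mock_base`, `mock_listener`, `listener_delete` and `old_close_order` are hypothetical names, not Open MPI code. The base is also never really freed: it is only flagged as freed, so a late access is counted instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared libevent event_base. Instead of
   being released, it is flagged as freed so late accesses can be counted. */
struct mock_base {
    bool freed;
    int registered_events;
};

/* Hypothetical stand-in for one oob_tcp listener and its event. */
struct mock_listener {
    struct mock_base *base;
    bool registered;
};

/* Deletes one listener event. Returns true if the deletion touched a base
   that was already freed (a use-after-free in the real code). */
static bool listener_delete(struct mock_listener *l)
{
    bool stale = false;

    assert(l->base != NULL);
    if (!l->registered)
        return false;
    if (l->base->freed)
        stale = true; /* the real code would dereference freed memory here */
    else
        l->base->registered_events--;
    l->registered = false;
    return stale;
}

/* Replays the pre-fix teardown order and returns the number of listener
   deletions that happened after the shared base was freed. */
static int old_close_order(struct mock_base *b, struct mock_listener *ls,
                           size_t n)
{
    int stale = 0;

    /* (1) component_shutdown: before the fix, the listener list is not
       destructed here. */
    /* (2) opal_progress_thread_finalize frees the shared event base. */
    b->freed = true;
    /* (3) tcp_component_close destructs the listener list, deleting each
       listener's event on the already freed base. */
    for (size_t i = 0; i < n; i++)
        if (listener_delete(&ls[i]))
            stale++;
    return stale;
}
```

With two active listeners, both deletions in step (3) land on the freed base. In the real component these are the accesses that segfault.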

This change moves oob_tcp event listener cleanup from component_close to
component_shutdown so that it happens before the event_base is freed.
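Here is a minimal sketch of the fixed order, under the same kind of assumptions. The types and functions are hypothetical models, not Open MPI code. The check in `base_free_checked` that no listener events remain registered is an extra debugging aid invented for this example. Once the shutdown step has deleted the listener events, the base can be released, and the close step has nothing left to touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shared event base: it only tracks how many
   listener events are still registered on it. */
struct mock_base {
    bool freed;
    int registered_events;
};

/* Hypothetical model of one oob_tcp listener event. */
struct mock_listener {
    struct mock_base *base;
    bool registered;
};

/* Step (1) after the fix: component_shutdown deletes every listener event
   while the base is still valid. */
static void component_shutdown_model(struct mock_listener *ls, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ls[i].registered) {
            assert(!ls[i].base->freed);
            ls[i].base->registered_events--;
            ls[i].registered = false;
        }
    }
}

/* Step (2): models opal_progress_thread_finalize releasing the base.
   Refusing to free while events are registered is a debugging aid added
   for this sketch only. */
static bool base_free_checked(struct mock_base *b)
{
    if (b->registered_events != 0)
        return false; /* an event would be left pointing at freed memory */
    b->freed = true;
    return true;
}

/* Step (3) after the fix: tcp_component_close no longer destructs the
   listeners, so it never reaches into the freed base. */
static void component_close_model(struct mock_listener *ls, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(!ls[i].registered);
}
```

Calling `base_free_checked` before the shutdown step returns false while listener events are still registered. That is the condition which led to the crash.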

[1] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L160
[2] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/base/oob_base_frame.c#L95
[3] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L232
[4] https://github.com/open-mpi/ompi/blob/v4.0.1/opal/runtime/opal_progress_threads.c#L65
[5] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_component.c#L192
[6] https://github.com/open-mpi/ompi/blob/v4.0.1/orte/mca/oob/tcp/oob_tcp_listener.c#L955

Signed-off-by: Orivej Desh <orivej@gmx.fr>
Orivej Desh 2019-07-04 20:24:50 +00:00
parent 5d51b2310d
commit 78b7e342bd

@@ -186,9 +186,6 @@ static int tcp_component_open(void)
  */
 static int tcp_component_close(void)
 {
-    /* cleanup listen event list */
-    OPAL_LIST_DESTRUCT(&mca_oob_tcp_component.listeners);
     OBJ_DESTRUCT(&mca_oob_tcp_component.peers);
     if (NULL != mca_oob_tcp_component.ipv4conns) {
@@ -710,6 +707,9 @@ static void component_shutdown(void)
                                                  (void **) &peer, node, &node);
     }
+    /* cleanup listen event list */
+    OPAL_LIST_DESTRUCT(&mca_oob_tcp_component.listeners);
     opal_output_verbose(2, orte_oob_base_framework.framework_output,
                         "%s TCP SHUTDOWN done",
                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));