openmpi/orte
Wei Zhang 48cca8ab83 oob/tcp: fix a race condition on stop_thread pipe
This patch fixes a race condition that caused the main thread
to sometimes write to a closed pipe.

Details:

Currently, during shutdown, the main thread does the following:

1. Set listen_thread_action to false.
2. Write to the stop_thread pipe to tell the listener thread to exit.

The listener thread does the following:

1. Call select() to wait on a set of file descriptors with
   a maximum wait time.

2. Check listen_thread_action. If it is false, close the stop_thread
   pipe.

The main thread writes to a closed pipe when the following
sequence occurs:

1. The listener's call to select() returns because the maximum wait
   time was reached.

2. The main thread sets listen_thread_action to false.

3. The listener thread checks listen_thread_action and closes the pipe.

4. The main thread writes to the closed pipe.

This patch addresses the issue by having the main thread close the pipe
after the listener thread has been joined. This way, the main thread
both writes to and closes the pipe, so there is no conflict.
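
The ordering described above can be sketched as follows. This is a
minimal illustration, not the actual oob/tcp code: listener(),
shutdown_demo(), the 10 ms timeout, and the use of a C11 atomic flag
are assumptions made for the example. Only the names
listen_thread_action and stop_thread come from the description above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical stand-ins for the listener state described above. */
static atomic_bool listen_thread_action;
static int stop_thread[2] = { -1, -1 };

/* Listener: waits in select() with a maximum wait time and exits once
 * the flag is false. It never closes the pipe itself. */
static void *listener(void *arg)
{
    (void)arg;
    assert(stop_thread[0] >= 0);
    while (atomic_load(&listen_thread_action)) {
        fd_set readfds;
        struct timeval timeout = { .tv_sec = 0, .tv_usec = 10000 };

        FD_ZERO(&readfds);
        FD_SET(stop_thread[0], &readfds);
        /* Returns on a stop byte or when the timeout expires. */
        (void)select(stop_thread[0] + 1, &readfds, NULL, NULL, &timeout);
    }
    return NULL;
}

/* Main-thread shutdown: clear the flag, write, join, then close.
 * Returns 0 if the wake-up byte was written successfully. */
int shutdown_demo(void)
{
    pthread_t tid;
    char byte = 0;
    ssize_t written;

    if (pipe(stop_thread) != 0) {
        return -1;
    }
    atomic_store(&listen_thread_action, true);
    if (pthread_create(&tid, NULL, listener, NULL) != 0) {
        close(stop_thread[0]);
        close(stop_thread[1]);
        return -1;
    }

    atomic_store(&listen_thread_action, false);    /* step 1 */
    written = write(stop_thread[1], &byte, 1);     /* step 2: pipe still open */
    pthread_join(tid, NULL);                       /* listener has exited */

    /* Only the main thread closes the pipe, and only after the join. */
    close(stop_thread[0]);
    close(stop_thread[1]);
    stop_thread[0] = stop_thread[1] = -1;

    return written == 1 ? 0 : -1;
}
```

Calling shutdown_demo() from a program's main() should return 0
regardless of whether select() timed out before the flag changed,
because the listener never closes the pipe.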

Note: this patch was opened directly against the v4.1.x branch because
the orte/mca/oob/tcp directory has been removed from the master branch.

Signed-off-by: Wei Zhang <wzam@amazon.com>
2020-09-30 18:22:53 +00:00
..
bindings Expose opal_set_using_threads and improve error message on missing ompi_info. 2017-01-19 07:57:58 -05:00
etc Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
include Complete job control integration 2018-08-20 16:08:54 -07:00
mca oob/tcp: fix a race condition on stop_thread pipe 2020-09-30 18:22:53 +00:00
orted Add 'orte_' prefix to noop_mpir_breakpoint_ptr. 2019-09-19 08:47:17 -04:00
runtime Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init. 2019-04-01 11:10:04 +01:00
test regx: fixed the order of hosts for ranges with different prefixes 2019-02-11 12:06:49 +02:00
tools orterun: remove duplicate code 2019-08-19 15:49:57 -04:00
util Remove stale ORTE code 2019-03-31 11:26:18 -07:00
common_sym_whitelist.txt common syms: whitelist bison-generated common symbols 2016-01-16 03:53:14 -08:00
Doxyfile Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
Makefile.am Purge whitespace from the repo 2015-06-23 20:59:57 -07:00