openmpi/opal
Nathan Hjelm e6f84e79de btl/uct: fix deadlock in connection code
This commit fixes a deadlock that can occur when using a TL that
supports the connect-to-endpoint model. The deadlock occurred while
processing incoming connection requests, which was done from an
active-message callback. For reasons not yet understood, this callback
sometimes hung. To avoid the issue, the connection active-message is
now saved for later processing.
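
The deferral described above can be sketched roughly as follows. This is a minimal single-threaded illustration, not the btl/uct code: every name here (conn_queue_t, conn_am_callback, conn_progress, the fixed-size ring) is hypothetical, and a real implementation would need locking and would actually handle the connection request where the comment indicates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the btl/uct implementation. */
#define PENDING_MAX 16
#define MSG_MAX 64

typedef struct {
    unsigned char data[MSG_MAX];
    size_t len;
} pending_msg_t;

typedef struct {
    pending_msg_t slots[PENDING_MAX];
    size_t head;    /* index of the oldest saved message */
    size_t count;   /* number of saved, unprocessed messages */
    int processed;  /* total requests handled so far */
} conn_queue_t;

/* Called from the active-message callback: copy the message and return
 * immediately, without doing any connection work in callback context. */
int conn_am_callback(conn_queue_t *q, const void *msg, size_t len)
{
    if (len > MSG_MAX || q->count == PENDING_MAX) {
        return -1;  /* message too large or queue full */
    }
    pending_msg_t *slot = &q->slots[(q->head + q->count) % PENDING_MAX];
    memcpy(slot->data, msg, len);
    slot->len = len;
    q->count++;
    return 0;
}

/* Called later from the normal progress loop, outside any callback.
 * Returns the number of saved requests that were drained. */
int conn_progress(conn_queue_t *q)
{
    int handled = 0;
    while (q->count > 0) {
        pending_msg_t *slot = &q->slots[q->head];
        assert(slot->len <= MSG_MAX);
        /* A real implementation would process the connection request here. */
        q->head = (q->head + 1) % PENDING_MAX;
        q->count--;
        q->processed++;
        handled++;
    }
    return handled;
}
```

The point of the split is that the callback only copies data, so it cannot block on anything the connection logic might wait for.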

At the same time I cleaned up the connection code to eliminate
duplicate messages when possible.

This commit also fixes some bugs in the active-message send path:

 - Correctly set all fragment fields in prepare_src.

 - Fix bug when using buffered-send. We were not interpreting the
   return value (a byte count) correctly, which resulted in a message
   being sent multiple times.

 - Don't try to progress sends from the btl_send function when in an
   active-message callback. It could lead to deep recursion and an
   eventual crash if we get a trace like
   send->progress->am_complete->ob1_callback->send->am_complete...
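
The last two fixes in the list above can be illustrated with a small sketch. It assumes a buffered-send primitive that returns the number of bytes packed on success and a negative value on error; fake_buffered_send, fake_progress, module_t, and the in_am_callback flag are all hypothetical stand-ins, not the actual btl/uct or UCT symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical module state; not the real btl/uct structures. */
typedef struct {
    bool in_am_callback;  /* true while an active-message callback runs */
    int progress_calls;   /* how often progress was driven from send */
    int wire_sends;       /* how many times data was actually sent */
} module_t;

/* Stand-in for a buffered send: returns bytes packed (>= 0) on success
 * or a negative value on error. */
static long fake_buffered_send(module_t *m, size_t len)
{
    m->wire_sends++;
    return (long) len;
}

static void fake_progress(module_t *m)
{
    m->progress_calls++;
}

/* Returns 0 on success, -1 on error. */
int sketch_send(module_t *m, size_t len)
{
    long rc = fake_buffered_send(m, len);
    if (rc < 0) {
        return -1;
    }
    /* rc is a byte count here. Testing "rc == 0" for success would treat
     * a successful send as a failure and queue the message again. */

    /* Do not drive progress from inside an active-message callback:
     * progress may complete sends, whose callbacks may send again. */
    if (!m->in_am_callback) {
        fake_progress(m);
    }
    return 0;
}

/* Simulates an active-message callback that sends a reply. */
int sketch_am_callback(module_t *m, size_t reply_len)
{
    m->in_am_callback = true;
    int rc = sketch_send(m, reply_len);
    m->in_am_callback = false;
    return rc;
}
```

In this sketch a send issued from the callback still goes out once, but progress is left to the outer loop, which keeps the call depth bounded.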

Closes #5820
Closes #5821

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 707d35deeb)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2018-10-16 19:16:11 -06:00
class opal/fifo: fix 128-bit atomic fifo on Power9 2018-07-10 15:37:11 -06:00
datatype opal/dataype: add additional interface to retrieve more details about 2018-06-21 09:25:50 -05:00
dss Update ORTE to support PMIx v3 2018-03-02 02:00:31 -08:00
etc Correct the comment in the default MCA param template - we do not support a param called "component_path". The correct syntax is "mca_base_component_path" 2018-01-05 08:46:44 -08:00
include Complete job control integration 2018-08-20 16:08:54 -07:00
mca btl/uct: fix deadlock in connection code 2018-10-16 19:16:11 -06:00
memoryhooks opal: rename opal_atomic_init to opal_atomic_lock_init 2017-08-07 14:15:11 -06:00
runtime opal/progress: protect against multiple threads in event base 2018-09-21 14:40:08 -05:00
test/reachable reachable: add tests 2017-09-19 19:42:54 -07:00
threads opal/thread: Added keyword opal_thread_local for TLS. 2018-06-14 13:25:04 -07:00
tools Revert "Update to sync with OMPI master and cleanup to build" 2016-11-22 15:03:20 -08:00
util snprintf() length fix for info 2018-09-21 14:47:11 -05:00
win32 opal: standardize on max hostname length 2016-04-24 08:19:47 +02:00
common_sym_whitelist.txt opal: add code patcher framework 2016-04-13 17:16:13 -06:00
Makefile.am opal: remove generated asm code 2017-08-03 09:18:58 -06:00
win_makefile Purge whitespace from the repo 2015-06-23 20:59:57 -07:00