openmpi/opal/mca/btl
Nathan Hjelm cd11fc3081 btl/ugni: fix race condition that causes completions to be dropped
The send code in the ugni btl has an optimization that enables it to
return 1 (fragment gone) in some cases. This optimization involved
removing the btl ownership and callback flags to ensure the fragment
stuck around long enough for its completion flag to be checked. This
works fine for the single-threaded case but not in the multi-threaded
case. It is possible that a fragment will be completed by another
thread while a thread is in mca_btl_ugni_send. This race can
lead to a leaked fragment, missed callback, or both. To fix the issue
without removing the optimization a reference count has been added to
the fragment. Callbacks and fragment release will not be made until
the fragment reference count has reached 0. The count is incremented
before sending the frag and decremented after the completion flag has
been checked. The fix has been verified to work using a multi-threaded
RMA benchmark with the osc/pt2pt component.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:14:31 -07:00
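
The reference-counting scheme described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual ugni btl code: the `frag_t` type, every function name, the initial count of 1 held by the in-flight operation, and the `complete_during_send` parameter (which stands in for a second thread completing the fragment) are all assumptions made for the example. The sketch also ignores the btl ownership and callback flags and simply counts callback and release events.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fragment; the real ugni fragment has many more fields. */
typedef struct frag {
    atomic_int  ref_cnt;       /* outstanding references to the fragment  */
    atomic_bool complete;      /* set once the network reports completion */
    int         callbacks_run; /* stand-in for invoking the user callback */
    int         released;      /* stand-in for returning the frag to a pool */
} frag_t;

/* Assumption: the in-flight operation itself holds one reference. */
static void frag_init(frag_t *f)
{
    atomic_init(&f->ref_cnt, 1);
    atomic_init(&f->complete, false);
    f->callbacks_run = 0;
    f->released = 0;
}

/* Drop one reference. Only the caller that takes the count to 0
 * runs the callback and releases the fragment, so both happen once. */
static void frag_drop_ref(frag_t *f)
{
    if (atomic_fetch_sub(&f->ref_cnt, 1) == 1) {
        f->callbacks_run++;
        f->released++;
    }
}

/* Completion path, normally reached from a progress thread. */
static void frag_on_completion(frag_t *f)
{
    atomic_store(&f->complete, true);
    frag_drop_ref(f);
}

/* Send path. Returns 1 if the fragment had already completed when the
 * completion flag was checked, 0 if completion is still pending. */
static int frag_send(frag_t *f, bool complete_during_send)
{
    /* Take a reference before sending so the fragment cannot be
     * released while this thread still needs to read its flag. */
    atomic_fetch_add(&f->ref_cnt, 1);

    /* ... the fragment would be posted to the network here ... */
    if (complete_during_send) {
        frag_on_completion(f); /* simulates completion by another thread */
    }

    bool done = atomic_load(&f->complete);

    /* Drop the send-side reference only after the flag has been read. */
    frag_drop_ref(f);
    return done ? 1 : 0;
}
```

With this arrangement the callback and release happen exactly once whether completion arrives during the send or afterwards, because whichever path drops the last reference performs them.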
..
base Add the ability to send host buffers through one sized staging buffers and CUDA buffers through different sized buffers. Fixes performance issues 2015-07-02 11:11:15 -04:00
openib common syms: whitelist bison-generated common symbols 2016-01-16 03:53:14 -08:00
portals4 btl-portals4: remove unnecessary PtlMDBind result check 2015-12-14 12:09:01 -06:00
scif configury: test portability 2015-12-28 13:58:45 +09:00
self btl: btls are now required to set the send flag if supported 2015-09-10 08:55:54 -06:00
sm btl_sm: add a comment explaining why we rename(2) 2016-01-04 14:51:52 -05:00
smcuda Fix a few more places that utilized CUDA 4.1 checks 2015-10-30 09:43:24 -04:00
tcp btl/tcp: add missing #include <unistd.h> 2015-12-24 14:41:46 +09:00
template Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
ugni btl/ugni: fix race condition that causes completions to be dropped 2016-02-02 12:14:31 -07:00
usnic usnic: minor updates from code review 2016-02-01 11:14:30 -08:00
vader Fix Mellanox copyrights with respect to the following PRs: 2015-12-30 00:12:19 +06:00
btl.h opal/mpool: add support for passing access flags to register 2015-10-05 13:53:55 -06:00
Makefile.am Purge whitespace from the repo 2015-06-23 20:59:57 -07:00