openmpi/ompi/mca
Nathan Hjelm d7264aa613 osc/pt2pt: various threading fixes
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:

 - The code that signaled the actual start of an access epoch changed
   the eager_send_active flag on a synchronization object without
   holding the object's lock. This could cause another thread waiting
   on eager sends to block indefinitely because the entirety of
   ompi_osc_pt2pt_sync_expected could execute between the check of
   eager_send_active and the condition wait of
   ompi_osc_pt2pt_sync_wait.

 - The bookkeeping of fragments could get screwed up when performing
   long put/accumulate operations from different threads. This was
   caused by the fragment flush code at the end of both put and
   accumulate. This code was put in place to avoid sending a large
   number of unexpected messages to a peer. To fix the bookkeeping
   issue we now 1) wait for eager sends to be active before starting
   any large isends, and 2) keep track of the number of large isends
   associated with a fragment. If the number of large isends reaches
   32 the active fragment is flushed.

 - Use atomics to update the large receive/send tag counters. This
   prevents duplicate tags from being used. The tag space has also
   been updated to use all 16 bits available for tags.
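
The tag counter fix in the last item can be illustrated with a minimal C11 sketch. It is not the code from this commit: `demo_next_tag`, `DEMO_TAG_MASK`, and the counter layout are hypothetical names chosen for illustration, and the real tag encoding in osc/pt2pt may differ. The only point shown is that an atomic fetch-and-add hands each caller a distinct counter value, which is then wrapped into a 16-bit tag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mask: keep the low 16 bits of the counter as the tag. */
#define DEMO_TAG_MASK 0xffffu

/*
 * Return the next tag from a shared counter (hypothetical helper).
 * atomic_fetch_add returns the value held before the increment, so two
 * threads calling this concurrently can never observe the same counter
 * value. A plain "tag = counter; counter++;" has no such guarantee.
 */
uint16_t demo_next_tag(atomic_uint_fast32_t *counter)
{
    uint_fast32_t previous = atomic_fetch_add(counter, 1);

    /* Wrap into the 16-bit tag space. */
    return (uint16_t) (previous & DEMO_TAG_MASK);
}
```

A tag can still repeat once the counter has advanced by 65536, so a scheme like this assumes far fewer large transfers are in flight at any one time.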

These changes should also fix open-mpi/ompi#1299.
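
The lost wakeup described in the first item is a general pattern, sketched below with POSIX threads. This is an illustration under assumptions, not the osc/pt2pt implementation: `demo_sync_t` and the `demo_*` functions are hypothetical stand-ins for the synchronization object, ompi_osc_pt2pt_sync_wait, and the code that marks eager sends active. The waiter checks the flag and sleeps while holding the lock, and the signaler changes the flag under the same lock, so the flag update cannot fall between the check and the wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the synchronization object. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            eager_send_active;
} demo_sync_t;

/* Waiter: the flag check and the condition wait share one critical
 * section. pthread_cond_wait releases the lock atomically while it
 * sleeps and reacquires it before returning. */
void demo_sync_wait(demo_sync_t *sync)
{
    pthread_mutex_lock(&sync->lock);
    while (!sync->eager_send_active) {
        pthread_cond_wait(&sync->cond, &sync->lock);
    }
    pthread_mutex_unlock(&sync->lock);
}

/* Signaler: the flag is changed only while holding the lock. Setting
 * it without the lock is the bug pattern described above, because the
 * update and the broadcast could both happen after the waiter's check
 * but before it starts waiting. */
void demo_sync_start(demo_sync_t *sync)
{
    pthread_mutex_lock(&sync->lock);
    sync->eager_send_active = true;
    pthread_cond_broadcast(&sync->cond);
    pthread_mutex_unlock(&sync->lock);
}

static void *demo_waiter(void *arg)
{
    demo_sync_wait((demo_sync_t *) arg);
    return NULL;
}

/* Start one waiter, activate eager sends, and join the waiter.
 * Returns true if the waiter was released and the flag is set. */
bool demo_run(void)
{
    demo_sync_t sync;
    pthread_t thread;

    pthread_mutex_init(&sync.lock, NULL);
    pthread_cond_init(&sync.cond, NULL);
    sync.eager_send_active = false;

    if (pthread_create(&thread, NULL, demo_waiter, &sync) != 0) {
        return false;
    }
    demo_sync_start(&sync);
    pthread_join(thread, NULL);

    bool active = sync.eager_send_active;
    pthread_cond_destroy(&sync.cond);
    pthread_mutex_destroy(&sync.lock);
    return active;
}
```

With this structure the join completes whether the waiter runs before or after `demo_sync_start`, since the waiter rechecks the flag under the lock before every sleep.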

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-02-02 12:33:33 -07:00
bcol configury: test portability 2015-12-28 13:58:45 +09:00
bml bml r2: fix exclusivity comparison 2015-11-06 13:26:32 -08:00
coll Merge pull request #1321 from jladd-mlnx/topic/add-allgatherv-reduce 2016-01-25 20:46:52 -05:00
common Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
crcp mca/base: add priority output to mca_base_select 2015-10-19 12:32:41 -06:00
fbtl use the actual preadv and pwritev functions if available. That's what the fbtl interfaces have been designed for. 2016-01-07 08:29:17 -06:00
fcoll fix CID 1349739, CID 1349738, CID 1349736 and (probably) CID 1349740 (not entirely sure about the last one, since I don't understand why block[i] is a problem but max_len[i] allocated and treated exactly the same way 1 line later is not). 2016-01-21 08:32:23 -06:00
fs need to check for the parent dir as well, since the file might not exist yet. 2016-01-26 13:49:21 -06:00
io add a new field to the ompio data structure (stripe_count) and set it correctly on pvfs2 and lustre. 2016-01-17 09:48:49 -06:00
mtl mtl-portals4: initialize endpoint nid/pid when using logical mapping 2015-12-22 11:20:18 -06:00
op op/x86: change the owner to Ralph 2015-12-01 15:08:07 -08:00
osc osc/pt2pt: various threading fixes 2016-02-02 12:33:33 -07:00
pml Fix typo in error handling flow. 2016-01-14 22:28:54 +02:00
rte Add pmix120 component, update the error handling functions in the PMIx API. 2015-12-28 23:15:44 +09:00
sbgp configury: test portability 2015-12-28 13:58:45 +09:00
sharedfp will revisit the addproc component later in spring, right now it is constantly in the way of doing my tests. 2016-01-20 15:05:51 -06:00
topo topo/treematch: do not invoke hwloc_topology_{init,load} 2015-12-03 11:24:32 +09:00
vprotocol vprotocol/pessimist: use internal ompi_* instead of MPI_* 2015-11-20 13:46:19 +09:00
Makefile.am Purge whitespace from the repo 2015-06-23 20:59:57 -07:00
mca.h Purge whitespace from the repo 2015-06-23 20:59:57 -07:00