d7375ec102
What's happening is that we're holding openib_btl->eager_rdma_lock when we call mca_btl_openib_endpoint_send_eager_rdma() on btl_openib_endpoint.c:1227. This in turn calls mca_btl_openib_endpoint_send() on line 1179. Then, if the endpoint state isn't MCA_BTL_IB_CONNECTED or MCA_BTL_IB_FAILED, we call opal_progress(), where we eventually try to lock openib_btl->eager_rdma_lock again at btl_openib_component.c:997, while we are still holding it. The fix removes this lock altogether. Instead, we atomically set the local RDMA pointer to prevent other threads from creating an RDMA buffer for the same endpoint. We also increment eager_rdma_buffers_count atomically, so the polling thread doesn't need a lock around it. This commit was SVN r12369. |
||
---|---|---|
.. | ||
allocator | ||
bml | ||
btl | ||
coll | ||
common | ||
io | ||
mpool | ||
mtl | ||
osc | ||
pml | ||
rcache | ||
topo | ||
win_makefile |
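
The lock-free approach described in the commit message above can be illustrated with a minimal sketch in C11 atomics. This is not the Open MPI code: `endpoint_t`, `btl_t`, `claim_marker`, and `try_setup_eager_rdma()` are hypothetical stand-ins, and the real commit may use Open MPI's own atomic primitives rather than `<stdatomic.h>`. The sketch only shows the idea of claiming a pointer with compare-and-swap so that a single thread creates the buffer, then bumping a counter atomically so a polling thread can read it without a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the Open MPI structures. */
typedef struct {
    _Atomic(void *) eager_rdma_local;    /* NULL until some thread claims it */
} endpoint_t;

typedef struct {
    atomic_int eager_rdma_buffers_count; /* read by the polling thread */
} btl_t;

/* Sentinel address meaning "claimed, buffer not ready yet" (assumption). */
static char claim_marker;

/* Returns true if the calling thread created the buffer for this endpoint. */
bool try_setup_eager_rdma(btl_t *btl, endpoint_t *ep, size_t size)
{
    assert(btl != NULL && ep != NULL);

    /* Only one thread can move the pointer from NULL to the sentinel. */
    void *expected = NULL;
    if (!atomic_compare_exchange_strong(&ep->eager_rdma_local, &expected,
                                        (void *)&claim_marker)) {
        return false; /* another thread already claimed this endpoint */
    }

    void *buf = malloc(size);
    if (buf == NULL) {
        /* Release the claim so a later attempt can retry. */
        atomic_store(&ep->eager_rdma_local, (void *)NULL);
        return false;
    }

    /* Publish the real buffer, then make it visible to the poller's count. */
    atomic_store(&ep->eager_rdma_local, buf);
    atomic_fetch_add(&btl->eager_rdma_buffers_count, 1);
    return true;
}
```

In this sketch no lock is held while the buffer is being set up, so a nested call into a progress loop cannot try to reacquire a lock the thread already owns. A reader of `eager_rdma_local` would have to treat the sentinel value as "not ready yet".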