This commit cleans up some code in the passive target path. The code
used the buffered frag control send path but it is more appropriate to
use the unbuffered one. This avoids checking structures that are
should not be in use in this path.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
It is possible for another thread to process a lock ack before the
peer is set as locked. In this case either setting the locked or the
eager active flag might clobber the other thread. To address this the
flags have been made volatile and are set atomically. Since there is
no a opal_atomic_or or opal_atomic_and function just use cmpset for
now.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes some bugs uncovered during thread testing of
2.0.1rc1. With these fixes the component is running cleanly with
threads.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the sematics of ompi request callbacks. If a
request's callback has freed or re-posted (using start) a request
the callback must return 1 instead of OMPI_SUCCESS. This indicates
to ompi_request_complete that the request should not be modified
further. This fixes a race condition in osc/pt2pt that could lead
to the req_state being inconsistent if a request is freed between
the callback and setting the request as complete.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine as the recursion depth stays small but
at larger scale we can quickly exhaust the stack processing frag requests.
To fix the issue the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.
As part of this change osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
It is possible for the start call to complete the requests. For this
reason the module rdma_frag field should be filled in before start is
called. If the request completes the completion callback will reset
the rdma_frag field to NULL. Fixes a bug discovered by @tkordenbrock.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* mpi/start: fix bugs in cm and ob1 start functions
There were several problems with the implementation of start in Open
MPI:
- There are no checks whatsoever on the state of the request(s)
provided to MPI_Start/MPI_Start_all. It is erroneous to provide an
active request to either of these calls. Since we are already
looping over the provided requests there is little overhead in
verifying that the request can be started.
- Both ob1 and cm were always throwing away the request on the
initial call to start and start_all with a particular
request. Subsequent calls would see that the request was
pml_complete and reuse it. This introduced a leak as the initial
request was never freed. Since the only pml request that can
be mpi complete but not pml complete is a buffered send the
code to reallocate the request has been moved. To detect that
a request is indeed mpi complete but not pml complete isend_init
in both cm and ob1 now marks the new request as pml complete.
- If a new request was needed the callbacks on the original request
were not copied over to the new request. This can cause osc/pt2pt
to hang as the incoming message callback is never called.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* osc/pt2pt: add request for gc after starting a new request
Starting a new receive may cause a recursive call into the pt2pt
frag receive function. If this happens and the prior request is
on the garbage collection list it could cause problems. This commit
moves the gc insert until after the new request has been posted.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
Before this commit, a same PML tag may be used for distinct
communications for long messages. For example, consider a condition
where rank A calls ```MPI_PUT``` targeting rank B and rank B calls
```MPI_GET``` targeting rank A simultaneously.
A PML tag for the ```MPI_PUT``` is acquired on rank A and is used
for the long-message communication from rank A to rank B.
A PML tag for the ```MPI_GET``` is acquired on rank B and is used
for the long-message communication from rank A to rank B.
These two tags may become a same value because they are managed
independently on each rank. This will cause a data corruption.
This commit separates the tag used in a single RMA communication
call, one for communication from an origin to a target, and one
for communication from a target to an origin. A "base" tag
is acquired using ```get_tag``` function and PML tag is caluculated
from the base tag by ```tag_to_target``` and ```tag_to_origin```
function.
Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE):
Indeed, if a lock conflicts with the lock_all we will end up trying to
free an invalid pointer.
Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL):
This was intentional but it would be a good idea to check for
module->comm being non_NULL to be safe. Also cleaned out some checks
for NULL before free().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by @ggouaillardet and MTT:
- Fix SEGV in long send completion caused by missing update to the
request callback data.
- Add an MPI_Barrier to the fence short-cut. This fixes potential
semantic issues where messages may be received before fence is
reached.
- Ensure fragments are flushed when using request-based RMA. This
allows MPI_Test/MPI_Wait/etc to work as expected.
- Restore the tag space back to 16-bits. It was intended that the
space be expanded to 32-bits but the required change to the
fragment headers was not committed. The tag space may be expanded
in a later commit.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs identified by a new multi-threaded RMA
benchmarking suite. The following bugs have been identified and fixed:
- The code that signaled the actual start of an access epoch changed
the eager_send_active flag on a synchronization object without
holding the object's lock. This could cause another thread waiting
on eager sends to block indefinitely because the entirety of
ompi_osc_pt2pt_sync_expected could exectute between the check of
eager_send_active and the conditon wait of
ompi_osc_pt2pt_sync_wait.
- The bookkeeping of fragments could get screwed up when performing
long put/accumulate operations from different threads. This was
caused by the fragment flush code at the end of both put and
accumulate. This code was put in place to avoid sending a large
number of unexpected messages to a peer. To fix the bookkeeping
issue we now 1) wait for eager sends to be active before stating
any large isend's, and 2) keep track of the number of large isends
associated with a fragment. If the number of large isends reaches
32 the active fragment is flushed.
- Use atomics to update the large receive/send tag counters. This
prevents duplicate tags from being used. The tag space has also
been updated to use the entire 16-bits of the tag space.
These changes should also fixopen-mpi/ompi#1299.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug that occurs when a post message comes in when
sending complete messages or while waiting for all outgoing messages
to flush. In that case the post message might get incorrecly
associated with the ending sync object.
References open-mpi/ompi#1012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fixed CID 1269712, 1269709, 1269706, 1269703, 1269694: Logically dead code
Remove extra NULL check as OMPI_OSC_PT2PT_REQUEST_ALLOC can never set the
request to NULL.
Fixes CID 1269668: Unchecked return value
False positive. Add (void) to indicate we do not care about the return code
from opal_hash_table_get_uint32.
Fixes CID 1324726: Free of address-of expression
Do not free lock if it was not allocated.
Fixes CID 1269658: Free of address-of expression
Never will happen but because op is always a built-in op there is no
reason to retain/release it anyway.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Fix CID 1324733: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324734: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324735: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324736: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324737: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324751: Memory - illegal accesses (USE_AFTER_FREE)
Fix CID 1324750: (USE_AFTER_FREE)
Fix CID 1324749: Memory - corruptions (USE_AFTER_FREE)
Fix CID 1324748: Memory - illegal accesses (USE_AFTER_FREE)
Fix CID 1324747: (USE_AFTER_FREE)
Fix CID 1324746: Memory - corruptions (USE_AFTER_FREE)
Add missing return on an error path.
Fix CID 1324745: Code maintainability issues (UNUSED_VALUE)
Ignore return code from barrier. It was not being used anyway.
Fix CID 1324738: Null pointer dereferences (FORWARD_NULL)
Fix CID 1324741: Null pointer dereferences (REVERSE_INULL)
module->selected_btl can not be NULL in osc/rdma during normal
operation. Removed the unnecessary NULL check.
Fix CID 1324752: Memory - illegal accesses (USE_AFTER_FREE)
Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed.
Fix CID 1324744: Uninitialized variables (UNINIT)
Fix CID 1324743: Uninitialized variables (UNINIT)
This array is not used unitialized but there is no reason not to use
calloc here to silence the warning.
The following CID is a false positive: 1324742. I will mark it such in
coverity.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit updates osc/pt2pt to allocate peer object as they are
needed rather than all at once. Additionally, to help improve the
memory footprint a new synchronization structure has been added.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
- MPI_Compare_and_swap
- MPI_Fetch_and_op
- MPI_Raccumulate
- MPI_Win_detach
Thanks to Michael Knobloch and Takahiro Kawashima for bringing this
to our attention
This commit fixes a bug identified by MTT that occurred when mixing
passive and active target synchronization. The bugs fixed in this
commit are:
- Do not update incoming fragment counts for any type of unbuffered
control message. These messages are out-of-band and should not be
considered towards the signal counts.
- Complete a change from using received counts to expected counts for
lock, unlock, and flush acks. Part of the change made it into
master before the rest was ready. This was preventing wakeups in
some cases.
- Turn the passive_target_access_epoch module member into a
counter. As long as at least one peer is locked we are in a
passive-target epoch and not an active target one. This fix will
ensure that fragment flags are set appropriately.
fixes#538
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by alway sending
the active fragment after sending queued fragments.
This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug in osc/pt2pt which causes MPI_Win_unlock_all
to hang. The problem was caused by code refactoring that moved the
unlock of the local process to after the loop that waits for unlock
acks. This will cause the code to loop forever waiting on the self
ack.
Fixes#444
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>