We had several problems in the old code:
1. We were specifying an arbitrary timeout (100 ms) and then abandoning
all remaining pending AV insert operations. We would then free the
endpoint buffer that we gave to fi_av_insert(), usually causing
libfabric's progress thread to write to a freed buffer.
2. We were claiming in a show_help message that the timeout was
controllable via an MCA parameter. This commit removes that
parameter, since there's no good method for us to specify a timeout
like this to libfabric right now.
3. We also weren't waiting for the correct number of fi_av_insert()
operations to complete. We were waiting for nprocs, which is
accidentally fine for 2 procs on separate hosts, but not for most
other proc counts.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
There was a bug in the openib btl handling this valid sequence of
calls:
desc = btl_alloc ();
btl_free (desc);
When triggered the bug would cause either fragment loss or undefined
behavior (SEGV, etc). The problem occured because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.
To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
For systems with OFED's lacking XRC support, commit b3617e73
broke the build of the openib btl. This commit addresses
the issues introduced by this commit.
Use the pkg-config related m4 functions to find out where
Cray's xpmem.h and libxpmem are located on a system.
With this commit, there is no longer any need to have to
explicitly indicate an xpmem install location on the configure
line, at least for Cray systems running CLE 4.X and 5.X.
Replace temporary environment variables with a MCA
parameter for the ugni btl. A user wishing to
use the ugni btl async. progress thread needs to
set the request_progress_thread param to true.
For example, using env. variable format:
export OMPI_MCA_btl_ugni_request_progress_thread=1
Verified via testing with unit tests, etc. that
in fact BTE TX descriptors using CQs configured to
generate IRQs were in fact working correctly on Cray XC. Disable
send message back to self and just use IRQs generated
by completion of TX descriptors posted to BTE.
Honor enable_mpi_threads setting to enable the ugni btl
async progress thread. If the app doesn't request thread-multiple
the thread will not be created.
- by default allow to register maximum possible (i.e 2 * total_memory)
memory. This beheviour can be turned off using mca parameter
"btl_openib_allow_max_memory_registration"
- In fallback case, use device specific parameters to calulate
memory limit.
while cleaning up after receiving a zero byte on the connect socket
(localyy started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback, that can be used when a
connection is discovered as failed during the initialization process.
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).
The proposed solution let the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
it's job. As the event is not execute the same triggered will trigger
again at the next opportunity, so this solution introduce a minimal
delay in the connection establishement.