We had several problems in the old code:
1. We were specifying an arbitrary timeout (100 ms) and then abandoning
all remaining pending AV insert operations. We would then free the
endpoint buffer that we gave to fi_av_insert(), usually causing
libfabric's progress thread to write to a freed buffer.
2. We were claiming in a show_help message that the timeout was
controllable via an MCA parameter. This commit removes that
parameter, since there's no good method for us to specify a timeout
like this to libfabric right now.
3. We also weren't waiting for the correct number of fi_av_insert()
operations to complete. We were waiting for nprocs, which is
accidentally fine for 2 procs on separate hosts, but not for most
other proc counts.
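
A minimal sketch of the waiting logic described in points 1 and 3, under
stated assumptions: toy_poll_fn, toy_poll_one, and toy_wait_all are
hypothetical names for illustration, not libfabric or Open MPI API. The
loop waits for the number of operations actually posted, has no
timeout, and the caller should free the buffer only after it returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns how many completions were reaped by
 * this poll (0 if none yet), or a negative value on a hard error. */
typedef int (*toy_poll_fn)(void *ctx);

/* Toy poller for illustration: ctx points at a count of operations
 * still in flight; each call completes at most one of them. */
static int toy_poll_one(void *ctx)
{
    size_t *remaining = ctx;

    if (*remaining == 0) {
        return 0;
    }
    --*remaining;
    return 1;
}

/* Block until every posted operation has completed.  Returns the
 * number of completions seen, which is less than 'posted' only if the
 * poller reported an error. */
static size_t toy_wait_all(size_t posted, toy_poll_fn poll, void *ctx)
{
    size_t done = 0;

    assert(poll != NULL);
    while (done < posted) {
        int n = poll(ctx);
        if (n < 0) {
            break;  /* hard error: caller must not assume completion */
        }
        done += (size_t) n;
    }
    return done;
}
```

The count passed in is the number of inserts posted, not nprocs, and the
buffer handed to the insert call stays alive until toy_wait_all()
returns that same count.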
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
There was a bug in the openib btl handling this valid sequence of
calls:
desc = btl_alloc ();
btl_free (desc);
When triggered, the bug would cause either fragment loss or undefined
behavior (SEGV, etc.). The problem occurred because btl_alloc contained
the logic to modify the pending fragment (length, etc) and these
changes were not corrected if the fragment was freed instead of sent.
To fix this issue I 1) moved some of the coalescing logic to the
btl_send function, and 2) retry the coalesced fragment on btl_free
if it was never sent. This appears to completely address the issue.
For systems whose OFED lacks XRC support, commit b3617e73
broke the build of the openib btl. This commit addresses
the issues introduced by that commit.
This was more complicated than I would like, but it's just an
unfortunate GCC/clang difference. I don't have access to all the C
compilers out there, so this may still have problems with other
compilers that implement some form of `#pragma GCC diagnostic` support
but don't actually behave the same as some versions of GCC.
Fixes #323
Use the pkg-config related m4 functions to find out where
Cray's xpmem.h and libxpmem are located on a system.
With this commit, there is no longer any need to
explicitly specify an xpmem install location on the configure
line, at least for Cray systems running CLE 4.X and 5.X.
Replace temporary environment variables with an MCA
parameter for the ugni btl. A user wishing to
use the ugni btl async progress thread needs to
set the request_progress_thread param to true.
For example, using environment variable format:
export OMPI_MCA_btl_ugni_request_progress_thread=1
Verified via unit tests, etc. that BTE TX descriptors
using CQs configured to generate IRQs work
correctly on Cray XC. Disable sending a
message back to self and just use the IRQs generated
by completion of TX descriptors posted to BTE.
Honor the enable_mpi_threads setting to enable the ugni btl
async progress thread. If the app doesn't request thread-multiple,
the thread will not be created.
- By default, allow registering the maximum possible amount of memory
(i.e., 2 * total_memory). This behaviour can be turned off using the MCA
parameter "btl_openib_allow_max_memory_registration".
- In the fallback case, use device-specific parameters to calculate
the memory limit.
Properly test for some dependent libraries; don't just assume that
code elsewhere in Open MPI's configury will find those libraries. Also
consolidate some CPPFLAGS and clarify some comments.
As pointed out by @ggouaillardet, we were adding some unnecessary -I
flags to CPPFLAGS when --without-hwloc was being used. This commit
slightly updates the hwloc191 component configury to only add such
things when the component is, in fact, going to be
compiled/installed.
Ensure that the <provider>_happy shell variables are initialized to
0. Without this, the --without-libfabric case would leave them
uninitialized, resulting in "test: -eq operator expecting a value" kinds
of errors.
while cleaning up after receiving a zero byte on the connect socket
(locally started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback, that can be used when a
connection is discovered as failed during the initialization process.
This is a minor update to
open-mpi/ompi@c52601f0c5.
If we have vsnprintf(), we might as well not have the rest of the
guess_strlen() routine. Also document the nifty trick/behavior of
vsnprintf() that enables this shortcut (it was new to me!).
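
For reference, a minimal sketch of the behavior presumably meant here,
assuming it is the C99 rule that vsnprintf() called with a NULL buffer
and a size of 0 writes nothing and returns the length the formatted
output would have had. formatted_length() is a hypothetical helper
written for this illustration, not the actual guess_strlen() code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: returns the number of characters (excluding the
 * terminating NUL) that the formatted string would need, or a negative
 * value if vsnprintf() reports an encoding error. */
static int formatted_length(const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(fmt != NULL);
    va_start(ap, fmt);
    /* C99: with a NULL buffer and size 0, nothing is written and the
     * return value is the length of the complete output. */
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    return len;
}
```

A caller would allocate formatted_length(...) + 1 bytes and then format
into that buffer with a second vsnprintf() call.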
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).
The proposed solution lets the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
its job. As the event is not executed, the same trigger will fire
again at the next opportunity, so this solution introduces a minimal
delay in the connection establishment.
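
One common way to get this "fail instead of block" behavior is a
try-lock. The sketch below assumes that approach; the commit text does
not say how the callback bails out. toy_endpoint_t and
toy_recv_callback are hypothetical names, and a plain pthread mutex
stands in for the endpoint's recv event lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an endpoint; recv_lock models the recv event lock. */
typedef struct {
    pthread_mutex_t recv_lock;
    int processed;          /* number of events actually handled */
} toy_endpoint_t;

/* Models a recv callback invoked while the caller already holds the
 * event base lock.  Instead of blocking on recv_lock (which could
 * deadlock against a thread taking the two locks in the opposite
 * order), it gives up immediately.  Returning false means "not
 * handled"; the event stays armed and fires again later. */
static bool toy_recv_callback(toy_endpoint_t *ep)
{
    assert(ep != NULL);
    if (pthread_mutex_trylock(&ep->recv_lock) != 0) {
        return false;       /* lock is busy: skip this round */
    }
    ep->processed++;        /* placeholder for reading the socket */
    pthread_mutex_unlock(&ep->recv_lock);
    return true;
}
```

The cost is the one noted above: a skipped callback delays processing
until the event triggers again.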
On x86_64 reading a 128-bit value requires multiple instructions.
Under some conditions, if the counted pointer counter is read before
the item pointer, the fifo can be left in an inconsistent state. This
commit forces the counter to always be read first.
The fifo does not appear to suffer from the same race.
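
A minimal sketch of forcing a read order on the two halves of a counted
pointer, written with C11 atomics. This is an illustration under
assumptions, not the actual Open MPI code: the toy_* types and function
are hypothetical, and the real implementation may use its own
read-barrier primitive instead of an acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct toy_item {
    struct toy_item *next;
} toy_item_t;

/* Toy counted pointer: two 64-bit halves that cannot be loaded in a
 * single instruction by ordinary reads. */
typedef struct {
    _Atomic uint64_t counter;
    _Atomic(toy_item_t *) item;
} toy_counted_ptr_t;

/* Read the counter first, then the item.  The acquire load keeps the
 * later item load from being moved ahead of the counter load by the
 * compiler or the CPU. */
static void toy_read_counted_ptr(toy_counted_ptr_t *src,
                                 uint64_t *counter, toy_item_t **item)
{
    assert(src != NULL && counter != NULL && item != NULL);
    *counter = atomic_load_explicit(&src->counter, memory_order_acquire);
    *item = atomic_load_explicit(&src->item, memory_order_relaxed);
}
```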
It is possible the compiler can reorder the read of the head item and
the head itself. This could lead to a situation where the item
returned was not really the head item.
Thanks to Nathan for pointing out that I missed snipping one line in
2f9c69f016 (I removed the trailing
comment, but not the trailing pragma -- oops!).