while cleaning up after receiving a zero byte on the connect socket
(localyy started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
delocalize the accept into a timer_event.
Add support for registering an error callback, that can be used when a
connection is discovered as failed during the initialization process.
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).
The proposed solution let the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
it's job. As the event is not execute the same triggered will trigger
again at the next opportunity, so this solution introduce a minimal
delay in the connection establishement.
Thanks to Nathan for pointing out that I missed snipping one line in
2f9c69f016751954bc8384927ce7878d9882b56c (I removed the trailing
comment, but not the trailing pragma -- oops!).
Retain the hetero-nodes flag for those cases where the user *knows* that there are differences and our automated system isn't good enough to see it.
Will obviously require further refinement as we find out which variances it can detect, and which it cannot.
1. Ensure to override CFLAGS properly. Move the setting of CFLAGS outside the AM_CONDITIONAL so that Automake doesn't get confused (because CFLAGS is already set inside an AM_CONDITIONAL -- moving it outside the conditional ensure that this local CFLAGS override trumps all other CFLAGS overrides).
2. Only build libfabric on Linux. Add a little more configury to ensure that we only try to build libfabric on Linux.
3. Remove a dead/unused file
4. Fix typo in condition check
5. Use "false", not "/bin/false"
This commit represents the conversion of the usnic BTL from verbs to
libfabric.
For the moment, libfabric is embedded in Open MPI (currently in the
usnic BTL). This is because the libfabric API is still changing, and
also has not yet been released. Ultimately, this embedded copy of
libfabric will likely disappear and the usnic BTL will rely on an
external installation of libfabric.
New configure options:
* --with-libfabric: will cause configure to fail if libfabric support
cannot be built
* --without-libfabric: will prevent libfabric support from being built
* --with-libfabric=DIR: use an external libfabric installation
* --with-libfabric-libdir=LIBDIR: when paired with --with-libfabric=DIR,
use LIBDIR for the libfabric installation library dir
The --with-libnl3[-libdir] arguments are now gone.
- Rename opal_atomic_lifo_t to opal_lifo_t to reflect both atomic and
non-atomic usage. Added new routines (opal_lifo_*_st) for non-atomic
usage as well as routines conditioned off opal_using_threads(). The
atomic versions are always thread safe and the non-atomic are always
not thread safe.
- Add a new atomic lifo implementation that makes use of 128-bit
compare-and-swap. The new implementation should scale better with
larger numbers of threads.
- Add threading unit test for opal_lifo_t.
If OPAL_MODEX_RECV() returns OPAL_ERR_NOT_FOUND, the peer didn't
send any Portals4 BTL info. This is not a fatal error. Instead of
disqualifying the Portals4 BTL just ignore that peer.
@jsquyres reported this in #194.