while cleaning up after a zero-byte receive on the connect socket
(the locally started connection), while another was trying to accept a
new connection from the same peer. Create a zero-timed event and
defer the accept by moving it into a timer_event.
Add support for registering an error callback that can be invoked when a
connection is discovered to have failed during the initialization process.
This is a minor update to
open-mpi/ompi@c52601f0c5.
If vsnprintf() is available, the rest of the guess_strlen() routine is
unnecessary. Also document the nifty trick/behavior of vsnprintf() that
enables this shortcut (it was new to me!).
was quite subtle, and only happened on the process with the smallest
guid (as this process will tear down the connection created locally and
replace it with the result of accept). If multiple threads are active in
the system, the deadlock occurs during the recv event deletion as one
thread will hold the recv event lock of the endpoint and try to access
the TCP event base lock, while the other thread will hold the TCP event
base lock while trying to access the recv event lock (in case data is
available on the socket).
The proposed solution lets the event callback fail to process the data,
preventing the deadlock and allowing the other thread to always complete
its job. Because the event is not executed, the same trigger will fire
again at the next opportunity, so this solution introduces only a minimal
delay in connection establishment.
Many of these tests were failing due to opal_init() failing in some
cases (because the opal shmem framework needs installed components, so
"make distcheck" would fail these tests because the opal shmem
components were not installed). However, all of these tests seem to
be fine with opal_init_util() -- so let's re-enable these tests.
On x86_64, reading a 128-bit value requires multiple instructions.
Under some conditions, if the counter of the counted pointer is read
after the item pointer, the lifo can be left in an inconsistent state.
This commit forces the counter to always be read first.
The fifo does not appear to suffer from the same race.
It is possible for the compiler to reorder the read of the head item and
the read of the head itself. This could lead to a situation where the
returned item was not really the head item.
Thanks to Nathan for pointing out that I missed snipping one line in
2f9c69f016751954bc8384927ce7878d9882b56c (I removed the trailing
comment, but not the trailing pragma -- oops!).