7da1c4b875
In short applications, it's possible that the agent (i.e., local rank 0) will finalize after the other local procs (those that are not local rank 0) detect the connectivity checker's named socket, but before they complete a connect() on it. As a result, their connect() fails with ECONNREFUSED. This commit adds a simple counter in the agent that won't let it quit until it has accept()ed a connection from every local proc or 10 seconds have gone by (whichever occurs first). This is similar to the timeout for the clients: they'll exit if they don't see the expected named socket within 10 seconds.
base
openib
portals4
scif
self
sm
smcuda
tcp
template
ugni
usnic
vader
btl.h
Makefile.am
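
The exit condition described in the commit message above can be sketched in C roughly as follows. This is a minimal illustration, not the code from the commit: the type `agent_state_t`, the functions `agent_note_accept()` and `agent_may_exit()`, and the constant `AGENT_EXIT_TIMEOUT_SECS` are all hypothetical names, and the current time is passed in explicitly so the logic can be exercised without real sockets or clocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant mirroring the 10 seconds named in the commit message. */
#define AGENT_EXIT_TIMEOUT_SECS 10.0

/* Hypothetical bookkeeping kept by the agent (local rank 0). */
typedef struct {
    int    expected_accepts; /* local procs expected to connect to the agent */
    int    num_accepts;      /* successful accept() calls seen so far */
    double finalize_start;   /* time, in seconds, at which finalize began */
} agent_state_t;

/* Call once for every successful accept() on the named socket. */
static void agent_note_accept(agent_state_t *s)
{
    assert(s != NULL);
    s->num_accepts++;
}

/* Returns true once every expected client has been accepted, or once the
 * timeout has elapsed since finalize began, whichever occurs first. */
static bool agent_may_exit(const agent_state_t *s, double now)
{
    assert(s != NULL);
    if (s->num_accepts >= s->expected_accepts) {
        return true;
    }
    return (now - s->finalize_start) >= AGENT_EXIT_TIMEOUT_SECS;
}
```

In a real agent, a check like `agent_may_exit()` would presumably be polled from the finalize path while the event loop keeps servicing accept() on the named socket. How the actual commit wires this in is not shown here.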