Show an example of using the btl_usnic_connectivity_map option. Also,
mention that another cause of the "total connectivity failure" may
be asymmetric / unexpected routing.
Reviewed by Dave Goodell.
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r32465.
The contrib/check-help-strings.pl script gets confused if the topic is
an inline logic check, so separate it into two calls to show_help.
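
A minimal sketch of the pattern, not the actual usnic code:
show_help_stub() is a hypothetical stand-in for the real help routine,
and the file and topic names are made up for illustration. Each topic
appears as a plain string literal in its own call, which is the form a
script scanning for literal topic names can match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real help routine; it only records
 * and prints the topic it was asked to show. */
static const char *last_topic = NULL;

static void show_help_stub(const char *file, const char *topic)
{
    assert(file != NULL && topic != NULL);
    last_topic = topic;
    printf("[%s] %s\n", file, topic);
}

/* Before: the topic is chosen by an inline conditional, so no single
 * call names one literal topic:
 *
 *     show_help_stub("help-example.txt",
 *                    is_fatal ? "example-fatal-topic"
 *                             : "example-warning-topic");
 *
 * After: one call per topic, each with a literal topic string. */
static void report(int is_fatal)
{
    if (is_fatal) {
        show_help_stub("help-example.txt", "example-fatal-topic");
    } else {
        show_help_stub("help-example.txt", "example-warning-topic");
    }
}
```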
This commit was SVN r32455.
These messages were committed in the v1.8 branch in r32341, but were
never committed to the trunk (because we were waiting for the OPAL BTL
move). This commit brings the trunk and v1.8 help messages in line
with each other.
This commit was SVN r32445.
The following SVN revision numbers were found above:
r32341 --> open-mpi/ompi@5e752b4aba
Ensure that the connectivity checker agent only uses pointers from the
client that is the same process as the agent.
Not necessary for the v1.8 branch -- this is a trunk/v1.9-only problem.
This commit was SVN r32438.
Make the del_procs, module finalize, and endpoint destructors be the
same between trunk and v1.8, with one exception: the very beginning of
v1.8 module_finalize calls del_procs for each proc to simulate/pretend
the trunk/v1.9 PML behavior of calling del_procs before module_finalize.
This commit was SVN r32437.
Fixes an assertion failure in --enable-debug builds and SEGVs in normal
builds.
I'm not 100% sure I like this model, but it at least seems to be
consistent. Some variation on this scheme will need to be adapted to
the trunk, where usnic_del_procs() is called by the PML instead of
internally in usnic_finalize().
A related bug (but with different mechanics) is #4832.
This commit was SVN r32424.
Per patch from aryzhikh, initialize a couple of fields in the openib ini struct
Hard to understand why this has sat there for so long...
cmr=v1.8.2:reviewer=rhc:subject=initialize a couple of fields in the openib ini struct
This commit was SVN r32417.
The following Trac tickets were found above:
Ticket 4377 --> https://svn.open-mpi.org/trac/ompi/ticket/4377
Previously, the connectivity agent was pretty dumb: it took whatever
pings it got and ACKed them. Then we added an agent check to ensure
that the ping actually came from the source interface that it said it
came from. Now we add another check such that when a ping is received
on interface X that corresponds to usnic module Y, we ensure that the
source interface of the ping is on the all_endpoints list for module Y
(i.e., module Y expects to be able to talk to that peer interface).
This detects cases where peers have come to different conclusions
about which interfaces should be used to communicate (which is bad!).
This usually reflects a network misconfiguration.
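
The membership check described above could be sketched roughly as
follows. This is an assumption-laden illustration, not the agent's
real code: struct example_module, its fields, and
ping_source_is_expected() are hypothetical names, and peer interfaces
are reduced to bare IPv4 addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified module: a real usnic module holds far more
 * state than a flat array of peer addresses. */
struct example_module {
    const uint32_t *all_endpoints; /* IPv4 addresses of expected peers */
    size_t num_endpoints;
};

/* Return true if the ping's source address belongs to a peer interface
 * that this module expects to talk to; false means the two sides
 * disagree about which interfaces should be paired. */
static bool ping_source_is_expected(const struct example_module *module,
                                    uint32_t src_addr)
{
    assert(module != NULL);
    for (size_t i = 0; i < module->num_endpoints; ++i) {
        if (module->all_endpoints[i] == src_addr) {
            return true;
        }
    }
    return false;
}
```

In this sketch, a ping for which the function returns false would not
be ACKed.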
Fixes CSCuq05389.
This commit was SVN r32383.
Ensure that incoming "ping" messages came from the IP address that
they claim to have come from. If they didn't, drop them (because it is
probably a routing error), which will likely eventually cause the
connectivity checker to time out, and therefore cause the job to abort.
This commit was SVN r32368.
Update compat.h to only handle compatibility between v1.7/v1.8 and
v1.9/2.0 (i.e., the current trunk). Remove what seems to be the last
vestiges of OMPI/ORTE pollution in the now-OPAL-ized usnic BTL.
Currently use a hard-coded constant for the MCW size (i.e.,
MPI_COMM_WORLD size) for some initialization values in the v1.9/2.0
series; still need to figure out something better there.
This commit was SVN r32365.
Adapt to moving down to OPAL: find a PML-registered error callback,
and use that when we don't have a module context and we need to abort.
Failing that, just call exit().
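
The fallback order can be sketched as below. All names here
(example_error_cb_t, example_register_error_cb(), example_abort(),
example_recording_cb()) are hypothetical; how the real BTL locates the
PML-registered callback is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callback type for an upper-layer error handler. */
typedef void (*example_error_cb_t)(int status);

static example_error_cb_t registered_error_cb = NULL;

/* Called by the upper layer to register its error handler. */
static void example_register_error_cb(example_error_cb_t cb)
{
    registered_error_cb = cb;
}

/* Abort path for code that has no module context: prefer the
 * registered callback; failing that, just call exit(). */
static void example_abort(int status)
{
    assert(status != 0);
    if (registered_error_cb != NULL) {
        /* A real handler would normally not return from here. */
        registered_error_cb(status);
        return;
    }
    exit(status);
}

/* Example callback that only records the status it was given. */
static int example_last_status = 0;

static void example_recording_cb(int status)
{
    example_last_status = status;
}
```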
This commit was SVN r32362.
If --with-usnic is specified and we can't build the usnic BTL, abort.
If --without-usnic is specified, gracefully skip building the usnic
BTL. If neither is specified, do the OMPI-default behavior: try to
configure/build the usnic BTL, and if we can't, skip it.
Fixes CSCuq13889.
This commit was SVN r32349.
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purposes. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with a few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.
This commit was SVN r32317.