Per https://github.com/open-mpi/ompi/issues/3995, it should not be a
fatal error if the libnl checks fail. Instead, just fail the check
and let the upper layer decide what to do. In this case,
OPAL_CHECK_PACKAGE will mark this library as no good, and then
propagate that upward.
E.g., if libfoo fails the libnl check, and the user had specified
--with-libfoo, this will eventually cause configure to fail (because
the libnl check will fail with libfoo, which will cause
OPAL_CHECK_PACKAGE to fail with libfoo, which will ultimately cause
some upper-level logic to realize "a human asked for libfoo but we
could not provide it -- abort!").
However, if libfoo fails the libnl check and the user did *not*
specify --with-libfoo, then this will cause the upper layer to
silently skip libfoo (because the libnl check will fail libfoo, which
will cause OPAL_CHECK_PACKAGE to fail libfoo, but then the upper-level
logic will realize "oh, we can't use libfoo, but a human didn't ask
for it -- so just skip libfoo support.").
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Make sure hostnames are null terminated, even when they were
too long to fit in the hostname buffer.
Fixes: CID 1418232
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Be a little more deliberate about convering OMPI's --with-cuda CLI
value to hwloc's --enable-cuda configure option.
Also, unconditionally disable hwloc NVML support (because Open MPI is
not currently using it).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
purpose. Continue to link the new library back to libopen-pal to resolve the renamed symbols.
Update opal configure logic to set disable_dlopen when disable_mca_dso is given. Fix typos in disable_dlopen when setting variables (incorrect inclusion of quotes)
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
This allows mtl_ofi_provider_include to work with layered providers as well.
e.g. --mca mtl_ofi_provider_include "providerX;ofi_rxm"
Signed-off-by: yohann <yohann.burette@intel.com>
Add test suite for netlink and weighted reachable components. We
don't have a great way of running components through unit tests
today, so make them stand-alone tests that are run with mpirun
and such.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Amazon is going to use the reachable framework to fix some connection
bugs in the TCP BTL, so claim support ownership of the weighted and
netlink components.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Wire up the libnl utilities Jeff and Ralph added previously to
the netlink reachable component so that it actually does work.
The algorithm is a bit simplistic, but should work for our use
cases. If there's a route, assume the two interfaces can talk.
If there's no gateway, assume the two interfaces are in the
same subnet, and give preference to that connection. If there's
a gateway, assume there's a route, but the interfaces are not
in the same subnet.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The netlink component's libnl wrapper code returned the
next hop in the route table to allow the calling code
to differentiate between same and different networks,
which is a fine comparison for IPv4, but is pretty
expensive for IPv6 (coming soon to a netlink component
near you). Rather than provide extra information
(the address of the next hop), just provide whether
there is a gateway or not, which is all the netlink
component actually needs.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The netlink reachable component has never been released in a usable
form, but had code copied from usNIC to support both libnl-1 and
libnl-3. If nothing else, this code was a little buggy in
handling the case where libnl-3 but not libnl-route-3 were
installed. Jeff and I decided to drop libnl-1 support from the
netlink reachable component, given that it's getting pretty old
and the weighted component provides the same information that
the TCP BTL and OOB are using today, so libnl-1 customers won't
see a step backwards from where they are today.
Signed-off-by: Brian Barrett <bbarrett@mazon.com>
Based on work from usNIC, the best way to use the reachability
information the reachable components return is to build a
connectivity graph between the two peers and run a bipartite
graph solver. Rather than returning the "best" pairing,
the reachability framework now returns the entire mapping,
allowing a (soon to be added) graph solver to build the
"optimal" connectivity pairing.
Practically, this means changing the return type of the
reachable() function and rewriting the weighted_reachable()
function to return the full mapping. The netlink_reachable()
function still always returns NULL.
At the same time, fix bit-rot in the weighted component and
enable builds of the component by removing the opal_ignore.
Also, add IPv6 support to the weighted component to support
both use cases in the TCP BTL.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Initialize the reachable framework during opal_init() and tear
it back down during opal_finalize(). The framework was never
used, so the lack of initialization didn't matter, but this is
a required step in using the framework.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Ralph and Jeff created the reachable framework and added the
netlink component based on code copied from the usnic btl.
However, they never renamed all the symbols from the libnl
compatibility code. This patch finishes the rename.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Add a check for link-local IPv6 addresses to the net
interface to support better computation of network
pairings in the weighted reachable component.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>