If PMIX_PACKAGE_RANK is available, uses this value to select between multiple
NIC of equal distance between the current process. If this value is not
available, try to calculate it by getting the locality string from each local
process and assign a package_rank. If everything fails, fall back to using
process_id.rank to select the NIC. This last case is not ideal, but has a small
chance of occuring, and causes an output to be displayed to notify that this is
occuring.
Some of the information in master branch is not available for the multi-NIC
patch, such as myprocinfo.rank. This info is used to select between multiple
NIC of equal distance to the process. This adapts the previous commit to work
with the v4.1.x branch.
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 8017f12801)
configure was previously failing to check for the fi_info.nic struct because
fabric.h relied on other header files in the ofi/include dir. This adds that
include to CPPFLAGS before running that check so that configure can check for
the struct.
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 08e8205fb7)
Added the flag OPAL_OFI_PCI_DATA_AVAILABLE to remove accessing the nic
object in
fi_info when the ofi version does not support that structure.
Signed-off-by: Nikola Dancejic dancejic@amazon.com
(cherry picked from commit ae2a447b0e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Make sure to get an RDM provider that can provide both local and
remote communication. We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.
Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version. Use this check in two
places:
1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
>= v1.1, but the usnic component was checking for >= v1.3. So
update the btl/usnic configury to use the new macro and check for
>= v1.3.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 21bc9042e1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Update the OPAL_CHECK_OFI configury macro:
- Make it safe to call the macro multiple times:
- The checks only execute the first time it is invoked
- Subsequent invocations, it just emits a friendly "checking..."
message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
output variables from OPAL_CHECK_OFI to be
opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/usnic and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
causes the macro to be invoked at a fairly random time, which makes
configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
resulted in an indenting change, making the change to
opal_check_ofi.m4 look larger than it really is).
Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.
Updated from master cherry pick: the OFI BTL does not exist on the
v4.0.x branch. Therefore, did not include the OFI BTL changes on
master in this cherry pick.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672cc)
This PR renames the common library for OFI libfabric from
libfabric to ofi. There are a number of reasons this
is good to do:
1) its shorter and replaces 9 characters with three for
function names for what may eventually be a fairly extensive interface
2) OFI is the term used for MTL and RML components that use
the OFI libfabric interface
3) A planned OSC component will also use the OFI term.
4) Other HPC libraries that can use OFI libfabric tend to use
the term "ofi" internally and also in their configure options
relevant to OFI libfabric (i.e. MPICH/CH4, Intel MPI, Sandia SHMEM)
There seem to be comments in places in the Open MPI source
code that indicate that this common library will be going away.
Far from it as we will want to be able to share things like
AV objects between OMPI and possibly OSHMEM components that
use the OFI libfabric interface.
This PR also adds a synonym to the --with-libfabric(-libdir)
configury options: --with-ofi and with-ofi-libdir.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>