mtl_btl_ofi_rcache_init() initializes the patcher, which should only
happen while things are still single-threaded. OFI providers may spawn
threads, so initialize the rcache before creating OFI objects to prevent races.
Authored-by: John L. Byrne <john.l.byrne@hpe.com>
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit f1b21cb776)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Added the flag OPAL_OFI_PCI_DATA_AVAILABLE to avoid accessing the nic
object in fi_info when the OFI version does not support that structure.
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit ae2a447b0e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Update the OPAL_CHECK_OFI configury macro:
- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked.
  - On subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical.
- With the goal of ultimately removing opal/mca/common/ofi, rename the
output variables from OPAL_CHECK_OFI to be
opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
causes the macro to be invoked at a fairly random time, which makes
configure stdout confusing / hard to grok.
- Remove a little left-over cruft in OPAL_CHECK_OFI, too (which
resulted in an indentation change, making the change to
opal_check_ofi.m4 look larger than it really is).
Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672cc)
NOTE: This patch was cherry-picked into the v4.0.x branch as 9ad871fc,
but the OFI BTL changes were skipped, because the OFI BTL was not in
the v4.0.x branch. This version of the cherry pick brings in the
changes to the OFI BTL.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs that share the same cpuset are available, the selection
process creates a list of all available NICs and makes a selection
based on (local rank) % (number of NICs).
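A minimal sketch of the selection rule described above, assuming a cpuset can be modeled as a 64-bit core mask (the real code works on hwloc objects). The type `sketch_nic_t` and the function `sketch_select_nic()` are hypothetical and only illustrate the "prefer matching cpuset, else all NICs, index by local rank modulo count" logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical NIC description: cpuset modeled as a core bitmask
 * instead of an hwloc bitmap. */
typedef struct {
    const char *name;
    uint64_t cpuset;
} sketch_nic_t;

/* Prefer NICs whose cpuset equals the process cpuset; if none match,
 * fall back to every NIC. Either way, pick (local rank) % (count).
 * Returns NULL only when there are no NICs at all. */
static const sketch_nic_t *sketch_select_nic(const sketch_nic_t *nics, size_t num_nics,
                                             uint64_t proc_cpuset, unsigned local_rank)
{
    const sketch_nic_t *local_nics[64];
    size_t num_local = 0;

    for (size_t i = 0; i < num_nics && num_local < 64; i++) {
        if (nics[i].cpuset == proc_cpuset) {
            local_nics[num_local++] = &nics[i];
        }
    }
    if (num_local > 0) {
        return local_nics[local_rank % num_local];
    }
    if (num_nics == 0) {
        return NULL;
    }
    return &nics[local_rank % num_nics];
}
```

With two NICs sharing the process cpuset, local ranks alternate between them; a process with no matching NIC is spread across all NICs instead.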
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 167d75b42a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Change ompi_mtl_ofi_get_endpoint() to call the active PML's
add_procs() rather than the OFI MTL add_procs() directly when
discovering a new process during operation.
Functionally, this has no impact on correct operation. However,
the current behavior means that the heterogeneous and active PML
checks are not executed in the dynamic discovery case.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 64d70b3076)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Per suggestion of @awlauria
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit ab4875ddc2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Per suggestion of @awlauria, added some comments about
the need to free ep before resources it points to.
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit 1bc3dab118)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This fix is from John L. Byrne (john.l.byrne@hpe.com).
When Libfabric objects are bound to an endpoint, the endpoint must be
freed before those objects can be closed successfully. For scalable
endpoints, objects can also be bound to transmit and receive contexts;
for objects bound to contexts, we must free the contexts before freeing
the endpoint. We also need to clear the memory registration cache.
If we don't clean up properly, fi_close may fail to close the domain
because the domain will still have a non-zero reference count.
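The required teardown order can be illustrated with a toy reference-counting model. This is a hypothetical sketch, not the libfabric API: `sketch_obj_t`, `sketch_bind()` and `sketch_close()` are invented, and returning `-EBUSY` is an assumption about how a "still referenced" close is reported.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_HOLDS 4

/* Hypothetical object: refcnt counts live objects that reference it;
 * holds[] lists the objects it references itself. */
typedef struct sketch_obj {
    int refcnt;
    struct sketch_obj *holds[SKETCH_MAX_HOLDS];
    int nholds;
} sketch_obj_t;

/* obj starts referencing target (e.g. an endpoint opened on a domain,
 * or a CQ bound to a receive context). */
static void sketch_bind(sketch_obj_t *obj, sketch_obj_t *target)
{
    if (obj->nholds < SKETCH_MAX_HOLDS) {
        obj->holds[obj->nholds++] = target;
        target->refcnt++;
    }
}

/* Refuse to close while anything still references obj; otherwise drop
 * the references obj holds on other objects. */
static int sketch_close(sketch_obj_t *obj)
{
    if (obj->refcnt > 0) {
        return -EBUSY;
    }
    for (int i = 0; i < obj->nholds; i++) {
        obj->holds[i]->refcnt--;
    }
    obj->nholds = 0;
    return 0;
}
```

In this model, closing the contexts first, then the endpoint, then the CQ and domain succeeds. Any other order leaves some object with a non-zero count.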
Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
(cherry picked from commit 3095fabf94)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Some versions of Libfabric contain a bug in EFA where FI_REMOTE_COMM and
FI_LOCAL_COMM are not advertised. To work around this, we first call
fi_getinfo() without those capability bits to see if EFA is available.
Also move around some of the provider include/exclude list logic so we can skip
this workaround if applicable.
Signed-off-by: Robert Wespetal <wesper@amazon.com>
(cherry picked from commit 49128a7adb)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Make sure to get an RDM provider that can provide both local and
remote communication. We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.
Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version. Use this check in two
places:
1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
>= v1.1, but the usnic component was checking for >= v1.3. So
update the btl/usnic configury to use the new macro and check for
>= v1.3.
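The idea behind a ">= target version" check can be sketched in C, assuming versions are packed as `(major << 16) | minor` so that one integer comparison orders them correctly (libfabric's own `FI_VERSION()` macro uses a packing of this kind). The `SKETCH_*` macros and helper functions are hypothetical. The real OPAL_CHECK_OFI_VERSION_GE() runs at configure time, whereas this sketch evaluates the check at run time.

```c
#include <assert.h>

/* Hypothetical packing: major in the upper 16 bits, minor in the lower. */
#define SKETCH_VERSION(major, minor) (((unsigned)(major) << 16) | (unsigned)(minor))
#define SKETCH_VERSION_GE(v1, v2)    ((v1) >= (v2))

/* MTL/OFI: FI_LOCAL_COMM / FI_REMOTE_COMM first appeared in API v1.5. */
static int sketch_mtl_ofi_usable(unsigned major, unsigned minor)
{
    return SKETCH_VERSION_GE(SKETCH_VERSION(major, minor), SKETCH_VERSION(1, 5));
}

/* BTL/usnic: requires v1.3. */
static int sketch_btl_usnic_usable(unsigned major, unsigned minor)
{
    return SKETCH_VERSION_GE(SKETCH_VERSION(major, minor), SKETCH_VERSION(1, 3));
}
```

Packing the version avoids the classic string-comparison bug in which "1.10" sorts before "1.5".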
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 21bc9042e1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
As discussed in open-mpi/ompi#2519, the common component does not yet
depend on libfabric. This commit introduces that dependency by simply
calling fi_version().
Signed-off-by: guserav <erik.zeiske@hpe.com>
(cherry picked from commit 8a67a95c99)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The changes in f5e1a672cc were made after the common/ofi component
was removed, so the component doesn't reflect the changes made there.
Namely, f5e1a672cc changed:
- How to call OPAL_CHECK_OFI (it now sets opal_ofi_happy to yes)
- Dropped the common part in the build flags for ofi
Signed-off-by: guserav <erik.zeiske@web.de>
(cherry picked from commit 0e25c95eae)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Updated the OFI MTL's Recv cancel to be a non-blocking call to match
the MPI spec. If fi_cancel succeeds, the user is expected to wait on
the request to find out whether the cancel has completed.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 25bdd118ac)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
For the non-thread-grouping paths, only the first (0th) OFI context
should be used for communication. Otherwise, we would access a nonexistent
array item and cause a segfault.
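A minimal sketch of the guard, assuming that with thread grouping enabled a communicator is mapped to a context by its context ID modulo the number of contexts. That mapping and the helper `sketch_ctxt_index()` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: only thread grouping provisions contexts beyond
 * index 0, so every other path must use context 0. */
static int sketch_ctxt_index(bool thread_grouping, int comm_cid, int num_ctxts)
{
    if (!thread_grouping || num_ctxts <= 1) {
        return 0;
    }
    return comm_cid % num_ctxts;
}
```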
While at it, clarify some content regarding SEPs in the README (credit to Matias Cabral
for the README edits).
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 6edcc479c4)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Replace all tabs with spaces. No code or logic changes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b556cabfe9)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Replace all tabs with spaces. No code or logic changes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit aba2571881)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
It doesn't seem like the BTL was using an uninitialized pointer, but simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.
Fixes Issue #6345
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 786e686d43)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
When we exceed the threshold number of contexts created, print the appropriate help
text.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 9cabcfdbba)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Provide the av_attr.count hint (number of addresses that will be
inserted into the address vector through the life of the process)
at initialization of the address vector. It's ok to be a bit
wrong, but some endpoints (RxR) can benefit by not going through
the slow growth realloc churn.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 44be7f139a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
With MTLs, there's no "other transport" when the remote side
does not have an active NIC, so we should print a useful error
message when the modex failed (indicating lack of a NIC on
the remote side).
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit fe25097194)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Move to a model where users must actively _enable_ the SEP feature, rather
than opening SEPs by default if the provider supports them. This allows us to
avoid regressing (either functionally or for performance reasons) any apps that
were working correctly on regular endpoints.
Also, provide an MCA parameter to specify the number of OFI contexts to create,
and default this value to 1. (Because btl/ofi also creates one context by
default, this reduces the chance that we allocate all available contexts by
default and the provider then breaks when btl/ofi asks for one more context
than it supports.)
While at it, spruce up the README's SEP content.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 37f9aff2a0)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
-> Added new targets in Makefile.am to call a new build script
generate-opt-funcs.pl to generate specialized functions for
each *.pm file.
-> Added new perl module *.pm files for send,isend,irecv,iprobe,improbe
which are loaded by generate-opt-funcs.pl to create new source files
that correspond to the name of the .pm file to be used as part of
MTL OFI.
-> Added mtl_ofi_opt.pm.template and updated README with details on the
specialization features and how to add additional specialization
support.
-> Added new opt_common/mtl_ofi_opt_common.pm containing common
functions for generating the specialized functions used by
all other *.pm modules.
-> Added new mtl_ofi.h which includes the definitions for the
function symbol table for storing the specialized functions along
with the definitions for the initialization functions for the
corresponding function pointers.
-> Based on the OFI provider capabilities, the specialized function
pointers are assigned at mtl_ofi_component_init to the corresponding
MTL OFI function.
-> mca_mtl_ofi_module_t has been updated with the symbol table
struct which is assigned at component init.
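The symbol-table pattern described above can be sketched in plain C: one table holds every specialized variant, and component init stores the chosen pointer in the module so the hot path never branches on provider capabilities. All names here are hypothetical. Using "remote CQ data" as the selection capability is an assumption. The real variants are generated by generate-opt-funcs.pl and take the usual MTL arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical specialized variants; they return an ID so that the
 * selection can be observed. */
static int sketch_send_no_cq_data(void) { return 0; }
static int sketch_send_cq_data(void)    { return 1; }

typedef int (*sketch_send_fn)(void);

/* Hypothetical symbol table, indexed by "provider supports remote CQ data". */
typedef struct {
    sketch_send_fn send[2];
} sketch_sym_table_t;

/* Hypothetical module: holds the pointer chosen at init time. */
typedef struct {
    sketch_send_fn send;
} sketch_module_t;

static void sketch_sym_table_init(sketch_sym_table_t *t)
{
    t->send[0] = sketch_send_no_cq_data;
    t->send[1] = sketch_send_cq_data;
}

static void sketch_component_init(sketch_module_t *m, const sketch_sym_table_t *t,
                                  bool prov_has_cq_data)
{
    /* Chosen once; later calls go straight through m->send(). */
    m->send = t->send[prov_has_cq_data ? 1 : 0];
}
```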
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit bef5f50a42)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
For cases when the number of local processes is greater than the number of
available contexts, the SEP initialization phase would calculate the number of
contexts to provision for each rank to be 0 and would eventually crash.
Fix the issue here by using regular endpoints in the event the number of local
processes is more than available contexts. This fixes issue #6182.
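The failure mode and the fallback can be shown with a small calculation, assuming contexts are divided evenly among local ranks with integer division. The helpers `sketch_ctxts_per_rank()` and `sketch_use_sep()` are hypothetical.

```c
#include <assert.h>

/* Hypothetical: contexts each local rank gets when the available
 * contexts are split evenly. With more local processes than contexts,
 * integer division yields 0, the value that previously led to the crash. */
static int sketch_ctxts_per_rank(int available_ctxts, int num_local_procs)
{
    if (num_local_procs <= 0) {
        return 0;
    }
    return available_ctxts / num_local_procs;
}

/* Use a scalable endpoint only if every rank gets at least one context;
 * otherwise fall back to a regular endpoint. */
static int sketch_use_sep(int available_ctxts, int num_local_procs)
{
    return sketch_ctxts_per_rank(available_ctxts, num_local_procs) > 0;
}
```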
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit e5e19dfcf7)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Commit 109d0569ff introduced a crash when an error occurred
before ofi_ctxt was allocated, including when no providers
passed the selection logic. Properly check that the pointer
is not NULL in the error cleanup code before dereferencing
the pointer.
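The fix follows a common C error-path idiom: the cleanup code may run before the context was ever allocated, so it checks for NULL before dereferencing. This is a minimal sketch with a hypothetical `sketch_ctxt_t` standing in for ofi_ctxt. It also resets the pointer to NULL after freeing it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ofi_ctxt. */
typedef struct {
    int *tx_cqs;
} sketch_ctxt_t;

/* Safe to call whether or not *ctxt was allocated (e.g. when no
 * provider passed selection and we bailed out early). */
static void sketch_error_cleanup(sketch_ctxt_t **ctxt)
{
    if (*ctxt != NULL) {
        free((*ctxt)->tx_cqs);
        free(*ctxt);
        *ctxt = NULL;
    }
}
```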
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 6e15128d96)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The OFI MTL supports the OFI Scalable Endpoints feature as a means to improve
multi-threaded application throughput and message rate. Currently the feature
is designed to utilize multiple TX/RX contexts exposed by the OFI provider in
conjunction with a multi-communicator MPI application model. For more
information, refer to the README under mtl/ofi.
Reviewed-by: Matias Cabral <matias.a.cabral@intel.com>
Reviewed-by: Neil Spruit <neil.r.spruit@intel.com>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 109d0569ff)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
When an application is not using multiple threads to call into MPI, we can
safely ask the provider for the FI_THREAD_DOMAIN setting, as it should
translate to the least amount of locking in the provider.
Conversely, for applications using THREAD_MULTIPLE, explicitly ask for
FI_THREAD_SAFE to prevent race conditions.
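The mapping reduces to a single decision. In this sketch the enums are local stand-ins for the MPI_THREAD_* levels and the FI_THREAD_* modes, so it compiles without mpi.h or the libfabric headers. `sketch_pick_threading()` is hypothetical.

```c
#include <assert.h>

/* Local stand-ins for MPI thread levels and OFI threading modes. */
enum sketch_mpi_thread {
    SK_MPI_THREAD_SINGLE, SK_MPI_THREAD_FUNNELED,
    SK_MPI_THREAD_SERIALIZED, SK_MPI_THREAD_MULTIPLE
};
enum sketch_fi_threading { SK_FI_THREAD_SAFE, SK_FI_THREAD_DOMAIN };

/* Only MPI_THREAD_MULTIPLE lets several threads call into MPI at the
 * same time, so only then must the provider do its own locking. */
static enum sketch_fi_threading sketch_pick_threading(enum sketch_mpi_thread provided)
{
    return provided == SK_MPI_THREAD_MULTIPLE ? SK_FI_THREAD_SAFE
                                              : SK_FI_THREAD_DOMAIN;
}
```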
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cbcae79d8)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Add two-sided communication support for non-tag-matching providers so they
can take advantage of this BTL and PML OB1. The current state is
"functional" and not optimized for performance.
Two-sided support is disabled by default and can be turned on with the MCA
parameter "mca_btl_ofi_mode".
Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
(cherry picked from commit 080115d440)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
OFI providers may reserve some of the upper bits of the tag for
internal usage and expose it using mem_tag_format. Check for that
and adjust communicator bits as needed.
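One way to picture the adjustment: count the leading zero bits of mem_tag_format (the upper bits the provider keeps for itself) and take them out of the communicator-ID field. This sketch assumes the communicator ID occupies the top of the tag. The helper names and the `min_cid_bits` lower bound are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of upper tag bits the provider leaves unset in mem_tag_format,
 * i.e. its leading zero bits. */
static int sketch_reserved_upper_bits(uint64_t mem_tag_format)
{
    int n = 0;
    while (n < 64 && !(mem_tag_format & (UINT64_C(1) << (63 - n)))) {
        n++;
    }
    return n;
}

/* Shrink the communicator-ID field by the reserved bits; return -1 if
 * fewer than min_cid_bits would remain. */
static int sketch_adjust_cid_bits(int cid_bits, uint64_t mem_tag_format, int min_cid_bits)
{
    int adjusted = cid_bits - sketch_reserved_upper_bits(mem_tag_format);
    return adjusted >= min_cid_bits ? adjusted : -1;
}
```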
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit d996f529c0)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Looks like this script was left over from quite a long time ago, and
was expecting CLI params from the "old"-style Automake test engine.
Update it to look for `--test-name` to get the test name, and update a
few other minor style things.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit e8277d9d06)
As discussed, a feature is being added to libpsm2 to correctly handle
the case where the library is opened by multiple OMPI transports in the same
process. (For example, the OFI BTL and the PSM2 MTL).
* Improved error message to indicate required libpsm2 version.
* Adds a test at autogen/configure time for the existence of
PSM2_LIB_REFCOUNT_CAP.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
(cherry picked from commit f10305a49f)
This is a fix based on a bug report from CGNS on GitHub/the mailing list.
The core of the problem was that different processes entered different branches of
our aggregator selection logic: in some cases, processes had a matching file_view
size and contiguous chunk size (thus assuming a 1-D distribution), and some
processes did not (thus assuming a 2-D distribution). The fix is to calculate
the average file view size across all processes and use this value, thus
ensuring that all processes enter the same branch.
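The following sketch shows why a globally agreed value makes every rank take the same branch. It assumes a simplified decision rule (1-D when the view size equals the chunk size, otherwise 2-D) and a chunk size shared by all ranks. In real code, the sum would come from a collective such as MPI_Allreduce. Here all ranks' sizes are passed in as an array. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_dist { SKETCH_1D, SKETCH_2D };

/* Hypothetical, simplified branch condition. */
static enum sketch_dist sketch_choose(long view_size, long chunk_size)
{
    return view_size == chunk_size ? SKETCH_1D : SKETCH_2D;
}

/* Stand-in for an allreduce-based average over all ranks. */
static long sketch_avg_view_size(const long *view_sizes, size_t nprocs)
{
    long sum = 0;
    for (size_t i = 0; i < nprocs; i++) {
        sum += view_sizes[i];
    }
    return sum / (long)nprocs;
}
```

Local decisions on sizes {1024, 1024, 1024, 512} with a chunk of 1024 disagree. The average (896) gives every rank the same answer.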
Fixes issue #7809
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit 4a8a330bba)
- There is a new API to detect missing memory events.
Enable use of the new UCX API to detect missing events.
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d6bff6ffbd)
Keep all comments in the user-facing mpi.h.in as "old style" C
comments: /* */. This gives us maximum portability, just on the off
chance that a user's C compiler does not support //-style comments.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit d522c27037)
1. __STDC_VERSION__ isn't necessarily defined (e.g., by C++
compilers). So check to make sure it is defined before we actually
check the value.
2. If we're in C++11 (or later), use static_assert().
3. Split the static assert macro in two macros:
* THIS_SYMBOL_WAS_REMOVED_IN_MPI30(...): Insert a valid expression
(i.e., 0, because it's only used with MPI_Datatype values, and
since MPI_Datatype is a pointer, 0 is a valid RHS expression)
before invoking the static assert so that we don't get a syntax
error instead of the actual static assert error.
* THIS_FUNCTION_WAS_REMOVED_IN_MPI30(...): No need for the valid
expression; just invoke the assert functionality.
Also remove an errant "\".
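This sketch covers the feature detection and the two-macro split described above. The `SKETCH_*` names are hypothetical, not the ones in mpi.h.in. To keep the sketch testable, it takes a `cond` argument. In the real header, the condition is always false, so any use fails to compile with the message.

```c
#include <assert.h>

/* Prefer C++11 static_assert, then C11 _Static_assert. Check that
 * __STDC_VERSION__ is defined before comparing it. */
#if defined(__cplusplus) && __cplusplus >= 201103L
#  define SKETCH_STATIC_ASSERT(cond, msg) static_assert(cond, msg)
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define SKETCH_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
#endif

#ifdef SKETCH_STATIC_ASSERT
/* Used where an expression is expected (e.g. "MPI_Datatype t = X;"):
 * emit a valid "0" first so the statement parses, then assert. */
#  define SKETCH_SYMBOL_REMOVED(cond, msg) 0; SKETCH_STATIC_ASSERT(cond, msg)
/* Used where a function call is expected: just assert. */
#  define SKETCH_FUNCTION_REMOVED(cond, msg) SKETCH_STATIC_ASSERT(cond, msg)
#endif
```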
Thanks to Constantine Khrulev and Martin Audet for identifying the
issue and suggesting to use C11's static_assert().
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 835f8f1834)
* Adds the `schizo/jsm` component that detects whether the process was
direct-launched with IBM's Job Step Manager (JSM). JSM is a PMIx-enhanced
runtime environment, so flag it as such.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 4f1de51371)