This option is problematic, and has never worked in an Open MPI v4.0.x
release tarball. Given that PMIx is now available elsewhere, it isn't
worth fixing this option.
See https://github.com/open-mpi/ompi/issues/6228 for more detail.
NOTE: This is a v4.0.x-specific commit because this option no longer
exists on master because we deleted the entire pmix3x component.
Hence, it's not possible to cherry-pick anything from master back to
the v4.0.x branch.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 447b14061880e218371f9eb0cbe427b8358d45b8)
also add common verbose variable.
Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and
and initialized. The BTL's are registered/initialized prior to the MTL components even getting registered.
Here's the change in ofi mtl mca parameters. Before commit:
MCA mtl ofi: parameter "mtl_ofi_provider_include" (current value: "psm2", data source: environment, level: 1 user/basic, type: string)
Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string)
Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
After commit:
MCA btl ofi: parameter "btl_ofi_provider_include" (current value: "", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_include)
Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
MCA btl ofi: parameter "btl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
MCA mtl ofi: parameter "mtl_ofi_verbose" (current value: "0", data source: default, level: 3 user/all, type: int, synonym of: opal_common_ofi_verbose)
related to #7755
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 9f1081a07a)
(cherry picked from commit 45b643d0cf)
This bug was first seen in a different product that's using the same
interception code as OMPI. But I think it's potentially in OMPI too.
In my vanilla build of OMPI master on RH8 if I "gdb libopen-pal.so" and
"disassemble intercept_brk", I'm seeing a suspicious extra instruction
in front of PATCHER_BEGIN:
0x00000000000d6778 <+40>: std r2,24(r1) // something gcc put in front
0x00000000000d677c <+44>: std r2,96(r1) // PATCHER_BEGIN's toc_save
0x00000000000d6780 <+48>: nop // NOPs from PATCHER_BEGIN
0x00000000000d6784 <+52>: nop // that get replaced
0x00000000000d6788 <+56>: nop // by instructions that
0x00000000000d678c <+60>: nop // change r2
0x00000000000d6790 <+64>: nop //
Later there are loads from that location like
0x000000000019e0e4 <+132>: ld r2,24(r1)
that make me nervous since that's the pre-updated value.
I believe this is the same thing Nathan is describing way back in a9bc692d
and his solution was to put a second call around each interception, where
the outer call is just
intercept_brk():
PATCHER_BEGIN
_intercept_brk()
PATCHER_END
and the inner call _intercept_brk() is where the bulk of the code goes.
What I'm seeing is that _intercept_brk() is being inlined and probably
negating Nathan's fix. So I want to add __opal_attribute_noinline__ to
restore the fix.
With this commit in place, the disassembly of intercept_brk becomes tiny
because it's no longer inlining _intercept_brk() and the susipicious
early save of r2 is gone. I made the same fix to all the intercept_*
functions, although intercept_brk was the only one that had a suspicious
save of r2.
As far as empirical failures though, we only have those from the non-OMPI
product that's using the same patcher code. I'm not actually getting OMPI
to fail from the above suspicious data being saved in r1+24.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit ddd1f578ec)
mtl_btl_ofi_rcache_init() initializes patcher which should only take
place things are single threaded. OFI providers may start spawn threads,
so initialize the rcache before creating OFI objects to prevent races.
Authored-by: John L. Byrne <john.l.byrne@hpe.com>
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit f1b21cb776)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Added the flag OPAL_OFI_PCI_DATA_AVAILABLE to remove accessing the nic
object in
fi_info when the ofi version does not support that structure.
Signed-off-by: Nikola Dancejic dancejic@amazon.com
(cherry picked from commit ae2a447b0e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Update the OPAL_CHECK_OFI configury macro:
- Make it safe to call the macro multiple times:
- The checks only execute the first time it is invoked
- Subsequent invocations, it just emits a friendly "checking..."
message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
output variables from OPAL_CHECK_OFI to be
opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
causes the macro to be invoked at a fairly random time, which makes
configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
resulted in an indenting change, making the change to
opal_check_ofi.m4 look larger than it really is).
Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672cc)
NOTE: This patch was cherry-picked into the v4.0.x branch as 9ad871fc,
but the OFI BTL changes were skipped, because the OFI BTL was not in
the v4.0.x branch. This version of the cherry pick brings in the
changes to the OFI BTL.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Adds the capability to select a NIC based on hardware locality.
Creates a list of NICs that share the same cpuset as the process,
then selects the NIC based on the (local rank) % (number of NICs).
If no NICs are available that share the same cpuset, the selection process
will create a list of all available NICs and make a selection based on
(local rank) % (number of NICs)
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 167d75b42a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Per suggestion of @awlauria
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit ab4875ddc2)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Per suggestion of @awlauria, added some comments about
the need to free ep before resources it points to.
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit 1bc3dab118)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This fix is from John L. Byrne (john.l.byrne@hpe.com).
When OFI Libfabric binds objects to endpoints, before the object can
be successfully closed, the endpoint must first be freed. For scalable
endpoints, objects can also be bound to transmit and receive contexts,
and for objects that are bound to contexts, we need to first free the
contexts before freeing the endpoint. We also need to clear the memory
registration cache.
If we don't clean up properly, then fi\_close may not be able to close
the domain because the dom will have a non-zero ref count.
Signed-off-by: harumi kuno <harumi.kuno@hpe.com>
(cherry picked from commit 3095fabf94)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Make sure to get an RDM provider that can provide both local and
remote communication. We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.
Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version. Use this check in two
places:
1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
>= v1.1, but the usnic component was checking for >= v1.3. So
update the btl/usnic configury to use the new macro and check for
>= v1.3.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 21bc9042e1)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
As discussed in open-mpi/ompi#2519 the common component does not depend
on libfabric yet. This commit introduces this dependency by just calling
fi_version().
Signed-off-by: guserav <erik.zeiske@hpe.com>
(cherry picked from commit 8a67a95c99)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The changes made in f5e1a672cc
have been done after the common/ofi component was removed and thus the
component doesn't reflect the changes made their.
Namely f5e1a672cc changed:
- How to call OPAL_CHECK_OFI (It sets opal_ofi_happy to yes now)
- Dropped the common part in the build flags for ofi
Signed-off-by: guserav <erik.zeiske@web.de>
(cherry picked from commit 0e25c95eae)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Replace all tabs with spaces. No code or logic changes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b556cabfe9)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
It doesn't seem like the BTL was using uninitialized pointer. But simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.
Fixes Issue #6345
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 786e686d43)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The 2 sided communication support is added for non-tagmatching provider
to take advantage of this BTL and PML OB1. The current state is
"functional" and not optimized for performance.
Two sided support is disabled by default and can be turned on by mca
parameter: "mca_btl_ofi_mode".
Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
(cherry picked from commit 080115d440)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
- there is new API to detect missing memmory events.
Enabled using of new UCX API to detect missing events
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d6bff6ffbd)
These op codes used to be in bits/ipc.h but were removed in glibc in 2015
with a comment saying they should be defined in internal headers:
https://sourceware.org/bugzilla/show_bug.cgi?id=18560
and when glibc uses that syscall it seems to do so from its own definitions:
https://github.com/bminor/glibc/search?q=IPCOP_shmat&unscoped_q=IPCOP_shmat
So I think using #ifndef and defining them if they're not already defined
using the values from glibc is the best option.
At IBM it was the testing on redhat 8 that found this as an issue
(the opcodes being undefined on the system made the #define HAS_SHMDT
evaluate to false so intercept_shmat / intercept_shmdt were
left undefined so shmat/shmdt memory events went unintercepted).
(cherry picked from commit e8fab058da)
Signed-off-by: Mark Allen <markalle@us.ibm.com>
correctly use strlen(char *) instead of sizeof(char *)
Thanks Georg Geiser for reporting this issue.
Refs. open-mpi/ompi#7772
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit c450b21405)
* `libevent_core.so` contains the core functionality that we depend upon
- `libevent.so` library has been identified as the legacy target.
- `libevent_core.so` exists as far back as Libevent 2.0.5 (oldest supported by OMPI)
* `libevent_pthreads.so` can work with either `-levent` or `-levent_core`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* LSF ships a `libevent.so` that is no related to the `libevent.so`
shipped with Libevent.
* Add some checks to the configure logic to detect scenarios where this
conflict can be detected, and provide the user with a descriptive
warning message.
- When detected by `event/external` this is just a warning since
the internal component may be able to be used instead.
- This happens when the user supplies the LSF path via the
`LDFLAGS` envar instead of via `--with-lsf-libdir`.
- When detected by a LSF component and LSF was explicitly requested
then this becomes an error. Otherwise it will just print the warning
and that component will fail to build.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* This should have been `LDFLAGS` not `LIBS`. Either works, but
`LDFLAGS` is more correct. We should also include `CPPFLAGS`
just in case the header is important to the check.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Keep track of the connected procs in vader_add_procs().
Otherwise, the same rank will reconnect the same shmem
segment (rank 0+...) multiple times instead of the next
one as intended.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit f69c8d6819)
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1
Fixes#7362
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 907ad854b4)
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look a the special NUMA level to get that locality info.
I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.
Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html
It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit ea80a20e10)
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.
Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 728d51f9f3)
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.
This commit fixes the issue by setting the INVALID flag on a registation
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.
References #6524
References #7030Closes#6534
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805be1)
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.
If the high, low, and data of a new insertion exactly matches an
existing interval then an assertion is raised.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 1145abc0b7)