Advance to hwloc-2.1.0rc2-33-g38433c0f, which includes a .gitignore
update that we want here in Open MPI.
Be warned; this is actually 33 commits beyond the hwloc v2.1.0 tag.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.
Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.
opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.
-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
returns an int and directly sets opal_process_info.nodename per
jsquyres' modifications.
Relates to open-mpi#6801
Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
Make sure to get an RDM provider that can provide both local and
remote communication. We need this check because some providers could
be selected via RXD or RXM, but can't provide local communication, for
example.
Add OPAL_CHECK_OFI_VERSION_GE() m4 macro to check that the Libfabric
we're building against is >= a target version. Use this check in two
places:
1. MTL/OFI: Make sure it is >= v1.5, because the FI_LOCAL_COMM /
FI_REMOTE_COMM constants were introduced in Libfabric API v1.5.
2. BTL/usnic: It already had similar configury to check for Libfabric
>= v1.1, but the usnic component was checking for >= v1.3. So
update the btl/usnic configury to use the new macro and check for
>= v1.3.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.
This commit fixes the issue by setting the INVALID flag on a registation
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.
References #6524
References #7030Closes#6534
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.
If the high, low, and data of a new insertion exactly matches an
existing interval then an assertion is raised.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Rename the component to be "hwloc2" (since it can now be any v2.x.y
version of hwloc), and make the embedded copy of hwloc be a git
submodule.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Clarify in README what --with-hwloc does in its different use cases.
Also, ensure that the behavior when specifying `--with-hwloc` is the
same as if that option is not specified at all. This is what we did
in Open MPI <= v3.x; looks like we inadvertantly caused `--with-hwloc`
to be synonymous with `--with-hwloc=external` in v4.0.0.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
We have decided to show interfaces that are identical to itself as
reachable. This is consistent with the previous netmask logic when
determining reachability.
Signed-off-by: William Zhang <wilzhang@amazon.com>
Due to the way netlinks detects reachability, it will not show an
interface as reachable to itself, even if it can pass through a loopback
interface. To maintain similar behavior with netmasks, we display an
interface as reachable to itself.
Signed-off-by: William Zhang <wilzhang@amazon.com>
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look a the special NUMA level to get that locality info.
I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.
Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html
It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
The clang compiler (or at least the one used by the Cray CCE 9 and newer)
doesn't like what we're doing passing non _Atomic pointers to C11 atomics.
Fix the ones that keep vader from compiling using Cray CCE 9 and 10.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This patch fixes#7147 by preventing overflow when multiplying
the count and the blocklen. The count reflects MPI count and is
therefore bound to the size of an int (it is an uint32_t) while the
blocklen can be merged together to represent the largest contiguous
memory layout and it is therefore promoted to a size_t.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
related to #7128
The UCX crew is no longer guaranteeing that the UCT API is going to be frozen,
so this is kind of a whack-a-mole problem trying to keep the BTL UCT working
with various changing UCT APIs.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This commit fixes a configure bug that caused flow control to be
disabled regardless of the configure options used.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>