- increase number of max segments to allow application be launched
on some Ubuntu configurations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f742f289ea)
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS. Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).
So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.
This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ab398f4b9a)
These -D's are for C compilation, not Fortran compilation. Remove
this useless statement.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4a47a5a8e)
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look a the special NUMA level to get that locality info.
I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.
Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html
It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit ea80a20e10)
Fix the C types for the following:
* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE
There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree). We erroneously had several of these array-like sentinel
values be "array-like" values in C. E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)". On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.
That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.
(**) NOTE: This code has been wrong for years. This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same. But it caused linker warning messages,
and even caused errors in some cases.
Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.
This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):
% mpifort -fuse-ld=lld /dev/null
ld.lld: error: corrupt input file: version definition index 0 for symbol
mpi_fortran_argv_null_ is out of bounds
>>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
...
If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.
This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).
Fixesopen-mpi/ompi#7209
Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5609268e90)
We tried doing an RC2 built without updating the greek,
and found where that failed in build automation.
Reving again for rc3, as we've already applied the rc2 tag.
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
* Additionally, fixes the `NULL` option to `OMPI_MCA_plm_rsh_agent`
would would also lead to a segv. Now it operates as intended by
disqualifying the `rsh` component and falling back onto the `isolated`
component.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 62d0058738)
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.
Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d073)
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.
Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 728d51f9f3)
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.
This commit fixes the issue by setting the INVALID flag on a registation
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.
References #6524
References #7030Closes#6534
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805be1)
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.
If the high, low, and data of a new insertion exactly matches an
existing interval then an assertion is raised.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 1145abc0b7)
Until now sqrt(n) was missed as a factor for odd square numbers n. This
lead to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.
Refs: #7186
Signed-off-by: Michael Lass <bevan@bi-co.net>
(cherry picked from commit 67490118ad)
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre. Squash these warnings.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit e66a7cef11)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>