improve configury to check whether icc is handling no long double.
This prevents seeing 100s of messages like this:
icc: command line warning #10148: option '-Wno-long-double' not supported
A similar patch will be needed for pmix.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
(cherry picked from commit 6df0e53421)
Reduce scatter block and reduce scatter algorithms were hitting
correctness issues for non commutative strided tests. We will revert to
the original default algorithms for those two collectives (basic linear
and non overlapping respectively) in the non commutative op case.
See #8010
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 57b95bcb45)
the ofi mtl mrecv was not properly setting the message in/out
arg to MPI_MRECV to MPI_MESSAGE_NULL.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
(cherry picked from commit e6f81ed6d6)
The ofi_rxm provider is dependent upon the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Since this can lead to early
completions, we disable the provider to avoid correctness issues.
This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 41acfee2bb)
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric
versions. While FI_DELIVERY_COMPLETE would be advertised by the
provider, completions would return too early by not accounting for
bounce buffers on the receive side. This would cause the BTL
to receive early completions that lead to correctness issues.
This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit a7dcfd9874)
Alter the test to validate misaligned data.
Fixes#7954.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit b6d71aa893)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The btl/ofi does not currently utilize the common ofi include/exclude
list. Added verification code similar to the mtl/ofi that will check if
the info object is in the include or exclude list. If it isn't in the
include list or is in the exclude list, validate_info will return
OPAL_ERROR. The btl/ofi will no longer pass a provider name as a hint
when calling getinfo, instead filtering the provider during
validate_info.
This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 9b8f463a76)
The default algorithm selections were out of date and not performing
well. After gathering data from OMPI developers, new default algorithm
decisions were selected for:
allgather
allgatherv
allreduce
alltoall
alltoallv
barrier
bcast
gather
reduce
reduce_scatter_block
reduce_scatter
scatter
These results were gathered using the ompi-collectives-tuning package
and then averaged amongst the results gathered from multiple OMPI
developers on their clusters.
You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit ce40cfbaa5)
If building Open MPI with sanitizers, e.g
$ configure CC=clang CFLAGS=-fsanitize=address ....
configure test programs are also build with the sanitizers and will
report errors resulting in configure to fail.
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
The missing include file causes an error when using an external version of LibEvent.
Signed-off-by: tomhers <tom.herschberg@gmail.com>
(cherry picked from commit 88f9d2c90f)
bugfix: provider selection would not differentiate between ipv4
and ipv6 addresses which would cause some nodes to be unable
to communicate between each other. Adding a check for address
format to provider selection to ensure that all nodes use the
same address format.
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
(cherry picked from commit 7e46371301)
Add logic to handle different architectural capabilities
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.
Add validation checks for loadu 256 and 512 bits.
Add validation tests for MPI_Op.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 14b3c70628)
do not check some input parameters when an {in,out}degree is zero
Thanks Junchao Zhang for analyzing and reporting this issue.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 5655d64bd3)
NOTE: This is intentionally not a cherry pick from master. Instead,
this is a cherry-pick from the equivalent commit on the v4.0.x branch.
See below.
There is a problem with the mpi_f08 module when sizeof(int) !=
sizeof(INTEGER): the size of TYPE(MPI_Status) is too small. This
causes buffer overruns when Open MPI is configured with (for example)
sizeof(int)==4 and sizeof(INTEGER)==8, and then you call the mpi_f08
MPI_RECV subroutine. This will end up copying the resulting C
MPI_Status to the buffer pointing to the Fortran status, but the code
does not know if the Fortran status is an mpif.h status or a
TYPE(MPI_Status) -- it just blindly copies over as if the Fortran
status is an INTEGER array of length MPI_STATUS_SIZE. Unfortunately,
TYPE(MPI_Status) is actually smaller than this, so we overrun the
buffer. Hilarity ensues.
The simple fix for this is to make TYPE(MPI_Status) the same size as
INTEGER(MPI_STATUS_SIZE), but we can't do that here on the release
branch because it will break ABI.
This commit does the following:
- checks to see if we're in a sizeof(int) != sizeof(INTEGER) scenario
- if so, if the user has not specifically excluded building the
mpi_f08 module, display a Giant Error Message (GEM) and abort
configure.
This is unusual; we don't usually abort configure when feature XYZ
can't be built -- if the user didn't specifically ask for XYZ, we
just emit a notice that we won't build XYZ and continue.
This situation is a little different because we're on a release
branch: prior releases have built mpi_f08 by default -- even in this
"bad" scenario. Hence, in this case, we explicitly tell the user that
this is now a known-bad scenario and abort. In the GEM, we give the
user two options:
1. Change their compiler flags so that sizeof(int) == sizeof(INTEGER)
and re-run configure, or
2. Explicitly disable the mpi_f08 module via --enable-mpi-fortran=usempi
Thanks to @ahaichen for reporting the issue.
Note: the proper fix has been implemented on master (i.e., what will
become v5.0.0), but since that breaks ABI, we can't cherry pick it
back here to an existing release branch series. Hence, we
cherry-picked this fix from the v4.0.x branch.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 27836a614b9c29d7636cdf1a9b838b1532281a8a)