This PR removes the MCA_BTL_DES_FLAGS_PUT and MCA_BTL_DES_FLAGS_GET
descriptor flags. They once had a purpose, but that role has since been
taken over by the rcache access flags.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
The reduce_scatter_block and reduce_scatter algorithms were hitting
correctness issues in the non-commutative strided tests. For
non-commutative ops, revert to the original default algorithms for these
two collectives (basic linear and non-overlapping, respectively).
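The selection rule above can be sketched in C as follows. The enum values and function names are hypothetical and do not correspond to the coll/tuned decision code. The `op_is_commutative` flag stands in for whatever commutativity query the real code performs on the MPI_Op.

```c
#include <stdbool.h>

/* Hypothetical algorithm IDs, for illustration only. */
typedef enum {
    RS_BLOCK_BASIC_LINEAR,
    RS_BLOCK_TUNED_DEFAULT,
    RS_NONOVERLAPPING,
    RS_TUNED_DEFAULT
} rs_algorithm_t;

/* reduce_scatter_block: non-commutative ops fall back to basic linear. */
rs_algorithm_t choose_reduce_scatter_block(bool op_is_commutative)
{
    return op_is_commutative ? RS_BLOCK_TUNED_DEFAULT : RS_BLOCK_BASIC_LINEAR;
}

/* reduce_scatter: non-commutative ops fall back to non-overlapping. */
rs_algorithm_t choose_reduce_scatter(bool op_is_commutative)
{
    return op_is_commutative ? RS_TUNED_DEFAULT : RS_NONOVERLAPPING;
}
```

Commutative ops keep using the tuned decision; only the non-commutative case is pinned to the original defaults.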
See #8010
Signed-off-by: William Zhang <wilzhang@amazon.com>
Improve configury to check whether icc supports -Wno-long-double.
This prevents hundreds of messages like this:
icc: command line warning #10148: option '-Wno-long-double' not supported
A similar patch will be needed for pmix.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
The ofi MTL's mrecv was not setting the in/out message argument of
MPI_MRECV to MPI_MESSAGE_NULL.
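For illustration, the in/out contract of MPI_MRECV can be sketched with hypothetical stand-ins for MPI_Message and MPI_MESSAGE_NULL (`matched_message_t`, `MESSAGE_NULL`, and `mrecv_sketch` are not MPI or Open MPI names). The point is the final assignment: once the matched message has been consumed, the caller's handle is reset to the null handle.

```c
#include <stddef.h>

/* Hypothetical stand-in for a matched-message handle. */
typedef struct matched_message {
    int source;
    int tag;
} matched_message_t;

/* Hypothetical stand-in for MPI_MESSAGE_NULL. */
static matched_message_t message_null_obj = { -1, -1 };
#define MESSAGE_NULL (&message_null_obj)

int mrecv_sketch(matched_message_t **message, int *source_out)
{
    if (message == NULL || *message == NULL || *message == MESSAGE_NULL) {
        return -1; /* treated as an invalid handle in this sketch */
    }
    *source_out = (*message)->source;
    /* ... the actual data transfer would happen here ... */
    *message = MESSAGE_NULL; /* reset the caller's handle on completion */
    return 0;
}
```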
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
MPI-4 is finally cleaning up its language: an MPI "exception" does not
actually exist. The only thing that exists is an MPI "error" (and
associated handlers). This commit replaces all relevant uses of the
word "exception" with "error". Note that this change also applies to
versions of the MPI standard prior to MPI-4.0 (indeed, nearly all of the
cases fixed in this commit are just changes to comments, anyway).
One exception to this is the Java bindings, where there's an
MPIException class. In hindsight, it probably should have been named
MPIError, but changing it now would break anyone who is using the Java
bindings.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The C++ bindings were removed a while ago;
MPI::ERRORS_THROW_EXCEPTIONS and MPI_ERRORS_THROW_EXCEPTIONS no longer
exist.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The ofi_rxm provider relies on the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Because this can result in
completions being reported too early, we disable the provider to avoid
correctness issues.
This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.
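A hedged sketch of the kind of check this implies. The helper name is hypothetical, and it assumes that layered libfabric providers report names of the form "verbs;ofi_rxm", so a substring match rejects ofi_rxm regardless of which core provider sits beneath it.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: may this provider be used by a BTL that relies on
 * FI_DELIVERY_COMPLETE? */
bool provider_usable_for_btl(const char *prov_name)
{
    if (prov_name == NULL) {
        return false;
    }
    /* ofi_rxm's delivery-complete semantics depend on the layer below it,
     * so any provider name containing "ofi_rxm" is rejected. */
    return strstr(prov_name, "ofi_rxm") == NULL;
}
```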
Signed-off-by: William Zhang <wilzhang@amazon.com>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Make things happen before the terminal call.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
The btl/ofi does not currently use the common ofi include/exclude lists.
Add verification code, similar to that in the mtl/ofi, that checks
whether the info object is in the include or exclude list. If it is not
in the include list, or is in the exclude list, validate_info returns
OPAL_ERROR. The btl/ofi no longer passes a provider name as a hint when
calling getinfo; instead, it filters providers in validate_info.
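A minimal sketch of the include/exclude filtering described above. All names here are illustrative, not the actual common/ofi or btl/ofi code; 0 and -1 stand in for OPAL_SUCCESS and OPAL_ERROR, and a NULL include list is assumed to mean "no include filter".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: is name present in a NULL-terminated list of strings? */
static bool name_in_list(const char *name, const char *const *list)
{
    if (list == NULL) {
        return false;
    }
    for (size_t i = 0; list[i] != NULL; ++i) {
        if (strcmp(name, list[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Reject the provider if an include list exists and does not name it,
 * or if the exclude list names it. */
int validate_provider(const char *prov_name,
                      const char *const *include_list,
                      const char *const *exclude_list)
{
    if (include_list != NULL && !name_in_list(prov_name, include_list)) {
        return -1;
    }
    if (name_in_list(prov_name, exclude_list)) {
        return -1;
    }
    return 0;
}
```

Filtering after getinfo, rather than passing a single provider name as a hint, lets every returned candidate be checked against both lists in one place.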
This patch also moves the is_in_list MTL function into common code and
adds debugging output to the BTL to match the MTL.
Signed-off-by: William Zhang <wilzhang@amazon.com>
In earlier libfabric versions, the EFA provider implements
FI_DELIVERY_COMPLETE incorrectly. Although the provider advertises
FI_DELIVERY_COMPLETE, completions are returned too early because they do
not account for bounce buffers on the receive side. The BTL then
receives premature completions, which lead to correctness issues.
This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.
Signed-off-by: William Zhang <wilzhang@amazon.com>
The default algorithm selections were out of date and performed poorly.
Based on data gathered from OMPI developers, new default algorithm
decisions were selected for:
allgather
allgatherv
allreduce
alltoall
alltoallv
barrier
bcast
gather
reduce
reduce_scatter_block
reduce_scatter
scatter
These results were gathered with the ompi-collectives-tuning package on
the clusters of several OMPI developers and then averaged.
You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3
Signed-off-by: William Zhang <wilzhang@amazon.com>