- Add support for fallback to previous coll module on non-commutative operations (#30)
- Replace mutexes with atomic operations.
- Use the correct nbc request type (for both ibcast and ireduce)
* coll/base: document type casts in ompi_coll_base_retain_*
- Add module-wide topology cache
- Use standard instead of synchronous send and add an MCA parameter to control the mode of the initial send in ireduce/ibcast
- Reduce the number of memory allocations
- Call the default request completion.
- Remove the requests from the Fortran lookup conversion tables before completing
and freeing them.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
Co-authored-by: Joseph Schuchart <schuchart@hlrs.de>
This PR removes the MCA_BTL_DES_FLAGS_PUT and MCA_BTL_DES_FLAGS_GET
descriptor flags. At some point these had some meaning but they were
replaced by the rcache access flags.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
Reduce scatter block and reduce scatter algorithms were hitting
correctness issues for non-commutative strided tests. We will revert to
the original default algorithms for those two collectives (basic linear
and non-overlapping, respectively) in the non-commutative op case.
See #8010
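
A minimal sketch of the fallback rule described above, for illustration
only. The enum names and the function sketch_pick_algorithm are
hypothetical and are not the identifiers used in the coll framework;
SKETCH_ALG_OPTIMIZED merely stands in for whatever algorithm the
decision logic would otherwise select.

```c
#include <stdbool.h>

/* Hypothetical algorithm identifiers (not Open MPI's actual values). */
typedef enum {
    SKETCH_ALG_BASIC_LINEAR,     /* original default for reduce_scatter_block */
    SKETCH_ALG_NON_OVERLAPPING,  /* original default for reduce_scatter */
    SKETCH_ALG_OPTIMIZED         /* placeholder for any tuned algorithm */
} sketch_alg_t;

/* Hypothetical identifiers for the two collectives named in the text. */
typedef enum {
    SKETCH_REDUCE_SCATTER,
    SKETCH_REDUCE_SCATTER_BLOCK
} sketch_coll_t;

/* Pick an algorithm; non-commutative ops always get the original default. */
static sketch_alg_t sketch_pick_algorithm(sketch_coll_t coll,
                                          bool op_is_commutative)
{
    if (!op_is_commutative) {
        return (SKETCH_REDUCE_SCATTER_BLOCK == coll)
                   ? SKETCH_ALG_BASIC_LINEAR
                   : SKETCH_ALG_NON_OVERLAPPING;
    }
    /* Commutative ops may use the tuned choice (assumed here). */
    return SKETCH_ALG_OPTIMIZED;
}
```

With this rule, only the commutative case ever reaches the tuned
selection, so the strided non-commutative tests exercise the original
defaults.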
Signed-off-by: William Zhang <wilzhang@amazon.com>
As it is possible to have multiple outstanding non-blocking collectives
provided by different collective modules, we need a consistent
mechanism to allow them to select unique tags for each instance of a
collective.
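
One way such a mechanism could look is a per-communicator atomic
counter shared by all collective modules. The sketch below is an
assumption for illustration, not the code added by this commit: the
type sketch_comm_t, the functions, and the tag range constants are all
hypothetical, and it uses plain C11 <stdatomic.h> for self-containment
rather than the opal_atomic_* wrappers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical tag range reserved for non-blocking collectives. */
#define SKETCH_TAG_BASE  1000
#define SKETCH_TAG_COUNT 4096

/* Hypothetical communicator stand-in: one counter shared by all modules. */
typedef struct {
    atomic_uint_fast32_t next_seq;
} sketch_comm_t;

static void sketch_comm_init(sketch_comm_t *comm)
{
    atomic_init(&comm->next_seq, 0);
}

/* Hand out the next tag; tags repeat only after SKETCH_TAG_COUNT calls. */
static int sketch_next_nbc_tag(sketch_comm_t *comm)
{
    uint_fast32_t seq = atomic_fetch_add(&comm->next_seq, 1);
    return SKETCH_TAG_BASE + (int)(seq % SKETCH_TAG_COUNT);
}
```

Because MPI requires all processes to start collectives on a
communicator in the same order, each process draws the same tag for the
same collective instance without any extra communication, regardless of
which module implements it.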
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* piggybacking Bull functionalities
* coll/adapt: Fix naming conventions and C11 atomic use
This commit fixes some naming convention issues, such as function names
which should follow the naming ompi_coll_adapt instead of
mca_coll_adapt, which is reserved for component and module naming (cf.
the tuned collective component).
It also fixes the use of the _Atomic construct, which is only valid in
C11. OPAL constructs have already been adapted to that use, so use
opal_atomic_* types instead.
* coll/adapt: Remove unused component field in module
This commit removes an unneeded field referencing the component in the
module of adapt, as it is already available through the
mca_coll_adapt_component global variable.
Signed-off-by: Marc Sergent <marc.sergent@atos.net>
Co-authored-by: Lemarinier, Pierre <pierre.lemarinier@atos.net>
Co-authored-by: pierrele <31764860+pierrele@users.noreply.github.com>
Improve configury to check whether icc supports the -Wno-long-double option.
This prevents seeing 100s of messages like this:
icc: command line warning #10148: option '-Wno-long-double' not supported
A similar patch will be needed for pmix.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
The mtl/ofi mrecv was not properly setting the message in/out
argument of MPI_MRECV to MPI_MESSAGE_NULL.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
MPI-4 is finally cleaning up its language: an MPI "exception" does not
actually exist. The only thing that exists is an MPI "error" (and
associated handlers). This commit replaces all relevant uses of the
word "exception" with "error". Note that this is still applicable in
versions of the MPI standard earlier than MPI-4.0 (indeed, nearly all the
cases fixed in this commit are just changes to comments, anyway).
One exception to this is the Java bindings, where there's an
MPIException class. In hindsight, it probably should have been named
MPIError, but changing it now would break anyone who is using the Java
bindings.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The C++ bindings were removed a while ago;
MPI::ERRORS_THROW_EXCEPTIONS and MPI_ERRORS_THROW_EXCEPTIONS no longer
exist.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The ofi_rxm provider is dependent upon the underlying hardware for its
implementation of FI_DELIVERY_COMPLETE. Since this can lead to early
completions, we disable the provider to avoid correctness issues.
This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.
Signed-off-by: William Zhang <wilzhang@amazon.com>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
make things happen before the terminal call
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
The btl/ofi does not currently utilize the common ofi include/exclude
list. This patch adds verification code, similar to the mtl/ofi, that
checks whether the provider of the info object is in the include or
exclude list. If it isn't in the include list or is in the exclude
list, validate_info will return OPAL_ERROR. The btl/ofi will no longer
pass a provider name as a hint when calling getinfo, instead filtering
the provider during validate_info.
This patch also moves the is_in_list MTL function into common code and
adds additional debugging output to the BTL to match the MTL standard.
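
The filtering rule can be sketched as follows. This is an illustrative
assumption, not the patch itself: sketch_is_in_list and
sketch_validate_provider are hypothetical stand-ins for is_in_list and
validate_info (whose real signatures are not shown here), the lists are
modeled as NULL-terminated string arrays, and a NULL list is assumed to
mean "no filter of that kind".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical return codes standing in for OPAL_SUCCESS / OPAL_ERROR. */
#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

/* Return true if name exactly matches an entry of a NULL-terminated list. */
static bool sketch_is_in_list(const char *const *list, const char *name)
{
    if (NULL == list || NULL == name) {
        return false;
    }
    for (size_t i = 0; NULL != list[i]; ++i) {
        if (0 == strcmp(list[i], name)) {
            return true;
        }
    }
    return false;
}

/* Reject a provider missing from the include list or present in the
 * exclude list; a NULL list disables that check (assumption). */
static int sketch_validate_provider(const char *prov_name,
                                    const char *const *include_list,
                                    const char *const *exclude_list)
{
    if (NULL != include_list && !sketch_is_in_list(include_list, prov_name)) {
        return SKETCH_ERROR;
    }
    if (NULL != exclude_list && sketch_is_in_list(exclude_list, prov_name)) {
        return SKETCH_ERROR;
    }
    return SKETCH_SUCCESS;
}
```

Filtering after getinfo, rather than passing a provider name as a hint,
lets one check handle both lists uniformly.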
Signed-off-by: William Zhang <wilzhang@amazon.com>
EFA incorrectly implements FI_DELIVERY_COMPLETE in earlier libfabric
versions. While FI_DELIVERY_COMPLETE would be advertised by the
provider, completions would return too early by not accounting for
bounce buffers on the receive side. This would cause the BTL
to receive early completions that lead to correctness issues.
This is not an issue in the mtl/ofi as it does not require
FI_DELIVERY_COMPLETE.
Signed-off-by: William Zhang <wilzhang@amazon.com>