open-mpi/ompi@0fe756d416 Introduced
a bug in coll/hcoll component. The ompi_requests allocated by
libhcoll would be treated as coll_base_nbc_request during
ompi_coll_base_retain_<> call. Afterwards this would lead to a
segv in the request cleanup.
Fix: since libhcoll interface does not distinguish between the
blocling/non-blocking requests use coll_base_nbc_request all the
time and initialize it properly in
coll/hcoll/get_coll_handle(). It is still within 2 cache lines.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
a non blocking collective might return ompi_request_null, so we should not
retain anything in that case.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned into a "clean" state.
So cleanup some data in the callback completion subroutines.
This fixes a regression introduced in open-mpi/ompi@0fe756d416
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t
to correctly support the retention of datatypes/operators
This fixes a regression introduced in open-mpi/ompi@0fe756d416
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Use linear with sync alltoall algorithm for certain message/comm size
ranges. Does not affect default fixed decision, unless HPCX (with its
custom parameters) is used or corresponding mca is set.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is
invoked, and set a request callback so they are free'd when the MPI_Request
completes.
Thanks Thomas Ponweiser for reporting this
Fixesopen-mpi/ompi#2151Fixesopen-mpi/ompi#1304
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal
memory framework and register a mem release cb. Otherwise, rely on ucx.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
... and add `MPI_COMPLEX4`.
This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OMPI internal code.
On the other hand, `ompi_datatype_t::id` values of existing datatypes
are not changed and 73 is newly assigned to for `MPI_COMPLEX4` to
retain ABI compatibility.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`.
These are Open MPI internal variables intended to be defined as
`MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and
`MPI_CXX_SHORT_FLOAT_COMPLEX` in the future.
`OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to
support `MPI_COMPLEX4` in the next commit.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
The type `short float` is proposed for the C language in ISO/IEC JTC
1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a.
half-precision floating point or FP16.
By this commit, `short float` and `short float _Complex` are detected
in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT`
and its complex number version are not added yet.
This commit changes values of existing `OPAL_DATATYPE_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OPAL and OMPI internal code.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Correctly bubble up errors in NBC collective operations
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
The error field of requests needs to be rearmed at start, not at create
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
1. Remove debug output in iallgather (I have forgotten to remove it).
2. Remove an incorrect comment in description of ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.
MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by an allgather.
Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather.
Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error. However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined. Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Implements recursive doubling algorithm for MPI_Iexscan.
The algorithm preserves order of operations so it can be used both
by commutative and non-commutative operations.
The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic
algorithm selection.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
Implements recursive doubling algorithm for MPI_Iscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations.
The MCA parameter coll_libnbc_iscan_algorithm was added for dynamic algorithm selection.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
The parameter passed to NBC_Return_handle() was incorrectly casted
and not dereferenced.
Thanks Yossi for the bug report.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it
turns out that we never call this function, anyway. So just remove
it.
Fixes#5670.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
always initialize 'size'.
Only the a2a_sched_diss() alltoall algorithm is impacted,
and this algo is currently unused, so there is no need
to backport nor update the NEWS file for now.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>