
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms

PR #5450 addresses MPI_IN_PLACE processing for the basic collective algorithms.
In conjunction with that, the tuned paths also need to check for MPI_IN_PLACE
before calling ompi_datatype_type_size(); otherwise we segfault.

The MPI spec also stipulates that sendcount and sendtype are ignored for
in-place Alltoall and Allgatherv operations, so extend the check to these
algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f43934a93e16db556b340e72cdd3742)
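To illustrate the failure mode described in the commit message, here is a minimal stand-alone C sketch that does not use Open MPI at all. All names are hypothetical stand-ins: `IN_PLACE` for `MPI_IN_PLACE`, `mock_datatype_t` for an Open MPI datatype, and `mock_type_size()` for `ompi_datatype_type_size()`. It assumes, as the segfault suggests, that under in-place semantics the send datatype may not be safe to dereference; this is modelled as a `NULL` handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MPI_IN_PLACE: a unique sentinel address. */
static const char in_place_sentinel = 0;
#define IN_PLACE ((const void *)&in_place_sentinel)

/* Hypothetical stand-in for an Open MPI datatype handle. */
typedef struct {
    size_t size; /* size of one element in bytes */
} mock_datatype_t;

/* Stand-in for ompi_datatype_type_size(): it dereferences the handle,
 * so passing an invalid (here: NULL) send type would crash. */
static int mock_type_size(const mock_datatype_t *dtype, size_t *size)
{
    *size = dtype->size;
    return 0;
}

/* The guarded pattern from the diff: query the send datatype only when
 * the send buffer is real; under in-place, use the receive datatype. */
static size_t element_size(const void *sbuf,
                           const mock_datatype_t *sdtype,
                           const mock_datatype_t *rdtype)
{
    size_t dsize = 0;
    assert(rdtype != NULL); /* the receive type is always significant */
    if (IN_PLACE != sbuf) {
        mock_type_size(sdtype, &dsize);
    } else {
        mock_type_size(rdtype, &dsize);
    }
    return dsize;
}
```

Calling `element_size(IN_PLACE, NULL, &dbl)` returns the receive element size; querying `sdtype` unconditionally would instead dereference `NULL`, mirroring the crash the commit fixes.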
Aravind Gopalakrishnan 2018-10-24 15:31:33 -07:00
Parent: cd1e927e1b
Commit: 5a74ddb34d
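Because the commit message notes that both sendcount and sendtype are ignored for in-place Alltoall and Allgatherv, a sizing helper can take the count as well as the type from the receive side in that case. The sketch below is hypothetical and stand-alone: `IN_PLACE`, `alltoall_block_bytes()` and `allgatherv_total_bytes()` are invented names, and plain byte sizes replace Open MPI datatypes. Note that the Alltoall hunk in the diff still multiplies by `scount`; using `rcount` under in-place here is an assumption drawn from the spec wording, not what the tuned component does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel standing in for MPI_IN_PLACE. */
static const char in_place_marker = 0;
#define IN_PLACE ((const void *)&in_place_marker)

/* Per-peer block size in bytes for an Alltoall-style decision.
 * ssize/rsize are element sizes (bytes) of the send/receive types.
 * Under in-place, sendcount and sendtype are ignored, so only the
 * receive side (rcount, rsize) is consulted. */
static size_t alltoall_block_bytes(const void *sbuf, int scount, size_t ssize,
                                   int rcount, size_t rsize)
{
    if (IN_PLACE != sbuf) {
        assert(scount >= 0);
        return ssize * (size_t)scount;
    }
    assert(rcount >= 0);
    return rsize * (size_t)rcount;
}

/* Total bytes gathered by an Allgatherv-style operation: the receive
 * element size times the sum of the per-rank receive counts. Only the
 * receive side is used, so the send type never needs to be queried. */
static size_t allgatherv_total_bytes(size_t rsize, const int *rcounts,
                                     int communicator_size)
{
    size_t total = 0;
    for (int i = 0; i < communicator_size; i++) {
        assert(rcounts[i] >= 0);
        total += rsize * (size_t)rcounts[i];
    }
    return total;
}
```

With `rcounts = {1, 2, 3}` and 8-byte elements, `allgatherv_total_bytes` yields 48 bytes, and an in-place `alltoall_block_bytes` call ignores whatever `scount` and send size the caller passed.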


@ -119,7 +119,11 @@ int ompi_coll_tuned_alltoall_intra_dec_fixed(const void *sbuf, int scount,
the University of Tennessee (2GB MX) up to 64 nodes.
Has better performance for messages of intermediate sizes than the old one */
/* determine block size */
-ompi_datatype_type_size(sdtype, &dsize);
+if (MPI_IN_PLACE != sbuf) {
+    ompi_datatype_type_size(sdtype, &dsize);
+} else {
+    ompi_datatype_type_size(rdtype, &dsize);
+}
block_dsize = dsize * (ptrdiff_t)scount;
if ((block_dsize < (size_t) ompi_coll_tuned_alltoall_small_msg)
@ -549,7 +553,11 @@ int ompi_coll_tuned_allgather_intra_dec_fixed(const void *sbuf, int scount,
}
/* Determine complete data size */
-ompi_datatype_type_size(sdtype, &dsize);
+if (MPI_IN_PLACE != sbuf) {
+    ompi_datatype_type_size(sdtype, &dsize);
+} else {
+    ompi_datatype_type_size(rdtype, &dsize);
+}
total_dsize = dsize * (ptrdiff_t)scount * (ptrdiff_t)communicator_size;
OPAL_OUTPUT((ompi_coll_tuned_stream, "ompi_coll_tuned_allgather_intra_dec_fixed"
@ -644,7 +652,12 @@ int ompi_coll_tuned_allgatherv_intra_dec_fixed(const void *sbuf, int scount,
}
/* Determine complete data size */
-ompi_datatype_type_size(sdtype, &dsize);
+if (MPI_IN_PLACE != sbuf) {
+    ompi_datatype_type_size(sdtype, &dsize);
+} else {
+    ompi_datatype_type_size(rdtype, &dsize);
+}
total_dsize = 0;
for (i = 0; i < communicator_size; i++) {
total_dsize += dsize * (ptrdiff_t)rcounts[i];