
coll/tuned: don't select algorithms when it's clear they would fall back to linear

Bcast: scatter_allgather and scatter_allgather_ring expect N_elem >= N_procs
Allreduce: rabenseifner expects N_elem >= pow2 nearest to N_procs

When these conditions are not met, the implementations fall back to a linear
implementation, which will most likely yield the worst performance (noted for 4B bcast on 128 ranks)

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
This commit is contained in:
Joseph Schuchart 2020-11-05 18:32:12 +01:00
parent 7261255b8d
commit 04d198fc9f
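
The two preconditions named in the commit message can be sketched as simple predicates. This is only an illustration, not code from the commit: the helper names (`pow2_floor`, `bcast_scatter_usable`, `allreduce_rabenseifner_usable`) are hypothetical, and reading "pow2 nearest to N_procs" as the largest power of two not exceeding N_procs is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: largest power of two <= n, for n >= 1.
 * Assumption: this is what "pow2 nearest to N_procs" refers to. */
static int pow2_floor(int n)
{
    int p = 1;
    while (p <= n / 2) {
        p *= 2;
    }
    return p;
}

/* Hypothetical predicate: a scatter-based bcast (scatter_allgather,
 * scatter_allgather_ring) needs at least one element per rank;
 * otherwise it would fall back to a linear implementation. */
static int bcast_scatter_usable(size_t n_elem, int n_procs)
{
    return n_elem >= (size_t)n_procs;
}

/* Hypothetical predicate: rabenseifner allreduce needs at least as many
 * elements as the power of two derived from the number of ranks. */
static int allreduce_rabenseifner_usable(size_t n_elem, int n_procs)
{
    return n_elem >= (size_t)pow2_floor(n_procs);
}
```

For the case noted in the commit message, a 4-byte bcast (a single `int`) on 128 ranks, `bcast_scatter_usable(1, 128)` is false, so selecting a scatter-based algorithm there would only lead to the linear fallback.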


@@ -3,7 +3,7 @@
  * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
  *                         University Research and Technology
  *                         Corporation.  All rights reserved.
- * Copyright (c) 2004-2015 The University of Tennessee and The University
+ * Copyright (c) 2004-2020 The University of Tennessee and The University
  *                         of Tennessee Research Foundation.  All rights
  *                         reserved.
  * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
@@ -567,9 +567,7 @@ int ompi_coll_tuned_bcast_intra_dec_fixed(void *buff, int count,
      *  {9, "scatter_allgather_ring"},
      */
     if (communicator_size < 4) {
-        if (total_dsize < 2) {
-            alg = 9;
-        } else if (total_dsize < 32) {
+        if (total_dsize < 32) {
             alg = 3;
         } else if (total_dsize < 256) {
             alg = 5;
@@ -591,9 +589,7 @@ int ompi_coll_tuned_bcast_intra_dec_fixed(void *buff, int count,
             alg = 5;
         }
     } else if (communicator_size < 8) {
-        if (total_dsize < 2) {
-            alg = 8;
-        } else if (total_dsize < 64) {
+        if (total_dsize < 64) {
             alg = 5;
         } else if (total_dsize < 128) {
             alg = 6;
@@ -639,8 +635,6 @@ int ompi_coll_tuned_bcast_intra_dec_fixed(void *buff, int count,
     } else if (communicator_size < 256) {
         if (total_dsize < 2) {
             alg = 6;
-        } else if (total_dsize < 128) {
-            alg = 8;
         } else if (total_dsize < 16384) {
             alg = 5;
         } else if (total_dsize < 32768) {