An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather.
Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 7bd63e79c865080c801a45ed852602bdc4eb4d8f)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>