Implements the butterfly algorithm for MPI_Reduce_scatter.

The algorithm supports both commutative and non-commutative operations, and works with power-of-two and non-power-of-two numbers of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>