openmpi

Mikhail Brinskii 79006f4e5a COLL/BASE: Fix linear sync all2all Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>		2019-06-06 19:22:42 +03:00
base.h	Revert "Update to sync with OMPI master and cleanup to build"	2016-11-22 15:03:20 -08:00
coll_base_allgather.c	coll-base-allgather: fix MPI_IN_PLACE processing	2018-07-18 10:27:00 +07:00
coll_base_allgatherv.c	Squash a bunch of harmless compiler warnings.	2018-09-26 12:15:21 -07:00
coll_base_allreduce.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_alltoall.c	COLL/BASE: Fix linear sync all2all	2019-06-06 19:22:42 +03:00
coll_base_alltoallv.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_barrier.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_bcast.c	Remove few warnings in libnbc identified by clang-1000.11.45.2	2018-10-17 18:04:39 -04:00
coll_base_comm_select.c	coll: Update COLL module interface version to 2.3.0	2018-06-11 17:22:16 +09:00
coll_base_comm_unselect.c	coll: Add persistent collective communication request feature	2018-06-11 09:53:37 +09:00
coll_base_exscan.c	coll/basic: move [ex]scan from coll/basic to coll/base	2018-04-04 13:41:01 +09:00
coll_base_find_available.c	Topic/monitoring (#3109)	2017-06-26 18:21:39 +02:00
coll_base_frame.c	coll/base: add knomial tree algorithm for MPI_Bcast	2018-06-19 13:01:26 -06:00
coll_base_functions.h	coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather	2018-07-16 08:56:09 -06:00
coll_base_gather.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_reduce_scatter_block.c	coll/base: Add MPI_Bcast based on a scatter followed by an allgather	2018-06-21 11:47:07 -06:00
coll_base_reduce_scatter.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_reduce.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_scan.c	coll/basic: move [ex]scan from coll/basic to coll/base	2018-04-04 13:41:01 +09:00
coll_base_scatter.c	coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree	2018-07-09 10:04:41 -06:00
coll_base_topo.c	coll/base: add knomial tree algorithm for MPI_Bcast	2018-06-19 13:01:26 -06:00
coll_base_topo.h	Change the tree_next to a flexible array member	2018-06-19 13:01:26 -06:00
coll_base_util.c	Always return a valid error code from collective operations	2018-09-14 13:46:35 -04:00
coll_base_util.h	coll/base: Add MPI_Bcast based on a scatter followed by an allgather	2018-06-21 11:47:07 -06:00
coll_tags.h	coll/base: add recursive doubling algorithm for MPI_Reduce_scatter_block	2018-04-23 11:02:31 +07:00
help-mca-coll-base.txt	Revert "Update to sync with OMPI master and cleanup to build"	2016-11-22 15:03:20 -08:00
Makefile.am	Resolve merge conflicts	2018-05-03 07:28:32 +07:00
owner.txt	Revert "Update to sync with OMPI master and cleanup to build"	2016-11-22 15:03:20 -08:00
README.memory_management	Revert "Update to sync with OMPI master and cleanup to build"	2016-11-22 15:03:20 -08:00

README.memory_management

    /* This comment applies to all collectives (including the basic
     * module) where we allocate a temporary buffer.  For the next few
     * lines of code, it's tremendously complicated how we decided that
     * this was the Right Thing to do.  Sit back and enjoy.  And prepare
     * to have your mind warped. :-)
     *
     * Recall some definitions (I always get these backwards, so I'm
     * going to put them here):
     *
     * extent: the length from the lower bound to the upper bound -- may
     * be considerably larger than the buffer required to hold the data
     * (or smaller!  But it's easiest to think about when it's larger).
     *
     * true extent: the exact number of bytes required to hold the data
     * in the layout pattern in the datatype.
     *
     * For example, consider the following buffer (just talking about
     * true_lb, extent, and true extent -- extrapolate for true_ub):
     *
     * A              B                                       C
     * --------------------------------------------------------
     * |              |                                       |
     * --------------------------------------------------------
     *
     * There are multiple cases:
     *
     * 1. A is what we give to MPI_Send (and friends), and A is where
     * the data starts, and C is where the data ends.  In this case:
     *
     * - extent: C-A
     * - true extent: C-A
     * - true_lb: 0
     *
     * A                                                      C
     * --------------------------------------------------------
     * |                                                      |
     * --------------------------------------------------------
     * <=======================extent=========================>
     * <======================true extent=====================>
     *
     * 2. A is what we give to MPI_Send (and friends), B is where the
     * data starts, and C is where the data ends.  In this case:
     *
     * - extent: C-A
     * - true extent: C-B
     * - true_lb: positive
     *
     * A              B                                       C
     * --------------------------------------------------------
     * |              |           User buffer                 |
     * --------------------------------------------------------
     * <=======================extent=========================>
     *                <===============true extent=============>
     *
     * 3. B is what we give to MPI_Send (and friends), A is where the
     * data starts, and C is where the data ends.  In this case:
     *
     * - extent: C-A
     * - true extent: C-A
     * - true_lb: negative
     *
     * A              B                                       C
     * --------------------------------------------------------
     * |              |           User buffer                 |
     * --------------------------------------------------------
     * <=======================extent=========================>
     * <======================true extent=====================>
     *
     * 4. MPI_BOTTOM is what we give to MPI_Send (and friends), B is
     * where the data starts, and C is where the data ends.  In this
     * case:
     *
     * - extent: C-MPI_BOTTOM
     * - true extent: C-B
     * - true_lb: [potentially very large] positive
     *
     * MPI_BOTTOM     B                                       C
     * --------------------------------------------------------
     * |              |           User buffer                 |
     * --------------------------------------------------------
     * <=======================extent=========================>
     *                <===============true extent=============>
     *
     * So in all cases, for a temporary buffer, all we need to malloc()
     * is a buffer of size true_extent.  We therefore need to know two
     * pointer values: what value to give to MPI_Send (and friends) and
     * what value to give to free(), because they might not be the same.
     *
     * Clearly, what we give to free() is exactly what was returned from
     * malloc().  That part is easy.  :-)
     *
     * What we give to MPI_Send (and friends) is a bit more complicated.
     * Let's take the 4 cases from above:
     *
     * 1. If A is what we give to MPI_Send and A is where the data
     * starts, then clearly we give to MPI_Send what we got back from
     * malloc().
     *
     * 2. If B is what we get back from malloc, but we give A to
     * MPI_Send, then the buffer range [A,B) represents "dead space"
     * -- no data will be put there.  So it's safe to give B-true_lb to
     * MPI_Send.  More specifically, the true_lb is positive, so B-true_lb is
     * actually A.
     *
     * 3. If A is what we get back from malloc, and B is what we give to
     * MPI_Send, then the true_lb is negative, so A-true_lb will actually equal
     * B.
     *
     * 4. Although this seems like the weirdest case, it's actually
     * quite similar to case #2 -- the pointer we give to MPI_Send
     * (MPI_BOTTOM, which is B-true_lb) is smaller than the pointer we
     * got back from malloc() (B).
     *
     * Hence, in all cases, we give (return_from_malloc - true_lb) to MPI_Send.
     *
     * This works fine and dandy if we only have (count==1), which we
     * rarely do.  ;-) So we really need to allocate (true_extent +
     * ((count - 1) * extent)) to get enough space for the rest.  This may
     * be more than is necessary, but it's ok.
     *
     * Simple, no?  :-)
     *
     */
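
The two rules in the comment above (allocate true_extent + ((count - 1) * extent) bytes, and hand (return_from_malloc - true_lb) to MPI_Send) can be sketched in plain C. This is a minimal illustration, not the code Open MPI uses: `layout_t`, `tmpbuf_t`, `tmpbuf_size` and `tmpbuf_alloc` are hypothetical names, and the three layout values are assumed to have been obtained beforehand from `MPI_Type_get_extent()` and `MPI_Type_get_true_extent()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the values a real program would obtain from
 * MPI_Type_get_extent() and MPI_Type_get_true_extent(). */
typedef struct {
    ptrdiff_t extent;       /* lower bound to upper bound */
    ptrdiff_t true_lb;      /* offset of the first data byte; may be negative */
    ptrdiff_t true_extent;  /* bytes actually spanned by the data */
} layout_t;

/* Hypothetical holder for the two pointers the comment says we must track. */
typedef struct {
    char  *free_ptr;  /* exactly what malloc() returned; give this to free() */
    char  *comm_ptr;  /* free_ptr - true_lb; give this to MPI_Send and friends */
    size_t size;      /* number of bytes allocated */
} tmpbuf_t;

/* Temporary buffer size: true_extent + ((count - 1) * extent). */
static size_t tmpbuf_size(const layout_t *l, int count)
{
    assert(count > 0 && l->extent >= 0 && l->true_extent >= 0);
    return (size_t)(l->true_extent + (ptrdiff_t)(count - 1) * l->extent);
}

/* Allocate the buffer and derive the pointer to pass to communication
 * calls.  Returns 0 on success, -1 if malloc() fails. */
static int tmpbuf_alloc(tmpbuf_t *buf, const layout_t *l, int count)
{
    buf->size = tmpbuf_size(l, count);
    buf->free_ptr = malloc(buf->size);
    if (NULL == buf->free_ptr) {
        return -1;
    }
    /* The subtraction is done on integers because the result may lie
     * outside the allocation (cases 2 and 4 in the comment). */
    buf->comm_ptr = (char *)((uintptr_t)buf->free_ptr - (uintptr_t)l->true_lb);
    return 0;
}
```

As an illustration only, taking the column widths of the case 2 diagram as byte counts gives extent 56, true_lb 15 and true_extent 41; with count 3 the sketch allocates 41 + 2 * 56 bytes and sets `comm_ptr` 15 bytes below `free_ptr`. With a negative true_lb (case 3), `comm_ptr` lands above `free_ptr` instead. In both cases only `free_ptr` is ever passed to `free()`.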