openmpi/ompi/mca/coll/base/README.memory_management

    /* This comment applies to all collectives (including the basic
     * module) where we allocate a temporary buffer.  For the next few
     * lines of code, it's tremendously complicated how we decided that
     * this was the Right Thing to do.  Sit back and enjoy.  And prepare
     * to have your mind warped. :-)
     *
     * Recall some definitions (I always get these backwards, so I'm
     * going to put them here):
     *
     * extent: the length from the lower bound to the upper bound -- may
     * be considerably larger than the buffer required to hold the data
     * (or smaller!  But it's easiest to think about when it's larger).
     *
     * true extent: the exact number of bytes required to hold the data
     * in the layout pattern in the datatype.
     *
     * For example, consider the following buffer (just talking about
     * true_lb, extent, and true extent -- extrapolate for true_ub:
     *
     * A              B                                       C
     * --------------------------------------------------------
     * |              |                                       |
     * --------------------------------------------------------
     *
     * There are multiple cases:
     *
     * 1. A is what we give to MPI_Send (and friends), and A is where
     * the data starts, and C is where the data ends.  In this case:
     *
     * - extent: C-A
     * - true extent: C-A
     * - true_lb: 0
     *
     * A                                                      C
     * --------------------------------------------------------
     * |                                                      |
     * --------------------------------------------------------
     * <=======================extent=========================>
     * <======================true extent=====================>
     *
     * 2. A is what we give to MPI_Send (and friends), B is where the
     * data starts, and C is where the data ends.  In this case:
     *
     * - extent: C-A
     * - true extent: C-B
     * - true_lb: positive
     *
     * A              B                                       C
     * --------------------------------------------------------
     * |              |           User buffer                 |
     * --------------------------------------------------------
     * <=======================extent=========================>
     * <===============true extent=============>
     *
     * 3. B is what we give to MPI_Send (and friends), A is where the
     * data starts, and C is where the data ends.  In this case:
     *
     * - extent: C-A
     * - true extent: C-A
     * - true_lb: negative
     *
     * A              B                                       C
     * --------------------------------------------------------
     * |              |           User buffer                 |
     * --------------------------------------------------------
     * <=======================extent=========================>
     * <======================true extent=====================>
     *
     * 4. MPI_BOTTOM is what we give to MPI_Send (and friends), B is
     * where the data starts, and C is where the data ends.  In this
     * case:
     *
     * - extent: C-MPI_BOTTOM
     * - true extent: C-B
     * - true_lb: [potentially very large] positive
     *
     * MPI_BOTTOM     B                                       C
     * --------------------------------------------------------
     * |              |           User buffer                 |
     * --------------------------------------------------------
     * <=======================extent=========================>
     * <===============true extent=============>
     *
     * So in all cases, for a temporary buffer, all we need to malloc()
     * is a buffer of size true_extent.  We therefore need to know two
     * pointer values: what value to give to MPI_Send (and friends) and
     * what value to give to free(), because they might not be the same.
     *
     * Clearly, what we give to free() is exactly what was returned from
     * malloc().  That part is easy.  :-)
     *
     * What we give to MPI_Send (and friends) is a bit more complicated.
     * Let's take the 4 cases from above:
     *
     * 1. If A is what we give to MPI_Send and A is where the data
     * starts, then clearly we give to MPI_Send what we got back from
     * malloc().
     *
     * 2. If B is what we get back from malloc, but we give A to
     * MPI_Send, then the buffer range [A,B) represents "dead space"
     * -- no data will be put there.  So it's safe to give B-true_lb to
     * MPI_Send.  More specifically, the true_lb is positive, so B-true_lb is
     * actually A.
     *
     * 3. If A is what we get back from malloc, and B is what we give to
     * MPI_Send, then the true_lb is negative, so A-true_lb will actually equal
     * B.
     *
     * 4. Although this seems like the weirdest case, it's actually
     * quite similar to case #2 -- the pointer we give to MPI_Send is
     * smaller than the pointer we got back from malloc().
     *
     * Hence, in all cases, we give (return_from_malloc - true_lb) to MPI_Send.
     *
     * This works fine and dandy if we only have (count==1), which we
     * rarely do.  ;-) So we really need to allocate (true_extent +
     * ((count - 1) * extent)) to get enough space for the rest.  This may
     * be more than is necessary, but it's ok.
     *
     * Simple, no?  :-)
     *
     */