/*
 * Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2015 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2009      University of Houston. All rights reserved.
 * Copyright (c) 2013      Los Alamos National Security, LLC. All Rights
 *                         reserved.
 * Copyright (c) 2014      Research Organization for Information Science
 *                         and Technology (RIST). All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

#include "ompi_config.h"

#include "mpi.h"
#include "opal/util/bit_ops.h"
#include "ompi/constants.h"
#include "ompi/datatype/ompi_datatype.h"
#include "ompi/communicator/communicator.h"
#include "ompi/mca/coll/coll.h"
#include "ompi/mca/coll/base/coll_tags.h"
#include "ompi/mca/coll/base/coll_base_functions.h"
#include "coll_base_topo.h"
#include "coll_base_util.h"

/*
 * ompi_coll_base_allgather_intra_bruck
 *
 * Function:     allgather using O(log(N)) steps.
 * Accepts:      Same arguments as MPI_Allgather
 * Returns:      MPI_SUCCESS or error code
 *
 * Description:  Variation of the all-to-all algorithm described by Bruck et al. in
 *               "Efficient Algorithms for All-to-all Communications
 *               in Multiport Message-Passing Systems"
 * Memory requirements:  non-zero ranks require shift buffer to perform final
 *               step in the algorithm.
 *
 * Example on 6 nodes:
 *   Initialization: everyone has its own buffer at location 0 in rbuf
 *                   This means if user specified MPI_IN_PLACE for sendbuf
 *                   we must copy our block from recvbuf to beginning!
 *    #     0      1      2      3      4      5
 *         [0]    [1]    [2]    [3]    [4]    [5]
 *   Step 0: send message to (rank - 2^0), receive message from (rank + 2^0)
 *    #     0      1      2      3      4      5
 *         [0]    [1]    [2]    [3]    [4]    [5]
 *         [1]    [2]    [3]    [4]    [5]    [0]
 *   Step 1: send message to (rank - 2^1), receive message from (rank + 2^1)
 *           message contains all blocks from location 0 to 2^1 * block size
 *    #     0      1      2      3      4      5
 *         [0]    [1]    [2]    [3]    [4]    [5]
 *         [1]    [2]    [3]    [4]    [5]    [0]
 *         [2]    [3]    [4]    [5]    [0]    [1]
 *         [3]    [4]    [5]    [0]    [1]    [2]
 *   Step 2: send message to (rank - 2^2), receive message from (rank + 2^2)
 *           message size is "all remaining blocks"
 *    #     0      1      2      3      4      5
 *         [0]    [1]    [2]    [3]    [4]    [5]
 *         [1]    [2]    [3]    [4]    [5]    [0]
 *         [2]    [3]    [4]    [5]    [0]    [1]
 *         [3]    [4]    [5]    [0]    [1]    [2]
 *         [4]    [5]    [0]    [1]    [2]    [3]
 *         [5]    [0]    [1]    [2]    [3]    [4]
 *   Finalization: Do a local shift to get data in correct place
 *    #     0      1      2      3      4      5
 *         [0]    [0]    [0]    [0]    [0]    [0]
 *         [1]    [1]    [1]    [1]    [1]    [1]
 *         [2]    [2]    [2]    [2]    [2]    [2]
 *         [3]    [3]    [3]    [3]    [3]    [3]
 *         [4]    [4]    [4]    [4]    [4]    [4]
 *         [5]    [5]    [5]    [5]    [5]    [5]
 */
int ompi_coll_base_allgather_intra_bruck(void *sbuf, int scount,
                                         struct ompi_datatype_t *sdtype,
                                         void* rbuf, int rcount,
                                         struct ompi_datatype_t *rdtype,
                                         struct ompi_communicator_t *comm,
                                         mca_coll_base_module_t *module)
{
    int line = -1, rank, size, sendto, recvfrom, distance, blockcount, err = 0;
    ptrdiff_t slb, rlb, sext, rext;
    char *tmpsend = NULL, *tmprecv = NULL;

    size = ompi_comm_size(comm);
    rank = ompi_comm_rank(comm);

    OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                 "coll:base:allgather_intra_bruck rank %d", rank));

    err = ompi_datatype_get_extent (sdtype, &slb, &sext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    err = ompi_datatype_get_extent (rdtype, &rlb, &rext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Initialization step:
       - if send buffer is not MPI_IN_PLACE, copy send buffer to block 0 of
         receive buffer, else
       - if rank r != 0, copy r^th block from receive buffer to block 0.
    */
    tmprecv = (char*) rbuf;
    if (MPI_IN_PLACE != sbuf) {
        tmpsend = (char*) sbuf;
        err = ompi_datatype_sndrcv(tmpsend, scount, sdtype, tmprecv, rcount, rdtype);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    } else if (0 != rank) {  /* non root with MPI_IN_PLACE */
        tmpsend = ((char*)rbuf) + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
        err = ompi_datatype_copy_content_same_ddt(rdtype, rcount, tmprecv, tmpsend);
        if (err < 0) { line = __LINE__; goto err_hndl; }
    }

    /* Communication step:
       At every step i, rank r:
       - doubles the distance
       - sends message which starts at beginning of rbuf and has size
         (blockcount * rcount) to rank (r - distance)
       - receives message of size blockcount * rcount from rank (r + distance)
         at location (rbuf + distance * rcount * rext)
       - blockcount doubles until last step when only the remaining data is
         exchanged.
    */
    blockcount = 1;
    tmpsend = (char*) rbuf;
    for (distance = 1; distance < size; distance<<=1) {

        recvfrom = (rank + distance) % size;
        sendto = (rank - distance + size) % size;

        tmprecv = tmpsend + (ptrdiff_t)distance * (ptrdiff_t)rcount * rext;

        if (distance <= (size >> 1)) {
            blockcount = distance;
        } else {
            blockcount = size - distance;
        }
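
        /* For example, with size = 6 the loop runs with distances 1, 2, 4
           and block counts 1, 2, 2: the last, incomplete step moves only
           the remaining (size - distance) = 2 blocks. */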

        /* Sendreceive */
        err = ompi_coll_base_sendrecv(tmpsend, blockcount * rcount, rdtype,
                                      sendto, MCA_COLL_BASE_TAG_ALLGATHER,
                                      tmprecv, blockcount * rcount, rdtype,
                                      recvfrom, MCA_COLL_BASE_TAG_ALLGATHER,
                                      comm, MPI_STATUS_IGNORE, rank);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    }

    /* Finalization step:
       On all nodes except 0, data needs to be shifted locally:
       - create temporary shift buffer,
         see discussion in coll_basic_reduce.c about the size and beginning
         of temporary buffer.
       - copy blocks [0 .. (size - rank - 1)] from rbuf to shift buffer
       - move blocks [(size - rank) .. size] from rbuf to beginning of rbuf
       - copy blocks from shift buffer starting at block [rank] in rbuf.
    */
    if (0 != rank) {
        ptrdiff_t true_extent, true_lb;
        char *free_buf = NULL, *shift_buf = NULL;

        err = ompi_datatype_get_true_extent(rdtype, &true_lb, &true_extent);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

        free_buf = (char*) calloc(((true_extent +
                                    ((ptrdiff_t)(size - rank) * (ptrdiff_t)rcount - 1) * rext)),
                                  sizeof(char));
        if (NULL == free_buf) {
            line = __LINE__; err = OMPI_ERR_OUT_OF_RESOURCE; goto err_hndl;
        }
        shift_buf = free_buf - true_lb;
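
        /* shift_buf is offset so that shift_buf + true_lb points at the
           start of the allocation: the copy routines below address data
           relative to the datatype's true lower bound. */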

        /* 1. copy blocks [0 .. (size - rank - 1)] from rbuf to shift buffer */
        err = ompi_datatype_copy_content_same_ddt(rdtype, ((ptrdiff_t)(size - rank) * (ptrdiff_t)rcount),
                                                  shift_buf, rbuf);
        if (err < 0) { line = __LINE__; goto err_hndl; }

        /* 2. move blocks [(size - rank) .. size] from rbuf to the beginning of rbuf */
        tmpsend = (char*) rbuf + (ptrdiff_t)(size - rank) * (ptrdiff_t)rcount * rext;
        err = ompi_datatype_copy_content_same_ddt(rdtype, (ptrdiff_t)rank * (ptrdiff_t)rcount,
                                                  rbuf, tmpsend);
        if (err < 0) { line = __LINE__; goto err_hndl; }

        /* 3. copy blocks from shift buffer back to rbuf starting at block [rank]. */
        tmprecv = (char*) rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
        err = ompi_datatype_copy_content_same_ddt(rdtype, (ptrdiff_t)(size - rank) * (ptrdiff_t)rcount,
                                                  tmprecv, shift_buf);
        if (err < 0) { line = __LINE__; goto err_hndl; }

        free(free_buf);
    }

    return OMPI_SUCCESS;

 err_hndl:
    OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "%s:%4d\tError occurred %d, rank %2d",
                 __FILE__, line, err, rank));
    return err;
}

/*
 * ompi_coll_base_allgather_intra_recursivedoubling
 *
 * Function:     allgather using O(log(N)) steps.
 * Accepts:      Same arguments as MPI_Allgather
 * Returns:      MPI_SUCCESS or error code
 *
 * Description:  Recursive doubling algorithm for MPI_Allgather implementation.
 *               This algorithm is used in MPICH-2 for small- and medium-sized
 *               messages on power-of-two processes.
 *
 * Limitation:   Current implementation only works on power-of-two number of
 *               processes.
 *               In case this algorithm is invoked on non-power-of-two
 *               processes, Bruck algorithm will be invoked.
 *
 * Memory requirements:
 *               No additional memory requirements beyond user-supplied buffers.
 *
 * Example on 4 nodes:
 *   Initialization: everyone has its own buffer at location rank in rbuf
 *    #     0      1      2      3
 *         [0]    [ ]    [ ]    [ ]
 *         [ ]    [1]    [ ]    [ ]
 *         [ ]    [ ]    [2]    [ ]
 *         [ ]    [ ]    [ ]    [3]
 *   Step 0: exchange data with (rank ^ 2^0)
 *    #     0      1      2      3
 *         [0]    [0]    [ ]    [ ]
 *         [1]    [1]    [ ]    [ ]
 *         [ ]    [ ]    [2]    [2]
 *         [ ]    [ ]    [3]    [3]
 *   Step 1: exchange data with (rank ^ 2^1) (if you can)
 *    #     0      1      2      3
 *         [0]    [0]    [0]    [0]
 *         [1]    [1]    [1]    [1]
 *         [2]    [2]    [2]    [2]
 *         [3]    [3]    [3]    [3]
 *
 *  TODO: Modify the algorithm to work with any number of nodes.
 *        We could use an implementation identical to MPICH-2's:
 *        - using the recursive-halving algorithm, at the end of each step
 *          determine if there are nodes who did not exchange their data in
 *          that step, and send them appropriate messages.
 */
int
ompi_coll_base_allgather_intra_recursivedoubling(void *sbuf, int scount,
                                                 struct ompi_datatype_t *sdtype,
                                                 void* rbuf, int rcount,
                                                 struct ompi_datatype_t *rdtype,
                                                 struct ompi_communicator_t *comm,
                                                 mca_coll_base_module_t *module)
{
    int line = -1, rank, size, pow2size, err;
    int remote, distance, sendblocklocation;
    ptrdiff_t slb, rlb, sext, rext;
    char *tmpsend = NULL, *tmprecv = NULL;

    size = ompi_comm_size(comm);
    rank = ompi_comm_rank(comm);

    pow2size = opal_next_poweroftwo (size);
    pow2size >>= 1;
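
    /* opal_next_poweroftwo(size) returns the smallest power of two greater
       than size, so after the shift pow2size is the largest power of two
       not exceeding size; pow2size == size exactly when size is a power
       of two. */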

    /* Current implementation only handles power-of-two number of processes.
       If the function was called on non-power-of-two number of processes,
       print warning and call bruck allgather algorithm with same parameters.
    */
    if (pow2size != size) {
        OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                     "coll:base:allgather_intra_recursivedoubling WARNING: non-pow-2 size %d, switching to bruck algorithm",
                     size));

        return ompi_coll_base_allgather_intra_bruck(sbuf, scount, sdtype,
                                                    rbuf, rcount, rdtype,
                                                    comm, module);
    }

    OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                 "coll:base:allgather_intra_recursivedoubling rank %d, size %d",
                 rank, size));

    err = ompi_datatype_get_extent (sdtype, &slb, &sext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    err = ompi_datatype_get_extent (rdtype, &rlb, &rext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Initialization step:
       - if send buffer is not MPI_IN_PLACE, copy send buffer to block rank of
         receive buffer
    */
    if (MPI_IN_PLACE != sbuf) {
        tmpsend = (char*) sbuf;
        tmprecv = (char*) rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
        err = ompi_datatype_sndrcv(tmpsend, scount, sdtype, tmprecv, rcount, rdtype);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    }

    /* Communication step:
       At every step i, rank r:
       - exchanges message with rank remote = (r ^ 2^i).
    */
    sendblocklocation = rank;
    for (distance = 0x1; distance < size; distance<<=1) {
        remote = rank ^ distance;

        if (rank < remote) {
            tmpsend = (char*)rbuf + (ptrdiff_t)sendblocklocation * (ptrdiff_t)rcount * rext;
            tmprecv = (char*)rbuf + (ptrdiff_t)(sendblocklocation + distance) * (ptrdiff_t)rcount * rext;
        } else {
            tmpsend = (char*)rbuf + (ptrdiff_t)sendblocklocation * (ptrdiff_t)rcount * rext;
            tmprecv = (char*)rbuf + (ptrdiff_t)(sendblocklocation - distance) * (ptrdiff_t)rcount * rext;
            sendblocklocation -= distance;
        }

        /* Sendreceive */
        err = ompi_coll_base_sendrecv(tmpsend, (ptrdiff_t)distance * (ptrdiff_t)rcount, rdtype,
                                      remote, MCA_COLL_BASE_TAG_ALLGATHER,
                                      tmprecv, (ptrdiff_t)distance * (ptrdiff_t)rcount, rdtype,
                                      remote, MCA_COLL_BASE_TAG_ALLGATHER,
                                      comm, MPI_STATUS_IGNORE, rank);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    }

    return OMPI_SUCCESS;

 err_hndl:
    OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "%s:%4d\tError occurred %d, rank %2d",
                 __FILE__, line, err, rank));
    return err;
}

/*
 * ompi_coll_base_allgather_intra_ring
 *
 * Function:     allgather using O(N) steps.
 * Accepts:      Same arguments as MPI_Allgather
 * Returns:      MPI_SUCCESS or error code
 *
 * Description:  Ring algorithm for allgather.
 *               At every step i, rank r receives message from rank (r - 1)
 *               containing data from rank (r - i - 1) and sends message to rank
 *               (r + 1) containing data from rank (r - i), with wrap arounds.
 * Memory requirements:
 *               No additional memory requirements.
 *
 */
int ompi_coll_base_allgather_intra_ring(void *sbuf, int scount,
                                        struct ompi_datatype_t *sdtype,
                                        void* rbuf, int rcount,
                                        struct ompi_datatype_t *rdtype,
                                        struct ompi_communicator_t *comm,
                                        mca_coll_base_module_t *module)
{
    int line = -1, rank, size, err, sendto, recvfrom, i, recvdatafrom, senddatafrom;
    ptrdiff_t slb, rlb, sext, rext;
    char *tmpsend = NULL, *tmprecv = NULL;

    size = ompi_comm_size(comm);
    rank = ompi_comm_rank(comm);

    OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                 "coll:base:allgather_intra_ring rank %d", rank));

    err = ompi_datatype_get_extent (sdtype, &slb, &sext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    err = ompi_datatype_get_extent (rdtype, &rlb, &rext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Initialization step:
       - if send buffer is not MPI_IN_PLACE, copy send buffer to appropriate block
         of receive buffer
    */
    tmprecv = (char*) rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
    if (MPI_IN_PLACE != sbuf) {
        tmpsend = (char*) sbuf;
        err = ompi_datatype_sndrcv(tmpsend, scount, sdtype, tmprecv, rcount, rdtype);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }
    }

    /* Communication step:
       At every step i: 0 .. (P-1), rank r:
       - receives message from [(r - 1 + size) % size] containing data from rank
         [(r - i - 1 + size) % size]
       - sends message to rank [(r + 1) % size] containing data from rank
         [(r - i + size) % size]
    */
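    /* For example, with size = 4 rank 1 receives blocks 0, 3, 2 (in that
       order) from rank 0, while sending blocks 1, 0, 3 to rank 2. */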
    sendto = (rank + 1) % size;
    recvfrom = (rank - 1 + size) % size;

    for (i = 0; i < size - 1; i++) {
        recvdatafrom = (rank - i - 1 + size) % size;
        senddatafrom = (rank - i + size) % size;

        tmprecv = (char*)rbuf + (ptrdiff_t)recvdatafrom * (ptrdiff_t)rcount * rext;
        tmpsend = (char*)rbuf + (ptrdiff_t)senddatafrom * (ptrdiff_t)rcount * rext;

        /* Sendreceive */
        err = ompi_coll_base_sendrecv(tmpsend, rcount, rdtype, sendto,
                                      MCA_COLL_BASE_TAG_ALLGATHER,
                                      tmprecv, rcount, rdtype, recvfrom,
                                      MCA_COLL_BASE_TAG_ALLGATHER,
                                      comm, MPI_STATUS_IGNORE, rank);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    }

    return OMPI_SUCCESS;

 err_hndl:
    OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "%s:%4d\tError occurred %d, rank %2d",
                 __FILE__, line, err, rank));
    return err;
}

/*
 * ompi_coll_base_allgather_intra_neighborexchange
 *
 * Function:     allgather using N/2 steps (O(N))
 * Accepts:      Same arguments as MPI_Allgather
 * Returns:      MPI_SUCCESS or error code
 *
 * Description:  Neighbor Exchange algorithm for allgather.
 *               Described by Chen et al. in
 *               "Performance Evaluation of Allgather Algorithms on
 *               Terascale Linux Cluster with Fast Ethernet",
 *               Proceedings of the Eighth International Conference on
 *               High-Performance Computing in Asia-Pacific Region
 *               (HPCASIA'05), 2005
 *
 *               Rank r exchanges message with one of its neighbors and
 *               forwards the data further in the next step.
 *
 *               No additional memory requirements.
 *
 * Limitations:  Algorithm works only on even number of processes.
 *               For odd number of processes we switch to ring algorithm.
 *
 * Example on 6 nodes:
 *   Initial state
 *    #     0      1      2      3      4      5
 *         [0]    [ ]    [ ]    [ ]    [ ]    [ ]
 *         [ ]    [1]    [ ]    [ ]    [ ]    [ ]
 *         [ ]    [ ]    [2]    [ ]    [ ]    [ ]
 *         [ ]    [ ]    [ ]    [3]    [ ]    [ ]
 *         [ ]    [ ]    [ ]    [ ]    [4]    [ ]
 *         [ ]    [ ]    [ ]    [ ]    [ ]    [5]
 *   Step 0:
 *    #     0      1      2      3      4      5
 *         [0]    [0]    [ ]    [ ]    [ ]    [ ]
 *         [1]    [1]    [ ]    [ ]    [ ]    [ ]
 *         [ ]    [ ]    [2]    [2]    [ ]    [ ]
 *         [ ]    [ ]    [3]    [3]    [ ]    [ ]
 *         [ ]    [ ]    [ ]    [ ]    [4]    [4]
 *         [ ]    [ ]    [ ]    [ ]    [5]    [5]
 *   Step 1:
 *    #     0      1      2      3      4      5
 *         [0]    [0]    [0]    [ ]    [ ]    [0]
 *         [1]    [1]    [1]    [ ]    [ ]    [1]
 *         [ ]    [2]    [2]    [2]    [2]    [ ]
 *         [ ]    [3]    [3]    [3]    [3]    [ ]
 *         [4]    [ ]    [ ]    [4]    [4]    [4]
 *         [5]    [ ]    [ ]    [5]    [5]    [5]
 *   Step 2:
 *    #     0      1      2      3      4      5
 *         [0]    [0]    [0]    [0]    [0]    [0]
 *         [1]    [1]    [1]    [1]    [1]    [1]
 *         [2]    [2]    [2]    [2]    [2]    [2]
 *         [3]    [3]    [3]    [3]    [3]    [3]
 *         [4]    [4]    [4]    [4]    [4]    [4]
 *         [5]    [5]    [5]    [5]    [5]    [5]
 */
int
ompi_coll_base_allgather_intra_neighborexchange(void *sbuf, int scount,
                                                struct ompi_datatype_t *sdtype,
                                                void* rbuf, int rcount,
                                                struct ompi_datatype_t *rdtype,
                                                struct ompi_communicator_t *comm,
                                                mca_coll_base_module_t *module)
{
    int line = -1, rank, size, i, even_rank, err;
    int neighbor[2], offset_at_step[2], recv_data_from[2], send_data_from;
    ptrdiff_t slb, rlb, sext, rext;
    char *tmpsend = NULL, *tmprecv = NULL;

    size = ompi_comm_size(comm);
    rank = ompi_comm_rank(comm);

    if (size % 2) {
        OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                     "coll:base:allgather_intra_neighborexchange WARNING: odd size %d, switching to ring algorithm",
                     size));
        return ompi_coll_base_allgather_intra_ring(sbuf, scount, sdtype,
                                                   rbuf, rcount, rdtype,
                                                   comm, module);
    }

    OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                 "coll:base:allgather_intra_neighborexchange rank %d", rank));

    err = ompi_datatype_get_extent (sdtype, &slb, &sext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    err = ompi_datatype_get_extent (rdtype, &rlb, &rext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Initialization step:
       - if send buffer is not MPI_IN_PLACE, copy send buffer to appropriate block
         of receive buffer
    */
    tmprecv = (char*) rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
    if (MPI_IN_PLACE != sbuf) {
        tmpsend = (char*) sbuf;
        err = ompi_datatype_sndrcv(tmpsend, scount, sdtype, tmprecv, rcount, rdtype);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }
    }

    /* Determine neighbors, order in which blocks will arrive, etc. */
    even_rank = !(rank % 2);
    if (even_rank) {
        neighbor[0] = (rank + 1) % size;
        neighbor[1] = (rank - 1 + size) % size;
        recv_data_from[0] = rank;
        recv_data_from[1] = rank;
        offset_at_step[0] = (+2);
        offset_at_step[1] = (-2);
    } else {
        neighbor[0] = (rank - 1 + size) % size;
        neighbor[1] = (rank + 1) % size;
        recv_data_from[0] = neighbor[0];
        recv_data_from[1] = neighbor[0];
        offset_at_step[0] = (-2);
        offset_at_step[1] = (+2);
    }
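
    /* For example, rank 2 of 6 gets neighbor[] = {3, 1}: the first exchange
       with rank 3 leaves it holding blocks [2,3]; step 1 trades blocks [2,3]
       with rank 1 for blocks [0,1]; step 2 trades blocks [0,1] with rank 3
       for blocks [4,5]. */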

    /* Communication loop:
       - First step is special: exchange a single block with neighbor[0].
       - Rest of the steps:
         update recv_data_from according to offset, and
         exchange two blocks with appropriate neighbor.
         The send location becomes previous receive location.
    */
    tmprecv = (char*)rbuf + (ptrdiff_t)neighbor[0] * (ptrdiff_t)rcount * rext;
    tmpsend = (char*)rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
    /* Sendreceive */
    err = ompi_coll_base_sendrecv(tmpsend, rcount, rdtype, neighbor[0],
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  tmprecv, rcount, rdtype, neighbor[0],
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  comm, MPI_STATUS_IGNORE, rank);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Determine initial sending location */
    if (even_rank) {
        send_data_from = rank;
    } else {
        send_data_from = recv_data_from[0];
    }

    for (i = 1; i < (size / 2); i++) {
        const int i_parity = i % 2;
        recv_data_from[i_parity] =
            (recv_data_from[i_parity] + offset_at_step[i_parity] + size) % size;

        tmprecv = (char*)rbuf + (ptrdiff_t)recv_data_from[i_parity] * (ptrdiff_t)rcount * rext;
        tmpsend = (char*)rbuf + (ptrdiff_t)send_data_from * rcount * rext;

        /* Sendreceive */
        err = ompi_coll_base_sendrecv(tmpsend, (ptrdiff_t)2 * (ptrdiff_t)rcount, rdtype,
                                      neighbor[i_parity],
                                      MCA_COLL_BASE_TAG_ALLGATHER,
                                      tmprecv, (ptrdiff_t)2 * (ptrdiff_t)rcount, rdtype,
                                      neighbor[i_parity],
                                      MCA_COLL_BASE_TAG_ALLGATHER,
                                      comm, MPI_STATUS_IGNORE, rank);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

        send_data_from = recv_data_from[i_parity];
    }

    return OMPI_SUCCESS;

 err_hndl:
    OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "%s:%4d\tError occurred %d, rank %2d",
                 __FILE__, line, err, rank));
    return err;
}
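

/*
 * ompi_coll_base_allgather_intra_two_procs
 *
 * Function:     allgather for two processes: the two ranks exchange their
 *               blocks directly and each copies its own block into place.
 * Accepts:      Same arguments as MPI_Allgather
 * Returns:      MPI_SUCCESS or error code
 */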
int ompi_coll_base_allgather_intra_two_procs(void *sbuf, int scount,
                                             struct ompi_datatype_t *sdtype,
                                             void* rbuf, int rcount,
                                             struct ompi_datatype_t *rdtype,
                                             struct ompi_communicator_t *comm,
                                             mca_coll_base_module_t *module)
{
    int line = -1, err, rank, remote;
    char *tmpsend = NULL, *tmprecv = NULL;
    ptrdiff_t sext, rext, lb;

    rank = ompi_comm_rank(comm);

    OPAL_OUTPUT((ompi_coll_base_framework.framework_output,
                 "ompi_coll_base_allgather_intra_two_procs rank %d", rank));

    err = ompi_datatype_get_extent (sdtype, &lb, &sext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    err = ompi_datatype_get_extent (rdtype, &lb, &rext);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Exchange data:
       - compute source and destinations
       - send receive data
    */
    remote = rank ^ 0x1;
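    /* with exactly two processes, rank ^ 0x1 yields the peer rank: 0 <-> 1 */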

    tmpsend = (char*)sbuf;
    if (MPI_IN_PLACE == sbuf) {
        tmpsend = (char*)rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext;
        scount = rcount;
        sdtype = rdtype;
    }
    tmprecv = (char*)rbuf + (ptrdiff_t)remote * (ptrdiff_t)rcount * rext;

    err = ompi_coll_base_sendrecv(tmpsend, scount, sdtype, remote,
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  tmprecv, rcount, rdtype, remote,
                                  MCA_COLL_BASE_TAG_ALLGATHER,
                                  comm, MPI_STATUS_IGNORE, rank);
    if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }

    /* Place your data in correct location if necessary */
    if (MPI_IN_PLACE != sbuf) {
        err = ompi_datatype_sndrcv((char*)sbuf, scount, sdtype,
                                   (char*)rbuf + (ptrdiff_t)rank * (ptrdiff_t)rcount * rext, rcount, rdtype);
        if (MPI_SUCCESS != err) { line = __LINE__; goto err_hndl; }
    }

    return MPI_SUCCESS;

 err_hndl:
    OPAL_OUTPUT((ompi_coll_base_framework.framework_output, "%s:%4d\tError occurred %d, rank %2d",
                 __FILE__, line, err, rank));
    return err;
}


/*
 * Linear functions are copied from the BASIC coll module
 * they do not segment the message and are simple implementations
 * but for some small number of nodes and/or small data sizes they
 * are just as fast as base/tree based segmenting operations
 * and as such may be selected by the decision functions
 * These are copied into this module due to the way we select modules
 * in V1. i.e. in V2 we will handle this differently and so will not
 * have to duplicate code.
 * JPG following the examples from other coll_base implementations. Dec06.
 */

/* copied function (with appropriate renaming) starts here */

/*
 * allgather_intra_basic_linear
 *
 * Function:     - allgather using other MPI collectives
 * Accepts:      - same as MPI_Allgather()
 * Returns:      - MPI_SUCCESS or error code
 */
int
ompi_coll_base_allgather_intra_basic_linear(void *sbuf, int scount,
                                            struct ompi_datatype_t *sdtype,
                                            void *rbuf,
                                            int rcount,
                                            struct ompi_datatype_t *rdtype,
                                            struct ompi_communicator_t *comm,
                                            mca_coll_base_module_t *module)
{
    int err;
    ptrdiff_t lb, extent;

    /* Handle MPI_IN_PLACE (see explanation in reduce.c for how to
       allocate temp buffer) -- note that rank 0 can use IN_PLACE
       natively, and we can just alias the right position in rbuf
       as sbuf and avoid using a temporary buffer if gather is
       implemented correctly */
    if (MPI_IN_PLACE == sbuf && 0 != ompi_comm_rank(comm)) {
        ompi_datatype_get_extent(rdtype, &lb, &extent);
        sbuf = ((char*) rbuf) + (ompi_comm_rank(comm) * extent * rcount);
        sdtype = rdtype;
        scount = rcount;
    }

    /* Gather and broadcast. */
    err = comm->c_coll.coll_gather(sbuf, scount, sdtype,
                                   rbuf, rcount, rdtype,
                                   0, comm, comm->c_coll.coll_gather_module);
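
    /* The broadcast completing the allgather must move rcount * size
       elements; that count can overflow a signed int for large rcount,
       hence the two paths below (direct bcast vs. bcast of a contiguous
       derived datatype). */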
    if (MPI_SUCCESS == err) {
        size_t length = (ptrdiff_t)rcount * ompi_comm_size(comm);
        if( length < (size_t)INT_MAX ) {
            err = comm->c_coll.coll_bcast(rbuf, (ptrdiff_t)rcount * ompi_comm_size(comm), rdtype,
                                          0, comm, comm->c_coll.coll_bcast_module);
        } else {
            ompi_datatype_t* temptype;
            ompi_datatype_create_contiguous(ompi_comm_size(comm), rdtype, &temptype);
            ompi_datatype_commit(&temptype);
            err = comm->c_coll.coll_bcast(rbuf, rcount, temptype,
                                          0, comm, comm->c_coll.coll_bcast_module);
            ompi_datatype_destroy(&temptype);
        }
    }

    /* All done */

    return err;
}

/* copied function (with appropriate renaming) ends here */