2004-08-21 04:49:07 +04:00
|
|
|
/*
|
2005-11-05 22:57:48 +03:00
|
|
|
* Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
|
|
|
|
* University Research and Technology
|
|
|
|
* Corporation. All rights reserved.
|
2006-08-24 20:38:08 +04:00
|
|
|
* Copyright (c) 2004-2006 The University of Tennessee and The University
|
2005-11-05 22:57:48 +03:00
|
|
|
* of Tennessee Research Foundation. All rights
|
|
|
|
* reserved.
|
2004-11-28 23:09:25 +03:00
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
|
|
* University of Stuttgart. All rights reserved.
|
2005-03-24 15:43:37 +03:00
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
* Copyright (c) 2009 Cisco Systems, Inc. All rights reserved.
|
2004-11-22 04:38:40 +03:00
|
|
|
* $COPYRIGHT$
|
|
|
|
*
|
|
|
|
* Additional copyrights may follow
|
|
|
|
*
|
2004-08-21 04:49:07 +04:00
|
|
|
* $HEADER$
|
|
|
|
*/
|
|
|
|
|
|
|
|
#include "ompi_config.h"
|
|
|
|
|
2009-05-22 02:36:06 +04:00
|
|
|
#ifdef HAVE_STRING_H
|
|
|
|
#include <string.h>
|
|
|
|
#endif
|
|
|
|
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run, the netpipe latency does not
change with
this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, commited)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests failed!! as trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note, that for the heterogeneous case, (-m32 compiled
binaries/ompi), neither
ompi-trunk, nor ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 08:56:31 +04:00
|
|
|
#include "opal/datatype/opal_convertor.h"
|
|
|
|
#include "opal/sys/atomic.h"
|
2006-02-12 04:33:29 +03:00
|
|
|
#include "ompi/constants.h"
|
2005-09-15 06:18:16 +04:00
|
|
|
#include "ompi/communicator/communicator.h"
|
|
|
|
#include "ompi/mca/coll/coll.h"
|
|
|
|
#include "ompi/op/op.h"
|
2004-08-21 04:49:07 +04:00
|
|
|
#include "coll_sm.h"
|
|
|
|
|
|
|
|
|
2005-09-15 06:18:16 +04:00
|
|
|
/*
|
|
|
|
* Local functions
|
|
|
|
*/
|
|
|
|
static int reduce_inorder(void *sbuf, void* rbuf, int count,
|
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
2007-08-19 07:37:49 +04:00
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-29 02:40:57 +04:00
|
|
|
mca_coll_base_module_t *module);
|
2005-09-15 06:18:16 +04:00
|
|
|
#define WANT_REDUCE_NO_ORDER 0
|
|
|
|
#if WANT_REDUCE_NO_ORDER
|
|
|
|
static int reduce_no_order(void *sbuf, void* rbuf, int count,
|
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
2007-08-19 07:37:49 +04:00
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-29 02:40:57 +04:00
|
|
|
mca_coll_base_module_t *module);
|
2005-09-15 06:18:16 +04:00
|
|
|
#endif
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Useful utility routine
|
|
|
|
*/
|
|
|
|
static inline int min(int a, int b)
|
|
|
|
{
|
|
|
|
return (a < b) ? a : b;
|
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Shared memory reduction.
|
2004-08-21 04:49:07 +04:00
|
|
|
*
|
2005-09-15 06:18:16 +04:00
|
|
|
* Simply farms out to the associative or non-associative functions.
|
2004-08-21 04:49:07 +04:00
|
|
|
*/
|
2005-01-30 04:42:57 +03:00
|
|
|
int mca_coll_sm_reduce_intra(void *sbuf, void* rbuf, int count,
|
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
2007-08-19 07:37:49 +04:00
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-29 02:40:57 +04:00
|
|
|
mca_coll_base_module_t *module)
|
2004-08-21 04:49:07 +04:00
|
|
|
{
|
2006-10-18 00:20:58 +04:00
|
|
|
size_t size;
|
2007-08-19 07:37:49 +04:00
|
|
|
mca_coll_sm_module_t *sm_module = (mca_coll_sm_module_t*) module;
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
/* There are several possibilities:
|
|
|
|
*
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
* 0. If the datatype is larger than a segment, fall back to
|
|
|
|
* underlying module
|
2005-09-15 06:18:16 +04:00
|
|
|
* 1. If the op is user-defined, use the strict order
|
|
|
|
* 2. If the op is intrinsic:
|
|
|
|
* a. If the op is float-associative, use the unordered
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
* b. If the op is not float-associative:
|
2005-09-15 06:18:16 +04:00
|
|
|
* i. if the data is floating point, use the strict order
|
|
|
|
* ii. if the data is not floating point, use the unordered
|
|
|
|
*/
|
|
|
|
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run, the netpipe latency does not
change with
this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, commited)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests failed!! as trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note, that for the heterogeneous case, (-m32 compiled
binaries/ompi), neither
ompi-trunk, nor ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 08:56:31 +04:00
|
|
|
ompi_datatype_type_size(dtype, &size);
|
2006-10-18 00:20:58 +04:00
|
|
|
if ((int)size > mca_coll_sm_component.sm_control_size) {
|
2007-08-19 07:37:49 +04:00
|
|
|
return sm_module->previous_reduce(sbuf, rbuf, count,
|
|
|
|
dtype, op, root, comm,
|
|
|
|
sm_module->previous_reduce_module);
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
#if WANT_REDUCE_NO_ORDER
|
2009-10-02 21:13:56 +04:00
|
|
|
else {
|
|
|
|
/* Lazily enable the module the first time we invoke a
|
|
|
|
collective on it */
|
|
|
|
if (!sm_module->enabled) {
|
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret = ompi_coll_sm_lazy_enable(module, comm))) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!ompi_op_is_intrinsic(op) ||
|
|
|
|
(ompi_op_is_intrinsic(op) && !ompi_op_is_float_assoc(op) &&
|
|
|
|
0 != (dtype->flags & OMPI_DATATYPE_FLAG_DATA_FLOAT))) {
|
|
|
|
return reduce_inorder(sbuf, rbuf, count, dtype, op,
|
|
|
|
root, comm, module);
|
|
|
|
} else {
|
|
|
|
return reduce_no_order(sbuf, rbuf, count, dtype, op,
|
|
|
|
root, comm, module);
|
|
|
|
}
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
#else
|
|
|
|
else {
|
2009-10-02 21:13:56 +04:00
|
|
|
/* Lazily enable the module the first time we invoke a
|
|
|
|
collective on it */
|
|
|
|
if (!sm_module->enabled) {
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret = ompi_coll_sm_lazy_enable(module, comm))) {
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-08-19 07:37:49 +04:00
|
|
|
return reduce_inorder(sbuf, rbuf, count, dtype, op, root, comm, module);
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* In-order shared memory reduction.
|
|
|
|
*
|
|
|
|
* This function performs the reduction in order -- combining elements
|
|
|
|
* starting with (0 operation 1), then (result operation 2), then
|
|
|
|
* (result operation 3), etc.
|
|
|
|
*
|
|
|
|
* Root's algorithm:
|
|
|
|
*
|
|
|
|
* If our datatype is "friendly" (i.e., the representation of the
|
|
|
|
* buffer is the same packed as it is unpacked), then the root doesn't
|
|
|
|
* need a temporary buffer -- we can combine the operands directly
|
|
|
|
* from the shared memory segments to the root's rbuf. Otherwise, we
|
|
|
|
* need a receive convertor and receive each fragment into a temporary
|
|
|
|
* buffer where we can combine that operan with the root's rbuf.
|
|
|
|
*
|
|
|
|
* In general, there are two loops:
|
|
|
|
*
|
|
|
|
* 1. loop over all fragments (which must be done in units of an
|
|
|
|
* integer number of datatypes -- remember that if this function is
|
|
|
|
* called, we know that the datattype is smaller than the max size of
|
|
|
|
* a fragment, so this is definitely possible)
|
|
|
|
*
|
|
|
|
* 2. loop over all the processes -- 0 to (comm_size-1).
|
|
|
|
* For process 0:
|
|
|
|
* - if the root==0, copy the *entire* buffer (i.e., don't copy
|
|
|
|
* fragment by fragment -- might as well copy the entire thing) the
|
|
|
|
* first time through the algorithm, and no-op every other time
|
|
|
|
* - else, copy from the shmem fragment to the out buffer
|
|
|
|
* For all other proceses:
|
|
|
|
* - if root==i, combine the relevant fragment from the sbuf to the
|
|
|
|
* relevant fragment on the rbuf
|
|
|
|
* - else, if the datatype is friendly, combine relevant fragment from
|
|
|
|
* the shmem segment to the relevant fragment in the rbuf. Otherwise,
|
|
|
|
* use the convertor to copy the fragment out of shmem into a temp
|
|
|
|
* buffer and do the combination from there to the rbuf.
|
|
|
|
*
|
|
|
|
* If we don't have a friendly datatype, then free the temporary
|
|
|
|
* buffer at the end.
|
|
|
|
*/
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
|
2009-10-02 21:13:56 +04:00
|
|
|
|
2005-09-15 06:18:16 +04:00
|
|
|
static int reduce_inorder(void *sbuf, void* rbuf, int count,
|
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
2007-08-19 07:37:49 +04:00
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-29 02:40:57 +04:00
|
|
|
mca_coll_base_module_t *module)
|
2005-09-15 06:18:16 +04:00
|
|
|
{
|
|
|
|
struct iovec iov;
|
2007-08-19 07:37:49 +04:00
|
|
|
mca_coll_sm_module_t *sm_module = (mca_coll_sm_module_t*) module;
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
mca_coll_sm_comm_t *data = sm_module->sm_comm_data;
|
2005-09-15 06:18:16 +04:00
|
|
|
int ret, rank, size;
|
|
|
|
int flag_num, segment_num, max_segment_num;
|
|
|
|
size_t total_size, max_data, bytes;
|
|
|
|
mca_coll_sm_in_use_flag_t *flag;
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
mca_coll_sm_data_index_t *index;
|
2006-10-18 00:20:58 +04:00
|
|
|
size_t ddt_size;
|
2005-09-15 06:18:16 +04:00
|
|
|
size_t segment_ddt_count, segment_ddt_bytes, zero = 0;
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
ptrdiff_t true_lb, true_extent, lb, extent;
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
/* Setup some identities */
|
|
|
|
|
|
|
|
rank = ompi_comm_rank(comm);
|
|
|
|
size = ompi_comm_size(comm);
|
|
|
|
|
|
|
|
/* Figure out how much we should have the convertor copy. We need
|
|
|
|
to have it be in units of a datatype -- i.e., we only want to
|
|
|
|
copy a whole datatype worth of data or none at all (we've
|
|
|
|
already guaranteed above that the datatype is not larger than a
|
|
|
|
segment, so we'll at least get 1). */
|
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
/* ddt_size is the packed size (e.g., MPI_SHORT_INT is 6) */
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run, the netpipe latency does not
change with
this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, commited)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests failed!! as trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note, that for the heterogeneous case, (-m32 compiled
binaries/ompi), neither
ompi-trunk, nor ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 08:56:31 +04:00
|
|
|
ompi_datatype_type_size(dtype, &ddt_size);
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
/* extent is from lb to ub (e.g., MPI_SHORT_INT is 8) */
|
|
|
|
ompi_datatype_get_extent(dtype, &lb, &extent);
|
|
|
|
/* true_extent is extent of actual type map, ignoring lb and ub
|
|
|
|
(e.g., MPI_SHORT_INT is 8) */
|
|
|
|
ompi_datatype_get_true_extent(dtype, &true_lb, &true_extent);
|
2005-09-15 06:18:16 +04:00
|
|
|
segment_ddt_count = mca_coll_sm_component.sm_fragment_size / ddt_size;
|
|
|
|
iov.iov_len = segment_ddt_bytes = segment_ddt_count * ddt_size;
|
|
|
|
total_size = ddt_size * count;
|
|
|
|
|
|
|
|
bytes = 0;
|
|
|
|
|
|
|
|
/* Only have one top-level decision as to whether I'm the root or
|
|
|
|
not. Do this at the slight expense of repeating a little logic
|
|
|
|
-- but it's better than a conditional branch in every loop
|
|
|
|
iteration. */
|
|
|
|
|
|
|
|
/*********************************************************************
|
|
|
|
* Root
|
|
|
|
*********************************************************************/
|
|
|
|
|
|
|
|
if (root == rank) {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
opal_convertor_t rtb_convertor, rbuf_convertor;
|
2005-09-15 06:18:16 +04:00
|
|
|
char *reduce_temp_buffer, *free_buffer, *reduce_target;
|
|
|
|
char *inplace_temp;
|
|
|
|
int peer;
|
2006-08-24 20:38:08 +04:00
|
|
|
size_t count_left = (size_t)count;
|
2005-09-15 06:18:16 +04:00
|
|
|
int frag_num = 0;
|
|
|
|
bool first_operation = true;
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
bool sbuf_copied_to_rbuf = false;
|
2009-10-02 21:13:56 +04:00
|
|
|
|
2005-09-15 06:18:16 +04:00
|
|
|
/* If the datatype is the same packed as it is unpacked, we
|
|
|
|
can save a memory copy and just do the reduction operation
|
|
|
|
directly from the shared memory segment. However, if the
|
|
|
|
representation is not the same, then we need to get a
|
|
|
|
receive convertor and a temporary buffer to receive
|
|
|
|
into. */
|
|
|
|
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run, the netpipe latency does not
change with
this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, commited)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests failed!! as trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note, that for the heterogeneous case, (-m32 compiled
binaries/ompi), neither
ompi-trunk, nor ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 08:56:31 +04:00
|
|
|
if (ompi_datatype_is_contiguous_memory_layout(dtype, count)) {
|
2005-10-04 01:34:58 +04:00
|
|
|
reduce_temp_buffer = free_buffer = NULL;
|
2005-09-15 06:18:16 +04:00
|
|
|
} else {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
/* When we have a non-contiguous datatype, we need one or
|
|
|
|
* two convertors:
|
|
|
|
*
|
|
|
|
* rtb_convertor: unpacking from the shmem to the
|
|
|
|
* reduce_temp_buffer (where we can then apply the
|
|
|
|
* reduction).
|
|
|
|
*
|
|
|
|
* rbuf_convertor: unpacking from the shmem directly to the
|
|
|
|
* rbuf (no need to go to the reduce_temp_buffer first and
|
|
|
|
* then apply the reduction -- just copy straight to the
|
|
|
|
* target buffer).
|
|
|
|
*/
|
|
|
|
OBJ_CONSTRUCT(&rtb_convertor, opal_convertor_t);
|
|
|
|
OBJ_CONSTRUCT(&rbuf_convertor, opal_convertor_t);
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
/* See lengthy comment in coll basic reduce about
|
|
|
|
explanation for how to malloc the extra buffer. Note
|
|
|
|
that we do not need a buffer big enough to hold "count"
|
|
|
|
instances of the datatype (i.e., big enough to hold the
|
|
|
|
entire user buffer) -- we only need to be able to hold
|
|
|
|
"segment_ddt_count" instances (i.e., the number of
|
|
|
|
instances that can be held in a single fragment) */
|
|
|
|
|
2006-08-24 20:38:08 +04:00
|
|
|
free_buffer = (char*)malloc(true_extent +
|
|
|
|
(segment_ddt_count - 1) * extent);
|
2005-09-15 06:18:16 +04:00
|
|
|
if (NULL == free_buffer) {
|
|
|
|
return OMPI_ERR_OUT_OF_RESOURCE;
|
|
|
|
}
|
|
|
|
reduce_temp_buffer = free_buffer - lb;
|
|
|
|
|
|
|
|
/* Trickery here: we use a potentially smaller count than
|
|
|
|
the user count -- use the largest count that is <=
|
|
|
|
user's count that will fit within a single segment. */
|
|
|
|
|
|
|
|
if (OMPI_SUCCESS !=
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
(ret = opal_convertor_copy_and_prepare_for_recv(
|
|
|
|
ompi_mpi_local_convertor,
|
|
|
|
&(dtype->super),
|
|
|
|
segment_ddt_count,
|
|
|
|
reduce_temp_buffer,
|
|
|
|
0,
|
|
|
|
&rtb_convertor))) {
|
2005-09-15 06:18:16 +04:00
|
|
|
free(free_buffer);
|
|
|
|
return ret;
|
|
|
|
}
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
|
|
|
|
/* See if we need the rbuf_convertor */
|
|
|
|
if (size - 1 != rank) {
|
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret = opal_convertor_copy_and_prepare_for_recv(
|
|
|
|
ompi_mpi_local_convertor,
|
|
|
|
&(dtype->super),
|
2009-10-02 21:13:56 +04:00
|
|
|
count,
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
rbuf,
|
|
|
|
0,
|
|
|
|
&rbuf_convertor))) {
|
|
|
|
free(free_buffer);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
}
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* If we're a) doing MPI_IN_PLACE (which means we're the root
|
|
|
|
-- wouldn't have gotten down here with MPI_IN_PLACE if we
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
weren't the root), and b) we're not rank (size-1), then we
|
|
|
|
need to copy the rbuf into a temporary buffer and use that
|
|
|
|
as the sbuf */
|
2005-09-15 06:18:16 +04:00
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
if (MPI_IN_PLACE == sbuf && (size - 1) != rank) {
|
2006-08-24 20:38:08 +04:00
|
|
|
inplace_temp = (char*)malloc(true_extent + (count - 1) * extent);
|
2005-09-15 06:18:16 +04:00
|
|
|
if (NULL == inplace_temp) {
|
|
|
|
if (NULL != free_buffer) {
|
|
|
|
free(free_buffer);
|
|
|
|
}
|
|
|
|
return OMPI_ERR_OUT_OF_RESOURCE;
|
|
|
|
}
|
|
|
|
sbuf = inplace_temp - lb;
|
2009-09-30 18:02:47 +04:00
|
|
|
ompi_datatype_copy_content_same_ddt(dtype, count, (char *) sbuf, (char *) rbuf);
|
2005-09-15 06:18:16 +04:00
|
|
|
} else {
|
|
|
|
inplace_temp = NULL;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Main loop over receiving / reducing fragments */
|
|
|
|
|
|
|
|
do {
|
2005-10-04 01:34:58 +04:00
|
|
|
flag_num = (data->mcb_operation_count %
|
2005-09-15 06:18:16 +04:00
|
|
|
mca_coll_sm_component.sm_comm_num_in_use_flags);
|
|
|
|
FLAG_SETUP(flag_num, flag, data);
|
2009-09-22 02:20:44 +04:00
|
|
|
FLAG_WAIT_FOR_IDLE(flag, reduce_root_flag_label);
|
2005-10-04 01:34:58 +04:00
|
|
|
FLAG_RETAIN(flag, size, data->mcb_operation_count);
|
|
|
|
++data->mcb_operation_count;
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
/* Loop over all the segments in this set */
|
|
|
|
|
2005-10-01 03:12:23 +04:00
|
|
|
segment_num =
|
|
|
|
flag_num * mca_coll_sm_component.sm_segs_per_inuse_flag;
|
|
|
|
max_segment_num =
|
|
|
|
(flag_num + 1) * mca_coll_sm_component.sm_segs_per_inuse_flag;
|
2009-10-02 21:13:56 +04:00
|
|
|
reduce_target = (((char*) rbuf) + (frag_num * extent * segment_ddt_count));
|
2005-09-15 06:18:16 +04:00
|
|
|
do {
|
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
/* Note that all the other coll modules reduce from
|
|
|
|
process (size-1) to 0, so that's the order we'll do
|
|
|
|
it here. */
|
|
|
|
/* Process (size-1) is the root (special case) */
|
|
|
|
if (size - 1 == rank) {
|
2005-10-04 01:34:58 +04:00
|
|
|
/* If we're the root *and* the first process to be
|
|
|
|
combined *and* this is the first segment in the
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
entire algorithm, then just copy the whole sbuf
|
|
|
|
to rbuf. That way, we never need to copy from
|
|
|
|
my sbuf again (i.e., do the copy all at once
|
|
|
|
since all the data is local, and then don't
|
|
|
|
worry about it for the rest of the
|
2005-10-04 01:34:58 +04:00
|
|
|
algorithm) */
|
|
|
|
if (first_operation) {
|
|
|
|
first_operation = false;
|
|
|
|
if (MPI_IN_PLACE != sbuf) {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
ompi_datatype_copy_content_same_ddt(dtype, count,
|
|
|
|
reduce_target, (char*)sbuf);
|
2005-10-04 01:34:58 +04:00
|
|
|
}
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
sbuf_copied_to_rbuf = true;
|
2005-10-04 01:34:58 +04:00
|
|
|
}
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Process (size-1) is not the root */
|
|
|
|
else {
|
|
|
|
/* Wait for the data to be copied into shmem, just
|
|
|
|
like any other non-root process */
|
|
|
|
index = &(data->mcb_data_index[segment_num]);
|
2009-09-22 02:20:44 +04:00
|
|
|
PARENT_WAIT_FOR_NOTIFY_SPECIFIC(size - 1, rank, index, max_data, reduce_root_parent_label1);
|
2005-10-04 01:34:58 +04:00
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
/* If the datatype is contiguous, just copy it
|
|
|
|
straight to the reduce_target */
|
2005-10-04 01:34:58 +04:00
|
|
|
if (NULL == free_buffer) {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
memcpy(reduce_target, ((char*)index->mcbmi_data) +
|
|
|
|
(size - 1) * mca_coll_sm_component.sm_fragment_size, max_data);
|
|
|
|
}
|
|
|
|
/* If the datatype is noncontiguous, use the
|
|
|
|
rbuf_convertor to unpack it straight to the
|
|
|
|
rbuf */
|
|
|
|
else {
|
|
|
|
max_data = segment_ddt_bytes;
|
|
|
|
COPY_FRAGMENT_OUT(rbuf_convertor, size - 1, index,
|
2005-10-04 01:34:58 +04:00
|
|
|
iov, max_data);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Loop over all the remaining processes, receiving
|
|
|
|
and reducing them in order */
|
2005-09-15 06:18:16 +04:00
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
for (peer = size - 2; peer >= 0; --peer) {
|
2005-09-15 06:18:16 +04:00
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
/* Handle the case where the source is this
|
|
|
|
process (which, by definition, excludes the
|
|
|
|
sbuf_copied_to_rbuf case because that can
|
|
|
|
*only* happen when root==0). In this case, we
|
|
|
|
don't need to wait for the peer (i.e., me) to
|
|
|
|
copy into shmem -- just reduce directly from my
|
|
|
|
sbuf. */
|
2005-09-15 06:18:16 +04:00
|
|
|
if (rank == peer) {
|
2005-10-04 01:34:58 +04:00
|
|
|
ompi_op_reduce(op,
|
|
|
|
((char *) sbuf) +
|
2009-10-02 21:13:56 +04:00
|
|
|
frag_num * extent * segment_ddt_count,
|
2005-10-04 01:34:58 +04:00
|
|
|
reduce_target,
|
|
|
|
min(count_left, segment_ddt_count),
|
|
|
|
dtype);
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Now handle the case where the source is not
|
|
|
|
this process. Wait for the process to copy to
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
the segment into shmem. */
|
2005-09-15 06:18:16 +04:00
|
|
|
else {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
index = &(data->mcb_data_index[segment_num]);
|
2005-09-15 06:18:16 +04:00
|
|
|
PARENT_WAIT_FOR_NOTIFY_SPECIFIC(peer, rank,
|
2009-09-22 02:20:44 +04:00
|
|
|
index, max_data, reduce_root_parent_label2);
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
/* If we don't need an extra buffer, then do the
|
|
|
|
reduction operation on the fragment straight
|
|
|
|
from the shmem. */
|
|
|
|
|
|
|
|
if (NULL == free_buffer) {
|
2005-10-04 01:34:58 +04:00
|
|
|
ompi_op_reduce(op,
|
|
|
|
(index->mcbmi_data +
|
|
|
|
(peer * mca_coll_sm_component.sm_fragment_size)),
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
reduce_target,
|
|
|
|
min(count_left, segment_ddt_count),
|
2005-10-04 01:34:58 +04:00
|
|
|
dtype);
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Otherwise, unpack the fragment to the temporary
|
|
|
|
buffer and then do the reduction from there */
|
|
|
|
|
|
|
|
else {
|
2005-10-04 01:34:58 +04:00
|
|
|
/* Unpack the fragment into my temporary
|
|
|
|
buffer */
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
max_data = segment_ddt_bytes;
|
|
|
|
COPY_FRAGMENT_OUT(rtb_convertor, peer, index,
|
2005-10-04 01:34:58 +04:00
|
|
|
iov, max_data);
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
opal_convertor_set_position(&rtb_convertor, &zero);
|
2005-10-04 01:34:58 +04:00
|
|
|
|
|
|
|
/* Do the reduction on this fragment */
|
|
|
|
ompi_op_reduce(op, reduce_temp_buffer,
|
|
|
|
reduce_target,
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
min(count_left, segment_ddt_count),
|
2005-10-04 01:34:58 +04:00
|
|
|
dtype);
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
} /* whether this process was me or not */
|
|
|
|
} /* loop over all proceses */
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
|
2005-09-15 06:18:16 +04:00
|
|
|
/* We've iterated through all the processes -- now we
|
|
|
|
move on to the next segment */
|
|
|
|
|
|
|
|
count_left -= segment_ddt_count;
|
|
|
|
bytes += segment_ddt_bytes;
|
|
|
|
++segment_num;
|
|
|
|
++frag_num;
|
2009-10-02 21:13:56 +04:00
|
|
|
reduce_target += extent * segment_ddt_count;
|
2005-09-15 06:18:16 +04:00
|
|
|
} while (bytes < total_size && segment_num < max_segment_num);
|
|
|
|
|
|
|
|
/* Root is now done with this set of segments */
|
|
|
|
FLAG_RELEASE(flag);
|
|
|
|
} while (bytes < total_size);
|
|
|
|
|
|
|
|
/* Kill the convertor, if we had one */
|
|
|
|
|
|
|
|
if (NULL != free_buffer) {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
OBJ_DESTRUCT(&rtb_convertor);
|
|
|
|
OBJ_DESTRUCT(&rbuf_convertor);
|
2005-09-15 06:18:16 +04:00
|
|
|
free(free_buffer);
|
|
|
|
}
|
|
|
|
if (NULL != inplace_temp) {
|
|
|
|
free(inplace_temp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*********************************************************************
|
|
|
|
* Non-root
|
|
|
|
*********************************************************************/
|
|
|
|
|
|
|
|
else {
|
|
|
|
/* Here we get a convertor for the full count that the user
|
|
|
|
provided (as opposed to the convertor that the root got) */
|
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
opal_convertor_t sbuf_convertor;
|
|
|
|
OBJ_CONSTRUCT(&sbuf_convertor, opal_convertor_t);
|
2005-09-15 06:18:16 +04:00
|
|
|
if (OMPI_SUCCESS !=
|
|
|
|
(ret =
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run, the netpipe latency does not
change with
this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, commited)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests failed!! as trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note, that for the heterogeneous case, (-m32 compiled
binaries/ompi), neither
ompi-trunk, nor ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 08:56:31 +04:00
|
|
|
opal_convertor_copy_and_prepare_for_send(ompi_mpi_local_convertor,
|
|
|
|
&(dtype->super),
|
2005-09-15 06:18:16 +04:00
|
|
|
count,
|
|
|
|
sbuf,
|
2006-03-17 21:46:48 +03:00
|
|
|
0,
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
&sbuf_convertor))) {
|
2005-09-15 06:18:16 +04:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Loop over sending fragments to the root */
|
|
|
|
|
|
|
|
do {
|
|
|
|
flag_num = (data->mcb_operation_count %
|
|
|
|
mca_coll_sm_component.sm_comm_num_in_use_flags);
|
|
|
|
|
|
|
|
/* Wait for the root to mark this set of segments as
|
|
|
|
ours */
|
|
|
|
FLAG_SETUP(flag_num, flag, data);
|
2009-09-22 02:20:44 +04:00
|
|
|
FLAG_WAIT_FOR_OP(flag, data->mcb_operation_count, reduce_nonroot_flag_label);
|
2005-09-15 06:18:16 +04:00
|
|
|
++data->mcb_operation_count;
|
|
|
|
|
|
|
|
/* Loop over all the segments in this set */
|
|
|
|
|
2005-10-01 03:12:23 +04:00
|
|
|
segment_num =
|
|
|
|
flag_num * mca_coll_sm_component.sm_segs_per_inuse_flag;
|
|
|
|
max_segment_num =
|
|
|
|
(flag_num + 1) * mca_coll_sm_component.sm_segs_per_inuse_flag;
|
2005-09-15 06:18:16 +04:00
|
|
|
do {
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
index = &(data->mcb_data_index[segment_num]);
|
2005-09-15 06:18:16 +04:00
|
|
|
|
2005-10-04 18:52:59 +04:00
|
|
|
/* Copy from the user's buffer to my shared mem
|
2005-09-15 06:18:16 +04:00
|
|
|
segment */
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
max_data = segment_ddt_bytes;
|
|
|
|
COPY_FRAGMENT_IN(sbuf_convertor, index, rank, iov, max_data);
|
2005-09-15 06:18:16 +04:00
|
|
|
bytes += max_data;
|
|
|
|
|
|
|
|
/* Wait for the write to absolutely complete */
|
|
|
|
opal_atomic_wmb();
|
|
|
|
|
2005-10-04 18:52:59 +04:00
|
|
|
/* Tell my parent (always the reduction root -- we're
|
|
|
|
ignoring the mcb_tree parent/child relationships
|
|
|
|
here) that this fragment is ready */
|
|
|
|
CHILD_NOTIFY_PARENT(rank, root, index, max_data);
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
++segment_num;
|
|
|
|
} while (bytes < total_size && segment_num < max_segment_num);
|
|
|
|
|
|
|
|
/* We're finished with this set of segments */
|
|
|
|
FLAG_RELEASE(flag);
|
|
|
|
} while (bytes < total_size);
|
|
|
|
|
|
|
|
/* Kill the convertor */
|
|
|
|
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we used a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 04:25:21 +04:00
|
|
|
OBJ_DESTRUCT(&sbuf_convertor);
|
2005-09-15 06:18:16 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* All done */
|
|
|
|
|
|
|
|
return OMPI_SUCCESS;
|
2004-08-21 04:49:07 +04:00
|
|
|
}
|
|
|
|
|
2005-09-15 06:18:16 +04:00
|
|
|
|
|
|
|
#if WANT_REDUCE_NO_ORDER
|
|
|
|
/**
|
|
|
|
* Unordered shared memory reduction.
|
|
|
|
*
|
|
|
|
* This function performs the reduction in whatever order the operands
|
|
|
|
* arrive.
|
|
|
|
*/
|
|
|
|
static int reduce_no_order(void *sbuf, void* rbuf, int count,
|
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
2007-08-19 07:37:49 +04:00
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-29 02:40:57 +04:00
|
|
|
mca_coll_base_module_t *module)
|
2005-09-15 06:18:16 +04:00
|
|
|
{
|
|
|
|
return OMPI_ERR_NOT_IMPLEMENTED;
|
|
|
|
}
|
|
|
|
#endif
|