2004-08-21 00:49:07 +00:00
|
|
|
/*
|
2007-03-16 23:11:45 +00:00
|
|
|
* Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
|
2005-11-05 19:57:48 +00:00
|
|
|
* University Research and Technology
|
|
|
|
* Corporation. All rights reserved.
|
2006-08-24 16:38:08 +00:00
|
|
|
* Copyright (c) 2004-2006 The University of Tennessee and The University
|
2005-11-05 19:57:48 +00:00
|
|
|
* of Tennessee Research Foundation. All rights
|
|
|
|
* reserved.
|
2004-11-28 20:09:25 +00:00
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
|
|
|
* University of Stuttgart. All rights reserved.
|
2005-03-24 12:43:37 +00:00
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
Fixes trac:1988. The little bug that turned out to be huge. Yoinks.
* Various cosmetic/style updates in the btl sm
* Clean up concept of mpool module (I think that code was written way
back when the concept of "modules" was fuzzy)
* Bring over some old fixes from the /tmp/timattox-sm-coll/ tree to
fix potential segv's when mmap'ed regions were at different
addresses in different processes (thanks Tim!).
* Change sm coll to no longer use mpool as its main source of shmem;
rather, just mmap its own segment (because it's fixed size --
there was nothing to be gained by using mpool; shedding the use of
mpool saved a lot of complexity in the sm coll setup). This
effectively made Tim's fixes moot (because now everything is an
offset into the mmap that is computed locally; there are no global
pointers). :-)
* Slightly updated common/sm to allow making mmap's for a specific
set of procs (vs. ''all'' procs in the process). This potentially
allows for same-host-inter-proc mmaps -- yay!
* Fixed many, many things in the coll sm (particularly in reduce):
* Fixed handling of MPI_IN_PLACE in reduce and allreduce
* Fixed handling of non-contiguous datatypes in reduce
* Changed the order of reductions to go from process (n-1)'s data
to process 0's data, because that's how all other OMPI coll
components work
* Fixed lots of usage of ddt functions
* When using a non-contiguous datatype, if the root process is not
(n-1), now we use a 2nd convertor to copy from shmem to the rbuf
(saves a memory copy vs. what was done before)
* Lots and lots of little cleanups, clarifications, and minor
optimizations (although still more could be done -- e.g., I think
the use of write memory barriers is fairly sub-optimal; they
could be ganged together at the root, for example)
I'm marking this as "fixes trac:1988" and closing the ticket; if something
is still broken, we can re-open the ticket.
This commit was SVN r21967.
The following Trac tickets were found above:
Ticket 1988 --> https://svn.open-mpi.org/trac/ompi/ticket/1988
2009-09-15 00:25:21 +00:00
|
|
|
* Copyright (c) 2008-2009 Cisco Systems, Inc. All rights reserved.
|
2004-11-22 01:38:40 +00:00
|
|
|
* $COPYRIGHT$
|
|
|
|
*
|
|
|
|
* Additional copyrights may follow
|
|
|
|
*
|
2004-11-22 00:37:56 +00:00
|
|
|
* $HEADER$
|
2004-08-21 00:49:07 +00:00
|
|
|
*/
|
2005-09-06 21:41:55 +00:00
|
|
|
/** @file */
|
2004-08-21 00:49:07 +00:00
|
|
|
|
|
|
|
#ifndef MCA_COLL_SM_EXPORT_H
|
|
|
|
#define MCA_COLL_SM_EXPORT_H
|
|
|
|
|
|
|
|
#include "ompi_config.h"
|
|
|
|
|
|
|
|
#include "mpi.h"
|
2005-07-15 20:01:35 +00:00
|
|
|
#include "opal/mca/mca.h"
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run, the netpipe latency does not
change with
this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, committed)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests failed!! as trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note, that for the heterogeneous case, (-m32 compiled
binaries/ompi), neither
ompi-trunk, nor ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 04:56:31 +00:00
|
|
|
#include "opal/datatype/opal_convertor.h"
|
2005-07-15 20:01:35 +00:00
|
|
|
#include "ompi/mca/coll/coll.h"
|
2010-06-09 16:58:52 +00:00
|
|
|
#include "ompi/mca/common/sm/common_sm.h"
|
2004-08-21 00:49:07 +00:00
|
|
|
|
2007-08-19 03:37:49 +00:00
|
|
|
BEGIN_C_DECLS
|
|
|
|
|
2009-09-21 22:20:44 +00:00
|
|
|
/* Attempt to give some sort of progress / fairness if we're blocked
|
|
|
|
in an sm collective for a long time: call opal_progress once in a
|
|
|
|
great while. Use a "goto" label for expediency to exit loops. */
|
|
|
|
#define SPIN_CONDITION_MAX 100000
|
|
|
|
#define SPIN_CONDITION(cond, exit_label) \
|
|
|
|
do { int i; \
|
|
|
|
if (cond) goto exit_label; \
|
|
|
|
for (i = 0; i < SPIN_CONDITION_MAX; ++i) { \
|
|
|
|
if (cond) { goto exit_label; } \
|
|
|
|
} \
|
|
|
|
opal_progress(); \
|
|
|
|
} while (1); \
|
|
|
|
exit_label:
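A minimal sketch of how the SPIN_CONDITION macro above might be used, assuming a single-process test harness. The `opal_progress()` defined here is a hypothetical stand-in (not the real OPAL function) that sets the awaited flag on its third call; `flag`, `progress_calls`, and `wait_for_flag()` are likewise illustrative names.

```c
/* Hypothetical stand-in for opal_progress(): counts its calls and, on
   the third call, sets the flag that the spinner is waiting on. */
static volatile int flag = 0;
static int progress_calls = 0;

static void opal_progress(void)
{
    if (3 == ++progress_calls) {
        flag = 1;
    }
}

/* Macro copied from the header above */
#define SPIN_CONDITION_MAX 100000
#define SPIN_CONDITION(cond, exit_label) \
    do { int i; \
        if (cond) goto exit_label; \
        for (i = 0; i < SPIN_CONDITION_MAX; ++i) { \
            if (cond) { goto exit_label; } \
        } \
        opal_progress(); \
    } while (1); \
    exit_label:

/* Spin until the flag is set, then report how many times the progress
   function had to be called.  The macro ends in a label, so the ';'
   after the invocation supplies the (empty) labeled statement. */
static int wait_for_flag(void)
{
    SPIN_CONDITION(1 == flag, flag_is_set);
    return progress_calls;
}
```

Because the macro expands to a label, each use within one function needs a distinct `exit_label` name.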
|
2004-08-21 00:49:07 +00:00
|
|
|
|
2005-07-15 20:01:35 +00:00
|
|
|
/**
|
|
|
|
* Structure to hold the sm coll component. First it holds the
|
|
|
|
* base coll component, and then holds a bunch of
|
|
|
|
* sm-coll-component-specific stuff (e.g., current MCA param
|
|
|
|
* values).
|
|
|
|
*/
|
2009-09-15 00:25:21 +00:00
|
|
|
typedef struct mca_coll_sm_component_t {
|
2005-07-15 20:01:35 +00:00
|
|
|
/** Base coll component */
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_component_2_0_0_t super;
|
2005-07-15 20:01:35 +00:00
|
|
|
|
2005-08-23 21:22:00 +00:00
|
|
|
/** MCA parameter: Priority of this component */
|
2005-07-15 20:01:35 +00:00
|
|
|
int sm_priority;
|
2005-08-23 21:22:00 +00:00
|
|
|
|
|
|
|
/** MCA parameter: Length of a cache line or page (in bytes) */
|
|
|
|
int sm_control_size;
|
|
|
|
|
2005-09-03 11:49:46 +00:00
|
|
|
/** MCA parameter: Number of "in use" flags in each
|
|
|
|
communicator's area in the data mpool */
|
|
|
|
int sm_comm_num_in_use_flags;
|
|
|
|
|
2005-08-23 21:22:00 +00:00
|
|
|
/** MCA parameter: Number of segments for each communicator in
|
|
|
|
the data mpool */
|
2005-09-03 11:49:46 +00:00
|
|
|
int sm_comm_num_segments;
|
2005-08-23 21:22:00 +00:00
|
|
|
|
|
|
|
/** MCA parameter: Fragment size for data */
|
|
|
|
int sm_fragment_size;
|
2004-08-21 00:49:07 +00:00
|
|
|
|
2005-08-23 21:22:00 +00:00
|
|
|
/** MCA parameter: Degree of tree for tree-based collectives */
|
|
|
|
int sm_tree_degree;
|
2004-08-21 00:49:07 +00:00
|
|
|
|
2005-09-03 11:49:46 +00:00
|
|
|
/** MCA parameter: Number of processes to use in the
|
|
|
|
calculation of the "info" MCA parameter */
|
|
|
|
int sm_info_comm_size;
|
|
|
|
|
2005-09-06 21:41:55 +00:00
|
|
|
/******* end of MCA params ********/
|
|
|
|
|
2005-09-30 23:12:23 +00:00
|
|
|
/** How many fragment segments are protected by a single
|
|
|
|
in-use flag. This is solely so that we can perform
|
|
|
|
the division once and then just use the value without
|
|
|
|
having to re-calculate. */
|
|
|
|
int sm_segs_per_inuse_flag;
|
2009-09-15 00:25:21 +00:00
|
|
|
} mca_coll_sm_component_t;
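A minimal sketch of the "divide once, reuse the value" idea described for sm_segs_per_inuse_flag, assuming the value is the number of segments divided by the number of in-use flags. The `sm_params_t` type and both helper functions are hypothetical and exist only for illustration; the header itself does not define them.

```c
/* Hypothetical subset of the component's MCA parameters */
typedef struct sm_params_t {
    int num_segments;         /* cf. sm_comm_num_segments */
    int num_in_use_flags;     /* cf. sm_comm_num_in_use_flags */
    int segs_per_inuse_flag;  /* cf. sm_segs_per_inuse_flag */
} sm_params_t;

/* Perform the division once at setup time.  Returns 0 on success, or
   -1 if the segments cannot be split evenly among the flags (an
   assumed sanity check, not taken from the header). */
static int sm_params_init(sm_params_t *p, int num_segments, int num_flags)
{
    if (num_flags <= 0 || num_segments < num_flags ||
        0 != num_segments % num_flags) {
        return -1;
    }
    p->num_segments = num_segments;
    p->num_in_use_flags = num_flags;
    p->segs_per_inuse_flag = num_segments / num_flags;
    return 0;
}

/* Map a segment index to the in-use flag that protects it, using the
   cached quotient rather than re-deriving it from the two counts. */
static int flag_for_segment(const sm_params_t *p, int segment)
{
    return segment / p->segs_per_inuse_flag;
}
```

With 8 segments and 2 flags, segments 0 through 3 map to flag 0 and segments 4 through 7 map to flag 1.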
|
2005-08-23 21:22:00 +00:00
|
|
|
|
2005-09-02 12:57:47 +00:00
|
|
|
/**
|
|
|
|
* Structure for representing a node in the tree
|
|
|
|
*/
|
2009-09-15 00:25:21 +00:00
|
|
|
typedef struct mca_coll_sm_tree_node_t {
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Arbitrary ID number, starting from 0 */
|
|
|
|
int mcstn_id;
|
|
|
|
/** Pointer to parent, or NULL if root */
|
|
|
|
struct mca_coll_sm_tree_node_t *mcstn_parent;
|
|
|
|
/** Number of children, or 0 if a leaf */
|
|
|
|
int mcstn_num_children;
|
|
|
|
/** Pointer to an array of children, or NULL if 0 ==
|
|
|
|
mcstn_num_children */
|
|
|
|
struct mca_coll_sm_tree_node_t **mcstn_children;
|
2009-09-15 00:25:21 +00:00
|
|
|
} mca_coll_sm_tree_node_t;
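A minimal sketch of how an array of these tree nodes could be wired together, assuming a complete k-ary layout in which node i's parent is node (i - 1) / degree. That layout, and the helpers `build_tree()` and `free_tree()`, are assumptions for illustration; the header only defines the node structure, which is repeated here so the example stands alone.

```c
#include <stdlib.h>

typedef struct mca_coll_sm_tree_node_t {
    int mcstn_id;
    struct mca_coll_sm_tree_node_t *mcstn_parent;
    int mcstn_num_children;
    struct mca_coll_sm_tree_node_t **mcstn_children;
} mca_coll_sm_tree_node_t;

/* Hypothetical helper: release the child arrays, then the node array */
static void free_tree(mca_coll_sm_tree_node_t *nodes, int size)
{
    int i;
    if (NULL == nodes) return;
    for (i = 0; i < size; ++i) {
        free(nodes[i].mcstn_children);  /* free(NULL) is a no-op */
    }
    free(nodes);
}

/* Hypothetical helper: build `size` nodes in a complete tree of the
   given degree.  Node 0 is the root; leaves get a NULL child array. */
static mca_coll_sm_tree_node_t *build_tree(int size, int degree)
{
    mca_coll_sm_tree_node_t *nodes;
    int i, j;

    if (size <= 0 || degree <= 0) return NULL;
    nodes = calloc((size_t) size, sizeof(*nodes));
    if (NULL == nodes) return NULL;

    for (i = 0; i < size; ++i) {
        int first = i * degree + 1;  /* index of the first child */
        int n = 0;

        if (first < size) {
            n = size - first;
            if (n > degree) n = degree;
        }
        nodes[i].mcstn_id = i;
        nodes[i].mcstn_parent = (0 == i) ? NULL : &nodes[(i - 1) / degree];
        nodes[i].mcstn_num_children = n;
        if (n > 0) {
            nodes[i].mcstn_children =
                malloc((size_t) n * sizeof(*nodes[i].mcstn_children));
            if (NULL == nodes[i].mcstn_children) {
                free_tree(nodes, size);
                return NULL;
            }
            for (j = 0; j < n; ++j) {
                nodes[i].mcstn_children[j] = &nodes[first + j];
            }
        }
    }
    return nodes;
}
```

Allocating all nodes in one array keeps the parent and child pointers valid for as long as the array lives, and lets `free_tree()` release everything in one pass.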
|
2005-09-02 12:57:47 +00:00
|
|
|
|
2005-09-06 21:41:55 +00:00
|
|
|
/**
|
|
|
|
* Simple structure comprising the "in use" flags. Contains two
|
|
|
|
* members: the number of processes that are currently using this
|
|
|
|
* set of segments and the operation number of the current
|
|
|
|
* operation.
|
|
|
|
*/
|
2009-09-15 00:25:21 +00:00
|
|
|
typedef struct mca_coll_sm_in_use_flag_t {
|
2005-09-06 21:41:55 +00:00
|
|
|
/** Number of processes currently using this set of
|
|
|
|
segments */
|
|
|
|
volatile uint32_t mcsiuf_num_procs_using;
|
|
|
|
/** Must match data->mcb_count */
|
|
|
|
volatile uint32_t mcsiuf_operation_count;
|
2009-09-15 00:25:21 +00:00
|
|
|
} mca_coll_sm_in_use_flag_t;
|
2005-09-06 21:41:55 +00:00
|
|
|
|
2005-08-23 21:22:00 +00:00
|
|
|
/**
|
|
|
|
* Structure containing pointers to various arrays of data in the
|
2009-09-15 00:25:21 +00:00
|
|
|
* per-communicator shmem data segment (one of these indexes a
|
|
|
|
* single segment in the per-communicator shmem data segment).
|
|
|
|
* Nothing is hard-coded because all the array lengths and
|
|
|
|
* displacements of the pointers all depend on how many processes
|
|
|
|
* are in the communicator.
|
2005-08-23 21:22:00 +00:00
|
|
|
*/
|
2009-09-15 00:25:21 +00:00
|
|
|
typedef struct mca_coll_sm_data_index_t {
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Pointer to beginning of control data */
|
2005-10-04 14:52:59 +00:00
|
|
|
uint32_t volatile *mcbmi_control;
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Pointer to beginning of message fragment data */
|
|
|
|
char *mcbmi_data;
|
2009-09-15 00:25:21 +00:00
|
|
|
} mca_coll_sm_data_index_t;
|
2005-08-23 21:22:00 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Structure for the sm coll module to hang off the communicator.
|
|
|
|
* Contains communicator-specific information, including pointers
|
2009-09-15 00:25:21 +00:00
|
|
|
* into the per-communicator shmem data segment for this
|
|
|
|
* comm's sm collective operations area.
|
2005-08-23 21:22:00 +00:00
|
|
|
*/
|
2009-09-15 00:25:21 +00:00
|
|
|
typedef struct mca_coll_sm_comm_t {
|
|
|
|
/* Meta data that we get back from the common mmap allocation
|
|
|
|
function */
|
2010-06-09 16:58:52 +00:00
|
|
|
mca_common_sm_module_t *sm_bootstrap_meta;
|
2005-08-23 21:22:00 +00:00
|
|
|
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Pointer to my barrier control pages (odd index pages are
|
|
|
|
"in", even index pages are "out") */
|
|
|
|
uint32_t *mcb_barrier_control_me;
|
2005-08-23 21:22:00 +00:00
|
|
|
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Pointer to my parent's barrier control pages (will be NULL
|
|
|
|
for communicator rank 0; odd index pages are "in", even
|
|
|
|
index pages are "out") */
|
|
|
|
uint32_t *mcb_barrier_control_parent;
|
|
|
|
|
|
|
|
/** Pointers to my children's barrier control pages (they're
|
|
|
|
contiguous in memory, so we only point to the base -- the
|
|
|
|
number of children is in my entry in the mcb_tree); will
|
|
|
|
be NULL if this process has no children (odd index pages
|
|
|
|
are "in", even index pages are "out") */
|
|
|
|
uint32_t *mcb_barrier_control_children;
|
2005-08-23 22:02:28 +00:00
|
|
|
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Number of barriers that we have executed (i.e., which set
|
|
|
|
of barrier buffers to use). */
|
|
|
|
int mcb_barrier_count;
|
2005-08-23 22:02:28 +00:00
|
|
|
|
2005-09-06 21:41:55 +00:00
|
|
|
/** "In use" flags indicating which segments are available */
|
|
|
|
mca_coll_sm_in_use_flag_t *mcb_in_use_flags;
|
|
|
|
|
2009-09-15 00:25:21 +00:00
|
|
|
/** Array of indexes into the per-communicator shmem data
|
|
|
|
segment for control and data fragment passing (containing
|
|
|
|
pointers to each segment's control and data areas). */
|
|
|
|
mca_coll_sm_data_index_t *mcb_data_index;
|
2005-08-23 22:02:28 +00:00
|
|
|
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Array of graph nodes representing the tree used for
|
|
|
|
communications */
|
|
|
|
mca_coll_sm_tree_node_t *mcb_tree;
|
2005-08-23 22:02:28 +00:00
|
|
|
|
2005-09-02 12:57:47 +00:00
|
|
|
/** Operation number (i.e., which segment number to use) */
|
2005-09-06 21:41:55 +00:00
|
|
|
uint32_t mcb_operation_count;
|
2009-09-15 00:25:21 +00:00
|
|
|
} mca_coll_sm_comm_t;
|
2007-08-19 03:37:49 +00:00
|
|
|
|
2009-09-15 00:25:21 +00:00
|
|
|
/** Coll sm module */
|
|
|
|
typedef struct mca_coll_sm_module_t {
|
|
|
|
/** Base module */
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t super;
|
2007-08-19 03:37:49 +00:00
|
|
|
|
2009-10-02 17:13:56 +00:00
|
|
|
/* Whether this module has been lazily initialized or not yet */
|
|
|
|
bool enabled;
|
|
|
|
|
2009-09-15 00:25:21 +00:00
|
|
|
/* Data that hangs off the communicator */
|
|
|
|
mca_coll_sm_comm_t *sm_comm_data;
|
|
|
|
|
|
|
|
/* Underlying reduce function and module */
|
2007-08-19 03:37:49 +00:00
|
|
|
mca_coll_base_module_reduce_fn_t previous_reduce;
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *previous_reduce_module;
|
2009-09-15 00:25:21 +00:00
|
|
|
} mca_coll_sm_module_t;
|
2007-08-19 03:37:49 +00:00
|
|
|
OBJ_CLASS_DECLARATION(mca_coll_sm_module_t);
|
2005-01-30 01:42:57 +00:00
|
|
|
|
2005-07-15 20:01:35 +00:00
|
|
|
/**
|
|
|
|
* Global component instance
|
2005-01-30 01:42:57 +00:00
|
|
|
*/
|
2006-08-27 04:49:02 +00:00
|
|
|
OMPI_MODULE_DECLSPEC extern mca_coll_sm_component_t mca_coll_sm_component;
|
2004-08-21 00:49:07 +00:00
|
|
|
|
|
|
|
/*
|
2005-01-30 01:42:57 +00:00
|
|
|
* coll module functions
|
2004-08-21 00:49:07 +00:00
|
|
|
*/
|
2005-03-27 13:05:23 +00:00
|
|
|
int mca_coll_sm_init_query(bool enable_progress_threads,
|
2007-08-19 03:37:49 +00:00
|
|
|
bool enable_mpi_threads);
|
2004-08-21 00:49:07 +00:00
|
|
|
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *
|
2007-08-19 03:37:49 +00:00
|
|
|
mca_coll_sm_comm_query(struct ompi_communicator_t *comm, int *priority);
|
2009-10-02 17:13:56 +00:00
|
|
|
|
|
|
|
/* Lazily enable a module (since it involves expensive/slow mmap
|
|
|
|
allocation, etc.) */
|
|
|
|
int ompi_coll_sm_lazy_enable(mca_coll_base_module_t *module,
|
|
|
|
struct ompi_communicator_t *comm);
|
2004-08-21 00:49:07 +00:00
|
|
|
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_allgather_intra(void *sbuf, int scount,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype,
|
|
|
|
void *rbuf, int rcount,
|
|
|
|
struct ompi_datatype_t *rdtype,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2007-08-19 03:37:49 +00:00
|
|
|
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_allgatherv_intra(void *sbuf, int scount,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype,
|
|
|
|
void * rbuf, int *rcounts, int *disps,
|
|
|
|
struct ompi_datatype_t *rdtype,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_allreduce_intra(void *sbuf, void *rbuf, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_alltoall_intra(void *sbuf, int scount,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype,
|
|
|
|
void* rbuf, int rcount,
|
|
|
|
struct ompi_datatype_t *rdtype,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_alltoallv_intra(void *sbuf, int *scounts, int *sdisps,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype,
|
|
|
|
void *rbuf, int *rcounts, int *rdisps,
|
|
|
|
struct ompi_datatype_t *rdtype,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_alltoallw_intra(void *sbuf, int *scounts, int *sdisps,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t **sdtypes,
|
|
|
|
void *rbuf, int *rcounts, int *rdisps,
|
|
|
|
struct ompi_datatype_t **rdtypes,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2007-08-19 03:37:49 +00:00
|
|
|
int mca_coll_sm_barrier_intra(struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_bcast_intra(void *buff, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *datatype,
|
|
|
|
int root,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_bcast_log_intra(void *buff, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *datatype,
|
|
|
|
int root,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_exscan_intra(void *sbuf, void *rbuf, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_gather_intra(void *sbuf, int scount,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype, void *rbuf,
|
|
|
|
int rcount, struct ompi_datatype_t *rdtype,
|
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_gatherv_intra(void *sbuf, int scount,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype, void *rbuf,
|
|
|
|
int *rcounts, int *disps,
|
|
|
|
struct ompi_datatype_t *rdtype, int root,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_reduce_intra(void *sbuf, void* rbuf, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
|
|
|
int root,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_reduce_log_intra(void *sbuf, void* rbuf, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
|
|
|
int root,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_reduce_scatter_intra(void *sbuf, void *rbuf,
|
2007-08-19 03:37:49 +00:00
|
|
|
int *rcounts,
|
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_scan_intra(void *sbuf, void *rbuf, int count,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *dtype,
|
|
|
|
struct ompi_op_t *op,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_scatter_intra(void *sbuf, int scount,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype, void *rbuf,
|
|
|
|
int rcount, struct ompi_datatype_t *rdtype,
|
|
|
|
int root, struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2005-01-30 01:42:57 +00:00
|
|
|
int mca_coll_sm_scatterv_intra(void *sbuf, int *scounts, int *disps,
|
2007-08-19 03:37:49 +00:00
|
|
|
struct ompi_datatype_t *sdtype,
|
|
|
|
void* rbuf, int rcount,
|
|
|
|
struct ompi_datatype_t *rdtype, int root,
|
|
|
|
struct ompi_communicator_t *comm,
|
2008-07-28 22:40:57 +00:00
|
|
|
mca_coll_base_module_t *module);
|
2007-03-16 23:11:45 +00:00
|
|
|
|
|
|
|
int mca_coll_sm_ft_event(int state);
|
2005-01-30 01:42:57 +00:00
|
|
|
|
2005-09-07 13:33:43 +00:00
|
|
|
/**
|
|
|
|
* Global variables used in the macros (essentially constants, so
|
|
|
|
* these are thread safe)
|
|
|
|
*/
|
2009-09-15 00:25:21 +00:00
|
|
|
extern uint32_t mca_coll_sm_one;
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to set up flag usage
|
|
|
|
*/
|
|
|
|
#define FLAG_SETUP(flag_num, flag, data) \
|
|
|
|
(flag) = (mca_coll_sm_in_use_flag_t*) \
|
|
|
|
(((char *) (data)->mcb_in_use_flags) + \
|
2005-09-15 02:18:16 +00:00
|
|
|
((flag_num) * mca_coll_sm_component.sm_control_size))
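FLAG_SETUP locates a flag by byte arithmetic: it steps through the in-use flags region in strides of `sm_control_size` bytes rather than by the size of the flag structure. A minimal sketch of that addressing follows. The names `demo_flag_t` and `demo_flag_at`, and the `control_size` parameter, are hypothetical stand-ins and not part of the component.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mca_coll_sm_in_use_flag_t; the real layout
   is defined elsewhere in the component. */
typedef struct demo_flag {
    volatile uint32_t num_procs_using;
    volatile uint32_t operation_count;
} demo_flag_t;

/* Return the flag_num-th flag in a region whose flags are spaced
   control_size bytes apart (the role sm_control_size plays in
   FLAG_SETUP). */
static demo_flag_t *demo_flag_at(void *flags_base, size_t flag_num,
                                 size_t control_size)
{
    /* The stride must be large enough to hold one flag. */
    assert(control_size >= sizeof(demo_flag_t));
    return (demo_flag_t *) ((char *) flags_base + flag_num * control_size);
}
```

Spacing the flags by a configurable stride rather than packing them back to back is presumably meant to keep each flag in its own cache line or page; that motivation is an assumption, not something this header states.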
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to wait for the in-use flag to become idle (used by the root)
|
|
|
|
*/
|
2009-09-21 22:20:44 +00:00
|
|
|
#define FLAG_WAIT_FOR_IDLE(flag, label) \
|
|
|
|
SPIN_CONDITION(0 == (flag)->mcsiuf_num_procs_using, label)
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to wait for a flag to indicate that it's ready for this
|
|
|
|
* operation (used by non-root processes to know when FLAG_SET() has
|
|
|
|
* been called)
|
|
|
|
*/
|
2009-09-21 22:20:44 +00:00
|
|
|
#define FLAG_WAIT_FOR_OP(flag, op, label) \
|
|
|
|
SPIN_CONDITION((op) == (flag)->mcsiuf_operation_count, label)
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to set an in-use flag with relevant data to claim it
|
|
|
|
*/
|
|
|
|
#define FLAG_RETAIN(flag, num_procs, op_count) \
|
|
|
|
(flag)->mcsiuf_num_procs_using = (num_procs); \
|
2005-09-15 02:18:16 +00:00
|
|
|
(flag)->mcsiuf_operation_count = (op_count)
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to release an in-use flag from this process
|
|
|
|
*/
|
|
|
|
#define FLAG_RELEASE(flag) \
|
2005-09-15 02:18:16 +00:00
|
|
|
opal_atomic_add(&(flag)->mcsiuf_num_procs_using, -1)
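Taken together, the four flag macros above describe a small lifecycle: the root waits until no process is using a segment, claims it by recording the number of users and the operation number, non-root processes wait until the recorded operation number matches theirs, and each process decrements the user count when done. The sketch below models that lifecycle in a single thread. It swaps in C11 `<stdatomic.h>` for `opal_atomic_add` and replaces the spin loops with predicate functions; all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an in-use flag. */
typedef struct demo_in_use_flag {
    atomic_uint num_procs_using;  /* 0 means the segment is idle */
    atomic_uint operation_count;  /* operation that owns the segment */
} demo_in_use_flag_t;

/* Condition that FLAG_WAIT_FOR_IDLE spins on (root side). */
static bool demo_flag_is_idle(demo_in_use_flag_t *flag)
{
    return 0 == atomic_load(&flag->num_procs_using);
}

/* FLAG_RETAIN: claim an idle flag for op_count on behalf of num_procs
   processes. The user count is stored first, as in the macro. */
static void demo_flag_retain(demo_in_use_flag_t *flag, unsigned num_procs,
                             unsigned op_count)
{
    assert(demo_flag_is_idle(flag));
    atomic_store(&flag->num_procs_using, num_procs);
    atomic_store(&flag->operation_count, op_count);
}

/* Condition that FLAG_WAIT_FOR_OP spins on (non-root side). */
static bool demo_flag_ready_for_op(demo_in_use_flag_t *flag, unsigned op)
{
    return op == atomic_load(&flag->operation_count);
}

/* FLAG_RELEASE: drop this process's use; returns how many users remain. */
static unsigned demo_flag_release(demo_in_use_flag_t *flag)
{
    return atomic_fetch_sub(&flag->num_procs_using, 1) - 1;
}
```

With three users, three calls to `demo_flag_release` bring the count back to zero, at which point `demo_flag_is_idle` holds again and the root may reuse the segment.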
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to copy a single segment in from a user buffer to a shared
|
|
|
|
* segment
|
|
|
|
*/
|
2005-09-15 02:18:16 +00:00
|
|
|
#define COPY_FRAGMENT_IN(convertor, index, rank, iov, max_data) \
|
2005-09-07 13:33:43 +00:00
|
|
|
(iov).iov_base = \
|
|
|
|
(index)->mcbmi_data + \
|
2005-09-15 02:18:16 +00:00
|
|
|
((rank) * mca_coll_sm_component.sm_fragment_size); \
|
2009-09-15 00:25:21 +00:00
|
|
|
(iov).iov_len = (max_data); \
|
|
|
|
opal_convertor_pack(&(convertor), &(iov), &mca_coll_sm_one, \
|
2006-10-26 23:11:26 +00:00
|
|
|
&(max_data) )
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to copy a single segment out from a shared segment to a user
|
|
|
|
* buffer
|
|
|
|
*/
|
|
|
|
#define COPY_FRAGMENT_OUT(convertor, src_rank, index, iov, max_data) \
|
|
|
|
(iov).iov_base = (((char*) (index)->mcbmi_data) + \
|
2009-09-15 00:25:21 +00:00
|
|
|
((src_rank) * (mca_coll_sm_component.sm_fragment_size))); \
|
|
|
|
(iov).iov_len = (max_data); \
|
|
|
|
opal_convertor_unpack(&(convertor), &(iov), &mca_coll_sm_one, \
|
2006-10-26 23:11:26 +00:00
|
|
|
&(max_data) )
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro to memcpy a fragment between one shared segment and another
|
|
|
|
*/
|
|
|
|
#define COPY_FRAGMENT_BETWEEN(src_rank, dest_rank, index, len) \
|
|
|
|
memcpy(((index)->mcbmi_data + \
|
|
|
|
((dest_rank) * mca_coll_sm_component.sm_fragment_size)), \
|
|
|
|
((index)->mcbmi_data + \
|
|
|
|
((src_rank) * \
|
|
|
|
mca_coll_sm_component.sm_fragment_size)), \
|
2005-09-15 02:18:16 +00:00
|
|
|
(len))
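The three COPY_FRAGMENT macros share one addressing rule: within a segment's data area, rank r owns the slot that starts `r * sm_fragment_size` bytes from the base. The sketch below shows that rule and the slot-to-slot copy that COPY_FRAGMENT_BETWEEN performs. The helper names and the explicit `fragment_size` parameter are hypothetical; the real macros read the size from `mca_coll_sm_component`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Address of the slot owned by `rank` inside a segment's data area. */
static char *demo_fragment_slot(char *segment_data, int rank,
                                size_t fragment_size)
{
    assert(rank >= 0);
    return segment_data + (size_t) rank * fragment_size;
}

/* Copy len bytes from src_rank's slot to dest_rank's slot. */
static void demo_copy_between(char *segment_data, int src_rank,
                              int dest_rank, size_t fragment_size,
                              size_t len)
{
    assert(len <= fragment_size);   /* stay inside one slot */
    assert(src_rank != dest_rank);  /* memcpy needs non-overlapping ranges */
    memcpy(demo_fragment_slot(segment_data, dest_rank, fragment_size),
           demo_fragment_slot(segment_data, src_rank, fragment_size),
           len);
}
```

The two assertions are additions for the sketch; the macro itself performs no such checks and relies on its callers.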
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
2005-09-15 02:18:16 +00:00
|
|
|
* Macro to tell children that a segment is ready (normalize
|
|
|
|
* the child's ID based on the shift used to calculate the "me" node
|
|
|
|
* in the tree). Used in fan out operations. Expects the variables
* i, root, and size to be defined in the calling scope.
|
2005-09-07 13:33:43 +00:00
|
|
|
*/
|
|
|
|
#define PARENT_NOTIFY_CHILDREN(children, num_children, index, value) \
|
2005-09-15 02:18:16 +00:00
|
|
|
do { \
|
|
|
|
for (i = 0; i < (num_children); ++i) { \
|
|
|
|
*((size_t*) \
|
|
|
|
(((char*) (index)->mcbmi_control) + \
|
|
|
|
(mca_coll_sm_component.sm_control_size * \
|
|
|
|
(((children)[i]->mcstn_id + root) % size)))) = (value); \
|
|
|
|
} \
|
|
|
|
} while (0)
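PARENT_NOTIFY_CHILDREN turns a child's tree ID into a real rank with `(id + root) % size`. A minimal sketch of that shift follows, with hypothetical function names. The inverse mapping `demo_tree_id` is an assumption added for illustration and does not appear in this header.

```c
#include <assert.h>

/* Map a tree node ID to a communicator rank, given the operation's root.
   Mirrors the (mcstn_id + root) % size expression in the macro. */
static int demo_real_rank(int tree_id, int root, int size)
{
    assert(size > 0);
    assert(tree_id >= 0 && tree_id < size);
    assert(root >= 0 && root < size);
    return (tree_id + root) % size;
}

/* Assumed inverse: which tree node a given rank occupies for this root.
   Adding size before the modulo keeps the result non-negative. */
static int demo_tree_id(int rank, int root, int size)
{
    assert(size > 0);
    assert(rank >= 0 && rank < size);
    assert(root >= 0 && root < size);
    return (rank - root + size) % size;
}
```

With `size = 4` and `root = 2`, tree node 0 is rank 2 and tree node 3 wraps around to rank 1, so one tree layout can serve any root.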
|
2005-09-07 13:33:43 +00:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro for children to wait for parent notification (use real rank).
|
2005-09-15 02:18:16 +00:00
|
|
|
* Save the value passed and then reset it when done. Used in fan out
|
|
|
|
* operations.
|
2005-09-07 13:33:43 +00:00
|
|
|
*/
|
2009-09-21 22:20:44 +00:00
|
|
|
#define CHILD_WAIT_FOR_NOTIFY(rank, index, value, label) \
|
2005-09-15 02:18:16 +00:00
|
|
|
do { \
|
2005-10-04 14:52:59 +00:00
|
|
|
uint32_t volatile *ptr = ((uint32_t*) \
|
2005-09-15 02:18:16 +00:00
|
|
|
(((char*) (index)->mcbmi_control) + \
|
|
|
|
((rank) * mca_coll_sm_component.sm_control_size))); \
|
2009-09-21 22:20:44 +00:00
|
|
|
SPIN_CONDITION(0 != *ptr, label); \
|
2005-09-15 02:18:16 +00:00
|
|
|
(value) = *ptr; \
|
|
|
|
*ptr = 0; \
|
|
|
|
} while (0)
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro for children to tell parent that the data is ready in their
|
|
|
|
* segment. Used for fan in operations.
|
|
|
|
*/
|
|
|
|
#define CHILD_NOTIFY_PARENT(child_rank, parent_rank, index, value) \
|
2005-10-04 14:52:59 +00:00
|
|
|
((size_t volatile *) \
|
2005-09-15 02:18:16 +00:00
|
|
|
(((char*) (index)->mcbmi_control) + \
|
|
|
|
(mca_coll_sm_component.sm_control_size * \
|
|
|
|
(parent_rank))))[(child_rank)] = (value)
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Macro for parent to wait for a specific child to tell it that the
|
|
|
|
* data is in the child's segment. Save the value when done. Used
|
|
|
|
* for fan in operations.
|
|
|
|
*/
|
2009-09-21 22:20:44 +00:00
|
|
|
#define PARENT_WAIT_FOR_NOTIFY_SPECIFIC(child_rank, parent_rank, index, value, label) \
|
2005-09-15 02:18:16 +00:00
|
|
|
do { \
|
2005-10-04 14:52:59 +00:00
|
|
|
size_t volatile *ptr = ((size_t volatile *) \
|
2005-09-15 02:18:16 +00:00
|
|
|
(((char*) (index)->mcbmi_control) + \
|
|
|
|
(mca_coll_sm_component.sm_control_size * \
|
|
|
|
(parent_rank)))) + (child_rank); \
|
2009-09-21 22:20:44 +00:00
|
|
|
SPIN_CONDITION(0 != *ptr, label); \
|
2005-09-15 02:18:16 +00:00
|
|
|
(value) = *ptr; \
|
|
|
|
*ptr = 0; \
|
|
|
|
} while (0)
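In the fan-in pair above, the parent's control area is treated as an array of `size_t` slots indexed by child rank: a child stores a non-zero value in its slot, and the parent reads the value and resets the slot to zero. The single-threaded sketch below models that handshake. It replaces `SPIN_CONDITION` with a non-blocking poll and omits memory barriers, so it illustrates only the slot layout and not a synchronization primitive; all `demo_*` names and the `control_size` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Slot in parent_rank's control area that belongs to child_rank. */
static volatile size_t *demo_parent_slot(void *control_base,
                                         size_t control_size,
                                         int parent_rank, int child_rank)
{
    volatile size_t *parent_area = (volatile size_t *)
        ((char *) control_base + control_size * (size_t) parent_rank);
    return parent_area + child_rank;
}

/* Child side (CHILD_NOTIFY_PARENT): publish a non-zero value. */
static void demo_child_notify_parent(void *control_base,
                                     size_t control_size, int child_rank,
                                     int parent_rank, size_t value)
{
    assert(value != 0);  /* zero means "nothing posted yet" */
    *demo_parent_slot(control_base, control_size, parent_rank,
                      child_rank) = value;
}

/* Parent side: one polling step of PARENT_WAIT_FOR_NOTIFY_SPECIFIC.
   Returns false if the child has not posted yet; otherwise saves the
   value, resets the slot, and returns true. */
static bool demo_parent_poll(void *control_base, size_t control_size,
                             int child_rank, int parent_rank,
                             size_t *value)
{
    volatile size_t *slot = demo_parent_slot(control_base, control_size,
                                             parent_rank, child_rank);
    if (0 == *slot) {
        return false;
    }
    *value = *slot;
    *slot = 0;
    return true;
}
```

Resetting the slot after reading is what lets the same control area be reused by the next operation without a separate cleanup pass.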
|
2005-09-07 13:33:43 +00:00
|
|
|
|
2007-08-19 03:37:49 +00:00
|
|
|
END_C_DECLS
|
|
|
|
|
2004-08-21 00:49:07 +00:00
|
|
|
#endif /* MCA_COLL_SM_EXPORT_H */
|