Currently 3 algorithms are available:
- non-overlapping (reduce + scatterv), works for non-commutative operations
- recursive halving algorithm (copied from basic module)
- ring algorithm (similar to allreduce ring, for large messages)
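A minimal sketch of the non-overlapping variant, assuming MPI_INT data and MPI_SUM (the helper name and buffer handling below are illustrative, not the actual tuned code):

    /* Illustrative sketch only: reduce the whole vector to rank 0, then
     * scatter each rank's slice back out.  Because the reduction happens
     * in a single ordered MPI_Reduce, it is safe for non-commutative ops. */
    #include <stdlib.h>
    #include <mpi.h>

    static int reduce_scatter_nonoverlapping(int *sendbuf, int *recvbuf,
                                             int *recvcounts, MPI_Comm comm)
    {
        int rank, size, total = 0, *tmp = NULL, *displs = NULL;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        displs = malloc(size * sizeof(int));
        for (int i = 0; i < size; ++i) {
            displs[i] = total;
            total += recvcounts[i];
        }
        if (0 == rank) {
            tmp = malloc(total * sizeof(int));
        }

        /* Step 1: ordinary reduce of the full vector to the root. */
        MPI_Reduce(sendbuf, tmp, total, MPI_INT, MPI_SUM, 0, comm);

        /* Step 2: scatter the reduced pieces according to recvcounts. */
        MPI_Scatterv(tmp, recvcounts, displs, MPI_INT,
                     recvbuf, recvcounts[rank], MPI_INT, 0, comm);

        if (0 == rank) free(tmp);
        free(displs);
        return MPI_SUCCESS;
    }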
This commit was SVN r13929.
outstanding requests can be limited using MCA parameters.
The implementation passed Intel, IMB-3.2, and mpi_test_suite tests over
TCP and MX up to 128 processes (64 nodes), on both 32-bit and 64-bit machines.
It is not activated by default, but it should be useful for really large
communicator sizes.
This commit was SVN r13720.
It contains four algorithms:
Bruck (ceil(log P) steps), Recursive Doubling (log(P) steps for power-of-2 process counts), Ring (P-1 steps),
and Neighbor Exchange (P/2 steps for even number of processes).
All algorithms passed occ, IMB-2.3, and Intel verification tests from ompi-tests/ for up to 56 processes.
The fixed decision function is based on results collected over MX on the Grig cluster at
the University of Tennessee at Knoxville.
I have also added (and commented out) a copy of the MPICH2 decision function for allgather
(from their IJHPCA 2005 paper).
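For illustration, a fixed decision function of this shape boils down to a few switch points on communicator and message size (the thresholds and names below are invented; the real cut-offs come from the Grig measurements):

    /* Illustrative only: pick an allgather algorithm from the communicator
     * size and the total message size.  Thresholds are made up, not tuned. */
    #include <stddef.h>

    enum allgather_alg { ALG_BRUCK, ALG_RECURSIVE_DOUBLING, ALG_RING, ALG_NEIGHBOR };

    static enum allgather_alg choose_allgather(int comm_size, size_t msg_size)
    {
        int pow2 = (comm_size & (comm_size - 1)) == 0;

        if (msg_size < 1024) {                 /* small: latency-bound, log(P) steps */
            return pow2 ? ALG_RECURSIVE_DOUBLING : ALG_BRUCK;
        }
        if (msg_size < 65536 && 0 == comm_size % 2) {
            return ALG_NEIGHBOR;               /* medium: P/2 steps, even P only */
        }
        return ALG_RING;                       /* large: bandwidth-bound, P-1 steps */
    }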
This commit was SVN r12910.
size and displacement of a data-type. After this patch all data can contain size_t bytes
and the displacements are defined as ptrdiff_t. All of the files I was able to compile
have been modified to match this requirement.
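In practice (a hypothetical fragment, not taken from the patch) that means displacement arithmetic is done in ptrdiff_t and byte counts in size_t, so neither is truncated to int:

    /* Illustrative only: signed pointer differences for displacements,
     * unsigned byte counts for sizes. */
    #include <stddef.h>

    static void describe_block(char *base, char *elem,
                               size_t count, size_t elem_size,
                               ptrdiff_t *disp_out, size_t *bytes_out)
    {
        *disp_out  = elem - base;          /* may be negative or exceed 2 GB */
        *bytes_out = count * elem_size;    /* full object size in bytes      */
    }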
This commit was SVN r12146.
(1) As pointed out by Torsten after Jeff's comment yesterday that there are 15 collectives: nope, I have 16 but
miscounted them in my ifdefs (I had two #11s). Replaced the numbering with an enum.
(2) Added a readonly MCA param for how many backend algorithms are available per collective (used by benchmarker/STS)
This allowed me to remove the tuned query internal functions and replace them with ompi_coll_tuned_forced_max_algorithms[COLL].
(3) I was reading the user-forced MCA params for the collectives on each communicator create (module init), but then put the
values into a global set of variables (like ompi_coll_tuned_reduce_forced_algorithm).
To fix this and make the code neater:
(a) The component looks up the MCA param indices at component open, if dynamic_rules is set, via the
ompi_coll_tuned_COLLECTIVE_intra_check_forced_init() call.
(b) Got rid of the ompi_coll_tuned_COLLECTIVE_forced_algorithm/segmentsize/etc. globals in favor of a struct that
is now cached on the module data hung off the communicator (sketched below), i.e. done right.
(c) On module init, if dynamic rules are enabled, we call a general getvalues routine (in coll_tuned_forced.c) to fetch the
current values using the MCA param indices and then store them on the module's data segment.
A shorter version of getvalues exists for barrier, which only needs the algorithm choice.
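Roughly, the layout now looks like this (struct and field names are approximations for illustration, not the exact code):

    /* Illustrative only: per-collective forced settings cached on the module
     * data hung off each communicator, instead of process-wide globals. */
    struct forced_settings {
        int algorithm;      /* user-forced backend algorithm, 0 = let tuned decide */
        int segment_size;   /* forced segment size in bytes                        */
        int tree_fanout;    /* forced fan-in/fan-out, where the algorithm uses one */
        int chain_fanout;
    };

    struct tuned_module_data {
        struct forced_settings reduce;
        struct forced_settings allgather;
        struct forced_settings barrier;   /* barrier only ever uses .algorithm */
        /* ...one entry per collective... */
    };

    /* On module init (dynamic rules enabled) the getvalues routine reads the
     * registered MCA params by their saved indices and fills these structs. */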
This commit was SVN r9663.
- move files out of toplevel include/ and etc/, moving them into the
sub-projects
- rather than including config headers with <project>/include,
have them as <project>
- require all headers to be included with a project prefix, with
the exception of the config headers ({opal,orte,ompi}_config.h,
mpi.h, and mpif.h)
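For example (illustrative paths), a source file in ompi now includes headers like:

    /* Config headers and mpi.h/mpif.h keep their short names ... */
    #include "ompi_config.h"

    /* ... everything else is included with its project prefix. */
    #include "opal/util/output.h"
    #include "ompi/communicator/communicator.h"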
This commit was SVN r8985.
(apparently we've been doing this in opal and orte, but not in ompi
yet). All public symbols begin with "ompi_coll_tuned_" (not
mca_coll_tuned_) except the component struct. Now this component
passes the illegal symbol report with no hits.
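As a concrete illustration of the rule (declarations approximated here, not copied from the tree):

    /* Public symbols carry the ompi_coll_tuned_ prefix ... */
    int ompi_coll_tuned_barrier_intra_recursivedoubling(struct ompi_communicator_t *comm);

    /* ... while the exported component struct keeps its mca_coll_tuned_ name. */
    extern struct mca_coll_base_component_1_0_0_t mca_coll_tuned_component;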
This commit was SVN r8589.
Lots of misc fixes: printfs -> opal_output, handle fanin/fanout correctly for forced ops,
remove unused vars, correct the calculation of what 'msgsize' means for the decision functions
(it varies depending on the algorithm), etc.
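For reference, the output change amounts to something like this (a minimal before/after sketch; stream 0 is just the default output stream):

    /* Illustrative only: debug prints go through opal_output instead of
     * printf; a component can open its own verbose stream with
     * opal_output_open() instead of using stream 0. */
    #include "opal/util/output.h"

    static void report_choice(int alg)
    {
        /* before: printf("coll:tuned: using algorithm %d\n", alg); */
        opal_output(0, "coll:tuned: using algorithm %d", alg);
    }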
This commit was SVN r8113.