- Allreduce algorithms:
- Recursive doubling is used for small messages (up to 10KB) and can be used for
both commutative and non-commutative operations.
Recursive doubling passed OCC, IMB-3.2, Intel (Allreduce_c, Allreduce_loc_c, and
Allreduce_user_c), mpi_test_suite (Allreduce MIN/MAX, and Allreduce MIN/MAX with
MPI_IN_PLACE) tests on TCP up to 36 nodes and MX up to 64 nodes.
- The ring algorithm performs well for larger messages but cannot be used for
non-commutative operations. It passed the same tests as recursive doubling, except
for some of the non-commutative tests in the Intel benchmarks Allreduce_loc_c and
Allreduce_user_c (which was expected).
- MPI_Allreduce with new decision function passed all of the tests mentioned above.
- Cleaning up coll_tuned_util. Moving isendrecv to static inline just like sendrecv.
This commit was SVN r13252.
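
The recursive doubling exchange described above can be sketched as a single-process simulation. This is an illustration only, not the coll_tuned code: the function name is hypothetical, the operation is fixed to a sum, and the rank count is assumed to be a power of two (non-power-of-two cases need extra steps that are not shown).

```c
#include <stddef.h>

/* Hypothetical single-process simulation of recursive-doubling allreduce.
 * vals[r] holds rank r's contribution on input and the reduced result on
 * output. Returns the number of exchange steps, or -1 if nranks is not a
 * power of two (assumption: that case is simply not handled here). */
int sim_allreduce_recursive_doubling(long *vals, int nranks)
{
    if (nranks <= 0 || (nranks & (nranks - 1)) != 0) {
        return -1;
    }
    int steps = 0;
    for (int dist = 1; dist < nranks; dist <<= 1) {
        for (int r = 0; r < nranks; r++) {
            int peer = r ^ dist;              /* exchange partner at this step */
            if (r < peer) {
                /* Lower rank's data is the left operand; keeping that order
                 * on both sides is what lets the scheme work for
                 * non-commutative operations as well. */
                long combined = vals[r] + vals[peer];
                vals[r] = combined;
                vals[peer] = combined;
            }
        }
        steps++;
    }
    return steps;
}
```

With 4 ranks holding 1, 2, 3, 4, every rank ends up with 10 after 2 steps.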
- removing static qualification on ompi_coll_tuned_sendrecv
- adding ompi_coll_tuned_isendrecv function which posts isend and irecv requests
These changes are separate from but necessary for new algorithms I am working on.
This commit was SVN r13161.
- utilizing coll_tuned_util functions
- setting line length to 80.
This implementation uses standard send messages (instead of synchronous ones).
The change improved our performance over MX several times over; however,
there is a small chance that the last message sent can be delayed
(until the next MPI call, which means potentially indefinitely).
If this proves to be a problem, I will modify the algorithms to use a synchronous
send as the last operation (which will incur a performance penalty again).
This commit was SVN r13071.
- in the allgather algorithms I replaced the irecv-isend-waitall sequence with
a call to ompi_coll_tuned_sendrecv
- most of the functions in util code and allgather decision function conform to 80 character line width.
This commit was SVN r13069.
components that use configure.m4 for configuration or are always built.
The macro has not been needed since moving to configure types other than
configure.stub
Fixes trac:590
This commit was SVN r13031.
The following Trac tickets were found above:
Ticket 590 --> https://svn.open-mpi.org/trac/ompi/ticket/590
* Make sure that the pval always writes to the correct portion of the
lval. This only matters on 32 bit big endian machines.
* On 32 bit machines when assigning to pval, the other 4 bytes of lval
weren't being written, which could lead to bogus data
We use macros so that there aren't casts all over the code and the pval
assignment can occur to the correct 4 bytes. Refs trac:587
This commit was SVN r12974.
The following Trac tickets were found above:
Ticket 587 --> https://svn.open-mpi.org/trac/ompi/ticket/587
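
One way to picture the pval/lval problem is a union of a 64-bit integer and a pointer. The sketch below is not the macro from this commit: the type and macro names are hypothetical, and it takes the simpler route of always storing through the 64-bit member, so that all 8 bytes are written regardless of pointer width or byte order.

```c
#include <stdint.h>

/* Hypothetical stand-in for a pointer/integer union; not the Open MPI type. */
typedef union {
    uint64_t lval;
    void    *pval;
} sketch_ptr_t;

/* Store a pointer through lval so that all 8 bytes are written, whatever
 * the pointer width or endianness of the machine. */
#define SKETCH_PTR_SET(u, p)  ((u).lval = (uint64_t)(uintptr_t)(p))

/* Read the pointer back from the same 8-byte value. */
#define SKETCH_PTR_GET(u)     ((void *)(uintptr_t)(u).lval)
```

Assigning directly to pval on a 32-bit machine would touch only 4 of the 8 bytes, which is the stale-data case the commit message describes.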
It contains four algorithms:
Bruck (ceil(log(P)) steps), Recursive Doubling (log(P) steps for a power-of-2 number of processes), Ring (P-1 steps),
and Neighbor Exchange (P/2 steps for an even number of processes).
All algorithms passed occ, IMB-2.3, and intel verification tests from ompi-tests/ for up to 56 processes.
The fixed decision function is based on results collected over MX on the Grig cluster at
the University of Tennessee at Knoxville.
I have also added (and commented out) copy of MPICH2 decision function for allgather
(from their IJHPCA 2005 paper).
This commit was SVN r12910.
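
The step counts quoted for the four allgather algorithms can be written down as small helpers. This is a sketch with hypothetical function names; it assumes the logarithms are base 2 and returns -1 where the message above says an algorithm needs a power-of-2 or even process count.

```c
/* Smallest k with 2^k >= p, i.e. ceil(log2(p)), for 1 <= p <= 2^30. */
static int ceil_log2(int p)
{
    int k = 0;
    while ((1 << k) < p) {
        k++;
    }
    return k;
}

/* Bruck: ceil(log2(P)) steps for any P. */
int steps_bruck(int p) { return ceil_log2(p); }

/* Recursive doubling: log2(P) steps, power-of-2 P only (-1 otherwise). */
int steps_recursive_doubling(int p)
{
    return (p & (p - 1)) == 0 ? ceil_log2(p) : -1;
}

/* Ring: P-1 steps for any P. */
int steps_ring(int p) { return p - 1; }

/* Neighbor exchange: P/2 steps, even P only (-1 otherwise). */
int steps_neighbor_exchange(int p)
{
    return (p % 2 == 0) ? p / 2 : -1;
}
```

For the 56-process runs mentioned above, these give 6 steps for Bruck, 55 for ring, and 28 for neighbor exchange, while recursive doubling in its plain form does not apply.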
Accordingly, there are new APIs to the name service to support the ability to get a job's parent, root, immediate children, and all its descendants. In addition, the terminate_job, terminate_orted, and signal_job APIs for the PLS have been modified to accept attributes that define the extent of their actions. For example, doing a "terminate_job" with an attribute of ORTE_NS_INCLUDE_DESCENDANTS will terminate the given jobid AND all jobs that descended from it.
I have tested this capability on a MacBook under rsh, Odin under SLURM, and LANL's Flash (bproc). It worked successfully on non-MPI jobs (both simple and including a spawn), and MPI jobs (again, both simple and with a spawn).
This commit was SVN r12597.
- consistent arguments checking (not allowing to select an algorithm which
is not available)
- consistent way of computing the segcount (number of datatypes by segment).
- small cleanups.
- more informative debugging messages.
This commit was SVN r12545.
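
As a rough illustration of a segcount computation, the helper below turns a segment size in bytes into a number of datatype elements per segment. The function name and the edge-case rules are assumptions made for this sketch (a segment size of 0, or one smaller than a single element, is treated as "no segmentation"); the actual coll_tuned code may round differently.

```c
#include <stddef.h>

/* Hypothetical helper: number of datatype elements per segment.
 * segsize  - requested segment size in bytes (0 = no segmentation, assumed)
 * typesize - size of one datatype element in bytes
 * count    - total number of elements in the message */
size_t sketch_segcount(size_t segsize, size_t typesize, size_t count)
{
    if (typesize == 0 || count == 0) {
        return count;
    }
    if (segsize < typesize) {
        return count;               /* cannot split an element: one segment */
    }
    size_t segcount = segsize / typesize;
    return segcount < count ? segcount : count;
}
```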
description. Most of the bcast algorithms can be completed using this
generic function once we create the tree structure. Add all kinds of
trees.
There are 2 versions of the generic bcast function: one that overlaps
receives (for intermediary nodes) and then uses blocking sends to all
children, and another where all sends are non-blocking. I still have to
figure out which one gives the smallest overhead.
This commit was SVN r12530.
N gatherv's:
for (i = 0 ... size)
MPI_Gatherv(..., root = i, ...)
The new algorithm simply does (effectively):
MPI_Gatherv(..., root = 0, ...)
MPI_Bcast(..., root = 0, ...)
This commit was SVN r12469.
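
The gather-then-broadcast pattern above can be simulated in one process to show that every rank ends up with every rank's data. This is a sketch with a hypothetical function name and plain memcpy standing in for the two MPI calls; the element type is fixed to int for brevity.

```c
#include <string.h>

/* Hypothetical single-process simulation.
 * send[r]  - rank r's contribution, counts[r] ints long
 * recv[r]  - rank r's receive buffer, sized for the sum of all counts */
void sim_gather_then_bcast(int nranks, const int *const *send,
                           const int *counts, int **recv)
{
    /* Step 1, standing in for MPI_Gatherv(root = 0): rank 0 packs all
     * contributions in rank order. */
    size_t offset = 0;
    for (int r = 0; r < nranks; r++) {
        memcpy(recv[0] + offset, send[r], (size_t)counts[r] * sizeof(int));
        offset += (size_t)counts[r];
    }
    /* Step 2, standing in for MPI_Bcast(root = 0): every other rank gets
     * a copy of rank 0's packed buffer. */
    for (int r = 1; r < nranks; r++) {
        memcpy(recv[r], recv[0], offset * sizeof(int));
    }
}
```

Compared with one gatherv per root, this version performs two collective operations in total instead of N.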
allocation logic is completely done outside the data-type engine (in the PML) there is
no need for any special case inside the data-type engine. There are fewer arguments for
ompi_convertor_pack and ompi_convertor_unpack as well (the last field, free_after, is
no longer required, as no memory is allocated in the engine itself). This change
affects all components using datatypes. I tested most of them, but I might have
missed some ... If that's the case, please let me know (don't shoot the pianist!!).
This commit was SVN r12331.
the default decision functions (for broadcast, reduce and barrier) are based on a
high performance network (not TCP). It should give good performance (really good) for
any network having the following characteristics: small latency (5 microseconds) and good
bandwidth (more than 1Gb/s).
+ Cleanup of the reduce algorithms, plus 2 new algorithms (binary and binomial). Now most
of the reduce algorithms use a generic tree based function for completing the reduce.
+ Added macros for computing the trees (they are used for bcast and reduce right now).
+ Allow the usage of all 5 topologies.
+ Jelena's implementation of a binary tree that can be used for non-commutative operations.
Right now only the tree building function is there; it will get activated soon.
+ Some other minor cleanups.
This commit was SVN r12326.
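
To make the tree computation concrete, here is a sketch of parent and child lookup in a binomial tree, one of the topologies named above. The function names and the numbering convention (parent obtained by clearing the lowest set bit of the root-relative rank) are assumptions for illustration; the macros in this commit may lay the tree out differently.

```c
/* Parent of `rank` in a binomial tree over `size` ranks rooted at `root`.
 * Returns -1 for the root itself. */
int sketch_bmtree_parent(int rank, int root, int size)
{
    int v = (rank - root + size) % size;   /* rank relative to the root */
    if (v == 0) {
        return -1;
    }
    int pv = v & (v - 1);                  /* clear the lowest set bit */
    return (pv + root) % size;
}

/* Writes the children of `rank` into children[] and returns how many there
 * are. children[] needs room for ceil(log2(size)) entries. */
int sketch_bmtree_children(int rank, int root, int size, int *children)
{
    int v = (rank - root + size) % size;
    int n = 0;
    for (int bit = 1; bit < size; bit <<= 1) {
        if (v & bit) {
            break;                         /* reached v's lowest set bit */
        }
        int child = v | bit;
        if (child < size) {
            children[n++] = (child + root) % size;
        }
    }
    return n;
}
```

With 8 ranks and root 0, rank 0 has children 1, 2 and 4, and rank 7's parent is rank 6.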
all platforms. The only exceptions (and I will not deal with them
anytime soon) are on Windows:
- the write functions which require the length to be an int when it's
a size_t on all UNIX variants.
- all iovec manipulation functions where the iov_len is again an int
when it's a size_t on most of the UNIXes.
As these only happen on Windows, I think we're set for now :)
This commit was SVN r12215.