openmpi/ompi/mca/coll
Jelena Pjesivac-Grbovic 3eac49aa59 Adding flow control for leaf nodes in generalized reduce structure.
This feature is disabled by default, so it should not affect current performance.

When the message size is large and the segment size is smaller than the eager size of a particular interface,
the leaf nodes in the generalized reduce function can flood their parent nodes by sending all segments without
any synchronization. This can leave the parent with a very large number of unexpected messages (consider a 16MB
message with 1KB segments, for example). In the binomial algorithm the root always has at least one child
that is a leaf, so this can significantly degrade the root's performance, especially in large communicators
where the root may have quite a few children (in a binomial tree, for example).
When the segment size is larger than the eager size, the rendezvous protocol already prevents this,
so flow control is not needed in that case.
The problem was originally exposed by the "infinite" cleanup time of the bucket allocator for "small" segment
sizes (which may explain some of the "deadlocks" seen in the Thunderbird tests).
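To put the 16MB/1KB example in numbers, the small helper below counts the segments a leaf sends. It is hypothetical and not an Open MPI function. Assuming binary units (16MB = 16 * 1024 * 1024 bytes, 1KB = 1024 bytes), the count is 16384. When those segments go out eagerly with no synchronization, potentially all of them can reach the parent before it posts the matching receives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): number of segments a leaf
 * sends when a message of msg_bytes is split into seg_bytes pieces.  The last
 * segment may be partial, hence the ceiling division. */
static size_t segment_count(size_t msg_bytes, size_t seg_bytes)
{
    assert(seg_bytes > 0); /* a zero segment size would divide by zero */
    return (msg_bytes + seg_bytes - 1) / seg_bytes;
}
```

For example, segment_count((size_t)16 * 1024 * 1024, 1024) yields 16384 segments per leaf child.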

To prevent this, the user can set the MCA parameter "--mca coll_tuned_reduce_algorithm_max_requests NUM",
which limits the number of outstanding messages from a leaf node to its parent in the generalized reduce to NUM.
Messages are sent as non-blocking synchronous messages, so synchronization happens at "wait" time.
The synchronization actually improved the performance of the pipeline and binomial algorithms for large messages
with 1KB segments over MX, but I need to test it more to make sure the improvement is consistent.
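The throttling described above amounts to a bounded window of outstanding requests. The sketch below is a counting model of that window. It is written under assumptions and makes no MPI calls: `flow_stats` and `leaf_send_segments` are hypothetical names, and the comments use the standard MPI analogues (MPI_Issend, MPI_Wait, MPI_Waitall) rather than the internal PML calls the tuned component actually uses. Request slots are reused round-robin. Once all slots are busy, the leaf must wait on its oldest synchronous send before it can post the next segment.

```c
#include <assert.h>
#include <stddef.h>

/* Counters produced by the model below. */
typedef struct {
    size_t max_in_flight; /* most sends ever pending at the same time */
    size_t window_waits;  /* waits forced because the window was full */
} flow_stats;

/* Hypothetical counting model (no MPI calls) of the leaf-side throttling:
 * each segment is posted as a non-blocking synchronous send, which in MPI
 * would be MPI_Issend and cannot complete until the parent matches it.
 * Request slots are reused round-robin, so when all max_requests slots are
 * busy the leaf first waits on the slot holding its oldest send. */
static flow_stats leaf_send_segments(size_t nseg, size_t max_requests)
{
    flow_stats st = {0, 0};
    size_t in_flight = 0;

    assert(max_requests > 0);
    for (size_t seg = 0; seg < nseg; ++seg) {
        if (in_flight == max_requests) {
            /* Slot seg % max_requests still holds segment seg - max_requests,
             * the oldest pending send: wait on it (MPI_Wait) before reuse. */
            --in_flight;
            ++st.window_waits;
        }
        ++in_flight; /* post the send for this segment into the freed slot */
        if (in_flight > st.max_in_flight)
            st.max_in_flight = in_flight;
    }
    /* Any sends still pending would be completed here (MPI_Waitall). */
    return st;
}
```

With 16384 segments and NUM = 4, the model never has more than 4 sends pending, and it waits 16380 times inside the loop. Each of those waits is a point where the leaf cannot run ahead of its parent. Without the window, all 16384 sends could be pending at once.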

Since there is no easy way to find out the eager size of a particular BTL, I set the limit to 4000B.
If the message or individual segment size is greater than 4000B, this feature is not used. This limit may
or may not be exposed as an MCA parameter later.
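The resulting rule can be sketched as a single predicate. Everything in this sketch is an assumption: the constant name `LEAF_FLOW_CONTROL_MAX_BYTES` and the function `use_leaf_flow_control` are hypothetical, and it assumes a parameter value of 0 means "disabled" (consistent with the feature being off by default). Flow control applies only when it is enabled and each individual send (a segment, or the whole message if it is not segmented) is no larger than 4000 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name for the hard-coded 4000-byte limit described above. */
#define LEAF_FLOW_CONTROL_MAX_BYTES 4000u

/* Sketch of the decision a leaf might make.  send_bytes is the size of each
 * individual send: the segment size, or the whole message if it is not
 * segmented.  max_requests is the value of the MCA parameter; this sketch
 * assumes 0 means "disabled" (the default) and treats negative values as a
 * caller error. */
static bool use_leaf_flow_control(size_t send_bytes, int max_requests)
{
    assert(max_requests >= 0);
    return max_requests > 0 && send_bytes <= LEAF_FLOW_CONTROL_MAX_BYTES;
}
```

Sends larger than the limit are assumed to use rendezvous, which already synchronizes leaf and parent, so the predicate leaves them unthrottled.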

I did not have any problems running it, and both the "default" and "synchronous" variants passed the Intel Reduce*
tests with up to 80 processes (over MX).

This commit was SVN r14518.
2007-04-25 20:39:53 +00:00
base Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
basic Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
demo Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
hierarch Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
inter Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
libnbc Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
self Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
sm Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
tuned Adding flow control for leaf nodes in generalized reduce structure. 2007-04-25 20:39:53 +00:00
coll.h Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
Makefile.am fixes suggested by Ralf for supporting both Libtool 1 and 2 in Open MPI... 2005-12-19 03:10:23 +00:00