openmpi/ompi/mca
Jelena Pjesivac-Grbovic 3eac49aa59 Adding flow control for leaf nodes in generalized reduce structure.
This "feature" is disabled by default and it should not affect the current performance.

When the message is large and the segment size is smaller than the eager size for a particular interface,
the leaf nodes in the generalized reduce function can flood their parent nodes by sending all segments without
any synchronization.  This can leave the parent with a very large number of unexpected messages (think of a 16MB
message with 1KB segments, for example).  In the binomial algorithm the root node always has at least one
child that is a leaf, so this can affect the root's performance significantly, especially in
large communicators where the root may have quite a few children.
When the segment size is bigger than the eager size, the rendezvous protocol ensures that this does
not happen, so flow control is not necessary.
The problem was originally exposed as a seemingly "infinite" bucket allocator cleanup time for "small" segment sizes
(which may explain some "deadlocks" on Thunderbird tests).
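
To make the scale of the problem concrete, the following is a minimal sketch of the segment arithmetic behind the 16MB / 1KB example. The helper name segment_count is hypothetical and is not part of the coll tuned code; it only shows how many unsynchronized eager sends a single leaf could post toward its parent.

```c
#include <stddef.h>

/* Hypothetical helper (not an Open MPI function): number of segments
 * needed to send message_bytes in pieces of segment_bytes, rounding up.
 * segment_bytes must be non-zero. */
static size_t segment_count(size_t message_bytes, size_t segment_bytes)
{
    return (message_bytes + segment_bytes - 1) / segment_bytes;
}
```

For a 16MB message (16 * 1024 * 1024 bytes) and 1KB (1024 byte) segments this gives 16384 segments, all of which a leaf could send eagerly before the parent has posted matching receives.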

To prevent this, the user can specify the mca parameter "--mca coll_tuned_reduce_algorithm_max_requests NUM",
which limits the number of outstanding messages from a leaf node to its parent in generalized reduce to NUM.
Messages are sent as non-blocking synchronous messages, so synchronization happens at "wait" time.
The synchronization actually improved the performance of the pipeline and binomial algorithms for large message sizes
with 1KB segments over MX, but I need to test it some more to make sure it is consistent.
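
The effect of the limit can be sketched with a small model of the leaf's send loop. This is an illustration under stated assumptions, not the code from this commit: the function name peak_outstanding is hypothetical, a max_requests of 0 is assumed to mean "no limit", and the "wait" step stands in for waiting on the oldest non-blocking synchronous send (in plain MPI terms, an MPI_Issend followed later by MPI_Wait).

```c
#include <stddef.h>

/* Hypothetical model of a leaf sending num_segments to its parent.
 * Returns the peak number of sends in flight at any one time.
 * Assumption: max_requests == 0 means flow control is disabled. */
static size_t peak_outstanding(size_t num_segments, size_t max_requests)
{
    size_t in_flight = 0;
    size_t peak = 0;

    for (size_t i = 0; i < num_segments; i++) {
        if (max_requests > 0 && in_flight == max_requests) {
            /* Window is full: wait for the oldest synchronous send,
             * which completes only once the parent has matched it. */
            in_flight--;
        }
        in_flight++; /* post the next non-blocking synchronous send */
        if (in_flight > peak) {
            peak = in_flight;
        }
    }
    return peak;
}
```

In this model, 16384 segments with no limit leave up to 16384 sends in flight, while a limit of 4 caps the number at 4 because the leaf cannot post a fifth send until the parent has matched an earlier one.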

Since there is no easy way to find out the "eager" size for a particular btl, I set the limit to 4000B.
If the message/individual segment size is greater than 4000B, we will not use this feature.  This variable may
or may not be exposed as an mca parameter later...
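
The decision described above can be sketched as a single predicate. The names EAGER_LIMIT_GUESS and use_leaf_flow_control are hypothetical, and treating a non-positive max_requests as "feature off" is an assumption of this sketch; only the 4000B threshold comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Fixed stand-in for the unknown per-btl eager size, in bytes. */
#define EAGER_LIMIT_GUESS 4000

/* Hypothetical predicate: apply leaf flow control only when the user
 * asked for it (max_requests > 0, an assumption of this sketch) and the
 * segment is small enough that it would likely be sent eagerly. */
static bool use_leaf_flow_control(size_t segment_bytes, int max_requests)
{
    return max_requests > 0 && segment_bytes <= EAGER_LIMIT_GUESS;
}
```

With 1KB segments and a limit of 8 the predicate is true; with segments larger than 4000B it is false, since the rendezvous protocol is expected to provide the synchronization in that case.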

I did not have any problems running it, and both the "default" and "synchronous" runs passed the Intel Reduce* tests
up to 80 processes (over MX).

This commit was SVN r14518.
2007-04-25 20:39:53 +00:00
allocator Add some comments on the internals of the bucket structure. Alter the cleanup 2007-04-17 20:43:30 +00:00
bml make sure not to go out of bounds. element i+1 of bml_btls 2007-04-22 21:43:34 +00:00
btl Cosmetics. Brian fixes my crappy code and I fix the curly braces. 2007-04-25 20:17:19 +00:00
coll Adding flow control for leaf nodes in generalized reduce structure. 2007-04-25 20:39:53 +00:00
common This fixes the initialization of the usable size of the shared memory. 2007-03-07 13:28:06 +00:00
crcp Re-worked the implementation of the LAM-like coord component. 2007-04-21 20:35:01 +00:00
io A minor change to ROMIO's configure script: make it use exactly the 2007-04-17 03:10:06 +00:00
mpool Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
mtl Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
osc Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
pml Per a developer request - 2007-04-24 17:08:48 +00:00
rcache Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00
topo Just like r14289 on the ORTE trunk: 2007-04-12 11:19:42 +00:00