openmpi/orte/mca/grpcomm
Ralph Castain 6166278e18 Improve the scalability of the modex operation and fix a bug reported by Tim P
The bug was a race condition in the barrier operation that caused the barrier in MPI_Finalize to fail on very short programs.

Scalability was improved by using the daemons to aggregate modex and barrier messages before sending them to the rank=0 proc. The improvement is proportional to ppn (processes per node), of course, but there really wasn't a scaling problem at low ppn anyway. This modification also paves the way for better allgather operations: all the data for each node now sits at the daemon level, and the daemons are now aware that a collective operation on the OOB is underway (so they -can- participate in a collective of their own to support it).

Also added better diagnostics to map out the timing associated with MPI_Init - turned on by -mca orte_timing 1.

This commit was SVN r17988.
2008-03-27 15:17:53 +00:00
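The aggregation described in this commit can be modelled as a simple message count. The sketch below is a hypothetical illustration, not ORTE code: `modex_entry_t`, `direct_messages` and `aggregated_messages` are invented names. It assumes each proc contributes exactly one modex entry and each node runs one daemon. In the direct scheme rank 0 receives one message per proc. In the daemon scheme each daemon packs its ppn local entries into one bundle and sends a single message, so the number of messages into rank 0 drops by a factor of ppn.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of modex traffic into rank 0. These names are
 * illustrative only and are NOT part of the ORTE grpcomm API.
 */

/* One modex entry: stand-in for the data a single proc publishes. */
typedef struct {
    int rank;
    int payload;
} modex_entry_t;

/* Direct scheme: every proc sends its own entry to rank 0. */
static size_t direct_messages(size_t nodes, size_t ppn)
{
    return nodes * ppn;
}

/*
 * Daemon scheme: the daemon on each node packs the entries of its ppn
 * local procs into one contiguous bundle and sends that bundle to rank 0
 * as a single message. 'out' must hold nodes * ppn entries; the return
 * value is the number of messages rank 0 receives.
 */
static size_t aggregated_messages(size_t nodes, size_t ppn, modex_entry_t *out)
{
    assert(ppn > 0);
    size_t msgs = 0;
    for (size_t n = 0; n < nodes; n++) {
        modex_entry_t *bundle = &out[n * ppn];   /* this daemon's bundle */
        for (size_t p = 0; p < ppn; p++) {
            bundle[p].rank = (int)(n * ppn + p); /* global rank of the proc */
            bundle[p].payload = (int)n;          /* stand-in data: node id */
        }
        msgs++;                                  /* one send per daemon */
    }
    return msgs;
}
```

For 16 nodes at 8 ppn, the direct scheme delivers 128 messages to rank 0 and the daemon scheme delivers 16 while carrying the same 128 entries. At ppn=1 both counts are equal, which matches the observation that there was no real scaling problem at low ppn.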
base Improve the scalability of the modex operation and fix a bug reported by Tim P 2008-03-27 15:17:53 +00:00
basic Improve the scalability of the modex operation and fix a bug reported by Tim P 2008-03-27 15:17:53 +00:00
cnos Fix the allgather and allgather_list functions to avoid deadlocks at large node/proc counts. Violated the RML rules here - we received the allgather buffer and then did an xcast, which causes a send to go out, and is then subsequently received by the sender. This fix breaks that pattern by forcing the recv to complete outside of the function itself - thus, the allgather and allgather_list always complete their recvs before returning or sending. 2008-03-24 20:50:31 +00:00
configure.m4 Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
grpcomm_types.h Fix xcast so it works in multi-node situations where the user specifies a particular mode to use (e.g., direct). 2008-03-03 20:07:02 +00:00
grpcomm.h Fix the allgather and allgather_list functions to avoid deadlocks at large node/proc counts. Violated the RML rules here - we received the allgather buffer and then did an xcast, which causes a send to go out, and is then subsequently received by the sender. This fix breaks that pattern by forcing the recv to complete outside of the function itself - thus, the allgather and allgather_list always complete their recvs before returning or sending. 2008-03-24 20:50:31 +00:00
Makefile.am Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
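The allgather deadlock described in the cnos and grpcomm.h entries can be illustrated with a toy single-threaded message engine. This is a hypothetical model, not the RML or grpcomm API: `post_send`, `progress`, `allgather` and the other names are invented. It assumes a non-reentrant progress engine that cannot deliver a message while a receive callback is still running. In the broken pattern the callback issues the xcast and then waits for its own copy, which can never arrive. In the fixed pattern the callback only records the buffer, and the recv is completed outside it before anything is sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy single-threaded message engine used only to illustrate the pattern.
 * None of these names belong to the ORTE RML or grpcomm APIs.
 */
#define QCAP 8

typedef void (*recv_cb_t)(int msg);

static int queue[QCAP];
static size_t q_head, q_tail;
static recv_cb_t on_recv;
static bool in_callback;

/* Queue a message addressed to this process (it is its own recipient). */
static void post_send(int msg)
{
    assert(q_tail - q_head < QCAP);   /* toy queue must not overflow */
    queue[q_tail++ % QCAP] = msg;
}

/* Deliver at most one queued message. Not reentrant: when called from
 * inside a receive callback it delivers nothing and returns false. */
static bool progress(void)
{
    if (in_callback || q_head == q_tail)
        return false;
    int msg = queue[q_head++ % QCAP];
    in_callback = true;
    on_recv(msg);
    in_callback = false;
    return true;
}

static bool gather_done, xcast_done, xcast_in_callback, stalled;
static int gathered, echoed;

static void allgather_cb(int msg)
{
    if (!gather_done) {
        gathered = msg;
        gather_done = true;
        if (xcast_in_callback) {
            /* Broken pattern: xcast from inside the callback, then wait for
             * our own copy. progress() cannot deliver it from here, so real
             * code would block forever; the model records the stall. */
            post_send(gathered);
            if (!progress())
                stalled = true;
        }
    } else {
        echoed = msg;                 /* our own copy of the xcast arrived */
        xcast_done = true;
    }
}

/* Returns the value received back from the xcast, or -1 on deadlock. */
static int allgather(int contribution, bool broken)
{
    q_head = q_tail = 0;
    on_recv = allgather_cb;
    gather_done = xcast_done = stalled = false;
    xcast_in_callback = broken;

    post_send(contribution);          /* the gathered data arrives */
    while (!gather_done)              /* complete the recv outside the callback */
        if (!progress())
            return -1;
    if (stalled)
        return -1;
    if (!broken)
        post_send(gathered);          /* fixed pattern: xcast after the recv */
    while (!xcast_done)
        if (!progress())
            return -1;
    return echoed;
}
```

`allgather(42, false)` completes and returns 42, while `allgather(42, true)` reports the stall with -1. The fix works because the only work done inside the callback is bookkeeping, so every send happens after the recv has fully completed.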