openmpi/orte/mca/plm
Ralph Castain 6166278e18 Improve the scalability of the modex operation and fix a bug reported by Tim P
The bug was a race condition in the barrier operation that caused the barrier in MPI_Finalize to fail on very short programs.

Scalability was improved by using the daemons to aggregate modex and barrier messages before sending them to the rank=0 proc. Improvement is proportional to ppn, of course, but there really wasn't a scaling problem at low ppn anyway. This modification also paves the way for better allgather operations since now all the data for each node is sitting at the daemon level, and the daemons are now aware that a collective operation on the OOB is underway (so they -can- participate in a collective of their own to support it).
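
The aggregation idea can be illustrated with a minimal sketch. This is not the ORTE implementation: the type `node_aggregator_t`, the functions `aggregator_init`, `aggregator_add`, `messages_direct` and `messages_aggregated`, and the buffer limits are all hypothetical names chosen for illustration. The sketch assumes each daemon knows how many local procs (ppn) it hosts, collects one contribution from each, and forwards a single combined buffer once all of them have reported.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOCAL_PROCS 64   /* illustrative limit, not an ORTE constant */
#define MAX_PAYLOAD     256  /* illustrative per-proc payload limit */

/* Hypothetical per-node aggregator held by a daemon (not an ORTE type). */
typedef struct {
    int    expected;  /* number of local procs on this node (ppn) */
    int    received;  /* contributions collected so far */
    size_t used;      /* bytes currently stored in buf */
    char   buf[MAX_LOCAL_PROCS * MAX_PAYLOAD];
} node_aggregator_t;

static void aggregator_init(node_aggregator_t *agg, int ppn)
{
    assert(ppn > 0 && ppn <= MAX_LOCAL_PROCS);
    agg->expected = ppn;
    agg->received = 0;
    agg->used = 0;
}

/* Append one local proc's contribution.
 * Returns 1 when every local proc has reported (the combined buffer can be
 * forwarded as a single message), 0 while still waiting, and -1 if the
 * contribution is unexpected or does not fit. */
static int aggregator_add(node_aggregator_t *agg, const void *data, size_t len)
{
    if (agg->received >= agg->expected || agg->used + len > sizeof(agg->buf)) {
        return -1;
    }
    memcpy(agg->buf + agg->used, data, len);
    agg->used += len;
    agg->received++;
    return agg->received == agg->expected;
}

/* Messages arriving at the rank=0 proc when every proc sends directly. */
static int messages_direct(int nnodes, int ppn)
{
    return nnodes * ppn;
}

/* Messages arriving at the rank=0 proc when each daemon sends one
 * aggregated message for its node. */
static int messages_aggregated(int nnodes, int ppn)
{
    (void)ppn;  /* per-node count no longer affects the message count */
    return nnodes;
}
```

Under these assumptions the inbound message count at the rank=0 proc drops from nnodes * ppn to nnodes, which is the factor-of-ppn improvement described above.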

Also added better diagnostics to map out the timing associated with MPI_Init - turned on by -mca orte_timing 1.

This commit was SVN r17988.
2008-03-27 15:17:53 +00:00
..
alps Add default hostfile parameter plus --default-hostfile command line option. 2008-03-05 04:54:57 +00:00
base Maintain the mapping bookmark across multiple comm_spawns 2008-03-27 00:19:13 +00:00
ccp Select the windows CCP component at runtime by testing if we are on Windows cluster. 2008-03-07 01:31:53 +00:00
gridengine Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. 2008-03-23 23:10:15 +00:00
lsf Add default hostfile parameter plus --default-hostfile command line option. 2008-03-05 04:54:57 +00:00
md Just turn these off for now - will revisit later 2008-03-20 13:25:35 +00:00
process Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. 2008-03-23 23:10:15 +00:00
rsh Don't forget to cleanup once we're done. 2008-03-25 22:42:24 +00:00
slurm Don't forget to cleanup once we're done. 2008-03-25 22:42:24 +00:00
slurmd Bring some sanity to the exit code returned by mpirun. Ensure that we provide a non-zero code if something goes wrong, including someone exiting after calling mpi_init without calling mpi_finalize. 2008-03-19 19:00:51 +00:00
submit Remove the obsolete and largely unused orte_system_info structure. The only fields that were used in that struct were nodeid and nodename - these have been transferred to the orte_process_info structure. 2008-03-23 23:10:15 +00:00
tm Improve the scalability of the modex operation and fix a bug reported by Tim P 2008-03-27 15:17:53 +00:00
tmd Just turn these off for now - will revisit later 2008-03-20 13:25:35 +00:00
xgrid Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
Makefile.am Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
plm_types.h Bring some sanity to the exit code returned by mpirun. Ensure that we provide a non-zero code if something goes wrong, including someone exiting after calling mpi_init without calling mpi_finalize. 2008-03-19 19:00:51 +00:00
plm.h Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00