openmpi/orte/mca/ess
George Bosilca addaf7aaf8 Repair the tree spawn. The problem seems to come from the fact
that the HNP now sends its messages using the routed component. In the case
of tree spawn, when an intermediary node spawns a child it doesn't know how
to forward a message to it, so when the node-map message arrives from
the HNP (as there is nothing yet in the contact/routing table) the message
is sent back the way it came. As a result the node-map message keeps bouncing
between the HNP and the first-level orteds.

The solution is to add a new option for the children, orte_parent_uri, which
is only set when the orted is _not_ directly spawned by the HNP. When this
option is present on the argument list, the orted adds the parent to
its routes and forces the parent to update its own routes (by sending the URI).
With this approach, the routing tree is built at the same time as the processes
are spawned, and all messages from the HNP can be routed to the leaves.
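
The routing behaviour described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not ORTE code: the `Daemon` class, `register_with_parent`, `send`, `build`, and the node names are all hypothetical, and "sent back the way it came" is approximated as "forwarded to the parent when no route is known".

```python
class Daemon:
    """Toy stand-in for the HNP or an orted (hypothetical, not an ORTE type)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.routes = {}  # destination name -> next-hop Daemon

    def register_with_parent(self):
        # Models the effect of the parent-URI option: the child records its
        # parent, and the parent learns a direct route to the child.
        self.routes[self.parent.name] = self.parent
        self.parent.routes[self.name] = self


def send(start, dest_name, max_hops=6):
    """Forward a message hop by hop; return (visited names, delivered?)."""
    node = start
    path = [node.name]
    for _ in range(max_hops):
        if node.name == dest_name:
            return path, True
        # Assumption: with no known route, the message goes back to the parent.
        node = node.routes.get(dest_name, node.parent)
        if node is None:
            return path, False
        path.append(node.name)
    return path, node.name == dest_name


def build(with_parent_uri):
    """HNP spawns orted-1, which in turn spawns orted-2 (tree spawn)."""
    hnp = Daemon("HNP")
    first = Daemon("orted-1", parent=hnp)
    leaf = Daemon("orted-2", parent=first)
    hnp.routes["orted-1"] = first
    hnp.routes["orted-2"] = first  # the HNP reaches the leaf via orted-1
    if with_parent_uri:
        leaf.register_with_parent()
    return hnp


if __name__ == "__main__":
    print(send(build(False), "orted-2"))  # bounces between HNP and orted-1
    print(send(build(True), "orted-2"))   # reaches the leaf
```

In this model, without the registration step the message alternates between the HNP and the first-level daemon until the hop limit is hit; with it, the message reaches the leaf in two hops.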

However, this is far from an optimal solution. Right now, this so-called tree
spawn only spawns the children in a tree, without doing anything about the
"connect back to the HNP" step. The HNP is flooded with reports from all the
orteds. The total number of messages is higher than in the non-tree startup
scheme, so we do not expect this approach to be scalable in its current
incarnation. A complete overhaul of the tree startup is required in order
to improve the scalability. Stay tuned!

This commit was SVN r21504.
2009-06-23 22:10:25 +00:00
..
alps We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
base Repair the tree spawn. The problem seems to come from the fact 2009-06-23 22:10:25 +00:00
bproc Add the proc_get_daemon capability to the bproc launcher. 2009-06-23 20:21:55 +00:00
cm We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
cnos Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans. 2009-05-04 11:07:40 +00:00
env We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
hnp We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
lsf We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
portals_utcp Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans. 2009-05-04 11:07:40 +00:00
singleton We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
slave Add the ability to directly launch procs via rsh/ssh. Collect common functions in plm/base. Create a new global param to set assume_same_shell, alias'd back to plm_rsh_assume_same_shell (not deprecated). 2009-05-30 01:10:25 +00:00
slurm We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
slurmd We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
tm We know what a daemon is there is no need to dig into the nidmap to find it out. 2009-06-23 20:43:45 +00:00
tool Take the next step towards fully utilizing static ports for the daemons to eliminate the initial "phone home" to mpirun by modifying the orted termination procedure to eliminate the need for a full barrier-like operation. Instead, we add a "onesided" barrier to the grpcomm framework API that releases the orted once it has completed its own contribution to the barrier - i.e., the orteds now exit as the "ack" message rolls up towards mpirun instead of sending the "ack" directly to mpirun. 2009-05-11 14:11:44 +00:00
configure.m4 Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
ess.h Modify the orte_process_info structure to handle a broader range of process types by replacing the individual booleans with a 32-bit bitmap. Use a set of #define's to define the individual bits, and a set of matching macros to test for them. Update the orte code base to use the macros instead of the booleans. 2009-05-04 11:07:40 +00:00
Makefile.am Merge the ORTE devel branch into the main trunk. Details of what this means will be circulated separately. 2008-02-28 01:57:57 +00:00
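
The cnos, portals_utcp, and ess.h entries above describe replacing individual booleans in orte_process_info with a 32-bit bitmap, with one define per bit and a matching test macro. The general pattern might look like the sketch below. The flag names, bit values, and helper functions are illustrative assumptions, not the actual ORTE definitions (which are C #defines and macros).

```python
# Hypothetical bit assignments, one bit per process type.
PROC_SINGLETON = 0x0001
PROC_DAEMON = 0x0002
PROC_HNP = 0x0004
PROC_TOOL = 0x0008

MASK_32 = 0xFFFFFFFF  # keep the value within 32 bits, as a C uint32_t would


def set_type(proc_type, flag):
    """Return proc_type with the given flag bit set."""
    return (proc_type | flag) & MASK_32


# Test helpers playing the role of the "matching macros".
def is_daemon(proc_type):
    return bool(proc_type & PROC_DAEMON)


def is_hnp(proc_type):
    return bool(proc_type & PROC_HNP)


def is_tool(proc_type):
    return bool(proc_type & PROC_TOOL)


if __name__ == "__main__":
    t = set_type(0, PROC_HNP)
    print(is_hnp(t), is_daemon(t), is_tool(t))
```

Compared with separate booleans, a bitmap lets a new process type be added by defining one more bit, without changing the structure layout.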