openmpi/orte/mca/rmaps/round_robin
Ralph Castain 0c0fe022ff This is a first cut at fixing the problem of comm_spawn children being mapped onto the same nodes as their parents. I am not convinced the behavior implemented here is the long-term right one, but hopefully it will help alleviate the situation for now.
In this implementation, we begin mapping on the first node that has at least one slot available as measured by the slots_inuse versus the soft limit. If none of the nodes meet that criterion, we just start at the beginning of the node list since we are oversubscribed anyway.

Note that we ignore this logic if the user specifies a mapping - then it's just "user beware".
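
The starting-node rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in rmaps_rr.c: the type example_node_t and the function example_pick_start_node are hypothetical names, the "slots" field stands in for the soft limit, and the real ORTE node structures and list handling differ. The user-specified-mapping case, where this logic is skipped, is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node record (an assumption for illustration;
 * the real ORTE node objects carry many more fields). */
typedef struct {
    const char *name;     /* node name */
    size_t slots;         /* soft limit on process slots */
    size_t slots_inuse;   /* slots already occupied, e.g. by the parent job */
} example_node_t;

/* Return the index of the first node that still has at least one slot
 * available below its soft limit. If every node is already at or above
 * its soft limit, we are oversubscribed anyway, so fall back to the
 * beginning of the node list (index 0). */
size_t example_pick_start_node(const example_node_t *nodes, size_t num_nodes)
{
    assert(nodes != NULL || num_nodes == 0);

    for (size_t i = 0; i < num_nodes; ++i) {
        if (nodes[i].slots_inuse < nodes[i].slots) {
            return i;   /* first node with a free slot */
        }
    }
    return 0;           /* none free: start at the head of the list */
}
```

With three two-slot nodes where the first is full and the second has one slot in use, the sketch would start mapping at the second node; if all nodes are full, it starts at the first.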

The real root cause of the problem is that we don't adjust sched_yield as we add processes onto a node. Hence, the node becomes oversubscribed and performance goes into the toilet. What we REALLY need to do to solve the problem is:

(a) modify the PLS components so they reuse the existing daemons, 

(b) create a way to tell a running process to adjust its sched_yield, and

(c) modify the ODLS components to update the sched_yield on a process per the new method.

Until we do that, we will continue to have this problem - all this fix (and any subsequent one that focuses solely on the mapper) does is hopefully make it happen less often.

This commit was SVN r12145.
2006-10-17 19:35:00 +00:00
..
configure.params Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
help-orte-rmaps-rr.txt Per LANL's stated need, add functionality that runs a.out across ALL available process slots if no num_proc is specified on the command line. However, please note the following limitation: we ONLY allow ONE application to be specified on the command line when this feature is invoked. If multiple apps are specified, the user MUST also specify the number to be launched for each and every one of them. 2006-07-10 21:25:33 +00:00
Makefile.am Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
rmaps_rr_component.c Add a new option to launch "pernode" - launches one process/node across all available nodes. 2006-10-07 19:50:12 +00:00
rmaps_rr.c This is a first cut at fixing the problem of comm_spawn children being mapped onto the same nodes as their parents. I am not convinced the behavior implemented here is the long-term right one, but hopefully it will help alleviate the situation for now. 2006-10-17 19:35:00 +00:00
rmaps_rr.h Add a new option to launch "pernode" - launches one process/node across all available nodes. 2006-10-07 19:50:12 +00:00