openmpi/orte/mca/ras
Ralph Castain 0c0fe022ff This is a first cut at fixing the problem of comm_spawn children being mapped onto the same nodes as their parents. I am not convinced the behavior implemented here is the long-term right one, but hopefully it will help alleviate the situation for now.
In this implementation, we begin mapping on the first node that has at least one slot available as measured by the slots_inuse versus the soft limit. If none of the nodes meet that criterion, we just start at the beginning of the node list since we are oversubscribed anyway.
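
The starting-node selection described above can be sketched as follows. This is an illustrative sketch only, not the code from this commit: `sketch_node_t` and `sketch_find_start_node` are hypothetical names, and the node record is reduced to the two fields the description mentions (the soft slot limit and `slots_inuse`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node record; NOT the real ORTE node structure. */
typedef struct {
    const char *name;   /* node name, for readability only */
    int slots;          /* soft limit on slots for this node */
    int slots_inuse;    /* slots already occupied by running processes */
} sketch_node_t;

/*
 * Return the index of the first node that still has at least one slot
 * available under its soft limit. If every node is full (or the list is
 * empty), return 0: we are oversubscribed anyway, so mapping simply
 * starts at the beginning of the node list.
 */
size_t sketch_find_start_node(const sketch_node_t *nodes, size_t num_nodes)
{
    assert(nodes != NULL || num_nodes == 0);

    for (size_t i = 0; i < num_nodes; ++i) {
        if (nodes[i].slots_inuse < nodes[i].slots) {
            return i;   /* first node with a free slot */
        }
    }
    return 0;           /* no free slot anywhere: start from the top */
}
```

With three two-slot nodes where the first is fully occupied and the second has one process, the sketch starts mapping on the second node; if all nodes are at or over their soft limit, it falls back to the first node.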

Note that we ignore this logic if the user specifies a mapping - then it's just "user beware".

The real root cause of the problem is that we don't adjust sched_yield as we add processes onto a node. Hence, the node becomes oversubscribed and performance goes into the toilet. What we REALLY need to do to solve the problem is:

(a) modify the PLS components so they reuse the existing daemons, 

(b) create a way to tell a running process to adjust its sched_yield, and

(c) modify the ODLS components to update the sched_yield on a process per the new method.

Until we do that, we will continue to have this problem - all this fix (and any subsequent one that focuses solely on the mapper) does is hopefully make it happen less often.

This commit was SVN r12145.
2006-10-17 19:35:00 +00:00
..
base This is a first cut at fixing the problem of comm_spawn children being mapped onto the same nodes as their parents. I am not convinced the behavior implemented here is the long-term right one, but hopefully it will help alleviate the situation for now. 2006-10-17 19:35:00 +00:00
bjs This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
dash_host This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
gridengine This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
hostfile This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
loadleveler Add some missing headers. 2006-10-17 17:28:02 +00:00
localhost This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
lsf_bproc This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
proxy This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00
slurm Add some missing headers. 2006-10-17 17:28:02 +00:00
tm Add some missing headers. 2006-10-17 17:28:02 +00:00
xgrid Add some missing headers. 2006-10-17 17:28:02 +00:00
Makefile.am Fix a bunch of install locations for header files 2005-12-08 00:54:44 +00:00
ras_types.h And ORTE is ready for prime-time. All Windows tricks are in: 2006-08-23 03:32:36 +00:00
ras.h This commit looks a lot bigger than it is, so relax :-) 2006-10-17 16:06:17 +00:00