2f20a38c98
We were stuck in an infinite loop inside the rmaps round_robin component when the user specified a host and then oversubscribed it. Instead of returning an error, we looped forever. For example: $ cat hostfile A slots=2 max-slots=2 B slots=2 max-slots=2 $ mpirun -np 3 --hostfile hostfile --host B <hang> The loop would not terminate because both hosts A and B are in the 'nodes' structure, as they are both allocated to the job. However, after allocating 2 slots to host B, we remove it from the node list, leaving a 'nodes' structure with just A in it. Since we can't use host A, we keep looping, waiting to find a node that we can use. This patch checks for the situation where rmaps is looping over the list a second time without having found a node during the first pass. In that case we know there are no nodes left to use, so we have a resource allocation error and should return it to the user. This patch should be moved to all of the release branches. This commit was SVN r10131. |
||
---|---|---|
.. | ||
base | ||
round_robin | ||
Makefile.am | ||
rmaps_types.h | ||
rmaps.h |
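
The termination check described in the commit message can be sketched in isolation. The following is a minimal illustration and not the actual round_robin code: the type `sketch_node_t`, the function `sketch_map_round_robin`, and the two return codes are hypothetical names invented for this example. As a simplification, a full node stays in the array and is skipped, rather than being removed from a list as the commit message describes. The point is the `placed_this_pass` flag: if a complete pass over the nodes places no process, the mapper reports a resource error instead of looping again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an allocated node (not the ORTE type). */
typedef struct {
    const char *name;   /* host name, e.g. "A" or "B" */
    int slots_in_use;   /* processes already mapped to this node */
    int max_slots;      /* hard limit, as in "max-slots=2" */
} sketch_node_t;

/* Hypothetical return codes for this sketch only. */
#define SKETCH_SUCCESS              0
#define SKETCH_ERR_OUT_OF_RESOURCE (-1)

/*
 * Map num_procs processes round-robin onto nodes[0..num_nodes-1].
 * If allowed_host is non-NULL, only the node with that name may be used
 * (this models "--host B"). Returns an error as soon as one full pass
 * over the nodes places nothing, instead of looping forever.
 */
int sketch_map_round_robin(sketch_node_t *nodes, size_t num_nodes,
                           const char *allowed_host, int num_procs)
{
    size_t cur = 0;
    int remaining = num_procs;
    int placed_this_pass = 0;

    assert(nodes != NULL || num_nodes == 0);

    if (num_nodes == 0 && remaining > 0) {
        return SKETCH_ERR_OUT_OF_RESOURCE;
    }

    while (remaining > 0) {
        sketch_node_t *node = &nodes[cur];
        int host_ok = (allowed_host == NULL) ||
                      (strcmp(node->name, allowed_host) == 0);

        if (host_ok && node->slots_in_use < node->max_slots) {
            node->slots_in_use++;
            remaining--;
            placed_this_pass = 1;
        }

        cur++;
        if (cur == num_nodes) {
            /* Wrapped around: a pass with no placement means no usable node is left. */
            if (!placed_this_pass) {
                return SKETCH_ERR_OUT_OF_RESOURCE;
            }
            cur = 0;
            placed_this_pass = 0;
        }
    }
    return SKETCH_SUCCESS;
}
```

With two nodes A and B, each limited to 2 slots, and the mapping restricted to B, a request for 3 processes fills B on the first two passes, finds nothing usable on the third, and returns `SKETCH_ERR_OUT_OF_RESOURCE`. This corresponds to the `mpirun -np 3 --hostfile hostfile --host B` case that previously hung.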