1
1
openmpi/orte/mca/plm
Ralph Castain c56acf60ca Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order.
Turns out, this isn't necessarily true. The Cray, for example, launches processes in a toroidal pattern, thus causing the daemons to wind up somewhere other than what we thought. Other environments (e.g., slurm) are also capable of such behavior, depending upon the default mapping algorithm they are told to use.

Resolve this problem by making the daemon-to-node assignment in the affected environments when the daemon calls back and tells us what node it is on. Order the nodes in the mapping list so they are in daemon-vpid order as opposed to the order in which they show in the allocation. For environments that don't exhibit this mapping behavior (e.g., rsh), this won't have any impact.

Also, clean up the vm launch procedure a little bit so it more closely aligns with the state machine implementation that is coming, and remove some lingering "slave" code.

This commit was SVN r25551.
2011-11-30 19:58:24 +00:00
..
alps Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
base Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
ccp Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
lsf Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
poe This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build. 2011-11-22 21:24:35 +00:00
process Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
rsh Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
slurm Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
submit This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build. 2011-11-22 21:24:35 +00:00
tm Although we never really thought about it, we made an unconscious assumption in the mapper system - we assumed that the daemons would be placed on nodes in the order that the nodes appear in the allocation. In other words, we assumed that the launch environment would map processes in node order. 2011-11-30 19:58:24 +00:00
xgrid This is actually a much smaller commit than it appears at first glance - it just touches a lot of files. The --without-rte-support configuration option has never really been implemented completely. The option caused various objects not to be defined and conditionally compiled some base functions, but did nothing to prevent build of the component libraries. Unfortunately, since many of those components use objects covered by the option, it caused builds to break if those components were allowed to build. 2011-11-22 21:24:35 +00:00
Makefile.am Fix mistake that came in via the ompi-agen tree in r23764. The mistake wasn't part of the core autogen upgrade; it was an additional 'bonus' cleanup. Oops. The mistake will always create a set of directories under installdir, even if you do not --with-devel-headers. The set of directories will be empty, but still -- they should not be there at all. This commit fixes that -- the directories are not created at all if you do not --with-devel-headers 2010-09-24 22:53:28 +00:00
plm_types.h Add a new proc state 2011-05-19 21:25:58 +00:00
plm.h Restore the original API to terminate individual processes instead of the entire job. This was originally removed as we didn't at that time know how to take advantage of it. Some of us are now working on proactive resilience methods that move procs prior to node failure, so this is now a required API. Modify the odls, plm, and orted functions to support this new functionality. 2009-07-13 02:29:17 +00:00