The problem arises when a hostfile is used, and the user provides host names without specifying the slots= paramater. In these cases, we assign slots=1, but automatically allow oversubscription since that number isn't confirmed. We then provide a separate parameter by which the user can direct that we assign the number of slots based on the sensed hardware - e.g., by telling us to set the #slots equal to the #cores on each node. However, this has been set to "off" by default.
In order to make this a little less complex for the user, set the default such that we automatically set #slots equal to #cores (or #hwt's if use_hwthreads_as_cpus has been set) only for those cases where the user provides names in a hostfile but does not provide slot information.
Also cleanup some a couple of issues in the mapping/binding system:
* ensure we only override the binding directive if we are oversubscribed *and* overload is not allowed
* ensure that the MPI procs don't attempt to bind themselves if they are launched by an orted as any binding directive (no matter what it was) would have been serviced by the orted on launch
* minor cleanup to the warning message when oversubscribed and binding was requested
cmr=v1.7.5:reviewer=rhc:subject=update mapping/binding system
This commit was SVN r30909.
r of slots remaing to be assigned being only one/node. Regardless, it didn't work for the case where nodes were defined in ascending order of slots.
Tetsuya's proposed patch didn't solve the problem for me, but it did correct the case where cpus/proc > 1. The final patch requires that we loop over the assignment
algo until all procs are assigned or all nodes are filled - any remaining procs are then handled in the cleanup loop.
cmr=v1.7.5:reviewer=rhc:subject=fix map-by node for different cases
This commit was SVN r30798.
better error message when there is only one socket available
fixed by Elena, reviewed by Miked
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30787.
can't apply its normal precedence rules.
So...print a big "deprecated" warning for the old params and error out if a conflict is detected. I know that isn't what people really wanted, but it's the best we
can do. If only the old style param is given, then process it after the warning.
Extend the current map-by param to add support for ppr and cpus-per-proc, adding the latter to the list of allowed modifiers using "pe=n" for processing elements/proc. Thus, you can map-by socket:pe=2,oversubscribe to map by socket, binding 2 processing elements/process, with oversubscription allowed. Or you can map-by ppr:2:socket:pe=4 to map two processes to every socket in the allocation, binding each process to 4 processing elements.
For those wondering, a processing element is defined as a hwthread if --use-hwthreads-as-cpus is given, or else as a core.
Refs trac:4117
This commit was SVN r30620.
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
Refs trac:4117. Please use this commit rather than the patch attached to
the ticket; the patch had a few mistakes in the tweaked wording.
This commit was SVN r30362.
The following SVN revision numbers were found above:
r30298 --> open-mpi/ompi@58479399c3
The following Trac tickets were found above:
Ticket 4117 --> https://svn.open-mpi.org/trac/ompi/ticket/4117
Thanks to Paul Hargrove for reporting the problem on NetBSD.
cmr=v1.7.4:reviewer=jsquyres:subject=Handle the case of nodes that do not report cores
This commit was SVN r30180.
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi. This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.
This commit was SVN r30140.
after last refactoring in rmaps, map-by dist:hca was disabled.
reverting it back
found/fixed by Elena, reviewed by miked
cmr=v1.7.4:reviewer=ompi-rm1.7
This commit was SVN r30118.
Fix comm_spawn on a single host - with the new default mapping scheme, we were incorrectly computing the number of procs to put on the node.
Refs trac:4003
This commit was SVN r30033.
The following Trac tickets were found above:
Ticket 4003 --> https://svn.open-mpi.org/trac/ompi/ticket/4003
Reset topology usage for each node as we bind as multiple nodes may be linked to the same topology object. This will need to be revisited for scale as it does take some non-zero time to reset the usage each iteration. However, storing individual topology objects for every node consumes memory, so it's a tradeoff.
cmr=v1.7.4:reviewer=jsquyres:subject=Eliminate excessive binding/memory warnings
This commit was SVN r29978.