This commit is contained in:
Ralph Castain 2014-10-16 21:04:50 -07:00
parent 43aff4d8b3
Commit f9d620e3a7


@@ -77,6 +77,24 @@ process starter, as opposed to, for example, \fIrsh\fR or \fIssh\fR,
which require the use of a hostfile, or will default to running all X
copies on the localhost), scheduling (by default) in a round-robin fashion by
CPU slot. See the rest of this page for more details.
.P
Please note that mpirun automatically binds processes as of the start of the
v1.8 series. Two binding patterns are used in the absence of any further directives:
.TP 18
.B Bind to core:
when the number of processes is <= 2
.
.
.TP
.B Bind to socket:
when the number of processes is > 2
.
.
.P
If your application uses threads, then you probably want to ensure that you are
either not bound at all (by specifying --bind-to none), or bound to multiple cores
using an appropriate binding level or specific number of processing elements per
application process.
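.P
For example, a threaded application could disable the default binding entirely
(an illustrative command line, not part of the original text):
.P
mpirun -np 4 --bind-to none ./a.out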
.
.\" **************************
.\" Options Section
@@ -128,7 +146,14 @@ cause orterun to exit.
.
.
.P
Use one of the following options to specify which hosts (nodes) of the cluster to run on. Note
that as of the start of the v1.8 release, mpirun will launch a daemon onto each host in the
allocation (as modified by the following options) at the very beginning of execution, regardless
of whether or not application processes will eventually be mapped to execute there. This is
done to allow collection of hardware topology information from the remote nodes, thus allowing
us to map processes against known topology. However, it is a change from the behavior in prior releases
where daemons were only launched \fIafter\fP mapping was complete, and thus only occurred on
nodes where application processes would actually be executing.
.
.
.TP
@@ -151,7 +176,9 @@ Synonym for \fI-hostfile\fP.
.
.
.P
The following options specify the number of processes to launch. Note that none
of the options imply a particular binding policy - e.g., requesting N processes
for each socket does not imply that the processes will be bound to the socket.
.
.
.TP
@@ -167,6 +194,11 @@ error (without beginning execution of the application) otherwise.
.
.
.TP
.B --map-by ppr:N:<object>
Launch N times the number of objects of the specified type on each node.
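For example (an illustrative command, not taken from the original text), running
\fImpirun --map-by ppr:2:socket ./a.out\fP
would launch two processes for each processor socket on every node.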
.
.
.TP
.B -npersocket\fR,\fP --npersocket <#persocket>
On each node, launch this many processes times the number of processor
sockets on the node.
@@ -253,7 +285,7 @@ For process binding:
.TP
.B --bind-to <foo>
Bind processes to the specified object, defaults to \fIcore\fP. Supported options
include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, and none.
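For example (an illustrative command, not taken from the original text), running
\fImpirun -np 8 --bind-to socket ./a.out\fP
would bind each of the eight processes to all processors of its assigned socket.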
.
.TP
.B -cpus-per-proc\fR,\fP --cpus-per-proc <#perproc>
@@ -749,13 +781,13 @@ Consider the same hostfile as above, again with \fI-np\fP 6:
 mpirun               0 1 2 3 4 5
 mpirun --map-by node 0 3 1 4 2 5
 mpirun -nolocal      0 1 2 3 4 5
.
.PP
The \fI--map-by node\fP option will load balance the processes across
the available nodes, numbering each process in a round-robin fashion.
.
.PP
The \fI-nolocal\fP option prevents any processes from being mapped onto the
@@ -821,19 +853,32 @@ mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2
each running \fIuptime\fP on nodes bb and cc, respectively.
.
.SS Mapping, Ranking, and Binding: Oh My!
.
OpenMPI employs a three-phase procedure for assigning process locations and
ranks. The \fImapping\fP step is used to assign a default location to each process
based on the mapper being employed. Mapping by slot, node, and sequentially results
in the assignment of the processes to the node level. In contrast, mapping by object allows
the mapper to assign the process to an actual object on each node.
.
.PP
\fBNote:\fP the location assigned to the process is independent of where it will be bound - the
assignment is used solely as input to the binding algorithm.
.
.PP
The mapping of processes to nodes can be defined not just
with general policies but also, if necessary, using arbitrary mappings
that cannot be described by a simple policy. One can use the "sequential
mapper," which reads the hostfile line by line, assigning processes
to nodes in whatever order the hostfile specifies. Use the
\fI-mca rmaps seq\fP option. For example, using the same hostfile
as before:
.
.PP
mpirun -hostfile myhostfile -mca rmaps seq ./a.out
.
.PP
will launch three processes, one on each of nodes aa, bb, and cc, respectively.
The slot counts don't matter; one process is launched per line on
whatever node is listed on the line.
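.PP
For reference, such a hostfile might look like the following (a sketch; the slot
counts are illustrative only and are ignored by the sequential mapper):
.PP
aa slots=2
.br
bb slots=2
.br
cc slots=4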
.
@@ -842,9 +887,31 @@ Another way to specify arbitrary mappings is with a rankfile, which
gives you detailed control over process binding as well. Rankfiles
are discussed below.
.
.PP
The second phase focuses on the \fIranking\fP of the process within the job. OpenMPI
separates this from the mapping procedure to allow more flexibility in the
relative placement of MPI ranks. This is best illustrated by considering the
following cases where we used the --map-by ppr:2:socket option:
.
.PP
                       node aa          node bb
 rank-by core          0 1 ! 2 3        4 5 ! 6 7
 rank-by socket        0 2 ! 1 3        4 6 ! 5 7
 rank-by socket:span   0 4 ! 1 5        2 6 ! 3 7
.
.PP
Ranking by core and by slot provide the identical result - a simple progression of ranks across
each node. Ranking by socket does a round-robin ranking within each node until all processes
have been assigned a rank, and then progresses to the next node. Adding the \fIspan\fP
modifier to the ranking directive causes the ranking algorithm to treat the entire allocation
as a single entity - thus, the ranks are assigned across all sockets before circling back
around to the beginning.
.
.PP
The \fIbinding\fP phase actually binds each process to a given set of processors. This can
improve performance if the operating system is placing processes
suboptimally. For example, it might oversubscribe some multi-core
processor sockets, leaving other sockets idle; this can lead
@@ -856,20 +923,23 @@ processes excessively, regardless of how optimally those processes
were placed to begin with.
.
.PP
The processors to be used for binding
can be identified in terms of topological groupings - e.g., binding to an l3cache will bind
each process to all processors in the l3cache within their assigned location. Thus, if a process
is assigned by the mapper to a certain socket, then a \fI--bind-to l3cache\fP directive will cause
the process to be bound to the l3cache within that socket.
.
.PP
To help balance loads, the binding directive uses a round-robin method when binding to
levels lower than used in the mapper. For example, consider the case where a job is
mapped to the socket level, and then bound to core. Each socket will have multiple cores,
so if multiple processes are mapped to a given socket, the binding algorithm will assign
each process located on a socket to a unique core in a round-robin manner.
.
.PP
Alternatively, processes mapped by l2cache and then bound to socket will simply be bound
to all the processors in the socket where they are located. In this manner, users can
exert detailed control over relative rank location and binding.
.
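.PP
For example (an illustrative command line, not taken from the original text):
.PP
mpirun -np 8 --map-by socket --bind-to core ./a.out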
.PP
Finally, \fI--report-bindings\fP can be used to report bindings.
@@ -921,30 +991,17 @@ Their usage is less convenient than that of \fImpirun\fP options.
On the other hand, MCA parameters can be set not only on the \fImpirun\fP
command line, but alternatively in a system or user mca-params.conf file
or as environment variables, as described in the MCA section below.
Some examples include:
.
.PP
 mpirun option          MCA parameter key             value
 --map-by core       rmaps_base_mapping_policy       core
 --map-by socket     rmaps_base_mapping_policy       socket
 --rank-by core      rmaps_base_ranking_policy       core
 --bind-to core      hwloc_base_binding_policy       core
 --bind-to socket    hwloc_base_binding_policy       socket
 --bind-to none      hwloc_base_binding_policy       none
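.PP
For example, equivalent settings could be placed in a user-level parameter file
(a sketch using the parameter names shown above; the default file location is
described in the MCA section below):
.PP
% cat $HOME/.openmpi/mca-params.conf
.br
rmaps_base_mapping_policy = socket
.br
hwloc_base_binding_policy = core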
.
.
.SS Rankfiles
@@ -1218,13 +1275,15 @@ is equivalent to
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes. Environmental
parameters can also be set/forwarded to the new processes using the new MCA
parameter \fImca_base_env_list\fP. The \fI\-x\fP option to \fImpirun\fP has
been deprecated, but the syntax of the new MCA parameter follows that prior
example. While the syntax of the \fI\-x\fP option and MCA parameter
allows the definition of new variables, note that the parser
for these options is currently not very sophisticated - it does not even
understand quoted values. Users are advised to set variables in the
environment and use the option to export them; not to define them.
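.PP
For example (an illustrative command; the exact list syntax accepted by the
parameter is an assumption here, and the variable names are placeholders):
.PP
mpirun -mca mca_base_env_list "FOO;BAR=1" -np 2 ./a.out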
.
.
.