Update the orterun man page
This commit is contained in:
parent 43aff4d8b3
Commit f9d620e3a7
@ -77,6 +77,24 @@ process starter, as opposed to, for example, \fIrsh\fR or \fIssh\fR,
which require the use of a hostfile, or will default to running all X
copies on the localhost), scheduling (by default) in a round-robin fashion by
CPU slot. See the rest of this page for more details.
.P
Please note that mpirun automatically binds processes as of the start of the
v1.8 series. Two binding patterns are used in the absence of any further directives:
.TP 18
.B Bind to core:
when the number of processes is <= 2
.
.
.TP
.B Bind to socket:
when the number of processes is > 2
.
.
.P
If your application uses threads, then you probably want to ensure that you are
either not bound at all (by specifying --bind-to none), or bound to multiple cores
using an appropriate binding level or specific number of processing elements per
application process.
.
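.PP
For example (a hypothetical invocation, with ./a.out standing in for your
application), binding can be disabled entirely for a threaded code:
.

mpirun -np 4 --bind-to none ./a.out
.
.PP
or each process can be given several cores via the \fI-cpus-per-proc\fP option
described below:
.

mpirun -np 4 -cpus-per-proc 2 ./a.out
.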
.\" **************************
.\" Options Section
@ -128,7 +146,14 @@ cause orterun to exit.
.
.
.P
To specify which hosts (nodes) of the cluster to run on:
Use one of the following options to specify which hosts (nodes) of the cluster to run on. Note
that as of the start of the v1.8 release, mpirun will launch a daemon onto each host in the
allocation (as modified by the following options) at the very beginning of execution, regardless
of whether or not application processes will eventually be mapped to execute there. This is
done to allow collection of hardware topology information from the remote nodes, thus allowing
us to map processes against known topology. However, this is a change from the behavior in prior releases,
where daemons were only launched \fIafter\fP mapping was complete, and thus only occurred on
nodes where application processes would actually be executing.
.
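.PP
For example (using the hypothetical hosts aa and bb that appear in the
examples later on this page):
.

mpirun -H aa,bb -np 4 ./a.out
.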
.
.TP
@ -151,7 +176,9 @@ Synonym for \fI-hostfile\fP.
.
.
.P
To specify the number of processes to launch:
The following options specify the number of processes to launch. Note that none
of the options imply a particular binding policy - e.g., requesting N processes
for each socket does not imply that the processes will be bound to the socket.
.
.
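.PP
For example (a hypothetical command), a binding policy must be requested
separately from the process count - see the \fI--map-by\fP and
\fI--bind-to\fP entries below:
.

mpirun --map-by ppr:2:socket --bind-to socket ./a.out
.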
.TP
@ -167,6 +194,11 @@ error (without beginning execution of the application) otherwise.
.
.
.TP
.B --map-by ppr:N:<object>
Launch N times the number of objects of the specified type on each node.
.
.
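.PP
For example, on nodes that each have two processor sockets, a hypothetical
.

mpirun --map-by ppr:2:socket ./a.out
.
.PP
would launch four processes per node, two for each socket.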
.TP
.B -npersocket\fR,\fP --npersocket <#persocket>
On each node, launch this many processes times the number of processor
sockets on the node.
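.PP
For example (hypothetically), on nodes with two processor sockets:
.

mpirun -npersocket 2 ./a.out
.
.PP
would launch four processes on each node.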
@ -253,7 +285,7 @@ For process binding:
.TP
.B --bind-to <foo>
Bind processes to the specified object, defaults to \fIcore\fP. Supported options
include slot, hwthread, core, socket, numa, board, and none.
include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, and none.
.
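.PP
For example (a hypothetical command binding each of eight processes to a
socket):
.

mpirun -np 8 --bind-to socket ./a.out
.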
.TP
.B -cpus-per-proc\fR,\fP --cpus-per-proc <#perproc>
@ -749,13 +781,13 @@ Consider the same hostfile as above, again with \fI-np\fP 6:

mpirun 0 1 2 3 4 5

mpirun -bynode 0 3 1 4 2 5
mpirun --map-by node 0 3 1 4 2 5

mpirun -nolocal 0 1 2 3 4 5
.
.PP
The \fI-bynode\fP option does likewise but numbers the processes in "by node"
in a round-robin fashion.
The \fI--map-by node\fP option will load balance the processes across
the available nodes, numbering each process in a round-robin fashion.
.
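.PP
The \fI--map-by node\fP layout above corresponds to a command such as
(assuming the same hypothetical hostfile):
.

mpirun -np 6 --map-by node -hostfile myhostfile ./a.out
.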
.PP
The \fI-nolocal\fP option prevents any processes from being mapped onto the
@ -821,19 +853,32 @@ mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2
each running \fIuptime\fP on nodes bb and cc, respectively.
.
.SS Mapping Processes to Nodes: Using Arbitrary Mappings
.SS Mapping, Ranking, and Binding: Oh My!
.
The mapping of processes to nodes can be prescribed not just
OpenMPI employs a three-phase procedure for assigning process locations and
ranks. The \fImapping\fP step is used to assign a default location to each process
based on the mapper being employed. Mapping by slot, node, or sequentially results
in the assignment of the processes at the node level. In contrast, mapping by object allows
the mapper to assign each process to an actual object on each node.
.
.PP
\fBNote:\fP the location assigned to the process is independent of where it will be bound - the
assignment is used solely as input to the binding algorithm.
.
.PP
The mapping of processes to nodes can be defined not just
with general policies but also, if necessary, using arbitrary mappings
that cannot be described by a simple policy. One can use the "sequential
mapper," which reads the hostfile line by line, assigning processes
to nodes in whatever order the hostfile specifies. Use the
\fI-mca rmaps seq\fP option. For example, using the same hostfile
as before
as before:
.
.TP 4
mpirun -hostfile myhostfile ./a.out
will launch three processes on nodes aa, bb, and cc, respectively.
.PP
mpirun -hostfile myhostfile -mca rmaps seq ./a.out
.
.PP
will launch three processes, one on each of nodes aa, bb, and cc, respectively.
The slot counts don't matter; one process is launched per line on
whatever node is listed on the line.
.
@ -842,9 +887,31 @@ Another way to specify arbitrary mappings is with a rankfile, which
gives you detailed control over process binding as well. Rankfiles
are discussed below.
.
.SS Process Binding
.PP
The second phase focuses on the \fIranking\fP of the processes within the job. OpenMPI
separates this from the mapping procedure to allow more flexibility in the
relative placement of MPI ranks. This is best illustrated by considering the
following cases, where we used the --map-by ppr:2:socket option:
.
Processes may be bound to specific resources on a node. This can
.PP
                     node aa      node bb

rank-by core         0 1 ! 2 3    4 5 ! 6 7

rank-by socket       0 2 ! 1 3    4 6 ! 5 7

rank-by socket:span  0 4 ! 1 5    2 6 ! 3 7
.
.PP
Ranking by core and by slot provide the identical result - a simple progression of ranks across
each node. Ranking by socket does a round-robin ranking within each node until all processes
have been assigned a rank, and then progresses to the next node. Adding the \fIspan\fP
modifier to the ranking directive causes the ranking algorithm to treat the entire allocation
as a single entity - thus, the ranks are assigned across all sockets before circling back
around to the beginning.
.
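.PP
For example, the socket ranking shown above could be requested with a
hypothetical command like:
.

mpirun -np 8 --map-by ppr:2:socket --rank-by socket ./a.out
.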
.PP
The \fIbinding\fP phase actually binds each process to a given set of processors. This can
improve performance if the operating system is placing processes
suboptimally. For example, it might oversubscribe some multi-core
processor sockets, leaving other sockets idle; this can lead
@ -856,20 +923,23 @@ processes excessively, regardless of how optimally those processes
were placed to begin with.
.
.PP
To bind processes, one must first associate them with the resources
on which they should run. For example, the \fI--map-by core\fP option
associates the processes on a node with successive cores. Or,
\fI--map-by socket\fP associates the processes with successive processor sockets,
cycling through the sockets in a round-robin fashion if necessary.
And \fI-cpus-per-proc\fP indicates how many cores to bind per process.
The processors to be used for binding
can be identified in terms of topological groupings - e.g., binding to an l3cache will bind
each process to all processors in the l3cache within its assigned location. Thus, if a process
is assigned by the mapper to a certain socket, then a \fI--bind-to l3cache\fP directive will cause
the process to be bound to the l3cache within that socket.
.
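.PP
For example (hypothetically):
.

mpirun -np 8 --map-by socket --bind-to l3cache ./a.out
.
.PP
maps each process to a socket and then binds it to the l3cache within that
socket.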
.PP
But, such association is meaningless unless the processes are actually
bound to those resources. The binding option specifies the granularity
of binding -- say, with \fI-bind-to core\fP or \fI-bind-to socket\fP.
One can also turn binding off with \fI-bind-to none\fP, which is
typically the default.
.\" JMS ^^ THE ABOVE STATEMENT IS NO LONGER TRUE.
To help balance loads, the binding directive uses a round-robin method when binding to
levels lower than that used in the mapper. For example, consider the case where a job is
mapped to the socket level, and then bound to core. Each socket will have multiple cores,
so if multiple processes are mapped to a given socket, the binding algorithm will assign
each process located on a socket to a unique core in a round-robin manner.
.
.PP
Alternatively, processes mapped by l2cache and then bound to socket will simply be bound
to all the processors in the socket where they are located. In this manner, users can
exert detailed control over relative rank location and binding.
.
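.PP
For example (hypothetically):
.

mpirun --map-by l2cache --bind-to socket ./a.out
.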
.PP
Finally, \fI--report-bindings\fP can be used to report bindings.
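.PP
For example (a hypothetical check of any of the preceding commands):
.

mpirun -np 4 --report-bindings --bind-to core ./a.out
.
.PP
reports where each process was bound.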
@ -921,30 +991,17 @@ Their usage is less convenient than that of \fImpirun\fP options.
On the other hand, MCA parameters can be set not only on the \fImpirun\fP
command line, but alternatively in a system or user mca-params.conf file
or as environment variables, as described in the MCA section below.
The correspondences are:
.

mpirun option          MCA parameter key               value

--map-by core          rmaps_base_schedule_policy      core
--map-by socket        rmaps_base_schedule_policy      socket
--bind-to core         orte_process_binding            core
--bind-to socket       orte_process_binding            socket
--bind-to none         orte_process_binding            none
.\" JMS I DON'T KNOW IF THESE ARE STILL THE RIGHT MCA PARAM NAMES
Some examples include:
.
.PP
The \fIorte_process_binding\fP value can also take on the
\fI:if-avail\fP attribute. This attribute means that processes
will be bound only if this is supported on the underlying
operating system. Without the attribute, if there is no
such support, the binding request results in an error.
For example, you could have
.
mpirun option          MCA parameter key               value

% cat $HOME/.openmpi/mca-params.conf
rmaps_base_schedule_policy = socket
orte_process_binding = socket:if-avail
--map-by core          rmaps_base_mapping_policy       core
--map-by socket        rmaps_base_mapping_policy       socket
--rank-by core         rmaps_base_ranking_policy       core
--bind-to core         hwloc_base_binding_policy       core
--bind-to socket       hwloc_base_binding_policy       socket
--bind-to none         hwloc_base_binding_policy       none
.
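.PP
For example (hypothetical equivalents per the table above), the following two
commands request the same binding policy:
.

mpirun --bind-to core ./a.out

mpirun -mca hwloc_base_binding_policy core ./a.out
.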
.
.SS Rankfiles
@ -1218,13 +1275,15 @@ is equivalent to
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes.
The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment
variables to the new processes. While the syntax of the \fI\-x\fP
option allows the definition of new variables, note that the parser
for this option is currently not very sophisticated - it does not even
be exported to new processes on the local and remote nodes. Environmental
parameters can also be set/forwarded to the new processes using the new MCA
parameter \fImca_base_env_list\fP. The \fI\-x\fP option to \fImpirun\fP has
been deprecated, but the syntax of the new MCA param follows that prior
example. While the syntax of the \fI\-x\fP option and MCA param
allows the definition of new variables, note that the parser
for these options is currently not very sophisticated - it does not even
understand quoted values. Users are advised to set variables in the
environment and use \fI\-x\fP to export them, not to define them.
environment and use the option to export them, not to define them.
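.PP
For example (with a hypothetical variable FOO set in the shell and then
exported to the launched processes):
.

% export FOO=bar
% mpirun -x FOO -np 2 ./a.out
.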
.
.
.