This commit is contained in:
Ralph Castain 2014-10-16 21:04:50 -07:00
parent 43aff4d8b3
Commit f9d620e3a7


@@ -77,6 +77,24 @@ process starter, as opposed to, for example, \fIrsh\fR or \fIssh\fR,
which require the use of a hostfile, or will default to running all X
copies on the localhost), scheduling (by default) in a round-robin fashion by
CPU slot. See the rest of this page for more details.
.P
Please note that mpirun automatically binds processes as of the start of the
v1.8 series. Two binding patterns are used in the absence of any further directives:
.TP 18
.B Bind to core:
when the number of processes is <= 2
.
.
.TP
.B Bind to socket:
when the number of processes is > 2
.
.
.P
If your application uses threads, then you probably want to ensure that you are
either not bound at all (by specifying --bind-to none), or bound to multiple cores
using an appropriate binding level or specific number of processing elements per
application process.
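.P
For example, a threaded application could disable the default binding entirely
(an illustrative command line, not part of the original text):
.P
mpirun -np 4 --bind-to none ./a.out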
.
.\" **************************
.\" Options Section
@@ -128,7 +146,14 @@ cause orterun to exit.
.
.
.P
Use one of the following options to specify which hosts (nodes) of the cluster to run on. Note
that as of the start of the v1.8 release, mpirun will launch a daemon onto each host in the
allocation (as modified by the following options) at the very beginning of execution, regardless
of whether or not application processes will eventually be mapped to execute there. This is
done to allow collection of hardware topology information from the remote nodes, thus allowing
us to map processes against known topology. However, it is a change from the behavior in prior releases
where daemons were only launched \fIafter\fP mapping was complete, and thus only occurred on
nodes where application processes would actually be executing.
.
.
.TP
@@ -151,7 +176,9 @@ Synonym for \fI-hostfile\fP.
.
.
.P
The following options specify the number of processes to launch. Note that none
of the options imply a particular binding policy - e.g., requesting N processes
for each socket does not imply that the processes will be bound to the socket.
.
.
.TP
@@ -167,6 +194,11 @@ error (without beginning execution of the application) otherwise.
.
.
.TP
.B --map-by ppr:N:<object>
Launch N times the number of objects of the specified type on each node.
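For example (an illustrative command, not taken from the original text), running
\fImpirun --map-by ppr:2:socket ./a.out\fP
would launch two processes for each processor socket on every node.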
.
.
.TP
.B -npersocket\fR,\fP --npersocket <#persocket>
On each node, launch this many processes times the number of processor
sockets on the node.
@@ -253,7 +285,7 @@ For process binding:
.TP
.B --bind-to <foo>
Bind processes to the specified object, defaults to \fIcore\fP. Supported options
include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, and none.
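For example (an illustrative command, not taken from the original text), running
\fImpirun -np 8 --bind-to socket ./a.out\fP
would bind each of the eight processes to all processors of its assigned socket.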
.
.TP
.B -cpus-per-proc\fR,\fP --cpus-per-proc <#perproc>
@@ -749,13 +781,13 @@ Consider the same hostfile as above, again with \fI-np\fP 6:
 mpirun               0 1 2 3 4 5
 mpirun --map-by node 0 3 1 4 2 5
 mpirun -nolocal      0 1 2 3 4 5
.
.PP
The \fI--map-by node\fP option will load balance the processes across
the available nodes, numbering each process in a round-robin fashion.
.
.PP
The \fI-nolocal\fP option prevents any processes from being mapped onto the
@@ -821,19 +853,32 @@ mpirun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2
each running \fIuptime\fP on nodes bb and cc, respectively.
.
.SS Mapping, Ranking, and Binding: Oh My!
.
OpenMPI employs a three-phase procedure for assigning process locations and
ranks. The \fImapping\fP step is used to assign a default location to each process
based on the mapper being employed. Mapping by slot, node, and sequentially results
in the assignment of the processes to the node level. In contrast, mapping by object allows
the mapper to assign the process to an actual object on each node.
.
.PP
\fBNote:\fP the location assigned to the process is independent of where it will be bound - the
assignment is used solely as input to the binding algorithm.
.
.PP
The mapping of processes to nodes can be defined not just
with general policies but also, if necessary, using arbitrary mappings
that cannot be described by a simple policy. One can use the "sequential
mapper," which reads the hostfile line by line, assigning processes
to nodes in whatever order the hostfile specifies. Use the
\fI-mca rmaps seq\fP option. For example, using the same hostfile
as before:
.
.PP
mpirun -hostfile myhostfile -mca rmaps seq ./a.out
.
.PP
will launch three processes, one on each of nodes aa, bb, and cc, respectively.
The slot counts don't matter; one process is launched per line on
whatever node is listed on the line.
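.PP
For reference, such a hostfile might look like the following (a sketch; the slot
counts are illustrative only and are ignored by the sequential mapper):
.PP
aa slots=2
.br
bb slots=2
.br
cc slots=4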
.
@@ -842,9 +887,31 @@ Another way to specify arbitrary mappings is with a rankfile, which
gives you detailed control over process binding as well. Rankfiles
are discussed below.
.
.PP
The second phase focuses on the \fIranking\fP of the process within the job. OpenMPI
separates this from the mapping procedure to allow more flexibility in the
relative placement of MPI ranks. This is best illustrated by considering the
following cases where we used the --map-by ppr:2:socket option:
.
.PP
                       node aa          node bb
 rank-by core          0 1 ! 2 3        4 5 ! 6 7
 rank-by socket        0 2 ! 1 3        4 6 ! 5 7
 rank-by socket:span   0 4 ! 1 5        2 6 ! 3 7
.
.PP
Ranking by core and by slot provide the identical result - a simple progression of ranks across
each node. Ranking by socket does a round-robin ranking within each node until all processes
have been assigned a rank, and then progresses to the next node. Adding the \fIspan\fP
modifier to the ranking directive causes the ranking algorithm to treat the entire allocation
as a single entity - thus, the ranks are assigned across all sockets before circling back
around to the beginning.
.
.PP
The \fIbinding\fP phase actually binds each process to a given set of processors. This can
improve performance if the operating system is placing processes
suboptimally. For example, it might oversubscribe some multi-core
processor sockets, leaving other sockets idle; this can lead
@@ -856,20 +923,23 @@ processes excessively, regardless of how optimally those processes
were placed to begin with.
.
.PP
The processors to be used for binding
can be identified in terms of topological groupings - e.g., binding to an l3cache will bind
each process to all processors in the l3cache within their assigned location. Thus, if a process
is assigned by the mapper to a certain socket, then a \fI--bind-to l3cache\fP directive will cause
the process to be bound to the l3cache within that socket.
.
.PP
To help balance loads, the binding directive uses a round-robin method when binding to
levels lower than used in the mapper. For example, consider the case where a job is
mapped to the socket level, and then bound to core. Each socket will have multiple cores,
so if multiple processes are mapped to a given socket, the binding algorithm will assign
each process located on a socket to a unique core in a round-robin manner.
.
.PP
Alternatively, processes mapped by l2cache and then bound to socket will simply be bound
to all the processors in the socket where they are located. In this manner, users can
exert detailed control over relative rank location and binding.
.
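.PP
For example (an illustrative command line, not taken from the original text):
.PP
mpirun -np 8 --map-by socket --bind-to core ./a.out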
.PP
Finally, \fI--report-bindings\fP can be used to report bindings.
@@ -921,30 +991,17 @@ Their usage is less convenient than that of \fImpirun\fP options.
On the other hand, MCA parameters can be set not only on the \fImpirun\fP
command line, but alternatively in a system or user mca-params.conf file
or as environment variables, as described in the MCA section below.
Some examples include:
.
.PP
 mpirun option          MCA parameter key             value
 --map-by core       rmaps_base_mapping_policy       core
 --map-by socket     rmaps_base_mapping_policy       socket
 --rank-by core      rmaps_base_ranking_policy       core
 --bind-to core      hwloc_base_binding_policy       core
 --bind-to socket    hwloc_base_binding_policy       socket
 --bind-to none      hwloc_base_binding_policy       none
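.PP
For example, equivalent settings could be placed in a user-level parameter file
(a sketch using the parameter names shown above; the default file location is
described in the MCA section below):
.PP
% cat $HOME/.openmpi/mca-params.conf
.br
rmaps_base_mapping_policy = socket
.br
hwloc_base_binding_policy = core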
.
.
.SS Rankfiles
@@ -1218,13 +1275,15 @@ is equivalent to
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes. Environmental
parameters can also be set/forwarded to the new processes using the new MCA
parameter \fImca_base_env_list\fP. The \fI\-x\fP option to \fImpirun\fP has
been deprecated, but the syntax of the new MCA parameter follows that prior
example. While the syntax of the \fI\-x\fP option and MCA parameter
allows the definition of new variables, note that the parser
for these options is currently not very sophisticated - it does not even
understand quoted values. Users are advised to set variables in the
environment and use the option to export them; not to define them.
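.PP
For example (an illustrative command; the exact list syntax accepted by the
parameter is an assumption here, and the variable names are placeholders):
.PP
mpirun -mca mca_base_env_list "FOO;BAR=1" -np 2 ./a.out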
.
.
.