.\"
.\" Man page for ORTE's orterun command
.\"
.\" .TH name     section center-footer   left-footer  center-header
.TH     MPIRUN  1       "March 2006" "Open MPI"   "OPEN MPI COMMANDS"
.\" **************************
.\"    Name Section
.\" **************************
.SH NAME
.
orterun, mpirun, mpiexec \- Execute serial and parallel jobs in Open MPI.

.B Note:
\fImpirun\fP, \fImpiexec\fP, and \fIorterun\fP are all exact synonyms for each
other.  Using any of the names will result in exactly identical behavior.
.
.\" **************************
.\"    Synopsis Section
.\" **************************
.SH SYNOPSIS
.
.PP
Single Process Multiple Data (SPMD) Model:

.B mpirun
[ options ]
.B <program>
[ <args> ]

Multiple Instruction Multiple Data (MIMD) Model:

.B mpirun
[ global_options ]
       [ local_options1 ]
.B <program1>
[ <args1> ] :
       [ local_options2 ]
.B <program2>
[ <args2> ] :
       ... :
       [ local_optionsN ]
.B <programN>
[ <argsN> ]
.
.\" **************************
.\"    Quick Summary Section
.\" **************************
.SH QUICK SUMMARY
.
If you are simply looking for how to run an MPI application, you
probably want to use a command line of the following form:

    \fBshell$\fP mpirun -np X [ --hostfile <filename> ]  <program>

This will run X copies of \fI<program>\fR in your current run-time
environment, scheduling (by default) in a round-robin fashion by CPU slot.
If running under a supported resource manager, Open MPI's \fImpirun\fR will
usually automatically use the corresponding resource manager process starter,
as opposed to, for example, \fIrsh\fR or \fIssh\fR, which require the use of
a hostfile or will default to running all X copies on the localhost.
See the rest of this page for more details.
.
.\" **************************
.\"    Options Section
.\" **************************
.SH OPTIONS
.
.I mpirun
will send the name of the directory where it was invoked on the local
node to each of the remote nodes, and attempt to change to that
directory.  See the "Current Working Directory" section below for further
details.
.\"
.\" Start options listing
.\"    Indent 10 chacters from start of first column to start of second column
.TP 10
.B <args>
Pass these run-time arguments to every new process.  These must always
be the last arguments to \fImpirun\fP. If an app context file is used,
\fI<args>\fP will be ignored.
.
.
.TP
.B <program>
The program executable. This is identified as the first non-recognized argument
to mpirun.
.
.
.TP
.B -aborted\fR,\fP --aborted \fR<#>\fP
Set the maximum number of aborted processes to display.
.
.
.TP
.B --app \fR<appfile>\fP
Provide an appfile, ignoring all other command line options.
.
.
.TP
.B -bynode\fR,\fP --bynode
Allocate (map) the processes by node in a round-robin scheme.
.
.
.TP
.B -byslot\fR,\fP --byslot
Allocate (map) the processes by slot in a round-robin scheme. This is the
default.
.
.
.TP
.B -c \fR<#>\fP
Synonym for \fI-np\fP.
.
.
.TP
.B -debug\fR,\fP --debug
Invoke the user-level debugger indicated by the \fIorte_base_user_debugger\fP
MCA parameter.
.
.
.TP
.B -debugger\fR,\fP --debugger
Sequence of debuggers to search for when \fI--debug\fP is used (i.e.,
a synonym for the \fIorte_base_user_debugger\fP MCA parameter).
.
.
.TP
.B -gmca\fR,\fP --gmca \fR<key> <value>\fP
Pass global MCA parameters that are applicable to all contexts. \fI<key>\fP is
the parameter name; \fI<value>\fP is the parameter value.
.
.
.TP
.B -h\fR,\fP --help
Display help for this command.
.
.
.TP
.B -H \fR<host1,host2,...,hostN>\fP
Synonym for \fI-host\fP.
.
.
.TP
.B -host\fR,\fP --host \fR<host1,host2,...,hostN>\fP
List of hosts on which to invoke processes.
.
.
.TP
.B -hostfile\fR,\fP --hostfile \fR<hostfile>\fP
Provide a hostfile to use.
.\" JJH - Should have man page for how to format a hostfile properly.
.
.
.TP
.B -machinefile\fR,\fP --machinefile \fR<machinefile>\fP
Synonym for \fI-hostfile\fP.
.
.
.TP
.B -mca\fR,\fP --mca \fR<key> <value>\fP
Send arguments to various MCA modules.  See the "MCA" section, below.
.
.
.TP
.B -n\fR,\fP --n \fR<#>\fP
Synonym for \fI-np\fP.
.
.
.TP
.B -np \fR<#>\fP
Run this many copies of the program on the given nodes.  This option
indicates that the specified file is an executable program and not an
application context.
.
.
.TP
.B -nw\fR,\fP --nw
Launch the processes and do not wait for their completion. mpirun will
complete as soon as successful launch occurs.
.
.
.TP
.B -path\fR,\fP --path \fR<path>\fP
<path> that will be used when attempting to locate requested executables.
.
.
.TP
.B --tmpdir \fR<dir>\fP
Set the root for the session directory tree for mpirun only.
.
.
.TP
.B -tv\fR,\fP --tv
Launch processes under the TotalView debugger.
Deprecated backwards compatibility flag. Synonym for \fI--debug\fP.
.
.
.TP
.B --universe \fR<username@hostname:universe_name>\fP
For this application, set the universe name as:
     username@hostname:universe_name
.
.
.TP
.B -v\fR,\fP --verbose
Be verbose.
.
.
.TP
.B -wd \fR<dir>\fP
Change to the directory <dir> before the user's program executes.
See the "Current Working Directory" section for notes on relative paths.
.B Note:
If the \fI-wd\fP option appears both on the command line and in an
application context, the context will take precedence over the command line.
.
.
.TP
.B -x \fR<env>\fP
Export the specified environment variables to the remote nodes before
executing the program.  Existing environment variables can be
specified (see the Examples section, below), or new variable names
specified with corresponding values.  The parser for the \fI-x\fP
option is not very sophisticated; it does not even understand quoted
values.  Users are advised to set variables in the environment, and
then use \fI-x\fP to export (not define) them.
.
.
.P
The following options are useful for developers; they are not generally
useful to most ORTE and/or MPI users:
.
.TP
.B -d\fR,\fP --debug-devel
Enable debugging of OpenRTE (the run-time layer in Open MPI).
This is not generally useful for most users.
.
.
.TP
.B --debug-daemons
Enable debugging of any OpenRTE daemons used by this application.
.
.
.TP
.B --debug-daemons-file
Enable debugging of any OpenRTE daemons used by this application, storing
output in files.
.
.
.TP
.B --no-daemonize
Do not detach OpenRTE daemons used by this application.
.
.
.\" **************************
.\"    Description Section
.\" **************************
.SH DESCRIPTION
.
One invocation of \fImpirun\fP starts an MPI application running under Open
MPI. If the application is single process multiple data (SPMD), the application
can be specified on the \fImpirun\fP command line.

If the application is multiple instruction multiple data (MIMD), comprising
multiple programs, the set of programs and arguments can be specified in one of
two ways: Extended Command Line Arguments or an Application Context.
.PP
An application context describes the MIMD program set including all arguments
in a separate file.
.\"See appcontext(5) for a description of the application context syntax.
This file essentially contains multiple \fImpirun\fP command lines, less the
command name itself.  The ability to specify different options for different
instantiations of a program is another reason to use an application context.
.PP
Extended command line arguments allow for the description of the application
layout on the command line using colons (\fI:\fP) to separate the specification
of programs and arguments. Some options are globally set across all specified
programs (e.g. --hostfile), while others are specific to a single program
(e.g. -np).
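.PP
The colon-separated form can be pictured as a simple split of the argument
list.  The following Python sketch is illustrative only: the function name
\fIsplit_app_contexts\fP is hypothetical, it does not separate global options
from local ones, and it is not how \fImpirun\fP itself parses its command line.
.PP
.nf
```python
def split_app_contexts(argv):
    """Split an argument list into per-program groups at each ":" token."""
    contexts = [[]]
    for token in argv:
        if token == ":":
            contexts.append([])  # a colon starts the next program's arguments
        else:
            contexts[-1].append(token)
    # Drop empty groups, e.g. from a trailing colon.
    return [group for group in contexts if group]


if __name__ == "__main__":
    line = "-np 1 --host a hostname : -np 2 --host b,c uptime"
    for group in split_app_contexts(line.split()):
        print(group)
```
.fi
.PP
Each resulting group carries its own local options (such as \fI-np\fP)
followed by the program name and its arguments.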
.
.
.
.SS Location Nomenclature
.
As described above, \fImpirun\fP can specify arbitrary locations in the current
Open MPI universe.
Locations can be specified either by CPU or by node.

.B Note:
Open MPI does not bind processes to CPUs -- specifying a location "by CPU" is
really a convenience mechanism for SMPs that ultimately maps down to a specific
node.
.PP
Specifying locations by node will launch one copy of an executable per
specified node.
Using the \fI--bynode\fP option tells Open MPI to use all available nodes.
Using the \fI--byslot\fP option tells Open MPI to use all slots on an available
node before allocating resources on the next available node.
For example:
.
.TP 4
mpirun --bynode -np 4 a.out
Runs one copy of the executable
.I a.out
on all available nodes in the Open MPI universe.  MPI_COMM_WORLD rank 0
will be on node0, rank 1 will be on node1, etc., regardless of how many slots
are available on each of the nodes.
.
.
.TP
mpirun --byslot -np 4 a.out
Runs one copy of the executable
.I a.out
on each slot on a given node before running the executable on other available
nodes.
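.PP
The difference between the two mapping policies can be modelled in a few
lines.  This Python sketch is an approximation for illustration only: the
function \fImap_ranks\fP and the host names are hypothetical, and wrapping
around when there are more processes than nodes or slots is an assumption
of the sketch, not a statement about Open MPI's scheduler.
.PP
.nf
```python
def map_ranks(hosts, np, policy="byslot"):
    """Return a list whose i-th entry is the host assigned to rank i.

    hosts is a list of (name, slots) pairs.  Wrapping around when np exceeds
    the number of entries is an assumption of this sketch.
    """
    if policy == "bynode":
        # One rank per node in turn, ignoring slot counts.
        order = [name for name, _slots in hosts]
    elif policy == "byslot":
        # Fill every slot on a node before moving to the next node.
        order = [name for name, slots in hosts for _ in range(slots)]
    else:
        raise ValueError("unknown policy: " + policy)
    return [order[rank % len(order)] for rank in range(np)]


if __name__ == "__main__":
    hosts = [("node0", 2), ("node1", 2)]
    print(map_ranks(hosts, 4, "byslot"))
    print(map_ranks(hosts, 4, "bynode"))
```
.fi
.PP
With two nodes of two slots each, the by-slot policy places ranks 0 and 1 on
node0 and ranks 2 and 3 on node1, while the by-node policy alternates between
the two nodes.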
.
.
.
.SS Specifying Hosts
.
Hosts can be specified in a number of ways, the most common of which is a
"hostfile" or "machinefile". If our hostfile contains the following information:
.
.

   \fBshell$\fP cat my-hostfile
   node00 slots=2
   node01 slots=2
   node02 slots=2

.
.
.TP
mpirun --hostfile my-hostfile -np 3 a.out
This will run one copy of the executable
.I a.out
on hosts node00, node01, and node02.
.
.
.PP
Another method for specifying hosts is directly on the command line. Here, you
can include and exclude hosts from the set of hosts to run on. For example:
.
.
.TP
mpirun -np 3 --host a a.out
Runs three copies of the executable
.I a.out
on host a.
.
.
.TP
mpirun -np 3 --host a,b,c a.out
Runs one copy of the executable
.I a.out
on hosts a, b, and c.
.
.
.TP
mpirun -np 3 --hostfile my-hostfile --host node00 a.out
Runs three copies of the executable
.I a.out
on host node00.
.
.
.TP
mpirun -np 3 --hostfile my-hostfile --host node10 a.out
This produces an error because node10 is not in my-hostfile; mpirun will
abort.
.
.
.TP
mpirun -np 1 --host a hostname : -np 2 --host b,c uptime
Runs one copy of the executable
.I hostname
on host a, and one copy of the executable
.I uptime
on each of hosts b and c.
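.PP
The interaction between a hostfile and the \fI--host\fP option in the examples
above can be sketched as follows.  This is a simplified Python model, not Open
MPI's implementation: the function names are hypothetical, only the
\fIslots=\fP field is recognized, and a default of one slot when the field is
absent is an assumption.
.PP
.nf
```python
def parse_hostfile(text):
    """Parse lines such as "node00 slots=2" into a {host: slots} dict."""
    hosts = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        slots = 1  # assumed default when no slots= field is given
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field[len("slots="):])
        hosts[fields[0]] = slots
    return hosts


def select_hosts(hostfile_hosts, host_option):
    """Restrict the hostfile's hosts to those named in a --host list."""
    requested = host_option.split(",")
    missing = [h for h in requested if h not in hostfile_hosts]
    if missing:
        # Mirrors the node10 example: a host outside the hostfile is an error.
        raise ValueError("not in hostfile: " + ", ".join(missing))
    return {h: hostfile_hosts[h] for h in requested}


if __name__ == "__main__":
    hostfile = parse_hostfile("""
        node00 slots=2
        node01 slots=2
        node02 slots=2
    """)
    print(select_hosts(hostfile, "node00"))
```
.fi
.PP
Requesting node10 against this hostfile raises an error in the sketch, in the
same way that \fImpirun\fP aborts in the corresponding example.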
.
.
.
.SS Application Context or Executable Program?
.
To distinguish the two different forms, \fImpirun\fP
looks on the command line for the \fI--app\fP option.  If
it is specified, then the file named on the command line is
assumed to be an application context.  If it is not
specified, then the file is assumed to be an executable program.
.
.
.
.SS Locating Files
.
If \fIno\fP relative or absolute path is specified for a file, Open MPI
will look for files by searching the directories in the user's PATH environment
variable as defined on the source node(s).
.PP
If a relative directory is specified, it must be relative to the initial
working directory determined by the specific starter used. For example, when
using the rsh or ssh starters, the initial directory is $HOME by default. Other
starters may set the initial directory to the current working directory from
the invocation of \fImpirun\fP.
.
.
.
.SS Current Working Directory
.
The \fI\-wd\fP mpirun option allows the user to change to an arbitrary
directory before their program is invoked.  It can also be used in application
context files to specify working directories on specific nodes and/or
for specific applications.
.PP
If the \fI\-wd\fP option appears both in a context file and on the command line,
the context file directory will override the command line value.
.PP
If the \fI-wd\fP option is specified, Open MPI will attempt to change to the
specified directory on all of the remote nodes. If this fails, \fImpirun\fP
will abort.
.PP
If the \fI-wd\fP option is \fBnot\fP specified, Open MPI will send the
directory name where \fImpirun\fP was invoked to each of the remote nodes. The
remote nodes will try to change to that directory. If they are unable to do so
(e.g., if the directory does not exist on that node), then Open MPI will use the default
directory determined by the starter.
.PP
All directory changing occurs before the user's program is invoked; it
does not wait until \fIMPI_INIT\fP is called.
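.PP
The rules above amount to a short decision procedure, sketched here in Python
for illustration.  The function \fIchoose_working_dir\fP and its parameters are
hypothetical; \fIdir_exists\fP stands in for the real attempt to change
directory on a remote node.
.PP
.nf
```python
def choose_working_dir(wd_option, invocation_dir, starter_default, dir_exists):
    """Pick the directory a remote process starts in.

    wd_option is the -wd argument, or None if -wd was not given.
    dir_exists is a callable returning True if a directory is usable on the
    remote node; it is a stand-in for the real chdir attempt.
    """
    if wd_option is not None:
        # With -wd, failure to change directory is fatal.
        if not dir_exists(wd_option):
            raise RuntimeError("cannot change to " + wd_option + "; aborting")
        return wd_option
    # Without -wd, try the directory where mpirun was invoked ...
    if dir_exists(invocation_dir):
        return invocation_dir
    # ... and fall back to the starter's default directory.
    return starter_default


if __name__ == "__main__":
    only_home = lambda d: d == "/home/user"
    print(choose_working_dir(None, "/scratch/run1", "/home/user", only_home))
```
.fi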
.
.
.
.SS Standard I/O
.
Open MPI directs UNIX standard input to /dev/null on all processes
except the MPI_COMM_WORLD rank 0 process. The MPI_COMM_WORLD rank 0 process
inherits standard input from \fImpirun\fP.
.B Note:
The node that invoked \fImpirun\fP need not be the same as the node where the
MPI_COMM_WORLD rank 0 process resides. Open MPI handles the redirection of
\fImpirun\fP's standard input to the rank 0 process.
.PP
Open MPI directs UNIX standard output and error from remote nodes to the node
that invoked \fImpirun\fP and prints it on the standard output/error of
\fImpirun\fP.
Local processes inherit the standard output/error of \fImpirun\fP and write
to it directly.
.PP
Thus it is possible to redirect standard I/O for Open MPI applications by
using the typical shell redirection procedure on \fImpirun\fP.

      \fBshell$\fP mpirun -np 2 my_app < my_input > my_output

Note that in this example \fIonly\fP the MPI_COMM_WORLD rank 0 process will
receive the stream from \fImy_input\fP on stdin.  The stdin of all the other
processes will be tied to /dev/null.  However, the stdout from all processes
will be collected into the \fImy_output\fP file.
.
.
.
.SS Process Termination / Signal Handling
.
During the run of an MPI application, if any rank dies abnormally
(either exiting before invoking \fIMPI_FINALIZE\fP, or dying as the result of a
signal), \fImpirun\fP will print out an error message and kill the rest of the
MPI application.
.PP
User signal handlers should probably avoid trying to clean up MPI state
(Open MPI is, currently, neither thread-safe nor async-signal-safe).
For example, if a segmentation fault occurs in \fIMPI_SEND\fP (perhaps because
a bad buffer was passed in) and a user signal handler is invoked, if this user
handler attempts to invoke \fIMPI_FINALIZE\fP, Bad Things could happen since
Open MPI was already "in" MPI when the error occurred.  Since \fImpirun\fP
will notice that the process died due to a signal, cleaning up MPI state is
probably not necessary; it is safest for the user to clean up only non-MPI state.
.
.
.
.SS Process Environment
.
Processes in the MPI application inherit their environment from the
Open RTE daemon on the node on which they are running.  The environment
is typically inherited from the user's shell.  On remote nodes, the exact
environment is determined by the boot MCA module used.  The rsh boot module,
for example, uses either rsh or ssh to launch the Open RTE daemon on remote nodes, and
typically executes one or more of the user's shell-setup files before launching
the Open RTE daemon.  When running dynamically linked applications which
require the LD_LIBRARY_PATH environment variable to be set, care must be taken
to ensure that it is correctly set when booting Open MPI.
.
.
.
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes.
The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment
variables to the new processes.  While the syntax of the \fI\-x\fP
option allows the definition of new variables, note that the parser
for this option is currently not very sophisticated - it does not even
understand quoted values.  Users are advised to set variables in the
environment and use \fI\-x\fP to export them, not to define them.
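.PP
The two export rules in this section can be modelled as a filter over the
invoking environment.  The Python sketch below is illustrative only: the
function \fIexported_environment\fP and the variable OMPI_FOO are hypothetical,
and the sketch covers only exporting existing variables, not defining new ones.
.PP
.nf
```python
def exported_environment(environ, x_names):
    """Return the variables a launched process would receive by export.

    Models only two rules: every OMPI_* variable is exported, plus each
    existing variable named with -x.
    """
    exported = {k: v for k, v in environ.items() if k.startswith("OMPI_")}
    for name in x_names:
        if name in environ:
            exported[name] = environ[name]
    return exported


if __name__ == "__main__":
    env = {
        "OMPI_FOO": "1",  # hypothetical variable, exported automatically
        "HOME": "/home/user",
        "LD_LIBRARY_PATH": "/opt/lib",
    }
    print(exported_environment(env, ["LD_LIBRARY_PATH"]))
```
.fi
.PP
In this model HOME is not passed along because it neither matches OMPI_* nor
was named with \fI-x\fP.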
.
.
.
.SS MCA (Modular Component Architecture)
The \fI-mca\fP switch allows the passing of parameters to various MCA modules.
.\" Open MPI's MCA modules are described in detail in ompimca(7).
MCA modules have direct impact on MPI programs because they allow tunable
parameters to be set at run time (such as which BTL communication device driver
to use, what parameters to pass to that BTL, etc.).
.PP
The \fI-mca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
The \fI<key>\fP argument generally specifies which MCA module will receive the value.
For example, the \fI<key>\fP "btl" selects which BTL is used for
transporting MPI messages.  The \fI<value>\fP argument is the value that is
passed.
For example:
.
.TP 4
mpirun -mca btl tcp,self -np 1 foo
Tells Open MPI to use the "tcp" and "self" BTLs, and to run a single copy of
"foo" on an allocated node.
.
.TP
mpirun -mca btl self -np 1 foo
Tells Open MPI to use the "self" BTL, and to run a single copy of "foo" on an
allocated node.
.\" And so on.  Open MPI's BTL MCA modules are described in ompimca_btl(7).
.PP
The \fI-mca\fP switch can be used multiple times to specify different
\fI<key>\fP and/or \fI<value>\fP arguments.  If the same \fI<key>\fP is
specified more than once, the \fI<value>\fPs are concatenated with a comma
(",") separating them.
.PP
.B Note:
The \fI-mca\fP switch is simply a shortcut for setting environment variables.
The same effect may be accomplished by setting corresponding environment
variables before running \fImpirun\fP.
The form of the environment variables that Open MPI sets is:

      OMPI_MCA_<key>=<value>
.PP
Note that the \fI-mca\fP switch overrides any previously set environment
variables.  Also note that unknown \fI<key>\fP arguments are still set as
environment variables -- they are not checked (by \fImpirun\fP) for correctness.
Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it
depends on the specific MCA module.
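.PP
The rule that repeated \fI<key>\fP arguments are joined with commas can be
illustrated with a small Python sketch.  The function \fIcollect_mca\fP is
hypothetical and models only the concatenation rule described above; it does
not reproduce the rest of \fImpirun\fP's option handling.
.PP
.nf
```python
def collect_mca(argv):
    """Gather -mca <key> <value> pairs, joining repeated keys with commas."""
    params = {}
    i = 0
    while i < len(argv):
        if argv[i] in ("-mca", "--mca") and i + 2 < len(argv):
            key, value = argv[i + 1], argv[i + 2]
            if key in params:
                # Same key seen again: append the new value after a comma.
                params[key] = params[key] + "," + value
            else:
                params[key] = value
            i += 3
        else:
            i += 1
    return params


if __name__ == "__main__":
    args = "-mca btl tcp -mca btl self -np 1 foo".split()
    print(collect_mca(args))
```
.fi
.PP
Under this model, giving "-mca btl tcp -mca btl self" is equivalent to giving
"-mca btl tcp,self" once.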
.
.\" **************************
.\"    Examples Section
.\" **************************
.SH EXAMPLES
Be sure to also see the examples in the "Location Nomenclature" section, above.
.
.TP 4
mpirun -np 1 prog1
Load and execute prog1 on one node.  Search the user's $PATH for the
executable file on each node.
.
.
.TP
mpirun -np 8 --byslot prog1
Run 8 copies of prog1 wherever Open MPI wants to run them.
.
.
.TP
mpirun -np 4 -mca btl ib,tcp,self prog1
Run 4 copies of prog1 using the "ib", "tcp", and "self" BTLs for the transport
of MPI messages.
.
.\" **************************
.\"    Diagnostics Section
.\" **************************
.
.\" .SH DIAGNOSTICS
.\".TP 4
.\"Error Msg:
.\"Description
.
.\" **************************
.\"    Return Value Section
.\" **************************
.
.SH RETURN VALUE
.
\fImpirun\fP returns 0 if all ranks started by \fImpirun\fP exit after calling
MPI_FINALIZE.  A non-zero value is returned if an internal error occurred in
mpirun, or one or more ranks exited before calling MPI_FINALIZE.  If an
internal error occurred in mpirun, the corresponding error code is returned.
In the event that one or more ranks exit before calling MPI_FINALIZE,
\fImpirun\fP returns the return value of the first process that it notices
died before calling MPI_FINALIZE.  Note that, in general, this will
be the first rank that died, but this is not guaranteed.
.PP
However, note that if the \fI-nw\fP switch is used, the return value from
mpirun does not indicate the exit status of the ranks.
.
.\" **************************
.\"    See Also Section
.\" **************************
.
.\" .SH SEE ALSO
.\" orted(1)