.\"
.\" Man page for ORTE's orterun process
.\"
.\" .TH name     section center-footer   left-footer  center-header
.TH     ORTERUN  1       "February 2006" "Open MPI"   "OPEN MPI COMMANDS"
.\" **************************
.\"    Name Section
.\" **************************
.SH NAME
.
orterun, mpirun, mpiexec \- Execute serial and parallel jobs in Open MPI.

.B Note:
.IR mpirun , 
.IR mpiexec ,
and
.I orterun
are all exact synonyms for each other.  Using any of the names will
result in exactly identical behavior.
.
.\" **************************
.\"    Synopsis Section
.\" **************************
.SH SYNOPSIS
.
.B mpirun
[ options ]
.B <program>
[ <args> ]
.
.\" **************************
.\"    Quick Summary Section
.\" **************************
.SH QUICK SUMMARY
If you are simply looking for how to run an MPI application, you
probably want to use the following command line:

    \fBshell$\fP mpirun -np 4 my_mpi_application

This will run 4 copies of \fImy_mpi_application\fR in your current run-time
environment (if running under a supported resource manager, Open MPI's
\fIorterun\fR will usually automatically use the corresponding resource manager
process starter, as opposed to, for example, \fIrsh\fR or \fIssh\fR),
scheduling (by default) in a round-robin fashion by CPU slot.  See the
rest of this page for more details.
.
.\" **************************
.\"    Options Section
.\" **************************
.SH OPTIONS
.
.I mpirun
will send the name of the directory where it was invoked on the local
node to each of the remote nodes, and attempt to change to that
directory.  See the "Current Working Directory" section, below.
.\"
.\" Start options listing
.\"    Indent 10 characters from start of first column to start of second column
.TP 10
.B -aborted \fR<#>\fP
Set the maximum number of aborted processes to display.
.
.
.TP
.B --app \fR<appfile>\fP
Provide an appfile, ignoring all other command line options.
.
.
.TP
.B -bynode
Allocate (map) the processes by node in a round-robin scheme.
.
.
.TP
.B -byslot
Allocate (map) the processes by slot in a round-robin scheme. This is the
default.
.
.
.TP
.B -c \fR<#>\fP
Synonym for \fI-np\fP (see below).
.
.
.TP
.B -d, --debug-devel
Enable debugging of OpenRTE.
.
.
.TP
.B --debug
Invoke the user-level debugger indicated by the \fIorte_base_user_debugger\fP
MCA parameter.
.
.
.TP
.B --debug-daemons
Enable debugging of any OpenRTE daemons used by this application.
.
.
.TP
.B --debug-daemons-file
Enable debugging of any OpenRTE daemons used by this application, storing
output in files.
.
.
.TP
.B --debugger
Sequence of debuggers to search for when \fI--debug\fP is used.
.
.
.TP
.B -h, --help
Display help for this command.
.
.
.TP
.B -H \fR<host1,host2,...,hostN>\fP
Synonym for \fI-host\fP (see below).
.
.
.TP
.B -host \fR<host1,host2,...,hostN>\fP
List of hosts on which to invoke processes.
.
.
.TP
.B -hostfile \fR<hostfile>\fP
Provide a hostfile to use.
.\" JJH - Should have man page for how to format a hostfile properly.
.
.
.TP
.B -machinefile \fR<machinefile>\fP
Synonym for \fI-hostfile\fP (see above).
.
.
.TP
.B -mca <key> <value>
Send arguments to various MCA modules.  See the "MCA" section, below.
.
.
.TP
.B -n \fR<#>\fP
Synonym for \fI-np\fP (see below).
.
.
.TP
.B --no-daemonize
Do not detach OpenRTE daemons used by this application.
.
.
.TP
.B -np \fR<#>\fP
Run this many copies of the program on the given nodes.  This option
indicates that the specified file is an executable program and not an
application schema.
.
.
.TP
.B -nw
Launch the processes and do not wait for their completion. orterun will
complete as soon as the launch succeeds.
.
.
.TP
.B -path \fR<path>\fP
PATH to be used to look for executables to start processes.
.
.
.TP
.B --tmpdir \fR<dir>\fP
Set the root for the session directory tree for orterun only.
.
.
.TP
.B -tv
Launch processes under the TotalView Debugger.
Deprecated backwards compatibility flag. Synonym for \fI--debug\fP.
.
.
.TP
.B --universe \fR<username@hostname:universe_name>\fP
For this application, set the universe name as:
     username@hostname:universe_name
.
.
.TP
.B -v, --verbose
Be verbose.
.
.
.TP
.B -wd \fR<dir>\fP
Change to the directory <dir> before the user's program executes.
Note that if the \fI-wd\fP option appears both on the command line and in an
application schema, the schema will take precedence over the command line.
.
.
.TP
.B -x \fR<env>\fP
Export the specified environment variables to the remote nodes before
executing the program.  Existing environment variables can be
specified (see the Examples section, below), or new variable names
specified with corresponding values.  The parser for the \fI-x\fP
option is not very sophisticated; it does not even understand quoted
values.  Users are advised to set variables in the environment, and
then use \fI-x\fP to export (not define) them.
.
.
.TP
.B <args>
Pass these runtime arguments to every new process.  These must always
be the last arguments to \fImpirun\fP.  This option is not valid on the command
line if an application schema is specified.
.
.\" **************************
.\"    Description Section
.\" **************************
.SH DESCRIPTION
.
One invocation of \fImpirun\fP starts an MPI application running under Open
MPI. If the application is simply SPMD, the application can be specified on the
\fImpirun\fP command line.

If the application is MIMD, comprising multiple programs, an application
schema is required in a separate file.
See appschema(5) for a description of the application schema syntax.
It essentially contains multiple \fImpirun\fP command lines, less the command
name itself.  The ability to specify different options for different
instantiations of a program is another reason to use an application schema.
.
.
.
.SS Location Nomenclature
.
As described above, \fImpirun\fP can specify arbitrary locations in the current
Open MPI universe.
Locations can be specified either by CPU or by node.

.B Note:
Open MPI does not bind processes to CPUs -- specifying a location "by CPU" is
really a convenience mechanism for SMPs that ultimately maps down to a specific
node.
.PP
Specifying locations by node will launch one copy of an executable per
specified node.
Using the \fI--bynode\fP option tells Open MPI to use all available nodes.
Using the \fI--byslot\fP option tells Open MPI to use all slots on an available
node before allocating resources on the next available node.
For example:
.
.TP 4
mpirun --bynode -np 4 a.out
Runs one copy of the executable
.I a.out
on all available nodes in the Open MPI universe.  MPI_COMM_WORLD rank 0
will be on node0, rank 1 will be on node1, etc., regardless of how many
slots are available on each of the nodes.
.
.
.TP
mpirun --byslot -np 4 a.out
Runs one copy of the executable
.I a.out
on each slot on a given node before running the executable on other available
nodes.
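.PP
The two mapping policies can be illustrated with a small shell model.
This is only a sketch under stated assumptions: it does not query Open MPI,
the values of NODES and SLOTS are made up, every node is assumed to have the
same number of slots, and the number of ranks is assumed not to exceed
NODES times SLOTS.
The helper names map_bynode and map_byslot are hypothetical.
.PP
.nf
```shell
# Illustrative model only; NODES and SLOTS are assumed values,
# not something read from Open MPI.
NODES=2      # nodes in the allocation (assumed)
SLOTS=2      # slots on every node (assumed)

# Node index for a rank when mapping by node (round-robin over nodes).
map_bynode() { echo $(( $1 % $NODES )); }

# Node index for a rank when mapping by slot (fill one node, then the next).
map_byslot() { echo $(( $1 / $SLOTS )); }

rank=0
while [ "$rank" -lt 4 ]; do
    echo "rank $rank: bynode node$(map_bynode "$rank"), byslot node$(map_byslot "$rank")"
    rank=$((rank + 1))
done
```
.fi
.PP
With these assumed values, ranks 0 through 3 land on node0, node1, node0,
node1 when mapped by node, and on node0, node0, node1, node1 when mapped
by slot.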
.
.
.
.SS Application Schema or Executable Program?
.
To distinguish the two different forms, \fImpirun\fP
looks on the command line for the \fI--app\fP option.  If
it is specified, then the file named on the command line is
assumed to be an application schema.  If it is not
specified, then the file is assumed to be an executable program.
.
.
.
.SS Locating Files
.
Open MPI looks for an executable program by searching the directories in
the user's PATH environment variable as defined on the source node(s).
This behavior is consistent with logging into the source node and
executing the program from the shell.  On remote nodes, the "." path
is the home directory.
.PP
Open MPI looks for an application schema in the local directory.
.
.
.
.SS Standard I/O
.
Open MPI directs UNIX standard input to /dev/null on all remote nodes.  On
the local node that invoked \fImpirun\fP, standard input is inherited from
\fImpirun\fP.
.PP
Open MPI directs UNIX standard output and error to the Open RTE daemon on all
remote nodes. Open MPI ships all captured output/error to the node that
invoked \fImpirun\fP and prints it on the standard output/error of \fImpirun\fP.
Local processes inherit the standard output/error of \fImpirun\fP and write
to it directly.
.PP
Thus it is possible to redirect standard I/O for Open MPI applications by
using the typical shell redirection procedure on \fImpirun\fP.

      \fBshell$\fP mpirun -np 2 my_app < my_input > my_output

Note that in this example \fIonly\fP the local node (i.e., the node on which
mpirun was invoked) will receive the stream from \fImy_input\fP on stdin.  The
stdin on all the other nodes will be tied to /dev/null.  However, the stdout
from all nodes will be collected into the \fImy_output\fP file.
.
.
.
.SS Process Termination / Signal Handling
.
During the run of an MPI application, if any rank dies abnormally
(either exiting before invoking \fIMPI_FINALIZE\fP, or dying as the result of a
signal), \fImpirun\fP will print out an error message and kill the rest of the
MPI application.
.PP
By default, Open MPI only installs a signal handler for one signal in
user programs (SIGUSR2).  Therefore, it is safe for users to install
their own signal handlers in Open MPI programs.
.PP
User signal handlers should probably avoid trying to clean up MPI state
(Open MPI is, currently, neither thread-safe nor async-signal-safe).
For example, if a seg fault occurs in \fIMPI_SEND\fP (perhaps because a bad
buffer was passed in) and a user signal handler is invoked, if this user
handler attempts to invoke \fIMPI_FINALIZE\fP, Bad Things could happen since
Open MPI was already "in" MPI when the error occurred.  Since \fImpirun\fP
will notice that the process died due to a signal, cleaning up MPI state is
probably not necessary, and it is safest for the user handler to clean up
only non-MPI state.
.
.
.
.SS Current Working Directory
.
The default behavior of mpirun has changed with respect to the
directory that processes will be started in.
.PP
The \fI\-wd\fP option to mpirun allows the user to change to an arbitrary
directory before their program is invoked.  It can also be used in application
schema files to specify working directories on specific nodes and/or
for specific applications.
.PP
If the \fI\-wd\fP option appears both in a schema file and on the command line,
the schema file directory will override the command line value.
.PP
The \fI\-D\fP option will change the current working directory to the directory
where the executable resides.  It cannot be used in application schema files.
.PP
If \fI\-wd\fP is not specified, the local node will send the directory name
in which mpirun was invoked to each of the remote nodes.  The remote nodes
will then try to change to that directory.  If they fail (e.g., if the
directory does not exist on that node), they will start from the
user's home directory.
.PP
All directory changing occurs before the user's program is invoked; it
does not wait until \fIMPI_INIT\fP is called.  
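.PP
The fallback rule for the default case can be sketched in shell as follows.
This is an illustration of the rule as described in this section, not the
code Open MPI runs; start_dir is a hypothetical helper, and falling back
to / when HOME is unset is an assumption made only to keep the sketch
self-contained.
.PP
.nf
```shell
# Sketch of the fallback rule; start_dir is a hypothetical helper,
# not part of Open MPI.
start_dir() {
    if cd "$1" 2>/dev/null; then
        # The directory sent by the local node exists here: use it.
        pwd
    else
        # Otherwise start from the home directory (or / if HOME is unset).
        cd "${HOME:-/}" && pwd
    fi
}

# Usage, in a subshell so the caller's directory is left alone:
#   ( start_dir /path/sent/by/the/local/node )
```
.fi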
.
.
.
.SS Process Environment
.
Processes in the MPI application inherit their environment from the
Open RTE daemon on the node on which they are running.  The environment
is typically inherited from the user's shell.  On remote nodes, the exact
environment is determined by the boot MCA module used.  The rsh boot module,
for example, uses either rsh or ssh to launch the Open RTE daemon on remote nodes, and
typically executes one or more of the user's shell-setup files before launching
the Open RTE daemon.  When running dynamically linked applications which
require the LD_LIBRARY_PATH environment variable to be set, care must be taken
to ensure that it is correctly set when booting Open MPI.
.
.
.
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes.
The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment
variables to the new processes.  While the syntax of the \fI\-x\fP
option allows the definition of new variables, note that the parser
for this option is currently not very sophisticated: it does not even
understand quoted values.  Users are advised to set variables in the
environment and use \fI\-x\fP to export them, not to define them.
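.PP
A minimal sketch of the recommended pattern follows: set the variable in the
environment first, then name it with \fI\-x\fP.
The names MY_APP_DEBUG and my_mpi_application are hypothetical, and the
command line is only printed here rather than executed.
.PP
.nf
```shell
# MY_APP_DEBUG and my_mpi_application are hypothetical names.
# The value contains a space, which the -x parser may not handle,
# so it is set in the environment rather than on the command line.
MY_APP_DEBUG="level 2"
export MY_APP_DEBUG

# -x then names the variable only; no value is passed through it.
cmd="mpirun -np 4 -x MY_APP_DEBUG my_mpi_application"
echo "$cmd"
```
.fi
.PP
Because the shell has already parsed the quoted value, \fI\-x\fP only has to
carry the variable name.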
.
.
.
.SS MCA (Modular Component Architecture)
The
.I -mca
switch allows the passing of parameters to various MCA modules.
.\" Open MPI's MCA modules are described in detail in ompimca(7).
MCA modules have direct impact on MPI programs because they allow tunable
parameters to be set at run time (such as which BTL communication device driver
to use, what parameters to pass to that BTL, etc.).
.PP
The \fI-mca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
The \fI<key>\fP argument generally specifies which MCA module will receive the value.
For example, the \fI<key>\fP "btl" is used to select which BTL is used for
transporting MPI messages.  The \fI<value>\fP argument is the value that is
passed.
For example: 
.
.TP 4
mpirun -mca btl tcp,self -np 1 foo
Tells Open MPI to use the "tcp" and "self" BTLs, and to run a single copy of
"foo" on an allocated node.
.
.TP
mpirun -mca btl self -np 1 foo
Tells Open MPI to use the "self" BTL, and to run a single copy of
"foo" on an allocated node.
.\" And so on.  Open MPI's BTL MCA modules are described in lamssi_rpi(7).
.PP
The \fI-mca\fP switch can be used multiple times to specify different
\fI<key>\fP and/or \fI<value>\fP arguments.  If the same \fI<key>\fP is
specified more than once, the \fI<value>\fPs are concatenated with a comma
(",") separating them.
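.PP
The merging rule for a repeated \fI<key>\fP can be modelled in a few lines
of shell.
This is a sketch of the rule as stated above, not Open MPI's own parser;
join_values is a hypothetical helper, and the command line is only printed.
.PP
.nf
```shell
# Model of the merging rule only; join_values is a hypothetical helper.
join_values() {
    result=""
    for v in "$@"; do
        # Add a comma before every value except the first.
        result="${result:+$result,}$v"
    done
    echo "$result"
}

# Two -mca btl switches with the values tcp and self behave like one
# switch carrying the joined value.
echo "mpirun -mca btl $(join_values tcp self) -np 1 foo"
```
.fi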
.PP
.B Note:
The \fI-mca\fP switch is simply a shortcut for setting environment variables.
The same effect may be accomplished by setting corresponding environment
variables before running \fImpirun\fP.
The form of the environment variables that Open MPI sets is:

      OMPI_MCA_<key>=<value>
.PP
Note that the \fI-mca\fP switch overrides any previously set environment
variables.  Also note that unknown \fI<key>\fP arguments are still set as
environment variables -- they are not checked (by \fImpirun\fP) for correctness.
Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it
depends on the specific MCA module.
.
.\" **************************
.\"    Examples Section
.\" **************************
.SH EXAMPLES
Be sure to also see the examples in the "Location Nomenclature" section, above.
.
.TP 4
mpirun -np 1 prog1
Load and execute prog1 on one node.  Search the user's $PATH for the
executable file on each node.
.
.
.TP
mpirun -np 8 --byslot prog1
Run 8 copies of prog1 wherever Open MPI wants to run them.
.
.
.TP
mpirun -np 4 -mca btl ib,tcp,self prog1
Run 4 copies of prog1 using the "ib", "tcp", and "self" BTLs for the transport
of MPI messages.
.
.\" **************************
.\"    Diagnostics Section
.\" **************************
.
.\" .SH DIAGNOSTICS
.\".TP 4
.\"Error Msg:
.\"Description
.
.\" **************************
.\"    Return Value Section
.\" **************************
.
.SH RETURN VALUE
.
\fImpirun\fP returns 0 if all ranks started by \fImpirun\fP exit after calling
MPI_FINALIZE.  A non-zero value is returned if an internal error occurred in
mpirun, or one or more ranks exited before calling MPI_FINALIZE.  If an
internal error occurred in mpirun, the corresponding error code is returned.
In the event that one or more ranks exit before calling MPI_FINALIZE,
\fImpirun\fP returns the return value of the first rank that it notices
died before calling MPI_FINALIZE.  In general, this will be the first rank
that died, but that is not guaranteed.
.PP
However, note that if the \fI-nw\fP switch is used, the return value from
mpirun does not indicate the exit status of the ranks.
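.PP
A calling script can act on the return value in the usual shell way.
The sketch below assumes nothing beyond the behavior described in this
section; run_job is a hypothetical wrapper and my_mpi_application is a
placeholder program name.
.PP
.nf
```shell
# run_job is a hypothetical wrapper; it runs the command given as its
# arguments and reports the exit status.
run_job() {
    if "$@"; then
        status=0
    else
        status=$?
    fi
    if [ "$status" -eq 0 ]; then
        echo "job finished with exit status 0"
    else
        echo "job failed with exit status $status" >&2
    fi
    return "$status"
}

# Usage (my_mpi_application is a placeholder):
#   run_job mpirun -np 4 my_mpi_application
```
.fi
.PP
When \fI-nw\fP is used, a zero status from such a wrapper only says that
the launch succeeded, not that the ranks completed.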
.
.\" **************************
.\"    See Also Section
.\" **************************
.
.SH SEE ALSO
orted(1)