Removed all of the LAM stuff.
This needs to be gone over a few more times before it is allowed to see
daylight, but has come a long way.  Some sections may be off more than a
little, but the general idea is there.  Need to audit to make sure we don't
call the ORTE VHNP's daemons :)

This commit was SVN r9078.
This commit is contained in:
parent 2938545220
commit 02c999776b
@@ -218,665 +218,277 @@ line if an application schema is specified.
.\" Description Section
.\" **************************
.SH DESCRIPTION
.
One invocation of \fImpirun\fP starts an MPI application running under Open
MPI.  If the application is simply SPMD, the application can be specified on
the \fImpirun\fP command line.
If the application is MIMD, comprising multiple programs, an application
schema is required in a separate file.
See appschema(5) for a description of the application schema syntax.
It essentially contains multiple \fImpirun\fP command lines, less the command
name itself.  The ability to specify different options for different
instantiations of a program is another reason to use an application schema.
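.PP
For instance, a simple SPMD application could be started with a single
command line (using the same \fIa.out\fP as in the examples below):
.sp
.RS
\fBshell$\fP mpirun -np 4 a.out
.RE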
.
.
.
.SS Location Nomenclature
.
As described above, \fImpirun\fP can specify arbitrary locations in the current
Open MPI universe.
Locations can be specified either by CPU or by node.
.B Note:
Open MPI does not bind processes to CPUs -- specifying a location "by CPU" is
really a convenience mechanism for SMPs that ultimately maps down to a specific
node.
.PP
Specifying locations by node will launch one copy of an executable per
specified node.
Using the \fI--bynode\fP option tells Open MPI to use all available nodes.
Using the \fI--byslot\fP option tells Open MPI to use all slots on an available
node before allocating resources on the next available node.
For example:
.
.TP 4
mpirun --bynode -np 4 a.out
Runs one copy of the executable
.I a.out
on all available nodes in the Open MPI universe.  MPI_COMM_WORLD rank 0
will be on node0, rank 1 will be on node1, etc., regardless of how many slots
are available on each of the nodes.
.
.
.TP
mpirun --byslot -np 4 a.out
Runs one copy of the executable
.I a.out
on each slot on a given node before running the executable on other available
nodes.
.
.
.
.SS Application Schema or Executable Program?
.
To distinguish the two different forms, \fImpirun\fP
looks on the command line for the \fI--app\fP option.  If
it is specified, then the file named on the command line is
assumed to be an application schema.  If it is not
specified, then the file is assumed to be an executable program.
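.PP
As an illustration (the file name and program names here are hypothetical),
a schema describing one "manager" program and four "worker" programs would
contain one \fImpirun\fP command line per program, less the command name
itself:
.sp
.RS
.nf
-np 1 manager
-np 4 worker
.fi
.RE
.sp
Such a file could then be launched with:
.sp
.RS
\fBshell$\fP mpirun --app my_appfile
.RE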
.
.
.
.SS Locating Files
.
Open MPI looks for an executable program by searching the directories in
the user's PATH environment variable as defined on the source node(s).
This behavior is consistent with logging into the source node and
executing the program from the shell.  On remote nodes, the "." path
is the home directory.
.PP
Open MPI looks for an application schema in the local directory.
.
.
.
.SS Standard I/O
.
Open MPI directs UNIX standard input to /dev/null on all remote nodes.  On
the local node that invoked \fImpirun\fP, standard input is inherited from
\fImpirun\fP.
.PP
Open MPI directs UNIX standard output and error to the Open RTE daemon on all
remote nodes.  Open MPI ships all captured output/error to the node that
invoked \fImpirun\fP and prints it on the standard output/error of
\fImpirun\fP.
Local processes inherit the standard output/error of \fImpirun\fP and transfer
to it directly.
.PP
Thus it is possible to redirect standard I/O for Open MPI applications by
using the typical shell redirection procedure on \fImpirun\fP.
.sp
.RS
\fBshell$\fP mpirun -np 2 my_app < my_input > my_output
.RE
.sp
Note that in this example \fIonly\fP the local node (i.e., the node where
mpirun was invoked from) will receive the stream from \fImy_input\fP on
stdin.  The stdin on all the other nodes will be tied to /dev/null.
However, the stdout from all nodes will be collected into the
\fImy_output\fP file.
.
.
.
.SS Process Termination / Signal Handling
.
During the run of an MPI application, if any rank dies abnormally
(either exiting before invoking \fIMPI_FINALIZE\fP, or dying as the result of a
signal), \fImpirun\fP will print out an error message and kill the rest of the
MPI application.
.PP
By default, Open MPI only installs a signal handler for one signal in
user programs (SIGUSR2).  Therefore, it is safe for users to install
their own signal handlers in Open MPI programs.
.PP
User signal handlers should probably avoid trying to clean up MPI state
(Open MPI is, currently, neither thread-safe nor async-signal-safe).
For example, if a seg fault occurs in \fIMPI_SEND\fP (perhaps because a bad
buffer was passed in) and a user signal handler is invoked, if this user
handler attempts to invoke \fIMPI_FINALIZE\fP, Bad Things could happen since
Open MPI was already "in" MPI when the error occurred.  Since \fImpirun\fP
will notice that the process died due to a signal, it is not necessary for
the user to clean up MPI state; it is safest to clean up only non-MPI state.
.
.
.
.SS Current Working Directory
.
The default behavior of mpirun has changed with respect to the
directory that processes will be started in.
.PP
The \fI\-wd\fP option to mpirun allows the user to change to an arbitrary
directory before their program is invoked.  It can also be used in application
schema files to specify working directories on specific nodes and/or
for specific applications.
.PP
If the \fI\-wd\fP option appears both in a schema file and on the command line,
the schema file directory will override the command line value.
.PP
The \fI\-D\fP option will change the current working directory to the directory
where the executable resides.  It cannot be used in application schema files.
.PP
If \fI\-wd\fP is not specified, the local node will send the directory name
where mpirun was invoked from to each of the remote nodes.  The remote nodes
will then try to change to that directory.  If they fail (e.g., if the
directory does not exist on that node), they will start from the
user's home directory.
.PP
All directory changing occurs before the user's program is invoked; it
does not wait until \fIMPI_INIT\fP is called.
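.PP
For example, to start each process in a scratch directory (the directory
name here is only an illustration):
.sp
.RS
\fBshell$\fP mpirun -wd /work/output -np 2 a.out
.RE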
.
.
.
.SS Process Environment
.
Processes in the MPI application inherit their environment from the
Open RTE daemon upon the node on which they are running.  The environment
is typically inherited from the user's shell.  On remote nodes, the exact
environment is determined by the boot MCA module used.  The rsh boot module,
for example, uses either rsh or ssh to launch the Open RTE daemon on remote
nodes, and typically executes one or more of the user's shell-setup files
before launching the Open RTE daemon.  When running dynamically linked
applications which require the LD_LIBRARY_PATH environment variable to be
set, care must be taken to ensure that it is correctly set when booting
Open MPI.
.
.
.
.SS Exported Environment Variables
.
All environment variables that are named in the form OMPI_* will automatically
be exported to new processes on the local and remote nodes.
The \fI\-x\fP option to \fImpirun\fP can be used to export specific environment
variables to the new processes.  While the syntax of the \fI\-x\fP
option allows the definition of new variables, note that the parser
for this option is currently not very sophisticated -- it does not even
understand quoted values.  Users are advised to set variables in the
environment and use \fI\-x\fP to export them, not to define them.
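.PP
For example, setting a variable in the environment and then exporting it by
name (the value shown is only an illustration):
.sp
.RS
.nf
\fBshell$\fP export DISPLAY=my_workstation:0
\fBshell$\fP mpirun -x DISPLAY -np 2 a.out
.fi
.RE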
.
.
.
.SS MCA (Modular Component Architecture)
The
.I -mca
switch allows the passing of parameters to various MCA modules.
.\" Open MPI's MCA modules are described in detail in ompimca(7).
MCA modules have direct impact on MPI programs because they allow tunable
parameters to be set at run time (such as which BTL communication device driver
to use, what parameters to pass to that BTL, etc.).
.PP
The \fI-mca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
The \fI<key>\fP argument generally specifies which MCA module will receive the
value.  For example, the \fI<key>\fP "btl" is used to select which BTL is to
be used for transporting MPI messages.  The \fI<value>\fP argument is the
value that is passed.
For example:
.
.TP 4
mpirun -mca btl tcp,self -np 1 foo
Tells Open MPI to use the "tcp" and "self" BTLs, and to run a single copy of
"foo" on an allocated node.
.
.TP
mpirun -mca btl self -np 1 foo
Tells Open MPI to use the "self" BTL, and to run a single copy of
"foo" on an allocated node.
.\" And so on. Open MPI's BTL MCA modules are described in lamssi_rpi(7).
.PP
The \fI-mca\fP switch can be used multiple times to specify different
\fI<key>\fP and/or \fI<value>\fP arguments.  If the same \fI<key>\fP is
specified more than once, the \fI<value>\fPs are concatenated with a comma
(",") separating them.
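.PP
For example, the following two command lines are equivalent:
.sp
.RS
.nf
\fBshell$\fP mpirun -mca btl tcp -mca btl self -np 1 foo
\fBshell$\fP mpirun -mca btl tcp,self -np 1 foo
.fi
.RE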
.PP
.B Note:
The \fI-mca\fP switch is simply a shortcut for setting environment variables.
The same effect may be accomplished by setting corresponding environment
variables before running \fImpirun\fP.
The form of the environment variables that Open MPI sets is:
.sp
.RS
OMPI_MCA_<key>=<value>
.RE
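.PP
For example, assuming a Bourne-style shell, the BTL selection shown above
could also be expressed by setting the corresponding environment variable:
.sp
.RS
.nf
\fBshell$\fP OMPI_MCA_btl=tcp,self
\fBshell$\fP export OMPI_MCA_btl
\fBshell$\fP mpirun -np 1 foo
.fi
.RE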
.PP
Note that the \fI-mca\fP switch overrides any previously set environment
variables.  Also note that unknown \fI<key>\fP arguments are still set as
environment variables -- they are not checked (by \fImpirun\fP) for
correctness.  Illegal or incorrect \fI<value>\fP arguments may or may not be
reported -- it depends on the specific MCA module.
.
.\" **************************
.\" Examples Section
.\" **************************
.SH EXAMPLES
Be sure to also see the examples in the "Location Nomenclature" section, above.
.
.TP 4
mpirun -np 1 prog1
Load and execute prog1 on one node.  Search the user's $PATH for the
executable file on that node.
.
.
.TP
mpirun -np 8 --byslot prog1
Run 8 copies of prog1 wherever Open MPI wants to run them.
.
.
.TP
mpirun -np 4 -mca btl ib,tcp,self prog1
Run 4 copies of prog1 using the "ib", "tcp", and "self" BTLs for the transport
of MPI messages.
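.
.
.TP
mpirun -wd /work/output -x DISPLAY -np 4 my_application
Run 4 copies of my_application (the program name is a placeholder), changing
to the /work/output directory and exporting the DISPLAY environment variable
before each process is started (perhaps my_application will invoke an X
application to display output).  This simply combines the \fI\-wd\fP and
\fI\-x\fP options described above.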
.
.\" **************************
.\" Diagnostics Section
.\" **************************
.
.\" .SH DIAGNOSTICS
.\".TP 4
.\"Error Msg:
.\"Description
.
.\" **************************
|
.\" **************************
|
||||||
.\" Return Value Section
|
.\" Return Value Section
|
||||||
.\" **************************
|
.\" **************************
|
||||||
.
|
.
|
||||||
.SH RETURN VALUE
|
.SH RETURN VALUE
|
||||||
.I mpirun
|
.
|
||||||
returns 0 if all ranks started by
|
\fImpirun\fP returns 0 if all ranks started by \fImpirun\fP exit after calling
|
||||||
.I mpirun
|
MPI_FINALIZE. A non-zero value is returned if an internal error occurred in
|
||||||
exit after calling MPI_FINALIZE. A non-zero value is returned if an
|
mpirun, or one or more ranks exited before calling MPI_FINALIZE. If an
|
||||||
internal error occurred in mpirun, or one or more ranks exited before
|
internal error occurred in mpirun, the corresponding error code is returned.
|
||||||
calling MPI_FINALIZE. If an internal error occurred in mpirun, the
|
In the event that one or more ranks exit before calling MPI_FINALIZE, the
|
||||||
corresponding error code is returned. In the event that one or more ranks
|
return value of the rank of the process that \fImpirun\fP first notices died
|
||||||
exit before calling MPI_FINALIZE, the return value of the rank of the
|
before calling MPI_FINALIZE will be returned. Note that, in general, this will
|
||||||
process that
|
be the first rank that died but is not guaranteed to be so.
|
||||||
.I mpirun
|
|
||||||
first notices died before calling MPI_FINALIZE will be returned. Note
|
|
||||||
that, in general, this will be the first rank that died but is not
|
|
||||||
guaranteed to be so.
|
|
||||||
.PP
|
.PP
|
||||||
However, note that if the
|
However, note that if the \fI-nw\fP switch is used, the return value from
|
||||||
.I \-nw
|
mpirun does not indicate the exit status of the ranks.
|
||||||
switch is used, the return value from mpirun does not indicate the exit status
|
|
||||||
of the ranks.
|
|
||||||
.
.\" **************************
.\" See Also Section
.\" **************************
.
.SH SEE ALSO
orted(1)