.\" -*- nroff -*-
|
||
|
.\" Copyright (c) 2009-2016 Cisco Systems, Inc. All rights reserved.
|
||
|
.\" Copyright (c) 2008-2009 Sun Microsystems, Inc. All rights reserved.
|
||
|
.\" Copyright (c) 2017 Intel, Inc. All rights reserved.
|
||
|
.\" Copyright (c) 2017 Los Alamos National Security, LLC. All rights
|
||
|
.\" reserved.
|
||
|
.\" $COPYRIGHT$
|
||
|
.\"
|
||
|
.\" Man page for PSRVR's prun command
|
||
|
.\"
|
||
|
.\" .TH name section center-footer left-footer center-header
|
||
|
.TH PRUN 1 "#OMPI_DATE#" "#PACKAGE_VERSION#" "#PACKAGE_NAME#"
|
||
|
.\" **************************
|
||
|
.\" Name Section
|
||
|
.\" **************************
|
||
|
.SH NAME
|
||
|
.
|
||
|
prun \- Execute serial and parallel jobs with the PMIx Reference Server.
|
||
|
|
||
|
.
|
||
|
.\" **************************
|
||
|
.\" Synopsis Section
|
||
|
.\" **************************
|
||
|
.SH SYNOPSIS
|
||
|
.
|
||
|
.PP
|
||
|
Single Process Multiple Data (SPMD) Model:
|
||
|
|
||
|
.B prun
|
||
|
[ options ]
|
||
|
.B <program>
|
||
|
[ <args> ]
|
||
|
.P
|
||
|
|
||
|
Multiple Instruction Multiple Data (MIMD) Model:
|
||
|
|
||
|
.B prun
|
||
|
[ global_options ]
|
||
|
[ local_options1 ]
|
||
|
.B <program1>
|
||
|
[ <args1> ] :
|
||
|
[ local_options2 ]
|
||
|
.B <program2>
|
||
|
[ <args2> ] :
|
||
|
... :
|
||
|
[ local_optionsN ]
|
||
|
.B <programN>
|
||
|
[ <argsN> ]
|
||
|
.P
|
||
|
|
||
|
Note that in both models, invoking \fIprun\fP via an absolute path
|
||
|
name is equivalent to specifying the \fI--prefix\fP option with a
|
||
|
\fI<dir>\fR value equivalent to the directory where \fIprun\fR
|
||
|
resides, minus its last subdirectory. For example:
|
||
|
|
||
|
\fB%\fP /usr/local/bin/prun ...
|
||
|
|
||
|
is equivalent to
|
||
|
|
||
|
\fB%\fP prun --prefix /usr/local
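.PP
The path arithmetic behind this equivalence can be sketched in a few lines
of Python. This is only an illustration; the helper name
\fIimplied_prefix\fP is hypothetical and is not part of \fIprun\fP, and
returning nothing for a non-absolute path is an assumption of the sketch.
.PP
.nf
```python
import posixpath


def implied_prefix(prun_path):
    """Return the directory holding prun, minus its last subdirectory."""
    if not posixpath.isabs(prun_path):
        # Assumption of this sketch: only absolute invocations imply a prefix.
        return None
    bin_dir = posixpath.dirname(prun_path)  # e.g. /usr/local/bin
    return posixpath.dirname(bin_dir)       # e.g. /usr/local


print(implied_prefix("/usr/local/bin/prun"))
```
.fi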
|
||
|
|
||
|
.
|
||
|
.\" **************************
|
||
|
.\" Quick Summary Section
|
||
|
.\" **************************
|
||
|
.SH QUICK SUMMARY
|
||
|
.
|
||
|
If you are simply looking for how to run an application, you
|
||
|
probably want to use a command line of the following form:
|
||
|
|
||
|
\fB%\fP prun [ -np X ] [ --hostfile <filename> ] <program>
|
||
|
|
||
|
This will run X copies of \fI<program>\fR in your current run-time
environment, scheduling them (by default) in a round-robin fashion by
CPU slot. If running under a supported resource manager, PSRVR's
\fIprun\fR will usually use the corresponding resource manager process
starter automatically, rather than, for example, \fIrsh\fR or \fIssh\fR,
which require a hostfile or otherwise default to running all X copies
on the localhost. See the rest of this page for more details.
|
||
|
.P
|
||
|
Please note that prun automatically binds processes. Three binding patterns are used in the absence of any further directives:
|
||
|
.TP 18
|
||
|
.B Bind to core:
|
||
|
when the number of processes is <= 2
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B Bind to socket:
|
||
|
when the number of processes is > 2
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B Bind to none:
|
||
|
when oversubscribed
|
||
|
.
|
||
|
.
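.PP
The three default patterns can be read as a small decision rule. The
Python sketch below is only an illustration: the function name
\fIdefault_binding\fP is hypothetical, and checking for oversubscription
before the process count is an assumption, since the list above does not
state which rule takes precedence.
.PP
.nf
```python
def default_binding(num_procs, oversubscribed):
    """Pick the default binding target when no binding option is given."""
    if oversubscribed:
        # Assumed to take precedence over the process-count rules.
        return "none"
    if num_procs <= 2:
        return "core"
    return "socket"


for procs, over in [(2, False), (3, False), (8, True)]:
    print(procs, over, default_binding(procs, over))
```
.fi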
|
||
|
.P
|
||
|
If your application uses threads, then you probably want to ensure that you are
|
||
|
either not bound at all (by specifying --bind-to none), or bound to multiple cores
|
||
|
using an appropriate binding level or specific number of processing elements per
|
||
|
application process.
|
||
|
.
|
||
|
.\" **************************
|
||
|
.\" Options Section
|
||
|
.\" **************************
|
||
|
.SH OPTIONS
|
||
|
.
|
||
|
.I prun
|
||
|
will send the name of the directory where it was invoked on the local
|
||
|
node to each of the remote nodes, and attempt to change to that
|
||
|
directory. See the "Current Working Directory" section below for further
|
||
|
details.
|
||
|
.\"
|
||
|
.\" Start options listing
|
||
|
.\" Indent 10 characters from start of first column to start of second column
|
||
|
.TP 10
|
||
|
.B <program>
|
||
|
The program executable. This is identified as the first non-recognized argument
|
||
|
to prun.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B <args>
|
||
|
Pass these run-time arguments to every new process. These must always
|
||
|
be the last arguments to \fIprun\fP. If an app context file is used,
|
||
|
\fI<args>\fP will be ignored.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -h\fR,\fP --help
|
||
|
Display help for this command
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -q\fR,\fP --quiet
|
||
|
Suppress informative messages from prun during application execution.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -v\fR,\fP --verbose
|
||
|
Be verbose
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -V\fR,\fP --version
|
||
|
Print version number. If no other arguments are given, this will also
|
||
|
cause prun to exit.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -N \fR<num>\fP
|
||
|
.br
|
||
|
Launch num processes per node on all allocated nodes (synonym for \fI-npernode\fP).
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -display-map\fR,\fP --display-map
|
||
|
Display a table showing the mapped location of each process prior to launch.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -display-allocation\fR,\fP --display-allocation
|
||
|
Display the detected resource allocation.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -output-proctable\fR,\fP --output-proctable
|
||
|
Output the debugger proctable after launch.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -max-vm-size\fR,\fP --max-vm-size \fR<size>\fP
|
||
|
Number of processes to run.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -novm\fR,\fP --novm
|
||
|
Execute without creating an allocation-spanning virtual machine (only start
|
||
|
daemons on nodes hosting application procs).
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -hnp\fR,\fP --hnp \fR<arg0>\fP
|
||
|
Specify the URI of the \fRpsrvr\fP process, or the name of the file (specified as
|
||
|
file:filename) that contains that info.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
Use one of the following options to specify which hosts (nodes) within the \fRpsrvr\fP to run on.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -H\fR,\fP -host\fR,\fP --host \fR<host1,host2,...,hostN>\fP
|
||
|
List of hosts on which to invoke processes.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -hostfile\fR,\fP --hostfile \fR<hostfile>\fP
|
||
|
Provide a hostfile to use.
|
||
|
.\" JJH - Should have man page for how to format a hostfile properly.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -default-hostfile\fR,\fP --default-hostfile \fR<hostfile>\fP
|
||
|
Provide a default hostfile.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -machinefile\fR,\fP --machinefile \fR<machinefile>\fP
|
||
|
Synonym for \fI-hostfile\fP.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -cpu-set\fR,\fP --cpu-set \fR<list>\fP
|
||
|
Restrict launched processes to the specified logical cpus on each node (comma-separated
|
||
|
list). Note that the binding options will still apply within the specified envelope - e.g.,
|
||
|
you can elect to bind each process to only one cpu within the specified cpu set.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
The following options specify the number of processes to launch. Note that none
|
||
|
of the options imply a particular binding policy - e.g., requesting N processes
|
||
|
for each socket does not imply that the processes will be bound to the socket.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -c\fR,\fP -n\fR,\fP --n\fR,\fP -np \fR<#>\fP
|
||
|
Run this many copies of the program on the given nodes. This option
|
||
|
indicates that the specified file is an executable program and not an
|
||
|
application context. If no value is provided for the number of copies to
|
||
|
execute (i.e., neither the "-np" nor its synonyms are provided on the command
|
||
|
line), prun will automatically execute a copy of the program on
|
||
|
each process slot (see below for description of a "process slot"). This
|
||
|
feature, however, can only be used in the SPMD model and will return an
|
||
|
error (without beginning execution of the application) otherwise.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --map-by ppr:N:<object>
|
||
|
Launch N times the number of objects of the specified type on each node.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -npersocket\fR,\fP --npersocket \fR<#persocket>\fP
|
||
|
On each node, launch this many processes times the number of processor
|
||
|
sockets on the node.
|
||
|
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option.
|
||
|
(deprecated in favor of --map-by ppr:n:socket)
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -npernode\fR,\fP --npernode \fR<#pernode>\fP
|
||
|
On each node, launch this many processes.
|
||
|
(deprecated in favor of --map-by ppr:n:node)
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -pernode\fR,\fP --pernode
|
||
|
On each node, launch one process -- equivalent to \fI-npernode\fP 1.
|
||
|
(deprecated in favor of --map-by ppr:1:node)
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
To map processes:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --map-by \fR<foo>\fP
|
||
|
Map to the specified object, defaults to \fIsocket\fP. Supported options
|
||
|
include slot, hwthread, core, L1cache, L2cache, L3cache, socket, numa,
|
||
|
board, node, sequential, distance, and ppr. Any object can include
|
||
|
modifiers by adding a \fR:\fP and any combination of PE=n (bind n
|
||
|
processing elements to each proc), SPAN (load
|
||
|
balance the processes across the allocation), OVERSUBSCRIBE (allow
|
||
|
more processes on a node than processing elements), and NOOVERSUBSCRIBE.
|
||
|
This includes PPR, where the pattern would be terminated by another colon
|
||
|
to separate it from the modifiers.
|
||
|
.
|
||
|
.TP
|
||
|
.B -bycore\fR,\fP --bycore
|
||
|
Map processes by core (deprecated in favor of --map-by core)
|
||
|
.
|
||
|
.TP
|
||
|
.B -byslot\fR,\fP --byslot
|
||
|
Map and rank processes round-robin by slot.
|
||
|
.
|
||
|
.TP
|
||
|
.B -nolocal\fR,\fP --nolocal
|
||
|
Do not run any copies of the launched application on the same node as
|
||
|
prun is running. This option will override listing the localhost
|
||
|
with \fB--host\fR or any other host-specifying mechanism.
|
||
|
.
|
||
|
.TP
|
||
|
.B -nooversubscribe\fR,\fP --nooversubscribe
|
||
|
Do not oversubscribe any nodes; error (without starting any processes)
|
||
|
if the requested number of processes would cause oversubscription.
|
||
|
This option implicitly sets "max_slots" equal to the "slots" value for
|
||
|
each node. (Enabled by default).
|
||
|
.
|
||
|
.TP
|
||
|
.B -oversubscribe\fR,\fP --oversubscribe
|
||
|
Allow nodes to be oversubscribed, even on a managed system, and
allow processing elements to be overloaded.
|
||
|
.
|
||
|
.TP
|
||
|
.B -bynode\fR,\fP --bynode
|
||
|
Launch processes one per node, cycling by node in a round-robin
|
||
|
fashion. This spreads processes evenly among nodes and assigns
|
||
|
ranks in a round-robin, "by node" manner.
|
||
|
.
|
||
|
.TP
|
||
|
.B -cpu-list\fR,\fP --cpu-list \fR<cpus>\fP
|
||
|
List of processor IDs to bind processes to [default=NULL].
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
To order processes' ranks:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --rank-by \fR<foo>\fP
|
||
|
Rank in round-robin fashion according to the specified object,
|
||
|
defaults to \fIslot\fP. Supported options
|
||
|
include slot, hwthread, core, L1cache, L2cache, L3cache,
|
||
|
socket, numa, board, and node.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
For process binding:
|
||
|
.
|
||
|
.TP
|
||
|
.B --bind-to \fR<foo>\fP
|
||
|
Bind processes to the specified object, defaults to \fIcore\fP. Supported options
|
||
|
include slot, hwthread, core, l1cache, l2cache, l3cache, socket, numa, board, and none.
|
||
|
.
|
||
|
.TP
|
||
|
.B -cpus-per-proc\fR,\fP --cpus-per-proc \fR<#perproc>\fP
|
||
|
Bind each process to the specified number of cpus.
|
||
|
(deprecated in favor of --map-by <obj>:PE=n)
|
||
|
.
|
||
|
.TP
|
||
|
.B -cpus-per-rank\fR,\fP --cpus-per-rank \fR<#perrank>\fP
|
||
|
Alias for \fI-cpus-per-proc\fP.
|
||
|
(deprecated in favor of --map-by <obj>:PE=n)
|
||
|
.
|
||
|
.TP
|
||
|
.B -bind-to-core\fR,\fP --bind-to-core
|
||
|
Bind processes to cores (deprecated in favor of --bind-to core)
|
||
|
.
|
||
|
.TP
|
||
|
.B -bind-to-socket\fR,\fP --bind-to-socket
|
||
|
Bind processes to processor sockets (deprecated in favor of --bind-to socket)
|
||
|
.
|
||
|
.TP
|
||
|
.B -report-bindings\fR,\fP --report-bindings
|
||
|
Report any bindings for launched processes.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
For rankfiles:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -rf\fR,\fP --rankfile \fR<rankfile>\fP
|
||
|
Provide a rankfile.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
To manage standard I/O:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -output-filename\fR,\fP --output-filename \fR<filename>\fP
|
||
|
Redirect the stdout, stderr, and stddiag of all processes to a process-unique version of
|
||
|
the specified filename. Any directories in the filename will automatically be created.
|
||
|
Each output file will consist of filename.id, where the id will be the
process's rank, left-filled with zeros for correct ordering in listings.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -stdin\fR,\fP --stdin\fR <rank> \fP
|
||
|
The rank of the process that is to receive stdin. The
|
||
|
default is to forward stdin to rank 0, but this option
|
||
|
can be used to forward stdin to any process. It is also acceptable to
|
||
|
specify \fInone\fP, indicating that no processes are to receive stdin.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -merge-stderr-to-stdout\fR,\fP --merge-stderr-to-stdout
|
||
|
Merge stderr to stdout for each process.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -tag-output\fR,\fP --tag-output
|
||
|
Tag each line of output to stdout, stderr, and stddiag with \fB[jobid, MCW_rank]<stdxxx>\fP
|
||
|
indicating the process jobid and rank of the process that generated the output,
|
||
|
and the channel which generated it.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -timestamp-output\fR,\fP --timestamp-output
|
||
|
Timestamp each line of output to stdout, stderr, and stddiag.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -xml\fR,\fP --xml
|
||
|
Provide all output to stdout, stderr, and stddiag in XML format.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -xml-file\fR,\fP --xml-file \fR<filename>\fP
|
||
|
Provide all output in XML format to the specified file.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -xterm\fR,\fP --xterm \fR<ranks>\fP
|
||
|
Display the output from the processes identified by their ranks in separate xterm windows. The ranks are specified
|
||
|
as a comma-separated list of ranges, with a -1 indicating all. A separate
|
||
|
window will be created for each specified process.
|
||
|
.B Note:
|
||
|
xterm will normally terminate the window upon termination of the process running
|
||
|
within it. However, by adding a "!" to the end of the list of specified ranks,
|
||
|
the proper options will be provided to ensure that xterm keeps the window open
|
||
|
\fIafter\fP the process terminates, thus allowing you to see the process' output.
|
||
|
Each xterm window will subsequently need to be manually closed.
|
||
|
.B Note:
|
||
|
In some environments, xterm may require that the executable be in the user's
|
||
|
path, or be specified in absolute or relative terms. Thus, it may be necessary
|
||
|
to specify a local executable as "./foo" instead of just "foo". If xterm fails to
|
||
|
find the executable, prun will hang, but still respond correctly to a ctrl-c.
|
||
|
If this happens, please check that the executable is being specified correctly
|
||
|
and try again.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
To manage files and runtime environment:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -path\fR,\fP --path \fR<path>\fP
|
||
|
<path> that will be used when attempting to locate the requested
|
||
|
executables. This is used prior to using the local PATH setting.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --prefix \fR<dir>\fP
|
||
|
Prefix directory that will be used to set the \fIPATH\fR and
|
||
|
\fILD_LIBRARY_PATH\fR on the remote node before invoking
|
||
|
the target process. See the "Remote Execution" section, below.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --noprefix
|
||
|
Disable the automatic --prefix behavior
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -s\fR,\fP --preload-binary
|
||
|
Copy the specified executable(s) to remote machines prior to starting remote processes. The
|
||
|
executables will be copied to the session directory and will be deleted upon
|
||
|
completion of the job.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --preload-files \fR<files>\fP
|
||
|
Preload the comma-separated list of files to the current working directory of the remote
|
||
|
machines where processes will be launched prior to starting those processes.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -set-cwd-to-session-dir\fR,\fP --set-cwd-to-session-dir
|
||
|
Set the working directory of the started processes to their session directory.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -wd \fR<dir>\fP
|
||
|
Synonym for \fI-wdir\fP.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -wdir \fR<dir>\fP
|
||
|
Change to the directory <dir> before the user's program executes.
|
||
|
See the "Current Working Directory" section for notes on relative paths.
|
||
|
.B Note:
|
||
|
If the \fI-wdir\fP option appears both on the command line and in an
|
||
|
application context, the context will take precedence over the command
|
||
|
line. Thus, if the path to the desired wdir is different
|
||
|
on the backend nodes, then it must be specified as an absolute path that
|
||
|
is correct for the backend node.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -x \fR<env>\fP
|
||
|
Export the specified environment variables to the remote nodes before
|
||
|
executing the program. Only one environment variable can be specified
|
||
|
per \fI-x\fP option. Existing environment variables can be specified
|
||
|
or new variable names specified with corresponding values. For
|
||
|
example:
|
||
|
\fB%\fP prun -x DISPLAY -x OFILE=/tmp/out ...
|
||
|
|
||
|
The parser for the \fI-x\fP option is not very sophisticated; it does
|
||
|
not even understand quoted values. Users are advised to set variables
|
||
|
in the environment, and then use \fI-x\fP to export (not define) them.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
Setting MCA parameters:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -gpmca\fR,\fP --gpmca \fR<key> <value>\fP
|
||
|
Pass global MCA parameters that are applicable to all contexts. \fI<key>\fP is
|
||
|
the parameter name; \fI<value>\fP is the parameter value.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -pmca\fR,\fP --pmca \fR<key> <value>\fP
|
||
|
Send arguments to various MCA modules. See the "MCA" section, below.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -am \fR<arg0>\fP
|
||
|
Aggregate MCA parameter set file list.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -tune\fR,\fP --tune \fR<tune_file>\fP
|
||
|
Specify a tune file to set arguments for various MCA modules and environment variables.
|
||
|
See the "Setting MCA parameters and environment variables from file" section, below.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
For debugging:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -debug\fR,\fP --debug
|
||
|
Invoke the user-level debugger indicated by the \fIorte_base_user_debugger\fP
|
||
|
MCA parameter.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --get-stack-traces
|
||
|
When paired with the
|
||
|
.B --timeout
|
||
|
option,
|
||
|
.I prun
|
||
|
will obtain and print out stack traces from all launched processes
|
||
|
that are still alive when the timeout expires. Note that obtaining
|
||
|
stack traces can take a little time and produce a lot of output,
|
||
|
especially for large process-count jobs.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -debugger\fR,\fP --debugger \fR<args>\fP
|
||
|
Sequence of debuggers to search for when \fI--debug\fP is used (i.e.
|
||
|
a synonym for \fIorte_base_user_debugger\fP MCA parameter).
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --timeout \fR<seconds>
|
||
|
The maximum number of seconds that
|
||
|
.I prun
|
||
|
will run. After this many seconds,
|
||
|
.I prun
|
||
|
will abort the launched job and exit with a non-zero exit status.
|
||
|
Using
|
||
|
.B --timeout
|
||
|
can also be useful when combined with the
|
||
|
.B --get-stack-traces
|
||
|
option.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -tv\fR,\fP --tv
|
||
|
Launch processes under the TotalView debugger.
|
||
|
Deprecated backwards compatibility flag. Synonym for \fI--debug\fP.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
There are also other options:
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --allow-run-as-root
|
||
|
Allow
|
||
|
.I prun
|
||
|
to run when executed by the root user
|
||
|
.RI ( prun
|
||
|
defaults to aborting when launched as the root user).
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --app \fR<appfile>\fP
|
||
|
Provide an appfile, ignoring all other command line options.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -cf\fR,\fP --cartofile \fR<cartofile>\fP
|
||
|
Provide a cartography file.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -continuous\fR,\fP --continuous
|
||
|
Job is to run until explicitly terminated.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -disable-recovery\fR,\fP --disable-recovery
|
||
|
Disable recovery (resets all recovery options to off).
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -do-not-launch\fR,\fP --do-not-launch
|
||
|
Perform all necessary operations to prepare to launch the application, but do not actually launch it.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -do-not-resolve\fR,\fP --do-not-resolve
|
||
|
Do not attempt to resolve interfaces.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -enable-recovery\fR,\fP --enable-recovery
|
||
|
Enable recovery from process failure [Default = disabled].
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -index-argv-by-rank\fR,\fP --index-argv-by-rank
|
||
|
Uniquely index argv[0] for each process using its rank.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -max-restarts\fR,\fP --max-restarts \fR<num>\fP
|
||
|
Max number of times to restart a failed process.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --ppr \fR<list>\fP
|
||
|
Comma-separated list of number of processes on a given resource type [default: none].
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -report-child-jobs-separately\fR,\fP --report-child-jobs-separately
|
||
|
Return the exit status of the primary job only.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -report-events\fR,\fP --report-events \fR<URI>\fP
|
||
|
Report events to a tool listening at the specified URI.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -report-pid\fR,\fP --report-pid \fR<channel>\fP
|
||
|
Print out prun's PID during startup. The channel must be either a '-' to indicate
|
||
|
that the pid is to be output to stdout, a '+' to indicate that the pid is to be
|
||
|
output to stderr, or a filename to which the pid is to be written.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -report-uri\fR,\fP --report-uri \fR<channel>\fP
|
||
|
Print out prun's URI during startup. The channel must be either a '-' to indicate
|
||
|
that the URI is to be output to stdout, a '+' to indicate that the URI is to be
|
||
|
output to stderr, or a filename to which the URI is to be written.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -show-progress\fR,\fP --show-progress
|
||
|
Output a brief periodic report on launch progress.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -terminate\fR,\fP --terminate
|
||
|
Terminate the DVM.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -use-hwthread-cpus\fR,\fP --use-hwthread-cpus
|
||
|
Use hardware threads as independent cpus.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -use-regexp\fR,\fP --use-regexp
|
||
|
Use regular expressions for launch.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
The following options are useful for developers; they are not generally
|
||
|
useful to most users:
|
||
|
.
|
||
|
.TP
|
||
|
.B -d\fR,\fP --debug-devel
|
||
|
Enable debugging. This is not generally useful for most users.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -display-devel-allocation\fR,\fP --display-devel-allocation
|
||
|
Display a detailed list of the allocation being used by this job.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -display-devel-map\fR,\fP --display-devel-map
|
||
|
Display a more detailed table showing the mapped location of each process prior to launch.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -display-diffable-map\fR,\fP --display-diffable-map
|
||
|
Display a diffable process map just before launch.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B -display-topo\fR,\fP --display-topo
|
||
|
Display the topology as part of the process map just before launch.
|
||
|
.
|
||
|
.
|
||
|
.TP
|
||
|
.B --report-state-on-timeout
|
||
|
When paired with the
|
||
|
.B --timeout
|
||
|
command line option, report the run-time subsystem state of each
|
||
|
process when the timeout expires.
|
||
|
.
|
||
|
.
|
||
|
.P
|
||
|
There may be other options listed with \fIprun --help\fP.
|
||
|
.
|
||
|
.
|
||
|
.\" **************************
|
||
|
.\" Description Section
|
||
|
.\" **************************
|
||
|
.SH DESCRIPTION
|
||
|
.
|
||
|
One invocation of \fIprun\fP starts an application running under PSRVR. If the application is single process multiple data (SPMD), the application
|
||
|
can be specified on the \fIprun\fP command line.
|
||
|
|
||
|
If the application is multiple instruction multiple data (MIMD), comprising
multiple programs, the set of programs and arguments can be specified in one of
two ways: Extended Command Line Arguments, and Application Context.
|
||
|
.PP
|
||
|
An application context describes the MIMD program set including all arguments
|
||
|
in a separate file.
|
||
|
.\" See appcontext(5) for a description of the application context syntax.
|
||
|
This file essentially contains multiple \fIprun\fP command lines, less the
|
||
|
command name itself. The ability to specify different options for different
|
||
|
instantiations of a program is another reason to use an application context.
|
||
|
.PP
|
||
|
Extended command line arguments allow for the description of the application
|
||
|
layout on the command line using colons (\fI:\fP) to separate the specification
|
||
|
of programs and arguments. Some options are globally set across all specified
|
||
|
programs (e.g. --hostfile), while others are specific to a single program
|
||
|
(e.g. -np).
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Specifying Host Nodes
|
||
|
.
|
||
|
Host nodes can be identified on the \fIprun\fP command line with the \fI-host\fP
|
||
|
option or in a hostfile.
|
||
|
.
|
||
|
.PP
|
||
|
For example,
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa,aa,bb ./a.out
|
||
|
launches two processes on node aa and one on bb.
|
||
|
.
|
||
|
.PP
|
||
|
Or, consider the hostfile
|
||
|
.
|
||
|
|
||
|
\fB%\fP cat myhostfile
|
||
|
aa slots=2
|
||
|
bb slots=2
|
||
|
cc slots=2
|
||
|
|
||
|
.
|
||
|
.PP
|
||
|
Here, we list not only the host names (aa, bb, and cc) but also how many "slots"
|
||
|
there are for each. Slots indicate how many processes can potentially execute
|
||
|
on a node. For best performance, the number of slots may be chosen to be the
|
||
|
number of cores on the node or the number of processor sockets. If the hostfile
|
||
|
does not provide slots information, PSRVR will attempt to discover the number
|
||
|
of cores (or hwthreads, if the \fI--use-hwthread-cpus\fP option is set) and set the
|
||
|
number of slots to that value. This default behavior also occurs when specifying
|
||
|
the \fI-host\fP option with a single hostname. Thus, the command
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa ./a.out
|
||
|
launches a number of processes equal to the number of cores on node aa.
|
||
|
.
|
||
|
.PP
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile ./a.out
|
||
|
will launch two processes on each of the three nodes.
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile -host aa ./a.out
|
||
|
will launch two processes, both on node aa.
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile -host dd ./a.out
|
||
|
will find no hosts to run on and abort with an error.
|
||
|
That is, the specified host dd is not in the specified hostfile.
|
||
|
.
|
||
|
.PP
|
||
|
When running under resource managers (e.g., SLURM, Torque, etc.),
|
||
|
PSRVR will obtain both the hostnames and the number of slots directly
|
||
|
from the resource manager.
|
||
|
.
|
||
|
.SS Specifying Number of Processes
|
||
|
.
|
||
|
As we have just seen, the number of processes to run can be set using the
|
||
|
hostfile. Other mechanisms exist.
|
||
|
.
|
||
|
.PP
|
||
|
The number of processes launched can be specified as a multiple of the
|
||
|
number of nodes or processor sockets available. For example,
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa,bb -npersocket 2 ./a.out
|
||
|
launches processes 0-3 on node aa and processes 4-7 on node bb,
|
||
|
where aa and bb are both dual-socket nodes.
|
||
|
The \fI-npersocket\fP option also turns on the \fI-bind-to-socket\fP option,
|
||
|
which is discussed in a later section.
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa,bb -npernode 2 ./a.out
|
||
|
launches processes 0-1 on node aa and processes 2-3 on node bb.
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa,bb -npernode 1 ./a.out
|
||
|
launches one process per host node.
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa,bb -pernode ./a.out
|
||
|
is the same as \fI-npernode\fP 1.
|
||
|
.
|
||
|
.
|
||
|
.PP
|
||
|
Another alternative is to specify the number of processes with the
|
||
|
\fI-np\fP option. Consider now the hostfile
|
||
|
.
|
||
|
|
||
|
\fB%\fP cat myhostfile
|
||
|
aa slots=4
|
||
|
bb slots=4
|
||
|
cc slots=4
|
||
|
|
||
|
.
|
||
|
.PP
|
||
|
Now,
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile -np 6 ./a.out
|
||
|
will launch processes 0-3 on node aa and processes 4-5 on node bb. The remaining
|
||
|
slots in the hostfile will not be used since the \fI-np\fP option indicated
|
||
|
that only 6 processes should be launched.
|
||
|
.
|
||
|
.SS Mapping Processes to Nodes: Using Policies
|
||
|
.
|
||
|
The examples above illustrate the default mapping of processes
|
||
|
to nodes. This mapping can also be controlled with various
|
||
|
\fIprun\fP options that describe mapping policies.
|
||
|
.
|
||
|
.
|
||
|
.PP
|
||
|
Consider the same hostfile as above, again with \fI-np\fP 6:
|
||
|
.
|
||
|
|
||
|
                     node aa      node bb      node cc

 prun                0 1 2 3      4 5

 prun --map-by node  0 3          1 4          2 5

 prun -nolocal                    0 1 2 3      4 5
|
||
|
.
|
||
|
.PP
|
||
|
The \fI--map-by node\fP option will load balance the processes across
|
||
|
the available nodes, numbering each process in a round-robin fashion.
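.PP
The difference between the default (by slot) placement and
\fI--map-by node\fP can be sketched as follows. This is an illustrative
model only, not the mapper that \fIprun\fP uses; the function names are
hypothetical, and oversubscription is deliberately not modelled.
.PP
.nf
```python
def map_by_slot(slots, num_procs):
    """Fill each node's slots in order before moving to the next node."""
    placement = {node: [] for node in slots}
    rank = 0
    for node, count in slots.items():
        for _ in range(count):
            if rank == num_procs:
                return placement
            placement[node].append(rank)
            rank += 1
    return placement


def map_by_node(slots, num_procs):
    """Deal ranks out one node at a time, skipping nodes that are full."""
    placement = {node: [] for node in slots}
    rank = 0
    while rank < num_procs:
        placed = False
        for node, count in slots.items():
            if rank < num_procs and len(placement[node]) < count:
                placement[node].append(rank)
                rank += 1
                placed = True
        if not placed:
            break  # every slot is taken; oversubscription is not modelled
    return placement


hostfile = {"aa": 4, "bb": 4, "cc": 4}
print(map_by_slot(hostfile, 6))
print(map_by_node(hostfile, 6))
```
.fi
.PP
With the hostfile above and six processes, the two functions reproduce
the first two rows of the table.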
|
||
|
.
|
||
|
.PP
|
||
|
The \fI-nolocal\fP option prevents any processes from being mapped onto the
|
||
|
local host (in this case node aa). While \fIprun\fP typically consumes
|
||
|
few system resources, \fI-nolocal\fP can be helpful for launching very
|
||
|
large jobs where \fIprun\fP may actually need to use noticeable amounts
|
||
|
of memory and/or processing time.
|
||
|
.
|
||
|
.PP
|
||
|
Just as \fI-np\fP can specify fewer processes than there are slots, it can
|
||
|
also oversubscribe the slots. For example, with the same hostfile:
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile -np 14 ./a.out
|
||
|
will launch processes 0-3 on node aa, 4-7 on bb, and 8-11 on cc. It will
|
||
|
then add the remaining two processes to whichever nodes it chooses.
|
||
|
.
|
||
|
.PP
|
||
|
One can also specify limits to oversubscription. For example, with the same
|
||
|
hostfile:
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile -np 14 -nooversubscribe ./a.out
|
||
|
will produce an error since \fI-nooversubscribe\fP prevents oversubscription.
|
||
|
.
|
||
|
.PP
|
||
|
Limits to oversubscription can also be specified in the hostfile itself:
|
||
|
.
|
||
|
% cat myhostfile
|
||
|
aa slots=4 max_slots=4
|
||
|
bb max_slots=4
|
||
|
cc slots=4
|
||
|
.
|
||
|
.PP
|
||
|
The \fImax_slots\fP field specifies such a limit. When it does, the
|
||
|
\fIslots\fP value defaults to the limit. Now:
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -hostfile myhostfile -np 14 ./a.out
|
||
|
causes the first 12 processes to be launched as before, but the remaining
|
||
|
two processes will be forced onto node cc. The other two nodes are
|
||
|
protected by the hostfile against oversubscription by this job.
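.PP
The interaction between slots, max_slots and oversubscription can be sketched as follows. This is an illustrative model only: the function name and the tuple layout are assumptions, and the real mapper is free to choose which nodes receive the overflow.

```python
def place(nodes, np, allow_oversubscribe=True):
    """Count processes per node.

    nodes is a list of (name, slots, max_slots); max_slots None means
    the node has no hard limit.
    """
    counts = {name: 0 for name, _, _ in nodes}
    remaining = np
    # First pass: fill the advertised slots, node by node.
    for name, slots, _ in nodes:
        take = min(slots, remaining)
        counts[name] += take
        remaining -= take
    if remaining and not allow_oversubscribe:
        raise RuntimeError("not enough slots and oversubscription is disabled")
    # Second pass: put the overflow on nodes that still have headroom.
    # Assumption: the first node with room takes as much as it can.
    for name, _, max_slots in nodes:
        if remaining == 0:
            break
        if max_slots is None:
            room = remaining
        else:
            room = max(0, max_slots - counts[name])
        take = min(room, remaining)
        counts[name] += take
        remaining -= take
    if remaining:
        raise RuntimeError("max_slots limits leave no room for all processes")
    return counts


# The hostfile above; bb's slots value defaults to its max_slots of 4.
hostfile = [("aa", 4, 4), ("bb", 4, 4), ("cc", 4, None)]
print(place(hostfile, 14))
```

Passing allow_oversubscribe=False models the \fI-nooversubscribe\fP case, where the same request is rejected.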
|
||
|
.
|
||
|
.PP
|
||
|
Using the \fI--nooversubscribe\fR option can be helpful since PSRVR
|
||
|
currently does not get "max_slots" values from the resource manager.
|
||
|
.
|
||
|
.PP
|
||
|
Of course, \fI-np\fP can also be used with the \fI-H\fP or \fI-host\fP
|
||
|
option. For example,
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa,bb -np 8 ./a.out
|
||
|
launches 8 processes. Since only two hosts are specified, after the first
|
||
|
two processes are mapped, one to aa and one to bb, the remaining processes
|
||
|
oversubscribe the specified hosts.
|
||
|
.
|
||
|
.PP
|
||
|
And here is a MIMD example:
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -H aa -np 1 hostname : -H bb,cc -np 2 uptime
|
||
|
will launch process 0 running \fIhostname\fP on node aa and processes 1 and 2
|
||
|
each running \fIuptime\fP on nodes bb and cc, respectively.
|
||
|
.
|
||
|
.SS Mapping, Ranking, and Binding: Oh My!
|
||
|
.
|
||
|
PSRVR employs a three-phase procedure for assigning process locations and
|
||
|
ranks:
|
||
|
.
|
||
|
.TP 10
|
||
|
\fBmapping\fP
|
||
|
Assigns a default location to each process
|
||
|
.
|
||
|
.TP 10
|
||
|
\fBranking\fP
|
||
|
Assigns a rank value to each process
|
||
|
.
|
||
|
.TP 10
|
||
|
\fBbinding\fP
|
||
|
Constrains each process to run on specific processors
|
||
|
.
|
||
|
.PP
|
||
|
The \fImapping\fP step is used to assign a default location to each process
|
||
|
based on the mapper being employed. Mapping by slot, node, and sequentially results
|
||
|
in the assignment of the processes to the node level. In contrast, mapping by object allows
|
||
|
the mapper to assign the process to an actual object on each node.
|
||
|
.
|
||
|
.PP
|
||
|
\fBNote:\fP the location assigned to the process is independent of where it will be bound - the
|
||
|
assignment is used solely as input to the binding algorithm.
|
||
|
.
|
||
|
.PP
|
||
|
The mapping of processes to nodes can be defined not just
|
||
|
with general policies but also, if necessary, using arbitrary mappings
|
||
|
that cannot be described by a simple policy. One can use the "sequential
|
||
|
mapper," which reads the hostfile line by line, assigning processes
|
||
|
to nodes in whatever order the hostfile specifies. Use the
|
||
|
\fI-pmca rmaps seq\fP option. For example, using the same hostfile
|
||
|
as before:
|
||
|
.
|
||
|
.PP
|
||
|
prun -hostfile myhostfile -pmca rmaps seq ./a.out
|
||
|
.
|
||
|
.PP
|
||
|
will launch three processes, one on each of nodes aa, bb, and cc, respectively.
|
||
|
The slot counts don't matter; one process is launched per line on
|
||
|
whatever node is listed on the line.
|
||
|
.
|
||
|
.PP
|
||
|
Another way to specify arbitrary mappings is with a rankfile, which
|
||
|
gives you detailed control over process binding as well. Rankfiles
|
||
|
are discussed below.
|
||
|
.
|
||
|
.PP
|
||
|
The second phase focuses on the \fIranking\fP of the process within
|
||
|
the job. PSRVR
|
||
|
separates this from the mapping procedure to allow more flexibility in the
|
||
|
relative placement of processes. This is best illustrated by considering the
|
||
|
following cases where we used the \fI--map-by ppr:2:socket\fP option:
|
||
|
.
|
||
|
.PP
|
||
|
                        node aa       node bb

  rank-by core          0 1 ! 2 3     4 5 ! 6 7

  rank-by socket        0 2 ! 1 3     4 6 ! 5 7

  rank-by socket:span   0 4 ! 1 5     2 6 ! 3 7
|
||
|
.
|
||
|
.PP
|
||
|
Ranking by core and by slot provide identical results - a simple
|
||
|
progression of ranks across each node. Ranking by
|
||
|
socket does a round-robin ranking within each node until all processes
|
||
|
have been assigned a rank, and then progresses to the next
|
||
|
node. Adding the \fIspan\fP modifier to the ranking directive causes
|
||
|
the ranking algorithm to treat the entire allocation as a single
|
||
|
entity - thus, the MCW ranks are assigned across all sockets before
|
||
|
circling back around to the beginning.
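.PP
The three ranking orders in the table above can be reproduced with a small Python sketch. It assumes two nodes with two sockets each and two processes per socket, as in the example; the function name and the nested-list layout are hypothetical and are not part of PSRVR.

```python
def rank_layout(num_nodes, sockets_per_node, procs_per_socket, policy):
    """Return layout[node][socket] as the list of ranks placed there."""
    layout = [[[None] * procs_per_socket for _ in range(sockets_per_node)]
              for _ in range(num_nodes)]
    sockets = [(n, s) for n in range(num_nodes)
               for s in range(sockets_per_node)]
    if policy == "core":
        # Walk every position in order: node, then socket, then process.
        order = [(n, s, p) for n, s in sockets
                 for p in range(procs_per_socket)]
    elif policy == "socket":
        # Round-robin over the sockets of one node before moving on.
        order = [(n, s, p) for n in range(num_nodes)
                 for p in range(procs_per_socket)
                 for s in range(sockets_per_node)]
    elif policy == "socket:span":
        # Round-robin over every socket in the whole allocation.
        order = [(n, s, p) for p in range(procs_per_socket)
                 for n, s in sockets]
    else:
        raise ValueError("unknown ranking policy: " + policy)
    for rank, (n, s, p) in enumerate(order):
        layout[n][s][p] = rank
    return layout


for policy in ("core", "socket", "socket:span"):
    print(policy, rank_layout(2, 2, 2, policy))
```

Each inner list corresponds to one socket, so the "!" separator in the table marks the boundary between the two inner lists of a node.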
|
||
|
.
|
||
|
.PP
|
||
|
The \fIbinding\fP phase actually binds each process to a given set of processors. This can
|
||
|
improve performance if the operating system is placing processes
|
||
|
suboptimally. For example, it might oversubscribe some multi-core
|
||
|
processor sockets, leaving other sockets idle; this can lead
|
||
|
processes to contend unnecessarily for common resources. Or, it
|
||
|
might spread processes out too widely; this can be suboptimal if
|
||
|
application performance is sensitive to interprocess communication
|
||
|
costs. Binding can also keep the operating system from migrating
|
||
|
processes excessively, regardless of how optimally those processes
|
||
|
were placed to begin with.
|
||
|
.
|
||
|
.PP
|
||
|
The processors to be used for binding can be identified in terms of
|
||
|
topological groupings - e.g., binding to an l3cache will bind each
|
||
|
process to all processors within the scope of a single L3 cache within
|
||
|
their assigned location. Thus, if a process is assigned by the mapper
|
||
|
to a certain socket, then a \fI--bind-to l3cache\fP directive will
|
||
|
cause the process to be bound to the processors that share a single L3
|
||
|
cache within that socket.
|
||
|
.
|
||
|
.PP
|
||
|
To help balance loads, the binding directive uses a round-robin method when binding to
|
||
|
levels lower than used in the mapper. For example, consider the case where a job is
|
||
|
mapped to the socket level, and then bound to core. Each socket will have multiple cores,
|
||
|
so if multiple processes are mapped to a given socket, the binding algorithm will assign
|
||
|
each process mapped to that socket to a unique core in a round-robin manner.
|
||
|
.
|
||
|
.PP
|
||
|
Alternatively, processes mapped by l2cache and then bound to socket will simply be bound
|
||
|
to all the processors in the socket where they are located. In this manner, users can
|
||
|
exert detailed control over relative MCW rank location and binding.
|
||
|
.
|
||
|
.PP
|
||
|
Finally, \fI--report-bindings\fP can be used to report bindings.
|
||
|
.
|
||
|
.PP
|
||
|
As an example, consider a node with two processor sockets, each comprising
|
||
|
four cores. We run \fIprun\fP with \fI-np 4 --report-bindings\fP and
|
||
|
the following additional options:
|
||
|
.
|
||
|
|
||
|
% prun ... --map-by core --bind-to core
|
||
|
[...] ... binding child [...,0] to cpus 0001
|
||
|
[...] ... binding child [...,1] to cpus 0002
|
||
|
[...] ... binding child [...,2] to cpus 0004
|
||
|
[...] ... binding child [...,3] to cpus 0008
|
||
|
|
||
|
% prun ... --map-by socket --bind-to socket
|
||
|
[...] ... binding child [...,0] to socket 0 cpus 000f
|
||
|
[...] ... binding child [...,1] to socket 1 cpus 00f0
|
||
|
[...] ... binding child [...,2] to socket 0 cpus 000f
|
||
|
[...] ... binding child [...,3] to socket 1 cpus 00f0
|
||
|
|
||
|
% prun ... --map-by core:PE=2 --bind-to core
|
||
|
[...] ... binding child [...,0] to cpus 0003
|
||
|
[...] ... binding child [...,1] to cpus 000c
|
||
|
[...] ... binding child [...,2] to cpus 0030
|
||
|
[...] ... binding child [...,3] to cpus 00c0
|
||
|
|
||
|
% prun ... --bind-to none
|
||
|
.
|
||
|
.PP
|
||
|
Here, \fI--report-bindings\fP shows the binding of each process as a mask.
|
||
|
In the first case, the processes bind to successive cores as indicated by
|
||
|
the masks 0001, 0002, 0004, and 0008. In the second case, processes bind
|
||
|
to all cores on successive sockets as indicated by the masks 000f and 00f0.
|
||
|
The processes cycle through the processor sockets in a round-robin fashion
|
||
|
as many times as are needed. In the third case, the masks show us that
|
||
|
2 cores have been bound per process. In the fourth case, binding is
|
||
|
turned off and no bindings are reported.
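.PP
When reading such output it can help to expand a mask into a list of processor indexes. The helper below is a minimal sketch that assumes the mask is hexadecimal and that bit N stands for processor N, which matches the example output above; the function name is hypothetical.

```python
def cores_in_mask(mask):
    """Expand a hexadecimal binding mask into a list of processor indexes."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


for mask in ("0001", "000c", "00f0"):
    print(mask, cores_in_mask(mask))
```

Under that assumption the mask 00f0 covers processors 4 through 7, that is, the four cores of the second socket.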
|
||
|
.
|
||
|
.PP
|
||
|
PSRVR's support for process binding depends on the underlying
|
||
|
operating system. Therefore, certain process binding options may not be available
|
||
|
on every system.
|
||
|
.
|
||
|
.PP
|
||
|
Process binding can also be set with MCA parameters.
|
||
|
Their usage is less convenient than that of \fIprun\fP options.
|
||
|
On the other hand, MCA parameters can be set not only on the \fIprun\fP
|
||
|
command line, but alternatively in a system or user mca-params.conf file
|
||
|
or as environment variables, as described in the MCA section below.
|
||
|
Some examples include:
|
||
|
.
|
||
|
.PP
|
||
|
  prun option          MCA parameter key            value

  --map-by core        rmaps_base_mapping_policy    core
  --map-by socket      rmaps_base_mapping_policy    socket
  --rank-by core       rmaps_base_ranking_policy    core
  --bind-to core       hwloc_base_binding_policy    core
  --bind-to socket     hwloc_base_binding_policy    socket
  --bind-to none       hwloc_base_binding_policy    none
|
||
|
.
|
||
|
.
|
||
|
.SS Rankfiles
|
||
|
.
|
||
|
Rankfiles are text files that specify detailed information about how
|
||
|
individual processes should be mapped to nodes, and to which
|
||
|
processor(s) they should be bound. Each line of a rankfile specifies
|
||
|
the location of one process. The general form of each line in the
|
||
|
rankfile is:
|
||
|
.
|
||
|
|
||
|
rank <N>=<hostname> slot=<slot list>
|
||
|
.
|
||
|
.PP
|
||
|
For example:
|
||
|
.
|
||
|
|
||
|
$ cat myrankfile
|
||
|
rank 0=aa slot=1:0-2
|
||
|
rank 1=bb slot=0:0,1
|
||
|
rank 2=cc slot=1-2
|
||
|
$ prun -H aa,bb,cc,dd -rf myrankfile ./a.out
|
||
|
.
|
||
|
.PP
|
||
|
This means that
|
||
|
.
|
||
|
|
||
|
Rank 0 runs on node aa, bound to logical socket 1, cores 0-2.
|
||
|
Rank 1 runs on node bb, bound to logical socket 0, cores 0 and 1.
|
||
|
Rank 2 runs on node cc, bound to logical cores 1 and 2.
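.PP
The line format can be made concrete with a small parser. This is a sketch that handles only the forms shown in the example above (a single space between fields, an optional socket before the colon); it is not the parser PSRVR uses, and the function names are hypothetical.

```python
def expand(spec):
    """Expand a list such as '0-2' or '0,1' into [0, 1, 2] or [0, 1]."""
    cores = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cores.extend(range(int(first), int(last) + 1))
        else:
            cores.append(int(part))
    return cores


def parse_rankfile_line(line):
    """Return (rank, host, socket, cores) for one logical rankfile line."""
    keyword, assignment, slot = line.split()
    if keyword != "rank" or not slot.startswith("slot="):
        raise ValueError("not a rankfile line: " + line)
    rank, host = assignment.split("=", 1)
    spec = slot[len("slot="):]
    socket = None
    if ":" in spec:
        # The part before the colon names the socket.
        socket_text, spec = spec.split(":", 1)
        socket = int(socket_text)
    return int(rank), host, socket, expand(spec)


for text in ("rank 0=aa slot=1:0-2", "rank 1=bb slot=0:0,1", "rank 2=cc slot=1-2"):
    print(parse_rankfile_line(text))
```

A socket value of None in the result corresponds to a line such as the one for rank 2, which names cores without a socket.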
|
||
|
.
|
||
|
.PP
|
||
|
Rankfiles can alternatively be used to specify \fIphysical\fP processor
|
||
|
locations. In this case, the syntax is somewhat different. Sockets are
|
||
|
no longer recognized, and the slot number given must be the number of
|
||
|
the physical PU as most OSes do not assign a unique physical identifier
|
||
|
to each core in the node. Thus, a proper physical rankfile looks something
|
||
|
like the following:
|
||
|
.
|
||
|
|
||
|
$ cat myphysicalrankfile
|
||
|
rank 0=aa slot=1
|
||
|
rank 1=bb slot=8
|
||
|
rank 2=cc slot=6
|
||
|
.
|
||
|
.PP
|
||
|
This means that
|
||
|
.
|
||
|
|
||
|
Rank 0 will run on node aa, bound to the core that contains physical PU 1
|
||
|
Rank 1 will run on node bb, bound to the core that contains physical PU 8
|
||
|
Rank 2 will run on node cc, bound to the core that contains physical PU 6
|
||
|
.
|
||
|
.PP
|
||
|
Rankfiles are treated as \fIlogical\fP by default, and the MCA parameter
|
||
|
rmaps_rank_file_physical must be set to 1 to indicate that the rankfile
|
||
|
is to be considered as \fIphysical\fP.
|
||
|
.
|
||
|
.PP
|
||
|
The hostnames listed above are "absolute," meaning that actual
|
||
|
resolvable hostnames are specified. However, hostnames can also be
|
||
|
specified as "relative," meaning that they are specified in relation
|
||
|
to an externally-specified list of hostnames (e.g., by prun's --host
|
||
|
argument, a hostfile, or a job scheduler).
|
||
|
.
|
||
|
.PP
|
||
|
The "relative" specification is of the form "+n<X>", where X is an
|
||
|
integer specifying the Xth hostname in the set of all available
|
||
|
hostnames, indexed from 0. For example:
|
||
|
.
|
||
|
|
||
|
$ cat myrankfile
|
||
|
rank 0=+n0 slot=1:0-2
|
||
|
rank 1=+n1 slot=0:0,1
|
||
|
rank 2=+n2 slot=1-2
|
||
|
$ prun -H aa,bb,cc,dd -rf myrankfile ./a.out
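.PP
Resolving a relative hostname amounts to an index lookup in the list of available hosts. A minimal Python sketch, with a hypothetical function name and the host list from the command above:

```python
def resolve_host(name, available):
    """Turn a '+n<X>' relative name into the Xth available host (0-based)."""
    if name.startswith("+n"):
        return available[int(name[2:])]
    # Anything else is treated as an absolute hostname.
    return name


hosts = ["aa", "bb", "cc", "dd"]
print([resolve_host(name, hosts) for name in ("+n0", "+n1", "+n2")])
```

With this host list the rankfile above places ranks 0, 1 and 2 on aa, bb and cc, and dd is left unused.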
|
||
|
.
|
||
|
.PP
|
||
|
All socket/core slot locations are
|
||
|
specified as
|
||
|
.I logical
|
||
|
indexes. You can use tools such as HWLOC's "lstopo" to find the
|
||
|
logical indexes of sockets and cores.
|
||
|
.
|
||
|
.
|
||
|
.SS Application Context or Executable Program?
|
||
|
.
|
||
|
To distinguish the two different forms, \fIprun\fP
|
||
|
looks on the command line for the \fI--app\fP option. If
|
||
|
it is specified, then the file named on the command line is
|
||
|
assumed to be an application context. If it is not
|
||
|
specified, then the file is assumed to be an executable program.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Locating Files
|
||
|
.
|
||
|
If no relative or absolute path is specified for a file, prun will first look for files by searching the directories specified
|
||
|
by the \fI--path\fP option. If there is no \fI--path\fP option set or
|
||
|
if the file is not found at the \fI--path\fP location, then prun
|
||
|
will search the user's PATH environment variable as defined on the
|
||
|
source node(s).
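.PP
The search order described above can be sketched in Python. Two points are assumptions made for the illustration and are not stated on this page: that the \fI--path\fP value is a list of directories joined with the platform path separator, and that a candidate must be an executable regular file. The function name is hypothetical.

```python
import os


def locate(filename, path_option=None, env_path=None):
    """Return the first match for filename, or None if nothing is found."""
    if os.sep in filename:
        # A relative or absolute path was given, so no search takes place.
        return filename
    directories = []
    if path_option:
        directories.extend(path_option.split(os.pathsep))
    if env_path is None:
        env_path = os.environ.get("PATH", "")
    directories.extend(env_path.split(os.pathsep))
    for directory in directories:
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Directories from the \fI--path\fP value are tried first, and the PATH entries are consulted only when none of them contains the file.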
|
||
|
.PP
|
||
|
If a relative directory is specified, it must be relative to the initial
|
||
|
working directory determined by the specific starter used. For example, when
|
||
|
using the rsh or ssh starters, the initial directory is $HOME by default. Other
|
||
|
starters may set the initial directory to the current working directory from
|
||
|
the invocation of \fIprun\fP.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Current Working Directory
|
||
|
.
|
||
|
The \fI\-wdir\fP prun option (and its synonym, \fI\-wd\fP) allows
|
||
|
the user to change to an arbitrary directory before the program is
|
||
|
invoked. It can also be used in application context files to specify
|
||
|
working directories on specific nodes and/or for specific
|
||
|
applications.
|
||
|
.PP
|
||
|
If the \fI\-wdir\fP option appears both in a context file and on the
|
||
|
command line, the context file directory will override the command
|
||
|
line value.
|
||
|
.PP
|
||
|
If the \fI-wdir\fP option is specified, prun will attempt to
|
||
|
change to the specified directory on all of the remote nodes. If this
|
||
|
fails, \fIprun\fP will abort.
|
||
|
.PP
|
||
|
If the \fI-wdir\fP option is \fBnot\fP specified, prun will send
|
||
|
the directory name where \fIprun\fP was invoked to each of the
|
||
|
remote nodes. The remote nodes will try to change to that
|
||
|
directory. If they are unable (e.g., if the directory does not exist on
|
||
|
that node), then prun will use the default directory determined by
|
||
|
the starter.
|
||
|
.PP
|
||
|
All directory changing occurs before the user's program is invoked.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Standard I/O
|
||
|
.
|
||
|
PSRVR directs UNIX standard input to /dev/null on all processes
|
||
|
except the rank 0 process. The rank 0 process
|
||
|
inherits standard input from \fIprun\fP.
|
||
|
.B Note:
|
||
|
The node that invoked \fIprun\fP need not be the same as the node where the
|
||
|
rank 0 process resides. PSRVR handles the redirection of
|
||
|
\fIprun\fP's standard input to the rank 0 process.
|
||
|
.PP
|
||
|
PSRVR directs UNIX standard output and error from remote nodes to the node
|
||
|
that invoked \fIprun\fP and prints it on the standard output/error of
|
||
|
\fIprun\fP.
|
||
|
Local processes inherit the standard output/error of \fIprun\fP and write
|
||
|
to it directly.
|
||
|
.PP
|
||
|
Thus it is possible to redirect standard I/O for applications by
|
||
|
using the typical shell redirection procedure on \fIprun\fP.
|
||
|
|
||
|
\fB%\fP prun -np 2 my_app < my_input > my_output
|
||
|
|
||
|
Note that in this example \fIonly\fP the rank 0 process will
|
||
|
receive the stream from \fImy_input\fP on stdin. The stdin on all the other
|
||
|
nodes will be tied to /dev/null. However, the stdout from all nodes will
|
||
|
be collected into the \fImy_output\fP file.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Signal Propagation
|
||
|
.
|
||
|
When prun receives a SIGTERM or SIGINT, it will attempt to kill
|
||
|
the entire job by sending all processes in the job a SIGTERM, waiting
|
||
|
a small number of seconds, then sending all processes in the job a
|
||
|
SIGKILL.
|
||
|
.
|
||
|
.PP
|
||
|
SIGUSR1 and SIGUSR2 signals received by prun are propagated to
|
||
|
all processes in the job.
|
||
|
.
|
||
|
.PP
|
||
|
A SIGTSTP signal to prun will cause a SIGSTOP signal to be sent
|
||
|
to all of the programs started by prun and likewise a SIGCONT signal
|
||
|
to prun will cause a SIGCONT signal to be sent.
|
||
|
.
|
||
|
.PP
|
||
|
Other signals are not currently propagated
|
||
|
by prun.
|
||
|
.
|
||
|
.
|
||
|
.SS Process Termination / Signal Handling
|
||
|
.
|
||
|
During the run of an application, if any process dies abnormally
|
||
|
(either exiting before invoking \fIPMIx_Finalize\fP, or dying as the result of a
|
||
|
signal), \fIprun\fP will print out an error message and kill the rest of the
|
||
|
application.
|
||
|
.PP
|
||
|
.
|
||
|
.
|
||
|
.SS Process Environment
|
||
|
.
|
||
|
Processes in the application inherit their environment from the
|
||
|
PSRVR daemon on the node on which they are running. The
|
||
|
environment is typically inherited from the user's shell. On remote
|
||
|
nodes, the exact environment is determined by the boot MCA module
|
||
|
used. The \fIrsh\fR launch module, for example, uses either
|
||
|
\fIrsh\fR or \fIssh\fR to launch the PSRVR daemon on remote nodes, and
|
||
|
typically executes one or more of the user's shell-setup files before
|
||
|
launching the daemon. When running dynamically linked
|
||
|
applications which require the \fILD_LIBRARY_PATH\fR environment
|
||
|
variable to be set, care must be taken to ensure that it is correctly
|
||
|
set when booting PSRVR.
|
||
|
.PP
|
||
|
See the "Remote Execution" section for more details.
|
||
|
.
|
||
|
.
|
||
|
.SS Remote Execution
|
||
|
.
|
||
|
PSRVR requires that the \fIPATH\fR environment variable be set to
|
||
|
find executables on remote nodes (this is typically only necessary in
|
||
|
\fIrsh\fR- or \fIssh\fR-based environments -- batch/scheduled
|
||
|
environments typically copy the current environment to the execution
|
||
|
of remote jobs, so if the current environment has \fIPATH\fR and/or
|
||
|
\fILD_LIBRARY_PATH\fR set properly, the remote nodes will also have it
|
||
|
set properly). If PSRVR was compiled with shared library support,
|
||
|
it may also be necessary to have the \fILD_LIBRARY_PATH\fR environment
|
||
|
variable set on remote nodes as well (especially to find the shared
|
||
|
libraries required to run user applications).
|
||
|
.PP
|
||
|
However, it is not always desirable or possible to edit shell
|
||
|
startup files to set \fIPATH\fR and/or \fILD_LIBRARY_PATH\fR. The
|
||
|
\fI--prefix\fR option is provided for some simple configurations where
|
||
|
this is not possible.
|
||
|
.PP
|
||
|
The \fI--prefix\fR option takes a single argument: the base directory
|
||
|
on the remote node where PSRVR is installed. PSRVR will use
|
||
|
this directory to set the remote \fIPATH\fR and \fILD_LIBRARY_PATH\fR
|
||
|
before executing any user applications. This allows
|
||
|
running jobs without having pre-configured the \fIPATH\fR and
|
||
|
\fILD_LIBRARY_PATH\fR on the remote nodes.
|
||
|
.PP
|
||
|
PSRVR adds the basename of the current
|
||
|
node's "bindir" (the directory where PSRVR's executables are
|
||
|
installed) to the prefix and uses that to set the \fIPATH\fR on the
|
||
|
remote node. Similarly, PSRVR adds the basename of the current
|
||
|
node's "libdir" (the directory where PSRVR's libraries are
|
||
|
installed) to the prefix and uses that to set the
|
||
|
\fILD_LIBRARY_PATH\fR on the remote node. For example:
|
||
|
.TP 15
|
||
|
Local bindir:
|
||
|
/local/node/directory/bin
|
||
|
.TP
|
||
|
Local libdir:
|
||
|
/local/node/directory/lib64
|
||
|
.PP
|
||
|
If the following command line is used:
|
||
|
|
||
|
\fB%\fP prun --prefix /remote/node/directory
|
||
|
|
||
|
PSRVR will add "/remote/node/directory/bin" to the \fIPATH\fR
|
||
|
and "/remote/node/directory/lib64" to the \fILD_LIBRARY_PATH\fR on the
|
||
|
remote node before attempting to execute anything.
|
||
|
.PP
|
||
|
The \fI--prefix\fR option is not sufficient if the installation paths
|
||
|
on the remote node differ from those on the local node (e.g., if "/lib"
|
||
|
is used on the local node, but "/lib64" is used on the remote node),
|
||
|
or if the installation paths are something other than a subdirectory
|
||
|
under a common prefix.
|
||
|
.PP
|
||
|
Note that executing \fIprun\fR via an absolute pathname is
|
||
|
equivalent to specifying \fI--prefix\fR without the last subdirectory
|
||
|
in the absolute pathname to \fIprun\fR. For example:
|
||
|
|
||
|
\fB%\fP /usr/local/bin/prun ...
|
||
|
|
||
|
is equivalent to
|
||
|
|
||
|
\fB%\fP prun --prefix /usr/local
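.PP
The two path computations described in this section can be sketched in a few lines of Python. The function names are hypothetical, and the code only models the string handling; it does not reproduce how PSRVR sets the remote environment.

```python
import posixpath


def remote_search_paths(prefix, local_bindir, local_libdir):
    """Append the basenames of the local bindir and libdir to the prefix."""
    bindir = posixpath.join(prefix, posixpath.basename(local_bindir))
    libdir = posixpath.join(prefix, posixpath.basename(local_libdir))
    return bindir, libdir


def implied_prefix(prun_path):
    """Prefix implied by invoking prun through an absolute pathname."""
    # Drop the executable name, then the last subdirectory.
    return posixpath.dirname(posixpath.dirname(prun_path))


print(remote_search_paths("/remote/node/directory",
                          "/local/node/directory/bin",
                          "/local/node/directory/lib64"))
print(implied_prefix("/usr/local/bin/prun"))
```

Because only the basename of each local directory is reused, a remote installation that uses "lib" where the local one uses "lib64" ends up with the wrong library path, which is the limitation noted above.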
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Exported Environment Variables
|
||
|
.
|
||
|
All environment variables that are named in the form PMIX_* will automatically
|
||
|
be exported to new processes on the local and remote nodes. Environmental
|
||
|
parameters can also be set/forwarded to the new processes using the MCA
|
||
|
parameter \fImca_base_env_list\fP. While the syntax of the \fI\-x\fP option and MCA param
|
||
|
allows the definition of new variables, note that the parser
|
||
|
for these options is currently not very sophisticated - it does not even
|
||
|
understand quoted values. Users are advised to set variables in the
|
||
|
environment and use the option to export them, not to define them.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Setting MCA Parameters
|
||
|
.
|
||
|
The \fI-pmca\fP switch allows the passing of parameters to various MCA
|
||
|
(Modular Component Architecture) modules.
|
||
|
.\" PSRVR's MCA modules are described in detail in psrvrmca(7).
|
||
|
MCA modules have a direct impact on programs because they allow tunable
|
||
|
parameters to be set at run time (such as which BTL communication device driver
|
||
|
to use, what parameters to pass to that BTL, etc.).
|
||
|
.PP
|
||
|
The \fI-pmca\fP switch takes two arguments: \fI<key>\fP and \fI<value>\fP.
|
||
|
The \fI<key>\fP argument generally specifies which MCA module will receive the value.
|
||
|
For example, the \fI<key>\fP "btl" selects which BTL is used for
|
||
|
transporting messages. The \fI<value>\fP argument is the value that is
|
||
|
passed.
|
||
|
For example:
|
||
|
.
|
||
|
.TP 4
|
||
|
prun -pmca btl tcp,self -np 1 foo
|
||
|
Tells PSRVR to use the "tcp" and "self" BTLs, and to run a single copy of
|
||
|
"foo" on an allocated node.
|
||
|
.
|
||
|
.TP
|
||
|
prun -pmca btl self -np 1 foo
|
||
|
Tells PSRVR to use the "self" BTL, and to run a single copy of "foo" on an
|
||
|
allocated node.
|
||
|
.\" And so on. PSRVR's BTL MCA modules are described in psrvrmca_btl(7).
|
||
|
.PP
|
||
|
The \fI-pmca\fP switch can be used multiple times to specify different
|
||
|
\fI<key>\fP and/or \fI<value>\fP arguments. If the same \fI<key>\fP is
|
||
|
specified more than once, the \fI<value>\fPs are concatenated with a comma
|
||
|
(",") separating them.
|
||
|
.PP
|
||
|
Note that the \fI-pmca\fP switch is simply a shortcut for setting environment variables.
|
||
|
The same effect may be accomplished by setting corresponding environment
|
||
|
variables before running \fIprun\fP.
|
||
|
The form of the environment variables that PSRVR sets is:
|
||
|
|
||
|
PMIX_MCA_<key>=<value>
|
||
|
.PP
|
||
|
Thus, the \fI-pmca\fP switch overrides any previously set environment
|
||
|
variables. The \fI-pmca\fP settings similarly override MCA parameters set
|
||
|
in the
|
||
|
$OPAL_PREFIX/etc/psrvr-mca-params.conf or $HOME/.psrvr/mca-params.conf
|
||
|
file.
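.PP
The translation from \fI-pmca\fP arguments to environment variables, including the comma-joining of repeated keys and the override of earlier settings, can be modelled as follows. This is a sketch with hypothetical function names, not the code \fIprun\fP runs.

```python
def pmca_environment(pairs):
    """Build PMIX_MCA_<key> variables from (key, value) pairs in order."""
    env = {}
    for key, value in pairs:
        name = "PMIX_MCA_" + key
        if name in env:
            # A repeated key has its values joined with a comma.
            env[name] += "," + value
        else:
            env[name] = value
    return env


def apply_pmca(existing_env, pairs):
    """Command-line settings replace variables that were already set."""
    merged = dict(existing_env)
    merged.update(pmca_environment(pairs))
    return merged


print(pmca_environment([("btl", "tcp"), ("btl", "self")]))
print(apply_pmca({"PMIX_MCA_btl": "self"}, [("btl", "tcp,self")]))
```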
|
||
|
.
|
||
|
.PP
|
||
|
Unknown \fI<key>\fP arguments are still set as
|
||
|
environment variables -- they are not checked (by \fIprun\fP) for correctness.
|
||
|
Illegal or incorrect \fI<value>\fP arguments may or may not be reported -- it
|
||
|
depends on the specific MCA module.
|
||
|
.PP
|
||
|
To find the available component types under the MCA architecture, or to find the
|
||
|
available parameters for a specific component, use the \fIpinfo\fP command.
|
||
|
See the \fIpinfo(1)\fP man page for detailed information on the command.
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Setting MCA Parameters and Environment Variables from a File
|
||
|
The \fI-tune\fP command line option and its synonym \fI-pmca mca_base_envar_file_prefix\fP allow a user
|
||
|
to set MCA parameters and environment variables with the syntax described below.
|
||
|
This option requires a single file or list of files separated by "," to follow.
|
||
|
.PP
|
||
|
A valid line in the file may contain zero or many "-x", "-pmca", or "--pmca" arguments.
|
||
|
The following patterns are supported: -pmca var val -pmca var "val" -x var=val -x var.
|
||
|
If any argument is duplicated in the file, the last value read will be used.
|
||
|
.PP
|
||
|
MCA parameters and environment variables specified on the command line have higher precedence than variables specified in the file.
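.PP
The file syntax and the precedence rules can be illustrated with a short Python sketch. It relies on shlex to strip the optional quotes and treats anything other than the documented patterns as an error; the function names are hypothetical and the real parser may behave differently on malformed input.

```python
import shlex


def parse_tune_lines(lines):
    """Collect MCA parameters and environment variables from file lines."""
    mca, env = {}, {}
    for line in lines:
        tokens = shlex.split(line)
        i = 0
        while i < len(tokens):
            option = tokens[i]
            if option in ("-pmca", "--pmca") and i + 2 < len(tokens):
                # A later duplicate simply replaces the earlier value.
                mca[tokens[i + 1]] = tokens[i + 2]
                i += 3
            elif option == "-x" and i + 1 < len(tokens):
                name, has_value, value = tokens[i + 1].partition("=")
                # None marks a variable that is forwarded, not defined.
                env[name] = value if has_value else None
                i += 2
            else:
                raise ValueError("unsupported argument: " + option)
    return mca, env


def effective_mca(file_mca, command_line_mca):
    """Values given on the command line win over values from the file."""
    merged = dict(file_mca)
    merged.update(command_line_mca)
    return merged


file_lines = ['-pmca btl tcp -x FOO=bar', '-pmca btl "self" -x HOME']
print(parse_tune_lines(file_lines))
```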
|
||
|
.
|
||
|
.
|
||
|
.
|
||
|
.SS Running as root
|
||
|
.
|
||
|
The PSRVR team strongly advises against executing
|
||
|
.I prun
|
||
|
as the root user. Applications should be run as regular
|
||
|
(non-root) users.
|
||
|
.
|
||
|
.PP
|
||
|
Reflecting this advice, prun will refuse to run as root by default.
|
||
|
To override this default, you can add the
|
||
|
.I --allow-run-as-root
|
||
|
option to the
|
||
|
.I prun
|
||
|
command line.
|
||
|
.
|
||
|
.SS Exit status
|
||
|
.
|
||
|
There is no standard definition for what \fIprun\fP should return as an exit
|
||
|
status. After considerable discussion, we settled on the following method for
|
||
|
assigning the \fIprun\fP exit status (note: in the following description,
|
||
|
the "primary" job is the initial application started by prun - all jobs that
|
||
|
are spawned by that job are designated "secondary" jobs):
|
||
|
.
|
||
|
.IP \[bu] 2
|
||
|
if all processes in the primary job normally terminate with exit status 0, we return 0
|
||
|
.IP \[bu]
|
||
|
if one or more processes in the primary job normally terminate with non-zero exit status,
|
||
|
we return the exit status of the process with the lowest rank to have a non-zero status
|
||
|
.IP \[bu]
|
||
|
if all processes in the primary job normally terminate with exit status 0, and one or more
|
||
|
processes in a secondary job normally terminate with non-zero exit status, we (a) return
|
||
|
the exit status of the process with the lowest rank in the lowest jobid to have a non-zero
|
||
|
status, and (b) output a message summarizing the exit status of the primary and all secondary jobs.
|
||
|
.IP \[bu]
|
||
|
if the command line option --report-child-jobs-separately is set, we will return -only- the
|
||
|
exit status of the primary job. Any non-zero exit status in secondary jobs will be
|
||
|
reported solely in a summary print statement.
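.PP
The rules above can be condensed into a small decision function. The sketch below is an illustration of the list, not PSRVR source: the function name and the dictionary layout (rank to exit status, and jobid to such a dictionary for secondary jobs) are assumptions, and the summary messages are left out.

```python
def prun_exit_status(primary, secondary=None, report_child_jobs_separately=False):
    """Pick the exit status from per-rank exit codes of normally exited jobs."""
    # A non-zero status in the primary job always wins; lowest rank first.
    for rank in sorted(primary):
        if primary[rank] != 0:
            return primary[rank]
    if report_child_jobs_separately:
        # Secondary jobs are only summarized, never returned.
        return 0
    # Otherwise use the lowest rank of the lowest jobid with a non-zero status.
    for jobid in sorted(secondary or {}):
        job = secondary[jobid]
        for rank in sorted(job):
            if job[rank] != 0:
                return job[rank]
    return 0


print(prun_exit_status({0: 0, 1: 3, 2: 5}))
print(prun_exit_status({0: 0}, {2: {0: 0, 1: 7}, 1: {0: 0, 3: 9}}))
```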
|
||
|
.
|
||
|
.PP
|
||
|
By default, PSRVR records and notes that processes exited with non-zero termination status.
|
||
|
This is generally not considered an "abnormal termination" - i.e., PSRVR will not abort a
|
||
|
job if one or more processes return a non-zero status. Instead, the default behavior simply
|
||
|
reports the number of processes terminating with non-zero status upon completion of the job.
|
||
|
.PP
|
||
|
However, in some cases it can be desirable to have the job abort when any process terminates
|
||
|
with non-zero status. For example, a non-PMIx job might detect a bad result from a calculation
|
||
|
and want to abort, but doesn't want to generate a core file. Or a PMIx job might continue past
|
||
|
a call to PMIx_Finalize, but indicate that all processes should abort due to some post-PMIx result.
|
||
|
.PP
|
||
|
It is not anticipated that this situation will occur frequently. However, in the interest of
|
||
|
serving the broader community, PSRVR now has a means for allowing users to direct that jobs be
|
||
|
aborted upon any process exiting with non-zero status. Setting the MCA parameter
|
||
|
"orte_abort_on_non_zero_status" to 1 will cause PSRVR to abort all processes once any process
|
||
|
exits with non-zero status.
|
||
|
.PP
|
||
|
Terminations caused in this manner will be reported on the console as an "abnormal termination",
|
||
|
with the first process to so exit identified along with its exit status.
|
||
|
.PP
|
||
|
.\" **************************
|
||
|
.\" Return Value Section
|
||
|
.\" **************************
|
||
|
.
|
||
|
.SH RETURN VALUE
|
||
|
.
|
||
|
\fIprun\fP returns 0 if all processes started by \fIprun\fP exit after calling
|
||
|
PMIx_Finalize. A non-zero value is returned if an internal error occurred in
|
||
|
prun, or one or more processes exited before calling PMIx_Finalize. If an
|
||
|
internal error occurred in prun, the corresponding error code is returned.
|
||
|
In the event that one or more processes exit before calling PMIx_Finalize, the
|
||
|
return value of the process that \fIprun\fP first notices died
|
||
|
before calling PMIx_Finalize will be returned. Note that, in general, this will
|
||
|
be the first process that died but is not guaranteed to be so.
|
||
|
.
|
||
|
.PP
|
||
|
If the
|
||
|
.B --timeout
|
||
|
command line option is used and the timeout expires before the job
|
||
|
completes (thereby forcing
|
||
|
.I prun
|
||
|
to kill the job)
|
||
|
.I prun
|
||
|
will return an exit status equivalent to the value of
|
||
|
.B ETIMEDOUT
|
||
|
(which is typically 110 on Linux and 60 on OS X systems).
|
||
|
|
||
|
.
|
||
|
.\" **************************
|
||
|
.\" See Also Section
|
||
|
.\" **************************
|
||
|
.
|