% -*- latex -*-
%
% Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
%                         University Research and Technology
%                         Corporation.  All rights reserved.
% Copyright (c) 2004-2005 The University of Tennessee and The University
%                         of Tennessee Research Foundation.  All rights
%                         reserved.
% Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
%                         University of Stuttgart.  All rights reserved.
% Copyright (c) 2004-2005 The Regents of the University of California.
%                         All rights reserved.
% $COPYRIGHT$
%
% Additional copyrights may follow
%
% $HEADER$
%

\chapter{Available ORTE Components}
\label{sec:mca-orte}

There is currently only one type of ORTE component that is visible to users: \kind{boot}, which is used to start the Open MPI run-time environment, most often through the \icmd{lamboot} command.  The \cmd{lamboot} command itself is discussed in Section~\ref{sec:commands-lamboot} (page~\pageref{sec:commands-lamboot}); the discussion below focuses on the boot modules that make up the ``back end'' implementation of \cmd{lamboot}.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Using the Open Run-Time Environment}
\label{sec:mca-orte-pls}
\index{ORTE MCA components|(}
\index{MCA boot components|see {boot MCA components}}

{\Huge JMS needs massive overhaul}

Open MPI provides a number of modules for starting the \cmd{lamd} control daemons.  In most cases, the \cmd{lamd}s are started using the \icmd{lamboot} command.  In previous versions of Open MPI, \icmd{lamboot} could only use \icmd{rsh} or \icmd{ssh} for starting the Open MPI run-time environment on remote nodes.  In Open MPI \ompiversion, it is possible to use a variety of mechanisms for this process startup.  The following mechanisms are available in Open MPI \ompiversion:

\begin{itemize}
\item BProc
\item Globus (beta-level support)
\item \cmd{rsh} / \cmd{ssh}
\item OpenPBS / PBS Pro / Torque (using the Task Management interface)
\changebegin{7.1}
\item SLURM (using its native interface)
\changeend{7.1}
\end{itemize}

These mechanisms are discussed in detail below.  Note that the sections below each assume that support for these modules has been compiled into Open MPI.  The \icmd{laminfo} command can be used to determine exactly which modules are supported in your installation (see Section~\ref{sec:commands-laminfo}, page~\pageref{sec:commands-laminfo}).
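For example, the following (hypothetical) check lists the \kind{boot} modules that were compiled into an installation; the exact output format depends on your version and configuration:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ laminfo | grep boot
\end{lstlisting}
% stupid emacs mode: $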
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Boot Schema Files (a.k.a., ``Hostfiles'' or ``Machinefiles'')}
\label{sec:mca-orte-pls-schema}
\index{boot schema}
\index{hostfile|see {boot schema}}
\index{machinefile|see {boot schema}}
\cmdindex{lamboot}{boot schema file}

Before discussing any of the specific boot MCA modules, this section discusses the boot schema file, commonly referred to as a ``hostfile'' or a ``machinefile''.  Most (but not all) boot MCA modules require a boot schema, and the text below makes frequent mention of them.  Hence, it is worth discussing them before getting into the details of each boot MCA module.

A boot schema is a text file that, in its simplest form, simply lists every host on which the Open MPI run-time environment will be invoked.  For example:

\lstset{style=lam-shell}
\begin{lstlisting}
# This is my boot schema
inky.cluster.example.com
pinky.cluster.example.com
blinky.cluster.example.com
clyde.cluster.example.com
\end{lstlisting}

Lines beginning with ``{\tt \#}'' are treated as comments and are ignored.  Each non-blank, non-comment line must, at a minimum, list a host.  Specifically, the first token on each line must specify a host (although the definition of how that host is specified may differ between boot modules).

However, each line can also specify arbitrary ``key=value'' pairs.  A common global key is ``{\tt cpu}''.  This key takes an integer value and indicates to Open MPI how many CPUs are available for Open MPI to use.  If the key is not present, the value of 1 is assumed.  This number does {\em not} need to reflect the physical number of CPUs -- it can be smaller than, equal to, or greater than the number of physical CPUs in the machine.  It is solely used as a shorthand notation for \icmd{mpirun}'s ``C'' notation, meaning ``launch one process per CPU as specified in the boot schema file.''  For example, in the following boot schema:

\lstset{style=lam-shell}
\begin{lstlisting}
inky.cluster.example.com cpu=2
pinky.cluster.example.com cpu=4
blinky.cluster.example.com cpu=4
# clyde doesn't mention a cpu count, and is therefore implicitly 1
clyde.cluster.example.com
\end{lstlisting}

\noindent issuing the command ``{\tt mpirun C foo}'' would actually launch 11 copies of \cmd{foo}: 2 on \host{inky}, 4 on \host{pinky}, 4 on \host{blinky}, and 1 on \host{clyde}.

Note that listing a host more than once has the same effect as incrementing the CPU count.  The following boot schema has the same effect as the previous example (i.e., CPU counts of 2, 4, 4, and 1, respectively):

\lstset{style=lam-shell}
\begin{lstlisting}
# inky has a CPU count of 2
inky.cluster.example.com
inky.cluster.example.com
# pinky has a CPU count of 4
pinky.cluster.example.com
pinky.cluster.example.com
pinky.cluster.example.com
pinky.cluster.example.com
# blinky has a CPU count of 4
blinky.cluster.example.com
blinky.cluster.example.com
blinky.cluster.example.com
blinky.cluster.example.com
# clyde only has 1 CPU
clyde.cluster.example.com
\end{lstlisting}

Other keys are defined on a per-boot-MCA-module basis, and are described below.
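As a concrete illustration (the file name \file{hostfile} and the program name \cmd{foo} are placeholders), booting with the first ``{\tt cpu=}'' schema above and then using the ``C'' notation would look like the following, which starts the run-time environment on all four hosts and launches the 11 copies of \cmd{foo} described above:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot hostfile
shell$ mpirun C foo
\end{lstlisting}
% stupid emacs mode: $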
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Minimum Requirements}
\label{sec:mca-orte-pls-min-reqs}

In order to successfully launch a process on a remote node, several requirements must be met.  Although each of the boot modules has different specific requirements, all of them share the following conditions for successful operation:

\begin{enumerate}
\item Each target host must be reachable and operational.

\item The user must be able to execute arbitrary processes on the target.

\item The Open MPI executables must be locatable on that machine.  This typically involves using the shell's search path, the \ienvvar{LAMHOME} environment variable, or a boot-module-specific mechanism.

\item The user must be able to write to the Open MPI session directory (typically somewhere under \file{/tmp}; see Section~\ref{sec:misc-session-directory}, page~\pageref{sec:misc-session-directory}).

\item All hosts must be able to resolve the fully-qualified domain name (FQDN) of all the machines being booted (including itself).

\item Unless there is only one host being booted, any host resolving to the IP address 127.0.0.1 cannot be included in the list of hosts.
\end{enumerate}

If any of these conditions are not met, \cmd{lamboot} will fail.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{Selecting a \kind{boot} Module}

Only one \kind{boot} module will be selected; it will be used for the life of the Open MPI universe.  As such, module priority values are the only factor used to determine which available module should be selected.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{\kind{boot} MCA Parameters}

On many kinds of networks, Open MPI can know exactly which nodes should be making connections while booting the Open MPI run-time environment, and promiscuous connections (i.e., allowing any node to connect) are discouraged.  However, this is not possible in some complex network configurations, and promiscuous connections {\em must} be enabled.

By default, Open MPI's base \kind{boot} MCA startup protocols disable promiscuous connections.  However, this behavior can be overridden when Open MPI is configured and at run-time.  If the MCA parameter \issiparam{boot\_\-base\_\-promisc} is set to an empty value, or set to the integer value 1, promiscuous connections will be accepted when the Open MPI RTE is booted.
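For example, promiscuous connections could be enabled at boot time from the command line, using the same \cmdarg{-ssi} syntax shown elsewhere in this chapter (the file name \file{hostfile} is a placeholder):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -ssi boot_base_promisc 1 hostfile
\end{lstlisting}
% stupid emacs mode: $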
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \boot{bproc} Module}
\index{bproc boot MCA module@\boot{bproc} boot MCA module}
\index{boot MCA modules!bproc@\boot{bproc}}

The Beowulf Distributed Process Space (BProc) project\footnote{\url{http://bproc.sourceforge.net/}} is a set of kernel modifications, utilities, and libraries which allow a user to start processes on other machines in a Beowulf-style cluster.  Remote processes started with this mechanism appear in the process table of the cluster's front-end machine.

Open MPI functionality has been tested with BProc version 3.2.5.  Prior versions had a bug that affected at least some Open MPI functionality.  It is strongly recommended to upgrade to at least version 3.2.5 before attempting to use the Open MPI native BProc capabilities.

%%%%%

\subsubsection{Minimum Requirements}

Several of the minimum requirements listed in Section~\ref{sec:mca-orte-pls-min-reqs} will already be met in a BProc environment because BProc will copy \cmd{lamboot}'s entire environment (including the \envvar{PATH}) to the remote node.  Hence, if \cmd{lamboot} is in the user's path on the local node, it will also [automatically] be in the user's path on the remote node.

However, one of the minimum requirements conditions (``The user must be able to execute arbitrary processes on the target'') deserves a BProc-specific clarification.  BProc has its own internal permission system for determining if users are allowed to execute on specific nodes.  The system is similar to the user/group/other mechanism typically used in many Unix filesystems.  Hence, in order for a user to successfully \cmd{lamboot} on a BProc cluster, he/she must have BProc execute permissions on each of the target nodes.  Consult the BProc documentation for more details.

%%%%%

\subsubsection{Usage}

In most situations, the \cmd{lamboot} command (and related commands) should automatically ``know'' to use the \boot{bproc} boot MCA module when running on the BProc head node; no additional command line parameters or environment variables should be required.
%
Specifically, when running in a BProc environment, the \boot{bproc} module will report that it is available, and artificially inflate its priority relatively high in order to influence the boot module selection process.
%
However, the BProc boot module can be forced by specifying the \issiparam{boot} MCA parameter with the value of \issivalue{boot}{bproc}.
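For example, forcing selection of the \boot{bproc} module on the command line would look like the following (the file name \file{hostfile} is a placeholder):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -ssi boot bproc hostfile
\end{lstlisting}
% stupid emacs mode: $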
Running \cmd{lamboot} on a BProc cluster is just like running \cmd{lamboot} in a ``normal'' cluster.  Specifically, you provide a boot schema file (i.e., a list of nodes to boot on) and run \cmd{lamboot} with it.  For example:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot hostfile
\end{lstlisting}
% stupid emacs mode: $

Note that when using the \boot{bproc} module, \cmd{lamboot} will only function properly from the head node.  If you launch \cmd{lamboot} from a client node, it will likely either fail outright or fall back to a different boot module (e.g., \cmd{rsh}/\cmd{ssh}).

It is suggested that the \file{hostfile} file contain hostnames in the style that BProc prefers -- integer numbers.  For example, \file{hostfile} may contain the following:

\lstset{style=lam-shell}
\begin{lstlisting}
-1
0
1
2
3
\end{lstlisting}

\noindent which boots on the BProc front-end node (-1) and four slave nodes (0, 1, 2, 3).  Note that using IP hostnames will also work, but using integer numbers is recommended.

%%%%%

\subsubsection{Tunable Parameters}

Table~\ref{tbl:mca-orte-pls-bproc-mca-params} lists the MCA parameters that are available to the \boot{bproc} module.

\begin{table}[htbp]
\begin{ssiparamtb}
%
\ssiparamentry{boot\_\-bproc\_\-priority}{50}{Default priority level.}
\end{ssiparamtb}
\caption{MCA parameters for the \boot{bproc} boot module.}
\label{tbl:mca-orte-pls-bproc-mca-params}
\end{table}

%%%%%

\subsubsection{Special Notes}

After booting, Open MPI will, by default, not schedule MPI jobs to run on the BProc front end.  Specifically, Open MPI implicitly sets the ``no-schedule'' attribute on the -1 node in a BProc cluster.  See Section~\ref{sec:commands-lamboot} (page~\pageref{sec:commands-lamboot}) for more detail about this attribute and boot schemas in general, and Section~\ref{sec:commands-lamboot-no-schedule} (page~\pageref{sec:commands-lamboot-no-schedule}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \boot{globus} Module}
\index{globus boot MCA module@\boot{globus} boot MCA module}
\index{boot MCA modules!globus@\boot{globus}}

Open MPI \ompiversion\ includes beta support for Globus.  Specifically, only limited types of execution are possible.  The Open MPI Team would appreciate feedback from the Globus community on expanding Globus support in Open MPI.

%%%%%

\subsubsection{Minimum Requirements}

Open MPI jobs in a Globus environment can only be started on nodes using the ``fork'' job manager for the Globus gatekeeper.  Other job managers are not yet supported.

%%%%%

\subsubsection{Usage}

Starting the Open MPI run-time environment in a Globus environment makes use of the Globus Resource Allocation Manager (GRAM) client \icmd{globus-job-run}.
%
The Globus boot MCA module will never run automatically; it must always be specifically requested by setting the \issiparam{boot} MCA parameter to \issivalue{boot}{globus}.  Specifically, although the \boot{globus} module will report itself available if \icmd{globus-job-run} can be found in the \envvar{PATH}, the default priority will be quite low, effectively ensuring that it will not be selected unless it is the only module available (which will only occur if the \ssiparam{boot} parameter is set to \issivalue{boot}{globus}).

Open MPI needs to be able to find the Globus executables.  This can be accomplished either by adding the appropriate directory to your path, or by setting the \ienvvar{GLOBUS\_\-LOCATION} environment variable.

Additionally, the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} environment variable should be set to a unique value.  This ensures that this instance of the Open MPI universe does not conflict with any other, concurrent Open MPI universes that are running under the same username on nodes in the Globus environment.  Although any value can be used for this variable, it is probably best to have some kind of organized format, such as {\tt -}.
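For example, a Bourne-style shell session might set up the environment as follows before booting.  The directory and suffix shown are placeholders only; check the variable names against your installation's documentation.

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ GLOBUS_LOCATION=/path/to/globus
shell$ LAM_MPI_SESSION_SUFFIX=jdoe-globus-run1
shell$ export GLOBUS_LOCATION LAM_MPI_SESSION_SUFFIX
\end{lstlisting}
% stupid emacs mode: $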
Next, create a boot schema to use with \cmd{lamboot}.
%
Hosts are listed by their Globus contact strings (see the Globus manual for more information about contact strings).  In cases where the Globus gatekeeper is running as an \cmd{inetd} service on the node, the contact string will simply be the hostname.  If the contact string contains whitespace, the {\em entire} contact string must be enclosed in quotes (i.e., not just the values with whitespace).
%
For example, if your contact string is:

\centerline{\tt host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc}

Then you will need to have it listed as:

\centerline{\tt "host1:port1:/O=xxx/OU=yyy/CN=aaa bbb ccc"}

The following will not work:

\centerline{\tt host1:port1:/O=xxx/OU=yyy/CN="aaa bbb ccc"}

Each host in the boot schema must also have a ``{\tt lam\_\-install\_\-path}'' key indicating the absolute directory where Open MPI is installed.  This value is mandatory because you cannot rely on the \ienvvar{PATH} environment variable in a Globus environment, because users' ``dot'' files are not executed in Globus jobs (and therefore the \envvar{PATH} environment variable is not provided).  Other keys can be used as well; {\tt lam\_\-install\_\-path} is the only mandatory key.

Here is a sample Globus boot schema:

\changebegin{7.0.5}

\lstset{style=lam-shell}
\begin{lstlisting}
# Globus boot schema
"inky.mycluster:12853:/O=MegaCorp/OU=Mine/CN=HPC Group" prefix=/opt/lam cpu=2
"pinky.yourcluster:3245:/O=MegaCorp/OU=Yours/CN=HPC Group" prefix=/opt/lam cpu=4
"blinky.hiscluster:23452:/O=MegaCorp/OU=His/CN=HPC Group" prefix=/opt/lam cpu=4
"clyde.hercluster:82342:/O=MegaCorp/OU=Hers/CN=HPC Group" prefix=/software/lam
\end{lstlisting}

\changeend{7.0.5}

Once you have this boot schema, the \cmd{lamboot} command can be used to launch it.  Note, however, that unlike the other boot MCA modules, the Globus boot module will never be automatically selected by Open MPI -- it must be selected manually with the \issiparam{boot} MCA parameter with the value \issivalue{boot}{globus}.

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -ssi boot globus hostfile
\end{lstlisting}
% stupid emacs mode: $

%%%%%

\subsubsection{Tunable Parameters}

Table~\ref{tbl:mca-orte-pls-globus-mca-params} lists the MCA parameters that are available to the \boot{globus} module.

\begin{table}[htbp]
\begin{ssiparamtb}
%
\ssiparamentry{boot\_\-globus\_\-priority}{3}{Default priority level.}
\end{ssiparamtb}
\caption{MCA parameters for the \boot{globus} boot module.}
\label{tbl:mca-orte-pls-globus-mca-params}
\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \boot{rsh} Module (including \cmd{ssh})}
\index{rsh (ssh) boot MCA module@\boot{rsh} (\cmd{ssh}) boot MCA module}
\index{boot MCA modules!rsh (rsh/ssh)@\boot{rsh} (\cmd{rsh}/\cmd{ssh})}

The \cmd{rsh}/\cmd{ssh} boot MCA module is typically the ``least common denominator'' boot module.  When not in an otherwise ``special'' environment (such as a batch scheduler), the \cmd{rsh}/\cmd{ssh} boot module is typically used to start the Open MPI run-time environment.

%%%%%

\subsubsection{Minimum Requirements}

In addition to the minimum requirements listed in Section~\ref{sec:mca-orte-pls-min-reqs}, the following additional conditions must also be met for a successful \cmd{lamboot} using the \cmd{rsh} / \cmd{ssh} boot module:

\begin{enumerate}
\item The user must be able to execute arbitrary commands on each target host without being prompted for a password.

\item The shell's start-up script must not print anything on standard error.  The user can take advantage of the fact that \cmd{rsh} / \cmd{ssh} will start the shell non-interactively.  The start-up script can exit early in this case, before executing many commands relevant only to interactive sessions and likely to generate output (a minimal sketch of such a guard appears after this list).

\changebegin{7.1}
This has now been changed in version 7.1; if the MCA parameter \issiparam{boot\_\-rsh\_\-ignore\_\-stderr} is nonzero, any output on standard error will {\em not} be treated as an error.
\changeend{7.1}
\end{enumerate}
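For instance, the following sketch of a guard near the top of a \cmd{bash}-style start-up file keeps non-interactive \cmd{rsh} / \cmd{ssh} sessions quiet; adapt the test and the file name to your own shell:

\lstset{style=lam-shell}
\begin{lstlisting}
# Near the top of ~/.bashrc: stop here for non-interactive shells
# so that rsh/ssh sessions produce no output.
if [ -z "$PS1" ]; then
    return
fi
\end{lstlisting}
% stupid emacs mode: $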
Section~\ref{sec:getting-started} (page~\pageref{sec:getting-started}) provides a short tutorial on using the \cmd{rsh} / \cmd{ssh} boot module, including tips on setting up ``dot'' files, setting up password-less remote execution, etc.

%%%%%

\subsubsection{Usage}

Using \cmd{rsh}, \cmd{ssh}, or another remote-execution agent is probably the most common method for starting the Open MPI run-time execution environment.  The boot schema typically lists the hostnames, CPU counts, and an optional username (if the user's name is different on the remote machine).

\changebegin{7.1}

The boot schema can also list an optional ``prefix'', which specifies the Open MPI installation to be used on the particular host listed in the boot schema.  This is typically used if the user has multiple Open MPI installations on a host and wants to switch between them without changing the dot files or the \envvar{PATH} environment variable, or if the user has Open MPI installed under different paths on different hosts.  If the prefix is not specified for a host in the boot schema file, then the Open MPI installation that is available in the \envvar{PATH} will be used on that host, or, if the \cmdarg{-prefix $<$/lam/install/path$>$} option is specified for \cmd{lamboot}, the $<$/lam/install/path$>$ installation will be used.  The prefix option in the boot schema file, however, overrides any prefix option specified on the \cmd{lamboot} command line for that host.

For example:

\lstset{style=lam-shell}
\begin{lstlisting}
# rsh boot schema
inky.cluster.example.com cpu=2
pinky.cluster.example.com cpu=4 prefix=/home/joe/lam7.1/install/
blinky.cluster.example.com cpu=4
clyde.cluster.example.com user=jsmith
\end{lstlisting}

\changeend{7.1}

The \cmd{rsh} / \cmd{ssh} boot module will usually run when no other boot module has been selected.  It can, however, be manually selected, even when another module would typically [automatically] be selected, by specifying the \issiparam{boot} MCA parameter with the value of \issivalue{boot}{rsh}.  For example:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot -ssi boot rsh hostfile
\end{lstlisting}
% stupid emacs mode: $

%%%%%

\subsubsection{Tunable Parameters}

\changebegin{7.1}
Table~\ref{tbl:mca-orte-pls-rsh-mca-params} lists the MCA parameters that are available to the \boot{rsh} module.
\changeend{7.1}

\begin{table}[htbp]
\begin{ssiparamtb}
%
\ssiparamentry{boot\_\-rsh\_\-agent}{From configure}{Remote shell agent to use.}
%
\ssiparamentry{boot\_\-rsh\_\-ignore\_\-stderr}{0}{If nonzero, ignore output from \file{stderr} when booting; don't treat it as an error.}
%
\ssiparamentry{boot\_\-rsh\_\-priority}{10}{Default priority level.}
%
\ssiparamentry{boot\_\-rsh\_\-no\_\-n}{0}{If nonzero, don't use ``\cmd{-n}'' as an argument to the boot agent.}
%
\ssiparamentry{boot\_\-rsh\_\-no\_\-profile}{0}{If nonzero, don't attempt to run ``\file{.profile}'' for Bourne-type shells.}
%
\ssiparamentry{boot\_\-rsh\_\-username}{None}{Username to use if different than login name.}
\end{ssiparamtb}
\caption{MCA parameters for the \boot{rsh} boot module.}
\label{tbl:mca-orte-pls-rsh-mca-params}
\end{table}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \boot{slurm} Module}
\index{batch queue systems!SLURM boot MCA module}
\index{slurm boot MCA module@\boot{slurm} boot MCA module}
\index{boot MCA modules!slurm@\boot{slurm}}

\changebegin{7.1}

As its name implies, the Simple Linux Utility for Resource Management (SLURM)\footnote{\url{http://www.llnl.gov/linux/slurm/}} package is commonly used for managing Linux clusters, typically in high-performance computing environments.  SLURM contains a native system for launching applications across the nodes that it manages.  When using SLURM, \cmd{rsh}/\cmd{ssh} is not necessary to launch jobs on remote nodes.  Instead, the \boot{slurm} boot module will automatically use SLURM's native job-launching interface to start Open MPI daemons.

The advantages of using SLURM's native interface are:

\begin{itemize}
\item SLURM can generate proper accounting information for all nodes in a parallel job.

\item SLURM can kill entire jobs properly when the job ends.

\item \icmd{lamboot} executes significantly faster when using SLURM as compared to when it uses \cmd{rsh} / \cmd{ssh}.
\end{itemize}

%%%%%

\subsubsection{Usage}

SLURM allows running jobs in multiple ways.  The \boot{slurm} boot module is only supported in some of them:

\begin{itemize}
\item ``Batch'' mode: where a script is submitted via the \icmd{srun} command and is executed on the first node from the set that SLURM allocated for the job.  The script runs \icmd{lamboot}, \icmd{mpirun}, etc., as is normal for an Open MPI job (a sketch of such a script appears after this list).  This method is supported, and is perhaps the most common way to run Open MPI automated jobs in SLURM environments.

\item ``Allocate'' mode: where the ``\cmdarg{-A}'' option is given to \icmd{srun}, meaning that the shell where \icmd{lamboot} runs is likely {\em not} to be one of the nodes that SLURM has allocated for the job.  In this case, Open MPI daemons will be launched on all nodes that were allocated by SLURM as well as the origin (i.e., the node where \cmd{lamboot} was run).  The origin will be marked as ``no-schedule,'' meaning that applications launched by \cmd{mpirun} and \cmd{lamexec} will not be run there unless specifically requested (see Section~\ref{sec:commands-lamboot}, page~\pageref{sec:commands-lamboot}, for more detail about this attribute and boot schemas in general).  This method is supported, and is perhaps the most common way to run Open MPI interactive jobs in SLURM environments.

\item ``\icmd{srun}'' mode: where a script is submitted via the \icmd{srun} command and is executed on {\em all} nodes that SLURM allocated for the job.  In this case, the commands in the script (e.g., \icmd{lamboot}, \icmd{mpirun}, etc.) will be run on {\em all} nodes simultaneously, which is most likely not what you want.  This mode is not supported.
\end{itemize}
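The following sketch shows what a ``batch'' mode script might look like; the script name, the application name, and the use of \cmd{lamhalt} to shut the run-time environment down afterwards are illustrative assumptions, and the exact \icmd{srun} submission options are described in the SLURM documentation:

\lstset{style=lam-shell}
\begin{lstlisting}
#!/bin/sh
# myjob.sh: executed by SLURM on the first allocated node
lamboot                  # boot on the nodes SLURM allocated
mpirun C my_mpi_app      # launch one process per available CPU
lamhalt                  # shut down the run-time environment
\end{lstlisting}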
When running in any of the supported SLURM modes, Open MPI will automatically detect that it should use the \boot{slurm} boot module -- no extra command line parameters or environment variables should be necessary.
%
Specifically, when running in a SLURM job, the \boot{slurm} module will report that it is available, and artificially inflate its priority relatively high in order to influence the boot module selection process.
%
However, the \boot{slurm} boot module can be forced by specifying the \issiparam{boot} MCA parameter with the value of \issivalue{boot}{slurm}.

Unlike the \cmd{rsh}/\cmd{ssh} boot module, you do not need to specify a hostfile for the \boot{slurm} boot module.  Instead, SLURM itself provides a list of nodes (and associated CPU counts) to Open MPI.  Using \icmd{lamboot} is therefore as simple as:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot
\end{lstlisting}
% stupid emacs mode: $
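For instance, an interactive session using the ``allocate'' mode described above might look like the following sketch; the node count and application name are placeholders, and the exact \icmd{srun} options should be checked against your SLURM version's documentation:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ srun -N 2 -A
shell$ lamboot
shell$ mpirun C my_mpi_app
shell$ lamhalt
\end{lstlisting}
% stupid emacs mode: $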
\changebegin{7.1}

Note that in environments with multiple TCP networks, SLURM may be configured to use a network that is specifically designated for commodity traffic -- another network may exist that is specifically allocated for high-speed MPI traffic.  By default, Open MPI will use the same hostnames that SLURM provides for all of its traffic.  This means that Open MPI will send all of its MPI traffic across the same network that SLURM uses.

However, Open MPI has the ability to boot using one set of hostnames / addresses and then use a second set of hostnames / addresses for MPI traffic.  As such, Open MPI can redirect its TCP MPI traffic across a secondary network.  It is possible that your system administrator has already configured Open MPI to operate in this manner.

If a secondary TCP network is intended to be used for MPI traffic, see the section entitled ``Separating Open MPI and MPI TCP Traffic'' in the Open MPI Installation Guide.  Note that this functionality has no effect on non-TCP \kind{rpi} modules (such as Myrinet, Infiniband, etc.).

\changeend{7.1}

%%%%%

\subsubsection{Tunable Parameters}

Table~\ref{tbl:mca-orte-pls-slurm-mca-params} lists the MCA parameters that are available to the \boot{slurm} module.

\begin{table}[htbp]
\begin{ssiparamtb}
%
\ssiparamentry{boot\_\-slurm\_\-priority}{50}{Default priority level.}
\end{ssiparamtb}
\caption{MCA parameters for the \boot{slurm} boot module.}
\label{tbl:mca-orte-pls-slurm-mca-params}
\end{table}

%%%%%

\subsubsection{Special Notes}

Since the \boot{slurm} boot module is designed to work in SLURM jobs, it will fail if the \boot{slurm} boot module is manually specified and Open MPI is not currently running in a SLURM job.

The \boot{slurm} module does not start a shell on the remote node.  Instead, the entire environment of \cmd{lamboot} is pushed to the remote nodes before starting the Open MPI run-time environment.

\changeend{7.1}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{The \boot{tm} Module (OpenPBS / PBS Pro / Torque)}
\index{batch queue systems!OpenPBS / PBS Pro / Torque (TM) boot MCA module}
\index{tm boot MCA module@\boot{tm} boot MCA module}
\index{boot MCA modules!tm (PBS / Torque)@\boot{tm} (PBS / Torque)}

Both OpenPBS and PBS Pro (both products of Altair Grid Technologies, LLC) contain support for the Task Management (TM) interface.  Torque, the open source fork of the OpenPBS product, also contains the TM interface.  When using TM, \cmd{rsh}/\cmd{ssh} is not necessary to launch jobs on remote nodes.

The advantages of using the TM interface are:

\begin{itemize}
\item PBS/Torque can generate proper accounting information for all nodes in a parallel job.

\item PBS/Torque can kill entire jobs properly when the job ends.

\item \icmd{lamboot} executes significantly faster when using TM as compared to when it uses \cmd{rsh} / \cmd{ssh}.
\end{itemize}

%%%%%

\subsubsection{Usage}

When running in a PBS/Torque batch job, Open MPI will automatically detect that it should use the \boot{tm} boot module -- no extra command line parameters or environment variables should be necessary.
%
Specifically, when running in a PBS/Torque job, the \boot{tm} module will report that it is available, and artificially inflate its priority relatively high in order to influence the boot module selection process.
%
However, the \boot{tm} boot module can be forced by specifying the \issiparam{boot} MCA parameter with the value of \issivalue{boot}{tm}.

Unlike the \cmd{rsh}/\cmd{ssh} boot module, you do not need to specify a hostfile for the \boot{tm} boot module.  Instead, PBS/Torque itself provides a list of nodes (and associated CPU counts) to Open MPI.  Using \icmd{lamboot} is therefore as simple as:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ lamboot
\end{lstlisting}
% stupid emacs mode: $

The \boot{tm} boot module works in both interactive and non-interactive batch jobs.
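For non-interactive use, a PBS/Torque batch script might look like the following sketch; the resource request, script name, application name, and the use of \cmd{lamhalt} for shutdown are placeholders to adapt for your site:

\lstset{style=lam-shell}
\begin{lstlisting}
#!/bin/sh
#PBS -l nodes=4
# myjob.pbs: executed by PBS/Torque on the first allocated node
lamboot                  # boot on the nodes PBS/Torque allocated
mpirun C my_mpi_app      # launch one process per available CPU
lamhalt                  # shut down the run-time environment
\end{lstlisting}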
\changebegin{7.1}

Note that in environments with multiple TCP networks, PBS / Torque may be configured to use a network that is specifically designated for commodity traffic -- another network may exist that is specifically allocated for high-speed MPI traffic.  By default, Open MPI will use the same hostnames that the TM interface provides for all of its traffic.  This means that Open MPI will send all of its MPI traffic across the same network that PBS / Torque uses.

However, Open MPI has the ability to boot using one set of hostnames / addresses and then use a second set of hostnames / addresses for MPI traffic.  As such, Open MPI can redirect its TCP MPI traffic across a secondary network.  It is possible that your system administrator has already configured Open MPI to operate in this manner.

If a secondary TCP network is intended to be used for MPI traffic, see the section entitled ``Separating Open MPI and MPI TCP Traffic'' in the Open MPI Installation Guide.  Note that this has no effect on non-TCP \kind{rpi} modules (such as Myrinet, Infiniband, etc.).

\changeend{7.1}

%%%%%

\subsubsection{Tunable Parameters}

Table~\ref{tbl:mca-orte-pls-tm-mca-params} lists the MCA parameters that are available to the \boot{tm} module.

\begin{table}[htbp]
\begin{ssiparamtb}
%
\ssiparamentry{boot\_\-tm\_\-priority}{50}{Default priority level.}
\end{ssiparamtb}
\caption{MCA parameters for the \boot{tm} boot module.}
\label{tbl:mca-orte-pls-tm-mca-params}
\end{table}

%%%%%

\subsubsection{Special Notes}

Since the \boot{tm} boot module is designed to work in PBS/Torque jobs, it will fail if the \boot{tm} boot module is manually specified and Open MPI is not currently running in a PBS/Torque job.

The \boot{tm} module does not start a shell on the remote node.  Instead, the entire environment of \cmd{lamboot} is pushed to the remote nodes before starting the Open MPI run-time environment.

Also note that the Altair-provided client RPMs for PBS Pro do not include the \icmd{pbs\_\-demux} command, which is necessary for proper execution of TM jobs.  The solution is to copy the executable from the server RPMs to the client nodes.

Finally, TM does not provide a mechanism for path searching on the remote nodes, so the \cmd{lamd} executable is required to reside in the same location on each node to be booted.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Close of index

\index{ORTE MCA components|)}