% -*- latex -*-
%
% Copyright (c) 2004-2005 The Trustees of Indiana University.
% All rights reserved.
% Copyright (c) 2004-2005 The Trustees of the University of Tennessee.
% All rights reserved.
% Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
% University of Stuttgart.  All rights reserved.
% Copyright (c) 2004-2005 The Regents of the University of California.
% All rights reserved.
% $COPYRIGHT$
%
% Additional copyrights may follow
%
% $HEADER$
%

\chapter{Miscellaneous}
\label{sec:misc}

This chapter covers a variety of topics that don't conveniently fit
into other chapters.

{\Huge JMS Needs a lot of overhauling}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Singleton MPI Processes}

It is possible to run an MPI process without the \cmd{mpirun} or
\cmd{mpiexec} commands -- simply run the program as one would normally
launch a serial program:

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $

Doing so will create an \mcw with a single process.  This process can
either run by itself, or spawn or connect to other MPI processes and
become part of a larger MPI job using the MPI-2 dynamic function
calls.
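For example, a singleton can later grow into a larger job by spawning
additional processes with \mpifunc{MPI\_\-Comm\_\-spawn}.  The
following is only a sketch; the \cmd{worker} executable name is a
placeholder, not part of any distribution.

\begin{lstlisting}
/* Sketch: a singleton spawning two workers ("worker" is a
   placeholder executable name). */
#include <mpi.h>

int main(int argc, char *argv[]) {
  MPI_Comm children;
  MPI_Init(&argc, &argv);
  /* MPI_COMM_WORLD contains only this process at this point */
  MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                 MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
  /* ...communicate with the workers over the intercommunicator... */
  MPI_Comm_disconnect(&children);
  MPI_Finalize();
  return 0;
}
\end{lstlisting}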

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{MPI-2 I/O Support}
\index{ROMIO}
\index{MPI-2 I/O support|see {ROMIO}}
\index{I/O support|see {ROMIO}}

MPI-2 I/O support is provided through the ROMIO
package~\cite{thak99a,thak99b}.  ROMIO has been fully integrated into
Open MPI.  As such, \mpitype{MPI\_\-Request} objects can be used .....

ROMIO includes its own documentation and listings of known issues and
limitations.  See the \file{README} file in the ROMIO directory in the
Open MPI distribution.
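As a brief illustration of the MPI-2 I/O interface that ROMIO
provides, each rank can write to its own offset in a shared file.
This is a minimal sketch; the file name is arbitrary.

\begin{lstlisting}
/* Sketch: each rank writes one integer to its own offset in a
   shared file via MPI-2 I/O (the file name is arbitrary). */
#include <mpi.h>

int main(int argc, char *argv[]) {
  int rank;
  MPI_File fh;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_File_open(MPI_COMM_WORLD, "datafile",
                MPI_MODE_CREATE | MPI_MODE_WRONLY,
                MPI_INFO_NULL, &fh);
  MPI_File_write_at(fh, rank * sizeof(int), &rank, 1, MPI_INT,
                    MPI_STATUS_IGNORE);
  MPI_File_close(&fh);
  MPI_Finalize();
  return 0;
}
\end{lstlisting}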

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Fortran Process Names}
\index{fortran process names}
\cmdindex{mpitask}{fortran process names}

Since Fortran does not portably provide the executable name of the
process (similar to the way that C programs get an array of {\tt
  argv}), the \icmd{mpitask} command lists the name ``Open MPI MPI
Fortran program'' by default for MPI programs that use the Fortran
binding for \mpifunc{MPI\_\-INIT} or \mpifunc{MPI\_\-INIT\_\-THREAD}.

The environment variable \ienvvar{LAM\_\-MPI\_\-PROCESS\_\-NAME} can
be used to override this behavior.
%
Setting this environment variable before invoking \icmd{mpirun} will
cause \cmd{mpitask} to list that name instead of the default title.
%
This environment variable only works for processes that invoke the
Fortran binding for \mpifunc{MPI\_\-INIT} or
\mpifunc{MPI\_\-INIT\_\-THREAD}.
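For example (assuming the LAM-derived variable name
\envvar{LAM\_\-MPI\_\-PROCESS\_\-NAME}; the program name is a
placeholder):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ LAM_MPI_PROCESS_NAME=ocean_model mpirun C my_fortran_program
shell$ mpitask
\end{lstlisting}
% Stupid emacs mode: $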

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{MPI Thread Support}
\label{sec:misc-threads}
\index{threads and MPI}
\index{MPI and threads|see {threads and MPI}}

\def\mtsingle{\mpiconst{MPI\_\-THREAD\_\-SINGLE}}
\def\mtfunneled{\mpiconst{MPI\_\-THREAD\_\-FUNNELED}}
\def\mtserial{\mpiconst{MPI\_\-THREAD\_\-SERIALIZED}}
\def\mtmultiple{\mpiconst{MPI\_\-THREAD\_\-MULTIPLE}}
\def\mpiinit{\mpifunc{MPI\_\-INIT}}
\def\mpiinitthread{\mpifunc{MPI\_\-INIT\_\-THREAD}}

Open MPI currently implements support for \mtsingle, \mtfunneled, and
\mtserial.  The constant \mtmultiple\ is provided, although Open MPI
will never return \mtmultiple\ in the \funcarg{provided} argument to
\mpiinitthread.

Open MPI makes no distinction between \mtsingle\ and \mtfunneled.
When \mtserial\ is used, a global lock is used to ensure that only one
thread is inside any MPI function at any time.

\subsection{Thread Level}

Selecting the thread level for an MPI job is best described in terms
of the two parameters passed to \mpiinitthread: \funcarg{requested}
and \funcarg{provided}.  \funcarg{requested} is the thread level that
the user application requests, while \funcarg{provided} is the thread
level that Open MPI will run the application with.
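For example, an application can request \mtserial\ and then check what
was actually provided (a minimal sketch):

\begin{lstlisting}
/* Sketch: request MPI_THREAD_SERIALIZED and check the thread
   level the implementation actually provided. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_SERIALIZED, &provided);
  if (provided < MPI_THREAD_SERIALIZED) {
    /* the thread-level constants are monotonically ordered */
    printf("warning: only got thread level %d\n", provided);
  }
  /* ...rest of the application... */
  MPI_Finalize();
  return 0;
}
\end{lstlisting}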

\begin{itemize}
\item If \mpiinit\ is used to initialize the job, \funcarg{requested}
  will implicitly be \mtsingle.  However, if the
  \ienvvar{LAM\_\-MPI\_\-THREAD\_\-LEVEL} environment variable is set
  to one of the values in Table~\ref{tbl:mpi-env-thread-level}, the
  corresponding thread level will be used for \funcarg{requested}.

\item If \mpiinitthread\ is used to initialize the job, the
  \funcarg{requested} thread level is the first thread level that the
  job will attempt to use.  There is currently no way to specify lower
  or upper bounds to the thread level that Open MPI will use.

  The resulting thread level is largely determined by the SSI modules
  that will be used in an MPI job; each module must be able to support
  the target thread level.  A complex algorithm is used to attempt to
  find a thread level that is acceptable to all SSI modules.
  Generally, the algorithm starts at \funcarg{requested} and works
  backwards towards \mpiconst{MPI\_\-THREAD\_\-SINGLE} looking for an
  acceptable level.  However, any module may {\em increase} the thread
  level under test if it requires it.  At the end of this process, if
  an acceptable thread level is not found, the MPI job will abort.
\end{itemize}

\begin{table}[htbp]
  \centering
  \begin{tabular}{|c|l|}
    \hline
    Value & \multicolumn{1}{|c|}{Meaning} \\
    \hline
    \hline
    undefined & \mtsingle \\
    0 & \mtsingle \\
    1 & \mtfunneled \\
    2 & \mtserial \\
    3 & \mtmultiple \\
    \hline
  \end{tabular}
  \caption{Valid values for the \envvar{LAM\_\-MPI\_\-THREAD\_\-LEVEL}
    environment variable.}
  \label{tbl:mpi-env-thread-level}
\end{table}

Also note that certain SSI modules require higher thread support
levels than others.  For example, any checkpoint/restart SSI module
will require a minimum of \mtserial, and will attempt to adjust the
thread level upwards as necessary (if that CR module will be used
during the job).

Hence, using \mpiinit\ to initialize an MPI job does not imply that
the provided thread level will be \mtsingle.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{MPI-2 Name Publishing}
\index{published names}
\index{dynamic name publishing|see {published names}}
\index{name publishing|see {published names}}

Open MPI supports the MPI-2 functions \mpifunc{MPI\_\-PUBLISH\_\-NAME}
and \mpifunc{MPI\_\-UNPUBLISH\_\-NAME} for publishing and unpublishing
names, respectively.  Published names are stored within the Open MPI
daemons, and are therefore persistent, even when the MPI process that
published them dies.

As such, it is important for correct MPI programs to unpublish their
names before they terminate.  However, if stale names are left in the
Open MPI universe when an MPI process terminates, the \icmd{lamclean}
command can be used to clean {\em all} names from the Open MPI RTE.
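A server-side sketch of this pattern follows; the service name
\texttt{ocean} is arbitrary.

\begin{lstlisting}
/* Sketch: a server publishes a port under a service name, accepts
   one client, then unpublishes the name before exiting.  The
   service name "ocean" is arbitrary. */
#include <mpi.h>

int main(int argc, char *argv[]) {
  char port[MPI_MAX_PORT_NAME];
  MPI_Comm client;
  MPI_Init(&argc, &argv);
  MPI_Open_port(MPI_INFO_NULL, port);
  MPI_Publish_name("ocean", MPI_INFO_NULL, port);
  MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
  /* ...service the client... */
  MPI_Unpublish_name("ocean", MPI_INFO_NULL, port);
  MPI_Close_port(port);
  MPI_Comm_disconnect(&client);
  MPI_Finalize();
  return 0;
}
\end{lstlisting}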

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Batch Queuing System Support}
\label{sec:misc-batch}
\index{batch queue systems}
\index{Portable Batch System|see {batch queue systems}}
\index{PBS|see {batch queue systems}}
\index{PBS Pro|see {batch queue systems}}
\index{OpenPBS|see {batch queue systems}}
\index{Load Sharing Facility|see {batch queue systems}}
\index{LSF|see {batch queue systems}}
\index{Clubmask|see {batch queue systems}}

Open MPI is now aware of some batch queuing systems.  Support is
currently included for PBS, LSF, and Clubmask-based systems.  There is
also a generic mechanism that allows users of other batch queue
systems to take advantage of this functionality.

\begin{itemize}
\item When running under a supported batch queue system, Open MPI will
  take precautions to isolate itself from other instances of Open MPI
  in concurrent batch jobs.  That is, multiple Open MPI instances from
  the same user can exist on the same machine when executing in batch.
  This allows a user to submit as many Open MPI jobs as necessary, and
  even if they end up running on the same nodes, a \cmd{lamclean} in
  one job will not kill MPI applications in another job.

\item This behavior is {\em only} exhibited under a batch environment.
  Other batch systems can easily be supported -- let the Open MPI Team
  know if you'd like to see support for others included.  Manually
  setting the environment variable
  \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} on the node where
  \icmd{lamboot} is run achieves the same ends.
\end{itemize}
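For example (assuming the LAM-derived variable name
\envvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}; the suffix value and host
file name are placeholders):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ LAM_MPI_SESSION_SUFFIX=job1234 lamboot hostfile
\end{lstlisting}
% Stupid emacs mode: $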

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Location of Open MPI's Session Directory}
\label{sec:misc-session-directory}
\index{session directory}

By default, Open MPI will create a temporary per-user session directory
in the following directory:

\centerline{\file{<tmpdir>/lam-<username>@<hostname>[-<session\_suffix>]}}

\noindent Each of the components is described below:

\begin{description}
\item[\file{<tmpdir>}]: Open MPI will set the prefix used for the
  session directory based on the following search order:

  \begin{enumerate}
  \item The value of the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-PREFIX}
    environment variable

  \item The value of the \ienvvar{TMPDIR} environment variable

  \item \file{/tmp/}
  \end{enumerate}

  It is important to note that (unlike
  \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}), the environment
  variables for determining \file{<tmpdir>} must be set on each node
  (although they do not necessarily have to be the same value).
  \file{<tmpdir>} must exist before \icmd{lamboot} is run, or
  \icmd{lamboot} will fail.

\item[\file{<username>}]: The user's name on that host.

\item[\file{<hostname>}]: The hostname.

\item[\file{<session\_suffix>}]: Open MPI will set the suffix (if any)
  used for the session directory based on the following search order:

  \begin{enumerate}
  \item The value of the \ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX}
    environment variable.

  \item If running under a supported batch system, a unique session
    ID (based on information from the batch system) will be used.
  \end{enumerate}
\end{description}

\ienvvar{LAM\_\-MPI\_\-SESSION\_\-SUFFIX} and the batch information
only need to be available on the node from which \icmd{lamboot} is
run.  \icmd{lamboot} will propagate the information to the other
nodes.
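The \file{<tmpdir>} search order described above can be sketched in
shell terms (assuming the LAM-derived variable name
\envvar{LAM\_\-MPI\_\-SESSION\_\-PREFIX}):

\begin{lstlisting}
# Resolve <tmpdir> per the search order above: the session prefix
# variable, then TMPDIR, then /tmp as the fallback.
tmpdir="${LAM_MPI_SESSION_PREFIX:-${TMPDIR:-/tmp}}"
echo "$tmpdir"
\end{lstlisting}

With neither variable set, this prints \file{/tmp}.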

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Signal Catching}
\index{signals}

Open MPI catches the signals SEGV, BUS, FPE, and ILL.  The signal
handler terminates the application.  This is useful in batch jobs to
help ensure that \icmd{mpirun} returns if an application process
dies.  To disable the catching of signals, use the \cmdarg{-nsigs}
option to \icmd{mpirun}.
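For example (the program name is a placeholder):

\lstset{style=lam-cmdline}
\begin{lstlisting}
shell$ mpirun -nsigs C my_mpi_program
\end{lstlisting}
% Stupid emacs mode: $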

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{MPI Attributes}

\begin{discuss}
Need to have discussion of built-in attributes here, such as
MPI\_\-UNIVERSE\_\-SIZE, etc.  Should specifically mention that
MPI\_\-UNIVERSE\_\-SIZE is fixed at \mpifunc{MPI\_\-INIT} time (at
least it is as of this writing -- who knows what it will be when we
release 7.1? :-).

This whole section is for 7.1.
\end{discuss}