dadca7da88
This merge adds Checkpoint/Restart support to Open MPI. The initial frameworks and components support a LAM/MPI-like implementation. This commit follows the risk assessment presented to the Open MPI core development group on Feb. 22, 2007. This commit closes trac:158. More details to follow. This commit was SVN r14051. The following SVN revisions from the original message are invalid or inconsistent and therefore were not cross-referenced: r13912. The following Trac tickets were found above: Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158
103 lines
2.6 KiB
Groff
.\"
.\" Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
.\" University Research and Technology
.\" Corporation. All rights reserved.
.\"
.\" Man page for OMPI's ompi-checkpoint command
.\"
.\" .TH name section center-footer left-footer center-header
.TH OMPI-CHECKPOINT 1 "March 2007" "Open MPI" "OPEN MPI COMMANDS"
.\" **************************
.\" Name Section
.\" **************************
.SH NAME
.
ompi-checkpoint, orte-checkpoint \- Checkpoint a running parallel process using the Open MPI
Checkpoint/Restart Service (CRS)
.
.PP
.
\fBNOTE:\fP \fIompi-checkpoint\fP and \fIorte-checkpoint\fP are exact
synonyms for each other. Using either name results in identical
behavior.
.
.\" **************************
.\" Synopsis Section
.\" **************************
.SH SYNOPSIS
.
.B ompi-checkpoint
[ options ]
.B <PID_OF_MPIRUN>
.
.\" **************************
.\" Options Section
.\" **************************
.SH OPTIONS
.
\fIorte-checkpoint\fR attempts to notify a running parallel job (identified
by the process ID of its \fImpirun\fP) that a checkpoint has been requested.
A global snapshot handle reference is then presented to the user; this
reference is passed to \fIompi-restart\fP to restart the job.
.
.TP 10
.B <PID_OF_MPIRUN>
Process ID of the \fImpirun\fP process.
.
.
.TP
.B -h | --help
Display help for this command.
.
.
.TP
.B -w | --nowait
Do not wait for the application to finish checkpointing before returning.
.
.
.TP
.B -s | --status
Display status messages regarding the progression of the checkpoint request.
.
.
.TP
.B --term
After checkpointing the running job, terminate it.
.
.
.TP
.B -v | --verbose
Enable verbose output for debugging.
.
.
.TP
.B -gmca | --gmca \fR<key> <value>\fP
Pass global MCA parameters that are applicable to all contexts. \fI<key>\fP is
the parameter name; \fI<value>\fP is the parameter value.
.
.
.TP
.B -mca | --mca \fR<key> <value>\fP
Send arguments to various MCA modules.
.
.
.\" **************************
.\" Description Section
.\" **************************
.SH DESCRIPTION
.
.PP
\fIorte-checkpoint\fR can be invoked multiple times on the same job, provided
the invocations do not overlap.
The user does not need to specify the checkpointer to use here; that is
determined entirely by each of the running processes in the job being
checkpointed.
.
.
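.\" **************************
.\" Examples Section
.\" **************************
.\" NOTE: illustrative sketch only. The PID 12345 is hypothetical, and only
.\" options documented in the OPTIONS section are used.
.SH EXAMPLES
.
.PP
The following is a minimal sketch of typical invocations, assuming that
\fImpirun\fP is running with the hypothetical process ID 12345. Only the
options listed under OPTIONS are used.
.
.PP
Checkpoint the job and display status messages while the request progresses:
.PP
.nf
    ompi-checkpoint -s 12345
.fi
.
.PP
Checkpoint the job, then terminate it:
.PP
.nf
    ompi-checkpoint --term 12345
.fi
.
.PP
Request a checkpoint without waiting for it to finish:
.PP
.nf
    ompi-checkpoint -w 12345
.fi
.
.PP
The global snapshot handle reference printed by the command is the value
used to restart the job; see ompi-restart(1) for its exact usage.
.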
.\" **************************
.\" See Also Section
.\" **************************
.
.SH SEE ALSO
orte-ps(1), orte-clean(1), ompi-restart(1), opal-checkpoint(1), opal-restart(1), opal_crs(7)
.