1
1
openmpi/orte/tools/orte-checkpoint
Josh Hursey 5406fdfb80 Add support for sending SIGSTOP the MPI job after the checkpoint is taken (uses a BLCR feature for the option).
This commit looks larger than it really is since it includes a fair amount of code cleanup.

The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. Basic use case below (note that the checkpoint generated is useable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...

shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [  0.00 /   0.20]                 Requested - ...
[localhost:001300] [  0.00 /   0.20]                   Pending - ...
[localhost:001300] [  0.01 /   0.21]                   Running - ...
[localhost:001300] [  1.01 /   1.22]                   Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt

shell 2) killall -CONT mpirun

... Application Continues execution in shell 1 ...
}}}

Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
 * Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
 * Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since it served a redundant purpose with the new options type.
 * Lay some basic ground work for some future features.

This commit was SVN r21995.

The following SVN revision numbers were found above:
  r20391 --> open-mpi/ompi@0704b98668
2009-09-22 18:26:12 +00:00
..
CMakeLists.txt Get rid of improper use of SET_SOURCE_FILES_PROPERTIES. When using the latest CMake (2.6 patch 4), we get many errors, which didn't show in previous version. 2009-05-07 17:41:05 +00:00
help-orte-checkpoint.txt fix some typos. should be moved to v1.3 2008-11-10 19:05:26 +00:00
Makefile.am Add windows support files into the tarball, including .windows, CMakeLists.txt files, and CMake modules. Thanks to Jeff for testing it on Linux. 2009-04-24 16:39:33 +00:00
orte-checkpoint.1in Putback for all changes to automate man page updates to strings of 2008-08-01 21:14:37 +00:00
orte-checkpoint.c Add support for sending SIGSTOP the MPI job after the checkpoint is taken (uses a BLCR feature for the option). 2009-09-22 18:26:12 +00:00