openmpi/orte/mca/snapc/base
Josh Hursey 5406fdfb80 Add support for sending SIGSTOP to the MPI job after the checkpoint is taken (uses a BLCR feature for the option).
This commit looks larger than it really is since it includes a fair amount of code cleanup.

The SIGSTOP/SIGCONT+checkpointing work uses some of the functionality in r20391. A basic use case is shown below (note that the generated checkpoint is usable as usual if the stopped application is terminated).
{{{
shell 1) mpirun -np 2 -am ft-enable-cr my-app
... running ...

shell 2) ompi-checkpoint --stop -v MPIRUN_PID
[localhost:001300] [  0.00 /   0.20]                 Requested - ...
[localhost:001300] [  0.00 /   0.20]                   Pending - ...
[localhost:001300] [  0.01 /   0.21]                   Running - ...
[localhost:001300] [  1.01 /   1.22]                   Stopped - ompi_global_snapshot_1234.ckpt
Snapshot Ref.: 0 ompi_global_snapshot_1234.ckpt

shell 2) killall -CONT mpirun

... Application continues execution in shell 1 ...
}}}

Other items in this commit are mostly cleanup that has been sitting off-trunk for too long:
 * Add a new {{{opal_crs_base_ckpt_options_t}}} type that encapsulates the various options that could be passed to the CRS. Currently only TERM and STOP, but this makes adding others ''much'' easier.
 * Eliminate ORTE_SNAPC_CKPT_STATE_PENDING_TERM, since the new options type makes it redundant.
 * Lay some basic groundwork for future features.
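
To illustrate why an options type can replace a dedicated state such as ORTE_SNAPC_CKPT_STATE_PENDING_TERM, here is a minimal sketch. It is not the actual definition of {{{opal_crs_base_ckpt_options_t}}}: the type name `sketch_ckpt_options_t`, its fields, the action enum, and the rule that TERM takes precedence over STOP are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a checkpoint options type.
 * Field names are assumptions, not the real Open MPI layout. */
typedef struct {
    bool term;   /* terminate the job after the checkpoint */
    bool stop;   /* leave the job stopped after the checkpoint */
} sketch_ckpt_options_t;

/* Hypothetical post-checkpoint actions derived from the options. */
typedef enum {
    SKETCH_ACTION_CONTINUE,
    SKETCH_ACTION_STOP,
    SKETCH_ACTION_TERMINATE
} sketch_post_ckpt_action_t;

/* Start from "no special handling": the job keeps running. */
static void sketch_ckpt_options_init(sketch_ckpt_options_t *opts)
{
    opts->term = false;
    opts->stop = false;
}

/* Decide what happens after the checkpoint from the options alone,
 * so no separate "pending terminate" state is needed.
 * Assumption: TERM wins if both flags are set. */
static sketch_post_ckpt_action_t
sketch_post_ckpt_action(const sketch_ckpt_options_t *opts)
{
    if (opts->term) {
        return SKETCH_ACTION_TERMINATE;
    }
    if (opts->stop) {
        return SKETCH_ACTION_STOP;
    }
    return SKETCH_ACTION_CONTINUE;
}
```

With this shape, supporting another option would mean adding one field and one branch, rather than introducing a new checkpoint state.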

This commit was SVN r21995.

The following SVN revision numbers were found above:
  r20391 --> open-mpi/ompi@0704b98668
2009-09-22 18:26:12 +00:00
base.h Add support for sending SIGSTOP to the MPI job after the checkpoint is taken (uses a BLCR feature for the option). 2009-09-22 18:26:12 +00:00
help-orte-snapc-base.txt Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
Makefile.am Complete implementation of the --without-rte-support configure option. Working with Brian, this has been tested on RedStorm. 2008-06-18 03:15:56 +00:00
snapc_base_close.c - On the way to get the BTLs split out and lessen dependency on orte: 2009-02-14 02:26:12 +00:00
snapc_base_fns.c Add support for sending SIGSTOP to the MPI job after the checkpoint is taken (uses a BLCR feature for the option). 2009-09-22 18:26:12 +00:00
snapc_base_open.c A bunch of improvements focused on Snapshot Coordination (SnapC) and File Management (FileM). 2009-04-30 16:55:39 +00:00
snapc_base_select.c Add support for sending SIGSTOP to the MPI job after the checkpoint is taken (uses a BLCR feature for the option). 2009-09-22 18:26:12 +00:00