1
1
Josh Hursey 7437f37e96 This commit contains the following:
* Fix some missing includes in a few places.
 * Add the cr_request() functionality to the BLCR CRS component.
   We are now dependent upon the 0.6.* series of BLCR.
 * Made the CR notification mechanism a registered function.
   This way we can have an OPAL-only version and it can be replaced at
   runtime with the ORTE version.
 * Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
   CR functionality when the user wants it. Default: Disabled.
 * Fix the placement of a checkpoint request check in MPI_Init
 * Pull the OPAL notification mechanism into the SnapC framework.
   * We no longer fork/exec the 'opal-checkpoint' command for local
   checkpointing, the Local coordinator in the orted does this directly.
   * The Local and Application coordinator talk together bypassing the OPAL
   notifiation mechanism.
   * Optimized the Local <-> App Coordinator communication.
   * Improved the structure used to track vpid_snapshots in the local coord.
 * Fix a race condition in which an application under heavy communication load
   may produce an inconsistent global checkpoint.

This commit was SVN r16389.
2007-10-08 20:53:02 +00:00

37 строки
682 B
Plaintext

#
# An Aggregate MCA Parameter Set to enable checkpoint/restart capabilities
# for a job.
#
# Usage:
# shell$ mpirun -am ft-enable-cr ./app
#
#
# OPAL Parameters
# - Turn off OPAL only checkpointing
# - Select only checkpoint ready components
# - Enable Additional FT infrastructure
# - Auto-select OPAL CRS component
#
opal_cr_allow_opal_only=0
mca_base_component_distill_checkpoint_ready=1
ft_cr_enabled=1
crs=
#
# ORTE Parameters
# - Wrap the RML
# - Use the 'full' Snapshot Coordinator
#
rml_wrapper=ftrm
snapc=full
#filem=rsh
#
# OMPI Parameters
# - Wrap the PML
# - Use the LAM/MPI-like Coordinated Checkpoint/Restart Coordination Protocol
#
pml_wrapper=crcpw
crcp=coord