7437f37e96
 * Fix some missing includes in a few places.
 * Add the cr_request() functionality to the BLCR CRS component. We are now dependent upon the 0.6.* series of BLCR.
 * Made the CR notification mechanism a registered function. This way we can have an OPAL-only version and it can be replaced at runtime with the ORTE version.
 * Add an 'opal_cr_allow_opal_only' parameter that will enable OPAL-only CR functionality when the user wants it. Default: Disabled.
 * Fix the placement of a checkpoint request check in MPI_Init.
 * Pull the OPAL notification mechanism into the SnapC framework.
 * We no longer fork/exec the 'opal-checkpoint' command for local checkpointing; the Local coordinator in the orted does this directly.
 * The Local and Application coordinators talk to each other directly, bypassing the OPAL notification mechanism.
 * Optimized the Local <-> App Coordinator communication.
 * Improved the structure used to track vpid_snapshots in the local coord.
 * Fix a race condition in which an application under heavy communication load may produce an inconsistent global checkpoint.

This commit was SVN r16389.
37 lines
682 B
Plaintext
#
# An Aggregate MCA Parameter Set to enable checkpoint/restart capabilities
# for a job.
#
# Usage:
#   shell$ mpirun -am ft-enable-cr ./app
#

#
# OPAL Parameters
#  - Turn off OPAL only checkpointing
#  - Select only checkpoint ready components
#  - Enable Additional FT infrastructure
#  - Auto-select OPAL CRS component
#
opal_cr_allow_opal_only=0
mca_base_component_distill_checkpoint_ready=1
ft_cr_enabled=1
crs=

#
# ORTE Parameters
#  - Wrap the RML
#  - Use the 'full' Snapshot Coordinator
#
rml_wrapper=ftrm
snapc=full
#filem=rsh

#
# OMPI Parameters
#  - Wrap the PML
#  - Use the LAM/MPI-like Coordinated Checkpoint/Restart Coordination Protocol
#
pml_wrapper=crcpw
crcp=coord
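
To make it easier to see exactly which parameters this set turns on, the following is a minimal Python sketch that parses the key=value lines of the file and lists them as individual `--mca <name> <value>` arguments. The helper names (`parse_amca`, `to_mca_args`) are hypothetical and not part of Open MPI. Treating the file as equivalent to a list of `--mca` arguments is an assumption made for illustration; in particular, how an empty value such as `crs=` behaves when passed on the command line is not confirmed here.

```python
# Hypothetical helpers for inspecting an aggregate MCA parameter file.
# They are illustrative only and are not part of Open MPI.

AMCA_TEXT = """\
# OPAL Parameters
opal_cr_allow_opal_only=0
mca_base_component_distill_checkpoint_ready=1
ft_cr_enabled=1
crs=

# ORTE Parameters
rml_wrapper=ftrm
snapc=full
#filem=rsh

# OMPI Parameters
pml_wrapper=crcpw
crcp=coord
"""


def parse_amca(text):
    """Return (name, value) pairs, skipping blank lines and '#' comments."""
    params = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; the value may be empty (e.g. 'crs=').
        name, _, value = line.partition("=")
        params.append((name.strip(), value.strip()))
    return params


def to_mca_args(params):
    """Flatten (name, value) pairs into ['--mca', name, value, ...]."""
    args = []
    for name, value in params:
        args += ["--mca", name, value]
    return args


PARAMS = parse_amca(AMCA_TEXT)
MCA_ARGS = to_mca_args(PARAMS)

if __name__ == "__main__":
    for name, value in PARAMS:
        print(f"{name} = {value!r}")
```

Note that the commented-out `#filem=rsh` line is skipped, so it does not appear among the active parameters.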