I promoted the ''none'' component to a full component, and updated the other components to reflect this code movement. The ''none'' component is the default component unless the user requests '''-am ft-enable-cr''' to auto-select a component. There is an MCA parameter to show a warning if the application requested an FT enabled job, but the ''none'' component was selected ({{{crs_none_select_warning}}}).
This temporarily fixes the problem mentioned in r18739. The full fix will entail working on ticket #1291.
Thanks to Ethan from Sun for finding this bug.
This commit was SVN r18840.
The following SVN revision numbers were found above:
r18739 --> open-mpi/ompi@a003fa7a50
http://www.open-mpi.org/community/lists/devel/2008/04/3779.php
{{{
svn merge -r 18276:18380 https://svn.open-mpi.org/svn/ompi/tmp-public/jjh-mca-play .
}}}
Any components not in the trunk, but in one of the effected frameworks *must* be
updated. Contact the list, look at the RFC, or look at the diff for how to do this.
Sorry for the early commit of this, but I wanted to get it in today (per RFC) and
didn't know if I would have a chance later today.
This commit was SVN r18381.
Some MPI C interface files saw some spacing changes to conform to the coding standards of Open MPI.
Changed MPI C interface files to use {{{OPAL_CR_ENTER_LIBRARY()}}} and {{{OPAL_CR_EXIT_LIBRARY()}}} instead of just {{{OPAL_CR_TEST_CHECKPOINT_READY()}}}. This will allow the checkpoint/restart system more flexibility in how it is to behave.
Fixed the configure check for {{{--enable-ft-thread}}} so it has a know dependance on {{{--enable-mpi-thread}}} (and/or {{{--enable-progress-thread}}}).
Added a line for Checkpoint/Restart support to {{{ompi_info}}}.
Added some options to choose at runtime whether or not to use the checkpoint polling thread. By default, if the user asked for it to be compiled in, then it is used. But some users will want the ability to toggle its use at runtime.
There are still some places for improvement, but the feature works correctly. As always with Checkpoint/Restart, it is compiled out unless explicitly asked for at configure time. Further, if it was configured in, then it is not used unless explicitly asked for by the user at runtime.
This commit was SVN r17516.
* Fix some missing includes in a few places.
* Add the cr_request() functionality to the BLCR CRS component.
We are now dependent upon the 0.6.* series of BLCR.
* Made the CR notification mechanism a registered function.
This way we can have an OPAL-only version and it can be replaced at
runtime with the ORTE version.
* Add a 'opal_cr_allow_opal_only' parameter that will enable OPAL-only
CR functionality when the user wants it. Default: Disabled.
* Fix the placement of a checkpoint request check in MPI_Init
* Pull the OPAL notification mechanism into the SnapC framework.
* We no longer fork/exec the 'opal-checkpoint' command for local
checkpointing, the Local coordinator in the orted does this directly.
* The Local and Application coordinator talk together bypassing the OPAL
notifiation mechanism.
* Optimized the Local <-> App Coordinator communication.
* Improved the structure used to track vpid_snapshots in the local coord.
* Fix a race condition in which an application under heavy communication load
may produce an inconsistent global checkpoint.
This commit was SVN r16389.
This merge adds Checkpoint/Restart support to Open MPI. The initial
frameworks and components support a LAM/MPI-like implementation.
This commit follows the risk assessment presented to the Open MPI core
development group on Feb. 22, 2007.
This commit closes trac:158
More details to follow.
This commit was SVN r14051.
The following SVN revisions from the original message are invalid or
inconsistent and therefore were not cross-referenced:
r13912
The following Trac tickets were found above:
Ticket 158 --> https://svn.open-mpi.org/trac/ompi/ticket/158