
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php

Documentation:
  http://osl.iu.edu/research/ft/

Major Changes:
--------------
 * Added C/R-enabled Debugging support.
   Enabled with the --enable-crdebug flag. See the following website for more information:
   http://osl.iu.edu/research/ft/crdebug/
 * Added Stable Storage (SStore) framework for checkpoint storage
   * 'central' component does a direct to central storage save
   * 'stage' component stages checkpoints to central storage while the application continues execution.
   * 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress)
   * 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching)
 * Added Compression (compress) framework to support
 * Add two new ErrMgr recovery policies
   * {{{crmig}}} C/R Process Migration
   * {{{autor}}} C/R Automatic Recovery
 * Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component
 * Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option)
   * {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342)
   * {{{OMPI_CR_Restart}}}
   * {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
   * {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192)
   * {{{OMPI_CR_Quiesce_start}}}
   * {{{OMPI_CR_Quiesce_checkpoint}}}
   * {{{OMPI_CR_Quiesce_end}}}
   * {{{OMPI_CR_self_register_checkpoint_callback}}}
   * {{{OMPI_CR_self_register_restart_callback}}}
   * {{{OMPI_CR_self_register_continue_callback}}}
 * The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future.
 * Add a progress meter to:
   * FileM rsh (filem_rsh_process_meter)
   * SnapC full (snapc_full_progress_meter)
   * SStore stage (sstore_stage_progress_meter)
 * Added 2 new command line options to ompi-restart
   * --showme : Display the full command line that would have been exec'ed.
   * --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413)
 * Deprecated some MCA params:
   * crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
   * snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir
   * snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
   * snapc_base_store_in_place deprecated, replaced with different components of SStore
   * snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref
   * snapc_base_establish_global_snapshot_dir deprecated, never well supported
   * snapc_full_skip_filem deprecated, use sstore_stage_skip_filem

Minor Changes:
--------------
 * Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing.
 * Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components
 * Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it.
 * Fixes trac:2208 : Honor various TMPDIR variables instead of forcing {{{/tmp}}}
 * Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set.
 * opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality.
 * Cleanup the CRS framework and components to work with the SStore framework.
 * Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably).
 * Add 'quiesce' hook to CRCP for a future enhancement.
 * We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}.
 * Add optional application level INC callbacks (registered through the CR MPI Ext interface).
 * Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive.
 * {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked.
 * {{{opal-restart}}} also supports local decompression before restarting
 * {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata
 * {{{orte-restart}}} now uses the SStore framework to work with the metadata
 * Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality.
 * Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}.
 * Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped.
 * Make sure to decrement the number of 'num_local_procs' in the orted when one goes away.
 * odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options.
 * Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities.
 * Improve the checks for 'already checkpointing' error path.
 * Add a recovery output timer, to show how long it takes to restart a job
 * Do a better job of cleaning up the old session directory on restart.
 * Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (which does not work for MPI apps at the moment)
 * Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize.

This commit was SVN r23587.

The following Trac tickets were found above:
  Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
  Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
  Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
  Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
  Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
  Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
  Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
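The MCA parameter deprecations listed under Major Changes can be read as a rename table. The following is a minimal sketch of that table, not code from this commit: the type `mca_param_rename_t` and the helper `find_rename()` are hypothetical names introduced for illustration, and only the parameter names themselves come from the list above. A `NULL` replacement marks the two parameters for which the commit message gives no one-to-one successor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: one deprecated MCA parameter and its successor. */
typedef struct {
    const char *deprecated;
    const char *replacement;   /* NULL: no one-to-one replacement is listed */
} mca_param_rename_t;

/* Names taken from the "Deprecated some MCA params" list in the commit message. */
static const mca_param_rename_t mca_param_renames[] = {
    { "crs_base_snapshot_dir",                    "sstore_stage_local_snapshot_dir" },
    { "snapc_base_global_snapshot_dir",           "sstore_base_global_snapshot_dir" },
    { "snapc_base_global_shared",                 "sstore_stage_global_is_shared"   },
    { "snapc_base_store_in_place",                NULL },
    { "snapc_base_global_snapshot_ref",           "sstore_base_global_snapshot_ref" },
    { "snapc_base_establish_global_snapshot_dir", NULL },
    { "snapc_full_skip_filem",                    "sstore_stage_skip_filem"         }
};

/* Hypothetical helper: return the table entry for a deprecated parameter,
 * or NULL if the name is not in the deprecation list. */
const mca_param_rename_t *find_rename(const char *name)
{
    size_t i;
    const size_t n = sizeof(mca_param_renames) / sizeof(mca_param_renames[0]);

    for (i = 0; i < n; ++i) {
        if (0 == strcmp(name, mca_param_renames[i].deprecated)) {
            return &mca_param_renames[i];
        }
    }
    return NULL;
}
```

A caller would check the returned entry first (is the name deprecated at all?) and then its `replacement` field (is there a direct successor?), which keeps the two cases distinct.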
/*
 * Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
 *                         University Research and Technology
 *                         Corporation.  All rights reserved.
 * Copyright (c) 2004-2005 The University of Tennessee and The University
 *                         of Tennessee Research Foundation.  All rights
 *                         reserved.
 * Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
 *                         University of Stuttgart.  All rights reserved.
 * Copyright (c) 2004-2005 The Regents of the University of California.
 *                         All rights reserved.
 * Copyright (c) 2007-2010 Cisco Systems, Inc.  All rights reserved.
 * Copyright (c) 2007      Sun Microsystems, Inc.  All rights reserved.
 * Copyright (c) 2009      Oak Ridge National Labs.  All rights reserved.
 * $COPYRIGHT$
 *
 * Additional copyrights may follow
 *
 * $HEADER$
 */

/** @file **/

#include "opal_config.h"

#include "opal/util/malloc.h"
#include "opal/util/output.h"
#include "opal/util/trace.h"
#include "opal/util/show_help.h"
#include "opal/memoryhooks/memory.h"
#include "opal/mca/base/base.h"
#include "opal/runtime/opal.h"
#include "opal/util/net.h"
#include "opal/datatype/opal_datatype.h"
#include "opal/mca/installdirs/base/base.h"
#include "opal/mca/memory/base/base.h"
#include "opal/mca/memcpy/base/base.h"
#include "opal/mca/paffinity/base/base.h"
#include "opal/mca/timer/base/base.h"
#include "opal/mca/memchecker/base/base.h"
#include "opal/dss/dss.h"
#include "opal/mca/carto/base/base.h"
#if OPAL_ENABLE_FT_CR == 1
#include "opal/mca/compress/base/base.h"
#endif

#include "opal/runtime/opal_cr.h"
#include "opal/mca/crs/base/base.h"

#include "opal/runtime/opal_progress.h"
#include "opal/event/event.h"
#include "opal/mca/backtrace/base/base.h"

#include "opal/constants.h"
#include "opal/util/error.h"
#include "opal/util/stacktrace.h"
#include "opal/util/keyval_parse.h"
#include "opal/util/sys_limits.h"
#include "opal/util/opal_sos.h"

#if OPAL_CC_USE_PRAGMA_IDENT
#pragma ident OPAL_IDENT_STRING
#elif OPAL_CC_USE_IDENT
#ident OPAL_IDENT_STRING
#endif
const char opal_version_string[] = OPAL_IDENT_STRING;

int opal_initialized = 0;
int opal_util_initialized = 0;
bool opal_profile = false;
char *opal_profile_file = NULL;
int opal_cache_line_size;

static const char *
opal_err2str(int errnum)
{
    const char *retval;

    switch (OPAL_SOS_GET_ERROR_CODE(errnum)) {
    case OPAL_SUCCESS:
        retval = "Success";
        break;
    case OPAL_ERROR:
        retval = "Error";
        break;
    case OPAL_ERR_OUT_OF_RESOURCE:
        retval = "Out of resource";
        break;
    case OPAL_ERR_TEMP_OUT_OF_RESOURCE:
        retval = "Temporarily out of resource";
        break;
    case OPAL_ERR_RESOURCE_BUSY:
        retval = "Resource busy";
        break;
    case OPAL_ERR_BAD_PARAM:
        retval = "Bad parameter";
        break;
    case OPAL_ERR_FATAL:
        retval = "Fatal";
        break;
    case OPAL_ERR_NOT_IMPLEMENTED:
        retval = "Not implemented";
        break;
    case OPAL_ERR_NOT_SUPPORTED:
        retval = "Not supported";
        break;
    case OPAL_ERR_INTERUPTED:
        retval = "Interupted";
        break;
    case OPAL_ERR_WOULD_BLOCK:
        retval = "Would block";
        break;
    case OPAL_ERR_IN_ERRNO:
        retval = "In errno";
        break;
    case OPAL_ERR_UNREACH:
        retval = "Unreachable";
        break;
    case OPAL_ERR_NOT_FOUND:
        retval = "Not found";
        break;
    case OPAL_EXISTS:
        retval = "Exists";
        break;
    case OPAL_ERR_TIMEOUT:
        retval = "Timeout";
        break;
    case OPAL_ERR_NOT_AVAILABLE:
        retval = "Not available";
        break;
    case OPAL_ERR_PERM:
        retval = "No permission";
        break;
    case OPAL_ERR_VALUE_OUT_OF_BOUNDS:
        retval = "Value out of bounds";
        break;
    case OPAL_ERR_FILE_READ_FAILURE:
        retval = "File read failure";
        break;
    case OPAL_ERR_FILE_WRITE_FAILURE:
        retval = "File write failure";
        break;
    case OPAL_ERR_FILE_OPEN_FAILURE:
        retval = "File open failure";
        break;
    case OPAL_ERR_PACK_MISMATCH:
        retval = "Pack data mismatch";
        break;
    case OPAL_ERR_PACK_FAILURE:
        retval = "Data pack failed";
        break;
    case OPAL_ERR_UNPACK_FAILURE:
        retval = "Data unpack failed";
        break;
    case OPAL_ERR_UNPACK_INADEQUATE_SPACE:
        retval = "Data unpack had inadequate space";
        break;
    case OPAL_ERR_UNPACK_READ_PAST_END_OF_BUFFER:
        retval = "Data unpack would read past end of buffer";
        break;
    case OPAL_ERR_OPERATION_UNSUPPORTED:
        retval = "Requested operation is not supported on referenced data type";
        break;
    case OPAL_ERR_UNKNOWN_DATA_TYPE:
        retval = "Unknown data type";
        break;
    case OPAL_ERR_BUFFER:
        retval = "Buffer type (described vs non-described) mismatch - operation not allowed";
        break;
    case OPAL_ERR_DATA_TYPE_REDEF:
        retval = "Attempt to redefine an existing data type";
        break;
    case OPAL_ERR_DATA_OVERWRITE_ATTEMPT:
        retval = "Attempt to overwrite a data value";
        break;
    case OPAL_ERR_MODULE_NOT_FOUND:
        retval = "Framework requires at least one active module, but none found";
        break;
    case OPAL_ERR_TOPO_SLOT_LIST_NOT_SUPPORTED:
        retval = "OS topology does not support slot_list process affinity";
        break;
    case OPAL_ERR_TOPO_SOCKET_NOT_SUPPORTED:
        retval = "Could not obtain socket topology information";
        break;
    case OPAL_ERR_TOPO_CORE_NOT_SUPPORTED:
        retval = "Could not obtain core topology information";
        break;
    case OPAL_ERR_NOT_ENOUGH_SOCKETS:
        retval = "Not enough sockets to meet request";
        break;
    case OPAL_ERR_NOT_ENOUGH_CORES:
        retval = "Not enough cores to meet request";
        break;
    case OPAL_ERR_INVALID_PHYS_CPU:
        retval = "Invalid physical cpu number returned";
        break;
    case OPAL_ERR_MULTIPLE_AFFINITIES:
        retval = "Multiple methods for assigning process affinity were specified";
        break;
    case OPAL_ERR_SLOT_LIST_RANGE:
        retval = "Provided slot_list range is invalid";
        break;

    default:
        retval = NULL;
    }

    return retval;
}


int
opal_init_util(int* pargc, char*** pargv)
{
    int ret;
    char *error = NULL;

    if( ++opal_util_initialized != 1 ) {
        if( opal_util_initialized < 1 ) {
            return OPAL_ERROR;
        }
        return OPAL_SUCCESS;
    }

    /* JMS See note in runtime/opal.h -- this is temporary; to be
       replaced with real hwloc information soon (in trunk/v1.5 and
       beyond, only).  This *used* to be a #define, so it's important
       to define it very early.  */
    opal_cache_line_size = 128;

    /* initialize the memory allocator */
    opal_malloc_init();

    /* initialize the output system */
    opal_output_init();

    /* initialize install dirs code */
    if (OPAL_SUCCESS != (ret = opal_installdirs_base_open())) {
        fprintf(stderr, "opal_installdirs_base_open() failed -- process will likely abort (%s:%d, returned %d instead of OPAL_SUCCESS)\n",
                __FILE__, __LINE__, ret);
        return ret;
    }

    /* initialize the help system */
    opal_show_help_init();

    /* initialize the OPAL SOS system */
    opal_sos_init();

    /* register handler for errnum -> string conversion */
    if (OPAL_SUCCESS !=
        (ret = opal_error_register("OPAL",
                                   OPAL_ERR_BASE, OPAL_ERR_MAX, opal_err2str))) {
        error = "opal_error_register";
        goto return_error;
    }

    /* init the trace function */
    opal_trace_init();

    /* keyval lex-based parser */
    if (OPAL_SUCCESS != (ret = opal_util_keyval_parse_init())) {
        error = "opal_util_keyval_parse_init";
        goto return_error;
    }

    if (OPAL_SUCCESS != (ret = opal_net_init())) {
        error = "opal_net_init";
        goto return_error;
    }

    /* Setup the parameter system */
    if (OPAL_SUCCESS != (ret = mca_base_param_init())) {
        error = "mca_base_param_init";
        goto return_error;
    }

    /* register params for opal */
    if (OPAL_SUCCESS != (ret = opal_register_params())) {
        error = "opal_register_params";
        goto return_error;
    }

    /* pretty-print stack handlers */
    if (OPAL_SUCCESS != (ret = opal_util_register_stackhandlers())) {
        error = "opal_util_register_stackhandlers";
        goto return_error;
    }

    if (OPAL_SUCCESS != (ret = opal_util_init_sys_limits())) {
        error = "opal_util_init_sys_limits";
        goto return_error;
    }

    /* initialize the datatype engine */
    if (OPAL_SUCCESS != (ret = opal_datatype_init ())) {
        error = "opal_datatype_init";
        goto return_error;
    }

    /* Initialize the data storage service. */
    if (OPAL_SUCCESS != (ret = opal_dss_open())) {
        error = "opal_dss_open";
        goto return_error;
    }

    return OPAL_SUCCESS;

 return_error:
    opal_show_help( "help-opal-runtime.txt",
                    "opal_init:startup:internal-failure", true,
                    error, ret );
    return ret;
}


int
opal_init(int* pargc, char*** pargv)
{
    int ret;
    char *error = NULL;

    if( ++opal_initialized != 1 ) {
        if( opal_initialized < 1 ) {
            return OPAL_ERROR;
        }
        return OPAL_SUCCESS;
    }

    /* initialize util code */
    if (OPAL_SUCCESS != (ret = opal_init_util(pargc, pargv))) {
        return ret;
    }

    /* initialize the mca */
    if (OPAL_SUCCESS != (ret = mca_base_open())) {
        error = "mca_base_open";
        goto return_error;
    }

    /* open the processor affinity base */
    if (OPAL_SUCCESS != (ret = opal_paffinity_base_open())) {
        error = "opal_paffinity_base_open";
        goto return_error;
    }
    if (OPAL_SUCCESS != (ret = opal_paffinity_base_select())) {
        error = "opal_paffinity_base_select";
        goto return_error;
    }

    /* the memcpy component should be one of the first to get
     * loaded in order to make sure we do have all the available
     * versions of memcpy correctly configured.
     */
    if( OPAL_SUCCESS != (ret = opal_memcpy_base_open()) ) {
        error = "opal_memcpy_base_open";
        goto return_error;
    }

    /* open the memory manager components.  Memory hooks may be
       triggered before this (any time after mem_free_init(),
       actually).  This is a hook available for memory manager hooks
       without good initialization routine support */
    if (OPAL_SUCCESS != (ret = opal_memory_base_open())) {
        error = "opal_memory_base_open";
        goto return_error;
    }

    /* initialize the memory manager / tracker */
    if (OPAL_SUCCESS != (ret = opal_mem_hooks_init())) {
        error = "opal_mem_hooks_init";
        goto return_error;
    }

    /* initialize the memory checker, to allow early support for annotation */
    if (OPAL_SUCCESS != (ret = opal_memchecker_base_open())) {
        error = "opal_memchecker_base_open";
        goto return_error;
    }

    /* select the memory checker */
    if (OPAL_SUCCESS != (ret = opal_memchecker_base_select())) {
        error = "opal_memchecker_base_select";
        goto return_error;
    }

    if (OPAL_SUCCESS != (ret = opal_backtrace_base_open())) {
        error = "opal_backtrace_base_open";
        goto return_error;
    }

    if (OPAL_SUCCESS != (ret = opal_timer_base_open())) {
        error = "opal_timer_base_open";
        goto return_error;
    }

    /* setup the carto framework */
    if (OPAL_SUCCESS != (ret = opal_carto_base_open())) {
        error = "opal_carto_base_open";
        goto return_error;
    }

    if (OPAL_SUCCESS != (ret = opal_carto_base_select())) {
        error = "opal_carto_base_select";
        goto return_error;
    }

    /*
     * Need to start the event and progress engines if no one else is.
     * opal_cr_init uses the progress engine, so it is lumped together
     * into this set as well.
     */
    /*
     * Initialize the event library
     */
    if (OPAL_SUCCESS != (ret = opal_event_init())) {
        error = "opal_event_init";
        goto return_error;
    }

    /*
     * Initialize the general progress engine
     */
    if (OPAL_SUCCESS != (ret = opal_progress_init())) {
        error = "opal_progress_init";
        goto return_error;
    }
    /* we want to tick the event library whenever possible */
    opal_progress_event_users_increment();

#if OPAL_ENABLE_FT_CR == 1
    /*
     * Initialize the compression framework
     * Note: Currently only used in C/R so it has been marked to only
     *       initialize when C/R is enabled. If other places in the code
     *       wish to use this framework, it is safe to remove the protection.
     */
    if( OPAL_SUCCESS != (ret = opal_compress_base_open()) ) {
        error = "opal_compress_base_open() failed";
        goto return_error;
    }
    if( OPAL_SUCCESS != (ret = opal_compress_base_select()) ) {
        error = "opal_compress_base_select() failed";
        goto return_error;
    }
#endif

    /*
     * Initialize the checkpoint/restart functionality
     * Note: Always do this so we can detect if the user
     * attempts to checkpoint a non checkpointable job,
     * otherwise the tools may hang or not clean up properly.
     */
    if (OPAL_SUCCESS != (ret = opal_cr_init() ) ) {
        error = "opal_cr_init() failed";
        goto return_error;
    }

    return OPAL_SUCCESS;

 return_error:
    opal_show_help( "help-opal-runtime.txt",
                    "opal_init:startup:internal-failure", true,
                    error, ret );
    return ret;
}
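
Both `opal_init_util()` and `opal_init()` above rely on the same two idioms: a pre-incremented counter so that only the first call does any work, and a single `return_error` label that reports which step failed. The following standalone sketch isolates those idioms. Every name in it (`demo_init`, `demo_step_one`, `demo_step_two`, `demo_initialized`, `demo_fail_second_step`, `DEMO_SUCCESS`, `DEMO_ERROR`) is hypothetical, and `fprintf` stands in for `opal_show_help`; it is an illustration of the pattern, not Open MPI code.

```c
#include <stdio.h>

#define DEMO_SUCCESS  0
#define DEMO_ERROR   -1

/* Plays the role of opal_initialized: counts calls to demo_init(). */
int demo_initialized = 0;

/* Hypothetical switch so the failure path can be exercised. */
int demo_fail_second_step = 0;

static int demo_step_one(void)
{
    return DEMO_SUCCESS;
}

static int demo_step_two(void)
{
    return demo_fail_second_step ? DEMO_ERROR : DEMO_SUCCESS;
}

int demo_init(void)
{
    int ret;
    const char *error = NULL;

    /* Only the first caller runs the initialization steps; later callers
     * just bump the counter and return. A value below 1 after the
     * increment means the counter was corrupted or over-decremented. */
    if (++demo_initialized != 1) {
        if (demo_initialized < 1) {
            return DEMO_ERROR;
        }
        return DEMO_SUCCESS;
    }

    /* Each step records its own name before jumping to the shared
     * error handler, so the report can say which step failed. */
    if (DEMO_SUCCESS != (ret = demo_step_one())) {
        error = "demo_step_one";
        goto return_error;
    }
    if (DEMO_SUCCESS != (ret = demo_step_two())) {
        error = "demo_step_two";
        goto return_error;
    }

    return DEMO_SUCCESS;

 return_error:
    fprintf(stderr, "%s failed (returned %d)\n", error, ret);
    return ret;
}
```

As in the original functions, the counter stays incremented even when a step fails, so a later call returns success without retrying. Whether that is desirable depends on how the matching finalize routine treats the counter, which this file does not show.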