2005-05-19 17:33:55 +04:00
|
|
|
/*
|
A number of C/R enhancements per RFC below:
http://www.open-mpi.org/community/lists/devel/2010/07/8240.php
Documentation:
http://osl.iu.edu/research/ft/
Major Changes:
--------------
* Added C/R-enabled Debugging support.
Enabled with the --enable-crdebug flag. See the following website for more information:
http://osl.iu.edu/research/ft/crdebug/
* Added Stable Storage (SStore) framework for checkpoint storage
* 'central' component does a direct to central storage save
* 'stage' component stages checkpoints to central storage while the application continues execution.
* 'stage' supports offline compression of checkpoints before moving (sstore_stage_compress)
* 'stage' supports local caching of checkpoints to improve automatic recovery (sstore_stage_caching)
* Added Compression (compress) framework to support checkpoint compression
* Add two new ErrMgr recovery policies
* {{{crmig}}} C/R Process Migration
* {{{autor}}} C/R Automatic Recovery
* Added the {{{ompi-migrate}}} command line tool to support the {{{crmig}}} ErrMgr component
* Added CR MPI Ext functions (enable them with {{{--enable-mpi-ext=cr}}} configure option)
* {{{OMPI_CR_Checkpoint}}} (Fixes trac:2342)
* {{{OMPI_CR_Restart}}}
* {{{OMPI_CR_Migrate}}} (may need some more work for mapping rules)
* {{{OMPI_CR_INC_register_callback}}} (Fixes trac:2192)
* {{{OMPI_CR_Quiesce_start}}}
* {{{OMPI_CR_Quiesce_checkpoint}}}
* {{{OMPI_CR_Quiesce_end}}}
* {{{OMPI_CR_self_register_checkpoint_callback}}}
* {{{OMPI_CR_self_register_restart_callback}}}
* {{{OMPI_CR_self_register_continue_callback}}}
* The ErrMgr predicted_fault() interface has been changed to take an opal_list_t of ErrMgr defined types. This will allow us to better support a wider range of fault prediction services in the future.
* Add a progress meter to:
* FileM rsh (filem_rsh_process_meter)
* SnapC full (snapc_full_progress_meter)
* SStore stage (sstore_stage_progress_meter)
* Added 2 new command line options to ompi-restart
* --showme : Display the full command line that would have been exec'ed.
* --mpirun_opts : Command line options to pass directly to mpirun. (Fixes trac:2413)
* Deprecated some MCA params:
* crs_base_snapshot_dir deprecated, use sstore_stage_local_snapshot_dir
* snapc_base_global_snapshot_dir deprecated, use sstore_base_global_snapshot_dir
* snapc_base_global_shared deprecated, use sstore_stage_global_is_shared
* snapc_base_store_in_place deprecated, replaced with different components of SStore
* snapc_base_global_snapshot_ref deprecated, use sstore_base_global_snapshot_ref
* snapc_base_establish_global_snapshot_dir deprecated, never well supported
* snapc_full_skip_filem deprecated, use sstore_stage_skip_filem
Minor Changes:
--------------
* Fixes trac:1924 : {{{ompi-restart}}} now recognizes path prefixed checkpoint handles and does the right thing.
* Fixes trac:2097 : {{{ompi-info}}} should now report all available CRS components
* Fixes trac:2161 : Manual checkpoint movement. A user can 'mv' a checkpoint directory from the original location to another and still restart from it.
* Fixes trac:2208 : Honor various TMPDIR variables instead of forcing {{{/tmp}}}
* Move {{{ompi_cr_continue_like_restart}}} to {{{orte_cr_continue_like_restart}}} to be more flexible in where this should be set.
* opal_crs_base_metadata_write* functions have been moved to SStore to support a wider range of metadata handling functionality.
* Cleanup the CRS framework and components to work with the SStore framework.
* Cleanup the SnapC framework and components to work with the SStore framework (cleans up these code paths considerably).
* Add 'quiesce' hook to CRCP for a future enhancement.
* We now require a BLCR version that supports {{{cr_request_file()}}} or {{{cr_request_checkpoint()}}} in order to make the code more maintainable. Note that {{{cr_request_file}}} has been deprecated since 0.7.0, so we prefer to use {{{cr_request_checkpoint()}}}.
* Add optional application level INC callbacks (registered through the CR MPI Ext interface).
* Increase the {{{opal_cr_thread_sleep_wait}}} parameter to 1000 microseconds to make the C/R thread less aggressive.
* {{{opal-restart}}} now looks for cache directories before falling back on stable storage when asked.
* {{{opal-restart}}} also supports local decompression before restarting
* {{{orte-checkpoint}}} now uses the SStore framework to work with the metadata
* {{{orte-restart}}} now uses the SStore framework to work with the metadata
* Remove the {{{orte-restart}}} preload option. This was removed since the user only needs to select the 'stage' component in order to support this functionality.
* Since the '-am' parameter is saved in the metadata, {{{ompi-restart}}} no longer hard codes {{{-am ft-enable-cr}}}.
* Fix {{{hnp}}} ErrMgr so that if a previous component in the stack has 'fixed' the problem, then it should be skipped.
* Make sure to decrement the number of 'num_local_procs' in the orted when one goes away.
* odls now checks the SStore framework to see if it needs to load any checkpoint files before launching (to support 'stage'). This separates the SStore logic from the --preload-[binary|files] options.
* Add unique IDs to the named pipes established between the orted and the app in SnapC. This is to better support migration and automatic recovery activities.
* Improve the checks for 'already checkpointing' error path.
* Add a recovery output timer to show how long it takes to restart a job
* Do a better job of cleaning up the old session directory on restart.
* Add a local module to the autor and crmig ErrMgr components. These small modules prevent the 'orted' component from attempting a local recovery (which does not work for MPI apps at the moment).
* Add a fix for bounding the checkpointable region between MPI_Init and MPI_Finalize.
This commit was SVN r23587.
The following Trac tickets were found above:
Ticket 1924 --> https://svn.open-mpi.org/trac/ompi/ticket/1924
Ticket 2097 --> https://svn.open-mpi.org/trac/ompi/ticket/2097
Ticket 2161 --> https://svn.open-mpi.org/trac/ompi/ticket/2161
Ticket 2192 --> https://svn.open-mpi.org/trac/ompi/ticket/2192
Ticket 2208 --> https://svn.open-mpi.org/trac/ompi/ticket/2208
Ticket 2342 --> https://svn.open-mpi.org/trac/ompi/ticket/2342
Ticket 2413 --> https://svn.open-mpi.org/trac/ompi/ticket/2413
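As a hedged illustration of how the C/R-only pieces are wired in, the sketch below mirrors the guard that opal_init() later in this file places around the new Compression framework; the surrounding error-handling names (ret, error, return_error) come from this file's init functions and are assumptions relative to the commit message itself.
{{{
#if OPAL_ENABLE_FT_CR == 1
#include "opal/mca/compress/base/base.h"
#endif

    /* Inside opal_init(): the compress framework is currently used only by
     * C/R, so it is opened only when FT/C-R support is compiled in. */
#if OPAL_ENABLE_FT_CR == 1
    if (OPAL_SUCCESS != (ret = opal_compress_base_open())) {
        error = "opal_compress_base_open";
        goto return_error;
    }
#endif
}}}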
2010-08-11 00:51:11 +04:00
|
|
|
* Copyright (c) 2004-2010 The Trustees of Indiana University and Indiana
|
2005-11-05 22:57:48 +03:00
|
|
|
* University Research and Technology
|
|
|
|
* Corporation. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The University of Tennessee and The University
|
|
|
|
* of Tennessee Research Foundation. All rights
|
|
|
|
* reserved.
|
2005-09-07 22:52:28 +04:00
|
|
|
* Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
|
2005-05-19 17:33:55 +04:00
|
|
|
* University of Stuttgart. All rights reserved.
|
|
|
|
* Copyright (c) 2004-2005 The Regents of the University of California.
|
|
|
|
* All rights reserved.
|
2012-02-10 22:29:52 +04:00
|
|
|
* Copyright (c) 2007-2012 Cisco Systems, Inc. All rights reserved.
|
2007-11-03 05:40:22 +03:00
|
|
|
* Copyright (c) 2007 Sun Microsystems, Inc. All rights reserved.
|
- Split the datatype engine into two parts: an MPI specific part in
OMPI
and a language agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar)
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is completely
known at compile time and therefore constified
- With fewer datatypes (22), the actual sizes of bit-field types may be reduced
from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in convertor
(upon send/recv) in one cacheline...
This has implications to the convertor-datastructure and other parts
of the code.
- Several performance tests have been run; the netpipe latency does not change
with this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and
ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except for MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB do now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet,
found in the test suite when run with static builds, committed)
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests, which fail as on trunk/MTT)
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note that for the heterogeneous case (-m32 compiled binaries/ompi),
neither the ompi-trunk nor the ompi-ddt branch would successfully launch.
This commit was SVN r21641.
2009-07-13 08:56:31 +04:00
|
|
|
* Copyright (c) 2009 Oak Ridge National Labs. All rights reserved.
|
2012-04-06 18:23:13 +04:00
|
|
|
* Copyright (c) 2010-2012 Los Alamos National Security, LLC.
|
2011-06-21 19:41:57 +04:00
|
|
|
* All rights reserved.
|
2005-05-19 17:33:55 +04:00
|
|
|
* $COPYRIGHT$
|
2005-09-07 22:52:28 +04:00
|
|
|
*
|
2005-05-19 17:33:55 +04:00
|
|
|
* Additional copyrights may follow
|
2005-09-07 22:52:28 +04:00
|
|
|
*
|
2005-05-19 17:33:55 +04:00
|
|
|
* $HEADER$
|
|
|
|
*/
|
|
|
|
|
|
|
|
/** @file **/
|
|
|
|
|
2006-02-12 04:33:29 +03:00
|
|
|
#include "opal_config.h"
|
2005-05-22 22:40:03 +04:00
|
|
|
|
2005-07-04 05:36:20 +04:00
|
|
|
#include "opal/util/malloc.h"
|
2005-07-04 03:31:27 +04:00
|
|
|
#include "opal/util/output.h"
|
2005-09-07 22:52:28 +04:00
|
|
|
#include "opal/util/trace.h"
|
2005-10-05 17:56:35 +04:00
|
|
|
#include "opal/util/show_help.h"
|
2005-11-11 03:26:27 +03:00
|
|
|
#include "opal/memoryhooks/memory.h"
|
2005-08-13 00:46:25 +04:00
|
|
|
#include "opal/mca/base/base.h"
|
|
|
|
#include "opal/runtime/opal.h"
|
2007-07-19 00:25:01 +04:00
|
|
|
#include "opal/util/net.h"
|
2009-07-13 08:56:31 +04:00
|
|
|
#include "opal/datatype/opal_datatype.h"
|
2007-04-21 04:15:05 +04:00
|
|
|
#include "opal/mca/installdirs/base/base.h"
|
2005-08-14 21:23:34 +04:00
|
|
|
#include "opal/mca/memory/base/base.h"
|
2006-04-05 09:57:51 +04:00
|
|
|
#include "opal/mca/memcpy/base/base.h"
|
2011-09-11 23:02:24 +04:00
|
|
|
#include "opal/mca/hwloc/base/base.h"
|
2005-08-18 09:34:22 +04:00
|
|
|
#include "opal/mca/timer/base/base.h"
|
2008-02-12 11:46:27 +03:00
|
|
|
#include "opal/mca/memchecker/base/base.h"
|
2008-02-28 04:57:57 +03:00
|
|
|
#include "opal/dss/dss.h"
|
2011-06-21 19:41:57 +04:00
|
|
|
#include "opal/mca/shmem/base/base.h"
|
2010-08-11 00:51:11 +04:00
|
|
|
#if OPAL_ENABLE_FT_CR == 1
|
|
|
|
#include "opal/mca/compress/base/base.h"
|
|
|
|
#endif
|
2007-03-17 02:11:45 +03:00
|
|
|
|
|
|
|
#include "opal/runtime/opal_cr.h"
|
|
|
|
#include "opal/mca/crs/base/base.h"
|
|
|
|
|
|
|
|
#include "opal/runtime/opal_progress.h"
|
Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.
Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.
Biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework ala installdirs, so only one component will build at a time. There is no selection logic - the sole compiled component simply loads its function pointers into the opal_event struct.
I have gone thru the code base and converted all the libevent calls I could find. However, I cannot compile nor test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:
1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)
2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.
There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.
This commit was SVN r23925.
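A minimal sketch of point 2 above, assuming the event header included later in this file is sufficient to declare opal_event_t; the usage around the construction is illustrative only.
{{{
#include "opal/mca/event/base/base.h"

    /* opal_event_t is now a true opal_object_t, so zeroing it with
     * memset() no longer initializes it correctly. */
    opal_event_t ev;
    /* memset(&ev, 0, sizeof(ev));   <-- old pattern, now broken */
    OBJ_CONSTRUCT(&ev, opal_event_t);
    /* ... set the event up through the opal_event framework pointers
     * (e.g. the opal_event.evtimer_* entry points mentioned above) ... */
    OBJ_DESTRUCT(&ev);
}}}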
2010-10-24 22:35:54 +04:00
|
|
|
#include "opal/mca/event/base/base.h"
|
2006-09-26 03:41:06 +04:00
|
|
|
#include "opal/mca/backtrace/base/base.h"
|
2007-03-17 02:11:45 +03:00
|
|
|
|
2006-02-12 04:33:29 +03:00
|
|
|
#include "opal/constants.h"
|
2005-08-22 07:05:39 +04:00
|
|
|
#include "opal/util/error.h"
|
2006-01-11 07:36:39 +03:00
|
|
|
#include "opal/util/stacktrace.h"
|
2006-01-16 04:48:03 +03:00
|
|
|
#include "opal/util/keyval_parse.h"
|
2007-04-23 22:53:47 +04:00
|
|
|
#include "opal/util/sys_limits.h"
|
2005-09-07 22:52:28 +04:00
|
|
|
|
2009-05-07 00:11:28 +04:00
|
|
|
#if OPAL_CC_USE_PRAGMA_IDENT
|
2007-11-03 05:40:22 +03:00
|
|
|
#pragma ident OPAL_IDENT_STRING
|
2009-05-07 00:11:28 +04:00
|
|
|
#elif OPAL_CC_USE_IDENT
|
2007-11-03 05:40:22 +03:00
|
|
|
#ident OPAL_IDENT_STRING
|
|
|
|
#endif
|
2008-05-20 16:13:19 +04:00
|
|
|
const char opal_version_string[] = OPAL_IDENT_STRING;
|
2007-03-17 02:11:45 +03:00
|
|
|
|
2011-07-12 21:07:41 +04:00
|
|
|
int opal_initialized = 0;
|
|
|
|
int opal_util_initialized = 0;
|
2012-04-24 21:31:06 +04:00
|
|
|
/* We have to put a guess in here in case hwloc is not available. If
|
|
|
|
hwloc is available, this value will be overwritten when the
|
|
|
|
hwloc data is loaded. */
|
|
|
|
int opal_cache_line_size = 128;
|
2006-08-22 00:07:38 +04:00
|
|
|
|
2011-02-13 19:09:17 +03:00
|
|
|
static int
|
|
|
|
opal_err2str(int errnum, const char **errmsg)
|
2005-08-22 07:05:39 +04:00
|
|
|
{
|
|
|
|
const char *retval;
|
|
|
|
|
2012-04-06 18:23:13 +04:00
|
|
|
switch (errnum) {
|
2005-08-22 07:05:39 +04:00
|
|
|
case OPAL_SUCCESS:
|
|
|
|
retval = "Success";
|
|
|
|
break;
|
|
|
|
case OPAL_ERROR:
|
|
|
|
retval = "Error";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_OUT_OF_RESOURCE:
|
|
|
|
retval = "Out of resource";
|
|
|
|
break;
|
2005-12-21 09:27:34 +03:00
|
|
|
case OPAL_ERR_TEMP_OUT_OF_RESOURCE:
|
|
|
|
retval = "Temporarily out of resource";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_RESOURCE_BUSY:
|
|
|
|
retval = "Resource busy";
|
2005-08-22 07:05:39 +04:00
|
|
|
break;
|
|
|
|
case OPAL_ERR_BAD_PARAM:
|
|
|
|
retval = "Bad parameter";
|
|
|
|
break;
|
2005-12-21 09:27:34 +03:00
|
|
|
case OPAL_ERR_FATAL:
|
|
|
|
retval = "Fatal";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_NOT_IMPLEMENTED:
|
|
|
|
retval = "Not implemented";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_NOT_SUPPORTED:
|
|
|
|
retval = "Not supported";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_INTERUPTED:
|
|
|
|
retval = "Interupted";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_WOULD_BLOCK:
|
|
|
|
retval = "Would block";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_IN_ERRNO:
|
|
|
|
retval = "In errno";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_UNREACH:
|
|
|
|
retval = "Unreachable";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_NOT_FOUND:
|
|
|
|
retval = "Not found";
|
|
|
|
break;
|
|
|
|
case OPAL_EXISTS:
|
|
|
|
retval = "Exists";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_TIMEOUT:
|
|
|
|
retval = "Timeout";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_NOT_AVAILABLE:
|
|
|
|
retval = "Not available";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_PERM:
|
|
|
|
retval = "No permission";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_VALUE_OUT_OF_BOUNDS:
|
|
|
|
retval = "Value out of bounds";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_FILE_READ_FAILURE:
|
|
|
|
retval = "File read failure";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_FILE_WRITE_FAILURE:
|
|
|
|
retval = "File write failure";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_FILE_OPEN_FAILURE:
|
|
|
|
retval = "File open failure";
|
|
|
|
break;
|
2008-02-28 04:57:57 +03:00
|
|
|
case OPAL_ERR_PACK_MISMATCH:
|
|
|
|
retval = "Pack data mismatch";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_PACK_FAILURE:
|
|
|
|
retval = "Data pack failed";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_UNPACK_FAILURE:
|
|
|
|
retval = "Data unpack failed";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_UNPACK_INADEQUATE_SPACE:
|
|
|
|
retval = "Data unpack had inadequate space";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_UNPACK_READ_PAST_END_OF_BUFFER:
|
|
|
|
retval = "Data unpack would read past end of buffer";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_OPERATION_UNSUPPORTED:
|
|
|
|
retval = "Requested operation is not supported on referenced data type";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_UNKNOWN_DATA_TYPE:
|
|
|
|
retval = "Unknown data type";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_BUFFER:
|
2009-04-16 20:23:28 +04:00
|
|
|
retval = "Buffer type (described vs non-described) mismatch - operation not allowed";
|
2008-02-28 04:57:57 +03:00
|
|
|
break;
|
|
|
|
case OPAL_ERR_DATA_TYPE_REDEF:
|
|
|
|
retval = "Attempt to redefine an existing data type";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_DATA_OVERWRITE_ATTEMPT:
|
|
|
|
retval = "Attempt to overwrite a data value";
|
|
|
|
break;
|
2010-05-07 00:57:17 +04:00
|
|
|
case OPAL_ERR_MODULE_NOT_FOUND:
|
|
|
|
retval = "Framework requires at least one active module, but none found";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_TOPO_SLOT_LIST_NOT_SUPPORTED:
|
|
|
|
retval = "OS topology does not support slot_list process affinity";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_TOPO_SOCKET_NOT_SUPPORTED:
|
|
|
|
retval = "Could not obtain socket topology information";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_TOPO_CORE_NOT_SUPPORTED:
|
|
|
|
retval = "Could not obtain core topology information";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_NOT_ENOUGH_SOCKETS:
|
|
|
|
retval = "Not enough sockets to meet request";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_NOT_ENOUGH_CORES:
|
|
|
|
retval = "Not enough cores to meet request";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_INVALID_PHYS_CPU:
|
|
|
|
retval = "Invalid physical cpu number returned";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_MULTIPLE_AFFINITIES:
|
|
|
|
retval = "Multiple methods for assigning process affinity were specified";
|
|
|
|
break;
|
|
|
|
case OPAL_ERR_SLOT_LIST_RANGE:
|
|
|
|
retval = "Provided slot_list range is invalid";
|
|
|
|
break;
|
2011-06-07 06:09:11 +04:00
|
|
|
case OPAL_ERR_NETWORK_NOT_PARSEABLE:
|
|
|
|
retval = "Provided network specification is not parseable";
|
|
|
|
break;
|
At long last, the fabled revision to the affinity system has arrived. A more detailed explanation of how this all works will be presented here:
https://svn.open-mpi.org/trac/ompi/wiki/ProcessPlacement
The wiki page is incomplete at the moment, but I hope to complete it over the next few days. I will provide updates on the devel list. As the wiki page states, the default and most commonly used options remain unchanged (except as noted below). New, esoteric and complex options have been added, but unless you are a true masochist, you are unlikely to use many of them beyond perhaps an initial curiosity-motivated experimentation.
In a nutshell, this commit revamps the map/rank/bind procedure to take into account topology info on the compute nodes. I have, for the most part, preserved the default behaviors, with three notable exceptions:
1. I have at long last bowed my head in submission to the system admin's of managed clusters. For years, they have complained about our default of allowing users to oversubscribe nodes - i.e., to run more processes on a node than allocated slots. Accordingly, I have modified the default behavior: if you are running off of hostfile/dash-host allocated nodes, then the default is to allow oversubscription. If you are running off of RM-allocated nodes, then the default is to NOT allow oversubscription. Flags to override these behaviors are provided, so this only affects the default behavior.
2. both cpus/rank and stride have been removed. The latter was demanded by those who didn't understand the purpose behind it - and I agreed as the users who requested it are no longer using it. The former was removed temporarily pending implementation.
3. vm launch is now the sole method for starting OMPI. It was just too darned hard to maintain multiple launch procedures - maybe someday, provided someone can demonstrate a reason to do so.
As Jeff stated, it is impossible to fully test a change of this size. I have tested it on Linux and Mac, covering all the default and simple options, singletons, and comm_spawn. That said, I'm sure others will find problems, so I'll be watching MTT results until this stabilizes.
This commit was SVN r25476.
2011-11-15 07:40:11 +04:00
|
|
|
case OPAL_ERR_SILENT:
|
|
|
|
retval = NULL;
|
|
|
|
break;
|
2012-02-10 22:29:52 +04:00
|
|
|
case OPAL_ERR_NOT_INITIALIZED:
|
|
|
|
retval = "Not initialized";
|
|
|
|
break;
|
Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
wholly replaced by hwloc.
* Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
* Update sm, smcuda, wv, and openib components to no longer use carto.
Instead, use hwloc data. There are still optimizations possible in
the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old
carto-based code found out how many NUMA nodes were ''available''
-- not how many were used ''in this job''. The new hwloc-using
code computes the same value -- it was not updated to calculate how
many NUMA nodes are used ''by this job.''
* Note that I cannot compile the smcuda and wv BTLs -- I ''think''
they're right, but they need to be verified by their owners.
* The openib component now does a bunch of stuff to figure out where
"near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT
BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
(I do not have a NUMA machine with an OpenFabrics device that is a
non-uniform distance from multiple different NUMA nodes).
* Completely rewrite the OMPI_Affinity_str() routine from the
"affinity" mpiext extension. This extension now understands
hyperthreads; the output format of it has changed a bit to reflect
this new information.
* Bunches of minor changes around the code base to update names/types
from maffinity/paffinity-based names to hwloc-based names.
* Add some helper functions into the hwloc base, mainly having to do
with the fact that we have the hwloc data reporting ''all''
topology information, but sometimes you really only want the
(online | available) data.
This commit was SVN r26391.
2012-05-07 18:52:54 +04:00
|
|
|
case OPAL_ERR_NOT_BOUND:
|
|
|
|
retval = "Not bound";
|
|
|
|
break;
|
2005-09-07 22:52:28 +04:00
|
|
|
default:
|
2005-08-22 07:05:39 +04:00
|
|
|
retval = NULL;
|
2012-05-07 18:52:54 +04:00
|
|
|
}
|
2005-08-22 07:05:39 +04:00
|
|
|
|
2011-02-13 19:09:17 +03:00
|
|
|
*errmsg = retval;
|
|
|
|
return OPAL_SUCCESS;
|
2005-08-22 07:05:39 +04:00
|
|
|
}
|
2005-08-18 09:34:22 +04:00
|
|
|
|
2005-05-19 17:33:55 +04:00
|
|
|
|
2006-01-16 04:48:03 +03:00
|
|
|
int
|
2009-12-04 03:51:15 +03:00
|
|
|
opal_init_util(int* pargc, char*** pargv)
|
2005-05-22 22:40:03 +04:00
|
|
|
{
|
2005-10-05 17:56:35 +04:00
|
|
|
int ret;
|
|
|
|
char *error = NULL;
|
|
|
|
|
2011-07-12 21:07:41 +04:00
|
|
|
if( ++opal_util_initialized != 1 ) {
|
|
|
|
if( opal_util_initialized < 1 ) {
|
|
|
|
return OPAL_ERROR;
|
|
|
|
}
|
2007-07-19 00:28:19 +04:00
|
|
|
return OPAL_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2005-05-22 22:40:03 +04:00
|
|
|
/* initialize the memory allocator */
|
2005-07-04 05:36:20 +04:00
|
|
|
opal_malloc_init();
|
2005-05-22 22:40:03 +04:00
|
|
|
|
|
|
|
/* initialize the output system */
|
2005-07-04 03:31:27 +04:00
|
|
|
opal_output_init();
|
2005-08-22 07:05:39 +04:00
|
|
|
|
2009-09-29 06:07:46 +04:00
|
|
|
/* initialize install dirs code */
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_installdirs_base_open())) {
|
|
|
|
fprintf(stderr, "opal_installdirs_base_open() failed -- process will likely abort (%s:%d, returned %d instead of OPAL_INIT)\n",
|
|
|
|
__FILE__, __LINE__, ret);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
This commit represents a bunch of work on a Mercurial side branch. As
such, the commit message back to the master SVN repository is fairly
long.
= ORTE Job-Level Output Messages =
Add two new interfaces that should be used for all new code throughout
the ORTE and OMPI layers (we already make the search-and-replace on
the existing ORTE / OMPI layers):
* orte_output(): (and corresponding friends ORTE_OUTPUT,
orte_output_verbose, etc.) This function sends the output directly
to the HNP for processing as part of a job-specific output
channel. It supports all the same outputs as opal_output()
(syslog, file, stdout, stderr), but for stdout/stderr, the output
is sent to the HNP for processing and output. More on this below.
* orte_show_help(): This function is a drop-in-replacement for
opal_show_help(), with two differences in functionality:
1. the rendered text help message output is sent to the HNP for
display (rather than outputting directly into the process' stderr
stream)
1. the HNP detects duplicate help messages and does not display them
(so that you don't see the same error message N times, once from
each of your N MPI processes); instead, it counts "new" instances
of the help message and displays a message every ~5 seconds when
there are new ones ("I got X new copies of the help message...")
opal_show_help and opal_output still exist, but they only output in
the current process. The intent for the new orte_* functions is that
they can apply job-level intelligence to the output. As such, we
recommend that all new ORTE and OMPI code use the new orte_*
functions, not the opal_* functions.
=== New code ===
For ORTE and OMPI programmers, here's what you need to do differently
in new code:
* Do not include opal/util/show_help.h or opal/util/output.h.
Instead, include orte/util/output.h (this one header file has
declarations for both the orte_output() series of functions and
orte_show_help()).
* Effectively s/opal_output/orte_output/gi throughout your code.
Note that orte_output_open() takes a slightly different argument
list (as a way to pass data to the filtering stream -- see below),
so if you explicitly call opal_output_open(), you'll need to
slightly adapt to the new signature of orte_output_open().
* Literally s/opal_show_help/orte_show_help/. The function signature
is identical.
=== Notes ===
* orte_output'ing to stream 0 will behave similarly to what
opal_output'ing did, so leaving a hard-coded "0" as the first
argument is safe.
* For systems that do not use ORTE's RML or the HNP, the effect of
orte_output_* and orte_show_help will be identical to their opal
counterparts (the additional information passed to
orte_output_open() will be lost!). Indeed, the orte_* functions
simply become trivial wrappers to their opal_* counterparts. Note
that we have not tested this; the code is simple but it is quite
possible that we mucked something up.
= Filter Framework =
Messages sent via the new orte_* functions described above and
messages output via the IOF on the HNP will now optionally be passed
through a new "filter" framework before being output to
stdout/stderr. The "filter" OPAL MCA framework is intended to allow
preprocessing to messages before they are sent to their final
destinations. The first component that was written in the filter
framework was to create an XML stream, segregating all the messages
into different XML tags, etc. This will allow 3rd party tools to read
the stdout/stderr from the HNP and be able to know exactly what each
text message is (e.g., a help message, another OMPI infrastructure
message, stdout from the user process, stderr from the user process,
etc.).
Filtering is not active by default. Filter components must be
specifically requested, such as:
{{{
$ mpirun --mca filter xml ...
}}}
There can only be one filter component active.
= New MCA Parameters =
The new functionality described above introduces two new MCA
parameters:
* '''orte_base_help_aggregate''': Defaults to 1 (true), meaning that
help messages will be aggregated, as described above. If set to 0,
all help messages will be displayed, even if they are duplicates
(i.e., the original behavior).
* '''orte_base_show_output_recursions''': An MCA parameter to help
debug one of the known issues, described below. It is likely that
this MCA parameter will disappear before v1.3 final.
= Known Issues =
* The XML filter component is not complete. The current output from
this component is preliminary and not real XML. A bit more work
needs to be done to configure.m4 search for an appropriate XML
library/link it in/use it at run time.
* There are possible recursion loops in the orte_output() and
orte_show_help() functions -- e.g., if RML send calls orte_output()
or orte_show_help(). We have some ideas how to fix these, but
figured that it was ok to commit before feature freeze with known
issues. The code currently contains sub-optimal workarounds so
that this will not be a problem, but it would be good to actually
solve the problem rather than have hackish workarounds before v1.3 final.
This commit was SVN r18434.
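A small, hedged sketch of the conversion described above; the help file name, topic string, and message text are placeholders, and the argument pattern follows the opal_show_help() call used later in this file.
{{{
#include "orte/util/output.h"  /* replaces opal/util/output.h and opal/util/show_help.h */

    /* Before: opal_output(0, "...");  opal_show_help("help-foo.txt", "topic", true); */
    orte_output(0, "job-level message, routed to the HNP for output");
    orte_show_help("help-foo.txt", "topic", true);
}}}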
2008-05-14 00:00:55 +04:00
|
|
|
/* initialize the help system */
|
|
|
|
opal_show_help_init();
|
|
|
|
|
2005-08-22 07:05:39 +04:00
|
|
|
/* register handler for errnum -> string converstion */
|
2007-08-04 04:44:23 +04:00
|
|
|
if (OPAL_SUCCESS !=
|
|
|
|
(ret = opal_error_register("OPAL",
|
|
|
|
OPAL_ERR_BASE, OPAL_ERR_MAX, opal_err2str))) {
|
2005-10-05 17:56:35 +04:00
|
|
|
error = "opal_error_register";
|
2005-11-27 00:18:47 +03:00
|
|
|
goto return_error;
|
2005-10-05 17:56:35 +04:00
|
|
|
}
|
2005-08-25 00:19:36 +04:00
|
|
|
|
2007-04-21 04:15:05 +04:00
|
|
|
/* init the trace function */
|
|
|
|
opal_trace_init();
|
|
|
|
|
2006-01-16 04:48:03 +03:00
|
|
|
/* keyval lex-based parser */
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_util_keyval_parse_init())) {
|
|
|
|
error = "opal_util_keyval_parse_init";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2007-07-19 00:25:01 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_net_init())) {
|
|
|
|
error = "opal_net_init";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2006-01-16 04:48:03 +03:00
|
|
|
/* Setup the parameter system */
|
|
|
|
if (OPAL_SUCCESS != (ret = mca_base_param_init())) {
|
|
|
|
error = "mca_base_param_init";
|
2005-11-27 00:18:47 +03:00
|
|
|
goto return_error;
|
2005-10-05 17:56:35 +04:00
|
|
|
}
|
2005-05-19 17:33:55 +04:00
|
|
|
|
2006-01-11 07:36:39 +03:00
|
|
|
/* register params for opal */
|
2007-11-07 04:52:23 +03:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_register_params())) {
|
2006-01-11 07:36:39 +03:00
|
|
|
error = "opal_register_params";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2006-01-16 04:48:03 +03:00
|
|
|
/* pretty-print stack handlers */
|
2006-12-03 16:59:23 +03:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_util_register_stackhandlers())) {
|
|
|
|
error = "opal_util_register_stackhandlers";
|
2006-01-16 04:48:03 +03:00
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2007-04-23 22:53:47 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_util_init_sys_limits())) {
|
|
|
|
error = "opal_util_init_sys_limits";
|
|
|
|
goto return_error;
|
|
|
|
}
|
2007-08-04 04:44:23 +04:00
|
|
|
|
2009-08-03 20:46:33 +04:00
|
|
|
/* initialize the datatype engine */
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_datatype_init ())) {
|
|
|
|
error = "opal_datatype_init";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Initialize the data storage service. */
|
2008-02-28 04:57:57 +03:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_dss_open())) {
|
|
|
|
error = "opal_dss_open";
|
|
|
|
goto return_error;
|
|
|
|
}
|
2009-08-03 20:46:33 +04:00
|
|
|
|
2006-01-16 04:48:03 +03:00
|
|
|
return OPAL_SUCCESS;
|
|
|
|
|
|
|
|
return_error:
|
2008-03-06 17:44:47 +03:00
|
|
|
opal_show_help( "help-opal-runtime.txt",
|
2006-01-16 04:48:03 +03:00
|
|
|
"opal_init:startup:internal-failure", true,
|
|
|
|
error, ret );
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
int
|
2009-12-04 03:51:15 +03:00
|
|
|
opal_init(int* pargc, char*** pargv)
|
2006-01-16 04:48:03 +03:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
char *error = NULL;
|
|
|
|
|
2011-07-12 21:07:41 +04:00
|
|
|
if( ++opal_initialized != 1 ) {
|
|
|
|
if( opal_initialized < 1 ) {
|
|
|
|
return OPAL_ERROR;
|
|
|
|
}
|
2007-06-01 06:43:46 +04:00
|
|
|
return OPAL_SUCCESS;
|
|
|
|
}
|
|
|
|
|
2006-01-16 04:48:03 +03:00
|
|
|
/* initialize util code */
|
2009-12-04 03:51:15 +03:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_init_util(pargc, pargv))) {
|
2006-01-16 04:48:03 +03:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* initialize the mca */
|
|
|
|
if (OPAL_SUCCESS != (ret = mca_base_open())) {
|
|
|
|
error = "mca_base_open";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2011-09-11 23:02:24 +04:00
|
|
|
/* open hwloc - since this is a static framework, no
|
|
|
|
* select is required
|
|
|
|
*/
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_hwloc_base_open())) {
|
2011-11-02 22:24:19 +04:00
|
|
|
error = "opal_hwloc_base_open";
|
2011-09-11 23:02:24 +04:00
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2006-04-05 09:57:51 +04:00
|
|
|
/* the memcpy component should be one of the first to get
|
|
|
|
* loaded in order to make sure we do have all the available
|
|
|
|
* versions of memcpy correctly configured.
|
|
|
|
*/
|
|
|
|
if( OPAL_SUCCESS != (ret = opal_memcpy_base_open()) ) {
|
|
|
|
error = "opal_memcpy_base_open";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2005-08-14 21:23:34 +04:00
|
|
|
/* open the memory manager components. Memory hooks may be
|
|
|
|
triggered before this (any time after mem_free_init(),
|
|
|
|
actually). This is a hook available for memory manager hooks
|
|
|
|
without good initialization routine support */
|
2005-10-05 17:56:35 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_memory_base_open())) {
|
|
|
|
error = "opal_memory_base_open";
|
2005-11-27 00:18:47 +03:00
|
|
|
goto return_error;
|
2005-10-05 17:56:35 +04:00
|
|
|
}
|
2005-08-14 21:23:34 +04:00
|
|
|
|
2005-09-27 00:20:20 +04:00
|
|
|
/* initialize the memory manager / tracker */
|
2008-05-19 15:57:44 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_mem_hooks_init())) {
|
2006-12-03 16:59:23 +03:00
|
|
|
error = "opal_mem_hooks_init";
|
2005-11-27 00:18:47 +03:00
|
|
|
goto return_error;
|
2005-10-05 17:56:35 +04:00
|
|
|
}
|
2005-09-27 00:20:20 +04:00
|
|
|
|
2008-02-12 11:46:27 +03:00
|
|
|
/* initialize the memory checker, to allow early support for annotation */
|
2008-05-19 15:57:44 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_memchecker_base_open())) {
|
2008-02-12 11:46:27 +03:00
|
|
|
error = "opal_memchecker_base_open";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* select the memory checker */
|
2008-05-19 15:57:44 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_memchecker_base_select())) {
|
2008-02-12 11:46:27 +03:00
|
|
|
error = "opal_memchecker_base_select";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2006-09-26 03:41:06 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_backtrace_base_open())) {
|
|
|
|
error = "opal_backtrace_base_open";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
2005-10-05 17:56:35 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_timer_base_open())) {
|
|
|
|
error = "opal_timer_base_open";
|
2005-11-27 00:18:47 +03:00
|
|
|
goto return_error;
|
2005-10-05 17:56:35 +04:00
|
|
|
}
|
2006-01-11 07:36:39 +03:00
|
|
|
|
2007-05-25 01:54:58 +04:00
|
|
|
/*
|
|
|
|
* Need to start the event and progress engines if no one else is.
|
|
|
|
* opal_cr_init uses the progress engine, so it is lumped together
|
|
|
|
* into this set as well.
|
|
|
|
*/
|
|
|
|
/*
|
|
|
|
* Initialize the event library
|
|
|
|
*/
|
2010-10-24 22:35:54 +04:00
|
|
|
if (OPAL_SUCCESS != (ret = opal_event_base_open())) {
|
|
|
|
error = "opal_event_base_open";
|
2007-05-25 01:54:58 +04:00
|
|
|
goto return_error;
|
|
|
|
}
|
2007-03-17 02:11:45 +03:00
|
|
|
|
2007-05-25 01:54:58 +04:00
|
|
|
/*
|
2008-02-12 19:59:59 +03:00
|
|
|
* Initialize the general progress engine
|
2007-05-25 01:54:58 +04:00
|
|
|
*/
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_progress_init())) {
|
|
|
|
error = "opal_progress_init";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
/* we want to tick the event library whenever possible */
|
|
|
|
opal_progress_event_users_increment();
|
|
|
|
|
2011-06-21 19:41:57 +04:00
|
|
|
/* setup the shmem framework */
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_shmem_base_open())) {
|
|
|
|
error = "opal_shmem_base_open";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (OPAL_SUCCESS != (ret = opal_shmem_base_select())) {
|
|
|
|
error = "opal_shmem_base_select";
|
|
|
|
goto return_error;
|
|
|
|
}
|
|
|
|
|
#if OPAL_ENABLE_FT_CR == 1
    /*
     * Initialize the compression framework
     * Note: Currently only used in C/R so it has been marked to only
     * initialize when C/R is enabled. If other places in the code
     * wish to use this framework, it is safe to remove the protection.
     */
    if( OPAL_SUCCESS != (ret = opal_compress_base_open()) ) {
        error = "opal_compress_base_open";
        goto return_error;
    }

    if( OPAL_SUCCESS != (ret = opal_compress_base_select()) ) {
        error = "opal_compress_base_select";
        goto return_error;
    }
#endif

    /*
     * Initialize the checkpoint/restart functionality
     * Note: Always do this so we can detect if the user
     * attempts to checkpoint a non-checkpointable job,
     * otherwise the tools may hang or not clean up properly.
     */
    if (OPAL_SUCCESS != (ret = opal_cr_init())) {
        error = "opal_cr_init";
        goto return_error;
    }

    return OPAL_SUCCESS;

 return_error:
    opal_show_help( "help-opal-runtime.txt",
                    "opal_init:startup:internal-failure", true,
                    error, ret );

    return ret;
}
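For context, here is a minimal caller-side sketch of how a tool might drive this initialization path. It assumes the conventional opal_init(int *, char ***) and opal_finalize(void) prototypes from "opal/runtime/opal.h" and OPAL_SUCCESS from "opal/constants.h"; on the error branch, opal_init() has already printed the "opal_init:startup:internal-failure" show_help message seen above, so the caller only needs to propagate the return code.

#include "opal/runtime/opal.h"
#include "opal/constants.h"

int main(int argc, char *argv[])
{
    int ret;

    if (OPAL_SUCCESS != (ret = opal_init(&argc, &argv))) {
        /* opal_init() already emitted the show_help failure message. */
        return ret;
    }

    /* ... tool-specific work (e.g., restart or checkpoint handling) ... */

    if (OPAL_SUCCESS != (ret = opal_finalize())) {
        return ret;
    }
    return 0;
}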