openmpi/orte/mca/errmgr/base
Josh Hursey 8f45fcb429 More fixes for the C/R support. Fixes a couple bugs with the migration and autor features. The C/R functionality should be fully working now.
* Fix the checkpoint-restart-checkpoint case, which would previously reject the checkpoint of the newly restarted process. Re-enabling checkpointing once the application has fully restarted fixes this issue (make sure to set is_app_checkpointable to true on restart confirmation).
 * In the case of an invalid checkpoint, do not try to access the SStore datastore, as it will be using a dummy handler and will return NULL strings. mpirun was segfaulting in the error case because it was trying to convert the seq_num from a string to an integer.
 * Make sure to initialize the timer event in the Automatic Recovery section of the HNP errmgr, per the libevent update. This caused a segfault when attempting to recover a failed process.
 * If ompi-checkpoint loses its connection to the HNP/mpirun, the TCP socket will fail and call the ErrMgr update_state function. This commit adds a dummy function {{{orte_errmgr_base_update_state()}}} that prevents the ompi-checkpoint command from segfaulting in this error scenario.
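
The seq_num crash described above is a string-to-integer conversion attempted on a NULL string. A minimal sketch of a guarded conversion follows. The helper name parse_seq_num and its return convention are hypothetical, chosen for illustration only; this is not the code from this commit and does not use any ORTE API.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not part of ORTE): convert a checkpoint sequence
 * number from a string that may be NULL, e.g. when a dummy datastore
 * handle returned no data. Returns 0 on success, -1 otherwise. */
static int parse_seq_num(const char *seq_str, int *seq_num)
{
    char *end = NULL;
    long value;

    assert(NULL != seq_num);  /* caller must supply an output slot */

    /* Invalid checkpoint: there is nothing to convert, so bail out
     * before handing a NULL pointer to strtol(). */
    if (NULL == seq_str || '\0' == seq_str[0]) {
        return -1;
    }

    errno = 0;
    value = strtol(seq_str, &end, 10);

    /* Reject overflow, trailing garbage, and out-of-range values. */
    if (0 != errno || '\0' != *end || value < 0 || value > INT_MAX) {
        return -1;
    }

    *seq_num = (int)value;
    return 0;
}
```

A caller would check the return value and skip any further use of the sequence number when it is -1, rather than assuming the datastore always returns a valid string.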

This commit was SVN r24306.
2011-01-26 14:56:35 +00:00
..
base.h Few fault tolerance updates related to the CIFTS project (http://www.mcs.anl.gov/research/cifts/) 2011-01-13 20:13:49 +00:00
errmgr_base_close.c Simplification of the ErrMgr framework by removing the 'stack'/composite functionality. 2010-08-19 13:09:20 +00:00
errmgr_base_fns.c More fixes for the C/R support. Fixes a couple bugs with the migration and autor features. The C/R functionality should be fully working now. 2011-01-26 14:56:35 +00:00
errmgr_base_open.c Simplification of the ErrMgr framework by removing the 'stack'/composite functionality. 2010-08-19 13:09:20 +00:00
errmgr_base_select.c Simplification of the ErrMgr framework by removing the 'stack'/composite functionality. 2010-08-19 13:09:20 +00:00
errmgr_base_tool.c add notifier events for process migration 2010-11-16 17:57:44 +00:00
errmgr_private.h More fixes for the C/R support. Fixes a couple bugs with the migration and autor features. The C/R functionality should be fully working now. 2011-01-26 14:56:35 +00:00
Makefile.am A number of C/R enhancements per RFC below: 2010-08-10 20:51:11 +00:00