openmpi/orte/runtime
Josh Hursey 596062d34b Seems that the recent changes in the sds and oob exposed some invalid
assumptions in the FT restart code for the ORTE layer.

This fixes those problems by having the RML shut down and restart the
entire OOB framework (instead of just the module, as before).
This makes the restart code much easier to manage and to maintain as
the OOB changes in the future.

The SDS now does communication as part of its startup procedure, so
we need to make sure we restart the RML before the SDS so that it can
communicate properly.

OOB base [close|open] each used a static bool to determine whether they
had been called previously. I needed to expose this boolean so
that I can close() and then open() the OOB base in the restart procedure.
The functionality has not changed; we just now have the ability to
open/close the framework as many times as we need to, as long as we
always call them in that order. (So, as before, calling open() twice in a
row is not allowed; it is only allowed if you open(), close(), then open() again.)
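
A minimal sketch of the open/close guard described above, for illustration
only. It is not the actual OOB base code: the names fw_base_open,
fw_base_close, fw_base_is_opened and the FW_* return codes are hypothetical.
It also assumes that an out-of-order call returns an error; the real code may
instead treat a repeated open() as a no-op.

```c
#include <stdbool.h>

/* Hypothetical return codes, used only in this sketch. */
enum { FW_SUCCESS = 0, FW_ERROR = -1 };

/* Previously this flag would have been a static local inside open() or
 * close(). Exposing it at file scope lets both functions see and reset it. */
bool fw_base_is_opened = false;

/* Open the framework. Fails if it is already open. */
int fw_base_open(void)
{
    if (fw_base_is_opened) {
        return FW_ERROR;        /* open() twice in a row is not allowed */
    }
    /* ... register parameters and open components here ... */
    fw_base_is_opened = true;
    return FW_SUCCESS;
}

/* Close the framework. Fails if it is not currently open. */
int fw_base_close(void)
{
    if (!fw_base_is_opened) {
        return FW_ERROR;        /* nothing to close */
    }
    /* ... close components and release resources here ... */
    fw_base_is_opened = false;  /* a later open() is now legal again */
    return FW_SUCCESS;
}
```

With this guard, a restart path can call fw_base_close() followed by
fw_base_open() any number of times, while two open() calls in a row are
still rejected.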

Things seem to be working now.

This commit was SVN r14515.
2007-04-25 19:51:52 +00:00
..
help-orte-runtime.txt Complete modifications for failed-to-start of applications. Modifications for failed-to-start of orteds coming next. 2007-04-24 20:53:54 +00:00
Makefile.am Add a new function to wake orterun up - used in failed-to-start scenarios, but can be used anytime a lower level needs to ensure orterun wakes up 2007-04-23 12:49:25 +00:00
orte_abort.c Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
orte_cr.c Seems that the recent changes in the sds and oob exposed some invalid 2007-04-25 19:51:52 +00:00
orte_cr.h Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
orte_finalize.c * Before this commit, if we called ompi_mpi_abort() before MPI_INIT 2007-01-29 22:01:28 +00:00
orte_init_stage1.c Complete modifications for failed-to-start of applications. Modifications for failed-to-start of orteds coming next. 2007-04-24 20:53:54 +00:00
orte_init_stage2.c Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
orte_init.c Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
orte_monitor.c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. 2007-04-23 18:41:04 +00:00
orte_params.c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. 2007-04-23 18:41:04 +00:00
orte_restart.c Bring over the update to terminate orteds that are generated by a dynamic spawn such as comm_spawn. This introduces the concept of a job "family" - i.e., jobs that have a parent/child relationship. Comm_spawn'ed jobs have a parent (the one that spawned them). We track that relationship throughout the lineage - i.e., if a comm_spawned job in turn calls comm_spawn, then it has a parent (the one that spawned it) and a "root" job (the original job that started things). 2006-11-14 19:34:59 +00:00
orte_setup_hnp.c Revert commit r13235. 2007-01-22 06:46:58 +00:00
orte_setup_hnp.h And ORTE is ready for prime-time. All Windows tricks are in: 2006-08-23 03:32:36 +00:00
orte_system_finalize.c Merging in the jjhursey-ft-cr-stable branch (r13912 : HEAD). 2007-03-16 23:11:45 +00:00
orte_system_init.c Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
orte_universe_exists.c Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. 2007-04-23 18:41:04 +00:00
orte_wait.c Don't reset the pid, as at this point it is already set. 2007-04-05 20:13:50 +00:00
orte_wait.h And ORTE is ready for prime-time. All Windows tricks are in: 2006-08-23 03:32:36 +00:00
orte_wakeup.c Add a new function to wake orterun up - used in failed-to-start scenarios, but can be used anytime a lower level needs to ensure orterun wakes up 2007-04-23 12:49:25 +00:00
orte_wakeup.h Add a new function to wake orterun up - used in failed-to-start scenarios, but can be used anytime a lower level needs to ensure orterun wakes up 2007-04-23 12:49:25 +00:00
params.h Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. 2007-04-23 18:41:04 +00:00
runtime_internal.h Next step in the project split, mainly source code re-arranging 2006-02-12 01:33:29 +00:00
runtime_types.h Update the copyright notices for IU and UTK. 2005-11-05 19:57:48 +00:00
runtime.h Bring in the code for routing xcast stage gate messages via the local orteds. This code is inactive unless you specifically request it via an mca param oob_xcast_mode (can be set to "linear" or "direct"). Direct mode is the old standard method where we send messages directly to each MPI process. Linear mode sends the xcast message via the orteds, with the HNP sending the message to each orted directly. 2007-04-23 18:41:04 +00:00