1
1
openmpi/orte/mca/rmgr
Ralph Castain 4e79a51395 Add a job_info segment to the system that holds a container for each job. Within each container is a keyval indicating the job state (i.e., all procs at stage1, finalized, etc.). This provides a rough state-of-health for the job.
This required a little fiddling with a number of areas. Biggest problem was that it uncovered a potential for an infinite loop to be created in the registry. If a callback function modified the registry, the registry checked the triggers to see if anything had fired. Well, if the original callback was due to a trigger firing, that condition hadn't changed - so the trigger fired again....which caused the callback to be called, which modified the registry, which checked the triggers, etc. etc.

Triggers are now checked and then "flagged" as being "in process" so that the registry will NOT recheck that trigger until all callbacks have been processed. Tried doing this with subscriptions as well, but that caused a problem - when we release processes from a stagegate, they (at the moment) immediately place data on the registry that should cause a subscription to fire. Unfortunately, the system will just hang if that subscription doesn't get processed. So, I have left the subscription system alone - any callback function that modifies the registry in a fashion that will fire a subscription will indeed fire that subscription. We'll have to see if this causes problems - it shouldn't, but a careless user could lock things up if the callback generates a callback to itself.

Also fixed the code that placed a process' RML contact info on the registry to eliminate the leading '/' from the string.

This commit was SVN r6684.
2005-07-29 14:11:19 +00:00
..
base Add a job_info segment to the system that holds a container for each job. Within each container is a keyval indicating the job state (i.e., all procs at stage1, finalized, etc.). This provides a rough state-of-health for the job. 2005-07-29 14:11:19 +00:00
proxy First phase of the scalable RTE changes: 2005-07-18 18:49:00 +00:00
urm First phase of the scalable RTE changes: 2005-07-18 18:49:00 +00:00
Makefile.am More orte Makefile.am updates 2005-07-02 15:13:41 +00:00
rmgr_types.h * rename ompi_object and ompi_class to opal_object and opal_class 2005-07-03 16:06:07 +00:00
rmgr.h Oops -- rmgr should be in orte, not ompi. 2005-07-02 14:14:42 +00:00