Features:
- Support for an override parameter file (openmpi-mca-param-override.conf).
Variable values in this file can not be overridden by any file or environment
value.
- Support for boolean, unsigned, and unsigned long long variables.
- Support for true/false values.
- Support for enumerations on integer variables.
- Support for MPIT scope, verbosity, and binding.
- Support for command line source.
- Support for setting variable source via the environment using
OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
- Cleaner API.
- Support for variable groups (equivalent to MPIT categories).
Notes:
- Variables must be created with a backing store (char **, int *, or bool *)
that must live at least as long as the variable.
- Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
mca_base_var_set_value() to change the value.
- String values are duplicated when the variable is registered. It is up to
the caller to free the original value if necessary. The new value will be
freed by the mca_base_var system and must not be freed by the user.
- Variables with constant scope may not be settable.
- Variable groups (and all associated variables) are deregistered when the
component is closed or the component repository item is freed. This
prevents a segmentation fault from accessing a variable after its component
is unloaded.
- After some discussion we decided we should remove the automatic registration
of component priority variables. Few component actually made use of this
feature.
- The enumerator interface was updated to be general enough to handle
future uses of the interface.
- The code to generate ompi_info output has been moved into the MCA variable
system. See mca_base_var_dump().
opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system
This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.
This commit was SVN r28236.
Add hooks for supporting dynamic allocation and deallocation to support application-driven requests and fault recovery operations.
This commit was SVN r27879.
Reasoning: The old behavior was a little confusing. mca_base_components_open does not open an output stream so it is a little unexpected that mca_base_components_close does. To add to this several frameworks (that don't use mca_base_components_close) failed to close their output in the framework close function and others closed their output a second time. This change is an improvement to the symantics of mca_base_components_open/close as they are now symetric in their functionality.
This commit was SVN r27570.
gone. So exit with 0, not ORTE_ERROR_DEFAULT_EXIT_CODE (which is 1).
This fixes a race condition in the rsh launcher upon termination,
where ORTE would sometimes think that a daemon failed to launch.
This commit was SVN r26935.
* Fix a typo introduced by the removal of the notifier framework
* Fix to flush the modex cached data correctly using the orte DB API.
This commit was SVN r26773.
Among other things, this patch deals with the following issues:
* fix ompi-checkpoint argument parsing
* ompi-restart -showme prints an extraneous "Restarted child with PID"
message. Move around the debug statement to avoid this.
* fixes for the state machine changes
This commit was SVN r26770.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.
Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.
This commit was SVN r26242.
So provide a new parameter (can't have too many!) that handles this situation by stripping the prefix from the returned node name. Also do a little cleanup to ensure we cleanly exit from errors, without generating too many annoying messages.
This commit was SVN r25562.
Brian dealt with this in the past by creating platform files and using "no-build" to block the components. This was clunky, but acceptable when only one organization was using that option. However, that number has now expanded to at least two more locations.
Accordingly, make --without-rte-support actually work by adding appropriate configury to prevent components from building when they shouldn't. While doing so, remove two frameworks (db and rmcast) that are no longer used as ORCM comes to a close (besides, they belonged in ORCM now anyway). Do some minor cleanups along the way.
This commit was SVN r25497.
Modify the configure logic and the PMI components to accommodate Cray's approach. Refactor the PMI error reporting code so it resides in only one place. Cray actually decided -not- to define the PMI-2 error codes, so we have to use the PMI-1 codes instead. More fun.
This commit was SVN r25348.
If the errmgr is going to try and hold the orted until all routes and children are gone, then the exit cmd must do the same. Otherwise, the orted exits immediately without waiting for routes to be dismantled, which is why we don't see the connections close.
Also cleanup some diagnostics and add some debug to more clearly see what's going on.
This commit was SVN r25321.
termination condition is to be checked at the daemon/HNP level not down
in the routing.
This commit was SVN r25313.
The following SVN revision numbers were found above:
r25248 --> open-mpi/ompi@b42ccc89b8
1 otherwise. It was doing the opposite, so this patch fixes the
return values. All uses (all in ORTE) used the actual return values,
not the documented values, so fix them as well.
This commit was SVN r25257.
To enable the epochs and the resilient orte code, use the configure flag:
--enable-resilient-orte
This will define both:
ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE
This commit was SVN r25093.
Correctly checks that the orted's job is completed.
Correctly tests to make sure that there is shutdown going on (doesn't rely on orte_orteds_term_ordered).
Adds a patch from Ralph to correctdly check the status of processes.
This commit was SVN r24962.
still unstable. Reverted errmgr modules back to the original errmgr (with the
updates since the resilient code was brought into the trunk).
This commit was SVN r24958.