Still not completely done: we need a better way of tracking which routed module is being used down in the OOB. For example, when a peer drops its connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
Take another shot at untangling the spaghetti
orterun: fix for command line parsing
orte-submit calls opal_init_util() before parsing out the MCA command line
options (-mca, -am, etc). This prevents mpirun from setting opal MCA
variables for some frameworks as well as the MCA base. This is because
when a framework is opened all of its variables are set to read-only.
Eventually we want to lift this restriction on some MCA variables but
since -mca is affected we must parse out the MCA command line options
before opal_init_util(). This commit fixes the bug by adding a new
option to opal_cmd_line_parse (ignore unknown option) so orte-submit
can pre-parse the command line for MCA options.
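The pre-parse idea can be sketched as below. This is a minimal
illustration, not the opal_cmd_line_parse implementation: the names
mca_pair and preparse_mca are hypothetical, and only the -mca/--mca
spelling is handled. Anything the first pass does not recognize is
skipped rather than reported as an error.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one "-mca <name> <value>" pair. */
struct mca_pair {
    const char *name;
    const char *value;
};

/* Scan argv once, collect -mca/--mca pairs, and skip everything else
 * instead of treating it as an error. Returns the number of pairs. */
static int preparse_mca(int argc, char **argv, struct mca_pair *out, int max_out)
{
    int found = 0;
    for (int i = 1; i < argc; ++i) {
        int is_mca = (0 == strcmp(argv[i], "-mca")) ||
                     (0 == strcmp(argv[i], "--mca"));
        if (!is_mca) {
            continue;           /* unknown option: ignore it in this pass */
        }
        if (i + 2 >= argc || found >= max_out) {
            break;              /* incomplete pair or no room left */
        }
        out[found].name = argv[i + 1];
        out[found].value = argv[i + 2];
        ++found;
        i += 2;                 /* step over the name and the value */
    }
    return found;
}
```

The collected pairs would be applied before the utility layer is
initialized; the full command line is then parsed as usual.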
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Minor cleanups to avoid releasing/recreating the cmd line
This commit adds a new type of enumerator meant to support flag
values. The enumerator parses comma-delimited strings and matches
each string or value to a list of valid flags. Additionally, the
enumerator does some basic checks to see 1) whether a flag is valid in
the enumerator, and 2) whether any conflicting flags are specified.
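A rough sketch of the parsing described above, under stated
assumptions: flag_def and parse_flags are hypothetical names, only
flag names (not numeric values) are matched, and conflicts are
expressed as a bit mask per flag. It is not the MCA enumerator code.

```c
#include <string.h>

/* Hypothetical flag description: name, bit, and the bits it conflicts with. */
struct flag_def {
    const char *name;
    unsigned int bit;
    unsigned int conflicts;
};

/* Parse a comma-delimited list of flag names into a bit mask.
 * Returns 0 on success, -1 for an unknown flag or oversized input,
 * and -2 when two conflicting flags are specified. */
static int parse_flags(const char *input, const struct flag_def *defs,
                       int ndefs, unsigned int *result)
{
    char buf[256];
    unsigned int value = 0, banned = 0;

    if (strlen(input) >= sizeof(buf)) {
        return -1;
    }
    strcpy(buf, input);         /* strtok modifies its argument */

    for (char *tok = strtok(buf, ","); NULL != tok; tok = strtok(NULL, ",")) {
        int i;
        for (i = 0; i < ndefs; ++i) {
            if (0 == strcmp(tok, defs[i].name)) {
                break;
            }
        }
        if (i == ndefs) {
            return -1;          /* not a valid flag in this enumerator */
        }
        if ((defs[i].bit & banned) || (defs[i].conflicts & value)) {
            return -2;          /* conflicts with a flag already seen */
        }
        value |= defs[i].bit;
        banned |= defs[i].conflicts;
    }
    *result = value;
    return 0;
}
```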
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Up until this point we have had inconsistent usage for MCA verbosity
levels. This commit attempts to correct this by recommending
components use these standard levels: none (0), error (1), warn (10),
info (20), debug (40), and trace (60).
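For illustration, the recommended levels could be written down as an
enum. The identifiers below are hypothetical; only the numeric values
come from this message, and the comparison rule is an assumption about
how a component might use them.

```c
/* Hypothetical names; only the numeric levels come from the text above. */
enum sketch_verbosity {
    SKETCH_VERBOSE_NONE  = 0,
    SKETCH_VERBOSE_ERROR = 1,
    SKETCH_VERBOSE_WARN  = 10,
    SKETCH_VERBOSE_INFO  = 20,
    SKETCH_VERBOSE_DEBUG = 40,
    SKETCH_VERBOSE_TRACE = 60
};

/* Assumed rule: emit a message only when the configured verbosity is
 * at least the level the message was tagged with. */
static int sketch_should_log(int configured, enum sketch_verbosity message_level)
{
    return configured >= (int) message_level;
}
```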
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
CID 1047278 Unchecked return value
Updated check for mca_base_var_generate_full_name4 to match other
checks. Logically equivalent to the old check. Not a bug.
CID 1196685 Dereference null return
Added check for NULL when looking up the original variable for a
synonym.
CID 1269705 Logically dead code
Removed code that set the project to NULL. Code was intended to be
removed with an earlier commit that added the project name into the
component structure. Added code to actually support searching for a
group with a wildcard ('*').
CID 1292739 Dereference null return
CID 1269819 Dereference null return
Removed unnecessary string duplication and strchr.
CID 1287030 Logically dead code
Refactored fixup_files code and confirmed that the code in question is
not reachable. Removed the dead code.
CID 1292740 Use of untrusted string
Use strdup to silence coverity warning.
CID 1294413 Free of address-of expression
Reset mitem to NULL after the OPAL_LIST_FOREACH loop to ensure we
never try to free the list sentinel.
CID 1294414 Unchecked return value
Use (void) to indicate we do not care about the return code in this
instance.
CID 1294415 Resource leak
On error, free the base pointer.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes synonyms so the source file is correctly printed out
by ompi_info. This commit also adds support for printing out the line
number where the variable is set.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several valgrind errors. Included:
- installdirs did not correctly reinitialize all pointers to NULL
at close. This causes valgrind errors on a subsequent call to
opal_init_tool.
- several opal strings were leaked by opal_deregister_params which
was setting them to NULL instead of letting them be freed by the
MCA variable system.
- move opal_net_init to AFTER the variable system is initialized and
opal's MCA variables have been registered. opal_net_init uses a
variable registered by opal_register_params!
- do not leak ompi_mpi_main_thread when it is allocated by
MPI_T_init_thread.
- do not overwrite ompi_mpi_main_thread if it is already set (by
MPI_T_init_thread).
- mca_base_var: read_files was overwriting mca_base_var_file_list
even if it was non-NULL.
- mca_base_var: set all file global variables to initial states on
finalize.
- btl/vader: decrement enumerator reference count to ensure that it
is freed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for project_framework_component_* parameter
matching. This is the first step in allowing the same framework name
in multiple projects. This change also bumps the MCA component version
to 2.1.0.
All master frameworks have been updated to use the new component
versioning macro. An mca.h has been added to each project to add a
project specific versioning macro of the form
PROJECT_MCA_VERSION_2_1_0.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
There were a number of bugs in the framework/variable code that
affected deregistration:
- Frameworks could be erroneously closed if separately registered and
opened then subsequently closed. This was a bug in the original
design which only reference counted opens but not
registrations. This would cause undefined behavior if
MPI_T_finalize actually calls ompi_info_close_components as
intended. Now both registrations and opens are reference counted
and frameworks/components are not torn down until the matching
number of close calls have been made.
- group_find_by_name did not pass the invalidok flags down
to mca_base_var_group_get_internal correctly.
- Group deregistration caused the group to be completely reset. This
does not match the behavior required by MPI_T as it could reduce
the number of variables/subgroups in a group.
This commit also updates MPI_T_finalize to call
ompi_info_close_components as originally intended.
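The reference-counting rule from the first item above can be sketched
as follows. This is a simplified model with hypothetical names, using
one counter shared by registrations and opens; it does not reproduce
the framework code.

```c
/* Hypothetical, simplified framework state. */
struct sketch_framework {
    int refcount;   /* outstanding registrations plus opens */
    int is_set_up;  /* nonzero while the framework's resources are live */
};

/* Both register and open take a reference. */
static void sketch_register(struct sketch_framework *fw)
{
    fw->refcount++;
    fw->is_set_up = 1;
}

static void sketch_open(struct sketch_framework *fw)
{
    fw->refcount++;
    fw->is_set_up = 1;
}

/* Drop one reference. Tear down only when the last one goes away.
 * Returns 1 if this call performed the teardown, 0 otherwise. */
static int sketch_close(struct sketch_framework *fw)
{
    if (0 == fw->refcount) {
        return 0;               /* unmatched close: nothing to do */
    }
    if (--fw->refcount > 0) {
        return 0;               /* still referenced elsewhere */
    }
    fw->is_set_up = 0;
    return 1;
}
```

With this rule, a framework that was registered once and opened once
survives the first close and is torn down by the second.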
Closes #374
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Coverity alerted us to the fact that there are places where
the synonym_for param is hard-coded to -1 when calling
register_variable(). It would be a coding error if synonym_for==-1
and (flags & MCA_BASE_VAR_FLAG_SYNONYM)>0, so let's add that to the
debug-only check at the top of the function.
This was CID 993717.
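A minimal sketch of such a debug-only check. The flag value and both
function names are hypothetical stand-ins; the real register_variable()
takes more arguments than shown here.

```c
#include <assert.h>

#define SKETCH_VAR_FLAG_SYNONYM 0x0001u   /* hypothetical flag bit */

/* Returns nonzero when the argument combination is consistent:
 * a variable flagged as a synonym must name the variable it aliases.
 * A negative index covers the hard-coded -1 case. */
static int sketch_synonym_args_ok(int synonym_for, unsigned int flags)
{
    return !((flags & SKETCH_VAR_FLAG_SYNONYM) && synonym_for < 0);
}

static int sketch_register_variable(const char *name, int synonym_for,
                                    unsigned int flags)
{
    /* Debug-only: compiled out when NDEBUG is defined. */
    assert(sketch_synonym_args_ok(synonym_for, flags));
    (void) name;
    return 0;
}
```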
These two macros set the MCA prefix and MCA cmd line id,
respectively. Specifically, MCA parameters will be named
PREFIX<foo> in the environment, and the cmd line will use
-ID foo bar.
These macros must be called during configure.ac and a value
supplied. In the case of Open MPI, the values given are
PREFIX=OMPI_MCA_ and ID=mca.
Other projects (such as ORCM) will call these macros with
their own unique values. For example, ORCM uses PREFIX=ORCM_MCA_
and ID=omca
This scheme is necessary to allow running Open MPI applications under
systems that use their own versions of ORTE and OPAL. For example,
when running OMPI applications under ORCM, we need the MCA params passed
to the ORCM daemons to be separated from those recognized by the OMPI application.
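To make the naming scheme concrete, here is a small sketch of how an
environment variable name could be formed from a project prefix. The
helper sketch_env_name is hypothetical; only the prefix values
OMPI_MCA_ and ORCM_MCA_ come from this message.

```c
#include <stdio.h>

/* Build "<prefix><param>" into buf. Returns 0 on success, -1 if the
 * result does not fit. */
static int sketch_env_name(char *buf, size_t len, const char *prefix,
                           const char *param)
{
    int n = snprintf(buf, len, "%s%s", prefix, param);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

For example, prefix "OMPI_MCA_" and parameter "foo" give OMPI_MCA_foo,
while the same parameter under "ORCM_MCA_" gives ORCM_MCA_foo, so the
two projects do not read each other's settings.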
Add a debugging check that ensures that the registered storage is
aligned appropriately for the type that is specified.
When we know that the storage is properly aligned, we can cast the
mbv_storage to the appropriate type and then simply do the assignment.
We used to do this assignment via a union, but clang's
-fsanitize=alignment complained about this.
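A sketch of the check-then-cast pattern, with hypothetical helper
names. It assumes, for simplicity, that the required alignment of int
equals sizeof(int), which holds on common platforms but is not
guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if ptr is a multiple of alignment. */
static int sketch_is_aligned(const void *ptr, size_t alignment)
{
    return 0 == ((uintptr_t) ptr % alignment);
}

/* Store through a typed pointer only after confirming alignment.
 * Returns 0 on success, -1 if the storage is misaligned. */
static int sketch_store_int(void *storage, int value)
{
    if (!sketch_is_aligned(storage, sizeof(int))) {
        return -1;              /* assumption: alignment == sizeof(int) */
    }
    *(int *) storage = value;
    return 0;
}
```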
This commit was SVN r32716.
Add a new flag to ompi_info that allows a user to print all MCA variables of a specific type.
--type version_string
This command will print all MCA variables of type version_string.
This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.
This commit was SVN r32166.
mpirun ... -x env_foo1=val1 -x env_foo2 -x env_foo3=val3 should now be expressed as
mpirun ... -mca mca_base_env_list env_foo1=val1+env_foo2+env_foo3=val3.
The motivation for doing this is so that a list of environment variables may be set via standard MCA mechanisms such as mca parameter files, amca lists, etc.
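The list format shown above can be split as in the following sketch.
The names env_entry and parse_env_list are hypothetical, the '+'
delimiter is taken from the example in this message, and an entry
without '=' is only recorded as having no value (what the launcher
does with it is outside this sketch).

```c
#include <string.h>

/* Hypothetical parsed entry from a "+"-delimited environment list. */
struct env_entry {
    char name[64];
    char value[64];
    int has_value;      /* 0 when the entry was just "NAME" */
};

/* Split "a=1+b+c=3" into entries. Returns the number of entries
 * stored, or -1 on an empty or oversized name/value. */
static int parse_env_list(const char *list, struct env_entry *out, int max_out)
{
    int n = 0;
    const char *p = list;

    while ('\0' != *p && n < max_out) {
        size_t len = strcspn(p, "+");            /* length of this entry */
        const char *eq = memchr(p, '=', len);
        size_t name_len = eq ? (size_t) (eq - p) : len;
        size_t value_len = eq ? len - name_len - 1 : 0;

        if (0 == name_len || name_len >= sizeof(out[n].name) ||
            value_len >= sizeof(out[n].value)) {
            return -1;
        }
        memcpy(out[n].name, p, name_len);
        out[n].name[name_len] = '\0';
        if (eq) {
            memcpy(out[n].value, eq + 1, value_len);
        }
        out[n].value[value_len] = '\0';
        out[n].has_value = (NULL != eq);
        ++n;

        p += len;
        if ('+' == *p) {
            ++p;                                 /* skip the delimiter */
        }
    }
    return n;
}
```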
This feature was developed by Elena Shipunova and was reviewed by Josh Ladd.
This commit was SVN r32163.
After discussion with Nathan, change the ERR_VALUE_OUT_OF_BOUNDS to be
ERR_NOT_FOUND, for two reasons:
1. It's consistent with other uses of ERR_NOT_FOUND in the code.
2. In this case, we could have just looked up a variable that is
basically a "hole" -- e.g., var indexes 0, 1, 2 are valid, and 4,
5, and 6 are valid, but 3 is invalid (e.g., 3 was de-registered --
remember that MPI_T explicitly does not allow re-using indexes).
So returning ERR_VALUE_OUT_OF_BOUNDS seems weird -- returning
ERR_NOT_FOUND seems a bit more natural.
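The "hole" case can be illustrated with a small table lookup. All
names and error values below are hypothetical, and treating an index
outside the table the same way as a hole is an assumption of this
sketch, not something this message states.

```c
#include <stddef.h>

enum {
    SKETCH_SUCCESS = 0,
    SKETCH_ERR_NOT_FOUND = -1
};

/* Hypothetical variable slot; valid == 0 marks a de-registered hole. */
struct sketch_var {
    const char *name;
    int valid;
};

/* Look up a variable by index. Indexes are never reused, so a
 * de-registered slot stays in the table as a hole. */
static int sketch_var_get(const struct sketch_var *table, int count,
                          int index, const struct sketch_var **out)
{
    if (index < 0 || index >= count || !table[index].valid) {
        return SKETCH_ERR_NOT_FOUND;
    }
    *out = &table[index];
    return SKETCH_SUCCESS;
}
```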
cmr=v1.8.2:ticket=trac:4587
This commit was SVN r32158.
The following Trac tickets were found above:
Ticket 4587 --> https://svn.open-mpi.org/trac/ompi/ticket/4587
Several fixes:
- I was allowing an MPI_T_cvar_handle to be created for an invalid
variable. Fixed this by checking if the variable is valid in
mca_base_var_get.
- Use a better error code when the caller tries to create an unbound
pvar handle for a bound variable.
- Return the verbosity level in MPI_T_cvar_get_info.
cmr=v1.8.2:reviewer=jsquyres
This commit was SVN r31576.
Add the -mca base_env_list "var1=val1 var2=val2 ..." MCA parameter, which can be used in MCA param files
or with -am app.conf on the mpirun command line to set rank environment variables via the MCA mechanism.
fixed by Elena, reviewed by Miked
cmr=v1.8.1:reviewer=ompi-rm1.8
This commit was SVN r31302.
The original code was passing a union by value, and doing odd things
on Solaris/SPARC (where "odd" rhymes with "SIGBUS"). Replace it with
an exploded switch/case block for all the enum values. Also use the
string literals so that we get compiler checking of the format string
vs. the type of the actual arguments.
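The shape of the replacement can be sketched like this. The enum and
function names are hypothetical and only a few types are shown; the
point is that each case reads the storage through a typed pointer and
uses a literal format string, so the compiler can check the format
against the argument.

```c
#include <stdio.h>

/* Hypothetical subset of variable types. */
enum sketch_var_type {
    SKETCH_TYPE_INT,
    SKETCH_TYPE_UNSIGNED_LONG,
    SKETCH_TYPE_DOUBLE,
    SKETCH_TYPE_STRING
};

/* Format the value behind storage according to its type. No union is
 * passed by value; the storage is read through a pointer of the
 * matching type. Returns the snprintf result, or -1 for an unknown type. */
static int sketch_value_to_string(enum sketch_var_type type,
                                  const void *storage, char *buf, size_t len)
{
    switch (type) {
    case SKETCH_TYPE_INT:
        return snprintf(buf, len, "%d", *(const int *) storage);
    case SKETCH_TYPE_UNSIGNED_LONG:
        return snprintf(buf, len, "%lu", *(const unsigned long *) storage);
    case SKETCH_TYPE_DOUBLE:
        return snprintf(buf, len, "%g", *(const double *) storage);
    case SKETCH_TYPE_STRING:
        return snprintf(buf, len, "%s", *(const char *const *) storage);
    }
    return -1;
}
```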
cmr=v1.7.4:reviewer=hjelmn:subject=Fix MCA base var to not pass union by value
This commit was SVN r30276.
variable names for deprecated variables.
Closes trac:3270
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r30275.
The following Trac tickets were found above:
Ticket 3270 --> https://svn.open-mpi.org/trac/ompi/ticket/3270
configury/Makefile.am changes; this commit renames the internal
installdirs.h framework struct field names to match the configury macro
names:
* pkgdatadir -> ompidatadir
* pkglibdir -> ompilibdir
* pkgincludedir -> ompiincludedir
This commit was SVN r30145.
The following SVN revision numbers were found above:
r30140 --> open-mpi/ompi@8b778903d8
- Use ->boolval for booleans when creating a string.
- Solaris has some issue with the ?: used in one of the find functions. Use an if instead.
- Change all instances of index -> vari to avoid issues with redefining index.
cmr=v1.7.4:reviewer=jsquyres
This commit was SVN r29997.