This commit is a rework of the component repository. The changes
included in this commit are:
- Remove the component dependency code based off .ompi_info
files. This code is legacy code dating back 10 years that and is no
longer used.
- Move the plugin scanning code to the component repository. New
calls have been added to add new scanning paths, query available
components, and dlopen/load components.
- Pass the framework down to mca_base_component_find/filter. Eventually
the framework structure will be used to further validate components
before they are used.
- Add support to the MCA framework system to disable scanning for
dlopened components on open (support already existed in
register). This is really only relevant to installdirs as it has no
register function and no DSO components.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This adds a check at `make install` time to look for common symbols. It
attempts to ignore "Fortran-shaped" symbols by default. It also will
look in the source tree for any files named "common_sym_whitelist" and
will ignore any symbols listed in that file (one per line, comments
allowed).
See open-mpi/ompi#375 for more background.
This commit fixes a typo in mca_btl_vader_progress_endpoints where
OPAL_THREAD_LOCK was used when OPAL_THREAD_UNLOCK was intended.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
== Short version
Do not export special variables into the environment (e.g., LIBS,
LDFLAGS, etc.) when invoking subdir configure scripts. This prevents
problems described in open-mpi/ompi#471.
== More detail
Exporing special env variables before invoking a subdir configure
script causes problems in some cases. E.g., in open-mpi/ompi#471,
when the user configures with `--with-hwloc=/path/to/hwloc`, and that
directory is *not* in a default linker search location will cause the
libevent subdir configuration to fail.
This happens because:
1. We'll pass LIBS="-L/path/to/hwloc/lib -lhwloc" to the libevent
configure script
1. Meaning: configure-generated executables will link successfully
1. But unless LD_LIBRARY_PATH (or some other
tell-the-linker-where-to-find-things mechanism) includes
/path/to/hwloc/lib, the executable can't run.
Specifically, the libevent "hey, does the compiler generate proper
executables?" check will fail, and configure will abort (because OMPI
needs libevent).
I checked the history: exporting these vars dates all the way back to
LAM/MPI. I can't think of a reason why we need to export these
variables -- AC_CONFIG_SUBDIRs doesn't do it; subdir configure scripts
should be orthogonal from the upper-layer configure script (and its
variables). So let's remove these export statements and see if
anything breaks.
The oob/ud configure was not honoring the case
if the ompi is configured with --with-verbs=no.
This fixes that problems.
Fixes#522
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
OMPI *requires* a C99 compiler, so we don't need the AC_C_INLINE or
AC_C_RESTRICT tests anymore (because the C99 compiler guarantees to
support them). Indeed, in some cases (see open-mpi/ompi#491),
AC_C_INLINE gets the wrong answer and defines `inline` to be empty,
which screws up some systems (e.g., OS X with clang).
As an added bonus, we get to get rid of a bunch of gcc 2.96-specific
code (!) in configure.ac.
Based on some on-list and IM discussion with @hjelmn about
open-mpi/ompi@40b7643119, change the testing to a switch/case. If we
fall into the default case, assert() error (because it's an OMPI
developer programming error).
This commit fixes several vagrind errors. Included:
- installdirs did not correctly reinitialize all pointers to NULL
at close. This causes valgrind errors on a subsequent call to
opal_init_tool.
- several opal strings were leaked by opal_deregister_params which
was setting them to NULL instead of letting them be freed by the
MCA variable system.
- move opal_net_init to AFTER the variable system is initialized and
opal's MCA variables have been registered. opal_net_init uses a
variable registered by opal_register_params!
- do not leak ompi_mpi_main_thread when it is allocated by
MPI_T_init_thread.
- do not overwrite ompi_mpi_main_thread if it is already set (by
MPI_T_init_thread).
- mca_base_var: read_files was overwritting mca_base_var_file_list
even if it was non-NULL.
- mca_base_var: set all file global variables to initial states on
finalize.
- btl/vader: decrement enumerator reference count to ensure that it
is freed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a bug when opal_error_init is called with the same
values multiple times. If opal_error_init is called too many times it
will start failing with OPAL_ERR_OUT_OF_RESOURCE. To fix the problem
check if an existing convertor matching the requested one and return
that one instead.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
putenv requires that any string put into the environment is not
changed or freed. That is not the case with constant strings as they
will go away when dlclose is called on the component. Instead, just
use opal_setenv which does not have this restriction.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes the following bugs:
- opal_output_finalize did not properly set internal state. This
caused problems when calling the sequence opal_output_init (),
opal_output_finalize (), opal_output_init ().
- opal_info support called mca_base_open () but never called the
matching mca_base_close (). mca_base_open () and mca_base_close ()
have been updated to use a open count instead of an open flag to
allow mca_base_open to be called through multiple paths (as may be
the case when MPI_T is in use).
- orte_info support did not register opal variables. This can cause
orte-info to not return opal variables.
- opal_info, orte_info, and ompi_info support have been updated to
use a register count.
- When opening the dl framework the reference count was added to
ensure the framework stuck around. The framework being closed
prematurely was a bug in the MCA base that has since been
corrected. The increment (and associated decrement) have been
removed.
- dl/dlopen did not set the value of
mca_dl_dlopen_component.filename_suffixes_mca_storage on each call
to register. Instead the value was set in the component
structure. This caused the value to be lost when re-loading the
component. Fixed by setting the default value in register.
- Reset shmem framework state on close to avoid returning a stale
component after reloading opal/shmem.
- MCA base parameters were not properly deregistered when the MCA
base was closed.
This commit may fix#374.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The fragment flush code tries to send the active fragment before
sending any queued fragments. This could cause osc messages to arrive
out-of-order at the target (bad). Ensure ordering by alway sending
the active fragment after sending queued fragments.
This commit also fixes a bug when a synchronization message (unlock,
flush, complete) can not be packed at the end of an existing active
fragment. In this case the source process will end up sending 1 more
fragment than claimed in the synchronization message. To fix the issue
a check has been added that fixes the fragment count if this situation
is detected.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
linux: only use the device-tree on Power machines
It's available on ARM but the assumption that cpus' "reg" start at 0
is invalid.
We could make that work but the device-tree doesn't currently
bring anything better than sysfs on ARM, so don't bother for now.