With Open MPI 5.0, the decision was made to stop building
3rd-party packages, such as Libevent, HWLOC, PMIx, and PRRTE, as
MCA components and instead 1) rely on external libraries
whenever possible and 2) build the 3rd-party
libraries (if needed) as independent libraries, rather than
linking them into libopen-pal.
This patch moves libevent from an MCA framework to a stand-alone
library built outside of OPAL. A wrapper in opal/util is provided
to minimize the unnecessary changes in the rest of the code. When
using the internal Libevent, it will be installed as a stand-alone
libevent.a, instead of bundled in OPAL. Any pre-installed version
of Libevent at or after 2.0.21 is preferred over the internal
version.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Add a framework to support different types of threading models including
user space thread packages such as Qthreads and Argobots:
https://github.com/pmodels/argobots
https://github.com/Qthreads/qthreads
The default threading model is pthreads. Alternate thread models are
specified at configure time using the --with-threads=X option.
The framework is static. The threading model to use is selected at
Open MPI configure/build time.
mca/threads: implement Argobots threading layer
config: fix thread configury
- Add double quotations
- Change Argobot to Argobots
config: implement Argobots check
If the poll time is too long, MPI hangs.
This quick fix just sets it to 0, but it is not good for the
Pthreads version. Need to find a good way to abstract it.
Note that even 1 (= 1 millisecond) causes disastrous performance
degradation.
rework threads MCA framework configury
It now works more like the ompi/mca/rte configury,
modulo some edge items that are special for threading package
linking, etc.
qthreads module
some argobots cleanup
Signed-off-by: Noah Evans <noah.evans@gmail.com>
Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov>
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
OMPI can only support PMIx v3 and above. PRRTE requires at least PMIx
v4, so protect against the case where OMPI is built against an external
PMIx v3.
Fix check of PMIx_Init return code for singleton operations.
Ensure that the PMIx framework gets properly opened.
Signed-off-by: Ralph Castain <rhc@pmix.org>
Before this change, the reference counters `opal_util_initialized`
and `opal_initialized` were incremented at the beginning of the
`opal_init_util` and the `opal_init` functions respectively.
In other words, they were incremented before initialization had completed.
This causes the following program to abort with SIGFPE if
`--enable-timing` is passed to `configure`.
```c
// need -lm option on link
#define _GNU_SOURCE  // feenableexcept is a GNU extension
#include <fenv.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    // raise SIGFPE on division-by-zero
    feenableexcept(FE_DIVBYZERO);
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    return 0;
}
```
The logic of the SIGFPE is:
1. `MPI_Init` calls `opal_init` through `ompi_rte_init`.
2. `opal_init` changes the value of `opal_initialized` to 1.
3. `opal_init` calls `opal_init_util`.
4. `opal_init_util` calls `opal_timing_ts_func` through
`OPAL_TIMING_ENV_INIT`, and `opal_timing_ts_func` returns
`get_ts_cycle` instead of `get_ts_gettimeofday` because
`opal_initialized` is already 1.
(This is the problem.)
5. `opal_init_util` calls `get_ts_cycle` through
`OPAL_TIMING_ENV_INIT`.
6. `get_ts_cycle` executes
`opal_timer_base_get_cycles() / opal_timer_base_get_freq()`
and it raises SIGFPE (division-by-zero) because the OPAL TIMER
framework is not initialized yet and `opal_timer_base_get_freq`
returns 0.
This commit changes the increment timing of `opal_util_initialized`
and `opal_initialized` to the end of `opal_init_util` and the
`opal_init` functions respectively.
Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
We initially thought it was a safe bet that opal_gethostname() would
never be called before opal_init(). However, it turns out that there
are some cases -- e.g., developer debugging -- where it is useful to
call opal_output() (which calls opal_gethostname()) before
opal_init().
Hence, we need to guarantee that opal_gethostname() always returns a
valid value. If opal_gethostname() finds NULL in
opal_process_info.nodename, simply call the internal function to
initialize opal_process_info.nodename.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The opal_gethostname() function provides a more robust mechanism
to retrieve the hostname than gethostname(), which can return
results that are not null-terminated, and which can vary in its
behavior from system to system.
opal_gethostname() just returns the value in opal_process_info.nodename;
this is populated in opal_init_gethostname() inside opal_init.c.
-Changed all gethostname calls in opal subtree to opal_gethostname
-Changed all gethostname calls in orte subtree to opal_gethostname
-Changed all gethostname calls in ompi subdir to opal_gethostname
-Changed all gethostname calls in oshmem subdir to opal_gethostname
-Changed opal_if.c in test subdir to use opal_gethostname
-Changed opal_init.c to include opal_init_gethostname. This function
returns an int and directly sets opal_process_info.nodename per
jsquyres' modifications.
Relates to open-mpi#6801
Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.
One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.
Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".
This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework previously held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.
However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.
Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.
Included changes:
* Consolidate the compression code into the opal/compress framework
* Move the ORTE zlib compression code into a new opal/compress/zlib
component
* Ignore the bzip and gzip components in opal/compress framework
* Add a "compress_base_limit" MCA param to set the threshold above which
we compress data - defaults to 4096 bytes
* Delete stale brucks and rcd components from orte/grpcomm framework
* Delete the orte/regx framework
* Update the launch system to use opal/compress instead of string regex
* Provide a default module if no zlib is available
* Fix some misc multi-node issues
* Properly generate the nidmap in response to a "connection warmup"
message so the remote daemon knows the children it needs to launch.
* Remove stale references to orte_node_regex
* opal_byte_object_t's are not OPAL objects - properly release allocated
memory.
* Set the topology
* Currently only handling homogeneous case
* Update the compress framework files to conform
* Consolidate open/close into one "frame" file. Ensure we open/close the
framework
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit fixes the ordering of the teardown for
opal_finalize_util. The installdirs and if frameworks need to come
down before the MCA system.
Fixes #6259
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit contains the following changes:
- Remove the unused opal_test_init/opal_test_finalize
functions. These functions are not used by anything in the code
base or MTT. Tests use opal_init_util/opal_finalize_util instead.
- Get rid of gotos in opal_init_util and opal_init. Replaced them
with a cleaner solution.
- Automatically register cleanup functions in init functions. The
cleanup functions are executed in the reverse order of the
initialization functions. The cleanup functions are run in
opal_finalize_util() before tearing down the class system.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":
* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch
* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.
* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.
* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allow envars to be excluded.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Initialize the reachable framework during opal_init() and tear
it back down during opal_finalize(). The framework was never
used, so the lack of initialization didn't matter, but this is
a required step in using the framework.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Still in the "needs to be done" category:
* mapping/ranking/binding options aren't correctly supported
* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
There is no such thing as pthread_join(main_thread), so key destructors
are never invoked on the main thread, which causes valgrind to report
some memory leaks. Manually store and then invoke the key destructors to
make valgrind happy.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
The class system can be initialized/finalized as many times as we like,
so there is no longer any need to invoke opal_class_finalize() in a destructor.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some
subsequent phone calls and off-issue email discussions), the PSM
library is hijacking signal handlers by default. Specifically: unless
the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel
TrueScale) is set, the library constructor for this library will
hijack various signal handlers for the purpose of invoking its own
error reporting mechanisms.
This may be a bit *surprising*, but is not a *problem*, per se. The
real problem is that older versions of at least the PSM library do not
unregister these signal handlers upon being unloaded from memory.
Hence, a segv can actually result in a double segv (i.e., the original
segv and then another segv when the now-non-existent signal handler is
invoked).
This PSM signal hijacking subverts Open MPI's own signal reporting
mechanism, which may be a bit surprising for some users (particularly
those who do not have Intel TrueScale). As such, we disable it by
default so that Open MPI's own error-reporting mechanisms are used.
Additionally, there is a typo in the library destructor for the PSM2
library that may cause problems in the unloading of its signal
handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1`
(for PSM2 / Intel OmniPath).
This is further compounded by the fact that the PSM / PSM2 libraries
can be loaded by the OFI MTL and the usNIC BTL (because they are
loaded by libfabric), even when there is no Intel networking hardware
present. Having the PSM/PSM2 libraries behave this way when no Intel
hardware is present is clearly undesirable (and is likely to be fixed
in future releases of the PSM/PSM2 libraries).
This commit sets the following two environment variables to disable
this behavior from the PSM/PSM2 libraries (if they are not already
set):
* IPATH_NO_BACKTRACE=1
* HFI_NO_BACKTRACE=1
If the user has set these variables before invoking Open MPI, we will
not override their values (i.e., their preferences will be honored).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Take another shot at untangling the spaghetti
orterun: fix for command line parsing
orte-submit calls opal_init_util () before parsing out MCA command line
options (-mca, -am, etc). This prevents mpirun from setting opal MCA
variables for some frameworks as well as the MCA base. This is because
when a framework is opened all of its variables are set to read-only.
Eventually we want to lift this restriction on some MCA variables but
since -mca is affected we must parse out the MCA command line options
before opal_init_util(). This commit fixes the bug by adding a new
option to opal_cmd_line_parse (ignore unknown option) so orte-submit
can pre-parse the command line for MCA options.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Minor cleanups to avoid releasing/recreating the cmd line
Define OPAL_MAXHOSTNAMELEN to be either:
(MAXHOSTNAMELEN + 1) or
(limits.h:HOST_NAME_MAX + 1) or
(255 + 1)
For pmix code, define above using PMIX_MAXHOSTNAMELEN.
Fixup opal layer to use the new max.
Signed-off-by: Karol Mroz <mroz.karol@gmail.com>
Because of the removal of the linux memory component it is no longer
necessary to initialize the memory component in opal_init(). This
commit moves the initialization to the creation of the first rcache
component.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds a framework to abstract runtime code patching.
Components in the new framework can provide functions for either
patching a named function or a function pointer. The latter
functionality is not being used but may provide a way to allow memory
hooks when dlopen functionality is disabled.
This commit adds two different flavors of code patching. The first is
provided by the overwrite component. This component overwrites the
first several instructions of the target function with code to jump to
the provided hook function. The hook is expected to provide the full
functionality of the hooked function.
The linux patcher component is based on the memory hooks in ucx. It
only works on linux and operates by overwriting function pointers in
the symbol table. In this case the hook is free to call the original
function using the function pointer returned by dlsym.
Both components restore the original functions when the patcher
framework closes.
Changes had to be made to support Power/PowerPC with the Linux
dynamic loader patcher. Some of the changes:
- Move code necessary for powerpc/power support to the patcher
base. The code is needed by both the overwrite and linux
components.
- Move patch structure down to base and move the patch list to
mca_patcher_base_module_t. The structure has been modified to
include a function pointer to the function that will unapply the
patch. This allows the mixing of multiple different types of
patches in the patch_list.
- Update linux patching code to keep track of the matching between
got entry and original (unpatched) address. This allows us to
completely clean up the patch on finalize.
All patchers keep track of the changes they made so that they can be
reversed when the patcher framework is closed.
At this time there are bugs in the Linux dynamic loader patcher so
its priority is lower than the overwrite patcher.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
A few uninitialized common symbols are remaining:
common symbols generated by flex :
* opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
* opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
* opal/util/show_help_lex.l: opal_show_help_yyleng
* opal/util/show_help_lex.l: opal_show_help_yytext
common symbol generated by "external" hwloc library:
* opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map
This commit is related to an RFC from June 2014. Discussion can be
found at:
http://www.open-mpi.org/community/lists/devel/2014/07/15140.php
The finalize function is set using either the linker option -fini or
__attribute__((destructor)) depending on compiler support. I have
confirmed that this hybrid approach works with all the major
compilers. The attribute is supported by gcc, clang, llvm, xlc, and
icc. The fini function will support pgi. If a compiler/linker
combination does not support either the destructor or fini function a
message will be printed on re-init indicating it is not supported (an
improvement over the current behavior-- SEGV).
I moved the following to the destructor function:
- Class system finalize. This solves a bug when MPI_T_finalize is
called before MPI_Init. The only downside to this change is we will
leave the footprint of the opal class system after
MPI_Finalize. This footprint should be relatively small.
This is an alternative to #517 but the two PRs are not
mutually-exclusive (with some modifications). This commit should also
be safe for 1.8.x as it does not change internal or external ABI (#517
changes internal ABI).
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit is a rework of the component repository. The changes
included in this commit are:
- Remove the component dependency code based off .ompi_info
files. This is legacy code dating back 10 years and is no
longer used.
- Move the plugin scanning code to the component repository. New
calls have been added to add new scanning paths, query available
components, and dlopen/load components.
- Pass the framework down to mca_base_component_find/filter. Eventually
the framework structure will be used to further validate components
before they are used.
- Add support to the MCA framework system to disable scanning for
dlopened components on open (support already existed in
register). This is really only relevant to installdirs as it has no
register function and no DSO components.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several valgrind errors. Included:
- installdirs did not correctly reinitialize all pointers to NULL
at close. This causes valgrind errors on a subsequent call to
opal_init_tool.
- several opal strings were leaked by opal_deregister_params which
was setting them to NULL instead of letting them be freed by the
MCA variable system.
- move opal_net_init to AFTER the variable system is initialized and
opal's MCA variables have been registered. opal_net_init uses a
variable registered by opal_register_params!
- do not leak ompi_mpi_main_thread when it is allocated by
MPI_T_init_thread.
- do not overwrite ompi_mpi_main_thread if it is already set (by
MPI_T_init_thread).
- mca_base_var: read_files was overwriting mca_base_var_file_list
even if it was non-NULL.
- mca_base_var: set all file global variables to initial states on
finalize.
- btl/vader: decrement enumerator reference count to ensure that it
is freed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Properly set up the opal_process_info structure early in the initialization procedure. Define the local hostname right at the beginning of opal_init so all parts of opal can use it. Overlay that during orte_init as the user may choose to remove fqdn and strip prefixes during that time. Set up the job_session_dir and other such info immediately when it becomes available during orte_init.