We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.
One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.
Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".
This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework currently held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.
However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.
Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.
Included changes:
* Consolidate the compression code into the opal/compress framework
* Move the ORTE zlib compression code into a new opal/compress/zlib
component
* Ignore the bzip and gzip components in opal/compress framework
* Add a "compress_base_limit" MCA param to set the threshold above which
we compress data - defaults to 4096 bytes
* Delete stale brucks and rcd components from orte/grpcomm framework
* Delete the orte/regx framework
* Update the launch system to use opal/compress instead of string regex
* Provide a default module if no zlib is available
* Fix some misc multi-node issues
* Properly generate the nidmap in response to a "connection warmup"
message so the remote daemon knows the children it needs to launch.
* Remove stale references to orte_node_regex
* opal_byte_object_t's are not OPAL objects - properly release allocated
memory.
* Set the topology
* Currently only handling homogeneous case
* Update the compress framework files to conform
* Consolidate open/close into one "frame" file. Ensure we open/close the
framework
Signed-off-by: Ralph Castain <rhc@pmix.org>
This commit fixes the ordering of the teardown for
opal_finalize_util. The installdirs and if frameworks need to come
down before the MCA system.
Fixes#6259
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit contains the following changes:
- Remove the unused opal_test_init/opal_test_finalize
functions. These functions are not used by anything in the code
base or MTT. Tests use opal_init_util/opal_finalize_util instead.
- Get rid of gotos in opal_init_util and opal_init. Replaced them
with a cleaner solution.
- Automatically register cleanup functions in init functions. The
cleanup functions are executed in the reverse order of the
initialization functions. The cleanup functions are run in
opal_finalize_util() before tearing down the class system.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit added MCA param `opal_max_thread_in_progress` to set the
number of threads allowed to do opal_progress concurrently. The default
value is 1.
Component with multithreaded design can benefit from this change to
parallelize their component progress function.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error. However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined. Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent's to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on":
* initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch
* IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet.
* Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later.
* Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Reading the system clock on every call to opal_progress() is an
expensive operation on most architectures, and it can negatively affect
the performance, for example of message rate benchmarks.
We change opal_progress() to read the clock once per 8 calls, unless
there are active users of the event mechanism.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit eliminates the old opal_atomic_bool_cmpset functions. They
have been replaced by the opal_atomic_compare_exchange_strong
functions.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Initialize the reachable framework during opal_init() and tear
it back down during opal_finalize(). The framework was never
used, so the lack of initialization didn't matter, but this is
a required step in using the framework.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Still in the "needs to be done" category:
* mapping/ranking/binding options aren't correctly supported
* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Passed the below set of symbols into a script that added ompi_ to them all.
Note that if processing a symbol named "foo" the script turns
foo into ompi_foo
but doesn't turn
foobar into ompi_foobar
But beyond that the script is blind to C syntax, so it hits strings and
comments etc as well as vars/functions.
coll_base_comm_get_reqs
comm_allgather_pml
comm_allreduce_pml
comm_bcast_pml
fcoll_base_coll_allgather_array
fcoll_base_coll_allgatherv_array
fcoll_base_coll_bcast_array
fcoll_base_coll_gather_array
fcoll_base_coll_gatherv_array
fcoll_base_coll_scatterv_array
fcoll_base_sort_iovec
mpit_big_lock
mpit_init_count
mpit_lock
mpit_unlock
netpatterns_base_err
netpatterns_base_verbose
netpatterns_cleanup_narray_knomial_tree
netpatterns_cleanup_recursive_doubling_tree_node
netpatterns_cleanup_recursive_knomial_allgather_tree_node
netpatterns_cleanup_recursive_knomial_tree_node
netpatterns_init
netpatterns_register_mca_params
netpatterns_setup_multinomial_tree
netpatterns_setup_narray_knomial_tree
netpatterns_setup_narray_tree
netpatterns_setup_narray_tree_contigous_ranks
netpatterns_setup_recursive_doubling_n_tree_node
netpatterns_setup_recursive_doubling_tree_node
netpatterns_setup_recursive_knomial_allgather_tree_node
netpatterns_setup_recursive_knomial_tree_node
pml_v_output_close
pml_v_output_open
intercept_extra_state_t
odls_base_default_wait_local_proc
_event_debug_mode_on
_evthread_cond_fns
_evthread_id_fn
_evthread_lock_debugging_enabled
_evthread_lock_fns
cmd_line_option_t
cmd_line_param_t
crs_base_self_checkpoint_fn
crs_base_self_continue_fn
crs_base_self_restart_fn
event_enable_debug_output
event_global_current_base_
event_module_include
eventops
sync_wait_mt
trigger_user_inc_callback
var_type_names
var_type_sizes
Signed-off-by: Mark Allen <markalle@us.ibm.com>
* `ompi_info --show-failed` will include the failed components along
with information about why they failed.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
* Add a path for failed component load information to be reported up.
* This allows ompi_info to display this information inline to make it
easier for folks to see if the component is present but failed for
some reason. Most likely a missing library, but could be a libnl
conflict.
* Add MCA parameter to enable this feature:
- `mca_base_component_track_load_errors` takes a boolean
- Default: `false`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
This commit adds a new base enumerator type for variables that take of
the values -1, 0, and 1. These values are mapped to the strings auto,
false, true. This commit updates the mpi_leave_pinned MCA variable to
use the new enumerator.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
- New MCA option: opal_stacktrace_output
- Specifies where the stack trace output stream goes.
- Accepts: none, stdout, stderr, file[:filename]
- Default filename 'stacktrace'
- Filename will be `stacktrace.PID`, or if VPID is available,
then the filename will be `stacktrace.VPID.PID`
- Update util/stacktrace to allow for different output avenues
including files. Previously this was hardcoded to 'stderr'.
- Since opal_backtrace_print needs to be signal safe, passing it a
FILE object that actually represents a file stream is difficult. This
is because we cannot open the file in the signal handler using
`fopen` (not safe), but have to use `open` (safe). Additionally, we
cannot use `fdopen` to convert the `int fd` to a `FILE *fh` since it
is also not signal safe.
- I did not want to break the backtrace.h API so I introduced a new
rule (documented in `backtrace.c`) that if the `FILE *file`
argument is `NULL` then look for the `opal_stacktrace_output_fileno`
variable to tell you which file descriptor to use for output.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
there is no such thing as pthread_join(main_thread), so key destructors
are never invoked on the main thread, which causes valgrind report
some memory leaks. Manually store and then invoke the key destructors and
make valgrind happy.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
the class system can be initialized/finalized as many times as we like,
so there is no more need to have opal_class_finalize() invoked in a destructor
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Still not completely done as we need a better way of tracking the routed module being used down in the OOB - e.g., when a peer drops connection, we want to remove that route from all conduits that (a) use the OOB and (b) are routed, but we don't want to remove it from an OFI conduit.
if MPI_Init[_thread]/MPI_Finalize and MPI_T_init_thread/MPI_T_finalize
are balanced, opal_initialized is zero, and hence opal_cleanup destructor
never invokes opal_class_finalize.
if MPI_Init[_thread] nor MPI_T_init_thread have been called, classes is NULL,
so opal_class_finalize does nothing
This commit adds another check to the low-priority callback
conditional that short-circuits the atomic-add if there are no
low-priority callbacks. This should improve performance in the common
case.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>