When querying an info value, copy out exactly as many characters as
the caller asked for -- do not artificially truncate the target just
to ensure that it is \0-terminated.
Specifically: do not use opal_string_copy() to copy info values,
because opal_string_copy() always \0-terminates the target,
even if that means truncating the value. E.g., if the caller calls
opal_info_get_nolock() with valuelen=5, opal_string_copy() will return
"1234\0" -- which is wrong. This commit fixes the behavior to return
"12345".
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
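To illustrate the info-value copy described above, here is a minimal
sketch contrasting the two behaviors. Both helpers are hypothetical and
are not the Open MPI implementation; whether the caller's buffer has an
extra byte for a terminator is outside the scope of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most `valuelen` characters of `src`
 * into `dst`. A terminator is written only when `src` is shorter
 * than `valuelen`; otherwise exactly `valuelen` characters are copied. */
static void copy_info_value(char *dst, const char *src, size_t valuelen)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    if (n < valuelen) {
        memcpy(dst, src, n + 1);     /* room left: copy the '\0' too */
    } else {
        memcpy(dst, src, valuelen);  /* full valuelen chars, no '\0' */
    }
}

/* For contrast, a terminating copy: always writes '\0', truncating
 * the value by one character when it does not fit. */
static void copy_terminated(char *dst, const char *src, size_t dstlen)
{
    assert(dst != NULL && src != NULL && dstlen > 0);
    size_t n = strlen(src);
    if (n > dstlen - 1) {
        n = dstlen - 1;
    }
    memcpy(dst, src, n);
    dst[n] = '\0';
}
```

With a 5-character limit and the source value "12345678", the first
helper yields the characters 12345 and the second yields 1234 followed
by a terminator.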
This commit contains the following changes:
- Remove the unused opal_test_init/opal_test_finalize
functions. These functions are not used by anything in the code
base or MTT. Tests use opal_init_util/opal_finalize_util instead.
- Get rid of gotos in opal_init_util and opal_init. Replaced them
with a cleaner solution.
- Automatically register cleanup functions in init functions. The
cleanup functions are executed in the reverse order of the
initialization functions. The cleanup functions are run in
opal_finalize_util() before tearing down the class system.
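The reverse-order cleanup can be pictured as a small stack of function
pointers. This is only a sketch: register_cleanup(), run_cleanups(), and
the fixed-size array are hypothetical names chosen for illustration and
are not the functions added by this commit.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*cleanup_fn_t)(void);

#define MAX_CLEANUPS 32  /* arbitrary bound for the sketch */

static cleanup_fn_t cleanup_stack[MAX_CLEANUPS];
static size_t cleanup_count = 0;

/* Called by an init step once it has succeeded. */
static void register_cleanup(cleanup_fn_t fn)
{
    assert(fn != NULL && cleanup_count < MAX_CLEANUPS);
    cleanup_stack[cleanup_count++] = fn;
}

/* Called once at finalize: run handlers, last registered first. */
static void run_cleanups(void)
{
    while (cleanup_count > 0) {
        cleanup_stack[--cleanup_count]();
    }
}

/* Example handlers that record the order in which they run. */
static char order[8];
static size_t order_len = 0;

static void cleanup_a(void) { order[order_len++] = 'a'; }
static void cleanup_b(void) { order[order_len++] = 'b'; }
```

Registering cleanup_a and then cleanup_b, followed by run_cleanups(),
runs cleanup_b first, so later init steps are torn down before the
steps they depend on.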
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
PR #5241 provided an MCA variable to allow multi-threaded opal_progress.
However, it updated the linked list even when multiple threads were
allowed to call opal_progress. This could let a more recent thread
complete its progress and fail the assert(sync ==
wait_sync_list).
Updating the linked list only when the number of threads
exceeds the threshold fixes the problem.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Remove whitespace around '=' when setting btl_uct_LIBS.
Thanks Ake Sandgren for reporting this.
Refs. open-mpi/ompi#6173
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Always use size_t (instead of converting to a uint32_t) in order to
correctly support large datatypes.
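As an illustration of why the narrowing matters, the sketch below
computes a byte count with size_t and then with a uint32_t conversion.
The helper names are hypothetical, and the example assumes a 64-bit
size_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for `count` elements of
 * `elem_size` bytes each, kept in size_t throughout. */
static size_t total_bytes(size_t count, size_t elem_size)
{
    /* Guard against overflow of size_t itself. */
    assert(elem_size == 0 || count <= SIZE_MAX / elem_size);
    return count * elem_size;
}

/* The same value forced through a uint32_t: anything at or above
 * 4 GiB loses its high bits. */
static uint32_t total_bytes_narrowed(size_t count, size_t elem_size)
{
    return (uint32_t) total_bytes(count, elem_size);
}
```

For 2^30 elements of 8 bytes, the size_t result is 8 GiB while the
narrowed result wraps to 0.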
Thanks Ben Menadue for the initial bug report
Refs open-mpi/ompi#6016
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Though not a recommended configuration, it is possible to use Open MPI
over UCX over uGNI. This configuration had some issues related to the
connection management and tl selection. This commit fixes those
issues.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
If UCX is available, then pml/ucx will be used instead of
pml/ob1 + btl/openib, so there is no need to warn about
btl/openib not supporting InfiniBand.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Add the --pset option for app_contexts so the user can provide a string
name for each app_context. Use the new PMIx pset attribute to store the
names in the PMIx local storage for retrieval.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Open MPI now requires a C99 compiler, so checking the availability of
the following types is no longer needed.
- `long long` (`signed` and `unsigned`)
- `long double`
- `float _Complex`
- `double _Complex`
- `long double _Complex`
Furthermore, the `#if HAVE_[TYPE]` style checking is not correct.
Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`.
`AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h`
if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]`
(instead of defining as `0`) if it is not available. So even if we
need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`.
I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac`
since someone may use `HAVE_[TYPE]` macros somewhere.
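A small sketch of the `#if defined(HAVE_[TYPE])` form follows. The
`#define` stands in for what `configure` would write into
`opal_config.h`; `HAVE_HYPOTHETICAL_TYPE` and the function names are
made up for the example.

```c
#include <assert.h>

/* Stand-in for opal_config.h: the macro is defined as 1 when the type
 * exists and is left undefined (not defined as 0) when it does not. */
#define HAVE_LONG_DOUBLE 1
/* HAVE_HYPOTHETICAL_TYPE is deliberately not defined. */

static int have_long_double(void)
{
#if defined(HAVE_LONG_DOUBLE)
    return 1;
#else
    return 0;
#endif
}

static int have_hypothetical_type(void)
{
#if defined(HAVE_HYPOTHETICAL_TYPE)
    return 1;
#else
    return 0;
#endif
}

/* Sanity check for the sketch: one probe succeeds, the other does not. */
static void check_probes(void)
{
    assert(have_long_double() == 1);
    assert(have_hypothetical_type() == 0);
}
```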
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
It seems that in some cases (gcc older than v6.0.0) __atomic_thread_fence is a
no-op with __ATOMIC_ACQUIRE. This appears to be the case on x86_64, so go
ahead and use __ATOMIC_SEQ_CST for the x86_64 read memory barrier. This should
not cause any performance issues as it is equivalent to the memory barrier
in the hand-written atomics.
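A sketch of the idea, using the GCC/Clang __atomic builtins. The
function and variable names are hypothetical and do not match the
Open MPI atomics headers; only the choice of memory order per
architecture follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read-barrier wrapper: SEQ_CST on x86_64, ACQUIRE
 * everywhere else. */
static inline void read_barrier(void)
{
#if defined(__x86_64__)
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#else
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

static int payload;  /* plain data, written before the flag is set */
static int ready;    /* flag, accessed only through __atomic builtins */

/* Producer: write the data, then publish the flag with release order. */
static void publish(int value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Consumer: returns 1 and stores the data in *out once the flag is
 * set, 0 otherwise. The barrier orders the flag read before the
 * payload read. */
static int try_consume(int *out)
{
    assert(out != NULL);
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED)) {
        return 0;
    }
    read_barrier();
    *out = payload;
    return 1;
}
```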
References #6014
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds the MCA param `opal_max_thread_in_progress` to set the
number of threads allowed to call opal_progress concurrently. The default
value is 1.
Components with a multithreaded design can benefit from this change by
parallelizing their component progress functions.
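One way to picture the limit is a counter that each thread must
increment before progressing. This is a sketch only: it uses C11
<stdatomic.h> for brevity, and progress_try_enter(), progress_leave(),
and the variables are hypothetical names, not the code in this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stands in for the value of the MCA param; the default is 1. */
static int max_threads_in_progress = 1;

/* Number of threads currently inside the progress function. */
static atomic_int threads_in_progress = 0;

/* Try to claim a progress slot; returns false when the limit is hit. */
static bool progress_try_enter(void)
{
    int cur = atomic_load(&threads_in_progress);
    while (cur < max_threads_in_progress) {
        if (atomic_compare_exchange_weak(&threads_in_progress, &cur, cur + 1)) {
            return true;
        }
        /* A failed CAS reloads cur; the loop re-checks the limit. */
    }
    return false;
}

/* Release a slot claimed by progress_try_enter(). */
static void progress_leave(void)
{
    int prev = atomic_fetch_sub(&threads_in_progress, 1);
    assert(prev > 0);
    (void) prev;
}
```

A thread that fails to enter would skip the component progress
functions for that call rather than block.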
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
Under certain circumstances, ibv_exp_query_device was
returning an error due to uninitialized fields in the
extended attributes struct.
Fixes: #5810
Fixes: #5914
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
In 457f058 I broke the TCP BTL with --enable-ipv6. This patch
fixes the compile error, so IPv6 works again.
Fixed #5996
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
* Needed to properly read PMIx job data like the following
- `OPAL_PMIX_LOCALLDR`
- `OPAL_PMIX_RANK`
- `OPAL_PMIX_GLOBAL_RANK`
- `OPAL_PMIX_APPLDR`
- `OPAL_PMIX_APP_RANK`
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
Only default to the external component if its version is
greater than or equal to that of the internal libevent (2.0.22).
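The comparison can be sketched as follows. The 0xMMmmpp00 packing
(major, minor, patch in the top three bytes) is an assumption about
libevent's numeric version macros, and pack_version() and
prefer_external() are hypothetical helpers rather than the configure
logic itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: 0xMMmmpp00. */
static uint32_t pack_version(unsigned major, unsigned minor, unsigned patch)
{
    assert(major < 256 && minor < 256 && patch < 256);
    return ((uint32_t) major << 24) | ((uint32_t) minor << 16) |
           ((uint32_t) patch << 8);
}

/* Internal libevent 2.0.22 under the assumed encoding. */
#define INTERNAL_LIBEVENT_VERSION 0x02001600u

/* Nonzero when the external library is at least as new as the
 * internal copy. */
static int prefer_external(uint32_t external_version)
{
    return external_version >= INTERNAL_LIBEVENT_VERSION;
}
```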
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
- Always use the external component when configured with --with-libevent=external
- Fix the external libevent library version detection
by testing _EVENT_NUMERIC_VERSION and EVENT__NUMERIC_VERSION macros
- Use the event2/event.h header (event.h is deprecated since libevent 2.0)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
AC_CHECK_DECLS takes a comma-separated list of macros/symbols,
so replace the whitespace separator with a comma.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
The monitoring PML hides its existence from the OMPI infrastructure by
removing itself from the list of PML loaded components, remaining hidden
until MPI_Finalize.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Simplify selection of the address to publish for a given BTL TCP
module in the module exchange code. Rather than looping through
all IP addresses associated with a node, looking for one that
matches the kindex of a module, loop over the modules and
use the address stored in the module structure. This also
happens to be the address that the source will use to bind()
in a connect() call, so this should eliminate any confusion
(read: bugs) when an interface has multiple IPs associated with
it.
Refs #5818
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Today, a btl tcp module is associated with exactly one IP
address (IPv4 or IPv6). There's no need to reserve space
for both an IPv4 and IPv6 address in the module structure,
since the module will only be associated with one or the
other.
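As a sketch of a single address slot that can hold either family, the
structure below uses the POSIX struct sockaddr_storage. The struct and
function names are hypothetical and are not the fields of the actual
btl tcp module structure.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical module: one slot holding either an IPv4 or an IPv6
 * address, never both. */
struct tcp_module_sketch {
    struct sockaddr_storage addr;
};

/* Parse `text` as an address of the given family and store it in the
 * module. Returns 0 on success, -1 on a bad family or address. */
static int module_set_addr(struct tcp_module_sketch *m, int family,
                           const char *text)
{
    assert(m != NULL && text != NULL);
    memset(&m->addr, 0, sizeof(m->addr));
    if (family == AF_INET) {
        struct sockaddr_in *in = (struct sockaddr_in *) &m->addr;
        in->sin_family = AF_INET;
        return inet_pton(AF_INET, text, &in->sin_addr) == 1 ? 0 : -1;
    }
    if (family == AF_INET6) {
        struct sockaddr_in6 *in6 = (struct sockaddr_in6 *) &m->addr;
        in6->sin6_family = AF_INET6;
        return inet_pton(AF_INET6, text, &in6->sin6_addr) == 1 ? 0 : -1;
    }
    return -1;
}
```

The ss_family member then tells readers of the module which of the two
address layouts is in use.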
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Work around a race condition in the TCP BTL's proc setup code.
The Cisco MTT results have been failing on TCP tests due to a
"dropped connection" message some percentage of the time.
Some digging shows that the issue happens in a combination of
multiple NICs and multiple threads. The race is detailed in
https://github.com/open-mpi/ompi/issues/3035#issuecomment-429500032.
This patch doesn't fix the race, but avoids it by forcing
the MPI layer to complete all calls to add_procs across the
entire job before any process leaves MPI_INIT. It also
reduces the scalability of the TCP BTL by increasing start-up
time, but that is better than hanging.
The long-term fix is to do all endpoint setup in the first
call to add_procs for a given remote proc, removing the
race. This patch is a workaround until that patch can
be developed.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>