So long sshmem/verbs! After many years of (mostly) faithful service,
it is time to remove the sshmem verbs component. It has been fully
replaced by other components, such as the UCX PML and OFI MTL.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
So long BTL openib! After many years of (mostly) faithful service, it
is time to remove the openib BTL. It has been fully replaced by other
components, such as the UCX PML and OFI MTL.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
PMIx is removing the --enable-embedded-libevent and
--enable-embedded-hwloc flags as they are confusing users. Instead, we
will use the --enable-embedded-mode to handle both of these options.
Update the embedded configury to handle it.
Signed-off-by: Ralph Castain <rhc@pmix.org>
It doesn't seem like the BTL was using uninitialized pointer. But simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.
Fixes Issue #6345
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Reset ptypes when cloning a datatype in order to prevent
a double free() in the opal_datatype_t destructor.
This fixes a bug introduced in open-mpi/ompi@7c938f070fFixesopen-mpi/ompi#6346
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
The issue was a little complicated due to the internal stack used in the
convertor. The main issue was that in the case where we run out of iov
space to save the raw description of the data while hanbdling a
repetition (loop), instead of saving the current position and bailing out
directly we reading of the next predefined type element. It worked in
most cases, except the one identified by the HDF5 test. However, the
biggest issue here was the drop in performance for all ensuing calls to
the convertor pack/unpack, as instead of handling contiguous loops as a
whole (and minimizing the number of memory copies) we copied data
description by data description.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
When compiling mpi.h with a modern C++ compiler and a high degree of
pickyness (e.g., -Wold-style-cast), casting using (void*) in the
OMPI_PREDEFINED_GLOBAL and MPI_STATUS*_IGNORE macros will emit
warnings. So if we're compiling with a C++ compiler, use C++'s
static_cast<> instead of (void*).
Thanks to @shadow-fax for identifying the issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.
In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
When we exceed the threshold number of contexts created, print appropriate help
text
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
We missed an assert to check if ALLOW_OVERTAKE is set or not before
validating the sequence number and this will cause deadlock.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
This commit reverted pr #6199 as it introduced deadlock in some cases.
Also removed the assert as the condition is obsoleted.
Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
opal_config_bottom.h can only be #include'd in opal_config.h,
so there is no need to #include "opal_config.h" inside.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Some macros defined by the embedded hwloc ends up in opal_config.h
because hwloc configury m4 files are slurped into Open MPI. These
macros are not required here, and they might conflict with an external
hwloc install, so simply #undef them in hwloc/external/external.h
after including <opal_config.h> but before including the external
<hwloc.h>.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
ACCUMULATE, unlike REDUCE, can use with derived
datatypes with predefinied operations, with some
restrictions outlined in MPI-3:11.3.4. The derived
datatype must be composed entierly from one predefined
datatype (so you can do all the construction you want,
but at the bottom, you can only use one datatype, say,
MPI_INT).
Refs. open-mpi/ompi#6275
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Provide the av_attr.count hint (number of addresses that will be
inserted into the address vector through the life of the process)
at initialization of the address vector. It's ok to be a bit
wrong, but some endpoints (RxR) can benefit by not going through
the slow growth realloc churn.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
correctly handle the case in which iovec is full and the
last accessed element of the datatype is the beginning of a loop
Refs. open-mpi/ompi#6285
Thanks Axel Huebl for reporting this
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
correctly free ptypes if the datatype is not pre-defined.
Thanks Axel Huebl for reporting this.
Refs. open-mpi/ompi#6291
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.
The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.
See issue #6285 and
https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118
Thanks for Axel Huebl and René Widera for reporting the issue.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
With MTLs, there's no "other transport" when the remote side
does not have an active NIC, so we should print a useful error
message when the modex failed (indicating lack of a NIC on
the remote side).
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.
- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)
Signed-off-by: René Widera <r.widera@hzdr.de>