Compiler implementations are free to include support for atomics that
use locks. Unfortunately lock-free and lock atomics do not mix. Older
versions of llvm on OS X use locks to provide
__atomic_compare_exchange on 128-bit values but are lock-free on
64-bit values. This screws up our lifo implementation which mixes
64-bit and 128-bit atomics on the same values to improve
performance. This commit adds a configure-time check if 128-bit
atomics are lock free. If they are not then the 128-bit __atomic CAS
is disabled and we check for the __sync version as a fallback.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
clang 7.0 with the picky option on is extremely verbose, and complains
about almost everything. Trying to make him happy, at least regarding
the datatype engine.
This commit fixes a bug in opal progress registration that can cause
crashes when a progress function is registered while another thread is
in opal_progress(). Before this commit realloc is used to allocate
more space for progress functions but it is possible for a thread in
opal_progress() to try to read from the array that is freed by realloc
before the array is re-assigned when realloc returns. To prevent this
race use malloc + memcpy to fill the new array and atomically swap out
the old and new array pointers.
Per suggestion we now allocate a default of 8 slots for callbacks and
double the current number when we run out of space.
This commit also fixes leaking the callbacks_lp array.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* atomic: add support for __atomic builtins
This commit adds support for the gcc __atomic builtins. The __sync
builtins are deprecated and have been replaced by these atomics. In
addition, the new atomics support atomic exchange which was not
supported by __sync.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* atomic: add support for transactional memory
This commit adds support for using transactional memory when using
opal atomic locks. This feature is enabled if the __HLE__ feature is
available and the gcc builtin atomics are in use.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the
case where the outer project has its own check for
<valgrind/valgrind.h>, but also supplements CPPFLAGS (to find
Valgrind's header files) before doing that check.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Ideally, we would tell OMPI to disable autoconf's caching of our
valgrind check result so that its check gets the right result after
adding CPPFLAGS. Not sure if we can do that.
For now, just disable our Valgrind code in embedded mode.
This will keep the x86 backend enabled under Valgrind but
it will auto-disable itself when finding identical APIC ids anyway
(because CPUID returns same outputs for all PUs).
Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
Fixesopen-mpi/ompi#1732
(cherry picked from commit open-mpi/hwloc@8b44fb1c81)
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a programming error when using an aries nic. The
documentation of ugni shows that only the local alignment restriction
for get was lifted on aries. There is still a remote address alignment
restriction.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit adds support for Cray Aries atomic operations. This
includes 32-bit and floating point support.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit add support for more atomic operations and type. The
operations added are logical and, logical or, logical xor, swap, min,
and max. New types are 32-bit int by using the
MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the
MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both
flags. Floating point numbers are supported by packing the number in
as an int64_t or int32_t. We will update the btl interface in the
future to make this less confusing.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Before dynamic add_procs support was committed to master we called
add_procs with every proc in the job. The XRC code in the openib btl
was taking advantage of this and setting the number of work queue
entries (WQE) based on all the procs on a remote node. Since that is
no longer the case we can not simply increment the sd_wqe field on the
queue pair. To fix the issue a new field has been added to the xrc
queue pair structure to keep track of how many wqes there are total on
the queue pair. If a new endpoint is added that increases the number
of wqes and the xrc queue pair is already connected the code will
attempt to modify the number of wqes on the queue pair. A failure is
ignored because all that will happen is the number of active send work
requests on an XRC queue pair will be more limited.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The original VADER_MAX_ADDRESS was tunned for x86_64 platforms only.
For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE.
Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
This commit reduces the overhead of calling the ugni progress
function. It does the following:
- Check for new connections once every eight calls.
- Do not call remote smsg progress unless we are connected to at
least one remote peer.
- Do not call rdma progress unless at least one rdma fragment is
outstanding.
- Check endpoint wait list size before obtaining a lock.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit reduces the default exclusivity so that btl/scif is not
used for send/recv over other shared memory transports.
Fixesopen-mpi/ompi#1712
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Our checks and the ones of libevent are somewhat flawed.
If adding multiple "-framework" to CXXFLAGS or CFLAGS, we strip
the keyword from the command-line, not good.
libevent however assumes plain gcc without testing properly
that the compiler supports -Wno-deprecated-declarations.
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
The OPAL_CMA_NEED_SYSCALL_DEFS is always defined/set to 0 or 1. Therefore
instead of checking if the macro is defined, we have to look at the value
itself.
Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
This commit changes the behavior of bml/r2 from conditionally
registering btl progress functions to always registering progress
functions. Any progress function beloning to a btl that is not yet in
use is registered as low-priority. As soon as a proc is added that
will make use of the btl is is re-registered normally.
This works around an issue with some btls. In order to progress a
first message from an unknown peer both ugni and openib need to have
their progress functions called. If either btl is not in use after the
first call to add_procs the callback was never happening. This commit
ensures the btl progress function is called at some point but the
number of progress callbacks is reduced from normal to ensure lower
overhead when a btl is not used. The current ratio is 1 low priority
progress callback for every 8 calls to opal_progress().
Fixesopen-mpi/ompi#1676
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>