This commit adds another check to the low-priority callback
conditional that short-circuits the atomic-add if there are no
low-priority callbacks. This should improve performance in the common
case.
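
A minimal sketch of the short-circuit, assuming C11 atomics. The names
sketch_lp_callback_count and sketch_call_counter, and the
"every 8th call" interval, are hypothetical and are not taken from the
Open MPI source:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the Open MPI names. */
static size_t sketch_lp_callback_count = 0;       /* registered low-priority callbacks */
static atomic_int_least32_t sketch_call_counter;  /* shared call counter, starts at 0 */

/* Returns 1 when the low-priority callbacks should run on this call. */
static int sketch_should_run_low_priority(void)
{
    /* Cheap check first: with no callbacks registered, && short-circuits
     * and the atomic add is never executed. The interval of 8 is assumed. */
    return sketch_lp_callback_count > 0 &&
           (atomic_fetch_add(&sketch_call_counter, 1) & 0x7) == 0;
}
```

With no callbacks registered the counter is never touched, so the
common case pays only for a plain load and compare.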
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The OPAL_ENABLE_MULTI_THREADS macro is always defined as 1. This was
causing us to always use the multi-thread path for synchronization
objects. The code has been updated to use the opal_using_threads()
function. When MPI_THREAD_MULTIPLE support is disabled at build time
(2.x only) this function is a macro evaluating to false so the
compiler will optimize out the MT-path in this case. The
OPAL_ATOMIC_ADD_32 macro has been removed and replaced by the existing
OPAL_THREAD_ADD32 macro.
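
A rough sketch of a thread-conditional add, assuming C11 atomics.
sketch_using_threads() and sketch_thread_add32() are hypothetical
stand-ins for opal_using_threads() and OPAL_THREAD_ADD32; the real
definitions may differ:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_using_threads(). In a build without
 * thread-multiple support this could instead be a macro expanding to
 * false, which lets the compiler drop the atomic branch entirely. */
static bool sketch_threads_enabled = false;
static inline bool sketch_using_threads(void) { return sketch_threads_enabled; }

/* Add delta to *addr and return the new value. */
static inline int32_t sketch_thread_add32(_Atomic int32_t *addr, int32_t delta)
{
    if (sketch_using_threads()) {
        /* Multi-thread path: a real atomic read-modify-write. */
        return atomic_fetch_add(addr, delta) + delta;
    }
    /* Single-thread path: separate relaxed load and store, no atomic RMW. */
    int32_t v = atomic_load_explicit(addr, memory_order_relaxed) + delta;
    atomic_store_explicit(addr, v, memory_order_relaxed);
    return v;
}
```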
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a performance regression introduced by the request
rework. We were always using the multi-thread path because
OPAL_ENABLE_MULTI_THREADS is either not defined or always defined to 1
depending on the Open MPI version. To fix this I removed the
conditional and added a conditional on opal_using_threads(). This path
will be optimized out in 2.0.0 in a non-thread-multiple build as
opal_using_threads is #defined to false in that case.
Fixes open-mpi/ompi#1806
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The total size needs to be incremented after checking the local offset,
not before. This typo causes large allocations with MPI_Win_allocate() to
fail.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Fixed an error where if there were no MPI exceptions, a
JNI error could still exist and not get handled.
Signed-off-by: Nathaniel Graham <nrgraham23@gmail.com>
The way the gni btl is currently coded,
it will run completely out of gas on KNL at
123 processes/node. Since there are bound to be
those who try to run an MPI process per hyperthread
on KNL nodes, the fma sharing mode needs to be requested.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
This seems like an obvious typo: insert a missing "break" statement so
that we don't fall through to the next case.
Fixes CIDs 1362756 and 1362764.
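
For illustration only, a generic example of the fall-through pattern.
The enum and function below are hypothetical and are not the code
touched by this commit:

```c
#include <assert.h>

/* Hypothetical classifier, not the code touched by the commit. */
enum sketch_kind { SKETCH_A, SKETCH_B, SKETCH_OTHER };

static int sketch_cost(enum sketch_kind kind)
{
    int cost = 0;
    switch (kind) {
    case SKETCH_A:
        cost = 1;
        break;  /* without this break, control falls into SKETCH_B and cost becomes 2 */
    case SKETCH_B:
        cost = 2;
        break;
    default:
        cost = -1;
        break;
    }
    return cost;
}
```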
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
It is valid for any rank to deviate on the split_type argument if they
specify MPI_UNDEFINED. The code incorrectly rejected this case.
Changed the split_type uniformity check to accept it and to allow
local_size to be 0 if the local split_type is MPI_UNDEFINED.
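
A serial sketch of the relaxed uniformity rule. The real check is
presumably collective across ranks; here the per-rank values are
assumed to be gathered into an array already. SKETCH_UNDEFINED is a
hypothetical stand-in for MPI_UNDEFINED so the example builds without
MPI:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MPI_UNDEFINED so the sketch builds without MPI. */
#define SKETCH_UNDEFINED (-32766)

/* True when every rank that did not pass SKETCH_UNDEFINED agrees on one
 * split_type. Ranks passing SKETCH_UNDEFINED are skipped. */
static bool sketch_split_types_consistent(const int *split_types, size_t nranks)
{
    bool have_ref = false;
    int ref = SKETCH_UNDEFINED;

    for (size_t i = 0; i < nranks; ++i) {
        if (split_types[i] == SKETCH_UNDEFINED) {
            continue;                 /* this rank opted out; nothing to compare */
        }
        if (!have_ref) {
            ref = split_types[i];     /* first defined value becomes the reference */
            have_ref = true;
        } else if (split_types[i] != ref) {
            return false;             /* two defined values disagree */
        }
    }
    return true;
}
```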
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes a compile error on 32-bit platforms. The
low-priority call counter was always using 64-bit atomics which will
not work if 64-bit atomic math is not available. Updated to use 32-bit
instead.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Newer versions of gcc have "poisoned" the __malloc_initialize_hook
name and it can no longer be used. Added a configure check and
protection around its usage.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
According to MPI-3.1 P.122, `ni` for `MPI_COMBINER_DARRAY`
should be `4*ndims+4`, not `4*size+4`.
This bug may cause SEGV if `size` is smaller than `ndims`
when the darray is used for one-sided communication (pt2pt OSC).
This bug was introduced in open-mpi/ompi@79b13f36 (when darray
became a first class citizen and the `a_i` index of darray was
shifted by 2). The corresponding `MPI_Type_create_darray()`
function already sets the correct value, so it does not need to change.
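
As a worked example of the count (the helper name below is
hypothetical): the darray integers are the scalars `size`, `rank`,
`ndims` and `order`, plus the four arrays `gsizes`, `distribs`,
`dargs` and `psizes` with `ndims` entries each. For `ndims = 3` and
`size = 2` the correct value is `4*3+4 = 16`, while `4*size+4` gives
12, which is 4 integers short.

```c
#include <assert.h>

/* Integer count for a darray envelope: size, rank, ndims, order (4 scalars)
 * plus gsizes, distribs, dargs, psizes (4 arrays of ndims entries each). */
static int sketch_darray_num_integers(int ndims)
{
    assert(ndims >= 0);
    return 4 * ndims + 4;
}
```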