This commit adds opal_using_threads() protection around the atomic
operation in OBJ_RETAIN/OBJ_RELEASE. This resolves the performance
issues seen when running psm with MPI_THREAD_SINGLE.
To avoid issues with header dependencies opal_using_threads() has been
moved to a new header (thread_usage.h). The OPAL_THREAD_ADD* and
OPAL_THREAD_CMPSET* macros have also been relocated to this header.
This commit is cherry-picked off a fix that was submitted for the v1.8
release series but never applied to master. This fixes part of the
problem reported by @nysal in #1902.
(cherry picked from commit open-mpi/ompi-release@ce91307918)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
The max inline send size on a queue pair is not available until after
the endpoint is connected. Before this commit the send flags
(including the inline flag) were set before this value was
initialized. This commit moves setting the send_flags down to
mca_btl_openib_put_internal which is only called after the endpoint is
connected. This fixes a bug when using osc/rdma.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This fixes a regression from open-mpi/ompi@dc5adc5a91
Fortran is only required by ompi, so m4_ifdef([project_ompi],...) protect Fortran related stuff in opal
Fixesopen-mpi/ompi#1884
This fixes a regression from open-mpi/ompi@dc5adc5a91
Fortran is only required by ompi, so m4_ifdef([project_ompi],...) protect Fortran related stuff in opal
Fixesopen-mpi/ompi#1884
This commit fixes typos on the C side of the request-based RMA binding. We
were not returning the request on success but on failure. Thanks to
@alazzaro for reporting and @ggouaillardet, and @vondele for tracking
this down.
Fixes part of open-mpi/ompi#1869
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit introduces a new algorithm for MPI_Comm_split_type. The
old algorithm performed an allgather on the communicator to decide
which processes were part of the new communicators. This does not
scale well in either time or memory.
The new algorithm performs a couple of all reductions to determine the
global parameters of the MPI_Comm_split_type call. If any rank gives
an inconsistent split_type (as defined by the standard) an error is
returned without proceeding further. The algorithm then creates a
communicator with all the ranks that match the split_type (no
communication required) in the same order as the original
communicator. It then does an allgather on the new communicator (which
should be much smaller) to determine 1) if the new communicator is in
the correct order, and 2) if any ranks in the new communicator
supplied MPI_UNDEFINED as the split_type. If either of these
conditions are detected the new communicator is split using
ompi_comm_split and the intermediate communicator is freed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit simplifies the communicator context ID generation by
removing the blocking code. The high level calls: ompi_comm_nextcid
and ompi_comm_activate remain but now call the non-blocking variants
and wait on the resulting request. This was done to remove the
parallel paths for context ID generation in preperation for further
improvements of the CID generation code.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>