In some cases (gcc older than v6.0.0), __atomic_thread_fence appears to be a
no-op when called with __ATOMIC_ACQUIRE. This appears to be the case on x86_64,
so use __ATOMIC_SEQ_CST for the x86_64 read memory barrier. This should
not cause any performance issues, as it is equivalent to the memory barrier
in the hand-written atomics.
References #6014
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 30119ee339eea086f43e3392352899187a4a73c7)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
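As an illustration of the change described above, the following is a minimal sketch of a read barrier that selects the stronger memory order on x86_64 only. The names example_read_barrier and example_consume are hypothetical and are not taken from the Open MPI sources; only the __atomic builtins and memory-order constants come from the compiler.

```c
/* Hypothetical helper: the commit message only states which memory order
 * is used on x86_64, not how the surrounding code is structured. */
static inline void example_read_barrier(void)
{
#if defined(__x86_64__)
    /* Per the commit message, older gcc may emit nothing for an acquire
     * fence here, so request sequential consistency instead. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#else
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

/* Hypothetical reader side of a flag/payload handoff: the payload is read
 * only after the flag has been observed and the barrier has been issued. */
static inline int example_consume(int *flag, const int *payload, int *out)
{
    if (0 == __atomic_load_n(flag, __ATOMIC_RELAXED)) {
        return 0;               /* nothing published yet */
    }
    example_read_barrier();     /* order the flag read before the payload read */
    *out = *payload;
    return 1;
}
```

A writer would store the payload, issue a matching write barrier, and then set the flag; that side is omitted here.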
Commit 89da9651b inadvertently #if'ed out both deprecated *and*
removed items from mpi.h. The intent was only to #if out items that
have been *removed* from the MPI specification and leave all items
that are merely deprecated.
This commit also re-orders the deleted typedef+functions to be in the
same order as they are listed in MPI-3.1 chapter 17, just to make
verifying/checking the code easier.
Note that --enable-mpi1-compatibility can still be used to restore
prototypes for the items that have been removed from the MPI
specification (e.g., MPI_Address()).
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b03a39d359b019d2d7803d194fd03b2fcdffddce)
Under certain circumstances, ibv_exp_query_device was
returning an error due to uninitialized fields in the
extended attributes struct.
Fixes: #5810
Fixes: #5914
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 8126779a354b3e0c720d3e1790f7b936dd5b93b2)
- added <cr> to split API groups to simplify human processing
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6e7810208966d73e0a56f74b536aa5c56b9a8d1c)
- added missing file to profile makefile
- constants SHMEM_CTX_* are shifted into public header
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 4a3e83780c0303e7e4d0ff92d7ba85d3a2239737)
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.
The MPI spec also stipulates that sendcount and sendtype are ignored for Alltoall and
Allgatherv operations when MPI_IN_PLACE is used, so extend the check to these algorithms as well.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f43934a93e16db556b340e72cdd3742)
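The check described above can be sketched as follows. This is a standalone illustration with stand-in types so that it compiles without MPI headers: example_datatype_t, EXAMPLE_IN_PLACE and example_message_size are hypothetical names, and the real code uses MPI_IN_PLACE and ompi_datatype_type_size() instead.

```c
#include <stddef.h>

/* Stand-ins for the MPI datatype handle and the in-place marker
 * (both hypothetical, for illustration only). */
typedef struct { size_t size; } example_datatype_t;
static char example_in_place_marker;
#define EXAMPLE_IN_PLACE ((const void *) &example_in_place_marker)

/* Compute the per-peer message size used to pick an algorithm.
 * When the send buffer is the in-place marker, the send type and count
 * must be ignored: they may be invalid, and touching them can crash. */
static size_t example_message_size(const void *sbuf,
                                   const example_datatype_t *sdtype, int scount,
                                   const example_datatype_t *rdtype, int rcount)
{
    if (EXAMPLE_IN_PLACE == sbuf) {
        return rdtype->size * (size_t) rcount;
    }
    return sdtype->size * (size_t) scount;
}
```

Passing NULL as the send datatype together with EXAMPLE_IN_PLACE is safe in this sketch because the send-side arguments are never dereferenced on that path.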
mca_scoll_basic_alltoall() passed (pSync + 1) to the barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.
Found with shmem_verifier test suite.
(picked from master 6754bf1)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
Users should migrate to https://github.com/pmix/prrte
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8ebf66f705537b9a6e1af2b6093ef8471)
Only default to the external component if its version is
greater than or equal to that of the internal libevent (2.0.22)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit b2050392051ed3d9d842f105326f7ea2223aafc4)
- Always use the external component when configure'd with --with-libevent=external
- Fix the external libevent library version detection
by testing _EVENT_NUMERIC_VERSION and EVENT__NUMERIC_VERSION macros
- Use the event2/event.h header (event.h is deprecated since libevent 2.0)
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 35e77a286c369c1be66ee7ef9ad5ec2faef47edb)
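The version comparison behind these two libevent changes can be sketched as below. The actual test is performed by configure; this C fragment only illustrates the comparison, and the names EXAMPLE_INTERNAL_LIBEVENT_VERSION and example_prefer_external are hypothetical. libevent encodes its numeric version as 0xMMmmpp00, so 2.0.22 corresponds to 0x02001600.

```c
#include <stdint.h>

/* Numeric form of the internal libevent version (2.0.22). */
#define EXAMPLE_INTERNAL_LIBEVENT_VERSION 0x02001600u

/* Return nonzero if an external libevent with the given numeric version
 * should be preferred by default. The caller would pass whichever of
 * _EVENT_NUMERIC_VERSION (libevent 2.0.x) or EVENT__NUMERIC_VERSION
 * (libevent 2.1.x) the installed headers define. */
static int example_prefer_external(uint32_t external_numeric_version)
{
    return external_numeric_version >= EXAMPLE_INTERNAL_LIBEVENT_VERSION;
}
```

An explicit --with-libevent=external request bypasses this default, as noted in the list above.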
If we detect that we are being debugged by an MPIR-based debugger, then
print a warning that OMPI's MPIR support has been deprecated and will be
removed in a subsequent release.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2cb271716beae08b5e5e20f30a5a2fe3e5c50c5e)
Check that the data representation provided is actually supported
by ompio.
Also add a check in mpi/c/file_set_view that the data representation
pointer is non-NULL.
Also fixes parts of issue #5643
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
return MPI_ERR_ACCESS if the user tries to read from a file
that was opened using MPI_MODE_WRONLY
return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
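The access-mode rule described above amounts to a small mapping from the open mode to an error class. The sketch below uses stand-in constants so that it compiles without mpi.h; the EXAMPLE_* values and function names are hypothetical, and real code would use MPI_MODE_RDONLY, MPI_MODE_WRONLY, MPI_ERR_ACCESS and MPI_ERR_READ_ONLY.

```c
/* Stand-in bit flags and error codes (hypothetical values). */
enum { EXAMPLE_MODE_RDONLY = 1, EXAMPLE_MODE_WRONLY = 2, EXAMPLE_MODE_RDWR = 4 };
enum { EXAMPLE_SUCCESS = 0, EXAMPLE_ERR_ACCESS = 1, EXAMPLE_ERR_READ_ONLY = 2 };

/* Reading from a file opened write-only is an access error. */
static int example_check_read(int amode)
{
    return (amode & EXAMPLE_MODE_WRONLY) ? EXAMPLE_ERR_ACCESS : EXAMPLE_SUCCESS;
}

/* Writing to a file opened read-only is a read-only error. */
static int example_check_write(int amode)
{
    return (amode & EXAMPLE_MODE_RDONLY) ? EXAMPLE_ERR_READ_ONLY : EXAMPLE_SUCCESS;
}
```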
This commit fixes a deadlock that can occur when using a TL that
supports the connect to endpoint model. The deadlock was occurring
while processing incoming connection requests. This was done from
an active-message callback. For some unknown reason (at this time)
this callback was sometimes hanging. To avoid the issue the connection
active-message is saved for later processing.
At the same time I cleaned up the connection code to eliminate
duplicate messages when possible.
This commit also fixes some bugs in the active-message send path:
- Correctly set all fragment fields in prepare_src.
- Fix bug when using buffered-send. We were not reading the return
code correctly (which is in bytes). This resulted in a message
getting sent multiple times.
- Don't try to progress sends from the btl_send function when in an
active-message callback. It could lead to deep recursion and an
eventual crash if we get a trace like
send->progress->am_complete->ob1_callback->send->am_complete...
Closes #5820
Closes #5821
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 707d35deeb62a93ea8a3806d07e07e3a96c51d19)
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:
```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);
    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }
    return item;
}
```
The problem is that in a multithreaded environment it *is* possible for the
free list to be grown successfully but the thread calling
opal_free_list_get_mt to be left without an item. This happens if,
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic, other threads pop all the items
added to the free list.
This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.
Fixes #2921
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 5c770a7becc496f63b9f9a59151206236416f4f4)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
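One way to provide the guarantee described in the free-list commit above is to have the grow step hand one freshly allocated item directly to the caller instead of pushing everything onto the shared list. The sketch below is a simplified, standalone illustration and not the actual patch: all example_* names are hypothetical, and a mutex-protected stack stands in for the lock-free LIFO.

```c
#include <pthread.h>
#include <stdlib.h>

typedef struct example_item { struct example_item *next; } example_item_t;

typedef struct {
    example_item_t *head;        /* shared stack of free items */
    pthread_mutex_t stack_lock;  /* stand-in for the lock-free LIFO */
    pthread_mutex_t grow_lock;   /* serializes growth */
    size_t num_per_alloc;        /* items added per growth step */
} example_free_list_t;

static void example_free_list_init(example_free_list_t *fl, size_t num_per_alloc)
{
    fl->head = NULL;
    fl->num_per_alloc = num_per_alloc;
    pthread_mutex_init(&fl->stack_lock, NULL);
    pthread_mutex_init(&fl->grow_lock, NULL);
}

static void example_push(example_free_list_t *fl, example_item_t *item)
{
    pthread_mutex_lock(&fl->stack_lock);
    item->next = fl->head;
    fl->head = item;
    pthread_mutex_unlock(&fl->stack_lock);
}

static example_item_t *example_pop(example_free_list_t *fl)
{
    pthread_mutex_lock(&fl->stack_lock);
    example_item_t *item = fl->head;
    if (NULL != item) {
        fl->head = item->next;
    }
    pthread_mutex_unlock(&fl->stack_lock);
    return item;
}

/* Allocate a batch; keep the first item for the caller and publish the rest.
 * Returns NULL only if no item could be allocated at all. */
static example_item_t *example_grow_and_take(example_free_list_t *fl)
{
    example_item_t *reserved = NULL;
    for (size_t i = 0; i < fl->num_per_alloc; ++i) {
        example_item_t *item = malloc(sizeof(*item));
        if (NULL == item) {
            break;
        }
        if (NULL == reserved) {
            reserved = item;     /* never visible to other threads */
        } else {
            example_push(fl, item);
        }
    }
    return reserved;
}

static example_item_t *example_free_list_get(example_free_list_t *fl)
{
    example_item_t *item = example_pop(fl);
    if (NULL == item) {
        pthread_mutex_lock(&fl->grow_lock);
        item = example_grow_and_take(fl);
        pthread_mutex_unlock(&fl->grow_lock);
    }
    return item;
}
```

Because the reserved item is never pushed onto the shared stack, other threads cannot pop it, so in this sketch a NULL return means that allocation failed (or that num_per_alloc is zero). Releasing items and destroying the mutexes are omitted for brevity.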
Thanks to @hjelmn for debugging it and providing the patch
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit efa8bcc17078c89f1c9d6aabed35c90973a469bf)
(cherry picked from commit 647a760b7e24194b37571a8245d8d39ed202e75b)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>