There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:
```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
opal_free_list_item_t *item =
(opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);
if (OPAL_UNLIKELY(NULL == item)) {
opal_mutex_lock (&flist->fl_lock);
opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
opal_mutex_unlock (&flist->fl_lock);
item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
}
return item;
}
```
The problem is in a multithreaded environment is *is* possible for the
free list to be grown successfully but the thread calling
opal_free_list_get_mt to be left without an item. The happens if
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic other threads pop all the items
added to the free list.
This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.
Fixes#2921
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 5c770a7becc496f63b9f9a59151206236416f4f4)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications. As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.
Thanks to @AdamSimpson for reporting the issue.
This commit simply converts a //-style comment to a /**/-style
comment. No code or logic changes.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4b3ccabf726eaec6d39cbcff809882da55ae1e5)
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.
(picked from master b8e1af6)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
OpenJDK 11 changed the default javadoc output HTML version to HTML 5
from HTML 4.01. It causes an error on building Open MPI configured
with `--enable-mpi-java` (default: disable). This fix is compatible
with older OpenJDK.
I don't know whether this problem exists with other vender's JDKs.
But this fix should be compatible with other JDKs because the new
syntax is used in other places in the same file.
Thanks to Siegmar Gross for the bug report.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit b491b454dc304a72c03970326880fbd01641a3d3)
A typo inadvertantly crept in to e836dbd506. Add the extra '-' to
make it correctly search for --with-*=internal.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7675956b8fd739b150d3bdd14c265fd728786201)
Ignore with-hwloc=internal or external as those are meaningless to pmix
(will upstream)
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit c498a7e77a377ddc3a7bcc26ea072627a33cb470)
If we are using the internal PMIx component and the embedded library fails to configure, then fail - don't silently fail to build and then fail in execution
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit f379ba9c8e5ce17641937c351ab46e4b4a82446c)
Our components that have a --with-foo configure option won't know what
to do with a value of "internal". This scenario only occurs with hwloc
and libevent, both of which are statically contained in libopen-pal
Thanks to @jsquyres for the diff
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit e836dbd506502c797f5bff2f9761510fad4858cd)
Fortran bindings were added to persistent collectives in 9e0115c980
but man was not updated.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 43d85dbc815bc5fd02c12810fb7c9f2ddd294f12)
Changes of nonblocking collectives in e98d794e8b and f750c6932c
are applied to persistent collectives.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 357531847e819bf8c33224e0cc9004ed47b42445)
Following the commit f750c6932c, I compared
`ompi/mpi/fortran/use-mpi-f08/*.F90` and
`ompi/mpi/fortran/use-mpi-f08/profile/p*.F90`, and
`ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and
`ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`.
There are many differences. Some are bugs of `MPI_*`, some are
bugs of `PMPI_*`. I'm not sure how these bugs affect applications.
To make it easy to compare these files future, I also removed
editorial differences.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit cf6d28cb66981cf315f84e5efb8f8256b6c6a3a4)
This commit works around an Oracle C compiler bug in 5.15 (not sure
when it was introduced). The bug is triggered when we chain
assignments of atomic variables. Ex:
_Atomic intptr x, y;
intptr_t z = 0;
x = y = z;
Will produce a compiler error of the form:
operand cannot have void type: op "="
assignment type mismatch:
long "=" void
To work around the issue we are removing the chain assignment and
setting the head and tail on different lines.
Fixes#5814
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit dfa8d3a81ac64e32d2bfb13a9afb20b83d747e03)
On some platfoms reading a 64-bit value is non-atomic and it is
possible that the two 32-bit values are read in the wrong order. To
ensure the tag is always read first this commit reads the tag before
reading the full 64-bit value.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 66a7dc4c72cb25df67e7f872bee7a20b5fa9c763)