eaa98af52c
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase assumed that a NULL return from this function
indicated an out-of-memory condition. In some cases this could lead to
a fatal error (MPI_Irecv and MPI_Isend in pml/ob1, for
example). Before this commit opal_free_list_get_mt looked like this:
```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        /* race: other threads may pop every item pushed by
         * opal_free_list_grow_st before this pop runs, so item can
         * still be NULL even though the grow succeeded */
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }

    return item;
}
```
The problem is that in a multithreaded environment it *is* possible for the
free list to be grown successfully but for the thread calling
opal_free_list_get_mt to be left without an item. This happens if,
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the subsequent call to opal_lifo_pop_atomic, other threads pop all
the items that were added to the free list.
This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.
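For reference, here is a minimal sketch of what the fixed path can look like, assuming opal_free_list_grow_st is extended with an output parameter (the name of the third argument here is illustrative) so it can hand one of the newly allocated items directly to the calling thread instead of only pushing them onto the LIFO:
```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        /* grow the free list and keep one of the new items for this
         * thread rather than pushing them all and racing other
         * threads to pop one back off the LIFO */
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc, &item);
        opal_mutex_unlock (&flist->fl_lock);
    }

    return item;
}
```
With this shape, a NULL return once again means the allocation itself failed, which matches the out-of-memory assumption made by callers such as pml/ob1.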
Fixes #2921
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit