openmpi/opal/class
Nathan Hjelm 5c770a7bec opal/free_list: fix race condition
There was a race condition in opal_free_list_get. Code throughout the
Open MPI codebase was assuming that a NULL return from this function
was due to an out-of-memory condition. In some cases this can lead to
a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for
example). Before this commit opal_free_list_get_mt looked like this:

```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }

    return item;
}
```

The problem is that in a multithreaded environment it *is* possible for
the free list to be grown successfully but the thread calling
opal_free_list_get_mt to be left without an item. This happens if,
between the calls to opal_lifo_push_atomic in opal_free_list_grow_st
and the call to opal_lifo_pop_atomic, other threads pop all the items
added to the free list.

This commit fixes the issue by ensuring the thread that successfully
grew the free list **always** gets a free list item.
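
The idea can be illustrated with a minimal, self-contained sketch. This is
not the code from the commit: every `sketch_*` name is hypothetical, a
mutex-protected stack stands in for the atomic LIFO, and the `item_out`
parameter is an assumption about how a grow routine could reserve one
item for its caller. The point it shows is that the growing thread takes
its item before the remaining new items become visible to other threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for opal_free_list_item_t. */
typedef struct sketch_item {
    struct sketch_item *next;
} sketch_item_t;

/* Hypothetical stand-in for opal_free_list_t. */
typedef struct {
    sketch_item_t  *head;          /* stack of free items */
    pthread_mutex_t stack_lock;    /* simplification: replaces the atomic LIFO */
    pthread_mutex_t grow_lock;     /* plays the role of fl_lock */
    size_t          num_per_alloc; /* items added per grow, must be > 0 */
} sketch_list_t;

static void sketch_init(sketch_list_t *l, size_t num_per_alloc)
{
    assert(num_per_alloc > 0);
    l->head = NULL;
    l->num_per_alloc = num_per_alloc;
    pthread_mutex_init(&l->stack_lock, NULL);
    pthread_mutex_init(&l->grow_lock, NULL);
}

static void sketch_push(sketch_list_t *l, sketch_item_t *item)
{
    pthread_mutex_lock(&l->stack_lock);
    item->next = l->head;
    l->head = item;
    pthread_mutex_unlock(&l->stack_lock);
}

static sketch_item_t *sketch_pop(sketch_list_t *l)
{
    pthread_mutex_lock(&l->stack_lock);
    sketch_item_t *item = l->head;
    if (NULL != item) {
        l->head = item->next;
    }
    pthread_mutex_unlock(&l->stack_lock);
    return item;
}

/* Allocate up to n items. The first one is handed to the caller through
 * item_out and is never pushed, so no other thread can take it. Returns 0
 * if at least one item was allocated, -1 otherwise. */
static int sketch_grow(sketch_list_t *l, size_t n, sketch_item_t **item_out)
{
    for (size_t i = 0; i < n; ++i) {
        sketch_item_t *item = malloc(sizeof(*item));
        if (NULL == item) {
            return (i > 0) ? 0 : -1;
        }
        item->next = NULL;
        if (NULL != item_out && NULL == *item_out) {
            *item_out = item;      /* reserved for the growing thread */
        } else {
            sketch_push(l, item);  /* visible to every thread */
        }
    }
    return 0;
}

/* Returns NULL only if the list was empty and allocation failed. */
static sketch_item_t *sketch_get(sketch_list_t *l)
{
    sketch_item_t *item = sketch_pop(l);
    if (NULL == item) {
        pthread_mutex_lock(&l->grow_lock);
        (void) sketch_grow(l, l->num_per_alloc, &item);
        pthread_mutex_unlock(&l->grow_lock);
    }
    return item;
}
```

With this shape there is no second pop after growing, so a NULL return
from `sketch_get` can only mean that allocation failed, which is the
assumption callers were already making.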

Fixes #2921

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-16 13:17:09 -06:00

| File | Last commit | Date |
| --- | --- | --- |
| Makefile.am | opal/class: add a new class: opal_interval_tree_t | 2018-02-26 13:35:56 -07:00 |
| opal_bitmap.c | opal/bitmap: fix opal_bitmap_set_bit() | 2017-12-27 14:56:43 +09:00 |
| opal_bitmap.h | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_fifo.c | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_fifo.h | class/opal_fifo: fix warning | 2018-10-15 19:18:31 -06:00 |
| opal_free_list.c | opal/free_list: fix race condition | 2018-10-16 13:17:09 -06:00 |
| opal_free_list.h | opal/free_list: fix race condition | 2018-10-16 13:17:09 -06:00 |
| opal_graph.c | opal: fix coverity issues | 2017-06-23 08:15:34 -06:00 |
| opal_graph.h | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_hash_table.c | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_hash_table.h | opal: add the OPAL_HASH_TABLE_FOREACH macro | 2016-10-08 16:58:20 +09:00 |
| opal_hotel.c | opal hotel: only delete events that have not yet fired | 2016-01-13 10:59:06 -08:00 |
| opal_hotel.h | opal hotel: only delete events that have not yet fired | 2016-01-13 10:59:06 -08:00 |
| opal_interval_tree.c | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_interval_tree.h | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_lifo.c | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_lifo.h | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_list.c | opal/asm: add fetch-and-op atomics | 2017-11-30 10:41:23 -07:00 |
| opal_list.h | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_object.c | opal/atomic: always use C11 atomics if available | 2018-09-14 10:51:05 -06:00 |
| opal_object.h | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_pointer_array.c | Dont assume a size for constants with UL and ULL. | 2017-06-05 22:07:53 -04:00 |
| opal_pointer_array.h | Improve the opal_pointer_array & more (#3369) | 2017-04-18 21:41:26 -04:00 |
| opal_rb_tree.c | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_rb_tree.h | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_ring_buffer.c | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_ring_buffer.h | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_tree.c | opal/asm: rename existing arithmetic atomic functions | 2017-11-30 10:41:22 -07:00 |
| opal_tree.h | opal: add types for atomic variables | 2018-09-14 10:48:55 -06:00 |
| opal_value_array.c | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |
| opal_value_array.h | Purge whitespace from the repo | 2015-06-23 20:59:57 -07:00 |