5c770a7bec
There was a race condition in opal_free_list_get. Code throughout the Open MPI codebase was assuming that a NULL return from this function was due to an out-of-memory condition. In some cases this can lead to a fatal condition (MPI_Irecv and MPI_Isend in pml/ob1 for example).

Before this commit opal_free_list_get_mt looked like this:

```c
static inline opal_free_list_item_t *opal_free_list_get_mt (opal_free_list_t *flist)
{
    opal_free_list_item_t *item =
        (opal_free_list_item_t*) opal_lifo_pop_atomic (&flist->super);

    if (OPAL_UNLIKELY(NULL == item)) {
        opal_mutex_lock (&flist->fl_lock);
        opal_free_list_grow_st (flist, flist->fl_num_per_alloc);
        opal_mutex_unlock (&flist->fl_lock);
        item = (opal_free_list_item_t *) opal_lifo_pop_atomic (&flist->super);
    }

    return item;
}
```

The problem is that in a multithreaded environment it *is* possible for the free list to be grown successfully but for the thread calling opal_free_list_get_mt to be left without an item. This happens if, between the calls to opal_lifo_push_atomic in opal_free_list_grow_st and the call to opal_lifo_pop_atomic, other threads pop all the items added to the free list.

This commit fixes the issue by ensuring the thread that successfully grew the free list **always** gets a free list item.

Fixes #2921

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

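The idea behind the fix can be sketched in a few lines of self-contained C. This is an illustration only, not the Open MPI patch: every `sketch_*` name is hypothetical, a mutex-protected stack stands in for the atomic LIFO, and the `item_out` parameter is an assumption about how a grow routine could hand one item straight to its caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the OPAL types; not the real Open MPI API. */
typedef struct sketch_item {
    struct sketch_item *next;
} sketch_item_t;

typedef struct {
    sketch_item_t  *head;          /* LIFO of free items */
    pthread_mutex_t lifo_lock;     /* stands in for the atomic LIFO operations */
    pthread_mutex_t grow_lock;     /* serializes growth, like fl_lock */
    size_t          num_per_alloc; /* items created per growth step */
    size_t          num_allocated; /* total items created so far */
} sketch_free_list_t;

static void sketch_init(sketch_free_list_t *fl, size_t num_per_alloc)
{
    assert(num_per_alloc > 0);
    fl->head = NULL;
    fl->num_per_alloc = num_per_alloc;
    fl->num_allocated = 0;
    pthread_mutex_init(&fl->lifo_lock, NULL);
    pthread_mutex_init(&fl->grow_lock, NULL);
}

static void sketch_push(sketch_free_list_t *fl, sketch_item_t *item)
{
    pthread_mutex_lock(&fl->lifo_lock);
    item->next = fl->head;
    fl->head = item;
    pthread_mutex_unlock(&fl->lifo_lock);
}

static sketch_item_t *sketch_pop(sketch_free_list_t *fl)
{
    pthread_mutex_lock(&fl->lifo_lock);
    sketch_item_t *item = fl->head;
    if (NULL != item) {
        fl->head = item->next;
    }
    pthread_mutex_unlock(&fl->lifo_lock);
    return item;
}

/* Create up to `count` items. The first new item is reserved for the caller
 * through item_out and is never published; the rest are pushed on the list.
 * Returns 0 if at least one item was created, -1 otherwise.
 * The caller must hold grow_lock. Cleanup of items is omitted for brevity. */
static int sketch_grow(sketch_free_list_t *fl, size_t count,
                       sketch_item_t **item_out)
{
    size_t made = 0;

    *item_out = NULL;
    for (size_t i = 0; i < count; ++i) {
        sketch_item_t *item = malloc(sizeof(*item));
        if (NULL == item) {
            break;
        }
        item->next = NULL;
        if (NULL == *item_out) {
            *item_out = item;      /* reserved: other threads cannot pop it */
        } else {
            sketch_push(fl, item);
        }
        ++made;
    }
    fl->num_allocated += made;
    return (made > 0) ? 0 : -1;
}

static sketch_item_t *sketch_get(sketch_free_list_t *fl)
{
    sketch_item_t *item = sketch_pop(fl);

    if (NULL == item) {
        pthread_mutex_lock(&fl->grow_lock);
        /* No second pop: the grower keeps the item it reserved. */
        (void) sketch_grow(fl, fl->num_per_alloc, &item);
        pthread_mutex_unlock(&fl->grow_lock);
    }
    return item;
}
```

In this sketch a NULL return from `sketch_get` can only mean that allocation failed, which is the assumption the commit message says callers were already making.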
||
---|---|---|
Makefile.am | ||
opal_bitmap.c | ||
opal_bitmap.h | ||
opal_fifo.c | ||
opal_fifo.h | ||
opal_free_list.c | ||
opal_free_list.h | ||
opal_graph.c | ||
opal_graph.h | ||
opal_hash_table.c | ||
opal_hash_table.h | ||
opal_hotel.c | ||
opal_hotel.h | ||
opal_interval_tree.c | ||
opal_interval_tree.h | ||
opal_lifo.c | ||
opal_lifo.h | ||
opal_list.c | ||
opal_list.h | ||
opal_object.c | ||
opal_object.h | ||
opal_pointer_array.c | ||
opal_pointer_array.h | ||
opal_rb_tree.c | ||
opal_rb_tree.h | ||
opal_ring_buffer.c | ||
opal_ring_buffer.h | ||
opal_tree.c | ||
opal_tree.h | ||
opal_value_array.c | ||
opal_value_array.h |