Based on current implementation it is faster to use a blocking send than the non-blocking version. Switch the exchange function used in the barrier to use the blocking version combined with the non-blocking version of the receive.