-Updated blocking send to directly call functionality and
set completion events expected to 0 initally. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.
-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.
-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 7dc8c8ba3fa630df8c5c7ab36fcf25249a82bfe7)