For static builds, we also need to set
<framework>_<component>_WRAPPER_EXTRA_LIBS so that the wrappers know
which other libraries to add when linking executables.
Be sure to count *this* process when checking how many VFs we need
on the local server.
(cherry picked from commit 386c01934e98cb8dcb48ff648ecdfb0c8677baa9)
The superclass declared in the mca_rcache_vma_t structure did not match
the one named in its OBJ_CLASS_INSTANCE: one was opal_list_item_t and
the other was ompi_free_list_item_t. The superclass in the structure
looks like the correct one, so change the superclass in
OBJ_CLASS_INSTANCE to match.
If there are not enough resources (e.g., low VFs), we can end up
calling finalize_one_channel() on the same channel multiple times. So
be sure to NULL out fields that we have already freed so that we do not
try to free them a second time.
Fixes CSCus26648.
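A minimal sketch of the "NULL out what you freed" pattern described
above. The struct and function names here are hypothetical stand-ins,
not the real usnic channel code; the point is only that the finalizer
becomes safe to call repeatedly because free(NULL) is a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a channel; not the real usnic structure. */
struct channel {
    void *recv_buffers;
    void *completion_queue;
};

/* Safe to call more than once on the same channel: every freed field
 * is reset to NULL, and free(NULL) does nothing. */
void finalize_channel(struct channel *ch)
{
    assert(ch != NULL);

    free(ch->recv_buffers);
    ch->recv_buffers = NULL;

    free(ch->completion_queue);
    ch->completion_queue = NULL;
}

/* Returns 0 on success, -1 if any allocation failed. On failure the
 * channel is finalized here, and the caller may finalize it again. */
int init_channel(struct channel *ch, size_t nbytes)
{
    assert(ch != NULL);

    ch->recv_buffers = malloc(nbytes);
    ch->completion_queue = malloc(nbytes);
    if (ch->recv_buffers == NULL || ch->completion_queue == NULL) {
        finalize_channel(ch);
        return -1;
    }
    return 0;
}
```

With this shape, an error path that finalizes a channel and a later
cleanup pass that finalizes it again do not double free.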
Fix the ordering so that we obtain the usnic netmask information
*before* we do the filtering based on CIDR-specified networks.
Also requires upstream Github libfabric commit 3976745.
Fixes CSCus22495.
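A small sketch of why the ordering matters, assuming a filter that
compares a device's network against a CIDR-specified network. The
struct, the function names, and the exact match rule are assumptions
for illustration, not the actual usnic filtering code. If the netmask
has not been obtained yet (still 0), the comparison cannot succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device record; addresses are in host byte order. */
struct iface {
    uint32_t addr;
    uint32_t netmask;   /* 0 until the netmask has been obtained */
};

/* Convert a CIDR prefix length (0..32) to an IPv4 netmask. */
uint32_t prefix_to_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len == 0 ? 0 : ~(uint32_t)0 << (32 - prefix_len);
}

/* Assumed match rule: the device matches when its netmask and its
 * network both equal those of the CIDR filter. */
bool iface_matches_cidr(const struct iface *dev, uint32_t net,
                        unsigned prefix_len)
{
    uint32_t mask = prefix_to_mask(prefix_len);
    return dev->netmask == mask &&
           (dev->addr & dev->netmask) == (net & mask);
}
```

For example, a device at 10.1.2.3 only matches 10.1.2.0/24 once its
netmask field has been filled in with 255.255.255.0.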
We had several problems in the old code:
1. We were specifying an arbitrary timeout (100 ms) and then abandoning
all remaining pending AV insert operations. We would then free the
endpoint buffer that we gave to fi_av_insert(), usually causing
libfabric's progress thread to write to a freed buffer.
2. We were claiming in a show_help message that the timeout was
controllable via an MCA parameter. This commit removes that
parameter, since there's no good method for us to specify a timeout
like this to libfabric right now.
3. We also weren't waiting for the correct number of fi_av_insert()
operations to complete. We were waiting for nprocs, which is
accidentally fine for 2 procs on separate hosts, but not for most
other proc counts.
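The approach behind items 1 and 3 can be sketched as follows: count
the insert operations that were actually submitted, then wait for
exactly that many completions with no timeout, so the buffer handed to
the inserts is never freed while operations are still pending. The
poll callback and the toy poller are hypothetical; this does not use
or imitate the libfabric API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns how many inserts completed since the
 * previous call. */
typedef size_t (*poll_fn)(void *ctx);

/* Block until all submitted inserts have completed. There is no
 * timeout: returning early would leave operations pending against a
 * buffer the caller is about to free. */
size_t wait_for_inserts(size_t submitted, poll_fn poll, void *ctx)
{
    size_t completed = 0;

    assert(poll != NULL);
    while (completed < submitted) {
        completed += poll(ctx);
    }
    return completed;
}

/* Toy poller for illustration: reports one completion per call. */
struct toy_state {
    size_t polls;
};

size_t toy_poll(void *ctx)
{
    struct toy_state *s = ctx;
    s->polls++;
    return 1;
}
```

The caller passes the number of operations it submitted, not nprocs,
and frees the buffer only after wait_for_inserts() returns.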
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
There was a bug in the openib btl handling this valid sequence of
calls:
desc = btl_alloc ();
btl_free (desc);
When triggered, the bug would cause either fragment loss or undefined
behavior (SEGV, etc.). The problem occurred because btl_alloc contained
the logic to modify the pending fragment (length, etc.), and these
changes were not corrected if the fragment was freed instead of sent.
To fix this issue, I 1) moved some of the coalescing logic to the
btl_send function, and 2) made btl_free retry the coalesced fragment
if it was never sent. This appears to completely address the issue.
For systems whose OFED lacks XRC support, commit b3617e73
broke the build of the openib btl. This commit addresses
the issues introduced by that commit.