libibverbs will complain to stderr if it sees device entries in
/sys/class/infiniband for which it has no userspace plugins.
The Cisco usNIC device no longer exports a verbs interface, thereby
causing libibverbs to emit this annoying stderr warning.
To avoid this, use the public ibv API to register a "fake" libibverbs
driver at run-time (right after we call ibv_fork_init(), but --
critically -- *before* we call ibv_get_device_list()). The purpose of
this driver is solely to convince libibverbs that there *is* a driver
for /sys/class/infininband/usnic_verbs devices. ...although this
driver will never return a valid ibv context (and therefore will never
be used).
Defer initializing the CPCs until we know that we have devices/ports
to use. This both prevents some useless work at startup when there
are no devices/ports to use, and also prevents librdmacm complaining
that there are no verbs-capable RDMA devices available (e.g., if a
Cisco usNIC device is present, but does not present a verbs RDMA
interface).
Defer initializing the CPCs until we know that we have devices/ports
to use. This both prevents some useless work at startup when there
are no devices/ports to use, and also prevents librdmacm complaining
that there are no verbs-capable RDMA devices available (e.g., if a
Cisco usNIC device is present, but does not present a verbs RDMA
interface).
The length parameter of ompi_mtl_portals4_long_isend() was declared
as "int", which may not be big enough depending on the platform and
compiler options used. This commit changes the type to size_t to
prevent overflow.
The source field was 16 bits which is not sufficient for many
current and future machines. This commit expands the source field
to 24 bits and reduces the tag field from 32 bits to 24 bits.
After a simple search-replace, the Portals4 description actually
described Portals3. This commit replaces the Portals3 description
with a Portals4 description.
Thanks to Paul Hargrove for spotting this and supplying the patch.
Move the call to opal_common_verbs_fork_test() to up before the call
to ibv_get_device_list() (just curious -- why not use
opal_ibv_get_device_list()?). This ensures that the call to
ibv_fork_init() is before *all* other ibv_* calls.
This commit fixes a bug identified by MTT that occurred when mixing
passive and active target synchronization. The bugs fixed in this
commit are:
- Do not update incoming fragment counts for any type of unbuffered
control message. These messages are out-of-band and should not be
considered towards the signal counts.
- Complete a change from using received counts to expected counts for
lock, unlock, and flush acks. Part of the change made it into
master before the rest was ready. This was preventing wakeups in
some cases.
- Turn the passive_target_access_epoch module member into a
counter. As long as at least one peer is locked we are in a
passive-target epoch and not an active target one. This fix will
ensure that fragment flags are set appropriately.
fixes#538
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
hwloc output can get fairly long, especially on machines with lots of
cores and/or hyperthreads. So put the Locale and Binding output on
separate lines.