Move away from verbs-specific terms "device" and "port" in the usnic
BTL help messages. Replace them with "usNIC interface" (since usNIC
has no concept of a port).
cmr=v1.8.2:ticket=trac:4734
This commit was SVN r32029.
The following Trac tickets were found above:
Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
Add the component use_udp value into the modex. If my component's
use_udp value doesn't agree with the use_udp value from a peer's modex
data, print a helpful message and disqualify the usnic BTL (the usnic
BTL will not be used). This prevents accidental customer
misconfigurations.
Reviewed by Dave Goodell
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r31689.
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices. It's slightly easier
because we can transport_type+name from ibv_device_open() and don't
have to do an additional ibv_query_device() to get its attributes.
Reviewed by Dave Goodell.
cmr=v1.7.5:reviewer=ompi-rm1.7
This commit was SVN r30882.
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style. That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint. If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later. We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30879.
The following SVN revision numbers were found above:
r30850 --> open-mpi/ompi@3641500442
r30852 --> open-mpi/ompi@4e282a3295
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server. Upon first send in any
process, the connectivty-checker agent will send some UDP pings to the
peer to ensure that we can reach it. If we can't, we'll abort the job
with a nice show_help message.
There's a lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.
Reviewed by Dave Goodell.
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30860.
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
Prior to this commit we matched local interfaces to remote interfaces in
order to create endpoints in a simplistic way. If any remote interfaces
were on the same subnet as any of our local interfaces then only local
interfaces would be paired (IP-routed remote interfaces would be
ignored).
This commit introduces a more general scheme which attempts to make the
"best" pairing of local interfaces to remote interfaces. We now cast
the problem as a graph theory problem known as the "Assignment Problem",
or finding a maximum-cardinality, minimum-weight bipartite matching. We
solve this problem by reducing the bipartite graph of interface
connectivity to a flow network and then solving for a minimum cost flow.
This is then easily converted into back into a matching on the original
bipartite graph.
In the new scheme, interfaces on the same subnet are preferred over
interfaces requiring intermediate routing hops and higher bandwidth
links are preferred over lower bandwidth links.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30849.
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
1. Fix ompi_info memory leak in usnic BTL: do not allocate memory in
the component register function, because ompi_info only calls the
component register function and then dlclose's the component -- it
does not call component finalize. Instead, defer parsing the MCA
param (and alloc'ing memory) until the component init function so
that any allocated memory can be freed in the component close
function.
1. Also add a new check to ensure that we actually have some part
numbers to check. Add a show_help message if we don't find any
vendor part IDs to check.
1. Add a verbose output if usnic disqualifies itself from selection
because THREAD_MULTIPLE was specified.
cmr=v1.7.5:reviewer=dgoodell
This commit was SVN r30073.
changes required to support MPI_Bsend(). Introduces concept of
attaching a buffer to a large segment that the PML can scribble into and
we will send from. The reason we don't use a pinned buffer and send
directly from that is that usnic_verbs does not (yes) support num_sge>1
for regular sends. This means the data gets copied twice, but that is
unavoidable.
changed the logic in handle_large_send to be more sensible
Incorporated David's review comments
This commit was SVN r29184.
improvements:
* Fix minor memory leaks during component_init
* Ensure that an initialization loop does not underflow an unsigned int
* Improve mlock limit checking
* Fix set of BTL modules created during component_init when failing to
get QP resources or otherwise excluding some (but not all) usnic
verbs devices
* Fix/improve error messages to be consistent with other Cisco
documentation
* Randomize the initial sliding window sequence number so that we
silently drop incoming frames from previous jobs that still have
existant processes in the middle of dying (and are still
transmitting)
* Ensure we don't break out of add_procs too soon and create an
asymetrical view of what interfaces are available
This commit was SVN r28975.
This BTL accesses the Cisco usNIC Linux device via the Linux verbs
API via Unreliable Datagram queue pairs. A few noteworthy points:
* This BTL does most of its own fragmentation; it tells the PML that
it has a very high max_send_size (much higher than the network
MTU).
* Since UD fragments are, by definition, unreliable, the usnic BTL
handles all of its own reliability via a sliding window approach
using the opal_hotel construct and many tricks stolen from the
corpus of knowledge surrounding efficient TCP.
* There is a fun PML latency-metric based optimization for NUMA
awareness of short messages.
* Note that this is ''not'' a generic UD verbs BTL; it is specific to
the Cisco usNIC device.
This commit was SVN r28879.