Now that the infrastructure is calling BTL del_procs() before the BTL
finalize(), the usnic BTL had to re-order some of its teardown
sequence to avoid assert() failing.
This is part of a larger conversation involving #4669. Since
MPI_FINALIZE and MPI_COMM_DISCONNECT currently use an
oob/grpcomm-based barrier, the usnic BTL can ''absolutely know'' that
these endpoints and procs will no longer be used. If the ORTE DPM
goes back to a PML-based barrier, the usnic BTL will need to grow more
complex teardown semantics (a la TCP socket FIN/ACK/FIN_WAIT states).
Refs trac:4669
This commit was SVN r31871.
The following Trac tickets were found above:
Ticket 4669 --> https://svn.open-mpi.org/trac/ompi/ticket/4669
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server. Upon first send in any
process, the connectivty-checker agent will send some UDP pings to the
peer to ensure that we can reach it. If we can't, we'll abort the job
with a nice show_help message.
There's a lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.
Reviewed by Dave Goodell.
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30860.
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
If ibv_create_ah fails, we will not initialize the `endpoint->proc`.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
cmr=v1.7.5:ticket=trac:4253
This commit was SVN r30840.
The following Trac tickets were found above:
Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
Due to deallocation ordering (and an entirely missed deallocation), we
were leaking modest amounts of memory inside libusnic_verbs.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
This commit was SVN r29485.
endpoint_rfstart was being initialized from a value which was not yet
set. Also ensure that rfstart is a valid index in the range
0..WINDOW_SIZE-1, since it is used as the index into endpoint_rcvd_segs,
which has WINDOW_SIZE elements.
Without this change there is significant risk of memory corruption or
segfaults, resulting in hangs or crashes, if malloc ever returns us a
value >=WINDOW_SIZE (4096). Right now we seem to be getting lucky that
the malloc is returning zero-pages to us when we are allocating endpoint
structures (possibly because the freelist performs a single large
allocation for all endpoints).
Fixes Cisco bug CSCui88781.
Reviewed-by: rfaucett@cisco.com
Reviewed-by: jsquyres@cisco.com
cmr=v1.7.3:reviewer=jsquyres
This commit was SVN r29075.
improvements:
* Fix minor memory leaks during component_init
* Ensure that an initialization loop does not underflow an unsigned int
* Improve mlock limit checking
* Fix set of BTL modules created during component_init when failing to
get QP resources or otherwise excluding some (but not all) usnic
verbs devices
* Fix/improve error messages to be consistent with other Cisco
documentation
* Randomize the initial sliding window sequence number so that we
silently drop incoming frames from previous jobs that still have
existant processes in the middle of dying (and are still
transmitting)
* Ensure we don't break out of add_procs too soon and create an
asymetrical view of what interfaces are available
This commit was SVN r28975.
Brian (rightfully) hit me on the head with the
don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now
nicely plays with the RTE framework.
This commit was SVN r28907.
This BTL accesses the Cisco usNIC Linux device via the Linux verbs
API via Unreliable Datagram queue pairs. A few noteworthy points:
* This BTL does most of its own fragmentation; it tells the PML that
it has a very high max_send_size (much higher than the network
MTU).
* Since UD fragments are, by definition, unreliable, the usnic BTL
handles all of its own reliability via a sliding window approach
using the opal_hotel construct and many tricks stolen from the
corpus of knowledge surrounding efficient TCP.
* There is a fun PML latency-metric based optimization for NUMA
awareness of short messages.
* Note that this is ''not'' a generic UD verbs BTL; it is specific to
the Cisco usNIC device.
This commit was SVN r28879.