1
1
Граф коммитов

12 Коммитов

Автор SHA1 Сообщение Дата
Dave Goodell
4875f48eaa usnic: enable UDP support
This commit decouples OMPI deployment from the version(s) of the lower
layers of the stack by probing for UDP support.

Verbs applications assume a 40-byte header (there is no current
mechanism for querying payload offset).  So to support a 42-byte UDP
header without causing existing applications like ibv_ud_pingpong or
older versions of OMPI to crash, we must inform libusnic_verbs that we
are aware of the nonstandard payload offset.  We do this by overriding
the `transport_type` field of the device to be 42 before calling
`ibv_open_device`.  If the library resets it to something else, then we
know the lower layers are UDP capable.  Otherwise we use the older
custom-L2 format.

This necessitated some minor ugliness in common_verbs, but it's as tidy
as Jeff and I know how to make it right now.

This commit only adds support for UDP headers and connectivity over the
same L2 network, it does not touch routing or interface pairing.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30838.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:35 +00:00
Dave Goodell
cadaa1c424 usnic: Shrink sequence numbers to 16 bits
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30834.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:10 +00:00
Dave Goodell
707e594d13 usnic: Use INLINE flag more often, saving the DMA is useful.
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30833.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:53 +00:00
Dave Goodell
82db913490 usnic: fix module_recv_buffers perf regression
Cisco v1.6 git commit 913ec6c and upstream trunk r29593 (segfault fix)
introduced a performance regression by inadvertently disabling the
`module_recv_buffers` functionality.  With those changes in place, the
`btl_usnic_recv.c` logic would end up mallocing a buffer that should
have otherwise come from a `module_recv_buffers` pool.  It also resulted
in a small, bounded memory leak (128 buffers at each power-of-two size
interval).

The new version just places the buffer after the free list item with a
flexible array member.  I bumped the pool to allocate all 128 elements
up front because the deferred allocation was modestly impacting IMB
Sendrecv performance at a few sizes.

Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29631.

The following SVN revision numbers were found above:
  r29593 --> open-mpi/ompi@1ed9b8ff43
2013-11-07 01:27:31 +00:00
Dave Goodell
73a943492c usnic: pack via convertor on the fly
If we need to use a convertor, go back to stashing that convertor in the
frag and populating segments "on the fly" (in
ompi_btl_usnic_module_progress_sends).  Previously we would pack into a
chain of chunk segments at prepare_src time, unnecessarily consuming
additional memory.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29592.
2013-11-04 22:52:03 +00:00
Dave Goodell
825686a205 usnic: certain send frag members are immutable
Ensure that they never are touched by checking in their destructors.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

This commit was SVN r29589.
2013-11-04 22:51:24 +00:00
Reese Faucette
f35d9b50e3 Cisco CSCuj22803: fixes for Bsend
changes required to support MPI_Bsend().  Introduces concept of
attaching a buffer to a large segment that the PML can scribble into and
we will send from.  The reason we don't use a pinned buffer and send
directly from that is that usnic_verbs does not (yes) support num_sge>1
for regular sends.  This means the data gets copied twice, but that is
unavoidable.

changed the logic in handle_large_send to be more sensible

Incorporated David's review comments

This commit was SVN r29184.
2013-09-17 07:27:39 +00:00
Reese Faucette
25b5c84d0f Cisco CSCuj13135: Data corruption in MPI_Bsend_ator_c
Do not assume that the "size" passed to alloc_send() will be the same as
the size of the message the resulting fragment will hold when
usnic_send() is called.  This means usnic_send()/usnic_put() can never
trust any pre-computed size values, and are only allowed to look at the
lengths and pointers of the elements in the desc SG list.

This commit was SVN r29183.
2013-09-17 07:25:05 +00:00
Dave Goodell
a669bd01e6 usnic: revamp convertor handling.
The fix for the HPL SEGV was incorrect because it assumed the
prepare_src() routine was always allowed to return "bytes processed"
less than the requested "bytes to send".  It turns out this is only true
if the convertor is what limits the size, we are not allowed to limit
the data sent for our own reasons, else we break login in the upper
layers.

This means we need to learn the number of bytes out of the size
requested the convertor will give us, no matter how big the size is.
Unfortunately, this is a destructive test, and (currently) the only way to
learn that number is to actually have the convertor copy the data out into
buffers.

This change implements this, copying the entire data out into a chain of
send segments which are attached to the large send fragment.  Now we can
always return the proper size value to the PML.

Fixes Cisco bug CSCuj08024

Authored-by: Reese Faucette <rfaucett@cisco.com>

Should be included in usnic v1.7.3 roll-up CMR (refs trac:3760)

This commit was SVN r29137.

The following Trac tickets were found above:
  Ticket 3760 --> https://svn.open-mpi.org/trac/ompi/ticket/3760
2013-09-06 03:21:21 +00:00
Dave Goodell
9cab9777d9 usnic: properly destroy embedded small send frag
Without this, an `--enable-debug` build would hit an assertion in the
list code when run under valgrind with `--malloc-fill=0xff` or any other
case where malloc returned non-zeroed buffers.

Also allow the normal OBJ_ machinery to handle the constructor
invocation ordering for us instead of doing it by hand (which could have
led to future bugs).

Reviewed-by: jsquyres@cisco.com

cmr=v1.7.4

Depends on trunk functionality in r29095 and r29096.  Refs trac:3740,#3741.

This commit was SVN r29127.

The following SVN revision numbers were found above:
  r29095 --> open-mpi/ompi@d1b5940e97
  r29096 --> open-mpi/ompi@a552921171

The following Trac tickets were found above:
  Ticket 3740 --> https://svn.open-mpi.org/trac/ompi/ticket/3740
2013-09-04 20:59:12 +00:00
Jeff Squyres
4b6006402d Use the RTE framework instead of calling ORTE directly.
Brian (rightfully) hit me on the head with the
don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now
nicely plays with the RTE framework.

This commit was SVN r28907.
2013-07-22 17:28:23 +00:00
Jeff Squyres
194b285447 First commit of the Cisco usNIC BTL.
This BTL accesses the Cisco usNIC Linux device via the Linux verbs
API via Unreliable Datagram queue pairs.  A few noteworthy points:

 * This BTL does most of its own fragmentation; it tells the PML that
   it has a very high max_send_size (much higher than the network
   MTU).
 * Since UD fragments are, by definition, unreliable, the usnic BTL
   handles all of its own reliability via a sliding window approach
   using the opal_hotel construct and many tricks stolen from the
   corpus of knowledge surrounding efficient TCP.
 * There is a fun PML latency-metric based optimization for NUMA
   awareness of short messages.
 * Note that this is ''not'' a generic UD verbs BTL; it is specific to
   the Cisco usNIC device.

This commit was SVN r28879.
2013-07-19 22:13:58 +00:00