improvements:
* Fix minor memory leaks during component_init
* Ensure that an initialization loop does not underflow an unsigned int
* Improve mlock limit checking
* Fix the set of BTL modules created during component_init when failing to
get QP resources or otherwise excluding some (but not all) usnic
verbs devices
* Fix/improve error messages to be consistent with other Cisco
documentation
* Randomize the initial sliding window sequence number so that we
silently drop incoming frames from previous jobs that still have
existent processes in the middle of dying (and are still
transmitting)
* Ensure we don't break out of add_procs too soon and create an
asymmetrical view of which interfaces are available
This commit was SVN r28975.
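The unsigned-underflow item in the list above refers to a common C pitfall: a countdown loop written as `for (i = n - 1; i >= 0; --i)` never terminates when `i` is unsigned, and `n - 1` wraps around when `n` is 0. The sketch below is not the usnic code; `sum_reverse` is a hypothetical helper that only illustrates one safe idiom.

```c
#include <stddef.h>

/* Hypothetical helper (not from the usnic BTL): visit an array from the
 * last element to the first using an unsigned index.
 *
 * The test "i-- > 0" compares before decrementing, so the body sees
 * i = n-1 ... 0 and the loop ends cleanly, even when n == 0. */
static int sum_reverse(const int *vals, size_t n)
{
    int sum = 0;
    for (size_t i = n; i-- > 0; ) {
        sum += vals[i];
    }
    return sum;
}
```

Calling `sum_reverse(vals, 0)` never enters the loop body, so `vals` is never indexed with a wrapped value.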
Use the new sysfs files to check that there are enough VFs, QPs, and
CQs for all the MPI processes on this server.
Move the checking code into its own subroutine to make it smaller and
easier to read/grok.
This commit was SVN r28937.
Brian (rightfully) hit me on the head with the
don't-use-ORTE-use-the-rte-framework clue bat; the usnic BTL now
nicely plays with the RTE framework.
This commit was SVN r28907.
This BTL accesses the Cisco usNIC Linux device through the Linux verbs
API, using Unreliable Datagram (UD) queue pairs. A few noteworthy points:
* This BTL does most of its own fragmentation; it tells the PML that
it has a very high max_send_size (much higher than the network
MTU).
* Since UD fragments are, by definition, unreliable, the usnic BTL
handles all of its own reliability via a sliding window approach
using the opal_hotel construct and many tricks stolen from the
corpus of knowledge surrounding efficient TCP.
* There is a fun PML latency-metric based optimization for NUMA
awareness of short messages.
* Note that this is ''not'' a generic UD verbs BTL; it is specific to
the Cisco usNIC device.
This commit was SVN r28879.
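A sliding window over sequence numbers, as described above, needs a membership test that survives counter wraparound. The sketch below shows one common way to write it. The 32-bit sequence width, the window size, and the name `demo_seq_in_window` are assumptions made for illustration, not details taken from the usnic BTL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window size, chosen for illustration only. */
#define DEMO_WINDOW_SIZE 4096u

/* True if seq lies in [base, base + DEMO_WINDOW_SIZE), with wraparound.
 * Unsigned subtraction is modulo 2^32, so the comparison still works
 * across the 0xFFFFFFFF -> 0 boundary. */
static bool demo_seq_in_window(uint32_t seq, uint32_t base)
{
    return (uint32_t)(seq - base) < DEMO_WINDOW_SIZE;
}
```

A receiver built this way would treat any frame whose sequence number fails the test as stale or too far ahead and ignore it.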
This commit improved the small-message latency and bandwidth when using
the vader BTL. These improvements should make performance competitive
with other MPI implementations.
This commit was SVN r28760.
for the SM and TCP BTLs, as well as the mca_btl_base_param_register()
function (which registers MCA params for all BTLs).
The guidelines in
https://svn.open-mpi.org/trac/ompi/wiki/MCAParamLevels were used to
pick these levels.
This commit was SVN r28746.
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.
The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and changes callers to check whether the item is
NULL instead of using the return code.
This commit was SVN r28722.
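The caller-side pattern after this change can be sketched as follows. The types and the `DEMO_FREE_LIST_GET` macro are hypothetical stand-ins written for this example; they are not the Open MPI free-list implementation and only mirror the idea of signalling failure through a NULL item.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are not the
 * Open MPI free-list types or macros. */
typedef struct demo_item {
    struct demo_item *next;
} demo_item_t;

typedef struct {
    demo_item_t *head;   /* singly linked stack of available items */
} demo_free_list_t;

/* Pop one item; `item` is set to NULL when the list is empty.
 * There is no separate return code to check. */
#define DEMO_FREE_LIST_GET(fl, item)            \
    do {                                        \
        (item) = (fl)->head;                    \
        if (NULL != (item)) {                   \
            (fl)->head = (item)->next;          \
            (item)->next = NULL;                \
        }                                       \
    } while (0)

/* Caller pattern after the change: test the pointer, not an rc. */
static int demo_take(demo_free_list_t *fl, demo_item_t **out)
{
    demo_item_t *item;
    DEMO_FREE_LIST_GET(fl, item);
    if (NULL == item) {
        return -1;       /* out of resources */
    }
    *out = item;
    return 0;
}
```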
As per discussion in the June 2013 developer meeting, these
flags will be used by the PML in the future to request
asynchronous progress on an operation. The naming was chosen
to reflect that a BTL supports this mode (MCA_BTL_FLAG_SIGNALED)
and that a descriptor should "signal" the remote side to wake
up and progress the message (MCA_BTL_DES_FLAG_SIGNAL).
Future commits will update OB1 to take advantage of this
feature when performing the RDMA get or RDMA rendezvous
protocols.
This commit was SVN r28612.
commit is the trunk version of what is needed for #3626.
Add the "ignore_device" field to the INI file. This allows us to
specifically list devices that should be ignored by the openib BTL
(such as the Intel Phi, at least as of May 2013 -- see #3626).
Also add the Intel Phi to the ini file, and set its ignore_device=1.
Finally, add the concept of counting intentionally ignored verbs
devices. Devices are ignored for one of two reasons:
* If the number of allowed ports on that device is 0 (i.e., if
if_include/if_exclude was set such that we're intentionally
ignoring this device).
* If the INI ignore_device field for this device is set to 1.
Once we have the count of devices that were intentionally ignored,
only show the "Hey, there's verbs devices that you're not using!"
show_help message if there are devices that were ''unintentionally''
ignored.
This commit was SVN r28589.
The following Trac tickets were found above:
Ticket 3626 --> https://svn.open-mpi.org/trac/ompi/ticket/3626
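The counting logic described in r28589 can be sketched as below. The struct, its field names, and both function names are hypothetical and chosen for this example; only the two "intentionally ignored" conditions come from the commit message.

```c
#include <stdbool.h>

/* Hypothetical per-device record; field names are illustrative and
 * not taken from the openib BTL. */
typedef struct {
    int  allowed_ports;     /* 0 when if_include/if_exclude filtered all ports */
    bool ini_ignore_device; /* ignore_device=1 in the INI file */
    bool in_use;            /* true if the BTL ended up using this device */
} demo_device_t;

static bool intentionally_ignored(const demo_device_t *d)
{
    return 0 == d->allowed_ports || d->ini_ignore_device;
}

/* Warn only when some unused device was not ignored on purpose. */
static bool should_warn_unused(const demo_device_t *devs, int n)
{
    int unused = 0, intentional = 0;
    for (int i = 0; i < n; ++i) {
        if (devs[i].in_use) {
            continue;
        }
        ++unused;
        if (intentionally_ignored(&devs[i])) {
            ++intentional;
        }
    }
    return unused > intentional;
}
```

With this shape, a host whose only unused device has ignore_device=1 produces no warning, while a host with an unused, unfiltered device still does.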
The primary issue with udcm is that the immediate data in message
acks were often bogus. This caused the sender to keep trying even
though a message was received and acked. The fix is to use the
source LID and QP to determine which message is being acked. In
most cases this should work well since only one message will be
in flight to any peer.
This commit was SVN r28444.
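Matching an ack by source LID and QP, as described for r28444, might look like the sketch below. The `demo_msg_t` record and `demo_handle_ack` function are hypothetical and do not reproduce the udcm data structures. The sketch also relies on the assumption stated in the commit message: normally only one message is in flight to a given peer, so the first unacked match is taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pending-message record; not the udcm data structures. */
typedef struct demo_msg {
    uint16_t peer_lid;   /* LID the message was sent to */
    uint32_t peer_qpn;   /* QP number the message was sent to */
    int      acked;
    struct demo_msg *next;
} demo_msg_t;

/* Match an incoming ack by the sender's LID and QP number rather than
 * by immediate data. Marks and returns the first unacked match, or
 * NULL if nothing matches. */
static demo_msg_t *demo_handle_ack(demo_msg_t *pending,
                                   uint16_t src_lid, uint32_t src_qpn)
{
    for (demo_msg_t *m = pending; NULL != m; m = m->next) {
        if (!m->acked && m->peer_lid == src_lid && m->peer_qpn == src_qpn) {
            m->acked = 1;
            return m;
        }
    }
    return NULL;
}
```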
- Increase the number of WQEs to minimize the number of RNRs
- It is better to have a high watermark and post a relatively small number of WQEs
- Increase the TX queue size
This commit was SVN r28440.
from the list (just for good measure), and then free() it (without
using _SAFE, we were accessing memory that was just free()'d to get to
the next item). Also be a little more thorough -- DESTRUCT the list
when we're all done.
This commit was SVN r28429.
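The use-after-free that r28429 describes comes from reading an item's next pointer after the item has been freed. The sketch below shows the underlying idea with a hand-rolled list. `demo_node_t` and `demo_free_all` are hypothetical; the real code uses the OPAL list class and its `_SAFE` iteration variant, neither of which is reproduced here.

```c
#include <stdlib.h>

/* Hypothetical singly linked node, for illustration only. */
typedef struct demo_node {
    struct demo_node *next;
} demo_node_t;

/* Free every node. The next pointer is saved before free(), which is
 * what a "_SAFE" style iterator does on the caller's behalf. Returns
 * the number of nodes freed and leaves *head empty. */
static int demo_free_all(demo_node_t **head)
{
    int freed = 0;
    demo_node_t *cur = *head;
    while (NULL != cur) {
        demo_node_t *next = cur->next;  /* read before cur is freed */
        free(cur);
        cur = next;
        ++freed;
    }
    *head = NULL;
    return freed;
}
```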