Follow on to 7bd2de9960419422a4591f4b5d286f1f911a0a47: move setting
the iov_limit to 1 earlier in the startup sequence.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The usNIC BTL does not use more than 1 iov, so be sure to set it to 1
so that we don't allocate cq/rq/sq entries based on a default (i.e.,
>1) number of iovs per entry.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Double check the queue lengths that we get back from libfabric to
ensure that they are at least as long as we need. They *should* never
be shorter than we need, but let's just check to be sure.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
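
A minimal sketch of the length check described above. The struct and
function names are hypothetical and are not taken from the BTL source;
the real code compares values reported by libfabric.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the queue depths of one channel. */
struct queue_lens {
    size_t tx;  /* send queue entries */
    size_t rx;  /* receive queue entries */
    size_t cq;  /* completion queue entries */
};

/* Return true only if every length that came back is at least as long
 * as the length we need. */
static bool queue_lens_sufficient(const struct queue_lens *needed,
                                  const struct queue_lens *actual)
{
    assert(needed != NULL && actual != NULL);
    return actual->tx >= needed->tx &&
           actual->rx >= needed->rx &&
           actual->cq >= needed->cq;
}
```

A caller would presumably treat a false return as an initialization
error rather than continue with queues that are too short.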
Don't just blindly send ACKs; ensure that we have send credits before
doing so. If we don't have any send credits, just don't send the ACK
(it'll come again soon enough; it's not a tragedy if we don't send it
now).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The libfabric usnic provider may give you back TX/RX queues that are
longer than you asked for. So just use the TX/RX/CQ lengths that we
asked for, regardless of what length comes back.
Additionally, keep the length of the priority channel CQ separate from
the length of the data CQ.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
With libfabric v1.4, the usnic provider changed the values of its
fabric and domain name strings (compared to libfabric <v1.4). Update
the Open MPI usNIC BTL to handle both pre-v1.4 and v1.4 fabric/domain
names.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
The btl_recv.h:lookup_sender() function uses the hashed ORTE proc name
to determine the sender of the packet. With add_procs_cutoff>0, the
usnic BTL may not have knowledge of all the senders.
Until the usNIC BTL can be adjusted to do something like the
openib/ugni BTLs (i.e., use opal_proc_for_name() to lookup unknown
sender proc names), set MCA_BTL_FLAGS_SINGLE_ADD_PROCS, which means
that ob1 will only call add_procs() once -- with all the procs in it.
Also in this commit, adapt the connectivity checker to not rely on
knowing all the senders (which is a bit easier than adapting the main
BTL send path): the receiving connectivity agent will simply echo the
same PING message (which contains the sender's IP address+UDP port)
back to the sender without checking that it knows who the sender is.
If the sender receives the echoed PING back on the expected interface,
it will find a match in the pending pings list. If the sender receives
the echoed PING back on an unexpected interface, a match will not be
found, and the incoming PING message will be dropped.
Fixes open-mpi/ompi#1440
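
The sender-side matching of an echoed PING can be sketched as below.
This is an illustration only: the pending_ping struct, its fields, and
the exact match criteria are assumptions, not the connectivity
checker's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a PING we sent and expect to see echoed. */
struct pending_ping {
    uint32_t src_ipv4;  /* our IP address, as written into the PING */
    uint16_t src_port;  /* our UDP port, as written into the PING */
    int      iface;     /* interface on which we expect the echo */
    bool     acked;     /* set once the echo has been matched */
};

/* Handle an echoed PING that arrived on arrival_iface. Returns true if
 * it matched a pending ping; false means the message is dropped. */
static bool handle_echoed_ping(struct pending_ping *list, size_t n,
                               uint32_t ipv4, uint16_t port,
                               int arrival_iface)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!list[i].acked &&
            list[i].src_ipv4 == ipv4 &&
            list[i].src_port == port &&
            list[i].iface == arrival_iface) {
            list[i].acked = true;
            return true;
        }
    }
    return false;  /* unexpected interface or unknown ping: drop */
}
```

Note that the receiver never appears here: it only echoes the message,
so all of the matching state lives on the sender.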
Three minor updates from the code review of
https://github.com/open-mpi/ompi-release/pull/933:
* Remove an extra blank line in a show_help message
* We no longer allow -1 for the MCA param btl_usnic_av_eq_num, so
change the flag to REGINT_GE_ONE
* Change "num_blocks" definition to be in terms of block_len (not
eq_size)
Add endpoints in a blocked manner so that we don't overrun the
fi_av_insert() event queue. Also make the AV EQ length an MCA param,
and report it in mca_btl_base_verbose >=5 output.
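
A rough sketch of blocked insertion, with num_blocks computed from
block_len. The callback type and helper names are hypothetical: the
callback stands in for one fi_av_insert() call plus draining its
completions, and block_len is assumed to be chosen so that one block's
completions fit in the AV event queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: insert endpoints [first, first + count). */
typedef int (*insert_block_fn)(size_t first, size_t count, void *ctx);

/* Number of blocks needed, rounding up. */
static size_t num_blocks_for(size_t num_endpoints, size_t block_len)
{
    assert(block_len > 0);
    return (num_endpoints + block_len - 1) / block_len;
}

/* Insert all endpoints, at most block_len at a time. Stops and returns
 * the callback's result on the first non-zero return. */
static int insert_in_blocks(size_t num_endpoints, size_t block_len,
                            insert_block_fn insert, void *ctx)
{
    size_t num_blocks = num_blocks_for(num_endpoints, block_len);
    for (size_t b = 0; b < num_blocks; ++b) {
        size_t first = b * block_len;
        size_t count = num_endpoints - first;
        if (count > block_len) {
            count = block_len;  /* only the last block may be short */
        }
        int rc = insert(first, count, ctx);
        if (rc != 0) {
            return rc;
        }
    }
    return 0;
}

/* Example callback: tally how many endpoints were "inserted". */
static int tally_block(size_t first, size_t count, void *ctx)
{
    (void)first;
    *(size_t *)ctx += count;
    return 0;
}
```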
We try to keep the source code the same between master and v1.10. So
put the #if's back for OPAL_HAVE_HWLOC (and just hard-code it to 1 on
master) so that this code is also compilable in v1.10.
There's no longer any need for the usnic BTL to have its own progress
thread: it can use the opal_progress_thread() infrastructure. This
commit removes the code to startup/shutdown the usnic-BTL-specific
progress thread and instead, just adds its events to the OPAL-wide
progress thread.
This necessitated a small change in the finalization step.
Previously, we would stop the progress thread and then tear down the
events. We can no longer stop the progress thread, and if we start
tearing down events, this will cause shutdown/hangups to be sent
across sockets, potentially firing some of the still-remaining events
while some (but not all) of the data structures have been torn down.
Chaos ensues.
Instead, queue up an event to tear down all the pending events. Since
the progress thread will only fire one event at a time, having a
teardown event means that it can tear down all the pending events
"atomically" and not have to worry that one of those events will get
fired in the middle of the teardown process.
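
The teardown-event idea can be illustrated with a toy, single-threaded
event loop. Nothing here is the OPAL or libevent API: every type and
function is hypothetical, and every active event is treated as ready to
fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 8

struct event_loop;
typedef void (*event_cb)(struct event_loop *loop, void *arg);

struct event {
    bool     active;
    event_cb cb;
    void    *arg;
};

struct event_loop {
    struct event events[MAX_EVENTS];
};

/* Register an event in the first free slot; returns its index or -1. */
static int loop_add(struct event_loop *loop, event_cb cb, void *arg)
{
    assert(loop != NULL && cb != NULL);
    for (int i = 0; i < MAX_EVENTS; ++i) {
        if (!loop->events[i].active) {
            loop->events[i] = (struct event){ true, cb, arg };
            return i;
        }
    }
    return -1;
}

/* Fire at most one active event, like a progress thread that runs one
 * callback at a time. Returns false when nothing is pending. */
static bool loop_fire_one(struct event_loop *loop)
{
    for (int i = 0; i < MAX_EVENTS; ++i) {
        if (loop->events[i].active) {
            struct event ev = loop->events[i];
            loop->events[i].active = false;
            ev.cb(loop, ev.arg);
            return true;
        }
    }
    return false;
}

/* Teardown event: it runs as a single callback, so no other event can
 * fire while it removes everything that is still pending. */
static void teardown_cb(struct event_loop *loop, void *arg)
{
    (void)arg;
    for (int i = 0; i < MAX_EVENTS; ++i) {
        loop->events[i].active = false;
    }
}

/* Ordinary event used for illustration: counts how often it fired. */
static void count_cb(struct event_loop *loop, void *arg)
{
    (void)loop;
    ++*(int *)arg;
}
```

Once teardown_cb has run, count_cb can no longer fire, which is the
property the finalization step relies on.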
In libfabric v1.0.0 (i.e., API v1.0), the usnic provider handled
FI_MSG_PREFIX inconsistently between sends and receives. This has
been fixed in libfabric v1.1.0 (i.e., API v1.1): FI_MSG_PREFIX is
handled consistently for both sends and receives.
Run-time detect which libfabric we are running with and adapt behavior
appropriately.
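
A sketch of the run-time check, assuming the API version is packed with
the major number in the upper 16 bits and the minor number in the lower
16 bits (the packing used by libfabric's FI_VERSION macro). The helper
names are hypothetical; in real code the value would come from
fi_version().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local helper mirroring the major/minor packing. */
static uint32_t api_version(uint32_t major, uint32_t minor)
{
    assert(major <= 0xFFFFu && minor <= 0xFFFFu);
    return (major << 16) | minor;
}

/* True if FI_MSG_PREFIX is handled consistently for sends and
 * receives, i.e., the run-time API version is at least 1.1. */
static bool msg_prefix_is_consistent(uint32_t runtime_version)
{
    return runtime_version >= api_version(1, 1);
}
```

Because the major number occupies the high bits, a plain integer
comparison orders the packed versions correctly.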
Followup to open-mpi/ompi@65b66ab: if we're not debugging, then #if
out an entire block so that the compiler doesn't warn about variables
that are assigned and not used.
When using an external libfabric (or really any libfabric newer than
libfabric commit 607e863), we must use fi_getname to determine the local
port of our endpoint. Without this fix, OMPI will hang endlessly
while retransmitting packets to port 0 on the remote host.
fi_av_insert() is invoked with a context containing each endpoint
USNIC_NUM_CHANNELS times. If the address on that endpoint fails to
resolve / is unreachable / has some error, we'll therefore get
USNIC_NUM_CHANNELS error completions with that same endpoint. We
therefore only want to warn about the unreachability of (and
OBJ_RELEASE) that endpoint the *first* time.
Fixes CSCut46822.
If we really get a catastrophic error from a libfabric call, don't
bother trying to continue (because data has been corrupted and there's
nothing sane left to do). Just call opal_btl_usnic_exit() (which
tries to call the PML error callback, but we're so early in the
module_init process that this likely hasn't been setup yet, so the job
will likely abort).
Nothing too substantial here, but two of the messages moved from
"libfabric API failed" to "internal error during init", just to be a
bit more descriptive.
When we get errors, the entry.data field tells us how many errors are
being reported. So decrement the loop count variable by that much.
This fixes CSCut30441.
Also include two other minor changes:
1. More C99-style member initialization in the component struct
2. Fix the BTL module member initialization to not be redundant
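
The loop-count adjustment for error entries, described at the top of
this message, can be sketched as below. The av_event struct and
drain_events() are hypothetical; the count field plays the role that
entry.data plays for error entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event: a success or error report covering count items. */
struct av_event {
    bool   is_error;
    size_t count;
};

/* Consume events until `expected` items are accounted for. Returns how
 * many items are still outstanding and stores the number of failed
 * items in *num_failed. */
static size_t drain_events(size_t expected, const struct av_event *events,
                           size_t num_events, size_t *num_failed)
{
    assert(num_failed != NULL);
    assert(events != NULL || num_events == 0);
    size_t left = expected;
    *num_failed = 0;
    for (size_t i = 0; i < num_events && left > 0; ++i) {
        size_t n = events[i].count;
        if (n > left) {
            n = left;  /* guard against underflow */
        }
        if (events[i].is_error) {
            *num_failed += n;
        }
        left -= n;     /* decrement by the reported count, not by 1 */
    }
    return left;
}
```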
Add the functions that changed between BTL 2.0 and 3.0 into compat.h
and compat.c:
* module.btl_prepare_src: the signature and body of this method
changed between 2.0 and 3.0. However, the functions that this
method calls did *not* need to change, so they are copied over
wholesale (with the exception that they no longer accept the unused
`registration` parameter).
* module.btl_prepare_dst: this method does not exist in BTL 3.0.
* module.btl_put: the signature and body of this method changed
between 2.0 and 3.0.