RHEL 7 has shipped with kernel support for the RDMA_TRANSPORT_USNIC
enum, but ''not'' the RDMA_TRANSPORT_USNIC_UDP enum. This means that
when you install usNIC drivers from cisco.com, the kernel will report
IBV_TRANSPORT_USNIC, even though the transport is actually using UDP.
Therefore, we have to modify the logic in common/verbs to do the
additional magic probe if the device reports either an
IBV_TRANSPORT_IWARP or IBV_TRANSPORT_USNIC (because both of those might
be lies -- do the probe to figure out the real transport).
The code changed by this patch is fairly trivial; it simply moves the
logic of the magic probe to its own short function, and then calls that
short function in both the IBV_TRANSPORT_(IWARP|USNIC) cases. It looks
longer because several lengthy comments were also updated.
Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r32098.
the other collective modules. If we endup without some of the
collective the code will raise an error anyway.
cmr=v1.8.2:reviewer=hjelmn
This commit was SVN r32096.
Based on extensive discussions before/at the June 2014 developer's
meeting, put a lengthy comment explaining a second reason why we
''must'' use an RTE barrier during MPI_FINALIZE and
MPI_COMM_DISCONNECT (i.e., unreliable transports). Slightly explain
more the original reason why we do this, too (BTLs can lie/buffer a
message without actually injecting it on the network).
This commit was SVN r32095.
rtnetlink doesn't check the source address when determining whether to
return route info for a query. So we need to check that the OIF matches
the OIF of the source interface name. Without this check, OMPI might
pair a local interface which does not have a route to a particular
remote interface.
Fixes Cisco bug CSCup55797.
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>
cmr=v1.8.2:reviewer=ompi-rm1.8
This commit was SVN r32090.
if source memory could not be registered, then return NULL
some cleanup might be needed, please refer to the FIXME in the code
cmr=v1.8.2:reviewer=miked
This commit was SVN r32081.
The alps ras and plm components were broken by recent changes in ORTE. This
commit resolves those issues.
Changes:
- Define PMI2_SUCCESS if it isn't defined. This fixes a problem with Cray's
PMI implementation which does not define (for some reason) PMI2_SUCCESS. We
had previously just used PMI_SUCCESS.
- Add missing definition and a typo in pml_alps_module.
- launch_id is no longer available in the orte_node_t structure. Use the
attribute lookup to get the value.
- Do not use an O(n^2) sorting algorithm when putting alps nodes in order. Use
opal_list_sort instead (O(nlogn)).
This commit was SVN r32076.
At the developer meeting today, the question was raised as to whether
the SCTP BTL was maintained any more. I emailed Alan Wagner to see if
he had any interest/resources to continue to maintain the SCTP BTL.
He indicated that he unfortunately had any resources to maintain it;
it would be fine to remove the SCTP BTL from the trunk.
So long, SCTP BTL... fare thee well...
This commit was SVN r32075.
This file has to be pre-emptively compiled to generate the module, but
then it also has to be included in libmpi_usempif08.
cmr=v1.8.2:ticket=trac:4736
This commit was SVN r32071.
The following Trac tickets were found above:
Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
Some parameters were ommited and compilation failed if
configured with --disable-weak-symbols
cmr=v1.8.2:ticket=trac:4736
This commit was SVN r32064.
The following Trac tickets were found above:
Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
This won't transition cleanly to the 1.8 series, and may represent too much change, so we'll have to (a) evaluate whether or not to bring it over (once it demonstrates that it does indeed solve the problem), and (b) develop a custom patch for that purpose.
Refs trac:4717
This commit was SVN r32063.
The following Trac tickets were found above:
Ticket 4717 --> https://svn.open-mpi.org/trac/ompi/ticket/4717
1: find/create procs, and create associated endpoint for each
2: resolve peer addresses
The 2nd part is done as a separate loop so that the address lookups
can be parallelized.
The overall result is to split usnic_add_procs() into two smaller,
simpler parts.
cmr=v1.8.2:ticket=trac:4734
This commit was SVN r32062.
The following Trac tickets were found above:
Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
When ibv_create_ah() fails due to an address resolution failure, it
really only means that we can't reach that one peer -- so we should
just ignore that one peer. If ibv_create_ah() fails for some other
reason, then give up on the entire usnic_X device.
Change the show_help() message that is displayed when ibv_create_ah()
fails due to address resolution failure; indicate that it's likely a
routing problem. Also opal_output_verbose() the same info, since
show_help() is de-duplicated (and this particular show_help() message
can be squelched).
Fixes Cisco bugs CSCup35851 and CSCup35872.
cmr=v1.8.2:ticket=trac:4734
This commit was SVN r32061.
The following Trac tickets were found above:
Ticket 4734 --> https://svn.open-mpi.org/trac/ompi/ticket/4734
Use the appropriate modules, don't use mpif-config.h.
cmr=v1.8.2:ticket=trac:4736
This commit was SVN r32052.
The following Trac tickets were found above:
Ticket 4736 --> https://svn.open-mpi.org/trac/ompi/ticket/4736
There is more comprehensive work regarding MPI_SIZEOF coming, but the
Fortran working group in the MPI Forum is debating this internally,
and I'm still doing more testing to get a final solution. So for the
moment, just remove real*16 and complex*32 support so that it compiles
porperly with older compilers (that do not support real*16 and
complex*32).
This commit was SVN r32048.