Commit graph

4671 commits

Author SHA1 Message Date
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Ralph Castain
ce26b096b4 Prevent failover to direct_modex if key isn't found unless direct_modex was enabled
Refs trac:4258

This commit was SVN r30865.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-27 02:04:56 +00:00
Jeff Squyres
7440f21b75 Add usnic connectivity-checking agent service.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
    
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server.  Upon first send in any
process, the connectivity-checker agent will send some UDP pings to the
peer to ensure that we can reach it.  If we can't, we'll abort the job
with a nice show_help message.
    
A lengthy comment in btl_usnic_connectivity.h explains the
scheme and how it works.

Reviewed by Dave Goodell.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30860.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 22:21:25 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
 - Fix several typos in osc/rdma.

 - Fix a locking issue in osc/sm that was caused by an incorrect
   assumption about the semantics of opal_atomic_add_32.

 - Always unlock the accumulation lock in osc/sm.

 - The base of a process's shared memory window should be NULL if
   the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Jeff Squyres
4e282a3295 Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.
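
The single-loop retry pattern described above can be sketched roughly as follows. This is an illustrative simulation in Python, not the usnic BTL code: `create_all`, `try_create`, `make_fake_resolver`, and the `max_rounds` limit are hypothetical names, and `try_create` merely stands in for a non-blocking call such as ibv_create_ah() that reports EAGAIN while address resolution is still in progress.

```python
import errno


def create_all(endpoints, try_create, max_rounds=1000):
    """Create a handle for every endpoint using one outer retry loop.

    try_create(ep) is a hypothetical stand-in for a non-blocking call:
    it returns (handle, 0) on success, (None, errno.EAGAIN) while
    resolution is still in progress, or (None, other_errno) on failure.
    """
    handles = {}
    pending = list(endpoints)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for ep in pending:
            handle, err = try_create(ep)
            if handle is not None:
                handles[ep] = handle
            elif err == errno.EAGAIN:
                # Resolution is still ongoing; try again in the next round.
                still_pending.append(ep)
            else:
                raise OSError(err, f"cannot create handle for {ep}")
        pending = still_pending
    if pending:
        raise TimeoutError(f"unresolved endpoints: {pending}")
    return handles


def make_fake_resolver(rounds_needed):
    """Simulated resolver: endpoint ep succeeds on call number rounds_needed[ep]."""
    calls = {}

    def try_create(ep):
        calls[ep] = calls.get(ep, 0) + 1
        if calls[ep] >= rounds_needed[ep]:
            return f"ah-{ep}", 0
        return None, errno.EAGAIN

    return try_create, calls
```

Because every pending endpoint is polled once per round, slow resolutions overlap instead of being waited on one after another; in this simulation the number of rounds equals the slowest endpoint's count rather than the sum of all counts.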

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30852.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 11:02:12 +00:00
Dave Goodell
3641500442 usnic: Loop on the ibv_create_ah call.
ibv_create_ah() may need to effect an ARP resolution, which may take
some time.  Rather than blocking in ibv_create_ah(), the usNIC driver
may return NULL and set errno to EAGAIN indicating that we should try
again (i.e., the ARP resolution is proceeding under the covers).

So add a simple loop here to loop over ibv_create_ah() until it
returns non-(NULL+EAGAIN).  A future commit will make this a bit more
efficient.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30850.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:38 +00:00
Dave Goodell
c40f8879c8 usnic: improve interface matching (esp. for UDP)
Prior to this commit we matched local interfaces to remote interfaces in
order to create endpoints in a simplistic way.  If any remote interfaces
were on the same subnet as any of our local interfaces then only local
interfaces would be paired (IP-routed remote interfaces would be
ignored).

This commit introduces a more general scheme which attempts to make the
"best" pairing of local interfaces to remote interfaces.  We now cast
the problem as a graph theory problem known as the "Assignment Problem",
or finding a maximum-cardinality, minimum-weight bipartite matching.  We
solve this problem by reducing the bipartite graph of interface
connectivity to a flow network and then solving for a minimum cost flow.
This is then easily converted back into a matching on the original
bipartite graph.

In the new scheme, interfaces on the same subnet are preferred over
interfaces requiring intermediate routing hops and higher bandwidth
links are preferred over lower bandwidth links.
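
As a rough illustration of the reduction described above (bipartite graph to flow network, then minimum cost flow, then back to a matching), here is a small Python sketch. It is not the usnic implementation: the function name, the node numbering, and the edge costs are assumptions made for the example, and it uses a plain successive-shortest-paths method with Bellman-Ford rather than whatever the BTL code actually does.

```python
def solve_bipartite_assignment(n_left, n_right, edges):
    """Return a maximum-cardinality, minimum-total-cost matching.

    edges is a list of (left, right, cost) tuples with 0-based indices.
    The result is a sorted list of (left, right) pairs.
    """
    source, sink = 0, n_left + n_right + 1
    n = sink + 1
    # Each arc is [to, capacity, cost, index of the reverse arc].
    graph = [[] for _ in range(n)]

    def add_arc(u, v, cost):
        graph[u].append([v, 1, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for l in range(n_left):
        add_arc(source, 1 + l, 0)
    for r in range(n_right):
        add_arc(1 + n_left + r, sink, 0)
    for l, r, cost in edges:
        add_arc(1 + l, 1 + n_left + r, cost)

    inf = float("inf")
    while True:
        # Bellman-Ford: cheapest source-to-sink path in the residual network.
        dist = [inf] * n
        prev = [None] * n  # (node, arc index) used to reach each node
        dist[source] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if prev[sink] is None:
            break  # no augmenting path left, so the flow is maximum
        # Push one unit of flow along the path found.
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u

    # Saturated left-to-right arcs are the matched pairs.
    matching = []
    for l in range(n_left):
        for v, cap, _cost, _rev in graph[1 + l]:
            if n_left < v < sink and cap == 0:
                matching.append((l, v - 1 - n_left))
    return sorted(matching)
```

With hypothetical costs in which a lower number means a better link (for example same subnet and higher bandwidth), `solve_bipartite_assignment(2, 2, [(0, 0, 1), (0, 1, 5), (1, 0, 1), (1, 1, 10)])` pairs local 0 with remote 1 and local 1 with remote 0, because that total cost (6) beats the alternative (11).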

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30849.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:26 +00:00
Dave Goodell
47148ab3cb usnic: helper routines for rtnetlink route lookups
Querying the OS routing table is important for making decisions about
which local and remote interfaces should be paired into reliable
communication channels.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30848.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:10 +00:00
Dave Goodell
db14c706ce usnic: add graph utility code
This code is intended to support usNIC interface matching functionality.
We currently view that problem as essentially the "Assignment Problem"
(http://en.wikipedia.org/wiki/Assignment_problem), for which there are
many possible solution approaches, including flow-network analysis.  In
the future, we might transition to a more nuanced view of the problem
which would likely also be flow-network based.

To this end, the current code focuses on providing one major algorithm
to the core usnic BTL: `ompi_btl_usnic_solve_bipartite_assignment`.  It
also exposes several typical and necessary functions for constructing,
manipulating, and querying weighted, directed graphs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30847.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:49:54 +00:00
Dave Goodell
5bf969e63b usnic: unit test parse_ifex_str
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30846.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:48:05 +00:00
Dave Goodell
921a29e41f usnic: add simple unit testing infrastructure
This commit adds mechanisms for writing and running unit tests in the
usnic BTL.  The short version of how to run the tests is:

1. Configure with `--enable-ompi-btl-usnic-unit-tests`.  This will cause
   the unit testing code and test runner utility to be built.

2. Run the tests by running `ompi_btl_usnic_run_tests`.

See `README.test` for full details.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30845.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:50 +00:00
Dave Goodell
044a190cac usnic: consolidate orte includes into compat.h
These includes only exist in the Cisco-internal usnic-v1.6 code base,
but they should not exist anywhere except btl_usnic_compat.h in order to
minimize source differences between usnic-v1.6 and v1.7/trunk.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30844.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:33 +00:00
Dave Goodell
62dc42f628 usnic: check packet/segment lengths
Lower layer (hardware or software) bugs can result in a mismatch between
our BTL-layer payload size and the actual packet length.  We now check
that in order to catch these cases, which otherwise can result in
MPI-layer message corruption.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30843.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:19 +00:00
Dave Goodell
3b5b87c325 usnic: add missing MSGDEBUG in recv path
We were missing a debug message for a very common recv case, making it a
bit harder to follow a debug log.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30842.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:05 +00:00
Dave Goodell
f6036d11c8 usnic: fix sender hash comparisons for UDP
There was a duplicated subnet check in the sender hash lookup routine.
This caused receivers to always fail the sender hash lookup if the
sender was in a different subnet, so the receiver would discard the
packet as though it were coming from a different job.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30841.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:50 +00:00
Dave Goodell
90d68730f1 usnic: fix SEGV when ibv_create_ah fails
If ibv_create_ah fails, we will not initialize the `endpoint->proc`.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30840.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:37 +00:00
Dave Goodell
a54f53f242 usnic: also match interfaces in different subnets
This functionality is required for routable UDP/IP usnic traffic.

Previously we would only setup endpoints for remote interfaces on the
same subnet as the current module's local interface.  This behavior
still holds if two processes share any common subnets.  However, if the
two processes have no subnets in common then we assume that all
interfaces are reachable from all other interfaces and wire them up in a
1-1, randomly-matched order somewhat similarly to the "tcp" BTL's
behavior.

Only match in different subnets if we detect UDP support in the lower
layer.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30839.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:49 +00:00
Dave Goodell
4875f48eaa usnic: enable UDP support
This commit decouples OMPI deployment from the version(s) of the lower
layers of the stack by probing for UDP support.

Verbs applications assume a 40-byte header (there is no current
mechanism for querying payload offset).  So to support a 42-byte UDP
header without causing existing applications like ibv_ud_pingpong or
older versions of OMPI to crash, we must inform libusnic_verbs that we
are aware of the nonstandard payload offset.  We do this by overriding
the `transport_type` field of the device to be 42 before calling
`ibv_open_device`.  If the library resets it to something else, then we
know the lower layers are UDP capable.  Otherwise we use the older
custom-L2 format.

This necessitated some minor ugliness in common_verbs, but it's as tidy
as Jeff and I know how to make it right now.

This commit only adds support for UDP headers and connectivity over the
same L2 network, it does not touch routing or interface pairing.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30838.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:35 +00:00
Dave Goodell
e10ad5763f usnic: rearrange component struct field order
Just trying to be deliberate about keeping fastpath-accessed fields
grouped together to fit into the same 64-byte cache line.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30837.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:53 +00:00
Dave Goodell
5d7eabbcd1 usnic: Change tiny_mtu to a size_t (it's compared against an unsigned value)
Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30836.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:37 +00:00
Dave Goodell
fef38d7e42 usnic: Fix a few compiler warnings about types of printed variables
Authored-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30835.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:23 +00:00
Dave Goodell
cadaa1c424 usnic: Shrink sequence numbers to 16 bits
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30834.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:10 +00:00
Dave Goodell
707e594d13 usnic: Use INLINE flag more often, saving the DMA is useful.
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30833.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:53 +00:00
Dave Goodell
dbbe6a8254 usnic: fix proc structure memory leak
Valgrind showed this one, just a bit of sloppiness with the reference
counting.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30832.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:34 +00:00
Dave Goodell
4af332bd4e Fix the logic in ompi_common_verbs_find_ports().
The logic did not correctly perform the OR behavior that is described
in the doxy docs for this function.  This commit fixes the logic so
that a port will be included if it supports any of the
capabilities indicated by the passed-in flags.
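
The OR behavior described above amounts to a bitmask test along these lines. This is a generic Python sketch, not the ompi_common_verbs_find_ports() code: the `PortCap` bits and both helper functions are hypothetical, and the AND variant is shown only for contrast, not as a claim about what the earlier logic did.

```python
from enum import IntFlag


class PortCap(IntFlag):
    """Hypothetical capability bits, for illustration only."""
    RC = 1
    UD = 2
    UC = 4


def find_ports_any(ports, flags):
    """OR semantics: keep ports that support ANY capability in flags."""
    return [name for name, caps in ports if caps & flags]


def find_ports_all(ports, flags):
    """AND semantics, for contrast: every requested capability is required."""
    return [name for name, caps in ports if (caps & flags) == flags]
```

Given ports with capabilities RC, UD, RC|UD, and none, a query for `PortCap.RC | PortCap.UD` keeps the first three under OR semantics but only the third under AND semantics.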

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30831.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:21 +00:00
Nathan Hjelm
acbd6032f9 Helps to include the correct header.
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30821.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:14:48 +00:00
Nathan Hjelm
5edacac301 osc/rdma: add missing include
cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30820.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-25 19:11:19 +00:00
Ralph Castain
49d938de29 Merge one-sided updates to the trunk - written by Brian Barrett and Nathan Hjelmn
cmr=v1.7.5:reviewer=hjelmn:subject=Update one-sided to MPI-3

This commit was SVN r30816.
2014-02-25 17:36:43 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Jeff Squyres
d07d1864ae Revert r30804.
We're going to be bringing a bunch of usnic code to the SVN trunk
soon, and I basically brought this commit over out of order.  So I'm
reverting it for now; the same functionality will come back shortly.

This commit was SVN r30805.

The following SVN revision numbers were found above:
  r30804 --> open-mpi/ompi@5bedcc15bf
2014-02-24 19:12:49 +00:00
Jeff Squyres
5bedcc15bf Support the IBV_*_USNIC_* verbs constants.
These constants are now upstream (see
https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/commit/?id=f57a9c67eabb9e7f19c624ac3c8c27b7be55796c),
so let's support them properly in Open MPI.

Added bonus: consolidating these checks up in
ompi_check_openfabrics.m4 allowed removing some custom checks and
AC_DEFINE's from the usnic configure.m4 script.

Also change the usnic/configure.m4 check for IBV_EVENT_GID_CHANGE to
use AC_CHECK_DECLS (vs. AC_CHECK_DECL).

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30804.
2014-02-24 18:57:04 +00:00
Jeff Squyres
1b855eca8e A few fixes after r30801:
* Use the prefix rule for global variables
 * Eliminate seed_prng() since it isn't necessary any more

These files will need to get edited again when the RNG type obeys the
prefix rule.

Refs trac:4298

This commit was SVN r30803.

The following SVN revision numbers were found above:
  r30801 --> open-mpi/ompi@e39d9f4080

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 17:47:52 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
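
For readers unfamiliar with the technique named in this commit, a generic additive lagged Fibonacci generator computes x_n = (x_{n-j} + x_{n-k}) mod 2^m. The Python sketch below shows only that recurrence. It is not the OPAL code in opal/util/alfg.h: the class name, the lags (24, 55), the 32-bit width, and the LCG used to fill the initial state are all assumptions chosen for illustration, and the parallel-stream aspect is not covered.

```python
from collections import deque


class LaggedFibonacci:
    """Generic additive lagged Fibonacci generator (illustrative only)."""

    def __init__(self, seed, short_lag=24, long_lag=55, bits=32):
        self.mask = (1 << bits) - 1
        self.short_lag = short_lag
        # Fill the initial state with a simple linear congruential generator.
        state = []
        x = seed & self.mask
        for _ in range(long_lag):
            x = (1664525 * x + 1013904223) & self.mask
            state.append(x)
        state[0] |= 1  # at least one initial value should be odd
        # maxlen makes append() drop the oldest value automatically.
        self.state = deque(state, maxlen=long_lag)

    def next(self):
        # state[0] is x_{n-k} (the oldest value); state[-j] is x_{n-j}.
        value = (self.state[-self.short_lag] + self.state[0]) & self.mask
        self.state.append(value)
        return value
```

Two generators built from the same seed produce the same sequence, which is the property a seed-then-generate interface relies on.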
2014-02-23 21:41:38 +00:00
Nathan Hjelm
bd275e642e btl/ugni: fix typo
cmr=v1.7.5:ticket=trac:4151

This commit was SVN r30795.

The following Trac tickets were found above:
  Ticket 4151 --> https://svn.open-mpi.org/trac/ompi/ticket/4151
2014-02-21 17:46:22 +00:00
Ralph Castain
29a7eda280 Remove executable property
This commit was SVN r30791.
2014-02-21 17:27:47 +00:00
Manjunath Gorentla Venkata
38e5a753dd basemuma bcol : fixing warnings
This commit was SVN r30784.
2014-02-20 18:30:53 +00:00
Mike Dubman
49ee63f4b8 MXM: do not enforce version check
- MXM uses the libtool versioning scheme, which is enough; no additional check is needed in OMPI

reviewed by yossi

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30768.
2014-02-18 19:44:37 +00:00
Rolf vandeVaart
d4f12148c4 Fix several issues reported in ticket #4245.
This commit was SVN r30767.
2014-02-18 17:44:08 +00:00
Jeff Squyres
a80a24029d Rename poorly-named global: usnic_ticks -> ompi_btl_usnic_ticks
cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30752.
2014-02-17 21:37:13 +00:00
Jeff Squyres
bb4ba6511d Remove an unused RML tag (it isn't even used in the oshmem layer).
This commit was SVN r30749.
2014-02-17 18:35:43 +00:00
Ralph Castain
c3df744a3b Shift the orte_db_localrank key to the opal level. Add the job and proc-level session directory names to the database using opal_db keys.
This commit was SVN r30746.
2014-02-17 01:40:56 +00:00
Ralph Castain
445c9f3384 Ensure we only post one receive for direct modex replies, and that we properly handle thread-transfer issues between the ORTE callback and the MPI layer. Account for potential threaded operations at the MPI level.
Refs trac:4258

This commit was SVN r30730.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-14 20:37:17 +00:00
Ralph Castain
bdff767dce Ick - wonder how this ever built static? There is no "select" function anywhere in the system.
cmr=v1.7.5:reviewer=jsquyres:subject=remove bad function declaration

This commit was SVN r30729.
2014-02-14 20:34:21 +00:00
Mike Dubman
608269ed72 fca: support relocation of fca packages to opal_prefix/../fca
reviewed by AlexM
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30728.
2014-02-14 14:49:41 +00:00
Ralph Castain
3e12466f60 Ouch - fix bad race condition in direct modex
cmr=v1.7.5:reviewer=hjelmn:subject=fix bad race condition in direct modex

This commit was SVN r30691.
2014-02-11 23:21:27 +00:00
Dave Goodell
72c0b89e8f usnic: handle missing ibv_event_type_str
Some older versions of libibverbs do not have `ibv_event_type_str`,
leading to compilation failures on older machines, irrespective of
whether they could ever support usNIC anyway.  If we encounter any other
build issues related to "old verbs" then we should just cause the usnic
BTL to disqualify itself when it encounters "old" traits.

Thanks to Paul Hargrove for reporting the issue:
http://www.open-mpi.org/community/lists/devel/2014/02/14056.php

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30674.
2014-02-11 19:18:29 +00:00
Nathan Hjelm
6194bb502a vader: attempt to work around SGI UV issues by creating a segment that
only goes up to VADER_MAX_ADDRESS instead of 0xfffffffffffffffful.

cmr=v1.7.5:ticket=trac:4216

This commit was SVN r30669.

The following Trac tickets were found above:
  Ticket 4216 --> https://svn.open-mpi.org/trac/ompi/ticket/4216
2014-02-11 16:28:25 +00:00
Nathan Hjelm
f2f6a7fe81 vader: don't finalize an endpoint that is already finalized
Fixes trac:4252

cmr=v1.7.5:ticket=trac:4053

This commit was SVN r30668.

The following Trac tickets were found above:
  Ticket 4053 --> https://svn.open-mpi.org/trac/ompi/ticket/4053
  Ticket 4252 --> https://svn.open-mpi.org/trac/ompi/ticket/4252
2014-02-11 16:15:29 +00:00