
19809 Commits

Author SHA1 Message Date
Jeff Squyres
f8dbba78a7 Send the BTL-passed message to ompi_rte_abort.
cmr=v1.8:reviewer=rolfv

This commit was SVN r30889.
2014-02-28 16:20:54 +00:00
Ralph Castain
4a645f0342 Add detection of oversubscription with binding requested - if binding requested to core or hwt, warn and do not bind or else we will hurt performance. Also, if no binding directive was given, turn off the default binding
Refs trac:4317

This commit was SVN r30888.

The following Trac tickets were found above:
  Ticket 4317 --> https://svn.open-mpi.org/trac/ompi/ticket/4317
2014-02-28 16:08:52 +00:00
Ralph Castain
8500247c7b Fix the by-obj mapper in the case where slots are not specified, and so we are in a perpetual oversubscribed state
cmr=v1.7.5:reviewer=rhc

This commit was SVN r30887.
2014-02-28 05:21:46 +00:00
Ralph Castain
a4c3d0a5a0 Add some more debug to the by-obj mapper
This commit was SVN r30884.
2014-02-28 02:52:53 +00:00
Tom Naughton
8793560bde + fix abstraction violation (ORTE_process_info => OMPI_process_info)
This commit was SVN r30883.
2014-02-27 23:59:46 +00:00
Jeff Squyres
e819b5a34a Remove the vendor_ids parsing.
We don't use this functionality any more; we use the transport_type
and device name to identify usnic devices.  It's slightly easier
because we can get transport_type+name from ibv_open_device() and don't
have to do an additional ibv_query_device() to get its attributes.

Reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30882.
2014-02-27 21:47:01 +00:00
Jeff Squyres
3cbdf33b88 This is what r30852 should have been: Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30879.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442
  r30852 --> open-mpi/ompi@4e282a3295

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-27 17:19:50 +00:00
Jeff Squyres
45810f0efb Revert r30852: the wrong version of this patch got committed to SVN.
This commit was SVN r30878.

The following SVN revision numbers were found above:
  r30852 --> open-mpi/ompi@4e282a3295
2014-02-27 15:02:15 +00:00
Mike Dubman
d584869dda OSHMEM: memheap mkey exchange fix
fix situations where cluster nodes can have different btls

Fixed by Roman, reviewed by Igor, Mike
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30877.
2014-02-27 14:02:30 +00:00
Vasily Filipov
d702307521 OPENIB BTL/CONNECT: replace wrong rdma_freeaddrinfo call in rdmacm_component_query func.
This commit was SVN r30876.
2014-02-27 11:52:10 +00:00
Vasily Filipov
f2014b96e7 OPENIB BTL/CONNECT: Add support for AF_IB addressing in rdmacm.
This commit was SVN r30875.
2014-02-27 11:29:47 +00:00
Mike Dubman
e466fee747 OSHMEM: memheap framework fix warn, remove verbs deps
fixed by Igor, reviewed by Miked

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30874.
2014-02-27 07:22:57 +00:00
Mike Dubman
27b07a5d42 update ignores for new framework
cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30873.
2014-02-27 06:46:06 +00:00
Ralph Castain
ce26b096b4 Prevent failover to direct_modex if key isn't found unless direct_modex was enabled
Refs trac:4258

This commit was SVN r30865.

The following Trac tickets were found above:
  Ticket 4258 --> https://svn.open-mpi.org/trac/ompi/ticket/4258
2014-02-27 02:04:56 +00:00
Ralph Castain
ff37524bdd Update ignores
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30864.
2014-02-27 02:04:06 +00:00
Ralph Castain
d109c523b9 Per patch from Tetsuya Mishima, complete the overhaul of the round-robin mappers
Refs trac:4296

This commit was SVN r30861.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-27 00:43:53 +00:00
Jeff Squyres
7440f21b75 Add usnic connectivity-checking agent service.
Basically: since usnic is a connectionless transport, we do not get
OS-provided services "for free" that connection-oriented transports
get, namely: "hey, I wasn't able to make a connection to peer X", and
"hey, your connection to peer X has died."
    
This connectivity-checker runs in a separate progress thread in the
usnic BTL in local rank 0 on each server.  Upon first send in any
process, the connectivity-checker agent will send some UDP pings to the
peer to ensure that we can reach it.  If we can't, we'll abort the job
with a nice show_help message.

There's a lengthy comment in btl_usnic_connectivity.h that explains the
scheme and how it works.

Reviewed by Dave Goodell.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30860.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 22:21:25 +00:00
Oscar Vega-Gisbert
f2043776f6 mpi.Comm: some methods must be private.
They were protected because the old Prequest implementation used them.

This commit was SVN r30859.
2014-02-26 21:17:52 +00:00
Joshua Ladd
d1baf3f00c Stop linking in verbs stuff in oshmem/mca/memheap/base now that we have the sshmem framework.
Refs trac:4261

This commit was SVN r30858.

The following Trac tickets were found above:
  Ticket 4261 --> https://svn.open-mpi.org/trac/ompi/ticket/4261
2014-02-26 20:28:47 +00:00
Ralph Castain
61a21e4f31 Based on Tetsuya's patch, with some changes, correct the case of map-by node where multiple cpus/rank are requested and result in a non-integer match with num slots. Also correct tests for binding policy given to use the proper macro.
Refs trac:4296

This commit was SVN r30857.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-26 18:12:23 +00:00
Mike Dubman
4572bd58e5 OSHMEM: fix bug in scoll/mpi, add fortran support
dtypes support to oshmem scoll mpi. added comment to
oshmem scoll mpi component regarding casting size_t to int.

Fixed by Elena, Reviewed by Igor/Mike

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30856.
2014-02-26 17:06:44 +00:00
Mike Dubman
323e4418b9 OSHMEM: extract memheap allocate methods into separate framework
- similar to opal/shmem
- next step is some refactoring and merge into opal/shmem
 Developed by Igor, reviewed by AlexM, MikeD

This commit fixes trac:4261.

This commit was SVN r30855.

The following Trac tickets were found above:
  Ticket 4261 --> https://svn.open-mpi.org/trac/ompi/ticket/4261
2014-02-26 16:32:23 +00:00
Nathan Hjelm
dfe4a504e4 udcm: fix race between ack arrival and message send and potential hang in udcm
finalize.

Closes trac:4290

cmr=v1.7.5:reviewer=miked

This commit was SVN r30854.

The following Trac tickets were found above:
  Ticket 4290 --> https://svn.open-mpi.org/trac/ompi/ticket/4290
2014-02-26 15:33:27 +00:00
Nathan Hjelm
30b61a3333 Fix a number of issues in the new one sided code.
- Fix several typos in osc/rdma.

- Fix a locking issue in osc/sm that was caused by an incorrect
  assumption about the semantics of opal_atomic_add_32.

- Always unlock the accumulation lock in osc/sm.

- The base of a process's shared memory window should be NULL if
  the size is zero. Fixed.

cmr=v1.7.5:ticket=trac:4304

This commit was SVN r30853.

The following Trac tickets were found above:
  Ticket 4304 --> https://svn.open-mpi.org/trac/ompi/ticket/4304
2014-02-26 15:33:18 +00:00
Jeff Squyres
4e282a3295 Consolidate into a single, outer loop of ibv_create_ah() calls
Follow on to SVN trunk r30850: consolidate the ibv_create_ah() calls
into a single loop, MPI_WAITALL-style.  That is, call the (effectively
non-blocking) ibv_create_ah() for each endpoint.  If we get
NULL+EAGAIN, it means that the UDP ARP is still ongoing down in the
kernel, so just try again later.  We put these all into a single loop
because it allows us to parallelize the ARP progress in the kernel.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30852.

The following SVN revision numbers were found above:
  r30850 --> open-mpi/ompi@3641500442

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 11:02:12 +00:00
Jeff Squyres
52c48b34f0 Set svn:ignore in tests directory
cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30851.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 10:58:48 +00:00
Dave Goodell
3641500442 usnic: Loop on the ibv_create_ah call.
ibv_create_ah() may need to effect an ARP resolution, which may take
some time.  Rather than blocking in ibv_create_ah(), the usNIC driver
may return NULL and set errno to EAGAIN indicating that we should try
again (i.e., the ARP resolution is proceeding under the covers).

So add a simple loop here to loop over ibv_create_ah() until it
returns non-(NULL+EAGAIN).  A future commit will make this a bit more
efficient.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30850.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:38 +00:00
Dave Goodell
c40f8879c8 usnic: improve interface matching (esp. for UDP)
Prior to this commit we matched local interfaces to remote interfaces in
order to create endpoints in a simplistic way.  If any remote interfaces
were on the same subnet as any of our local interfaces then only local
interfaces would be paired (IP-routed remote interfaces would be
ignored).

This commit introduces a more general scheme which attempts to make the
"best" pairing of local interfaces to remote interfaces.  We now cast
the problem as a graph theory problem known as the "Assignment Problem",
or finding a maximum-cardinality, minimum-weight bipartite matching.  We
solve this problem by reducing the bipartite graph of interface
connectivity to a flow network and then solving for a minimum cost flow.
This is then easily converted back into a matching on the original
bipartite graph.

In the new scheme, interfaces on the same subnet are preferred over
interfaces requiring intermediate routing hops and higher bandwidth
links are preferred over lower bandwidth links.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30849.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:26 +00:00
Dave Goodell
47148ab3cb usnic: helper routines for rtnetlink route lookups
Querying the OS routing table is important for making decisions about
which local and remote interfaces should be paired into reliable
communication channels.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30848.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:50:10 +00:00
Dave Goodell
db14c706ce usnic: add graph utility code
This code is intended to support usNIC interface matching functionality.
We currently view that problem as essentially the "Assignment Problem"
(http://en.wikipedia.org/wiki/Assignment_problem), for which there are
many possible solution approaches, including flow-network analysis.  In
the future, we might transition to a more nuanced view of the problem
which would likely also be flow-network based.

To this end, the current code focuses on providing one major algorithm
to the core usnic BTL: `ompi_btl_usnic_solve_bipartite_assignment`.  It
also exposes several typical and necessary functions for constructing,
manipulating, and querying weighted, directed graphs.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30847.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:49:54 +00:00
Dave Goodell
5bf969e63b usnic: unit test parse_ifex_str
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30846.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:48:05 +00:00
Dave Goodell
921a29e41f usnic: add simple unit testing infrastructure
This commit adds mechanisms for writing and running unit tests in the
usnic BTL.  The short version of how to run the tests is:

1. Configure with `--enable-ompi-btl-usnic-unit-tests`.  This will cause
   the unit testing code and test runner utility to be built.

2. Run the tests by running `ompi_btl_usnic_run_tests`.

See `README.test` for full details.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30845.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:50 +00:00
Dave Goodell
044a190cac usnic: consolidate orte includes into compat.h
These includes only exist in the Cisco-internal usnic-v1.6 code base,
but they should not exist anywhere except btl_usnic_compat.h in order to
minimize source differences between usnic-v1.6 and v1.7/trunk.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30844.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:33 +00:00
Dave Goodell
62dc42f628 usnic: check packet/segment lengths
Lower layer (hardware or software) bugs can result in a mismatch between
our BTL-layer payload size and the actual packet length.  We now check
that in order to catch these cases, which otherwise can result in
MPI-layer message corruption.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30843.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:19 +00:00
Dave Goodell
3b5b87c325 usnic: add missing MSGDEBUG in recv path
We were missing a debug message for a very common recv case, making it a
bit harder to follow a debug log.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30842.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:47:05 +00:00
Dave Goodell
f6036d11c8 usnic: fix sender hash comparisons for UDP
There was a duplicated subnet check in the sender hash lookup routine.
This caused receivers to always fail the sender hash lookup if the
sender was in a different subnet, so the receiver would discard the
packet as though it were coming from a different job.

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30841.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:50 +00:00
Dave Goodell
90d68730f1 usnic: fix SEGV when ibv_create_ah fails
If ibv_create_ah fails, we will not initialize the `endpoint->proc`.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30840.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:46:37 +00:00
Dave Goodell
a54f53f242 usnic: also match interfaces in different subnets
This functionality is required for routable UDP/IP usnic traffic.

Previously we would only set up endpoints for remote interfaces on the
same subnet as the current module's local interface.  This behavior
still holds if two processes share any common subnets.  However, if the
two processes have no subnets in common then we assume that all
interfaces are reachable from all other interfaces and wire them up in a
1-1, randomly-matched order somewhat similarly to the "tcp" BTL's
behavior.

Only match in different subnets if we detect UDP support in the lower
layer.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30839.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:49 +00:00
Dave Goodell
4875f48eaa usnic: enable UDP support
This commit decouples OMPI deployment from the version(s) of the lower
layers of the stack by probing for UDP support.

Verbs applications assume a 40-byte header (there is no current
mechanism for querying payload offset).  So to support a 42-byte UDP
header without causing existing applications like ibv_ud_pingpong or
older versions of OMPI to crash, we must inform libusnic_verbs that we
are aware of the nonstandard payload offset.  We do this by overriding
the `transport_type` field of the device to be 42 before calling
`ibv_open_device`.  If the library resets it to something else, then we
know the lower layers are UDP capable.  Otherwise we use the older
custom-L2 format.

This necessitated some minor ugliness in common_verbs, but it's as tidy
as Jeff and I know how to make it right now.

This commit only adds support for UDP headers and connectivity over the
same L2 network, it does not touch routing or interface pairing.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30838.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:35 +00:00
Dave Goodell
e10ad5763f usnic: rearrange component struct field order
Just trying to be deliberate about keeping fastpath-accessed fields
grouped together to fit into the same 64-byte cache line.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30837.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:53 +00:00
Dave Goodell
5d7eabbcd1 usnic: Change tiny_mtu to a size_t (it's compared against an unsigned value)
Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Reese Faucette <rfaucett@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30836.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:37 +00:00
Dave Goodell
fef38d7e42 usnic: Fix a few compiler warnings about types of printed variables
Authored-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30835.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:23 +00:00
Dave Goodell
cadaa1c424 usnic: Shrink sequence numbers to 16 bits
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30834.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:40:10 +00:00
Dave Goodell
707e594d13 usnic: Use INLINE flag more often, saving the DMA is useful.
Authored-by: Reese Faucette <rfaucett@cisco.com>
Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30833.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:53 +00:00
Dave Goodell
dbbe6a8254 usnic: fix proc structure memory leak
Valgrind showed this one, just a bit of sloppiness with the reference
counting.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30832.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:34 +00:00
Dave Goodell
4af332bd4e Fix the logic in ompi_common_verbs_find_ports().
The logic did not correctly perform the OR behavior that is described
in the doxy docs for this function.  This commit fixes the logic so
that a port will be included if it supports any of the
capabilities indicated by the passed-in flags.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30831.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:21 +00:00
Ralph Castain
fbeb0cac10 Silence warning
cmr=v1.7.5:reviewer=ompi-gk1.7

This commit was SVN r30828.
2014-02-25 23:50:12 +00:00
Ralph Castain
fea8a52983 Cleanup trailing spaces and use of tab instead of spaces
Refs trac:4298

This commit was SVN r30827.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-25 23:41:55 +00:00
Ralph Castain
b880aa46bd Update the map-by obj and map-by obj:span mappers to correct for errors in computing carryover across the nodes. Be a little less complex in the algorithm so it is easier to follow and debug.
Refs trac:4296

This commit was SVN r30826.

The following Trac tickets were found above:
  Ticket 4296 --> https://svn.open-mpi.org/trac/ompi/ticket/4296
2014-02-25 23:32:43 +00:00
Joshua Ladd
a0a850a77b Fixing warnings in oshmem/mca/scoll/mpi.
Refs trac:4302

This commit was SVN r30824.

The following Trac tickets were found above:
  Ticket 4302 --> https://svn.open-mpi.org/trac/ompi/ticket/4302
2014-02-25 21:24:15 +00:00