1
1
Граф коммитов

290 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Rolf vandeVaart
9bc8fbaefd Create new error message so we can better pinpoint where an error occurs.
This commit was SVN r32303.
2014-07-24 15:18:55 +00:00
Rolf vandeVaart
8778418da2 Remove some debug #ifdefs (oops). Other lock support.
This commit was SVN r32263.
2014-07-22 02:09:06 +00:00
Rolf vandeVaart
1a61dd3078 Add some more locks where needed.
This commit was SVN r32262.
2014-07-22 00:29:57 +00:00
Jeff Squyres
da18eb1b8b common/verbs: fix usnic detection
The logic was mishandling the case of a newer kernel and an older
libusnic_verbs.  Simplify usnic_transport() to return constants in the
2 known cases (not a usNIC device and the TRANSPORT_USNIC_UDP case),
and call the magic probe in all other cases.

Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32260.
2014-07-21 19:52:29 +00:00
Rolf vandeVaart
947a4e14b4 Add a lock and clean up handling of some error conditions,
This commit was SVN r32258.
2014-07-17 19:33:10 +00:00
Rolf vandeVaart
26e3282a18 One more minor movement for easier reading. No functional change.
This commit was SVN r32252.
2014-07-16 20:59:07 +00:00
Rolf vandeVaart
c332ca75ff Change function name for clarity.
This commit was SVN r32251.
2014-07-16 20:46:10 +00:00
Dave Goodell
c104604387 common/verbs: update usnic transport probe
RHEL 7 has shipped with kernel support for the RDMA_TRANSPORT_USNIC
enum, but ''not'' the RDMA_TRANSPORT_USNIC_UDP enum.  This means that
when you install usNIC drivers from cisco.com, the kernel will report
IBV_TRANSPORT_USNIC, even though the transport is actually using UDP.

Therefore, we have to modify the logic in common/verbs to do the
additional magic probe if the device reports either an
IBV_TRANSPORT_IWARP or IBV_TRANSPORT_USNIC (because both of those might
be lies -- do the probe to figure out the real transport).

The code changed by this patch is fairly trivial; it simply moves the
logic of the magic probe to its own short function, and then calls that
short function in both the IBV_TRANSPORT_(IWARP|USNIC) cases.  It looks
longer because several lengthy comments were also updated.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.8.2:reviewer=ompi-rm1.8

This commit was SVN r32098.
2014-06-27 18:43:32 +00:00
Gilles Gouaillardet
de15b623b5 do not incrementing/decrementing the opal_progress_event_state.
Per Ralph :
"I noticed that we are incrementing and decrementing the opal_progress_event state.
However, this no longer has any impact whatsoever on the RML as that is running in
the independent ORTE event thread. So all this actually does is impact the MPI layer
by adding an unnecessary overhead."

Thanks Ralph for pointing this :-)

cmr=v1.8.2:reviewer=rhc:ticket=4671

This commit was SVN r31887.

The following Trac tickets were found above:
  Ticket 4671 --> https://svn.open-mpi.org/trac/ompi/ticket/4671
2014-05-23 05:02:31 +00:00
George Bosilca
66e91f3797 Remove a warning on p.
And correctly handle the reference count from OBJ_NEW in the error
path (thanks Gilles for noticing).

This commit was SVN r31877.
2014-05-22 05:57:21 +00:00
George Bosilca
6ed1ac032e Release the buffer in all error cases and add small code cleanups.
This commit was SVN r31876.
2014-05-22 05:17:35 +00:00
Gilles Gouaillardet
5e9347bfb4 Fix deallocation error in mca_common_sm_rml_info_bcast()
buffer is sent (num_local_procs-1) time, so it should be
OBJ_RETAIN'ed (num_local_procs-2) time

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31874.
2014-05-22 04:53:42 +00:00
Nathan Hjelm
e4db2c3ebb ompi: fix various small leaks
This commit fixes three leaks:

 - bml/r2: fix leak of del_procs in mca_bml_r2_del_procs

 - Release the modex data in btl/scif, btl/ugni, and btl/vader

 - ompi_mpi_finalize: close the allocator framework

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31778.

The following SVN revision numbers were found above:
  r2 --> open-mpi/ompi@58fdc18855
2014-05-15 15:59:51 +00:00
Gilles Gouaillardet
3e19041425 Fix memory leak in mca_common_sm_rml_info_bcast(...)
The local buffer object is only used by the root of the group,
so move down the buffer declaration and initialization.

cmr=v1.8.2:reviewer=rhc

This commit was SVN r31771.
2014-05-15 08:41:14 +00:00
Nathan Hjelm
c15c3dee4c common/ugni: fix bugs in r31557
This commit was SVN r31748.

The following SVN revision numbers were found above:
  r31557 --> open-mpi/ompi@c4c9bc1573
2014-05-13 22:37:01 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Ralph Castain
fdfb331e13 Per RFC, continue the renaming process
This commit was SVN r31663.
2014-05-06 20:53:55 +00:00
Ralph Castain
c4c9bc1573 As per the RFC:
http://www.open-mpi.org/community/lists/devel/2014/04/14496.php

Revamp the opal database framework, including renaming it to "dstore" to reflect that it isn't a "database". Move the "db" framework to ORTE for now, soon to move to ORCM

This commit was SVN r31557.
2014-04-29 21:49:23 +00:00
Rolf vandeVaart
fc0a75da91 Fix help message errors as reported by check-help-strings.pl script.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31555.
2014-04-29 20:29:18 +00:00
Rolf vandeVaart
a6a245b5b5 More efficient way of waiting for asynchronous copy to complete.
This commit was SVN r31420.
2014-04-17 15:18:50 +00:00
Jeff Squyres
173c046617 build: add Automake-like silent/verbose macros for "ln -s ..." operations
Also, since I put some of the macros for these silent/verbose rules up
in the top-level Makefile.man-page-rules file, I renamed it to
Makefile.ompi-rules.

I've had this sitting around for a while; now seems like as good a
time as any to commit it.

This commit was SVN r31271.
2014-03-28 18:24:32 +00:00
Nathan Hjelm
b3bb90cf2d Do not include inttypes.h directly in Open MPI. Use opal_stdint.h instead.
This commit should finish the work started for #869. Closing that ticket
with this commit.

Closes trac:869

cmr=v1.8.1:reviewer=jsquyres

This commit was SVN r31257.

The following Trac tickets were found above:
  Ticket 869 --> https://svn.open-mpi.org/trac/ompi/ticket/869
2014-03-27 17:56:00 +00:00
Alina Sklarevich
947233f539 common/verbs: added a call to ompi_ibv_free_device_list.
the ompi_common_verbs_find_ports function had a call to
ompi_ibv_get_device_list, but not to ompi_ibv_free_device_list.

fixed by Alina, reviewed by Vasily/Mike.
cmr=v1.8:reviewer=ompi-rm1.8 

This commit was SVN r31200.
2014-03-25 14:41:09 +00:00
Jeff Squyres
da87b506bd Remove warnings identified by clang 3.4
* Remove unused static functions
 * Remove unused static variables

cmr=v1.8:reviewer=hjelmn

This commit was SVN r31023.
2014-03-12 13:17:54 +00:00
Jeff Squyres
fbde50e7cd Fix ibv_port_query() usnic extension usage
* Older versions of libusnic_verbs actually return 0 when querying for
  an unknown port.  So also check for a magic ID in the returned data
  to *really* know if the usnic extensions are supported.
* Use a union (in the common_verbs area) and memcpy (in the btl) to
  avoid undefined C type aliasing behavior.
* Ensure to memset the function table to 0 if the usnic extensions
  are not supported.

Submitted by Jeff Squyres, reviewed by Dave Goodell.

cmr=v1.7.5:reviewer=ompi-rm1.7

This commit was SVN r30935.
2014-03-04 22:11:47 +00:00
Jeff Squyres
05af83d5d8 common_verbs: Remove usnic magic probe test
Check the IBV_TRANSPORT_* values.  In the case of IBV_TRANSPORT_IWARP,
there's an ambiguity and we need to also check to see whether the
usnic verbs externsion probe exists.

This commit was SVN r30913.
2014-03-03 21:32:44 +00:00
Dave Goodell
4875f48eaa usnic: enable UDP support
This commit decouples OMPI deployment from the version(s) of the lower
layers of the stack by probing for UDP support.

Verbs applications assume a 40-byte header (there is no current
mechanism for querying payload offset).  So to support a 42-byte UDP
header without causing existing applications like ibv_ud_pingpong or
older versions of OMPI to crash, we must inform libusnic_verbs that we
are aware of the nonstandard payload offset.  We do this by overriding
the `transport_type` field of the device to be 42 before calling
`ibv_open_device`.  If the library resets it to something else, then we
know the lower layers are UDP capable.  Otherwise we use the older
custom-L2 format.

This necessitated some minor ugliness in common_verbs, but it's as tidy
as Jeff and I know how to make it right now.

This commit only adds support for UDP headers and connectivity over the
same L2 network, it does not touch routing or interface pairing.

Reviewed-by: Jeff Squyres <jsquyres@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30838.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:44:35 +00:00
Dave Goodell
4af332bd4e Fix the logic in ompi_common_verbs_find_ports().
The logic did not correctly perform the OR behavior that is described
in the doxy docs for this function.  This commit fixes the logic so
that a port will be included if it has supports any of the
capabilities indicated by the passed-in flags.

Authored-by: Jeff Squyres <jsquyres@cisco.com>
Reviewed-by: Dave Goodell <dgoodell@cisco.com>

cmr=v1.7.5:ticket=trac:4253

This commit was SVN r30831.

The following Trac tickets were found above:
  Ticket 4253 --> https://svn.open-mpi.org/trac/ompi/ticket/4253
2014-02-26 07:39:21 +00:00
Joshua Ladd
9ea9bec4ad Addressing Jeff's comments:
1. Changed rng_buff_t --> opal_rng_buff_t
2. All global variables obey the prefix rule
3. Old code has been removed 
4. Found a couple of unnecessary includes

Refs trac:4298

This commit was SVN r30807.

The following Trac tickets were found above:
  Ticket 4298 --> https://svn.open-mpi.org/trac/ompi/ticket/4298
2014-02-24 23:18:35 +00:00
Jeff Squyres
d07d1864ae Revert r30804.
We're going to be bringing a bunch of usnic code to the SVN trunk
soon, and I basically brought this commit over out of order.  So I'm
reverting it for now; the same functionality will come back shortly.

This commit was SVN r30805.

The following SVN revision numbers were found above:
  r30804 --> open-mpi/ompi@5bedcc15bf
2014-02-24 19:12:49 +00:00
Jeff Squyres
5bedcc15bf Support the IBV_*_USNIC_* verbs constants.
These constants are now upstream (see
https://git.kernel.org/cgit/libs/infiniband/libibverbs.git/commit/?id=f57a9c67eabb9e7f19c624ac3c8c27b7be55796c),
so let's support them properly in Open MPI.

Added bonus: consolidating these checks up in
ompi_check_openfabrics.m4 allowed removing some custom checks and
AC_DEFINE's from the usnic configure.m4 script.

Also change the usnic/configure.m4 check for IBV_EVENT_GID_CHANGE to
use AC_CHECK_DECLS (vs. AC_CHECK_DECL).

cmr=v1.7.5:reviewer=dgoodell

This commit was SVN r30804.
2014-02-24 18:57:04 +00:00
Joshua Ladd
e39d9f4080 Per the RFC schedule, add an additive lagged Fibonacci parallel random number generator to OPAL. In order to use, please add the following header to your code: opal/util/alfg.h. See ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c for an example how to seed with opal_srand and invoke the generator with opal_rand. This should be added to
cmr=v1.7.5:reviewer=rhc:subject=Add an OPAL RNG

This commit was SVN r30801.
2014-02-23 21:41:38 +00:00
Rolf vandeVaart
d4f12148c4 Fix several issues reported in ticket #4245.
This commit was SVN r30767.
2014-02-18 17:44:08 +00:00
Rolf vandeVaart
9f3bf4747d Provide option to have synchronous copy be asynchronous with a wait. For now,
this has to be selected at runtime.  Also fix up some error messages to have
node name in them.

This commit was SVN r30396.
2014-01-23 15:47:20 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Rolf vandeVaart
b3edca19df Add braces per coding convention and design review.
This commit was SVN r30137.
2014-01-07 17:30:37 +00:00
Rolf vandeVaart
695d854cd8 Fix return value.
This commit was SVN r30034.
2013-12-20 20:57:04 +00:00
Rolf vandeVaart
4cd1958deb Fix so we do not get warnings when running on system without CUDA software installed and CUDA-aware compiled in.
This commit was SVN r30032.
2013-12-20 20:39:25 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Jeff Squyres
bac67e0d81 Per discussion @Chicago OMPI dev meeting Dec 2013: remove all MX support.
This commit was SVN r29873.
2013-12-12 18:54:47 +00:00
Rolf vandeVaart
d556b60b21 Chnage some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Rolf vandeVaart
218c05a4d1 Make sure synchronous copies are complete before moving the data.
This commit was SVN r29789.
2013-12-03 21:20:14 +00:00
Rolf vandeVaart
aa98b0333b Call function from function table. Discovered during static build.
This commit was SVN r29755.
2013-11-25 22:46:07 +00:00
Nathan Hjelm
24a7e7aa34 Add support for the udreg registration cache and dynamics on XE/XK/XC.
To support the new mpool two changes were made to the mpool infrastructure:

 1) Added an mpool flag to indicate that an mpool does not need the memory
    hooks to use the leave pinned protocols. This flag is checked in the
    mpool lookup.

 2) Add a mpool context to the base registration. This new member is used
    by the udreg mpool to store the udreg context associated with the
    particular registration. The new member will not break the ABI
    compatibility as the new member is only currently used by the udreg
    mpool.

Dynamics support for Cray systems makes use of the global rank provided by
orte to give the ugni library a unique rank for each process. Dynamics
support is not available under direct-launch (srun.)

cmr=v1.7.4

This commit was SVN r29719.
2013-11-18 04:58:37 +00:00
Rolf vandeVaart
92e6aaa808 Adjust a default value. Adjust some levels of verbosity and one more debug message.
This commit was SVN r29712.
2013-11-14 21:47:27 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discuessed in telecon.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Rolf vandeVaart
a6df7bc33a Fix issues reported in ticket #3877. Also added additional comments.
This commit was SVN r29641.
2013-11-07 20:44:47 +00:00
Rolf vandeVaart
2cf7c40ee5 Minor adjustments to error messages due to review of #3880.
This commit was SVN r29640.
2013-11-07 20:21:21 +00:00
Rolf vandeVaart
e46c0bb952 Fix one more space for consistent defines.
This commit was SVN r29607.
2013-11-05 15:31:49 +00:00