255 Commits

Author SHA1 Message Date
Ralph Castain
2cf4862b49 Cleanup warnings for use of void* - requires intermediate cast to uintptr_t. Thanks to Paul Hargrove for reporting it
cmr=v1.7.4:reviewer=jsquyres

This commit was SVN r30333.
2014-01-20 15:44:45 +00:00
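The intermediate cast mentioned in 2cf4862b49 can be illustrated with a minimal sketch. The helper names `ptr_to_u64` and `u64_to_ptr` are hypothetical and are not taken from the commit; they only show why going through `uintptr_t` avoids a "cast to integer of different size" warning on targets where pointers are narrower than 64 bits.

```c
#include <stdint.h>

/* Hypothetical helper (not from the commit): store a pointer in a
 * 64-bit integer. Casting void* straight to uint64_t can warn on
 * 32-bit targets; casting to uintptr_t first, then widening, is the
 * conversion the commit message describes. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Hypothetical inverse: narrow to uintptr_t before casting back. */
static void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

A value produced by `ptr_to_u64` and passed back through `u64_to_ptr` compares equal to the original pointer.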
Nathan Hjelm
5c8ea3a251 btl/openib: Move free list memory allocation to add_procs
Per RFC which expired two weeks ago:

We are planning to make a change to Open MPI to always set up the btls. This
means the btl init will be called even if add_procs is never called for that
btl. In the openib btl, free list fragments are currently allocated in btl_init.
To avoid wasting that memory this commit moves that final device setup to
the add_procs function. This includes allocating free lists and starting the
async event thread.

At this time this change is safe since we have a barrier after add_procs in
MPI_Init. If this changes we will need to re-think some of the initialization
since we might have the possibility of a connection request before add_procs
is called.

Tested with Mellanox ConnectX2 and QLogic HCAs.

Commit also cleans up tabs in btl_openib_async.c.

cmr=v1.7.5:reviewer=miked

This commit was SVN r30122.
2014-01-06 19:51:30 +00:00
Nathan Hjelm
e0e94a6029 Fix warning caused by typo in r29815
This commit was SVN r29860.

The following SVN revision numbers were found above:
  r29815 --> open-mpi/ompi@d556b60b21
2013-12-11 21:45:39 +00:00
Rolf vandeVaart
d556b60b21 Change some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Rolf vandeVaart
ab77435d9b Fix the CUDA-aware case where we are not sending any GPU data.
This commit was SVN r29788.
2013-12-03 20:25:58 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discussed in the telecon:
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Rolf vandeVaart
ee7510b025 Remove redundant macro. This came from the review of an earlier ticket.
Fixes trac:3878.  Reviewed by jsquyres.

This commit was SVN r29581.

The following Trac tickets were found above:
  Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878
2013-11-01 12:19:40 +00:00
Mike Dubman
b0e64427a9 ompi/mca/btl/openib: Fix memory leak and accessing free'd memory issues
Imagine that we have two btls in btl_openib_component_init() that both point to the same openib_btl->device and, as a result, share the same openib_btl->device->endpoints array.

The finalization phase calls mca_btl_openib_finalize()->mca_btl_openib_finalize_resources() twice.
The first call to mca_btl_openib_finalize_resources() frees the endpoints related to its btl, but the second call then checks an endpoint that was already released by the previous call.

fixed by Igor, reviewed by miked/vasily
cmr=v1.7.4:reviewer=ompi-gk1.7

This commit was SVN r29563.
2013-10-30 11:47:49 +00:00
Rolf vandeVaart
18962d296b This has bothered me for a while. Change MCA_BTL_TAG_BTL to MCA_BTL_TAG_IB. They are the same
value so this does not change anything.  (MCA_BTL_TAG_IB = MCA_BTL_TAG_BTL + 0).  This just makes it more correct.

This commit was SVN r29099.
2013-08-30 14:53:59 +00:00
Jeff Squyres
63ac60864b Refs trac:3730
Turns out that AC_CHECK_DECLS is one of the "new style" Autoconf
macros that #defines the output to be 0 or 1 (vs. #define'ing or
#undef'ing it).  So don't check for "#if defined(..."; just check for
"#if ...".

This commit was SVN r29059.

The following Trac tickets were found above:
  Ticket 3730 --> https://svn.open-mpi.org/trac/ompi/ticket/3730
2013-08-22 17:44:20 +00:00
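The difference described in 63ac60864b can be shown with a small sketch. The macro name `HAVE_DECL_EXAMPLE_SYMBOL` is hypothetical and hard-coded here; in a real build it would come from the generated config header, where `AC_CHECK_DECLS` defines it to 1 when the declaration is found and to 0 when it is not.

```c
/* Stand-in for the config header entry when the declaration was NOT
 * found: the macro is still defined, but with the value 0.
 * (Hypothetical name, hard-coded for illustration.) */
#define HAVE_DECL_EXAMPLE_SYMBOL 0

/* Incorrect test: true in both cases, because the macro is always
 * defined by AC_CHECK_DECLS. */
static int checked_with_defined(void)
{
#if defined(HAVE_DECL_EXAMPLE_SYMBOL)
    return 1;
#else
    return 0;
#endif
}

/* Correct test: looks at the 0/1 value instead. */
static int checked_with_value(void)
{
#if HAVE_DECL_EXAMPLE_SYMBOL
    return 1;
#else
    return 0;
#endif
}
```

With the value 0 above, `checked_with_defined()` still reports the symbol as present while `checked_with_value()` does not, which is the mismatch the commit message warns about.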
Rolf vandeVaart
96fdb060ea Fix compile errors and warnings from changeset 29052.
This commit was SVN r29054.
2013-08-21 19:01:54 +00:00
Steve Wise
67fe3f23ed Use the HAVE_DECL_IBV_LINK_LAYER_ETHERNET macro.
Commit r27211 added ifdef checks for #define
HAVE_IBV_LINK_LAYER_ETHERNET, which is incorrect.  The correct #define
is HAVE_DECL_IBV_LINK_LAYER_ETHERNET.  This broke OMPI over iWARP.

This fixes trac:3726 and should be added to cmr:v1.7.3:reviewer=jsquyres

This commit was SVN r29053.

The following SVN revision numbers were found above:
  r27211 --> open-mpi/ompi@b27862e5c7

The following Trac tickets were found above:
  Ticket 3726 --> https://svn.open-mpi.org/trac/ompi/ticket/3726
2013-08-20 20:00:46 +00:00
Ralph Castain
45e695928f As per the email discussion, revise the sparse handling of hostnames so that we avoid potential infinite loops while allowing large-scale users to improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes at which we stop including hostnames. This defaults to INT_MAX => always include hostnames. If a value is given, then we will include hostnames for any allocation smaller than the given limit.

* remove ompi_proc_get_hostname. Replace all occurrences with a direct link to ompi_proc_t's proc_hostname, protected by appropriate "if NULL"

* modify the OMPI-ORTE integration component so that any call to modex_recv automatically loads the ompi_proc_t->proc_hostname field as well as returning the requested info. Thus, any process whose modex info you retrieve will automatically receive the hostname. Note that on-demand retrieval is still enabled - i.e., if we are running under direct launch with PMI, the hostname will be fetched upon first call to modex_recv, and then the ompi_proc_t->proc_hostname field will be loaded

* removed a stale MCA param "mpi_keep_peer_hostnames" that was no longer used anywhere in the code base

* added an envar lookup in ess/pmi for the number of nodes in the allocation. Sadly, PMI itself doesn't provide that info, so we have to get it a different way. Currently, we support PBS-based systems and SLURM - for any other, rank0 will emit a warning and we assume max number of daemons so we will always retain hostnames

This commit was SVN r29052.
2013-08-20 18:59:36 +00:00
Ralph Castain
611d7f9f6b When we direct launch an application, we rely on PMI for wireup support. In doing so, we lose the de facto data compression we get from the ORTE modex since we no longer get all the wireup info from every proc in a single blob. Instead, we have to iterate over all the procs, calling PMI_KVS_get for every value we require.
This creates really bad scaling behavior. Users have found a nearly 20% launch time differential between mpirun and PMI, with PMI being the slower method. Some of the problem is attributable to poor exchange algorithms in RMs like Slurm and Alps, but we make things worse by calling "get" so many times.

Nathan (with a tad of advice from me) has attempted to alleviate this problem by reducing the number of "get" calls. This required the following changes:

* upon first request for data, have the OPAL db pmi component fetch and decode *all* the info from a given remote proc. It turned out we weren't caching the info, so we would continually request it and only decode the piece we needed for the immediate request. We now decode all the info and push it into the db hash component for local storage - and then all subsequent retrievals are fulfilled locally

* reduced the amount of data by eliminating the exchange of the OMPI_ARCH value if heterogeneity is not enabled. This was used solely as a check so we would error out if the system wasn't actually homogeneous, which was fine when we thought there was no cost in doing the check. Unfortunately, at large scale and with direct launch, there is a non-zero cost of making this test. We are open to finding a compromise (perhaps turning the test off if requested?), if people feel strongly about performing the test

* reduced the amount of RTE data being automatically fetched, and fetched the rest only upon request. In particular, we no longer immediately fetch the hostname (which is only used for error reporting), but instead get it when needed. Likewise for the RML uri as that info is only required for some (not all) environments. In addition, we no longer fetch the locality unless required, relying instead on the PMI clique info to tell us who is on our local node (if additional info is required, the fetch is performed when a modex_recv is issued).

Again, all this only impacts direct launch - all the info is provided when launched via mpirun as there is no added cost to getting it

Barring objections, we may move this (plus any required other pieces) to the 1.7 branch once it soaks for an appropriate time.

This commit was SVN r29040.
2013-08-17 00:49:18 +00:00
Aurelien Bouteiller
e1066143a4 rename ompi_free_list operations to _mt, as per discussions at last face to face meeting
This commit was SVN r28734.
2013-07-08 22:07:52 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and changes callers to check whether the item is NULL
instead of using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
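The interface change in c9e5ab9ed1 can be sketched with a toy free list. Everything below (`item_t`, `free_list_t`, `FREE_LIST_GET_OLD`, `FREE_LIST_GET`) is hypothetical and much simpler than the real OMPI macros; it only illustrates why the return code carried no information beyond the NULL pointer.

```c
#include <stddef.h>

/* Hypothetical singly linked free list; not the OMPI implementation. */
typedef struct item { struct item *next; } item_t;
typedef struct { item_t *head; } free_list_t;

/* Old style: a return code is produced in addition to the item, yet
 * rc != 0 always coincides with item == NULL. */
#define FREE_LIST_GET_OLD(fl, item, rc)        \
    do {                                       \
        (item) = (fl)->head;                   \
        if (NULL != (item)) {                  \
            (fl)->head = (item)->next;         \
            (rc) = 0;                          \
        } else {                               \
            (rc) = -1;                         \
        }                                      \
    } while (0)

/* New style: the caller only checks whether item is NULL. */
#define FREE_LIST_GET(fl, item)                \
    do {                                       \
        (item) = (fl)->head;                   \
        if (NULL != (item)) {                  \
            (fl)->head = (item)->next;         \
        }                                      \
    } while (0)
```

A caller of the new form writes `FREE_LIST_GET(&fl, it); if (NULL == it) { /* handle exhaustion */ }` and no longer needs a separate `rc` variable, which is also what removes the unused-variable style warnings in picky builds.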
Alex Mikheev
f76680fbd0 - btl_openib: fix total registered memory calculation for ConnectIB and Ofed 2.0
This commit was SVN r28432.
2013-05-01 13:39:29 +00:00
Jeff Squyres
8405975bf6 Be a little more conservative about initializing devices and modules
(i.e., ensure that more data items get zeroed out/set to NULL) so that
if something goes wrong during initialization, we don't try to clean
up something that isn't there (and segv).

The chance of this happening on the trunk is very low (and will also
be low once the verbs improvements are brought over to v1.7).  But it
can actually happen in the v1.6 branch (e.g., if no CPC is available,
we'll try to get the length of the endpoints list, but the endpoints
list is NULL).  

Hence, even though the real goal is to get this functionality over to
v1.6, I figured I'd commit to the trunk/CMR to v1.7 just to try to
keep commonality in the openib between all three where possible.

This commit was SVN r28317.
2013-04-09 21:55:31 +00:00
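The defensive initialization described in 8405975bf6 follows a common pattern, sketched below under stated assumptions: `device_t`, `device_create`, and `device_destroy` are hypothetical names, not the openib BTL structures. The point is that every field is given a known empty value at creation time so that cleanup can run safely after a failed or partial initialization.

```c
#include <stdlib.h>

/* Hypothetical device structure; field names are illustrative only. */
typedef struct {
    void  *endpoints;      /* NULL until the endpoint array is created */
    size_t num_endpoints;  /* 0 until endpoints are added */
} device_t;

/* Allocate a device and explicitly reset every field. */
static device_t *device_create(void)
{
    device_t *dev = malloc(sizeof(*dev));
    if (NULL != dev) {
        dev->endpoints = NULL;
        dev->num_endpoints = 0;
    }
    return dev;
}

/* Cleanup tolerates a device whose setup never completed.
 * Returns the number of endpoints that were released. */
static size_t device_destroy(device_t *dev)
{
    size_t released = 0;
    if (NULL == dev) {
        return 0;
    }
    if (NULL != dev->endpoints) {
        released = dev->num_endpoints;
        free(dev->endpoints);
    }
    free(dev);
    return released;
}
```

Destroying a freshly created device releases zero endpoints and never dereferences a missing endpoint list, which is the failure mode the commit message describes for the v1.6 branch.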
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Jeff Squyres
bbddd6ea03 Add header file for opal_show_help().
This commit was SVN r28056.
2013-02-13 16:31:59 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Joshua Ladd
70ad711337 Backing out the Open SHMEM project
This commit was SVN r28050.
2013-02-12 17:45:27 +00:00
Mike Dubman
ff384daab4 Added new project: oshmem.
This commit was SVN r28048.
2013-02-12 15:33:21 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Rolf vandeVaart
f63c88701f Improve CUDA GPU transfers over openib BTL. Use asynchronous copies.
This is the RFC that was submitted in July and December of 2012.

This commit was SVN r27862.
2013-01-17 22:34:43 +00:00
Alex Mikheev
344d407ed4 fixed compilation warning
always send signalled when BTL_OPENIB_FAILOVER is defined

This commit was SVN r27801.
2013-01-13 10:11:03 +00:00
Mike Dubman
b6d50a5733 Performance optimizations by alexm:
* btl sendi(): if the message can be sent inline, try to avoid requesting a signal
* a signal is requested once per 64 sends, or when:
    there are no send wqes
    the message cannot be sent inline
    any btl method other than sendi() is used

This commit was SVN r27724.
2012-12-26 10:19:12 +00:00
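The signaling rule listed in b6d50a5733 can be read as a small decision function. The sketch below is one interpretation of the commit message, not the actual openib code: `qp_state_t`, `need_signal`, and `SIGNAL_INTERVAL` are hypothetical names, and the treatment of "no send wqes" as `free_wqes == 0` is an assumption.

```c
#include <stdbool.h>

#define SIGNAL_INTERVAL 64  /* "one per 64" from the commit message */

/* Hypothetical per-queue-pair bookkeeping. */
typedef struct {
    unsigned unsignaled;  /* sends posted since the last signaled send */
    unsigned free_wqes;   /* send work-queue entries still available */
} qp_state_t;

/* Decide whether the next send should request a completion signal.
 * A signal is requested when the send does not come from sendi(),
 * when the message cannot be sent inline, when no send wqes are left,
 * or when this is the 64th send since the last signaled one. */
static bool need_signal(qp_state_t *qp, bool is_sendi, bool fits_inline)
{
    bool signal = !is_sendi || !fits_inline || qp->free_wqes == 0 ||
                  qp->unsignaled + 1 >= SIGNAL_INTERVAL;

    if (signal) {
        qp->unsignaled = 0;   /* start a new batch */
    } else {
        qp->unsignaled++;     /* one more unsignaled send in the batch */
    }
    return signal;
}
```

Under these assumptions, 63 consecutive inline sendi() calls go out unsignaled and the 64th requests a signal, so completions are still reaped periodically.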
Jeff Squyres
341ce2f9a4 Per some discussions between LANL, Cisco, ORNL, and Mellanox, move
some new common OpenFabrics functionality to ompi/mca/common/verbs.
Also move everything that was in ompi/mca/common/ofautils under
ompi/mca/common/verbs.  

 * Move ofautils -> verbs
 * Add new functionality in ompi/mca/common/verbs (see doxygen
   comments in ompi/mca/common/verbs/common_verbs.h for details):
   * ompi_common_verbs_find_ibv_ports()
   * ompi_common_verbs_port_bw()
   * ompi_common_verbs_mtu()
   * '''If you're writing verbs-based code, you should be using this
     common functionality'''
 * Adapt openib BTL to use some trivial common functionality in
   common/verbs
 * Don't use "#ifdef OMPI_HAVE_RDMAOE"; use
   "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
 * Update the following to include/link against common/verbs
   * bcol/iboffload
   * sbgp/ibnet
   * btl/openib

This commit was SVN r27212.
2012-09-01 01:42:37 +00:00
Jeff Squyres
e5babf830a Fixes trac:3258: add btl_openib_abort_not_enough_reg_mem MCA parameter
that causes MPI jobs to abort if there is not enough registered memory
available (vs. just warning).

This commit was SVN r27140.

The following Trac tickets were found above:
  Ticket 3258 --> https://svn.open-mpi.org/trac/ompi/ticket/3258
2012-08-25 11:39:06 +00:00
Nathan Hjelm
cd2cbdca09 btl/openib: limit each process to a ppn fraction of the available registered memory when using mellanox hardware (mlx4 and mthca). fixed
This commit was SVN r26811.
2012-07-19 17:52:21 +00:00
Ralph Castain
66fe57f746 Revert r26804 so openib can build again
This commit was SVN r26810.

The following SVN revision numbers were found above:
  r26804 --> open-mpi/ompi@610be870f9
2012-07-19 16:16:38 +00:00
Nathan Hjelm
610be870f9 btl/openib: limit each process to a ppn fraction of the available registered memory when using mellanox hardware (mlx4 and mthca)
This commit was SVN r26804.
2012-07-18 17:29:48 +00:00
Nathan Hjelm
4a97ecbdd2 btl/openib: remove tab characters
This commit was SVN r26803.
2012-07-18 17:29:37 +00:00
Nathan Hjelm
4d1920ee87 Fix a bug on 32-bit systems introduced by r26626. This fix ensures that all supported btls (with exception of wv-- shiqing will need to help bring that one up to date with r26626) set the lval in prepare_src/dst when preparing a put or get segment. This fix also ensures a consistent use of lval in put and get for both local and remote segments.
This commit was SVN r26793.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-07-13 21:19:16 +00:00
Terry Dontje
6f3195faca add some missing casts
This commit was SVN r26779.
2012-07-10 18:03:29 +00:00
Jeff Squyres
bb13e21538 Roll back r26730, but bump the default CQ length base up to 1500, not
1000.  Refs trac:3154.

IB/iWarp vendors need to get together to figure out a real fix.

This commit was SVN r26777.

The following SVN revision numbers were found above:
  r26730 --> open-mpi/ompi@5315c91baf

The following Trac tickets were found above:
  Ticket 3154 --> https://svn.open-mpi.org/trac/ompi/ticket/3154
2012-07-10 16:53:27 +00:00
Terry Dontje
95a3b4a423 corrected the change of pval to lval introduced in r26626
This commit was SVN r26732.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-07-03 18:52:18 +00:00
Jeff Squyres
5315c91baf Fixes trac:3152: slightly more advanced than the patch on the ticket:
* If the MCA param btl_openib_cq_size is set to 0 (which is the
   default), use the device CQ max size. Otherwise, use the MCA param
   value (and never adjust it again).
 * Remove the CQ size adjustment code. Since we default to max CQ
   size, there really isn't much point in having it any more. I think
   people setting an absolute CQ size is going to be rare, so let's
   not do anything fancy with it.
 * If the MCA param value is larger than what the device supports,
   print a warning (only once per process) and default to using the
   device max
 * Add a BTL_VERBOSE displaying which CQ size we used

This commit was SVN r26730.

The following Trac tickets were found above:
  Ticket 3152 --> https://svn.open-mpi.org/trac/ompi/ticket/3152
2012-07-03 16:49:59 +00:00
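The CQ size selection described in 5315c91baf amounts to three cases, sketched below. `choose_cq_size` is a hypothetical function written for illustration; only the MCA parameter name btl_openib_cq_size and the rules themselves come from the commit message.

```c
#include <stdio.h>

/* Pick the completion queue size.
 *   param == 0         -> use the device maximum (the default)
 *   param > device_max -> warn once per process, use the device maximum
 *   otherwise          -> use the requested value unchanged */
static unsigned choose_cq_size(unsigned param, unsigned device_max)
{
    static int warned = 0;  /* ensures the warning is printed only once */

    if (0 == param) {
        return device_max;
    }
    if (param > device_max) {
        if (!warned) {
            fprintf(stderr,
                    "requested CQ size %u exceeds device max %u; "
                    "using device max\n", param, device_max);
            warned = 1;
        }
        return device_max;
    }
    return param;
}
```

For a device maximum of 4096, a parameter of 0 yields 4096, a parameter of 1000 yields 1000, and a parameter of 100000 yields 4096 with a single warning.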
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Nathan Hjelm
37c624ee43 prepare to delete mpool/rdma
This commit was SVN r26664.
2012-06-26 15:55:23 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Terry Dontje
3e70cad203 Correct a few alignment problems to address the issue brought up in ticket #2964
This commit was SVN r26078.
2012-03-01 17:29:40 +00:00
Nathan Hjelm
3048ce043d permanently disable ibcm
This commit was SVN r25137.
2011-09-13 15:10:51 +00:00
Yevgeny Kliteynik
4fbe68dd86 Removing trailing white spaces in all the openib btl code.
This commit was SVN r24855.
2011-07-04 14:00:41 +00:00
George Bosilca
4184baa67a Remove the proc_guid from the BTL proc structure. Instead use directly
the one stored in the ompi_proc_t.

This commit was SVN r24461.
2011-02-25 00:36:08 +00:00
Doron Shoham
e41e15c8db cosmetic fixes in openib btl:
* replace tabs with ws
* remove unnecessary casting
* use proper escape codes for printf() like functions

This commit was SVN r24445.
2011-02-23 15:50:37 +00:00
Doron Shoham
e5eef80364 fix type warning in openib btl
This commit was SVN r24419.
2011-02-21 15:13:30 +00:00
Eugene Loh
cd5c2e794f Some minor changes to help the openib BTL build and run on Solaris:
- poll() can return POLLRDNORM even if not requested (Solaris bug)
- MIN macro not defined in btl_openib.c
  and while we're at it, we clean up the MIN definition in ad_bgl_pset.h
- btl_openib_connect_rdmacm.c was calling rdma_destroy_id() twice
  leading to undefined behavior (a hang on Solaris)

This commit was SVN r24356.
2011-02-03 23:53:21 +00:00