1
1

185 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
32a8ff5255 btl/openib: bump up udcm priority
This commit was SVN r28505.
2013-05-14 20:02:40 +00:00
Nathan Hjelm
422331b4da btl/openib: fix unconnected datagram connection method (udcm)
The primary issue with udcm is that the immediate data in message
acks were often bogus. This caused the sender to keep trying even
though a message was received and acked. The fix is to use the
source LID and QP to determine which message is being acked. In
most cases this should work well since only one message will be
in flight to any peer.

This commit was SVN r28444.
2013-05-03 17:11:38 +00:00
Nathan Hjelm
f384263de7 btl/openib: fix typo
This commit was SVN r28413.
2013-04-29 22:21:25 +00:00
Ralph Castain
8996ecb128 Add missing include
This commit was SVN r28405.
2013-04-27 00:09:36 +00:00
Jeff Squyres
99b7a0f20d Remove unused variables.
This commit was SVN r28401.
2013-04-26 15:29:42 +00:00
Nathan Hjelm
9d4a26f47d Update OMPI frameworks to use the MCA framework system.
Notes:
  - This commit also eliminates the need for an available components list in use
    in several frameworks. None of the code in question was making use of the
    priority field of the priority component list item so these extra lists were
    removed.
  - Cleaned up selection code in several frameworks to sort lists using opal_list_sort.
  - Cleans up the ompi/orte-info functions. Expose the functions that construct the
    list of params so they can be used elsewhere.

patches for mtl/portals4 from brian

missed a few output variables in openib

This commit was SVN r28241.
2013-03-27 21:17:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Jeff Squyres
bbddd6ea03 Add header file for opal_show_help().
This commit was SVN r28056.
2013-02-13 16:31:59 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Joshua Ladd
70ad711337 Backing out the Open SHMEM project
This commit was SVN r28050.
2013-02-12 17:45:27 +00:00
Mike Dubman
ff384daab4 Added new project: oshmem.
This commit was SVN r28048.
2013-02-12 15:33:21 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Jeff Squyres
dd254cc202 OMPI_HAVE_IBV_LINK_LAYER does not exist. Instead, check defined(HAVE_IBV_LINK_LAYER_ETHERNET).
This commit was SVN r27251.
2012-09-06 18:25:36 +00:00
Jeff Squyres
341ce2f9a4 Per some discussions between LANL, Cisco, ORNAL, and Mellanox, move
some new common OpenFabrics functionality to ompi/mca/common/verbs.
Also move everything that was in ompi/mca/common/ofautils under
ompi/mca/common/verbs.  

 * Move ofautils -> verbs
 * Add new functionality in ompi/mca/common/verbs (see doxygen
 * comments in ompi/mca/common/verbs/common_verbs.h for details):
   * ompi_common_verbs_find_ibv_ports()
   * ompi_common_verbs_port_bw()
   * ompi_common_verbs_mtu()
   * '''If you're writing verbs-based code, you should be using this
     common functionality'''
 * Adapt openib BTL to use some trivial common functionality in
   common/verbs
 * Don't use "#ifdef OMPI_HAVE_RDMAOE",use 
   "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
 * Update the following to include/link against common/verbs
   * bcol/iboffload
   * sbgp/ibnet
   * btl/openib

This commit was SVN r27212.
2012-09-01 01:42:37 +00:00
Jeff Squyres
c8cee23ee7 Priorities really shouldn't be less than 0.
This commit was SVN r27098.
2012-08-21 15:47:15 +00:00
Ralph Castain
dacb07000d Turn udcm and ud oob off by default, but allow them to build and be used if someone wants to test them
cmr:v1.7

This commit was SVN r27097.
2012-08-21 15:18:34 +00:00
Christopher Yeoh
cc091f4979 Adds synchronisation between main thread and service thread in
btl_openib_connect_udcm when notifying not to listen to an fd to ensure
that the main thread does not continue until the service thread has
processed the message

Adds ability to send message to openib async thread to tell it to
ignore the ERR state on a specific QP. Adds this call to udcm_module_finalize
so when we set the error state on the QP it doesn't cause the 
openib async thread to abort the mpi program prematurely

Fixes trac:3161

This commit was SVN r27064.

The following Trac tickets were found above:
  Ticket 3161 --> https://svn.open-mpi.org/trac/ompi/ticket/3161
2012-08-16 03:56:21 +00:00
Nathan Hjelm
8736953c7f btl/openib/connect improve the help message printed when a queue pair can not be created
This commit was SVN r26876.
2012-07-26 20:36:46 +00:00
Nathan Hjelm
771b427027 udcm: unmonitor the fd BEFORE tearing down the listen qp
This commit was SVN r26800.
2012-07-18 14:22:45 +00:00
Nathan Hjelm
fc1b295606 udcm: evict from the lru of the openib device's grdma mpool if a qp can not be created. Note: there doesn't appear to be a standard way to differentiate between ibv_create_qp failing because the node is out of registered memory and failing because no more qps are available
This commit was SVN r26797.
2012-07-14 01:58:29 +00:00
Nathan Hjelm
344fe61616 remove assertion in udcm
This commit was SVN r26790.
2012-07-13 15:14:48 +00:00
Nathan Hjelm
05c5c1f412 remove unused i_initiate function from udcm
This commit was SVN r26778.
2012-07-10 17:22:19 +00:00
Nathan Hjelm
9f3717959e remove sync step from udcm as it really isn't necessary
This commit was SVN r26724.
2012-07-02 22:54:44 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Nathan Hjelm
3d9bc68435 reorder udcm init/finalize code. fixes trac:2973
This commit was SVN r25787.

The following Trac tickets were found above:
  Ticket 2973 --> https://svn.open-mpi.org/trac/ompi/ticket/2973
2012-01-26 16:28:55 +00:00
Brad Benton
96395c916e de-tab'd
This commit was SVN r25465.
2011-11-09 19:45:12 +00:00
Christopher Yeoh
fb57a74a40 Removes pointless memmove which because of a previous memcpy will always
have identical source and destination pointers. See #2871
Plugs a couple of minor memory leaks related to remote qp info

This commit was SVN r25431.
2011-11-04 00:15:08 +00:00
George Bosilca
9d8e84142f Survivor!!!
This commit was SVN r25371.
2011-10-26 00:58:55 +00:00
George Bosilca
80c02647c8 Each level (OPAL/ORTE/OMPI) should only return it's own constants,
instead of the current mismatch.

This commit was SVN r25230.
2011-10-04 14:50:31 +00:00
Brad Benton
0f2475c554 Modified set_remote_info() to use memmove() instead of memcpy() when
copying rem_qp info.  This avoids potential errors when src & dest overlap.
This is a workaround for the issue in #2871

This commit was SVN r25180.
2011-09-26 20:07:36 +00:00
Nathan Hjelm
98b56108c4 add unconnect datagram connection manager (udcm)
This commit was SVN r25160.
2011-09-19 21:24:58 +00:00
Nathan Hjelm
8cd550f49f fixed error in last commit
This commit was SVN r25159.
2011-09-19 21:13:59 +00:00
Nathan Hjelm
de950959ee print a more meaningful error message when ibv_create_qp fails
This commit was SVN r25158.
2011-09-19 21:12:22 +00:00
Steve Wise
e5bba38434 Increase the rdmacm cpc address resolution timeout to 30 seconds.
Global rdmacm_resolve_timeout defaults to 1000 (1000 ms), which is way
too small for even a 16 node x 12 core iwarp cluster in the presence
of drops.  Bump up the default to 30000ms.

This commit fixes trac:2860 and should be added to cmr:v1.4:reviewer=jsquyres
and cmr:v1.5:reviewer=jsquyres

This commit was SVN r25144.

The following Trac tickets were found above:
  Ticket 2860 --> https://svn.open-mpi.org/trac/ompi/ticket/2860
2011-09-14 21:52:58 +00:00
Nathan Hjelm
3048ce043d permanently disable ibcm
This commit was SVN r25137.
2011-09-13 15:10:51 +00:00
Rainer Keller
9d5afc58c6 - Fix breakage of the epoch changes with PGI:
Don't juse include pre-processor macros between two strins ("s1" #if 0 ... "s2")...
   Rather print out the epoch as 0 always...

This commit was SVN r25110.
2011-08-31 08:40:31 +00:00
Wesley Bland
4e7ff0bd5e By popular demand the epoch code is now disabled by default.
To enable the epochs and the resilient orte code, use the configure flag:

--enable-resilient-orte

This will define both:

ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE

This commit was SVN r25093.
2011-08-26 22:16:14 +00:00
Yevgeny Kliteynik
7068dc64eb Dynamic SL rework:
- Added dynamic SL support to xoob
 - Fixed seg fault in finalization
 - All the code has been moved to separate files: connect/btl_openib_connect_sl.{c,h}
 - The new files compilation is conditionalized

This commit was SVN r24991.
2011-08-04 20:26:08 +00:00
Yevgeny Kliteynik
4fbe68dd86 Removing trailing white spaces in all the openib btl code.
This commit was SVN r24855.
2011-07-04 14:00:41 +00:00
Yevgeny Kliteynik
b05211148d Supporting dynamic SL (#2674)
- Added enable/disable configuration parameter for dynamic SL
 - All the dynamic SL code is conditionalized
 - Removed libibmad dependency
 - Using only one include - ib_types.h (part of opensm-devel package)
 - Removed all the macro and data types definitions, using the
   existing definitions from ib_types.h instead
 - general cleaning here and there

The async mode is not implemented yet - stay tuned...

This commit was SVN r24830.
2011-06-28 14:28:29 +00:00
Wesley Bland
e1ba09ad51 Add a resilience to ORTE. Allows the runtime to continue after a process (or
ORTED) failure. Note that more work will be necessary to allow the MPI layer to
take advantage of this.

Per RFC:
http://www.open-mpi.org/community/lists/devel/2011/06/9299.php

This commit was SVN r24815.
2011-06-23 20:38:02 +00:00
Jeff Squyres
82f9474fec Revert r24533 and r24507 until the compile errors can be fixed.
This commit was SVN r24541.

The following SVN revision numbers were found above:
  r24507 --> open-mpi/ompi@4ce1936fed
  r24533 --> open-mpi/ompi@3204af2d36
2011-03-18 13:33:02 +00:00
Mike Dubman
3204af2d36 * temporary fix for ib btl compilation with old ofed versions 1.3.x.
This commit was SVN r24533.
2011-03-16 17:53:51 +00:00
Doron Shoham
4ce1936fed Fix the following for dynamic SL patch:
* rename ib_path_rec_service_level -> ib_path_record_service_level
* use mad.h and ib_types.h
* free all resources
* move ibv_post_recv to be just before ibv_post_send
* cleanup and beatify code

This commit was SVN r24507.
2011-03-10 16:19:00 +00:00
George Bosilca
4184baa67a Remove the proc_guid from the BTL proc structure. Instead use directly
the one stored in the ompi_proc_t.

This commit was SVN r24461.
2011-02-25 00:36:08 +00:00
Jeff Squyres
b468c71b47 Use complete types
This commit was SVN r24434.
2011-02-22 22:34:44 +00:00