351 Commits

Author SHA1 Message Date
Nathan Hjelm
3798f38386 do not print out an error message if ibv_reg_mr fails
This commit was SVN r26796.
2012-07-14 01:35:45 +00:00
Nathan Hjelm
4d1920ee87 Fix a bug on 32-bit systems introduced by r26626. This fix ensures that all supported BTLs (with the exception of wv; Shiqing will need to help bring that one up to date with r26626) set the lval in prepare_src/dst when preparing a put or get segment. This fix also ensures consistent use of lval in put and get for both local and remote segments.
This commit was SVN r26793.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-07-13 21:19:16 +00:00
Pavel Shamis
f7664b3814 1. Adding 2 new components:
ofacm - generic connection manager for IB interconnects.
ofautils - IB common utilities and compatibility code

2. Updating OpenIB configure code

- ORNL & Mellanox Teams 

This commit was SVN r26707.
2012-07-02 15:20:12 +00:00
Jeff Squyres
b936229b54 Refs trac:3130: fix the openib BTL to properly set the memalign malloc
hook early in the setup, but ''not'' during the component register
function.  And then properly unset it if it was set.

This commit was SVN r26697.

The following Trac tickets were found above:
  Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:51:36 +00:00
Josh Hursey
28681deffa Back out the ORCA commit. :(
There is a linking issue on Mac OS X that needs to be addressed before this can come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Nathan Hjelm
37c624ee43 prepare to delete mpool/rdma
This commit was SVN r26664.
2012-06-26 15:55:23 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Nathan Hjelm
249066e06d Timeout! Per the RFC, update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Yevgeny Kliteynik
df783c0472 Use precise speeds for FDR and EDR
This commit was SVN r26614.
2012-06-17 07:06:37 +00:00
Yevgeny Kliteynik
d59b8d5dc4 Fixing malformed error message
This commit was SVN r26434.
2012-05-12 21:13:42 +00:00
Yevgeny Kliteynik
244d66d95b Fixed FDR link speed details, added EDR.
This commit was SVN r26423.
2012-05-10 13:44:18 +00:00
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is that there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Mike Dubman
cd17fee9a8 performance fix: make openib use memalign for malloc
This commit was SVN r26409.
2012-05-08 20:42:09 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
 * Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Mike Dubman
1b475523de add support for FDR speed
This commit was SVN r26385.
2012-05-06 05:53:05 +00:00
Terry Dontje
81d7fcaf82 back out r26255 to avoid cross component linkage so Solaris can build a usable openib btl
This commit was SVN r26269.

The following SVN revision numbers were found above:
  r26255 --> open-mpi/ompi@fe25b8704b
2012-04-13 18:08:54 +00:00
Mike Dubman
fe25b8704b performance fix: set alignment for openib internal buffers
Thanks to Jeff/Pasha for valuable comments
Thanks to Valentin Petrov for implementation

This commit was SVN r26255.
2012-04-09 08:06:15 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Mike Dubman
ff1c84c53f revert previous commit
This commit was SVN r26206.
2012-03-29 14:07:13 +00:00
Mike Dubman
43a5775e8a performance fix: set alignment for openib internal buffers
This commit was SVN r26205.
2012-03-29 14:00:08 +00:00
Pavel Shamis
102da281c4 OPENIB BTL - use orte_show_help instead of BTL_ERROR print when ibv_reg_mr fails.
This commit was SVN r26111.
2012-03-08 09:04:03 +00:00
Mike Dubman
4e7e7d7c3f print an error that is ignored by the upper layer
This commit was SVN r26106.
2012-03-06 14:25:56 +00:00
Terry Dontje
3e70cad203 Correct a few alignment problems to address the issue brought up in ticket #2964
This commit was SVN r26078.
2012-03-01 17:29:40 +00:00
Pavel Shamis
29c4981caa Removing unused include from openib/ofud btls.
This include causes a compilation failure on Mac OS X.

This commit was SVN r25170.
2011-09-20 19:25:59 +00:00
Josh Hursey
2d25d70a1c Missing header for opal_timer_base_get_cycles
This commit was SVN r25157.
2011-09-19 19:52:58 +00:00
George Bosilca
9687e7f38e This commit fixes trac:2679 and should be added to cmr:v1.4:reviewer=jsquyres
and cmr:v1.5:reviewer=jsquyres

This commit was SVN r25155.

The following Trac tickets were found above:
  Ticket 2679 --> https://svn.open-mpi.org/trac/ompi/ticket/2679
2011-09-18 00:58:26 +00:00
Jeff Squyres
1cbfb53801 r24976 wasn't quite right -- you now actually get a warning if you
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.

This commit does a few things:

 * Introduce a new OPAL MCA base function:
   mca_base_param_check_exclusive_string().  It checks to see that the
   ''user'' does not set two MCA parameters that are mutually
   exclusive by checking the source of those MCA param values.
 * Use the above function in many BTLs (and the OOB TCP) to ensure
   that <foo>_if_include and <foo>_if_exclude are not both specified
   ''by the user''.
 * Re-arrange many of these BTLs to move their MCA registration code
   into a separate component_register() function (vs. the
   component_open() function).

This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.

This commit was SVN r25043.

The following SVN revision numbers were found above:
  r24976 --> open-mpi/ompi@8f4ac54336
2011-08-10 17:24:36 +00:00
Rolf vandeVaart
3d3b3d4dad Add support for CUDA registering sm and openib buffers. Feature is disabled by default.
This commit was SVN r24987.
2011-08-04 10:15:45 +00:00
Yevgeny Kliteynik
4fbe68dd86 Removing trailing white spaces in all the openib btl code.
This commit was SVN r24855.
2011-07-04 14:00:41 +00:00
George Bosilca
4184baa67a Remove the proc_guid from the BTL proc structure. Instead, directly use
the one stored in the ompi_proc_t.

This commit was SVN r24461.
2011-02-25 00:36:08 +00:00
Jeff Squyres
4cb8a42e7b Add btl_openib_gid_index MCA param to allow selecting which GID to use
from an openfabrics port's GID table.

This commit was SVN r24456.
2011-02-24 14:09:22 +00:00
Doron Shoham
e41e15c8db cosmetic fixes in openib btl:
* replace tabs with spaces
* remove unnecessary casting
* use proper escape codes for printf()-like functions

This commit was SVN r24445.
2011-02-23 15:50:37 +00:00
Jeff Squyres
b468c71b47 Use complete types
This commit was SVN r24434.
2011-02-22 22:34:44 +00:00
Shiqing Fan
90eeba252e Make openib compile again for Windows.
Update the CMake script for checking mca subdirs.
Add windows support for __attribute__ packed structures.
Define usleep and posix_memalign with equivalent windows functions.
And a few minor fixes, type casts.

This commit was SVN r24429.
2011-02-22 15:49:27 +00:00
Doron Shoham
e5eef80364 fix type warning in openib btl
This commit was SVN r24419.
2011-02-21 15:13:30 +00:00
Donald Kerr
995d46344c simplify the way IBV_ACCESS_SO is discovered
This commit was SVN r24409.
2011-02-17 04:28:56 +00:00
Donald Kerr
2b60b165aa on Solaris, when IBV_ACCESS_SO is available, use a strongly ordered memory region for the eager RDMA connection
This commit was SVN r24395.
2011-02-16 05:37:22 +00:00
Rolf vandeVaart
acd38ff746 Final changes from jsquyres review. Moved configure
code from upper level into btl configure.m4.  Changed
prefix from "OMPI" to "BTL" in preprocessor macro.  Add
an mca param that shows it has been configured in.

This commit was SVN r24270.
2011-01-19 20:58:22 +00:00
Rolf vandeVaart
f22f76a6ff Add byte swapping macro for failover control message per jsquyres review.
This commit was SVN r24266.
2011-01-19 19:58:35 +00:00
Rolf vandeVaart
e75b86d3ab Fix some issues from jsquyres review.
1. Use asprintf instead of snprintf
2. Return remote_proc where possible.
3. Remove dead code.
4. Fix two comment typos.

This commit was SVN r24265.
2011-01-19 16:09:17 +00:00
Ethan Mallove
82054cb02c Include <stdlib.h> instead of <malloc.h>. This avoids a compiler error
on some systems caused by the definition of malloc in
opal_config_bottom.h getting expanded in the system malloc.h when
OPAL_ENABLE_MEM_DEBUG is set to 1.

This commit was SVN r24210.
2011-01-06 18:16:36 +00:00
Jeff Squyres
58445f3775 After being hit by "why is openib not working?" ''again'', add a
verbose statement that shows up when you run with --mca btl_base_verbose 100.
It clearly states that the openib BTL disqualifies itself when
MPI_THREAD_MULTIPLE is used.

This commit was SVN r24209.
2011-01-05 22:01:15 +00:00
Doron Shoham
bfe611d3bd This patch fixes bugs #2627 (1.5.2) and #2623 (1.4.2): sending large messages over RDMA fails.
The patch includes the following:
 *  Add a new MCA parameter, btl_openib_max_hw_msg_size: the maximum size (in bytes) of a single fragment of a long message when using the RDMA protocols (must be > 0 and <= hw capabilities).
 *  If btl_openib_max_hw_msg_size is larger than the maximum hw limitation, print an error message.
 *  Change the default openib flags to include only PUT and not GET.
 *  Print an error message if the user manually chooses the GET flag in the openib BTL.
 *  In prepare_dst: limit the message size to the minimum of both endpoints' hw limitations and the user limitation (if requested).

This commit was SVN r24191.
2010-12-23 11:48:43 +00:00
Rolf vandeVaart
da9c936ba0 Fix a cut-and-paste error: the wrong flags were being checked.
This commit was SVN r24029.
2010-11-10 19:09:47 +00:00
Ralph Castain
fceabb2498 Update libevent to the 2.0 series, currently at 2.0.7rc. We will update to their final release when it becomes available. Currently known errors exist in unused portions of the libevent code. This revision passes the IBM test suite on a Linux machine and on a standalone Mac.
This is a fairly intrusive change, but outside of the moving of opal/event to opal/mca/event, the only changes involved (a) changing all calls to opal_event functions to reflect the new framework instead, and (b) ensuring that all opal_event_t objects are properly constructed since they are now true opal_objects.

Note: Shiqing has just returned from vacation and has not yet had a chance to complete the Windows integration. Thus, this commit almost certainly breaks Windows support on the trunk. However, I want this to have a chance to soak for as long as possible before I become less available a week from today (going to be at a class for 5 days, and thus will only be sparingly available) so we can find and fix any problems.

The biggest change is moving the libevent code from opal/event to a new opal/mca/event framework. This was done to make it much easier to update libevent in the future. New versions can be inserted as a new component and tested in parallel with the current version until validated, then we can remove the earlier version if we so choose. This is a statically built framework à la installdirs, so only one component will build at a time. There is no selection logic: the sole compiled component simply loads its function pointers into the opal_event struct.

I have gone through the code base and converted all the libevent calls I could find. However, I cannot compile or test every environment. It is therefore quite likely that errors remain in the system. Please keep an eye open for two things:

1. compile-time errors: these will be obvious as calls to the old functions (e.g., opal_evtimer_new) must be replaced by the new framework APIs (e.g., opal_event.evtimer_new)

2. run-time errors: these will likely show up as segfaults due to missing constructors on opal_event_t objects. It appears that it became a typical practice for people to "init" an opal_event_t by simply using memset to zero it out. This will no longer work - you must either OBJ_NEW or OBJ_CONSTRUCT an opal_event_t. I tried to catch these cases, but may have missed some. Believe me, you'll know when you hit it.

There is also the issue of the new libevent "no recursion" behavior. As I described on a recent email, we will have to discuss this and figure out what, if anything, we need to do.

This commit was SVN r23925.
2010-10-24 18:35:54 +00:00
Rolf vandeVaart
a91bd44463 Do not hand a function into this macro as the
function will get called twice.

This commit was SVN r23824.
2010-10-01 18:59:15 +00:00
Rolf vandeVaart
b7a27ab36a Add support for openib BTL failover to be used with the bfo PML.
By default, the feature is configured out, so it has no effect on
normal operation.

This commit was SVN r23412.
2010-07-14 10:08:19 +00:00
Rolf vandeVaart
b4af9c0efc Fix casts so trunk compiles
This commit was SVN r23381.
2010-07-13 01:52:22 +00:00
Shiqing Fan
cdc7e0bec9 Mainly type casts.
Get rid of pthread and other unnecessary stuff for Windows.

This commit was SVN r23376.
2010-07-12 16:17:56 +00:00