1
1

838 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
9f3717959e remove sync step from udcm as it really isn't necessary
This commit was SVN r26724.
2012-07-02 22:54:44 +00:00
Pavel Shamis
f7664b3814 1. Adding 2 new components:
ofacm - generic connection manager for IB interconnects.
ofautils - IB common utilities and compatibility code

2. Updating OpenIB configure code

- ORNL & Mellanox Teams 

This commit was SVN r26707.
2012-07-02 15:20:12 +00:00
Jeff Squyres
b936229b54 Refs trac:3130: fix the openib BTL to properly set the memalign malloc
hook early in the setup, but ''not'' during the component register
function.  And then properly unset it if was set.

This commit was SVN r26697.

The following Trac tickets were found above:
  Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:51:36 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Nathan Hjelm
37c624ee43 prepare to delete mpool/rdma
This commit was SVN r26664.
2012-06-26 15:55:23 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Yevgeny Kliteynik
df783c0472 Precise speed of FDR and EDR
This commit was SVN r26614.
2012-06-17 07:06:37 +00:00
Yevgeny Kliteynik
d59b8d5dc4 Fixing malformed error message
This commit was SVN r26434.
2012-05-12 21:13:42 +00:00
Mike Dubman
98c2c749fb fix define name to BTL_OPENIB_MALLOC_HOOKS_ENABLED
Thanks to Ludovic.Hablot@ext.bull.net  for pointing this out

This commit was SVN r26432.
2012-05-11 18:30:45 +00:00
Yevgeny Kliteynik
244d66d95b Fixed FDR link speed details, added EDR.
This commit was SVN r26423.
2012-05-10 13:44:18 +00:00
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Mike Dubman
cd17fee9a8 performance fix: openib use memalign for malloc
This commit was SVN r26409.
2012-05-08 20:42:09 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Mike Dubman
1b475523de add support for FDR speed
This commit was SVN r26385.
2012-05-06 05:53:05 +00:00
Terry Dontje
81d7fcaf82 back out r26255 to avoid cross component linkage so Solaris can build a usable openib btl
This commit was SVN r26269.

The following SVN revision numbers were found above:
  r26255 --> open-mpi/ompi@fe25b8704b
2012-04-13 18:08:54 +00:00
Mike Dubman
fe25b8704b performance fix: set alignment for openib internal buffers
Thanks to Jeff/Pasha for valuable comments
Thanks to Valentin Petrov for implementation

This commit was SVN r26255.
2012-04-09 08:06:15 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00
Mike Dubman
ff1c84c53f revert previous commit
This commit was SVN r26206.
2012-03-29 14:07:13 +00:00
Mike Dubman
43a5775e8a performance fix: set alignment for openib internal buffers
This commit was SVN r26205.
2012-03-29 14:00:08 +00:00
Pavel Shamis
102da281c4 OPENIB BTL - use orte_show_help instead of BTL_ERROR print in case ibv_reg_mr failed.
This commit was SVN r26111.
2012-03-08 09:04:03 +00:00
Mike Dubman
4e7e7d7c3f print error which is ignored on upper layer
This commit was SVN r26106.
2012-03-06 14:25:56 +00:00
Abhishek Kulkarni
08ca0f80bc Fix a C/R bug where the restart hung due to
dangling fds in the openib btl.

This commit was SVN r26094.
2012-03-04 06:57:33 +00:00
Terry Dontje
3e70cad203 Correct a few alignment problems to address the issue brought up in ticket #2964
This commit was SVN r26078.
2012-03-01 17:29:40 +00:00
Nathan Hjelm
3d9bc68435 reorder udcm init/finalize code. fixes trac:2973
This commit was SVN r25787.

The following Trac tickets were found above:
  Ticket 2973 --> https://svn.open-mpi.org/trac/ompi/ticket/2973
2012-01-26 16:28:55 +00:00
Terry Dontje
5209de048c add code to service_thread_start to handle EBADF returns from select. This commit fixes trac:2922.
This commit was SVN r25520.

The following Trac tickets were found above:
  Ticket 2922 --> https://svn.open-mpi.org/trac/ompi/ticket/2922
2011-11-29 16:49:59 +00:00
Brad Benton
96395c916e de-tab'd
This commit was SVN r25465.
2011-11-09 19:45:12 +00:00
Brad Benton
0712b911a5 Updated IBM copyright
This commit was SVN r25464.
2011-11-09 19:38:53 +00:00
Christopher Yeoh
fb57a74a40 Removes pointless memmove which because of a previous memcpy will always
have identical source and destination pointers. See #2871
Plugs a couple of minor memory leaks related to remote qp info

This commit was SVN r25431.
2011-11-04 00:15:08 +00:00
George Bosilca
9d8e84142f Survivor!!!
This commit was SVN r25371.
2011-10-26 00:58:55 +00:00
Ralph Castain
e7f6be5385 Unused variable
This commit was SVN r25301.
2011-10-17 18:59:22 +00:00
Jeff Squyres
2c6254b70d Second change from Intel.
This commit was SVN r25279.
2011-10-12 23:26:34 +00:00
Jeff Squyres
28118d0611 Updte the parameters for the Intel iWARP devices, per request from
Faisal Latif <faisal.latif@intel.com>.

This commit was SVN r25278.
2011-10-12 22:58:30 +00:00
Rainer Keller
4e6a6fc146 - Check, whether the compiler supports __builtin_clz (count leading
zeroes);
   if so, use it for bit-operations like opal_cube_dim and opal_hibit.
   Implement two versions of power-of-two.
   In case of opal_next_poweroftwo, this reduces the average execution
   time from 83 cycles to 4 cycles (Intel Nehalem, icc, -O2, inlining,
   measured rdtsc, with loop over 2^27 values).
   Numbers for other functions are similar (but of course heavily depend
   on the usage, e.g. opal_hibit() with a start of 4 does not save
   much).  The bsr instruction on AMD Opteron is also not as fast.

 - Replace various places where the next power-of-two is computed.
   
   Tested on Intel Nehalem Cluster with openib, compilers GNU-4.6.1 and
   Intel-12.0.4 using mpi_testsuite -t "Collective" with 128 processes.

This commit was SVN r25270.
2011-10-11 22:49:01 +00:00
George Bosilca
80c02647c8 Each level (OPAL/ORTE/OMPI) should only return it's own constants,
instead of the current mismatch.

This commit was SVN r25230.
2011-10-04 14:50:31 +00:00
Brad Benton
0f2475c554 Modified set_remote_info() to use memmove() instead of memcpy() when
copying rem_qp info.  This avoids potential errors when src & dest overlap.
This is a workaround for the issue in #2871

This commit was SVN r25180.
2011-09-26 20:07:36 +00:00
Pavel Shamis
29c4981caa Removing unused include from openib/ofud btls.
This include causes compilation failure on macos platform.

This commit was SVN r25170.
2011-09-20 19:25:59 +00:00
Nathan Hjelm
98b56108c4 add unconnect datagram connection manager (udcm)
This commit was SVN r25160.
2011-09-19 21:24:58 +00:00
Nathan Hjelm
8cd550f49f fixed error in last commit
This commit was SVN r25159.
2011-09-19 21:13:59 +00:00
Nathan Hjelm
de950959ee print a more meaningful error message when ibv_create_qp fails
This commit was SVN r25158.
2011-09-19 21:12:22 +00:00
Josh Hursey
2d25d70a1c Missing header for opal_timer_base_get_cycles
This commit was SVN r25157.
2011-09-19 19:52:58 +00:00
George Bosilca
9687e7f38e This commit fixes trac:2679 and should be added to cmr:v1.4:reviewer=jsquyres
and cmr:v1.5:reviewer=jsquyres

This commit was SVN r25155.

The following Trac tickets were found above:
  Ticket 2679 --> https://svn.open-mpi.org/trac/ompi/ticket/2679
2011-09-18 00:58:26 +00:00
Steve Wise
e4629259f0 Update T4 openib btl defaults.
- add 2 new device ids.
- default rq depth to 64, which proved good for large runs.

This commit should be added to cmr:v1.4:reviewer=jsquyres and
cmr:v1.5:reviewer=jsquyres

This commit was SVN r25145.
2011-09-14 22:12:25 +00:00
Steve Wise
e5bba38434 Increase the rdmacm cpc address resolution timeout to 30 seconds.
Global rdmacm_resolve_timeout defaults to 1000 (1000 ms), which is way
too small for even a 16 node x 12 core iwarp cluster in the presence
of drops.  Bump up the default to 30000ms.

This commit fixes trac:2860 and should be added to cmr:v1.4:reviewer=jsquyres
and cmr:v1.5:reviewer=jsquyres

This commit was SVN r25144.

The following Trac tickets were found above:
  Ticket 2860 --> https://svn.open-mpi.org/trac/ompi/ticket/2860
2011-09-14 21:52:58 +00:00
Nathan Hjelm
3048ce043d permanently disable ibcm
This commit was SVN r25137.
2011-09-13 15:10:51 +00:00
Sylvain Jeaugey
002d39f345 Restored Bull vendor id for ConnectX card
This commit was SVN r25121.
2011-09-07 15:58:42 +00:00
Rainer Keller
9d5afc58c6 - Fix breakage of the epoch changes with PGI:
Don't juse include pre-processor macros between two strins ("s1" #if 0 ... "s2")...
   Rather print out the epoch as 0 always...

This commit was SVN r25110.
2011-08-31 08:40:31 +00:00
Wesley Bland
4e7ff0bd5e By popular demand the epoch code is now disabled by default.
To enable the epochs and the resilient orte code, use the configure flag:

--enable-resilient-orte

This will define both:

ORTE_ENABLE_EPOCH
ORTE_RESIL_ORTE

This commit was SVN r25093.
2011-08-26 22:16:14 +00:00
Jeff Squyres
1cbfb53801 r24976 wasn't quite right -- you now actually get a warning if you
specify btl_tcp_if_include because btl_tcp_if_exclude is defaulted to
the loopback devices.

This commit does a few things:

 * Introduce a new OPAL MCA base function:
   mca_base_param_check_exclusive_string().  It checks to see that the
   ''user'' does not set two MCA parameters that are mutually
   exclusive by checking the source of those MCS param values.
 * Use the above function in many BTLs (and the OOB TCP) to ensure
   that <foo>_if_include and <foo>_if_exclude are not both specified
   ''by the user''.
 * Re-arrange many of these BTLs to move their MCA registration code
   into a separate component_register() function (vs. the
   component_open() function).

This code has been nominally reviewed and checked by Ralph, George,
Terry, and Shiqing.

This commit was SVN r25043.

The following SVN revision numbers were found above:
  r24976 --> open-mpi/ompi@8f4ac54336
2011-08-10 17:24:36 +00:00