1
1

382 Коммитов

Автор SHA1 Сообщение Дата
Rolf vandeVaart
3d1d158a80 Do not abort in BTL. Rather, callback into PML error function. Thanks George for review.
This commit was SVN r28559.
2013-05-23 18:45:23 +00:00
Rolf vandeVaart
52ebb0b17f Change some opal_output to OPAL_OUTPUT per CMR review.
This commit was SVN r28510.
2013-05-14 20:49:42 +00:00
Rolf vandeVaart
9d569f1487 Fix warning when compiling in CUDA aware code.
This commit was SVN r28476.
2013-05-10 21:29:08 +00:00
Jeff Squyres
f55cea1a5b If there are no BTLs, do ''not'' actually shut down the fd listener,
because a) it may still be needed to shut down the CPCs, and b) it
will be shut down during component_close().

This commit was SVN r28402.
2013-04-26 15:31:50 +00:00
Nathan Hjelm
2edff7f784 btl/openib: don't free string handle by MCA variable system
This commit was SVN r28383.
2013-04-24 18:59:18 +00:00
Jeff Squyres
8405975bf6 Be a little more conservative about initializing devices and modules
(i.e., ensure that more data items get zeroed out/set to NULL) so that
if something goes wrong during initialization, we don't try to clean
up something that isn't there (and segv).

The chance of this happening on the trunk is very low (and will also
be low once the verbs improvements are brought over to v1.7).  But it
can actually happen in the v1.6 branch (e.g., if no CPC is available,
we'll try to get the length of the endpoints list, but the endpoints
list is NULL).  

Hence, even though the real goal is to get this functionality over to
v1.6, I figured I'd commit to the trunk/CMR to v1.7 just to try to
keep commonality in the openib between all three where possible.

This commit was SVN r28317.
2013-04-09 21:55:31 +00:00
Nathan Hjelm
cf377db823 MCA/base: Add new MCA variable system
Features:
 - Support for an override parameter file (openmpi-mca-param-override.conf).
   Variable values in this file can not be overridden by any file or environment
   value.
 - Support for boolean, unsigned, and unsigned long long variables.
 - Support for true/false values.
 - Support for enumerations on integer variables.
 - Support for MPIT scope, verbosity, and binding.
 - Support for command line source.
 - Support for setting variable source via the environment using
   OMPI_MCA_SOURCE_<var name>=source (either command or file:filename)
 - Cleaner API.
 - Support for variable groups (equivalent to MPIT categories).

Notes:
 - Variables must be created with a backing store (char **, int *, or bool *)
   that must live at least as long as the variable.
 - Creating a variable with the MCA_BASE_VAR_FLAG_SETTABLE enables the use of
   mca_base_var_set_value() to change the value.
 - String values are duplicated when the variable is registered. It is up to
   the caller to free the original value if necessary. The new value will be
   freed by the mca_base_var system and must not be freed by the user.
 - Variables with constant scope may not be settable.
 - Variable groups (and all associated variables) are deregistered when the
   component is closed or the component repository item is freed. This
   prevents a segmentation fault from accessing a variable after its component
   is unloaded.
 - After some discussion we decided we should remove the automatic registration
   of component priority variables. Few component actually made use of this
   feature.
 - The enumerator interface was updated to be general enough to handle
   future uses of the interface.
 - The code to generate ompi_info output has been moved into the MCA variable
   system. See mca_base_var_dump().

opal: update core and components to mca_base_var system
orte: update core and components to mca_base_var system
ompi: update core and components to mca_base_var system

This commit also modifies the rmaps framework. The following variables were
moved from ppr and lama: rmaps_base_pernode, rmaps_base_n_pernode,
rmaps_base_n_persocket. Both lama and ppr create synonyms for these variables.

This commit was SVN r28236.
2013-03-27 21:09:41 +00:00
Ralph Castain
a4b6fb241f Remove all remaining vestiges of the Windows integration
This commit was SVN r28137.
2013-02-28 17:31:47 +00:00
Nathan Hjelm
55cf850eca Add comment about r28083
This commit was SVN r28084.

The following SVN revision numbers were found above:
  r28083 --> open-mpi/ompi@5411e28c00
2013-02-20 21:42:13 +00:00
Nathan Hjelm
5411e28c00 btl/openib: don't align fragments on 2 byte boundaries (changed to 8)
cmr:v1.6,v1.7

This commit was SVN r28083.
2013-02-20 21:27:01 +00:00
Jeff Squyres
bbddd6ea03 Add header file for opal_show_help().
This commit was SVN r28056.
2013-02-13 16:31:59 +00:00
Brian Barrett
312f37706e In talking about this with Jeff and Ralph, we don't actually need
ompi_show_help, because opal_show_help is replaced with an 
aggregating version when using ORTE, so there's no reason to
directly call orte_show_help.

This commit was SVN r28051.
2013-02-12 21:10:11 +00:00
Brian Barrett
f42783ae1a Move the RTE framework change into the trunk. With this change, all non-CR
runtime code goes through one of the rte, dpm, or pubsub frameworks.

This commit was SVN r27934.
2013-01-27 23:25:10 +00:00
Rolf vandeVaart
f63c88701f Improve CUDA GPU transfers over openib BTL. Use aynchronous copies.
This is RFC that was submitted in July and December of 2012.

This commit was SVN r27862.
2013-01-17 22:34:43 +00:00
Mike Dubman
b6d50a5733 Performance optimizations by alexm:
* btl sendi(): if message can be send inline try to avoid signal
* signal is requested one per 64 or when
    there are no send wqes 
    when message can not be send inline 
    any other btl method then sendi()

This commit was SVN r27724.
2012-12-26 10:19:12 +00:00
Mike Dubman
d47d550dfc performance optimization: process completions in the batch manner
This commit was SVN r27559.
2012-11-05 14:02:37 +00:00
Jeff Squyres
4569f77645 Remove redundant common_verbs.h include.
This commit was SVN r27556.
2012-11-02 14:16:55 +00:00
Nathan Hjelm
2acd0f83de Revert "Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter".
It appears the problem was not with the command line parser but the rsh plm. I don't know why this problem was not occuring before the command line parser changes but it appears to be resolved now.

This commit was SVN r27527.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-30 19:45:18 +00:00
Ralph Castain
e6014bf2e1 Revert r27451 and r27456 - the cmd line parser is incorrectly marking the application as an MCA parameter
This commit was SVN r27477.

The following SVN revision numbers were found above:
  r27451 --> open-mpi/ompi@d59034e6ef
  r27456 --> open-mpi/ompi@ecdbf34937
2012-10-24 18:38:44 +00:00
Nathan Hjelm
d59034e6ef MCA: remove deprecated mca_base_param functions (mca_base_param_register_int, mca_base_param_register_string, mca_base_param_environ_variable). Remove all uses of deprecated functions.
cmr:v1.7

This commit was SVN r27451.
2012-10-17 20:17:37 +00:00
Jeff Squyres
8c369224bf More common/verbs improvements:
* Add OMPI_COMMON_VERBS_FLAGS_NOT_RC, which looks for a device	that
   does ''not'' support RC
 * Add ompi_common_verbs_find_max_inline(), and	remove that code from
   the openib BTL component

This commit was SVN r27393.
2012-10-03 00:57:39 +00:00
Jeff Squyres
3cc8b0461a More updates to common verbs infrastructure:
* Moved "check basics" sanity check from openib BTL to common/verbs
   (which also allows us to have openib ''not'' include
   <infiniband/driver.h>, which is a Very Good Thing)
 * Add new ompi_common_verbs_qp_test() function, which tests to see
   whether a device supports RC and/or UD QPs.  The openib BTL now
   uses this function to ensure that the device supports RC QPs.
 * Rename ompi_common_verbs_find_ibv_ports() to be
   ompi_common_verbs_find_ports() -- the "ibv" was redundant.
 * Re-work ompi_common_verbs_find_ports() to use
   ompi_common_verbs_qp_test() instead of testing for RC/UD QPs itself
 * Add bunches of opal_output_verbose() to the find_ports() routine
   (to help diagnosing connectivity problems -- imaging running with
   --mca btl_base_verbose 10; you'll see all the find_ports() test
   results)
 * Make ompi_common_verbs_qp_test() warn if devices/ports are supplied
   in the if_include/if_exclude strings that do not exists (quite
   similar to what the openib BTL does today).
 * Add ompi_common_verbs_mca_register() function, which registers
   common verbs MCA params.  It will also register MCA param synonyms
   for thse MCA params to upper-level components (e.g.,
   btl_<upper-level-component>_<the-mca-param>). 
   * common_verbs_warn_nonexistent_if: warn if
     if_include/if_exclude-specified devices or ports do not exist.  

This commit was SVN r27332.
2012-09-12 20:47:47 +00:00
Jeff Squyres
341ce2f9a4 Per some discussions between LANL, Cisco, ORNAL, and Mellanox, move
some new common OpenFabrics functionality to ompi/mca/common/verbs.
Also move everything that was in ompi/mca/common/ofautils under
ompi/mca/common/verbs.  

 * Move ofautils -> verbs
 * Add new functionality in ompi/mca/common/verbs (see doxygen
 * comments in ompi/mca/common/verbs/common_verbs.h for details):
   * ompi_common_verbs_find_ibv_ports()
   * ompi_common_verbs_port_bw()
   * ompi_common_verbs_mtu()
   * '''If you're writing verbs-based code, you should be using this
     common functionality'''
 * Adapt openib BTL to use some trivial common functionality in
   common/verbs
 * Don't use "#ifdef OMPI_HAVE_RDMAOE",use 
   "#if defined(HAVE_IBV_LINK_LAYER_ETHERNET)"
 * Update the following to include/link against common/verbs
   * bcol/iboffload
   * sbgp/ibnet
   * btl/openib

This commit was SVN r27212.
2012-09-01 01:42:37 +00:00
Ralph Castain
69753c37ef Turn off one place that won't compile if ompi progress threads enabled because it calls a non-existent function
This commit was SVN r27082.
2012-08-16 22:53:14 +00:00
Christopher Yeoh
cc091f4979 Adds synchronisation between main thread and service thread in
btl_openib_connect_udcm when notifying not to listen to an fd to ensure
that the main thread does not continue until the service thread has
processed the message

Adds ability to send message to openib async thread to tell it to
ignore the ERR state on a specific QP. Adds this call to udcm_module_finalize
so when we set the error state on the QP it doesn't cause the 
openib async thread to abort the mpi program prematurely

Fixes trac:3161

This commit was SVN r27064.

The following Trac tickets were found above:
  Ticket 3161 --> https://svn.open-mpi.org/trac/ompi/ticket/3161
2012-08-16 03:56:21 +00:00
Yael Dayan
7895cd1114 adding a fragmentation mechanism to the Get flow in function mca_pml_ob1_recv_request_progress_rget
This commit was SVN r26956.
2012-08-07 07:15:21 +00:00
Jeff Squyres
5ec6a65a72 After I spent a while looking in libibverbs for
ibv_get_device_list_compat() and not finding it, I finally realized
that it was a function in OMPI.  So let's name it with a proper ompi_
prefix, not an ibv_ prefix.

This commit was SVN r26867.
2012-07-25 16:32:51 +00:00
Nathan Hjelm
cd2cbdca09 btl/openib: limit each process to a ppn fraction of the available registered memory when using mellanox hardware (mlx4 and mthca). fixed
This commit was SVN r26811.
2012-07-19 17:52:21 +00:00
Ralph Castain
66fe57f746 Revert r26804 so openib can build again
This commit was SVN r26810.

The following SVN revision numbers were found above:
  r26804 --> open-mpi/ompi@610be870f9
2012-07-19 16:16:38 +00:00
Nathan Hjelm
610be870f9 btl/openib: limit each process to a ppn fraction of the available registered memory when using mellanox hardware (mlx4 and mthca)
This commit was SVN r26804.
2012-07-18 17:29:48 +00:00
Nathan Hjelm
4a97ecbdd2 btl/openib: remove tab characters
This commit was SVN r26803.
2012-07-18 17:29:37 +00:00
Nathan Hjelm
3798f38386 do not print out an error message if ibv_reg_mr fails
This commit was SVN r26796.
2012-07-14 01:35:45 +00:00
Nathan Hjelm
4d1920ee87 Fix a bug on 32-bit systems introduced by r26626. This fix ensures that all supported btls (with exception of wv-- shiqing will need to help bring that one up to date with r26626) set the lval in prepare_src/dst when preparing a put or get segment. This fix also ensures a consistent use of lval in put and get for both local and remote segments.
This commit was SVN r26793.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-07-13 21:19:16 +00:00
Pavel Shamis
f7664b3814 1. Adding 2 new components:
ofacm - generic connection manager for IB interconnects.
ofautils - IB common utilities and compatibility code

2. Updating OpenIB configure code

- ORNL & Mellanox Teams 

This commit was SVN r26707.
2012-07-02 15:20:12 +00:00
Jeff Squyres
b936229b54 Refs trac:3130: fix the openib BTL to properly set the memalign malloc
hook early in the setup, but ''not'' during the component register
function.  And then properly unset it if was set.

This commit was SVN r26697.

The following Trac tickets were found above:
  Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:51:36 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Nathan Hjelm
37c624ee43 prepare to delete mpool/rdma
This commit was SVN r26664.
2012-06-26 15:55:23 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Yevgeny Kliteynik
df783c0472 Precise speed of FDR and EDR
This commit was SVN r26614.
2012-06-17 07:06:37 +00:00
Yevgeny Kliteynik
d59b8d5dc4 Fixing malformed error message
This commit was SVN r26434.
2012-05-12 21:13:42 +00:00
Yevgeny Kliteynik
244d66d95b Fixed FDR link speed details, added EDR.
This commit was SVN r26423.
2012-05-10 13:44:18 +00:00
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Mike Dubman
cd17fee9a8 performance fix: openib use memalign for malloc
This commit was SVN r26409.
2012-05-08 20:42:09 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Mike Dubman
1b475523de add support for FDR speed
This commit was SVN r26385.
2012-05-06 05:53:05 +00:00
Terry Dontje
81d7fcaf82 back out r26255 to avoid cross component linkage so Solaris can build a usable openib btl
This commit was SVN r26269.

The following SVN revision numbers were found above:
  r26255 --> open-mpi/ompi@fe25b8704b
2012-04-13 18:08:54 +00:00
Mike Dubman
fe25b8704b performance fix: set alignment for openib internal buffers
Thanks to Jeff/Pasha for valuable comments
Thanks to Valentin Petrov for implementation

This commit was SVN r26255.
2012-04-09 08:06:15 +00:00
Ralph Castain
bd8b4f7f1e Sorry for mid-day commit, but I had promised on the call to do this upon my return.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.

Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.

This commit was SVN r26242.
2012-04-06 14:23:13 +00:00