Commit Graph

3890 Commits

Author SHA1 Message Date
Brian Barrett
d56de80b5d * Properly initialize handle variable as a request (since the coll_libnbc_request contains everything an NBC_Handle used to contain). Not sure how this slipped through...
This commit was SVN r26710.
2012-07-02 16:39:42 +00:00
Brian Barrett
7e67bfa175 Use OMPI's ops instead of the libnbc ops.
This commit was SVN r26708.
2012-07-02 15:47:22 +00:00
Pavel Shamis
f7664b3814 1. Adding 2 new components:
ofacm - generic connection manager for IB interconnects.
ofautils - IB common utilities and compatibility code

2. Updating OpenIB configure code

- ORNL & Mellanox Teams 

This commit was SVN r26707.
2012-07-02 15:20:12 +00:00
Yevgeny Kliteynik
0e28fa984b Remove dead code that was related to ticket #2971
This commit was SVN r26701.
2012-07-02 11:19:09 +00:00
Nathan Hjelm
a847df9ba5 ugni: fix eager get
This commit was SVN r26699.
2012-06-29 15:43:29 +00:00
Jeff Squyres
5d030278e1 Refs trac:3130: Per comment 8 on the ticket, this MX patch fixes the cases
where the MX BTL and MTL are stepping on each other regarding the
mpool.  Thanks to Yong Qin for assistance in tracking this down.

This commit was SVN r26698.

The following Trac tickets were found above:
  Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:52:40 +00:00
Jeff Squyres
b936229b54 Refs trac:3130: fix the openib BTL to properly set the memalign malloc
hook early in the setup, but ''not'' during the component register
function.  And then properly unset it if it was set.

This commit was SVN r26697.

The following Trac tickets were found above:
  Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:51:36 +00:00
Jeff Squyres
f3a8722360 Fix comment.
This commit was SVN r26696.
2012-06-29 01:38:04 +00:00
Brian Barrett
0b887ab5a1 * Remove unneeded prototype that was causing compile issues anyway
* Use proper tag space (the negatives below the blocking communicators)
  instead of the point-to-point space
* Use the PML interface instead of the MPI interface, since the MPI
  interface 1) shouldn't be used by components and 2) doesn't like
  negative tags

This commit was SVN r26693.
2012-06-28 16:52:03 +00:00
Edgar Gabriel
b0954a6a3e set the internal OMPIO file pointer to the end of the file if file has been
opened using the APPEND mode.

This commit was SVN r26692.
2012-06-28 15:15:47 +00:00
Edgar Gabriel
32b0dfed31 * set the status _ucount field correctly for individual read and write
operations
* removing a lingering reference to the ylib fcoll component, which will not
be part of the 1.7 branch.

This commit was SVN r26691.
2012-06-28 14:43:56 +00:00
Edgar Gabriel
be6ea52bb4 some further cleanup of resources in case of an error.
This commit was SVN r26690.
2012-06-28 13:58:23 +00:00
Ralph Castain
a1344bc5c0 Add missing header to tarball
This commit was SVN r26689.
2012-06-28 13:07:18 +00:00
Brian Barrett
32e70b691a Re-enable non-blocking collectives in libnbc after finding issue with the definition of
NBC_CACHE_SCHEDULE not being propagated to all uses.

This commit was SVN r26686.
2012-06-27 22:08:19 +00:00
Edgar Gabriel
b7a72feb1d minor code cleanup and make the MPI_MODE_DELETE_ON_CLOSE work
This commit was SVN r26685.
2012-06-27 20:54:58 +00:00
Brian Barrett
d85fdd2605 temporarily back out r26682 and r26683 until I can figure out why they cause crashes during shutdown
This commit was SVN r26684.

The following SVN revision numbers were found above:
  r26682 --> open-mpi/ompi@15a30af11f
  r26683 --> open-mpi/ompi@f6ea4b7234
2012-06-27 19:32:53 +00:00
Brian Barrett
f6ea4b7234 Remove now unneeded header file
This commit was SVN r26683.
2012-06-27 18:43:40 +00:00
Brian Barrett
15a30af11f Turn on all the non-blocking collectives provided by libnbc...
This commit was SVN r26682.
2012-06-27 18:32:57 +00:00
Brian Barrett
3933d0a8f0 Ibarrier works! :)
This commit was SVN r26680.
2012-06-27 15:58:17 +00:00
Ralph Castain
0dfe29b1a6 Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.

Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.

This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Josh Hursey
28681deffa Backout the ORCA commit. :(
There is a linking issue on Mac OS X that needs to be addressed before this is able to come back into the trunk.

This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
32050f026f protect the ORTE_CHECK_PMI define in the OMPI layer for --no-orte builds
This commit was SVN r26674.
2012-06-27 00:28:37 +00:00
Josh Hursey
542330e3a7 Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.

The project is described on the wiki:
  https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition

And on this email thread:
  http://www.open-mpi.org/community/lists/devel/2012/06/11109.php

This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Edgar Gabriel
288d044097 get rid of the fcache framework. It was not being used as originally intended.
This commit was SVN r26668.
2012-06-26 19:53:26 +00:00
Nathan Hjelm
086000ce8d remove mpool/rdma
This commit was SVN r26665.
2012-06-26 15:56:07 +00:00
Nathan Hjelm
37c624ee43 prepare to delete mpool/rdma
This commit was SVN r26664.
2012-06-26 15:55:23 +00:00
Brian Barrett
7bdeafb772 Start bringing in libnbc. .ompi_ignored, as there's still a long way to go
This commit was SVN r26658.
2012-06-25 22:38:06 +00:00
Edgar Gabriel
6a2dd16ee3 cleaning up the usage of CFLAGS vs. CPPFLAGS. Thanks Jeff for helping with
that!

This commit was SVN r26655.
2012-06-25 20:32:58 +00:00
Nathan Hjelm
2dbe630138 fix more udapl warnings/errors
This commit was SVN r26648.
2012-06-25 15:18:50 +00:00
Brian Barrett
b9e8e4aeb9 * Initial merge of the non-blocking collectives interface. No implementation of
the back-end yet, coming real soon now, need to solve some tag issues first.

This commit was SVN r26641.
2012-06-22 20:54:12 +00:00
Nathan Hjelm
6a0ccf41e6 one more file
This commit was SVN r26638.
2012-06-22 18:21:57 +00:00
Ralph Castain
e6f3586415 Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Nathan Hjelm
03f00c42b8 fix udapl compile problems from r26626
This commit was SVN r26635.

The following SVN revision numbers were found above:
  r26626 --> open-mpi/ompi@249066e06d
2012-06-22 14:20:45 +00:00
Nathan Hjelm
77f7171186 remove hdr_segkey from OMPI_OSC_RDMA_BASE_HDR_NTOH and OMPI_OSC_RDMA_BASE_HDR_HTON
This commit was SVN r26634.
2012-06-22 14:15:26 +00:00
Nathan Hjelm
249066e06d Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Ralph Castain
1e1c755fbc Remove non-existent windows file
This commit was SVN r26624.
2012-06-21 01:37:36 +00:00
Nathan Hjelm
e3bc6c0f73 btl/ugni: use grdma mpool to take advantage of shared lru
This commit was SVN r26623.
2012-06-20 23:03:59 +00:00
Nathan Hjelm
3d86b5055e btl/ugni: don't call opal_convertor_pack if there is nothing to pack
This commit was SVN r26622.
2012-06-20 23:01:37 +00:00
Nathan Hjelm
f5fd87a446 mpool/grdma: temporarily remove support for remote (local) process eviction and remove ignore.
This commit was SVN r26621.
2012-06-20 23:00:25 +00:00
Yevgeny Kliteynik
df783c0472 Precise speed of FDR and EDR
This commit was SVN r26614.
2012-06-17 07:06:37 +00:00
Nathan Hjelm
fbd1636ea4 fix seg fault when size == 0
This commit was SVN r26612.
2012-06-15 16:58:23 +00:00
Rolf vandeVaart
d6881f3a4f Rename one function. Add some new functions that can support asynchronous CUDA copies.
This commit was SVN r26611.
2012-06-15 16:56:30 +00:00
Brian Barrett
defaefd59e Clean up resources from flowcontrol on shutdown
This commit was SVN r26605.
2012-06-14 22:38:35 +00:00
Brian Barrett
946ec4cd97 * Update usage of PtlHandleIsEqual to match new semantic
* Properly set message to MPI_MESSAGE_NULL in the right places
* Fix double free of buffer for non-contiguous blocking sends
* Remove useless debugging output

This commit was SVN r26604.
2012-06-14 22:24:23 +00:00
Nathan Hjelm
0d13cbf11c ob1: bug fix. put fallback on send never actually worked. fixed.
This commit was SVN r26602.
2012-06-14 17:29:58 +00:00
Terry Dontje
634fc278d9 Fix issue with sctp config scripts not detecting netinet/in.h dependency. Also removing tabs from sctp m4 file
This commit was SVN r26599.
2012-06-13 10:38:28 +00:00
Nathan Hjelm
a809881f78 ob1: reset the convertor after a failed sendi before trying send
This commit was SVN r26597.
2012-06-12 15:44:47 +00:00
Ralph Castain
269cb2b8d9 Some cleanup to remove calls to opal_progress when running with orte progress threads, and to ensure that all orte-related events are in the orte event base.
This commit was SVN r26591.
2012-06-11 19:59:53 +00:00
Brian Barrett
31279eb641 Fix segfault with long expected messages when using the rndv protocol. We were
freeing the ME before the get to grab the long part of the message.

This commit was SVN r26589.
2012-06-11 16:37:01 +00:00
Brian Barrett
7406ef1241 Make all the PMI components depend on the common pmi library and properly
install the common pmi library

This commit was SVN r26588.
2012-06-11 15:58:09 +00:00
Jeff Squyres
13707ec0af Remove this comment: it turns out that the benefit was to make
multiple SM ''modules'', not multiple SM ''mpools''.

This commit was SVN r26584.
2012-06-08 22:37:26 +00:00
Jeff Squyres
5451ee46bd Per r26575, the sync coll module is no longer necessary!
(the crowd goes wild)

This commit was SVN r26583.

The following SVN revision numbers were found above:
  r26575 --> open-mpi/ompi@59e529cf1d
2012-06-08 19:19:19 +00:00
Nathan Hjelm
59e529cf1d ob1: as per developer discussion disable rdma retries. the failure path currently suffers from live-lock
This commit was SVN r26575.
2012-06-07 23:31:20 +00:00
Nathan Hjelm
4c6be00de2 fix erroneous commit in rdma mpool
This commit was SVN r26572.
2012-06-07 20:01:44 +00:00
Nathan Hjelm
ceee4bcb0d libevent2019: libevent_pthreads.la is never built. don't include it
This commit was SVN r26570.
2012-06-07 19:22:45 +00:00
Jeff Squyres
56a537a5f5 This component wasn't even in 1.5.0; no one has had a GM network in
forever.  There is no point in carrying this component forward.

This commit was SVN r26563.
2012-06-06 21:43:54 +00:00
Mike Dubman
10831e111a detect num of local procs
This commit was SVN r26555.
2012-06-05 09:13:16 +00:00
Mike Dubman
e9c274f3b9 raise cm prio for mxm as well, somehow was removed from 1.7, exists in 1.6
This commit was SVN r26554.
2012-06-05 09:02:03 +00:00
Yevgeny Kliteynik
1cbce83ece Fixed wording of MXM parameters as suggested by Jeff.
This commit was SVN r26545.
2012-06-03 21:48:42 +00:00
Yevgeny Kliteynik
f02bf707a4 Added MXM parameter "np" that controls the minimal number of processes that allow MXM to run
Default: 128

MXM advantages kick in with large number of processes.

This commit was SVN r26544.
2012-06-02 11:07:20 +00:00
Nathan Hjelm
71bffa5158 ugni: update to latest btl code. bug fixes and cleanup
This commit was SVN r26529.
2012-05-31 20:02:41 +00:00
Edgar Gabriel
3ccd286de1 silence a compiler warning for optimized builds.
This commit was SVN r26528.
2012-05-31 13:32:10 +00:00
Vishwanath Venkatesan
86a57c7b66 Initializing sorted_file_offsets to NULL
This commit was SVN r26526.
2012-05-30 06:56:40 +00:00
Jeff Squyres
99c5afb397 Remove clang compiler warnings.
This commit was SVN r26523.
2012-05-29 23:36:06 +00:00
Edgar Gabriel
d1e91e9372 make the file compile properly.
This commit was SVN r26497.
2012-05-26 01:06:36 +00:00
Brian Barrett
2effbb1ba6 fix copy/paste typo
This commit was SVN r26492.
2012-05-24 16:06:20 +00:00
Ralph Castain
c0304eb23a Fix copy/paste typo
This commit was SVN r26491.
2012-05-24 15:47:20 +00:00
Nathan Hjelm
cdc3c87ba6 move pmi init/finalize into a common component
This commit was SVN r26470.
2012-05-22 15:15:39 +00:00
George Bosilca
e890a8379b Various minor cleanups.
This commit was SVN r26461.
2012-05-21 13:15:24 +00:00
Brian Barrett
25693363e9 * Fix internal accounting error regarding number of available credits
* Use a single MD covering all of address space for put transfers, rather
 than a per-send MD.

This commit was SVN r26458.
2012-05-20 23:42:26 +00:00
Vishwanath Venkatesan
8d4bb65bd4 Modifying the explicit operations to make it absolute
This commit was SVN r26451.
2012-05-18 21:43:34 +00:00
Vishwanath Venkatesan
cbad31cc88 1. Freeing the displs array after allgatherv to avoid segmentation faults in dynamic segmentation
2. Checking for 0-byte datatypes and sending only when data is available, to avoid 0-byte messages being sent and received.
3. Changing timing extraction to support calculating min, max, and avg communication costs + min and avg write costs

This commit was SVN r26450.
2012-05-18 21:39:58 +00:00
Rolf vandeVaart
f8ace21366 Rename a few things for clarity. Add a stream.
This commit was SVN r26447.
2012-05-17 18:10:59 +00:00
Rolf vandeVaart
c228bd2311 Fix broken compile. Keep in sync with sm btl.
This commit was SVN r26440.
2012-05-15 15:32:33 +00:00
Yevgeny Kliteynik
d59b8d5dc4 Fixing malformed error message
This commit was SVN r26434.
2012-05-12 21:13:42 +00:00
Mike Dubman
98c2c749fb fix define name to BTL_OPENIB_MALLOC_HOOKS_ENABLED
Thanks to Ludovic.Hablot@ext.bull.net  for pointing this out

This commit was SVN r26432.
2012-05-11 18:30:45 +00:00
Brian Barrett
2e52374847 * Split send and receive eq sizes
* Need to look at slot count before flowcontrol for sending to prevent
  race in restart
* Need to free pending request fragments when done with the request
* A number of branch prediction optimizations for error conditions

This commit was SVN r26430.
2012-05-10 21:43:48 +00:00
Yevgeny Kliteynik
244d66d95b Fixed FDR link speed details, added EDR.
This commit was SVN r26423.
2012-05-10 13:44:18 +00:00
Nathan Hjelm
91d99c6fef ugni: reserve memory domain descriptors (MDDs) for mailbox registration
This commit was SVN r26419.
2012-05-10 00:24:42 +00:00
Jeff Squyres
de4bbacd13 It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions.  Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.

But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).

So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER.  If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).

This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Mike Dubman
cd17fee9a8 performance fix: openib use memalign for malloc
This commit was SVN r26409.
2012-05-08 20:42:09 +00:00
Nathan Hjelm
903f9fac09 ugni: fixed buffered sends and code cleanup
This commit was SVN r26401.
2012-05-07 17:23:06 +00:00
Nathan Hjelm
49eda71ca0 ugni: fix invalid parameter with opal_pointer_array_init
This commit was SVN r26400.
2012-05-07 17:22:55 +00:00
Nathan Hjelm
584c457352 ugni: update smsg defaults and add parameter to control local completion queue size
This commit was SVN r26399.
2012-05-07 17:22:49 +00:00
Nathan Hjelm
bfcf67391a ugni: set fragment id from opal_pointer_array_add
This commit was SVN r26398.
2012-05-07 17:22:42 +00:00
Nathan Hjelm
b3dc726e9d ugni: don't create completion queues until add_procs
This commit was SVN r26397.
2012-05-07 17:22:35 +00:00
Nathan Hjelm
0e48ea1f65 vader: remove #include of headers that no longer exist
This commit was SVN r26396.
2012-05-07 17:22:28 +00:00
Nathan Hjelm
a32d4c648d ob1: rewind convertor after failed send
This commit was SVN r26395.
2012-05-07 17:22:22 +00:00
Jeff Squyres
2ba10c37fe Per RFC, bring in the following changes:
 * Remove paffinity, maffinity, and carto frameworks -- they've been
   wholly replaced by hwloc.
 * Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
 * Update sm, smcuda, wv, and openib components to no longer use carto.
   Instead, use hwloc data.  There are still optimizations possible in
   the sm/smcuda BTLs (i.e., making multiple mpools).  Also, the old
   carto-based code found out how many NUMA nodes were ''available''
   -- not how many were used ''in this job''.  The new hwloc-using
   code computes the same value -- it was not updated to calculate how
   many NUMA nodes are used ''by this job.''
   * Note that I cannot compile the smcuda and wv BTLs -- I ''think''
     they're right, but they need to be verified by their owners.
 * The openib component now does a bunch of stuff to figure out where
   "near" OpenFabrics devices are.  '''THIS IS A CHANGE IN DEFAULT
   BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
   (I do not have a NUMA machine with an OpenFabrics device that is a
   non-uniform distance from multiple different NUMA nodes).
 * Completely rewrite the OMPI_Affinity_str() routine from the
   "affinity" mpiext extension.  This extension now understands
   hyperthreads; the output format of it has changed a bit to reflect
   this new information.
 * Bunches of minor changes around the code base to update names/types
   from maffinity/paffinity-based names to hwloc-based names.
 * Add some helper functions into the hwloc base, mainly having to do
   with the fact that we have the hwloc data reporting ''all''
   topology information, but sometimes you really only want the
   (online | available) data.

This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Mike Dubman
1b475523de add support for FDR speed
This commit was SVN r26385.
2012-05-06 05:53:05 +00:00
Nathan Hjelm
b6ae288a59 fix segfault when pml direct enabled
This commit was SVN r26371.
2012-05-01 23:12:41 +00:00
Brian Barrett
0ae2277796 Add a backoff mechanism for re-establishing communication
This commit was SVN r26366.
2012-05-01 15:53:00 +00:00
Brian Barrett
74ade8b181 need to order the pending list before we restart
This commit was SVN r26365.
2012-04-30 23:06:00 +00:00
Brian Barrett
5dec52af8d remove some now unneeded debugging
This commit was SVN r26364.
2012-04-30 22:50:52 +00:00
Brian Barrett
c654ee6afc * Use triggered operations for restart barrier as well
This commit was SVN r26363.
2012-04-30 22:48:10 +00:00
Brian Barrett
91a9973bde * Make flow control on by default
* Move alarm code back into a triggered operation

This commit was SVN r26362.
2012-04-30 22:25:40 +00:00
Brian Barrett
e6a0a1cf8a * Make sure to release all resources on failed send
* Avoid triggered ops until we get everything debugged
* Simplify flowctl interface a bit

This commit was SVN r26356.
2012-04-27 21:11:01 +00:00
Nathan Hjelm
c36ab84116 ugni: missed a couple of lines in the last commit
This commit was SVN r26340.
2012-04-25 14:24:48 +00:00
Nathan Hjelm
a753fe91f7 fix merge
This commit was SVN r26332.
2012-04-24 21:16:51 +00:00
Nathan Hjelm
0eb18b9699 ob1: update copyrights
This commit was SVN r26331.
2012-04-24 20:19:15 +00:00