Brian Barrett
d56de80b5d
* Properly initialize handle variable as a request (since the coll_libnbc_request contains everything an NBC_Handle used to contain). Not sure how this slipped through...
This commit was SVN r26710.
2012-07-02 16:39:42 +00:00
Brian Barrett
7e67bfa175
Use OMPI's ops instead of the libnbc ops.
This commit was SVN r26708.
2012-07-02 15:47:22 +00:00
Pavel Shamis
f7664b3814
1. Adding 2 new components:
ofacm - generic connection manager for IB interconnects.
ofautils - IB common utilities and compatibility code
2. Updating OpenIB configure code
- ORNL & Mellanox Teams
This commit was SVN r26707.
2012-07-02 15:20:12 +00:00
Yevgeny Kliteynik
0e28fa984b
Remove dead code that was related to ticket #2971
This commit was SVN r26701.
2012-07-02 11:19:09 +00:00
Nathan Hjelm
a847df9ba5
ugni: fix eager get
This commit was SVN r26699.
2012-06-29 15:43:29 +00:00
Jeff Squyres
5d030278e1
Refs trac:3130: Per comment 8 on the ticket, this MX patch fixes the cases
where the MX BTL and MTL are stepping on each other regarding the
mpool. Thanks to Yong Qin for assistance in tracking this down.
This commit was SVN r26698.
The following Trac tickets were found above:
Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:52:40 +00:00
Jeff Squyres
b936229b54
Refs trac:3130: fix the openib BTL to properly set the memalign malloc
hook early in the setup, but ''not'' during the component register
function. And then properly unset it if it was set.
This commit was SVN r26697.
The following Trac tickets were found above:
Ticket 3130 --> https://svn.open-mpi.org/trac/ompi/ticket/3130
2012-06-29 13:51:36 +00:00
Jeff Squyres
f3a8722360
Fix comment.
This commit was SVN r26696.
2012-06-29 01:38:04 +00:00
Brian Barrett
0b887ab5a1
* Remove unneeded prototype that was causing compile issues anyway
* Use proper tag space (the negatives below the blocking communicators)
instead of the point-to-point space
* Use the PML interface instead of the MPI interface, since the MPI
interface 1) shouldn't be used by components and 2) doesn't like
negative tags
This commit was SVN r26693.
2012-06-28 16:52:03 +00:00
Edgar Gabriel
b0954a6a3e
set the internal OMPIO file pointer to the end of the file if file has been
opened using the APPEND mode.
This commit was SVN r26692.
2012-06-28 15:15:47 +00:00
Edgar Gabriel
32b0dfed31
* set the status _ucount field correctly for individual read and write
operations
* removing a lingering reference to the ylib fcoll component, which will not
be part of the 1.7 branch.
This commit was SVN r26691.
2012-06-28 14:43:56 +00:00
Edgar Gabriel
be6ea52bb4
some further cleanup of resources in case of an error.
This commit was SVN r26690.
2012-06-28 13:58:23 +00:00
Ralph Castain
a1344bc5c0
Add missing header to tarball
This commit was SVN r26689.
2012-06-28 13:07:18 +00:00
Brian Barrett
32e70b691a
Re-enable non-blocking collectives in libnbc after finding issue with the definition of
NBC_CACHE_SCHEDULE not being propagated to all uses.
This commit was SVN r26686.
2012-06-27 22:08:19 +00:00
Edgar Gabriel
b7a72feb1d
minor code cleanup and make the MPI_MODE_DELETE_ON_CLOSE work
This commit was SVN r26685.
2012-06-27 20:54:58 +00:00
Brian Barrett
d85fdd2605
temporarily back out r26682 and r26683 until I can figure out why they cause crashes during shutdown
This commit was SVN r26684.
The following SVN revision numbers were found above:
r26682 --> open-mpi/ompi@15a30af11f
r26683 --> open-mpi/ompi@f6ea4b7234
2012-06-27 19:32:53 +00:00
Brian Barrett
f6ea4b7234
Remove now unneeded header file
This commit was SVN r26683.
2012-06-27 18:43:40 +00:00
Brian Barrett
15a30af11f
Turn on all the non-blocking collectives provided by libnbc...
This commit was SVN r26682.
2012-06-27 18:32:57 +00:00
Brian Barrett
3933d0a8f0
Ibarrier works! :)
This commit was SVN r26680.
2012-06-27 15:58:17 +00:00
Ralph Castain
0dfe29b1a6
Roll in the rest of the modex change. Eliminate all non-modex API access of RTE info from the MPI layer - in some cases, the info was already present (either in the ompi_proc_t or in the orte_process_info struct) and no call was necessary. This removes all calls to orte_ess from the MPI layer. Calls to orte_grpcomm remain required.
Update all the orte ess components to remove their associated APIs for retrieving proc data. Update the grpcomm API to reflect transfer of set/get modex info to the db framework.
Note that this doesn't recreate the old GPR. This is strictly a local db storage that may (at some point) obtain any missing data from the local daemon as part of an async methodology. The framework allows us to experiment with such methods without perturbing the default one.
This commit was SVN r26678.
2012-06-27 14:53:55 +00:00
Josh Hursey
28681deffa
Backout the ORCA commit. :(
There is a linking issue on Mac OSX that needs to be addressed before this is able to come back into the trunk.
This commit was SVN r26676.
2012-06-27 01:28:28 +00:00
Josh Hursey
32050f026f
protect the ORTE_CHECK_PMI define in the OMPI layer for --no-orte builds
This commit was SVN r26674.
2012-06-27 00:28:37 +00:00
Josh Hursey
542330e3a7
Commit of ORCA: Open MPI Runtime Collaborative Abstraction
This is a runtime interposition project that sits between the OMPI and ORTE layers in Open MPI.
The project is described on the wiki:
https://svn.open-mpi.org/trac/ompi/wiki/Runtime_Interposition
And on this email thread:
http://www.open-mpi.org/community/lists/devel/2012/06/11109.php
This commit was SVN r26670.
2012-06-26 21:42:16 +00:00
Edgar Gabriel
288d044097
get rid of the fcache framework. It was not being used as originally intended.
This commit was SVN r26668.
2012-06-26 19:53:26 +00:00
Nathan Hjelm
086000ce8d
remove mpool/rdma
This commit was SVN r26665.
2012-06-26 15:56:07 +00:00
Nathan Hjelm
37c624ee43
prepare to delete mpool/rdma
This commit was SVN r26664.
2012-06-26 15:55:23 +00:00
Brian Barrett
7bdeafb772
Start bringing in libnbc. .ompi_ignored, as there's still a long way to go
This commit was SVN r26658.
2012-06-25 22:38:06 +00:00
Edgar Gabriel
6a2dd16ee3
cleaning up the usage of CFLAGS vs. CPPFLAGS. Thanks Jeff for helping with
that!
This commit was SVN r26655.
2012-06-25 20:32:58 +00:00
Nathan Hjelm
2dbe630138
fix more udapl warnings/errors
This commit was SVN r26648.
2012-06-25 15:18:50 +00:00
Brian Barrett
b9e8e4aeb9
* Initial merge of the non-blocking collectives interface. No implementation of
the back-end yet, coming real soon now, need to solve some tag issues first.
This commit was SVN r26641.
2012-06-22 20:54:12 +00:00
Nathan Hjelm
6a0ccf41e6
one more file
This commit was SVN r26638.
2012-06-22 18:21:57 +00:00
Ralph Castain
e6f3586415
Remove the orte notifier framework, per discussion at the devel meeting and follow-up with Jeff (who took the action item)
This commit was SVN r26637.
2012-06-22 18:09:23 +00:00
Nathan Hjelm
03f00c42b8
fix udapl compile problems from r26626
This commit was SVN r26635.
The following SVN revision numbers were found above:
r26626 --> open-mpi/ompi@249066e06d
2012-06-22 14:20:45 +00:00
Nathan Hjelm
77f7171186
remove hdr_segkey from OMPI_OSC_RDMA_BASE_HDR_NTOH and OMPI_OSC_RDMA_BASE_HDR_HTON
This commit was SVN r26634.
2012-06-22 14:15:26 +00:00
Nathan Hjelm
249066e06d
Timeout! Per RFC update the BTL interface to hide segment keys. All BTLs (with the exception of wv), all relevant PMLs, and osc/rdma have been updated for the new interface.
This commit was SVN r26626.
2012-06-21 17:09:12 +00:00
Ralph Castain
1e1c755fbc
Remove non-existent windows file
This commit was SVN r26624.
2012-06-21 01:37:36 +00:00
Nathan Hjelm
e3bc6c0f73
btl/ugni: use grdma mpool to take advantage of shared lru
This commit was SVN r26623.
2012-06-20 23:03:59 +00:00
Nathan Hjelm
3d86b5055e
btl/ugni: don't call opal_convertor_pack if there is nothing to pack
This commit was SVN r26622.
2012-06-20 23:01:37 +00:00
Nathan Hjelm
f5fd87a446
mpool/grdma: temporarily remove support for remote (local) process eviction and remove ignore.
This commit was SVN r26621.
2012-06-20 23:00:25 +00:00
Yevgeny Kliteynik
df783c0472
Precise speed of FDR and EDR
This commit was SVN r26614.
2012-06-17 07:06:37 +00:00
Nathan Hjelm
fbd1636ea4
fix seg fault when size == 0
This commit was SVN r26612.
2012-06-15 16:58:23 +00:00
Rolf vandeVaart
d6881f3a4f
Rename one function. Add some new functions that can support asynchronous CUDA copies.
This commit was SVN r26611.
2012-06-15 16:56:30 +00:00
Brian Barrett
defaefd59e
Clean up resources from flowcontrol on shutdown
This commit was SVN r26605.
2012-06-14 22:38:35 +00:00
Brian Barrett
946ec4cd97
* Update usage of PtlHandleIsEqual to match new semantic
* Properly set message to MPI_MESSAGE_NULL in the right places
* Fix double free of buffer for non-contiguous blocking sends
* Remove useless debugging output
This commit was SVN r26604.
2012-06-14 22:24:23 +00:00
Nathan Hjelm
0d13cbf11c
ob1: bug fix. put fallback on send never actually worked. fixed.
This commit was SVN r26602.
2012-06-14 17:29:58 +00:00
Terry Dontje
634fc278d9
Fix issue with sctp config scripts not detecting netinet/in.h dependency. Also removing tabs from sctp m4 file
This commit was SVN r26599.
2012-06-13 10:38:28 +00:00
Nathan Hjelm
a809881f78
ob1: reset the converter after a failed sendi before trying send
This commit was SVN r26597.
2012-06-12 15:44:47 +00:00
Ralph Castain
269cb2b8d9
Some cleanup to remove calls to opal_progress when running with orte progress threads, and to ensure that all orte-related events are in the orte event base.
This commit was SVN r26591.
2012-06-11 19:59:53 +00:00
Brian Barrett
31279eb641
Fix segfault with long expected messages when using the rndv protocol. We were
freeing the ME before the get to grab the long part of the message.
This commit was SVN r26589.
2012-06-11 16:37:01 +00:00
Brian Barrett
7406ef1241
Make all the PMI components depend on the common pmi library and properly
install the common pmi library
This commit was SVN r26588.
2012-06-11 15:58:09 +00:00
Jeff Squyres
13707ec0af
Remove this comment: it turns out that the benefit was to make
multiple SM ''modules'', not multiple SM ''mpools''.
This commit was SVN r26584.
2012-06-08 22:37:26 +00:00
Jeff Squyres
5451ee46bd
Per r26575, the sync coll module is no longer necessary!
(the crowd goes wild)
This commit was SVN r26583.
The following SVN revision numbers were found above:
r26575 --> open-mpi/ompi@59e529cf1d
2012-06-08 19:19:19 +00:00
Nathan Hjelm
59e529cf1d
ob1: as per developer discussion disable rdma retries. the failure path currently suffers from live-lock
This commit was SVN r26575.
2012-06-07 23:31:20 +00:00
Nathan Hjelm
4c6be00de2
fix erroneous commit in rdma mpool
This commit was SVN r26572.
2012-06-07 20:01:44 +00:00
Nathan Hjelm
ceee4bcb0d
libevent2019: libevent_pthreads.la is never built. don't include it
This commit was SVN r26570.
2012-06-07 19:22:45 +00:00
Jeff Squyres
56a537a5f5
This component wasn't even in 1.5.0; no one has had a GM network in
forever. There is no point in carrying this component forward.
This commit was SVN r26563.
2012-06-06 21:43:54 +00:00
Mike Dubman
10831e111a
detect num of local procs
This commit was SVN r26555.
2012-06-05 09:13:16 +00:00
Mike Dubman
e9c274f3b9
raise cm prio for mxm as well, somehow was removed from 1.7, exists in 1.6
This commit was SVN r26554.
2012-06-05 09:02:03 +00:00
Yevgeny Kliteynik
1cbce83ece
Fixed wording of MXM parameters as suggested by Jeff.
This commit was SVN r26545.
2012-06-03 21:48:42 +00:00
Yevgeny Kliteynik
f02bf707a4
Added MXM parameter "np" that controls the minimal number of processes that allow MXM to run
Default: 128
MXM advantages kick in with large number of processes.
This commit was SVN r26544.
2012-06-02 11:07:20 +00:00
Nathan Hjelm
71bffa5158
ugni: update to latest btl code. bug fixes and cleanup
This commit was SVN r26529.
2012-05-31 20:02:41 +00:00
Edgar Gabriel
3ccd286de1
silence a compiler warning for optimized builds.
This commit was SVN r26528.
2012-05-31 13:32:10 +00:00
Vishwanath Venkatesan
86a57c7b66
Initializing sorted_file_offsets to NULL
This commit was SVN r26526.
2012-05-30 06:56:40 +00:00
Jeff Squyres
99c5afb397
Remove clang compiler warnings.
This commit was SVN r26523.
2012-05-29 23:36:06 +00:00
Edgar Gabriel
d1e91e9372
make the file compile properly.
This commit was SVN r26497.
2012-05-26 01:06:36 +00:00
Brian Barrett
2effbb1ba6
fix copy/paste typo
This commit was SVN r26492.
2012-05-24 16:06:20 +00:00
Ralph Castain
c0304eb23a
Fix copy/paste typo
This commit was SVN r26491.
2012-05-24 15:47:20 +00:00
Nathan Hjelm
cdc3c87ba6
move pmi init/finalize into a common component
This commit was SVN r26470.
2012-05-22 15:15:39 +00:00
George Bosilca
e890a8379b
Various minor cleanups.
This commit was SVN r26461.
2012-05-21 13:15:24 +00:00
Brian Barrett
25693363e9
* Fix internal accounting error regarding number of available credits
* Use a single MD covering all of address space for put transfers, rather
than a per-send MD.
This commit was SVN r26458.
2012-05-20 23:42:26 +00:00
Vishwanath Venkatesan
8d4bb65bd4
Modifying the explicit operations to make it absolute
This commit was SVN r26451.
2012-05-18 21:43:34 +00:00
Vishwanath Venkatesan
cbad31cc88
1. Freeing the displs array after allgatherv to avoid segmentation faults in dynamic segmentation
2. Checking for 0-byte datatypes and sending only when data is available, to avoid 0-byte messages being sent and received.
3. Changing timing extraction to support calculating min, max and avg communication costs + min and avg write costs
This commit was SVN r26450.
2012-05-18 21:39:58 +00:00
Rolf vandeVaart
f8ace21366
Rename a few things for clarity. Add a stream.
This commit was SVN r26447.
2012-05-17 18:10:59 +00:00
Rolf vandeVaart
c228bd2311
Fix broken compile. Keep in sync with sm btl.
This commit was SVN r26440.
2012-05-15 15:32:33 +00:00
Yevgeny Kliteynik
d59b8d5dc4
Fixing malformed error message
This commit was SVN r26434.
2012-05-12 21:13:42 +00:00
Mike Dubman
98c2c749fb
fix define name to BTL_OPENIB_MALLOC_HOOKS_ENABLED
Thanks to Ludovic.Hablot@ext.bull.net for pointing this out
This commit was SVN r26432.
2012-05-11 18:30:45 +00:00
Brian Barrett
2e52374847
* Split send and receive eq sizes
* Need to look at slot count before flowcontrol for sending to prevent
race in restart
* Need to free pending request fragments when done with the request
* A number of branch prediction optimizations for error conditions
This commit was SVN r26430.
2012-05-10 21:43:48 +00:00
Yevgeny Kliteynik
244d66d95b
Fixed FDR link speed details, added EDR.
This commit was SVN r26423.
2012-05-10 13:44:18 +00:00
Nathan Hjelm
91d99c6fef
ugni: reserve memory domain descriptors (MDDs) for mailbox registration
This commit was SVN r26419.
2012-05-10 00:24:42 +00:00
Jeff Squyres
de4bbacd13
It turns out that we can't always include the hwloc OpenFabrics verbs
helper file, even if we find that the system has <infiniband/verbs.h>.
The reason is because there are some inline functions in that verbs
helper file that invoke ibv_* functions. Some linkers (e.g., Solaris
Studio Compilers) will instantiate those static inline functions --
even if we don't use them -- and therefore we need to be able to
resolve the ibv_* symbols at link time.
But since -libverbs is only specified in places where we use other
ibv_* functions (e.g., the OpenFabrics-based BTLs), that means that
linking random executables can/will fail (e.g., orterun).
So instead, introduce a new #define: OPAL_HWLOC_WANT_VERBS_HELPER. If
this macro is set to 1 before including opal/mca/hwloc/hwloc.h, then
you'll also get the hwloc OpenFabrics verbs helper header file (*if*
hwloc found <infiniband/verbs.h> -- otherwise, it'll #error).
This commit was SVN r26417.
2012-05-09 20:18:31 +00:00
Mike Dubman
cd17fee9a8
performance fix: openib use memalign for malloc
This commit was SVN r26409.
2012-05-08 20:42:09 +00:00
Nathan Hjelm
903f9fac09
ugni: fixed buffered sends and code cleanup
This commit was SVN r26401.
2012-05-07 17:23:06 +00:00
Nathan Hjelm
49eda71ca0
ugni: fix invalid parameter with opal_pointer_array_init
This commit was SVN r26400.
2012-05-07 17:22:55 +00:00
Nathan Hjelm
584c457352
ugni: update smsg defaults and add parameter to control local completion queue size
This commit was SVN r26399.
2012-05-07 17:22:49 +00:00
Nathan Hjelm
bfcf67391a
ugni: set fragment id from opal_pointer_array_add
This commit was SVN r26398.
2012-05-07 17:22:42 +00:00
Nathan Hjelm
b3dc726e9d
ugni: don't create completion queues until add_procs
This commit was SVN r26397.
2012-05-07 17:22:35 +00:00
Nathan Hjelm
0e48ea1f65
vader: remove #include of headers that no longer exist
This commit was SVN r26396.
2012-05-07 17:22:28 +00:00
Nathan Hjelm
a32d4c648d
ob1: rewind convertor after failed send
This commit was SVN r26395.
2012-05-07 17:22:22 +00:00
Jeff Squyres
2ba10c37fe
Per RFC, bring in the following changes:
* Remove paffinity, maffinity, and carto frameworks -- they've been
wholly replaced by hwloc.
* Move ompi_mpi_init() affinity-setting/checking code down to ORTE.
* Update sm, smcuda, wv, and openib components to no longer use carto.
Instead, use hwloc data. There are still optimizations possible in
the sm/smcuda BTLs (i.e., making multiple mpools). Also, the old
carto-based code found out how many NUMA nodes were ''available''
-- not how many were used ''in this job''. The new hwloc-using
code computes the same value -- it was not updated to calculate how
many NUMA nodes are used ''by this job.''
* Note that I cannot compile the smcuda and wv BTLs -- I ''think''
they're right, but they need to be verified by their owners.
* The openib component now does a bunch of stuff to figure out where
"near" OpenFabrics devices are. '''THIS IS A CHANGE IN DEFAULT
BEHAVIOR!!''' and still needs to be verified by OpenFabrics vendors
(I do not have a NUMA machine with an OpenFabrics device that is a
non-uniform distance from multiple different NUMA nodes).
* Completely rewrite the OMPI_Affinity_str() routine from the
"affinity" mpiext extension. This extension now understands
hyperthreads; the output format of it has changed a bit to reflect
this new information.
* Bunches of minor changes around the code base to update names/types
from maffinity/paffinity-based names to hwloc-based names.
* Add some helper functions into the hwloc base, mainly having to do
with the fact that we have the hwloc data reporting ''all''
topology information, but sometimes you really only want the
(online | available) data.
This commit was SVN r26391.
2012-05-07 14:52:54 +00:00
Mike Dubman
1b475523de
add support for FDR speed
This commit was SVN r26385.
2012-05-06 05:53:05 +00:00
Nathan Hjelm
b6ae288a59
fix segfault when pml direct enabled
This commit was SVN r26371.
2012-05-01 23:12:41 +00:00
Brian Barrett
0ae2277796
Add a backoff mechanism for re-establishing communication
This commit was SVN r26366.
2012-05-01 15:53:00 +00:00
Brian Barrett
74ade8b181
need to order the pending list before we restart
This commit was SVN r26365.
2012-04-30 23:06:00 +00:00
Brian Barrett
5dec52af8d
remove some now unneeded debugging
This commit was SVN r26364.
2012-04-30 22:50:52 +00:00
Brian Barrett
c654ee6afc
* Use triggered operations for restart barrier as well
This commit was SVN r26363.
2012-04-30 22:48:10 +00:00
Brian Barrett
91a9973bde
* Make flow control on by default
* Move alarm code back into a triggered operation
This commit was SVN r26362.
2012-04-30 22:25:40 +00:00
Brian Barrett
e6a0a1cf8a
* Make sure to release all resources on failed send
* Avoid triggered ops until we get everything debugged
* Simplify flowctl interface a bit
This commit was SVN r26356.
2012-04-27 21:11:01 +00:00
Nathan Hjelm
c36ab84116
ugni: missed a couple of lines in the last commit
This commit was SVN r26340.
2012-04-25 14:24:48 +00:00
Nathan Hjelm
a753fe91f7
fix merge
This commit was SVN r26332.
2012-04-24 21:16:51 +00:00
Nathan Hjelm
0eb18b9699
ob1: update copyrights
This commit was SVN r26331.
2012-04-24 20:19:15 +00:00