1
1
Граф коммитов

587 Коммитов

Автор SHA1 Сообщение Дата
George Bosilca
cee2a4e5c8 Missing alloca.h. Thanks Paul for catching this.
This commit was SVN r32388.
2014-08-01 03:28:23 +00:00
Ralph Castain
552c9ca5a0 George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-)
WHAT:    Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL

All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies.  This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP.  Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose.  UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs.  A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic.

This commit was SVN r32317.
2014-07-26 00:47:28 +00:00
Nathan Hjelm
f960e4273e Fix typo in r32196
The wrong descriptor field was used when calculating the size received when
using the RDMA rendevous protcol.

This commit was SVN r32232.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-14 21:00:53 +00:00
Gilles Gouaillardet
77184b5c4c Fix a cornercase with MPI_PROC_NULL persistent requests
Handle OMPI_REQUEST_NOOP in MPI_Startall rather than PML

cmr=v1.8.2:reviewer=bosilca:ticket=4764

This commit was SVN r32213.

The following Trac tickets were found above:
  Ticket 4764 --> https://svn.open-mpi.org/trac/ompi/ticket/4764
2014-07-11 04:37:01 +00:00
Nathan Hjelm
1b9621eeb0 Fix typo in r32196
This commit was SVN r32202.

The following SVN revision numbers were found above:
  r32196 --> open-mpi/ompi@a14e0f10d4
2014-07-10 18:43:49 +00:00
Nathan Hjelm
a14e0f10d4 Per RFC: Remove des_src and des_dst members from the
mca_btl_base_segment_t and replace them with des_local and des_remote

This change also updates the BTL version to 3.0.0. This commit does
not represent the final version of BTL 3.0.0. More changes are coming.

In making this change I updated all of the BTLs as well as BTL user's
to use the new structure members. Please evaluate your component to
ensure the changes are correct.

RFC text:

This is the first of several BTL interface changes I am proposing for
the 1.9/2.0 release series.

What: Change naming of btl descriptor members. I propose we change
des_src and des_dst (and their associated counts) to be des_local and
des_remote. For receive callbacks the des_local member will be used to
communicate the segment information to the callback. The proposed change
will include updating all of the doxygen in btl.h as well as updating
all BTLs and BTL users to use the new naming scheme.

Why: My btl usage makes use of both put and get operations on the same
descriptor. With the current naming scheme I need to ensure that there
is consistency beteen the segments described in des_src and des_dst
depending on whether a put or get operation is executed. Additionally,
the current naming prevents BTLs that do not require prepare/RMA matched
operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing
multiple simultaneous put AND get operations. At the moment the
descriptor can only be used with one or the other. The naming change
makes it easier for BTL users to setup/modify descriptors for RMA
operations as the local segment and remote segment are always in the
same member field. The only issue I forsee with this change is that it
will require a little more work to move BTL fixes to the 1.8 release
series.

This commit was SVN r32196.
2014-07-10 16:31:15 +00:00
Gilles Gouaillardet
8d3bea2771 Fix the cornercase with MPI_PROC_NULL persistent requests.
This corner case is now handled in the pml so the same code
is invoked for both MPI_Start and MPI_Startall.
This also correctly report an error if MPI_Startall is invoked twice
on a MPI_PROC_NULL persistent request.

This commit was SVN r32139.
2014-07-04 04:58:52 +00:00
George Bosilca
843ef1fcb0 ompi_mpi_abort had one extra argument that was never used. Clean it up.
This commit was SVN r32124.
2014-07-03 00:34:44 +00:00
George Bosilca
fd0e1b7261 If we detect an error on a request that has been already released
at the MPI level, we should call abort on MPI_COMM_WORLD.

Fixes ticket #1943.
cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31982.
2014-06-10 16:24:13 +00:00
Jeff Squyres
b0a6e42f45 pml ob1: use the pre-computed size from the free lists
Based on a suggestion from George on #31806, use the pre-computed
sizes rather than duplicating the computation math (which may change
someday in the future).

cmr=v1.8.2:ticket=trac:4647

This commit was SVN r31841.

The following Trac tickets were found above:
  Ticket 4647 --> https://svn.open-mpi.org/trac/ompi/ticket/4647
2014-05-20 20:32:25 +00:00
George Bosilca
db9660264e Update the error message to pinpoint the right location.
Thanks Tim.

This commit was SVN r31839.
2014-05-20 20:08:42 +00:00
George Bosilca
685f051557 Move the allocator initialization from open to init. This clean
a memory leak. Similar changes shuld be applied to all the 
other PML that are copies of OB1. This patch is related to
#4653.

This commit was SVN r31838.
2014-05-20 19:34:18 +00:00
Nathan Hjelm
a1d5ce0893 pml/ob1: as per past RFC bring the inline send optimization to
MPI_Isend.

I filed an RFC for this optimization some time back. It is a
relatively simple optimization. If the data associated with an
MPI_Isend can be put on the wire without allocating an MPI_Request
then do so. In this case we can legally return omp_request_empty
which will correctly indicate that the request is complete and that is
was not cancelled (these are the only requirements on send requests).

cmr=v1.8.3:reviewer=bosilca

This commit was SVN r31828.
2014-05-19 19:34:59 +00:00
Gilles Gouaillardet
2b89aac15b Fix a typo in MCA_PML_OB1_RECV_REQUEST_UNPACK
cmr=v1.8.2:reviewer=rhc

This commit was SVN r31817.
2014-05-19 11:00:13 +00:00
Jeff Squyres
025e4a852b pml_ob1: ensure to have enough space for send/recvreq on stack
r30343 introduced the optimization of putting the OB1 sendreq and
recvreq on the stack for blocking sends and receives.  However, the
requests did not contain enough storage for the data that is normally
immediately ''after'' the request (e.g., BTL data).

This commit changes these requests to be pointers and to use alloca()
to get enough total space for the OB1 request and all the associated
data.

The change is smaller than it looks; most of it is just changing from
"foo.bar" to "foo->bar" notation (etc.).

Submitted by Jeff, reviewed by Nathan.  But we want George to look at
this (and get a little soak time on the trunk) before moving to v1.8.

cmr=v1.8.2:reviewer=bosilca

This commit was SVN r31806.

The following SVN revision numbers were found above:
  r30343 --> open-mpi/ompi@2b57f4227e
2014-05-17 01:05:59 +00:00
Nathan Hjelm
4113cfa03a pml/ob1: add missing OBJ_DESTRUCT
An OBJ_DESTRUCT was missing for mca_pml_ob1.send_ranges causing a
memory leak. Identified by valgrind.

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31768.
2014-05-14 21:15:45 +00:00
Ralph Castain
a8e2d6c3a6 The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to cleanup a few things that were historical in nature:
top_ompi_srcdir  ->  OMPI_TOP_SRCDIR
top_ompi_builddir -> OMPI_TOP_BUILDDIR

We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers.

Only thing left is ompilibdir being treated similar to what we dif for srcdir/builddir. Coming soon.

This commit was SVN r31678.
2014-05-07 21:48:53 +00:00
Nathan Hjelm
626b521e9c pml/ob1: fix heterogeneous support when using the send_inline optimization
We will track #4568 from the 1.8 CMR.

Closes trac:4568

cmr=v1.8.2:reviewer=jsquyres

This commit was SVN r31535.

The following Trac tickets were found above:
  Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568
2014-04-28 17:36:26 +00:00
Jeff Squyres
f8dbba78a7 Send the BTL-passed message to ompi_rte_abort.
cmr=v1.8:reviewer=rolfv

This commit was SVN r30889.
2014-02-28 16:20:54 +00:00
Nathan Hjelm
a06e491c2c ob1: large buffered sends were broken by the ob1 optimizations. fix them
The problem was caused by the static request optimization. The buffered send case
is much like the isend case in that the request structure may be needed after
MPI_Bsend completes. Fix this case by calling isend and freeing the resulting
request.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30601.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-02-07 00:12:36 +00:00
Nathan Hjelm
3902cf66f1 ob1: OBJ_CONSTRUCT the convertor in the send_inline optimization.
This change does not appear to increase the small message latency of ping-pong
benchmarks and fixes an issue found by our ibm datatype tests.

Fixes trac:4232

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30598.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
  Ticket 4232 --> https://svn.open-mpi.org/trac/ompi/ticket/4232
2014-02-06 21:27:42 +00:00
George Bosilca
bde9619386 Various minor cleanups.
This commit was SVN r30431.
2014-01-26 17:27:12 +00:00
Ralph Castain
06e6a06f3e Cleanup a couple of abstraction breaks found by Thomas Naughton
This commit was SVN r30371.
2014-01-22 21:36:24 +00:00
Nathan Hjelm
66b69da394 Fix a bug in the ob1 optimizations that can cause a segfault.
btl sendi functions currently can not handle the descriptor being NULL. The
send inline optimization was assuming (incorrectly) that NULL was ok.

cmr=v1.7.5:ticket=trac:4149

This commit was SVN r30364.

The following Trac tickets were found above:
  Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149
2014-01-22 16:31:58 +00:00
Nathan Hjelm
2b57f4227e ob1: optimize blocking send and receive paths
Per RFC. There are two optimizations in this commit:

 - Allocate requests for blocking sends and receives on the stack. This
   bypasses the request free list and saves two atomics on the critical path.
   This change improves the small message ping-pong by 50-200ns on both AMD
   and Intel CPUs.

 - For small messages try to use the btl sendi function before intializing a
   send request. If the sendi fails or the btl does not have a sendi function
   silently fallback on the standard send path.

cmr=v1.7.5:reviewer=brbarret

This commit was SVN r30343.
2014-01-21 15:16:21 +00:00
Brian Barrett
8b778903d8 Fix longstanding issue with our multi-project support. Rather than using
pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is
always set to {datadir,libdir,includedir}/openmpi.  This will keep us from
having help files in prefix/share/open-rte when building without Open MPI,
but in prefix/share/openmpi when building with Open MPI.

This commit was SVN r30140.
2014-01-07 22:11:15 +00:00
Rolf vandeVaart
e7f430d9ac Add empty line that was inadvertently removed in message.
This commit was SVN r30099.
2013-12-30 18:38:07 +00:00
Rolf vandeVaart
b955dbd6d9 Fix various items discovered by review of ticket #3951.
This commit was SVN r29900.
2013-12-13 21:25:07 +00:00
Rolf vandeVaart
d556b60b21 Chnage some CUDA configure code and macro names per review request by jsquyres in ticket #3880.
Functionally, nothing changes.

This commit was SVN r29815.
2013-12-06 14:35:10 +00:00
Jeff Squyres
3a7af4ab40 Fix another clang warning: sendreq is undefined if proc==NULL.
cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value

This commit was SVN r29774.
2013-12-02 19:44:42 +00:00
Ralph Castain
ac9820c46f Link against common cuda library
Thanks to Jorg Bornschein for pointing it out

cmr=v1.7.4:reviewer=rolfv

This commit was SVN r29750.
2013-11-24 17:06:51 +00:00
Rolf vandeVaart
4964a5e98b Per this RFC from October 8, 2013 and as discuessed in telecon.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php

Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. 
Note that none of this is compiled in unless CUDA-aware support is requested.

This commit was SVN r29680.
2013-11-13 13:22:39 +00:00
Rolf vandeVaart
e57795f097 Revert r29594. That was just plain wrong. Sorry about workday configure change.
This commit was SVN r29605.

The following SVN revision numbers were found above:
  r29594 --> open-mpi/ompi@ed7ddcd9c7
2013-11-05 14:45:56 +00:00
Rolf vandeVaart
ed7ddcd9c7 Fix CUDA-aware compile error introduces with r29581.
This commit was SVN r29594.

The following SVN revision numbers were found above:
  r29581 --> open-mpi/ompi@ee7510b025
2013-11-05 00:08:33 +00:00
Rolf vandeVaart
ee7510b025 Remove redundant macro. This was from reviewed of earlier ticket.
Fixes trac:3878.  Reviewed by jsquyres.

This commit was SVN r29581.

The following Trac tickets were found above:
  Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878
2013-11-01 12:19:40 +00:00
George Bosilca
55273f1c98 Cleanup spaces, nothing else.
This commit was SVN r29197.
2013-09-18 00:07:58 +00:00
Brian Barrett
16a1166884 Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a
configure-time dynamic allocation of flags.  The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process.  Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.

This commit was SVN r29100.
2013-08-30 16:54:55 +00:00
Ralph Castain
a200e4f865 As per the RFC, bring in the ORTE async progress code and the rewrite of OOB:
*** THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE ***

Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro.

***************************************************************************************

I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week.

The code is in  https://bitbucket.org/rhc/ompi-oob2


WHAT:    Rewrite of ORTE OOB

WHY:       Support asynchronous progress and a host of other features

WHEN:    Wed, August 21

SYNOPSIS:
The current OOB has served us well, but a number of limitations have been identified over the years. Specifically:

* it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code)

* we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface.

* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients

* there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort

* only one transport (i.e., component) can be "active"


The revised OOB resolves these problems:

* async progress is used for all application processes, with the progress thread blocking in the event library

* each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on")

* multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC.

* a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions.

* opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object

* NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions

* obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel

* the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport

* routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active

* all blocking send/recv APIs have been removed. Everything operates asynchronously.


KNOWN LIMITATIONS:

* although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline

* the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker

* routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways

* obviously, not every error path has been tested nor necessarily covered

* determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when *all* transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost.

* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways

* the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC

This commit was SVN r29058.
2013-08-22 16:37:40 +00:00
Rolf vandeVaart
504fa2cda9 Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically.
Fixes ticket 3531.    

This commit was SVN r29055.
2013-08-21 21:00:09 +00:00
Nathan Hjelm
f0aeb36d80 Fix warnings in ob1 introduced by the pvar commit
This commit was SVN r28817.
2013-07-17 03:41:05 +00:00
Nathan Hjelm
e6e9f2c6fd Add profiling function definitions for MPI_T and add a missing type into mpi.h
This commit was SVN r28803.
2013-07-16 16:03:33 +00:00
Nathan Hjelm
35673ea400 Add example performance variables to ob1: unexpected message queue length, posted receive length
This commit was SVN r28801.
2013-07-16 16:02:25 +00:00
Rolf vandeVaart
54b1fbdb4a Better error message code. Remove commented out code.
This commit was SVN r28793.
2013-07-15 22:27:34 +00:00
Aurelien Bouteiller
e1066143a4 rename ompi_free_list operations to _mt, as per discussions at last face to face meeting
This commit was SVN r28734.
2013-07-08 22:07:52 +00:00
George Bosilca
c9e5ab9ed1 Our macros for the OMPI-level free list had one extra argument, a possible return
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
the picky mode is on.

The attached parch remove the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and change to check if the item is NULL instead of
using the return code.

This commit was SVN r28722.
2013-07-04 08:34:37 +00:00
George Bosilca
702e669636 Remove a [very] annoying warning.
This commit was SVN r28690.
2013-07-01 16:49:13 +00:00
George Bosilca
5fae72b9aa Add the MPI 2.2 MPI_Dist_graph functionality.
This patch reshape the way we deal with topologies completely. Where
our topologies were mainly storage components (they were not capable
of creating the new communicator), the new version is built around a
[possibly] common representation (in mca/topo/topo.h), but the functions
to attach and retrieve the topological information are specific to each
component. As a result the ompi_create_cart and ompi_create_graph functions
become useless and have been removed.

In addition to adding the internal infrastructure to manage the topology
information, it updates the MPI interface, and the debuggers support and
provides all Fortran interfaces.

This commit was SVN r28687.
2013-07-01 12:40:08 +00:00
Rolf vandeVaart
5ebb74bee3 Fix case where amount of data sent is less than expected. Otherwise, we will get hang when running the RGET protocol.
Reviewed by hjelm,bosilca.

This commit was SVN r28667.
2013-06-21 18:35:16 +00:00
Rolf vandeVaart
52ebb0b17f Change some opal_output to OPAL_OUTPUT per CMR review.
This commit was SVN r28510.
2013-05-14 20:49:42 +00:00
Nathan Hjelm
4e95d691a7 pml/ob1: do not reset the convertor if one was not created (size = 0).
This macro is only used on the failure path so the additional if statement
should not have any affect on performance.

cmr:v1.7

This commit was SVN r28292.
2013-04-05 01:40:11 +00:00