openmpi

Author	SHA1	Message	Date
Nathan Hjelm	271818f887	pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior	2014-11-19 11:33:03 -07:00
Nathan Hjelm	ee2b111011	Update PML for latest BTL update	2014-11-19 11:33:02 -07:00
Nathan Hjelm	c61e017177	pml: updates to reflect member changes in mca_btl_base_descriptor_t and mca_btl_base_module_t structures	2014-11-19 11:33:02 -07:00
Nathan Hjelm	5936411a07	pml/ob1: when using btl_get try to register the entire region before attempting to break the get into multiple rdma fragments. A little background: historically, ob1 always registered the entire memory region when the RGET protocol was in use. This changed when Mellanox added support to fragment RGET using the btl_prepare_dst function. Now that the BTL layer has changed to split out the limits of get/put, there is explicit fragmentation code in ob1. Before this commit the registration was still done per RGET fragment. This commit will attempt to register the entire region before creating RGET fragments. If the registration is successful then all RGET fragments will use this registration; otherwise they will each attempt to register their own segment of the receive buffer. If that fails enough times each fragment will give up and fall back on send/recv.	2014-11-19 11:33:02 -07:00
Nathan Hjelm	b75bb8aea7	Update pml for btl changes	2014-11-19 11:33:02 -07:00
Jeff Squyres	7a5b2e9b13	ob1: change an OPAL_UNLIKELY to OPAL_LIKELY. Per `924d39e415 (commitcomment-8378266)`, this OPAL_UNLIKELY should really be OPAL_LIKELY.	2014-10-31 03:22:55 -07:00
George Bosilca	924d39e415	Always OBJ_DESTRUCT the send request.	2014-10-30 01:28:50 -04:00
Gilles Gouaillardet	ed93c8787d	ob1: add a destructor to mca_pml_ob1_recv_request_t opal_mutex_t must be OBJ_DESTRUCTed in order to avoid a memory leak (pthread_mutex_init allocates memory under Cygwin, so pthread_mutex_destroy is mandatory) Thanks to Marco Atzeri for reporting this issue	2014-10-29 13:30:29 +09:00
Jeff Squyres	c22e1ae33b	configury: new OPAL_SET_LIB_PREFIX/ORTE_SET_LIB_PREFIX macros These two macros set the prefix for the OPAL and ORTE libraries, respectively. Specifically, the OPAL library will be named libPREFIXopen-pal.la and the ORTE library will be named libPREFIXopen-rte.la. These macros must be called, even if the prefix argument is empty. The intent is that Open MPI will call these macros with an empty prefix, but other projects (such as ORCM) will call these macros with a non-empty prefix. For example, ORCM libraries can be named liborcm-open-pal.la and liborcm-open-rte.la. This scheme is necessary to allow running Open MPI applications under systems that use their own versions of ORTE and OPAL. For example, when running MPI applications under ORTE, if the ORTE and OPAL libraries between OMPI and ORCM are not identical (which, because they are released at different times, are likely to be different), we need to ensure that the OMPI applications link against their ORTE and OPAL libraries, but the ORCM executables link against their ORTE and OPAL libraries.	2014-10-22 10:32:19 -07:00
George Bosilca	cee2a4e5c8	Missing alloca.h. Thanks Paul for catching this. This commit was SVN r32388.	2014-08-01 03:28:23 +00:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL. All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have expressed interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK, with support from Sandia, developed a version of Open MPI where the entire communication infrastructure has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to complete this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Nathan Hjelm	f960e4273e	Fix typo in r32196. The wrong descriptor field was used when calculating the size received when using the RDMA rendezvous protocol. This commit was SVN r32232. The following SVN revision numbers were found above: r32196 --> open-mpi/ompi@a14e0f10d4	2014-07-14 21:00:53 +00:00
Gilles Gouaillardet	77184b5c4c	Fix a cornercase with MPI_PROC_NULL persistent requests Handle OMPI_REQUEST_NOOP in MPI_Startall rather than PML cmr=v1.8.2:reviewer=bosilca:ticket=4764 This commit was SVN r32213. The following Trac tickets were found above: Ticket 4764 --> https://svn.open-mpi.org/trac/ompi/ticket/4764	2014-07-11 04:37:01 +00:00
Nathan Hjelm	1b9621eeb0	Fix typo in r32196 This commit was SVN r32202. The following SVN revision numbers were found above: r32196 --> open-mpi/ompi@a14e0f10d4	2014-07-10 18:43:49 +00:00
Nathan Hjelm	a14e0f10d4	Per RFC: Remove des_src and des_dst members from the mca_btl_base_segment_t and replace them with des_local and des_remote. This change also updates the BTL version to 3.0.0. This commit does not represent the final version of BTL 3.0.0. More changes are coming. In making this change I updated all of the BTLs as well as BTL users to use the new structure members. Please evaluate your component to ensure the changes are correct. RFC text: This is the first of several BTL interface changes I am proposing for the 1.9/2.0 release series. What: Change naming of btl descriptor members. I propose we change des_src and des_dst (and their associated counts) to be des_local and des_remote. For receive callbacks the des_local member will be used to communicate the segment information to the callback. The proposed change will include updating all of the doxygen in btl.h as well as updating all BTLs and BTL users to use the new naming scheme. Why: My btl usage makes use of both put and get operations on the same descriptor. With the current naming scheme I need to ensure that there is consistency between the segments described in des_src and des_dst depending on whether a put or get operation is executed. Additionally, the current naming prevents BTLs that do not require prepare/RMA matched operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing multiple simultaneous put AND get operations. At the moment the descriptor can only be used with one or the other. The naming change makes it easier for BTL users to set up/modify descriptors for RMA operations as the local segment and remote segment are always in the same member field. The only issue I foresee with this change is that it will require a little more work to move BTL fixes to the 1.8 release series. This commit was SVN r32196.	2014-07-10 16:31:15 +00:00
Gilles Gouaillardet	8d3bea2771	Fix the corner case with MPI_PROC_NULL persistent requests. This corner case is now handled in the pml so the same code is invoked for both MPI_Start and MPI_Startall. This also correctly reports an error if MPI_Startall is invoked twice on a MPI_PROC_NULL persistent request. This commit was SVN r32139.	2014-07-04 04:58:52 +00:00
George Bosilca	843ef1fcb0	ompi_mpi_abort had one extra argument that was never used. Clean it up. This commit was SVN r32124.	2014-07-03 00:34:44 +00:00
George Bosilca	fd0e1b7261	If we detect an error on a request that has been already released at the MPI level, we should call abort on MPI_COMM_WORLD. Fixes ticket #1943. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31982.	2014-06-10 16:24:13 +00:00
Jeff Squyres	b0a6e42f45	pml ob1: use the pre-computed size from the free lists Based on a suggestion from George on #31806, use the pre-computed sizes rather than duplicating the computation math (which may change someday in the future). cmr=v1.8.2:ticket=trac:4647 This commit was SVN r31841. The following Trac tickets were found above: Ticket 4647 --> https://svn.open-mpi.org/trac/ompi/ticket/4647	2014-05-20 20:32:25 +00:00
George Bosilca	db9660264e	Update the error message to pinpoint the right location. Thanks Tim. This commit was SVN r31839.	2014-05-20 20:08:42 +00:00
George Bosilca	685f051557	Move the allocator initialization from open to init. This cleans up a memory leak. Similar changes should be applied to all the other PMLs that are copies of OB1. This patch is related to #4653. This commit was SVN r31838.	2014-05-20 19:34:18 +00:00
Nathan Hjelm	a1d5ce0893	pml/ob1: as per past RFC bring the inline send optimization to MPI_Isend. I filed an RFC for this optimization some time back. It is a relatively simple optimization. If the data associated with an MPI_Isend can be put on the wire without allocating an MPI_Request then do so. In this case we can legally return ompi_request_empty which will correctly indicate that the request is complete and that it was not cancelled (these are the only requirements on send requests). cmr=v1.8.3:reviewer=bosilca This commit was SVN r31828.	2014-05-19 19:34:59 +00:00
Gilles Gouaillardet	2b89aac15b	Fix a typo in MCA_PML_OB1_RECV_REQUEST_UNPACK cmr=v1.8.2:reviewer=rhc This commit was SVN r31817.	2014-05-19 11:00:13 +00:00
Jeff Squyres	025e4a852b	pml_ob1: ensure to have enough space for send/recvreq on stack r30343 introduced the optimization of putting the OB1 sendreq and recvreq on the stack for blocking sends and receives. However, the requests did not contain enough storage for the data that is normally immediately ''after'' the request (e.g., BTL data). This commit changes these requests to be pointers and to use alloca() to get enough total space for the OB1 request and all the associated data. The change is smaller than it looks; most of it is just changing from "foo.bar" to "foo->bar" notation (etc.). Submitted by Jeff, reviewed by Nathan. But we want George to look at this (and get a little soak time on the trunk) before moving to v1.8. cmr=v1.8.2:reviewer=bosilca This commit was SVN r31806. The following SVN revision numbers were found above: r30343 --> open-mpi/ompi@2b57f4227e	2014-05-17 01:05:59 +00:00
Nathan Hjelm	4113cfa03a	pml/ob1: add missing OBJ_DESTRUCT An OBJ_DESTRUCT was missing for mca_pml_ob1.send_ranges causing a memory leak. Identified by valgrind. cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31768.	2014-05-14 21:15:45 +00:00
Ralph Castain	a8e2d6c3a6	The bulk of the remaining renaming changes, in one final glorious "blob". Thanks to Jeff for some help chasing down a few spots. Per chat with Jeff, we decided to clean up a few things that were historical in nature: top_ompi_srcdir -> OMPI_TOP_SRCDIR top_ompi_builddir -> OMPI_TOP_BUILDDIR We also split the srcdir/builddir flags according to their local tree (e.g., OPAL_TOP_SRCDIR), and tied them all together in configure.ac. Renamed ompi_ignore and ompi_unignore to be opal_<foo> as these are agnostic markers. Only thing left is ompilibdir being treated similar to what we did for srcdir/builddir. Coming soon. This commit was SVN r31678.	2014-05-07 21:48:53 +00:00
Nathan Hjelm	626b521e9c	pml/ob1: fix heterogeneous support when using the send_inline optimization We will track #4568 from the 1.8 CMR. Closes trac:4568 cmr=v1.8.2:reviewer=jsquyres This commit was SVN r31535. The following Trac tickets were found above: Ticket 4568 --> https://svn.open-mpi.org/trac/ompi/ticket/4568	2014-04-28 17:36:26 +00:00
Jeff Squyres	f8dbba78a7	Send the BTL-passed message to ompi_rte_abort. cmr=v1.8:reviewer=rolfv This commit was SVN r30889.	2014-02-28 16:20:54 +00:00
Nathan Hjelm	a06e491c2c	ob1: large buffered sends were broken by the ob1 optimizations. fix them The problem was caused by the static request optimization. The buffered send case is much like the isend case in that the request structure may be needed after MPI_Bsend completes. Fix this case by calling isend and freeing the resulting request. cmr=v1.7.5:ticket=trac:4149 This commit was SVN r30601. The following Trac tickets were found above: Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149	2014-02-07 00:12:36 +00:00
Nathan Hjelm	3902cf66f1	ob1: OBJ_CONSTRUCT the convertor in the send_inline optimization. This change does not appear to increase the small message latency of ping-pong benchmarks and fixes an issue found by our ibm datatype tests. Fixes trac:4232 cmr=v1.7.5:ticket=trac:4149 This commit was SVN r30598. The following Trac tickets were found above: Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149 Ticket 4232 --> https://svn.open-mpi.org/trac/ompi/ticket/4232	2014-02-06 21:27:42 +00:00
George Bosilca	bde9619386	Various minor cleanups. This commit was SVN r30431.	2014-01-26 17:27:12 +00:00
Ralph Castain	06e6a06f3e	Cleanup a couple of abstraction breaks found by Thomas Naughton This commit was SVN r30371.	2014-01-22 21:36:24 +00:00
Nathan Hjelm	66b69da394	Fix a bug in the ob1 optimizations that can cause a segfault. btl sendi functions currently can not handle the descriptor being NULL. The send inline optimization was assuming (incorrectly) that NULL was ok. cmr=v1.7.5:ticket=trac:4149 This commit was SVN r30364. The following Trac tickets were found above: Ticket 4149 --> https://svn.open-mpi.org/trac/ompi/ticket/4149	2014-01-22 16:31:58 +00:00
Nathan Hjelm	2b57f4227e	ob1: optimize blocking send and receive paths Per RFC. There are two optimizations in this commit: - Allocate requests for blocking sends and receives on the stack. This bypasses the request free list and saves two atomics on the critical path. This change improves the small message ping-pong by 50-200ns on both AMD and Intel CPUs. - For small messages try to use the btl sendi function before initializing a send request. If the sendi fails or the btl does not have a sendi function silently fall back on the standard send path. cmr=v1.7.5:reviewer=brbarret This commit was SVN r30343.	2014-01-21 15:16:21 +00:00
Brian Barrett	8b778903d8	Fix longstanding issue with our multi-project support. Rather than using pkg{data,lib,includedir}, use our own ompi{data,lib,includedir}, which is always set to {datadir,libdir,includedir}/openmpi. This will keep us from having help files in prefix/share/open-rte when building without Open MPI, but in prefix/share/openmpi when building with Open MPI. This commit was SVN r30140.	2014-01-07 22:11:15 +00:00
Rolf vandeVaart	e7f430d9ac	Add empty line that was inadvertently removed in message. This commit was SVN r30099.	2013-12-30 18:38:07 +00:00
Rolf vandeVaart	b955dbd6d9	Fix various items discovered by review of ticket #3951 . This commit was SVN r29900.	2013-12-13 21:25:07 +00:00
Rolf vandeVaart	d556b60b21	Change some CUDA configure code and macro names per review request by jsquyres in ticket #3880 . Functionally, nothing changes. This commit was SVN r29815.	2013-12-06 14:35:10 +00:00
Jeff Squyres	3a7af4ab40	Fix another clang warning: sendreq is undefined if proc==NULL. cmr=v1.7.4:reviewer=hjelmn:subject=fix ob1 undefined sendreq value This commit was SVN r29774.	2013-12-02 19:44:42 +00:00
Ralph Castain	ac9820c46f	Link against common cuda library Thanks to Jorg Bornschein for pointing it out cmr=v1.7.4:reviewer=rolfv This commit was SVN r29750.	2013-11-24 17:06:51 +00:00
Rolf vandeVaart	4964a5e98b	Per this RFC from October 8, 2013 and as discussed in telecon. http://www.open-mpi.org/community/lists/devel/2013/10/13072.php Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers. Note that none of this is compiled in unless CUDA-aware support is requested. This commit was SVN r29680.	2013-11-13 13:22:39 +00:00
Rolf vandeVaart	e57795f097	Revert r29594. That was just plain wrong. Sorry about workday configure change. This commit was SVN r29605. The following SVN revision numbers were found above: r29594 --> open-mpi/ompi@ed7ddcd9c7	2013-11-05 14:45:56 +00:00
Rolf vandeVaart	ed7ddcd9c7	Fix CUDA-aware compile error introduced with r29581. This commit was SVN r29594. The following SVN revision numbers were found above: r29581 --> open-mpi/ompi@ee7510b025	2013-11-05 00:08:33 +00:00
Rolf vandeVaart	ee7510b025	Remove redundant macro. This was from review of an earlier ticket. Fixes trac:3878. Reviewed by jsquyres. This commit was SVN r29581. The following Trac tickets were found above: Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878	2013-11-01 12:19:40 +00:00
George Bosilca	55273f1c98	Cleanup spaces, nothing else. This commit was SVN r29197.	2013-09-18 00:07:58 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Ralph Castain	a200e4f865	As per the RFC, bring in the ORTE async progress code and the rewrite of OOB: * THIS RFC INCLUDES A MINOR CHANGE TO THE MPI-RTE INTERFACE * Note: during the course of this work, it was necessary to completely separate the MPI and RTE progress engines. There were multiple places in the MPI layer where ORTE_WAIT_FOR_COMPLETION was being used. A new OMPI_WAIT_FOR_COMPLETION macro was created (defined in ompi/mca/rte/rte.h) that simply cycles across opal_progress until the provided flag becomes false. Places where the MPI layer blocked waiting for RTE to complete an event have been modified to use this macro. *************************************************************************************** I am reissuing this RFC because of the time that has passed since its original release. Since its initial release and review, I have debugged it further to ensure it fully supports tests like loop_spawn. It therefore seems ready for merge back to the trunk. Given its prior review, I have set the timeout for one week. The code is in https://bitbucket.org/rhc/ompi-oob2 WHAT: Rewrite of ORTE OOB WHY: Support asynchronous progress and a host of other features WHEN: Wed, August 21 SYNOPSIS: The current OOB has served us well, but a number of limitations have been identified over the years. Specifically: * it is only progressed when called via opal_progress, which can lead to hangs or recursive calls into libevent (which is not supported by that code) * we've had issues when multiple NICs are available as the code doesn't "shift" messages between transports - thus, all nodes had to be available via the same TCP interface. 
* the OOB "unloads" incoming opal_buffer_t objects during the transmission, thus preventing use of OBJ_RETAIN in the code when repeatedly sending the same message to multiple recipients * there is no failover mechanism across NICs - if the selected NIC (or its attached switch) fails, we are forced to abort * only one transport (i.e., component) can be "active" The revised OOB resolves these problems: * async progress is used for all application processes, with the progress thread blocking in the event library * each available TCP NIC is supported by its own TCP module. The ability to asynchronously progress each module independently is provided, but not enabled by default (a runtime MCA parameter turns it "on") * multi-address TCP NICs (e.g., a NIC with both an IPv4 and IPv6 address, or with virtual interfaces) are supported - reachability is determined by comparing the contact info for a peer against all addresses within the range covered by the address/mask pairs for the NIC. * a message that arrives on one TCP NIC is automatically shifted to whatever NIC that is connected to the next "hop" if that peer cannot be reached by the incoming NIC. If no TCP module will reach the peer, then the OOB attempts to send the message via all other available components - if none can reach the peer, then an "error" is reported back to the RML, which then calls the errmgr for instructions. * opal_buffer_t now conforms to standard object rules re OBJ_RETAIN as we no longer "unload" the incoming object * NIC failure is reported to the TCP component, which then tries to resend the message across any other available TCP NIC. If that doesn't work, then the message is given back to the OOB base to try using other components. 
If all that fails, then the error is reported to the RML, which reports to the errmgr for instructions * obviously from the above, multiple OOB components (e.g., TCP and UD) can be active in parallel * the matching code has been moved to the RML (and out of the OOB/TCP component) so it is independent of transport * routing is done by the individual OOB modules (as opposed to the RML). Thus, both routed and non-routed transports can simultaneously be active * all blocking send/recv APIs have been removed. Everything operates asynchronously. KNOWN LIMITATIONS: * although provision is made for component failover as described above, the code for doing so has not been fully implemented yet. At the moment, if all connections for a given peer fail, the errmgr is notified of a "lost connection", which by default results in termination of the job if it was a lifeline * the IPv6 code is present and compiles, but is not complete. Since the current IPv6 support in the OOB doesn't work anyway, I don't consider this a blocker * routing is performed at the individual module level, yet the active routed component is selected on a global basis. We probably should update that to reflect that different transports may need/choose to route in different ways * obviously, not every error path has been tested nor necessarily covered * determining abnormal termination is more challenging than in the old code as we now potentially have multiple ways of connecting to a process. Ideally, we would declare "connection failed" when all transports can no longer reach the process, but that requires some additional (possibly complex) code. For now, the code replicates the old behavior only somewhat modified - i.e., if a module sees its connection fail, it checks to see if it is a lifeline. If so, it notifies the errmgr that the lifeline is lost - otherwise, it notifies the errmgr that a non-lifeline connection was lost. 
* reachability is determined solely on the basis of a shared subnet address/mask - more sophisticated algorithms (e.g., the one used in the tcp btl) are required to handle routing via gateways * the RML needs to assign sequence numbers to each message on a per-peer basis. The receiving RML will then deliver messages in order, thus preventing out-of-order messaging in the case where messages travel across different transports or a message needs to be redirected/resent due to failure of a NIC This commit was SVN r29058.	2013-08-22 16:37:40 +00:00
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00
Nathan Hjelm	f0aeb36d80	Fix warnings in ob1 introduced by the pvar commit This commit was SVN r28817.	2013-07-17 03:41:05 +00:00
Nathan Hjelm	e6e9f2c6fd	Add profiling function definitions for MPI_T and add a missing type into mpi.h This commit was SVN r28803.	2013-07-16 16:03:33 +00:00
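
The registration fallback described in commit 5936411a07 above can be pictured with a small sketch. This is not the ob1 code: `plan_rget`, `register_fn`, `frag_mode`, `MAX_REG_ATTEMPTS`, and the three `reg_*` stubs are hypothetical names invented for illustration, and the retry limit of 3 is an assumption (the commit only says "enough times"). The sketch tries to register the whole receive buffer once; if that fails, each fragment tries to register its own segment, and a fragment that keeps failing falls back to send/recv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BTL memory-registration call: returns true
 * when the byte range [offset, offset + len) could be registered. */
typedef bool (*register_fn)(size_t offset, size_t len);

/* How a single fragment of the receive buffer ends up being transferred. */
typedef enum {
    FRAG_RGET_SHARED, /* RDMA get using the whole-region registration */
    FRAG_RGET_OWN,    /* RDMA get using a per-fragment registration   */
    FRAG_SEND_RECV    /* registration kept failing: plain send/recv   */
} frag_mode;

/* Assumed retry limit; the commit message does not give a number. */
enum { MAX_REG_ATTEMPTS = 3 };

/* Example registration policies (stubs, for illustration only). */
bool reg_always(size_t offset, size_t len) { (void)offset; (void)len; return true; }
bool reg_never(size_t offset, size_t len) { (void)offset; (void)len; return false; }
bool reg_small_only(size_t offset, size_t len) { (void)offset; return len <= 4; }

/* Decide a transfer mode for every fragment of a `total`-byte buffer split
 * into pieces of at most `frag_size` bytes. Returns the fragment count. */
size_t plan_rget(size_t total, size_t frag_size, register_fn reg,
                 frag_mode *modes, size_t max_frags)
{
    assert(frag_size > 0);
    size_t nfrags = (total + frag_size - 1) / frag_size;
    assert(nfrags <= max_frags);

    /* Step 1: try to register the entire region up front. */
    bool whole = reg(0, total);

    for (size_t i = 0; i < nfrags; i++) {
        size_t off = i * frag_size;
        size_t len = (total - off < frag_size) ? total - off : frag_size;

        if (whole) {
            modes[i] = FRAG_RGET_SHARED;
            continue;
        }

        /* Step 2: per-fragment registration, retried a bounded number of times. */
        modes[i] = FRAG_SEND_RECV; /* Step 3: default if every attempt fails. */
        for (int attempt = 0; attempt < MAX_REG_ATTEMPTS; attempt++) {
            if (reg(off, len)) {
                modes[i] = FRAG_RGET_OWN;
                break;
            }
        }
    }
    return nfrags;
}
```

Calling `plan_rget(10, 4, reg_small_only, modes, 4)` with a four-element `modes` array models the case where the 10-byte region is too large to register at once but each 4-byte (or smaller) fragment registers on its own.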


646 commits