Data transferred by `MPI_BSEND` may be corrupted if all of the following
conditions are met.
- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
returns.
The problem lies in how the ob1 PML pends a send request.
The `mca_pml_ob1_send_request_start_copy` function returns
`OMPI_ERR_OUT_OF_RESOURCE` if the `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. By the time
the `mca_pml_ob1_send_request_start_copy` function tries sending again,
the user buffer may already have been overwritten by the MPI program.
Call hierarchy of `MPI_BSEND`:
```
MPI_Bsend
mca_pml_ob1_send
if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
mca_pml_ob1_isend
MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
mca_pml_ob1_send_request_start_seq
mca_pml_ob1_send_request_start_btl
if (size <= eager_limit)
if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
mca_pml_ob1_send_request_start_copy
mca_bml_base_alloc
btl_alloc
if (OMPI_ERR_OUT_OF_RESOURCE == rc)
add_request_to_send_pending
ompi_request_free
```
To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.
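For illustration, a minimal application-side sketch of the conditions above (standard MPI calls only; the ranks, tag, and payload are made up for the example). Overwriting the send buffer immediately after `MPI_BSEND` returns is legal, which is exactly why the copy into the attached buffer must happen before returning:
```
/* Minimal sketch of the scenario, assuming a 2-process run.
 * Overwriting `data` right after MPI_Bsend is allowed by the standard;
 * with the bug above, a failed btl_alloc can cause the deferred copy to
 * pick up the overwritten value. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, data = 42;
    int size = MPI_BSEND_OVERHEAD + (int) sizeof(int);
    void *attached = malloc(size);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Buffer_attach(attached, size);

    if (0 == rank) {
        MPI_Bsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        data = -1;   /* legal reuse of the send buffer */
    } else if (1 == rank) {
        MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* expected 42; the bug can deliver -1 */
    }

    MPI_Buffer_detach(&attached, &size);
    free(attached);
    MPI_Finalize();
    return 0;
}
```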
This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.
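As a side note, a tiny illustration of the two conventions using plain C11 atomics (not the opal_* wrappers): a fetch-then-add returns the old value, while add-then-fetch semantics return the new one.
```
/* C11 atomic_fetch_add returns the value *before* the addition; adding
 * the operand to that result gives the "new value" semantics that the
 * renamed opal arithmetic atomics are meant to convey. */
#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
    atomic_int counter = 5;

    int old_val = atomic_fetch_add(&counter, 1);       /* old value: 5 */
    int new_val = atomic_fetch_add(&counter, 1) + 1;   /* new value: 7 */

    printf("old=%d new=%d final=%d\n", old_val, new_val,
           atomic_load(&counter));
    return 0;
}
```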
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
They are supposed to be unsigned; casting them to a signed
value for all atomic operations is as error-prone as handling
them as signed entities.
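A generic illustration of why that distinction matters (plain C, unrelated to any particular opal structure): once a large unsigned value is viewed through a signed type, sign-sensitive tests take the wrong branch.
```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t seq = UINT32_MAX;          /* e.g. a counter close to wrapping */
    int32_t  as_signed = (int32_t) seq; /* -1 on two's-complement targets */

    printf("unsigned view: %u, signed view: %d\n", seq, as_signed);

    if (as_signed > 0) {
        printf("signed test says: positive\n");
    } else {
        printf("signed test says: not positive, although the value is huge\n");
    }
    return 0;
}
```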
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
On start we were not correctly resetting all request fields. This was
leading to a double-completion on persistent receives. This commit
updates the base start code to reset the receive req_bytes_packed and
the send request convertor.
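The affected usage pattern, for reference (standard MPI persistent-request calls; buffer size and iteration count are arbitrary): each `MPI_Start` must fully reset the request, otherwise a later completion can fire twice.
```
#include <mpi.h>

static void repeated_persistent_recv(void)
{
    MPI_Request req;
    int buf[16];

    MPI_Recv_init(buf, 16, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
    for (int i = 0; i < 2; ++i) {
        MPI_Start(&req);                    /* base start code resets the request */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* must complete exactly once */
    }
    MPI_Request_free(&req);
}
```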
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The request code was setting the request as pml_complete before
calling MCA_PML_OB1_SEND_REQUEST_MPI_COMPLETE. This was causing
MCA_PML_OB1_SEND_REQUEST_RETURN to be called twice in some cases. The
code now mirrors the recvreq code and only sets the request as pml
complete if the request has not already been freed.
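A stripped-down, single-threaded sketch of the hazard (toy names, not the real ob1 code): raising the pml-complete flag before the MPI completion runs lets the free performed inside that completion path and the normal return path release the request twice.
```
#include <stdbool.h>
#include <stdio.h>

struct toy_request {
    bool pml_complete;   /* "safe to return to the free list" */
    bool returned;       /* stands in for SEND_REQUEST_RETURN having run */
    bool free_called;    /* stands in for ompi_request_free() */
};

static void toy_return(struct toy_request *req)
{
    if (req->returned) {
        printf("request returned twice\n");
    }
    req->returned = true;
}

/* the MPI completion path; the application frees the request from here */
static void toy_mpi_complete(struct toy_request *req)
{
    req->free_called = true;
    if (req->pml_complete) {
        toy_return(req);
    }
}

int main(void)
{
    struct toy_request req = { false, false, false };

    req.pml_complete = true;   /* buggy order: flag raised first */
    toy_mpi_complete(&req);    /* first return happens here */
    toy_return(&req);          /* normal path returns it again */

    /* fixed order: complete first, and only mark pml-complete when the
     * request has not already been freed */
    return 0;
}
```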
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* Remodel the request.
Added the wait sync primitive and integrated it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
If during the request completion callback we post another request that
completes right away (such as a small send or a match for an unexpected
short message) we will try to complete the second request while holding
the lock for the completion of the first. For performance reasons
(mainly to avoid unlocking and locking the request mutex several times)
we have made the request lock recursive.
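To make the last point concrete, a small pthreads sketch (not the actual opal sync code) of why the lock has to be recursive: the completion callback can re-enter the locked section by completing a second request.
```
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t req_lock;

static void complete_request(int depth)
{
    pthread_mutex_lock(&req_lock);      /* re-acquired from the callback */
    printf("completing request at depth %d\n", depth);
    if (0 == depth) {
        /* the completion callback posts a small send (or matches an
         * unexpected short message) that completes right away */
        complete_request(depth + 1);
    }
    pthread_mutex_unlock(&req_lock);
}

int main(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&req_lock, &attr);

    complete_request(0);                /* deadlocks with a plain mutex */

    pthread_mutex_destroy(&req_lock);
    pthread_mutexattr_destroy(&attr);
    return 0;
}
```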
This commit contains the following changes:
- pml/ob1: Use the bml accessor function when requesting a bml
endpoint. This ensures that bml endpoints are only created when
needed; for example, a bml endpoint is not requested and not
allocated when receiving an eager message from a peer.
- pml/ob1: Change the pml_procs array in the ob1 communicator to a
proc pointer array. At the cost of a single extra level of
indirection, this allows us to allocate pml procs on demand.
- pml/ob1: Add an accessor function to access the pml proc structure
for a given peer. This function allocates the proc if it
doesn't already exist (see the sketch after this list).
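The shape of that accessor, sketched with toy names (the real functions live in pml/ob1 and differ in detail): a per-peer structure is allocated lazily, the first time someone actually asks for it.
```
#include <stdlib.h>

struct toy_pml_proc {
    int expected_seq;
    /* ... per-peer matching state ... */
};

struct toy_pml_comm {
    struct toy_pml_proc **procs;   /* one pointer per peer rank */
    int nprocs;
};

/* get-or-create: the extra level of indirection costs one pointer per
 * peer, but nothing is allocated for peers we never talk to */
static struct toy_pml_proc *toy_pml_proc_lookup(struct toy_pml_comm *comm,
                                                int rank)
{
    if (NULL == comm->procs[rank]) {
        comm->procs[rank] = calloc(1, sizeof(struct toy_pml_proc));
    }
    return comm->procs[rank];
}
```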
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes several bugs in the static request objects used by
ob1 for blocking send/receive operations.
- Fix memory leak when using MPI_THREAD_MULTIPLE. Requests were
allocated off the free list but were destructed and NOT returned.
- Fix double-destruct of static objects. There is no reason to
CONSTRUCT/DESTRUCT the static object for each send/receive
operation. This adds overhead and no benefit. To keep the code
clean helper functions have been added to finalize ob1 send/receive
requests.
- Remove now unnecessary include of alloca.h.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Use of the old ompi_free_list_t and ompi_free_list_item_t is
deprecated. These classes will be removed in a future commit.
This commit updates the entire code base to use opal_free_list_t and
opal_free_list_item_t.
Notes:
OMPI_FREE_LIST_*_MT -> opal_free_list_* (uses opal_using_threads ())
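A rough sketch of the new call-site shape (from memory; see opal/class/opal_free_list.h for the authoritative signatures): the `_MT` variants disappear and the thread-safety decision moves inside the accessor.
```
#include "opal/class/opal_free_list.h"

/* formerly: OMPI_FREE_LIST_GET_MT(list, item);
 * now a plain function call that consults opal_using_threads() itself */
static opal_free_list_item_t *grab_item(opal_free_list_t *list)
{
    opal_free_list_item_t *item = opal_free_list_get(list);

    return item;   /* NULL when the list is exhausted and cannot grow */
}
```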
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL
All the components required for inter-process communication are currently deeply
integrated in the OMPI layer. Several groups/institutions have expressed interest
in having a more generic communication infrastructure, without all the OMPI layer
dependencies. This communication layer should be made available at a different
software level, available to all layers in the Open MPI software stack. As an
example, our ORTE layer could replace the current OOB and instead use the BTL
directly, gaining access to more reactive network interfaces than TCP. Similarly,
external software libraries could take advantage of our highly optimized AM
(active message) communication layer for their own purposes.

UTK, with support from Sandia, developed a version of Open MPI where the entire
communication infrastructure has been moved down to OPAL
(btl/rcache/allocator/mpool). Most of the moved components have been updated to
match the new schema, with few exceptions (mainly BTLs where I have no way of
compiling/testing them). Thus, the completion of this RFC is tied to being able
to complete this move for all BTLs. For this we need help from the rest of the
Open MPI community, especially those supporting some of the BTLs. A
non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl,
ugni, usnic.
This commit was SVN r32317.
Per RFC. There are two optimizations in this commit:
- Allocate requests for blocking sends and receives on the stack. This
bypasses the request free list and saves two atomics on the critical path.
This change improves the small message ping-pong by 50-200ns on both AMD
and Intel CPUs.
- For small messages try to use the btl sendi function before initializing a
send request. If the sendi fails or the btl does not have a sendi function,
silently fall back on the standard send path (see the sketch below).
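A sketch of that second optimization with toy types (the real `btl_sendi` prototype takes more arguments): try the send-immediate entry point first and only fall back to building a full send request when it is absent or declines.
```
#include <stddef.h>

#define TOY_SUCCESS 0

typedef struct toy_btl {
    /* optional "send immediate" entry point, in the spirit of btl_sendi */
    int (*sendi)(struct toy_btl *btl, const void *buf, size_t size);
} toy_btl_t;

/* stands in for the regular path that initializes a send request */
static int toy_send_standard(toy_btl_t *btl, const void *buf, size_t size)
{
    (void) btl; (void) buf; (void) size;
    return TOY_SUCCESS;
}

static int toy_send_small(toy_btl_t *btl, const void *buf, size_t size)
{
    if (NULL != btl->sendi &&
        TOY_SUCCESS == btl->sendi(btl, buf, size)) {
        return TOY_SUCCESS;      /* no send request was ever initialized */
    }
    /* silent fallback on the standard send path */
    return toy_send_standard(btl, buf, size);
}
```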
cmr=v1.7.5:reviewer=brbarret
This commit was SVN r30343.
http://www.open-mpi.org/community/lists/devel/2013/10/13072.php
Add support for pinning GPU Direct RDMA in openib BTL for better small message latency of GPU buffers.
Note that none of this is compiled in unless CUDA-aware support is requested.
This commit was SVN r29680.
configure-time dynamic allocation of flags. The net result for platforms
which only support BTL-based communication is a reduction of 8*nprocs bytes
per process. Platforms which support both MTLs and BTLs will not see
a space reduction, but will now be able to safely run both the MTL and BTL
side-by-side, which will prove useful.
This commit was SVN r29100.
value to signal that the operation of retrieving the element from the free list
failed. However in this case the returned pointer was set to NULL as well, so the
error code was redundant. Moreover, this was a continuous source of warnings when
picky mode is on.
The attached patch removes the rc argument from the OMPI_FREE_LIST_GET and
OMPI_FREE_LIST_WAIT macros, and changes the callers to check whether the item is
NULL instead of using the return code.
This commit was SVN r28722.
This macro is only used on the failure path, so the additional if statement
should not have any effect on performance.
cmr:v1.7
This commit was SVN r28292.
Roll in the ORTE state machine. Remove last traces of opal_sos. Remove UTK epoch code.
Please see the various emails about the state machine change for details. I'll send something out later with more info on the new arch.
This commit was SVN r26242.
Uses new CUDA IPC support. Also, a few minor changes in PML to take
advantage of it.
This code has no effect unless user asks for it explicitly via
configure arguments. Otherwise, it is either #ifdef'ed out or
not compiled.
This commit was SVN r26039.
No need for any CMRs to 1.5... that was already done in CMR 2728.
This commit was SVN r24545.
The following SVN revision numbers were found above:
r22841 --> open-mpi/ompi@b400b84162
step is the configure and Fortran mojo that Jeff will put in. Until then I
guess the Fortran interface is broken (at least all functions using the hidden
count field in the MPI_Status).
This commit was SVN r23467.
(OMPI_ERR_* == OPAL_SOS_GET_ERR_CODE(ret)), since the return value could be a
SOS-encoded error. OPAL_SOS_GET_ERR_CODE() takes in a SOS error and returns
the native error code.
* Since OPAL_SUCCESS is preserved by SOS, also change all calls of the form
(OPAL_ERROR == ret) to (OPAL_SUCCESS != ret). We thus avoid having to
decode 'ret' to get the native error code.
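The two rewrite patterns, shown as fragments (OPAL_SOS was later removed from the tree, so these reflect the code base at that time):
```
/* pattern 1: comparing against a specific error code -- decode first */
if (OMPI_ERR_OUT_OF_RESOURCE == OPAL_SOS_GET_ERR_CODE(ret)) {
    /* handle the specific error */
}

/* pattern 2: generic failure test -- OPAL_SUCCESS is preserved by SOS,
 * so no decoding is needed */
if (OPAL_SUCCESS != ret) {
    /* handle any error */
}
```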
This commit was SVN r23162.
The datatype engine is split into an MPI-specific part in OMPI
and a language-agnostic part in OPAL. The convertor is completely
moved into OPAL. This offers several benefits as described in RFC
http://www.open-mpi.org/community/lists/devel/2009/07/6387.php
namely:
- Fewer basic types (int* and float* types, boolean and wchar).
- Fixing naming scheme to ompi-nomenclature.
- Usability outside of the ompi-layer.
- Due to the fixed nature of simple opal types, their information is
completely known at compile time and therefore constified.
- With fewer datatypes (22), the actual sizes of bit-field types may be
reduced from 64 to 32 bits, allowing reorganizing the opal_datatype
structure, eliminating holes and keeping data required in the convertor
(upon send/recv) in one cacheline. This has implications for the
convertor data structure and other parts of the code.
- Several performance tests have been run; the netpipe latency does not
change with this patch on Linux/x86-64 on the smoky cluster.
- Extensive tests have been done to verify correctness (no new
regressions) using:
1. mpi_test_suite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
a. running both trunk and ompi-ddt resulted in no differences
(except that MPI_SHORT_INT and MPI_TYPE_MIX_LB_UB now run
correctly).
b. with --enable-memchecker and running under valgrind (one buglet
when run with static found in test-suite, committed).
2. ibm testsuite on linux/x86-64 using clean ompi-trunk and ompi-ddt:
all passed (except for the dynamic/ tests, which fail as on trunk/MTT).
3. compilation and usage of HDF5 tests on Jaguar using PGI and
PathScale compilers.
4. compilation and usage on Scicortex.
- Please note that for the heterogeneous case (-m32 compiled
binaries/ompi), neither ompi-trunk nor the ompi-ddt branch would
successfully launch.
This commit was SVN r21641.
The BML data structures (mca_bml_base_endpoint_t and mca_bml_base_btl_t)
got a whole lot smaller, decreasing the memory footprint of the
running application. How much is a good question; here is a breakdown:
- in mca_bml_base_endpoint_t: 3 *size_t + 1 * uint32_t
- in mca_bml_base_btl_t: 1 * int + 1 * double - 1 * float
+ 6 * size_t + 9 * (void*)
The decrease in mca_bml_base_endpoint_t is for each peer and the
decrease in mca_bml_base_btl_t is for each BTL for each peer.
So, if we consider the most convenient case where there is only
one network between all peers, this decreases the memory footprint
per peer by
9*size_t + 9*(void*) + 2 * int32_t + 1 * double - 1 * float.
On a 64-bit machine this amounts to 156 bytes per peer.
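A quick sanity check of that figure on an LP64 target (where sizeof(size_t) == sizeof(void *) == 8):
```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* per-peer savings from the breakdown above, assuming an LP64 target */
    size_t saved = 9 * sizeof(size_t) + 9 * sizeof(void *)
                 + 2 * sizeof(int32_t) + sizeof(double) - sizeof(float);

    printf("%zu bytes per peer\n", saved);   /* 72 + 72 + 8 + 8 - 4 = 156 */
    return 0;
}
```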
Now we access all these fields directly from the underlying BTL
structure, and as this structure is common to multiple BML endpoints,
we are a lot more cache friendly. Even if this does not improve the
latency, it makes the SM performance graph a lot smoother.
This commit was SVN r19659.
There was an argument that was barely used, and on return at the PML
level it contained nothing usable. It has been removed, so now we're
using less memory ...
This commit was SVN r19657.