openmpi

Author	SHA1	Message	Date
George Bosilca	b2b3da3046	Do not access the frag after returning it. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-10-31 16:39:23 -04:00
George Bosilca	2a2db13b32	Gracefully deal with a get returning 1 (complete right away). Kudos to @EmmanuelBRELLE for spotting it. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-10-01 02:24:02 -04:00
George Bosilca	866899e836	Always abide to the RDMA pipeline limit. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-09-01 18:52:48 -04:00
George Bosilca	050bd3b6d7	Make the pipeline depth an int instead of a size_t. While they are supposed to be unsigned, casting them to a signed value for all atomic operations is as errorprone as handling them as signed entities. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-09-01 18:52:48 -04:00
KAWASHIMA Takahiro	ebc4eb347c	Merge pull request #3701 from kawashima-fj/pr/non-pml-persistent ompi/request: Support non-PML persistent requests	2017-07-31 02:36:17 -05:00
Nathan Hjelm	e73ab93ebf	pml/ob1: do not access fragment after calling btl rget This commit fixes a bug that occurs when the btl callback happens before the rget returns. In this case the fragment has been returned and is no longer valid. This commit saves the size before calling rget. This is valid since the BTL is not allowed to change the read size. Fixes #3821 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-07-11 15:59:40 -06:00
KAWASHIMA Takahiro	0cbdbe32f7	ompi/request: Support non-PML persistent requests This commit adds the `req_start` member to the `ompi_request_t` struct. The `MPI_START` and `MPI_STARTALL` routines call this callback function instead of `MCA_PML_CALL(start(...))`. So components that return persistent request must set this member to their request objects. `mca_pml_base_module_t::pml_start` is not deleted because `MCA_PML_CALL(start(...))` is still used elsewhere across OMPI. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2017-06-02 13:08:17 +09:00
Gilles Gouaillardet	3f1486a508	pml/ob1: initialize one more field in mca_pml_ob1_recv_request_progress_rget() always initialize recvreq->req_rdma_offset to zero. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2016-12-01 13:14:23 +09:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Gilles Gouaillardet	8e788b5aee	pml/ob1: refactor append_recv_req_to_queue() to improve readability and fix a typo in a comment Thanks George for the patch	2016-10-25 10:50:40 +09:00
Gilles Gouaillardet	dfbf2b7be4	opal/threads: add OPAL_THREAD_SUB_SIZE_T macro -1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1), simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy	2016-08-10 13:37:36 +09:00
Nathan Hjelm	799104f688	Merge pull request #1947 from hjelmn/perf pml/ob1: be more selective when using rdma capable btls	2016-08-09 22:15:09 -06:00
Nathan Hjelm	4079eec974	pml/ob1: be more selective when using rdma capable btls This commit updates the btl selection logic for the RDMA and RDMA pipeline protocols to use a btl iff: 1) the btl is also used for eager messages (high exclusivity), or 2) no other RDMA btl is available on an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This fixes a performance regression with shared memory when an RDMA capable network is available. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-09 20:54:42 -06:00
Nathan Hjelm	889dd32806	pml/ob1: reset req_bytes_packed on start On start we were not correctly resetting all request fields. This was leading to a double-completion on persistent receives. This commit updates the base start code to reset the receive req_bytes_packed and the send request convertor. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-03 11:29:30 -06:00
bosilca	b90c83840f	Refactor the request completion (#1422 ) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
George Bosilca	37e03e3e5b	Don't update req_bytes_received if no bytes were received.	2016-05-12 23:39:32 -04:00
George Bosilca	896f857fc4	Thanks @hjelmn for catching up the typo.	2016-04-07 13:56:26 -04:00
Thananon Patinyasakdikul	92290b94e0	Fixed Coverity reports 1358014-1358018 (DEADCODE and CHECK_RETURN)	2016-04-07 12:52:17 -04:00
George Bosilca	f69eba1bc4	Update the copyright and cleanup the code. Per @jsquyres suggestion remove all trailing spaces. Credit to `sed -i.bak 's/ $//' /[ch]`.	2016-03-28 14:41:01 -04:00
Thananon Patinyasakdikul	92062492b9	Enable Threading in the BTL TCP Added mca parameter to turn progress thread on/off Add a flag to check if we have btl progress thread. Added macro for ob1 matching lock. Update the AUTHORS file.	2016-03-28 14:41:01 -04:00
Aurélien Bouteiller	892e1ed57e	Fix a potential race condition in which a progress matching thread could match a request while we are cancelling it.	2016-03-01 16:43:45 -05:00
Nathan Hjelm	6611c000c9	Fix coverity warnings Fix CID 1315271: Constant expression result The intent of this conditional is to not produce a peruse event for probe or mprobe requests. Coverity is correct that the expression is always true. Changed the \|\| to && to fix. Also moved the conditional within an OMPI_WANT_PERUSE to ensure the conditional is not evaluated if peruse is disabled. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-28 15:35:25 -06:00
Nathan Hjelm	b4a0d40915	pml/ob1: Add support for dynamically calling add_procs This commit contains the following changes: - pml/ob1: use the bml accessor function when requesting a bml endpoint. this will ensure that bml endpoints are only created when needed. for example, a bml endpoint is not requested and not allocated when receiving an eager message from a peer. - pml/ob1: change the pml_procs array in the ob1 communicator to a proc pointer array. at the cost of a single level of extra redirection this will allow us to allocate pml procs on demand. - pml/ob1: add an accessor function to access the pml proc structure for a given peer. this function will allocate the proc if it doesn't already exist. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-10 08:55:54 -06:00
Ralph Castain	869041f770	Purge whitespace from the repo	2015-06-23 20:59:57 -07:00
Gilles Gouaillardet	ee3a1da28a	pml/ob1:mca_pml_ob1_recv_request_put_frag silence a warning proc local variable is used only in heterogeneous mode	2015-06-15 10:00:53 +09:00
Gilles Gouaillardet	85c45e2275	pml/ob1: fix mca_pml_ob1_recv_request_put_frag(...) in heterogeneous mode	2015-05-22 15:48:45 +09:00
Nathan Hjelm	3d32dbd793	btl/openib: cuda: fix CUDA-aware support with async copy This commit should resolve an issue seen with CUDA-aware support. The problem came in with BTL 3.0. Before 3.0 the size of the copy was stored in the incoming segment's des_remote_count field. This field does not exist in BTL 3.0 so I stored the value in the des_segment_count field. This caused problems with the cuda support code. To fix the issue the endpoint pointer is now stored in the in fragment's endpoint pointer which free's up the segment's des_cbdata pointer for storing the transfer size. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-03-10 14:38:12 -06:00
Nathan Hjelm	0ac2f08460	pml/ob1: fix peruse compile error Fixes #416	2015-02-24 15:39:46 -07:00
Rolf vandeVaart	dbd0064713	Fix bug in CUDA-aware and GDR introduced by refactoring	2015-02-18 17:44:28 -05:00
Nathan Hjelm	3847025540	pml/ob1: when using btl_get try to register the entire region before attempting to break the get into multiple rdma fragments A little background. Historically ob1 always registered the entire memory region when the RGET protocol was in use. This changed when Mellanox added support to fragment RGET using the btl_prepare_dst function. Now that the BTL layer has changed to split out the limits of get/put there is explicit fragmentation code in ob1. Before this commit the registration was still done per RGET fragment. This commit will attempt to register the entire region before creating RGET fragments. If the registration is successfull then all RGET fragments will use this registration otherwise they will each attempt to register their own segment of the receive buffer. If that fails enough times each fragment will give up and fall back on send/recv. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Nathan Hjelm	c4a0e02261	pml/ob1: update for BTL 3.0 interface Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-02-13 11:46:37 -07:00
Howard Pritchard	3fc7b389ff	initial async progress changes for gni	2014-12-24 11:50:23 -07:00
Nathan Hjelm	1b564f62bd	Revert "Merge pull request #275 from hjelmn/btlmod" This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.	2014-11-19 23:22:43 -07:00
Nathan Hjelm	24427639b6	Fix ob1 warnings	2014-11-19 11:33:03 -07:00
Nathan Hjelm	271818f887	pml/ob1: bug fixes and adjustments for changes in btl_sendi behavior	2014-11-19 11:33:03 -07:00
Nathan Hjelm	ee2b111011	Update PML for latest BTL update	2014-11-19 11:33:02 -07:00
Nathan Hjelm	c61e017177	pml: updates to reflect member changes in mca_btl_base_descriptor_t and mca_btl_base_module_t structures	2014-11-19 11:33:02 -07:00
Nathan Hjelm	5936411a07	pml/ob1: when using btl_get try to register the entire region before attempting to break the get into multiple rdma fragments A little background. Historically ob1 always registered the entire memory region when the RGET protocol was in use. This changed when Mellanox added support to fragment RGET using the btl_prepare_dst function. Now that the BTL layer has changed to split out the limits of get/put there is explicit fragmentation code in ob1. Before this commit the registration was still done per RGET fragment. This commit will attempt to register the entire region before creating RGET fragments. If the registration is successfull then all RGET fragments will use this registration otherwise they will each attempt to register their own segment of the receive buffer. If that fails enough times each fragment will give up and fall back on send/recv.	2014-11-19 11:33:02 -07:00
Nathan Hjelm	b75bb8aea7	Update pml for btl changes	2014-11-19 11:33:02 -07:00
Gilles Gouaillardet	ed93c8787d	ob1: add a destructor to mca_pml_ob1_recv_request_t opal_mutex_t must be OBJ_DESTRUCTed in order to avoid a memory leak (pthread_mutex_init allocates memory under Cygwin, so pthread_mutex_destroy is mandatory) Thanks to Marco Atzeri for reporting this issue	2014-10-29 13:30:29 +09:00
Ralph Castain	552c9ca5a0	George did the work and deserves all the credit for it. Ralph did the merge, and deserves whatever blame results from errors in it :-) WHAT: Open our low-level communication infrastructure by moving all necessary components (btl/rcache/allocator/mpool) down in OPAL All the components required for inter-process communications are currently deeply integrated in the OMPI layer. Several groups/institutions have express interest in having a more generic communication infrastructure, without all the OMPI layer dependencies. This communication layer should be made available at a different software level, available to all layers in the Open MPI software stack. As an example, our ORTE layer could replace the current OOB and instead use the BTL directly, gaining access to more reactive network interfaces than TCP. Similarly, external software libraries could take advantage of our highly optimized AM (active message) communication layer for their own purpose. UTK with support from Sandia, developped a version of Open MPI where the entire communication infrastucture has been moved down to OPAL (btl/rcache/allocator/mpool). Most of the moved components have been updated to match the new schema, with few exceptions (mainly BTLs where I have no way of compiling/testing them). Thus, the completion of this RFC is tied to being able to completing this move for all BTLs. For this we need help from the rest of the Open MPI community, especially those supporting some of the BTLs. A non-exhaustive list of BTLs that qualify here is: mx, portals4, scif, udapl, ugni, usnic. This commit was SVN r32317.	2014-07-26 00:47:28 +00:00
Nathan Hjelm	f960e4273e	Fix typo in r32196 The wrong descriptor field was used when calculating the size received when using the RDMA rendevous protcol. This commit was SVN r32232. The following SVN revision numbers were found above: r32196 --> open-mpi/ompi@a14e0f10d4	2014-07-14 21:00:53 +00:00
Nathan Hjelm	1b9621eeb0	Fix typo in r32196 This commit was SVN r32202. The following SVN revision numbers were found above: r32196 --> open-mpi/ompi@a14e0f10d4	2014-07-10 18:43:49 +00:00
Nathan Hjelm	a14e0f10d4	Per RFC: Remove des_src and des_dst members from the mca_btl_base_segment_t and replace them with des_local and des_remote This change also updates the BTL version to 3.0.0. This commit does not represent the final version of BTL 3.0.0. More changes are coming. In making this change I updated all of the BTLs as well as BTL user's to use the new structure members. Please evaluate your component to ensure the changes are correct. RFC text: This is the first of several BTL interface changes I am proposing for the 1.9/2.0 release series. What: Change naming of btl descriptor members. I propose we change des_src and des_dst (and their associated counts) to be des_local and des_remote. For receive callbacks the des_local member will be used to communicate the segment information to the callback. The proposed change will include updating all of the doxygen in btl.h as well as updating all BTLs and BTL users to use the new naming scheme. Why: My btl usage makes use of both put and get operations on the same descriptor. With the current naming scheme I need to ensure that there is consistency beteen the segments described in des_src and des_dst depending on whether a put or get operation is executed. Additionally, the current naming prevents BTLs that do not require prepare/RMA matched operations (do not set MCA_BTL_FLAGS_RDMA_MATCHED) from executing multiple simultaneous put AND get operations. At the moment the descriptor can only be used with one or the other. The naming change makes it easier for BTL users to setup/modify descriptors for RMA operations as the local segment and remote segment are always in the same member field. The only issue I forsee with this change is that it will require a little more work to move BTL fixes to the 1.8 release series. This commit was SVN r32196.	2014-07-10 16:31:15 +00:00
Ralph Castain	06e6a06f3e	Cleanup a couple of abstraction breaks found by Thomas Naughton This commit was SVN r30371.	2014-01-22 21:36:24 +00:00
Nathan Hjelm	2b57f4227e	ob1: optimize blocking send and receive paths Per RFC. There are two optimizations in this commit: - Allocate requests for blocking sends and receives on the stack. This bypasses the request free list and saves two atomics on the critical path. This change improves the small message ping-pong by 50-200ns on both AMD and Intel CPUs. - For small messages try to use the btl sendi function before intializing a send request. If the sendi fails or the btl does not have a sendi function silently fallback on the standard send path. cmr=v1.7.5:reviewer=brbarret This commit was SVN r30343.	2014-01-21 15:16:21 +00:00
Rolf vandeVaart	ee7510b025	Remove redundant macro. This was from reviewed of earlier ticket. Fixes trac:3878. Reviewed by jsquyres. This commit was SVN r29581. The following Trac tickets were found above: Ticket 3878 --> https://svn.open-mpi.org/trac/ompi/ticket/3878	2013-11-01 12:19:40 +00:00
Brian Barrett	16a1166884	Remove the proc_pml and proc_bml fields from ompi_proc_t and replace with a configure-time dynamic allocation of flags. The net result for platforms which only support BTL-based communication is a reduction of 8*nprocs bytes per process. Platforms which support both MTLs and BTLs will not see a space reduction, but will now be able to safely run both the MTL and BTL side-by-side, which will prove useful. This commit was SVN r29100.	2013-08-30 16:54:55 +00:00
Rolf vandeVaart	504fa2cda9	Fix support in smcuda btl so it does not blow up when there is no CUDA IPC support between two GPUs. Also make it so CUDA IPC support is added dynamically. Fixes ticket 3531. This commit was SVN r29055.	2013-08-21 21:00:09 +00:00

220 commits