openmpi

Автор	SHA1	Сообщение	Дата
Nathan Hjelm	000f9eed4d	opal: add types for atomic variables This commit updates the entire codebase to use specific opal types for all atomic variables. This is a change from the prior atomic support which required the use of the volatile keyword. This is the first step towards implementing support for C11 atomics as that interface requires the use of types declared with the _Atomic keyword. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-14 10:48:55 -06:00
George Bosilca	366d64b7e5	Move the collective structure outside the communicator. As we changed the ABI (forcing a major release), we can limit the size of the predefined communicators by moving the collective structure outside the communicator. This might have a minimal, but unnoticeable, impact on performance. This approach has been discussed during the January 2017 devel meeting. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 11:54:17 -06:00
Nathan Hjelm	362ac8b87e	osc/pt2pt: fix threading issues This commit fixes a number of threading issues discovered in osc/pt2pt. This includes: - Lock the synchronization object not the module in osc_pt2pt_start. This fixes a race between the start function and processing post messages. - Always lock before calling cond_broadcast. Fixes a race between the waiting thread and signaling thread. - Make all atomically updated values volatile. - Make the module lock recursive to protect against some deadlock conditions. Will roll this back once the locks have been re-designed. - Mark incoming complete after completing an accumulate not before. This was causing an incorrect answer under certain conditions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-01 10:33:01 -07:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Ralph Castain	6549c878a9	Silence the warnings	2016-08-22 15:35:27 -07:00
Nathan Hjelm	6aa658ae33	ompi/request: change semantics of ompi request callbacks This commit changes the sematics of ompi request callbacks. If a request's callback has freed or re-posted (using start) a request the callback must return 1 instead of OMPI_SUCCESS. This indicates to ompi_request_complete that the request should not be modified further. This fixes a race condition in osc/pt2pt that could lead to the req_state being inconsistent if a request is freed between the callback and setting the request as complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-17 20:14:01 -06:00
Nathan Hjelm	7589a25377	osc/pt2pt: do not repost receive from request callback This commit fixes an issue that can occur if a target gets overwhelmed with requests. This can cause osc/pt2pt to go into deep recursion with a stack like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb -> ... . At small scale this is fine as the recursion depth stays small but at larger scale we can quickly exhaust the stack processing frag requests. To fix the issue the request callback now simply puts the request on a list and returns. The osc/pt2pt progress function then handles the processing and reposting of the request. As part of this change osc/pt2pt can now post multiple fragment receive requests per window. This should help prevent a target from being overwhelmed. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-11 15:33:07 -06:00
Nathan Hjelm	e968ddfe64	start bug fixes (#1729 ) * mpi/start: fix bugs in cm and ob1 start functions There were several problems with the implementation of start in Open MPI: - There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started. - Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This introduced a leak as the initial request was never freed. Since the only pml request that can be mpi complete but not pml complete is a buffered send the code to reallocate the request has been moved. To detect that a request is indeed mpi complete but not pml complete isend_init in both cm and ob1 now marks the new request as pml complete. - If a new request was needed the callbacks on the original request were not copied over to the new request. This can cause osc/pt2pt to hang as the incoming message callback is never called. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * osc/pt2pt: add request for gc after starting a new request Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens and the prior request is on the garbage collection list it could cause problems. This commit moves the gc insert until after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 20:22:40 -04:00
Nathan Hjelm	974061c38f	osc: fixed issues identified by coverity Fix CID 1324733: Null pointer dereferences (FORWARD_NULL) Fix CID 1324734: Null pointer dereferences (FORWARD_NULL) Fix CID 1324735: Null pointer dereferences (FORWARD_NULL) Fix CID 1324736: Null pointer dereferences (FORWARD_NULL) Fix CID 1324737: Null pointer dereferences (FORWARD_NULL) Fix CID 1324751: Memory - illegal accesses (USE_AFTER_FREE) Fix CID 1324750: (USE_AFTER_FREE) Fix CID 1324749: Memory - corruptions (USE_AFTER_FREE) Fix CID 1324748: Memory - illegal accesses (USE_AFTER_FREE) Fix CID 1324747: (USE_AFTER_FREE) Fix CID 1324746: Memory - corruptions (USE_AFTER_FREE) Add missing return on an error path. Fix CID 1324745: Code maintainability issues (UNUSED_VALUE) Ignore return code from barrier. It was not being used anyway. Fix CID 1324738: Null pointer dereferences (FORWARD_NULL) Fix CID 1324741: Null pointer dereferences (REVERSE_INULL) module->selected_btl can not be NULL in osc/rdma during normal operation. Removed the unnecessary NULL check. Fix CID 1324752: Memory - illegal accesses (USE_AFTER_FREE) Move ompi_osc_pt2pt_module_lock_remove to before the lock is freed. Fix CID 1324744: Uninitialized variables (UNINIT) Fix CID 1324743: Uninitialized variables (UNINIT) This array is not used unitialized but there is no reason not to use calloc here to silence the warning. The following CID is a false positive: 1324742. I will mark it such in coverity. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-22 09:23:39 -06:00
Nathan Hjelm	fd42343ff0	osc/pt2pt: reduce memory footprint of window This commit updates osc/pt2pt to allocate peer object as they are needed rather than all at once. Additionally, to help improve the memory footprint a new synchronization structure has been added. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-16 13:01:56 -06:00
Gilles Gouaillardet	21b1e7f8c5	mpi conformance: fix prototypes - MPI_Compare_and_swap - MPI_Fetch_and_op - MPI_Raccumulate - MPI_Win_detach Thanks to Michael Knobloch and Takahiro Kawashima for bringing this to our attention	2015-08-31 10:34:05 +09:00
Nathan Hjelm	80ed805a16	osc/pt2pt: fix synchronization bugs The fragment flush code tries to send the active fragment before sending any queued fragments. This could cause osc messages to arrive out-of-order at the target (bad). Ensure ordering by alway sending the active fragment after sending queued fragments. This commit also fixes a bug when a synchronization message (unlock, flush, complete) can not be packed at the end of an existing active fragment. In this case the source process will end up sending 1 more fragment than claimed in the synchronization message. To fix the issue a check has been added that fixes the fragment count if this situation is detected. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-04-06 08:39:19 -06:00
Nathan Hjelm	e68ed2876c	osc/pt2pt: threading fixes and code cleanup	2015-01-06 13:39:16 -07:00
Nathan Hjelm	9eba7b9d35	Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma	2015-01-06 13:38:55 -07:00
Nathan Hjelm	1b564f62bd	Revert "Merge pull request #275 from hjelmn/btlmod" This reverts commit ccaecf0fd6c862877e6a1e2643f95fa956c87769, reversing changes made to 6a19bf85dde5306f559f09952cf3919d97f52502.	2014-11-19 23:22:43 -07:00
Nathan Hjelm	22625b005b	osc/pt2pt: threading fixes and code cleanup	2014-11-19 11:33:04 -07:00
Nathan Hjelm	29e4e1c90a	Rename the OSC "rdma" component to pt2p to better reflect that it does not actually use btl rdma	2014-11-19 11:33:03 -07:00

18 Коммитов