openmpi

Author	SHA1	Message	Date
Nathan Hjelm	e5c7512692	Merge pull request #1983 from hjelmn/request_cb ompi/request: change semantics of ompi request callbacks	2016-08-18 08:31:56 -06:00
Nathan Hjelm	6aa658ae33	ompi/request: change semantics of ompi request callbacks This commit changes the semantics of ompi request callbacks. If a request's callback has freed or re-posted (using start) a request the callback must return 1 instead of OMPI_SUCCESS. This indicates to ompi_request_complete that the request should not be modified further. This fixes a race condition in osc/pt2pt that could lead to the req_state being inconsistent if a request is freed between the callback and setting the request as complete. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-17 20:14:01 -06:00
Nathan Hjelm	40b70889e5	osc/pt2pt: make receive count an unsigned int This receive_count MCA variable should never be negative. Change it to an unsigned int. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-17 08:14:24 -06:00
Gilles Gouaillardet	8faa1edafa	osc/pt2pt: silence misc warnings	2016-08-17 14:24:14 +09:00
Nathan Hjelm	9444df1eb7	osc/pt2pt: make lock_all locking on-demand The original lock_all algorithm in osc/pt2pt sent a lock message to each peer in the communicator even if the peer is never the target of an operation. Since this scales very poorly the implementation has been replaced by one that locks the remote peer on first communication after a call to MPI_Win_lock_all. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-11 15:33:07 -06:00
Nathan Hjelm	7589a25377	osc/pt2pt: do not repost receive from request callback This commit fixes an issue that can occur if a target gets overwhelmed with requests. This can cause osc/pt2pt to go into deep recursion with a stack like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb -> ... . At small scale this is fine as the recursion depth stays small but at larger scale we can quickly exhaust the stack processing frag requests. To fix the issue the request callback now simply puts the request on a list and returns. The osc/pt2pt progress function then handles the processing and reposting of the request. As part of this change osc/pt2pt can now post multiple fragment receive requests per window. This should help prevent a target from being overwhelmed. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-08-11 15:33:07 -06:00
Nathan Hjelm	11c853d05e	osc/pt2pt: do not set rdma_frag after start It is possible for the start call to complete the requests. For this reason the module rdma_frag field should be filled in before start is called. If the request completes the completion callback will reset the rdma_frag field to NULL. Fixes a bug discovered by @tkordenbrock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-03 15:20:36 -06:00
Nathan Hjelm	aac611237b	opal/thread: clean up and add additional OPAL_THREAD macros This commit expands the OPAL_THREAD macros to include 32- and 64-bit atomic swap. Additionally, macro declarations have been updated to include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the former was used with add and the latter with cmpset. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 09:23:14 -06:00
Pascal Deveze	f19a2b961c	osc/portals4: Correct an error in an if statement	2016-07-18 13:16:12 +02:00
Pascal Deveze	81823d7a63	osc/portals4: Store the no_locks parameter in osc_portals4_component.no_locks	2016-07-18 11:51:52 +02:00
Pascal Deveze	76b38651da	osc/portals4: For the contiguous datatype, take into account the lower bound before calling portals4	2016-07-18 11:20:50 +02:00
Pascal Deveze	7aaf16e7fe	osc/portals4: Put/Get splitting because Portals4 may restrict sizes	2016-07-18 10:49:28 +02:00
Pascal Deveze	025201b459	osc/portals4: set the initial value of req_status.MPI_ERROR to MPI_SUCCESS	2016-07-18 09:52:56 +02:00
Pascal Deveze	aa0d687a0a	osc/portals4: Display an output message if ompi_osc_portals4_get_dt() or ompi_osc_portals4_get_op() returns an error	2016-07-18 09:52:56 +02:00
Pascal Deveze	c4181909a4	osc/portals4: Be sure that the MEs are operational (wait for the PTL_EVENT_LINK)	2016-07-18 09:52:56 +02:00
Pascal Deveze	e99e7d08ed	osc/portals4: For the ME, use the uid from PtlGetUid instead of PTL_UID_ANY	2016-07-18 09:52:56 +02:00
Pascal Deveze	56b36eeb7e	osc/portals4: Format of "target_disp" is OPAL_PTRDIFF_TYPE and %lu is the appropriate format to display it.	2016-07-18 09:52:55 +02:00
Pascal Deveze	a76566c754	osc/portals4: To allocate a PT, use REQ_OSC_TABLE_ID and test that the right ID is allocated	2016-07-18 09:52:55 +02:00
Nathan Hjelm	b47208e909	osc/rdma: fix bug in CAS This commit fixes a bug in the RDMA compare-and-swap implementation that caused the origin value to always be written even if the compare should have failed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-11 09:54:23 -06:00
Nathan Hjelm	2409024c17	osc/rdma: fix typo Need to increment the total size after checking the local offset not before. This typo causes large allocations with MPI_Win_allocate() to fail. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-21 09:50:29 -06:00
Nathan Hjelm	e968ddfe64	start bug fixes (#1729 ) * mpi/start: fix bugs in cm and ob1 start functions There were several problems with the implementation of start in Open MPI: - There are no checks whatsoever on the state of the request(s) provided to MPI_Start/MPI_Start_all. It is erroneous to provide an active request to either of these calls. Since we are already looping over the provided requests there is little overhead in verifying that the request can be started. - Both ob1 and cm were always throwing away the request on the initial call to start and start_all with a particular request. Subsequent calls would see that the request was pml_complete and reuse it. This introduced a leak as the initial request was never freed. Since the only pml request that can be mpi complete but not pml complete is a buffered send the code to reallocate the request has been moved. To detect that a request is indeed mpi complete but not pml complete isend_init in both cm and ob1 now marks the new request as pml complete. - If a new request was needed the callbacks on the original request were not copied over to the new request. This can cause osc/pt2pt to hang as the incoming message callback is never called. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * osc/pt2pt: add request for gc after starting a new request Starting a new receive may cause a recursive call into the pt2pt frag receive function. If this happens and the prior request is on the garbage collection list it could cause problems. This commit moves the gc insert until after the new request has been posted. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 20:22:40 -04:00
bosilca	b90c83840f	Refactor the request completion (#1422 ) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
Jeff Squyres	33dd8ca81e	osc_rdma_peer: properly include ompi_config.h Thanks to Paul Hargrove for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-03 07:39:55 -07:00
Nathan Hjelm	d3d779f6d9	osc/rdma: clear all_sync object when obtaining a lock This commit fixes a bad synchronization detection bug that occurs when mixing MPI_Win_fence() and MPI_Win_lock(). If no communication has occurred in the fence epoch it is safe to just clear the all_sync object (it was set up by fence). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-02 15:28:47 -06:00
Nathan Hjelm	7bda3eb2dc	osc/rdma: fix global index array calculation This commit fixes a bug that occurs when ranks are either not mapped evenly or by something other than core. Fixes open-mpi/ompi#1599 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-28 19:11:11 -06:00
Nathan Hjelm	34ff6293bd	osc/pt2pt: do not drop/reacquire the ompi_request_lock This lock is now recursive so it is safe to call into the pml without dropping the lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-26 14:19:38 -06:00
Nathan Hjelm	3245428e82	Merge pull request #1535 from kawashima-fj/pr/osc-pt2pt-header-fix osc/pt2pt: Fix a struct name typo	2016-04-14 15:55:25 -06:00
KAWASHIMA Takahiro	35ea9e5c3c	Add FUJITSU copyright	2016-04-12 13:47:53 +09:00
KAWASHIMA Takahiro	39bcbe439a	osc/pt2pt: Fix a struct name typo Fortunately the sizes of `ompi_osc_pt2pt_header_put_t` and `ompi_osc_pt2pt_header_get_t` are the same, so this doesn't affect the behavior.	2016-04-11 20:55:22 +09:00
KAWASHIMA Takahiro	28a0577364	osc/pt2pt: Insert breaks in long lines	2016-04-11 19:06:01 +09:00
KAWASHIMA Takahiro	5ac95df9dc	osc/pt2pt: use two distinct "namespaces" for tags - revised Before this commit, the same PML tag may be used for distinct communications for long messages. For example, consider a condition where rank A calls ```MPI_PUT``` targeting rank B and rank B calls ```MPI_GET``` targeting rank A simultaneously. A PML tag for the ```MPI_PUT``` is acquired on rank A and is used for the long-message communication from rank A to rank B. A PML tag for the ```MPI_GET``` is acquired on rank B and is used for the long-message communication from rank A to rank B. These two tags may take the same value because they are managed independently on each rank. This causes data corruption. This commit separates the tag used in a single RMA communication call into two: one for communication from an origin to a target, and one for communication from a target to an origin. A "base" tag is acquired using the ```get_tag``` function and the PML tag is calculated from the base tag by the ```tag_to_target``` and ```tag_to_origin``` functions.	2016-04-11 19:05:20 +09:00
KAWASHIMA Takahiro	3576ecafa7	Revert "osc/pt2pt: use two distinct "namespaces" for tags" This reverts commit `06ecdb6aa7` to reimplement the fix completely.	2016-04-11 19:04:11 +09:00
Ryan Grant	7cdf50533c	Merge pull request #1314 from francois-wellenreiter/osc_disable_portals4_evt_send OSC portals4 : do not generate an EVENT_SEND to avoid having to filter it	2016-04-07 10:04:27 -06:00
Nathan Hjelm	2ed4501490	osc: fix coverity issues Fix CID 1324726 (#1 of 1): Free of address-of expression (BAD_FREE): Indeed, if a lock conflicts with the lock_all we will end up trying to free an invalid pointer. Fix CID 1328826 (#1 of 1): Dereference after null check (FORWARD_NULL): This was intentional but it would be a good idea to check for module->comm being non_NULL to be safe. Also cleaned out some checks for NULL before free(). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-03-18 09:11:48 -06:00
Nathan Hjelm	deae9e52bf	Merge pull request #1259 from kawashima-fj/pr/osc-sm-align osc/sm: Fix a bus error on MPI_WIN_{POST,START}.	2016-03-15 09:13:38 -06:00
George Bosilca	7c574a3530	Typo.	2016-02-07 07:22:22 +02:00
Nathan Hjelm	5b9c82a964	osc/pt2pt: bug fixes This commit fixes several bugs identified by @ggouaillardet and MTT: - Fix SEGV in long send completion caused by missing update to the request callback data. - Add an MPI_Barrier to the fence short-cut. This fixes potential semantic issues where messages may be received before fence is reached. - Ensure fragments are flushed when using request-based RMA. This allows MPI_Test/MPI_Wait/etc to work as expected. - Restore the tag space back to 16-bits. It was intended that the space be expanded to 32-bits but the required change to the fragment headers was not committed. The tag space may be expanded in a later commit. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-04 16:59:39 -07:00
Gilles Gouaillardet	6eac6a8b00	osc/sm: create datafile into the per proc directory in order to make it unique per communicator Thanks Peter Wind for the report	2016-02-03 10:12:37 +09:00
Nathan Hjelm	519fffb65e	osc/pt2pt: eager sends are always active if MPI_MODE_NOCHECK is used This commit fixes open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:44:17 -07:00
Nathan Hjelm	d7264aa613	osc/pt2pt: various threading fixes This commit fixes several bugs identified by a new multi-threaded RMA benchmarking suite. The following bugs have been identified and fixed: - The code that signaled the actual start of an access epoch changed the eager_send_active flag on a synchronization object without holding the object's lock. This could cause another thread waiting on eager sends to block indefinitely because the entirety of ompi_osc_pt2pt_sync_expected could execute between the check of eager_send_active and the condition wait of ompi_osc_pt2pt_sync_wait. - The bookkeeping of fragments could get screwed up when performing long put/accumulate operations from different threads. This was caused by the fragment flush code at the end of both put and accumulate. This code was put in place to avoid sending a large number of unexpected messages to a peer. To fix the bookkeeping issue we now 1) wait for eager sends to be active before starting any large isends, and 2) keep track of the number of large isends associated with a fragment. If the number of large isends reaches 32 the active fragment is flushed. - Use atomics to update the large receive/send tag counters. This prevents duplicate tags from being used. The tag space has also been updated to use the entire 16-bits of the tag space. These changes should also fix open-mpi/ompi#1299. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-02-02 12:33:33 -07:00
Nathan Hjelm	a19c265ab5	osc/rdma: fix typo in ompi_osc_rdma_complete_atomic The typo caused SEGVs on systems with only fetching atomic support. Fixes open-mpi/ompi#1329 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-26 15:44:07 -07:00
Nathan Hjelm	45da311473	osc/rdma: fix hang when performing large unaligned gets This commit adds code to handle large unaligned gets. There are two possible code paths for these transactions: 1) The remote region and local region have the same alignment. In this case the get will be broken down into at most three get transactions: 1 transaction to get the unaligned start of the region (buffered), 1 transaction to get the aligned portion of the region, and 1 transaction to get the end of the region. 2) The remote and local regions do not have the same alignment. This should be an uncommon case and is not optimized. In this case a buffer is allocated and registered locally to hold the aligned data from the remote region. There may be cases where this fails (low memory, can't register memory). Those conditions are unlikely and will be handled later. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-22 21:06:46 -07:00
Nathan Hjelm	49d2f44b97	osc/rdma: use correct endpoint for local state If atomics are not globally visible (cpu and nic atomics do not mix) then a btl endpoint must be used to access local ranks. To avoid issues that are caused by having the same region registered with multiple handles osc/rdma was updated to always use the handle for rank 0. There was a bug in the update that caused osc/rdma to continue using the local endpoint for accessing the state even though the pointer/handle are not valid for that endpoint. This commit fixes the bug. Fixes open-mpi/ompi#1241. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-22 10:41:27 -07:00
Nathan Hjelm	6180386bea	osc/rdma: disable put aggregation when using threads Optimizing put aggregation in the presence of threads will require a redesign of the code. For now just ensure that put aggregation is turned off when MPI_THREAD_MULTIPLE is enabled. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-01-21 15:50:35 -07:00
Francois WELLENREITER	411b7301c3	OSC portals4 : do not generate an EVENT_SEND to avoid having to filter it	2016-01-20 11:47:46 +01:00
KAWASHIMA Takahiro	ad26899110	osc/sm: Fix a bus error on MPI_WIN_{POST,START}. A bus error occurs in sm OSC under the following conditions. - sparc64 or any other architecture that needs strict alignment. - `MPI_WIN_POST` or `MPI_WIN_START` is called for a window created by sm OSC. - The communicator size is odd and greater than 3. Lines 283-285 in the current `ompi/mca/osc/sm/osc_sm_component.c` contain the following code. ```c module->global_state = (ompi_osc_sm_global_state_t *) (module->segment_base); module->node_states = (ompi_osc_sm_node_state_t *) (module->global_state + 1); module->posts[0] = (uint64_t *) (module->node_states + comm_size); ``` The size of `ompi_osc_sm_node_state_t` is a multiple of 4 but not a multiple of 8. So if `comm_size` is odd, `module->posts[0]` is not aligned to 8 bytes. This causes a bus error when accessing `module->posts[i][j]`. This patch fixes the alignment of `module->posts[0]` by setting `module->posts[0]` first.	2016-01-05 19:04:53 +09:00
Gilles Gouaillardet	06ecdb6aa7	osc/pt2pt: use two distinct "namespaces" for tags	2016-01-05 16:57:37 +09:00
Gilles Gouaillardet	071ae39a44	osc/rdma: add missing #include <alloca.h>	2015-12-24 14:33:58 +09:00
Ralph Castain	ac6289dca6	Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX Cleanup per George's comments	2015-12-17 17:39:15 -08:00
Ralph Castain	3a56f0d34b	Create the pmix external component. Fix a few places where opal/util/argv.h was required when building with an external pmix (go figure). NOTE: Building with external pmix requires that you also build with external libevent and hwloc libraries. Detect this at configure and error out with large message if this requirement is violated. Closes #1204 (replaces it) Fixes #1064	2015-12-15 15:26:13 -08:00
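
The alignment problem described in the osc/sm commit ad26899110 above can be illustrated with a small sketch. The type names, field names, and sizes below are hypothetical stand-ins chosen only to reproduce the "multiple of 4 but not of 8" property; they are not the real `ompi_osc_sm_*` definitions. The "new" layout is one reading of "setting `module->posts[0]` first", and it assumes the segment base itself is 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the osc/sm types (not the actual definitions). */
typedef struct { uint64_t flags; } global_state_t;   /* 8 bytes */
typedef struct { uint32_t a, b, c; } node_state_t;   /* 12 bytes: multiple of 4, not of 8 */

/* Old layout: [global state][node states][posts].
 * Returns the byte offset of the posts array from the segment base. */
static size_t posts_offset_old(size_t comm_size)
{
    return sizeof(global_state_t) + comm_size * sizeof(node_state_t);
}

/* Assumed new layout: [posts][global state][node states].
 * The posts array starts at the segment base, so its offset is always 0. */
static size_t posts_offset_new(size_t comm_size)
{
    (void) comm_size;   /* the offset no longer depends on the communicator size */
    return 0;
}

/* Alternative remedy: round an offset up to a power-of-two alignment. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

With these stand-in sizes, `posts_offset_old(5)` is 68, which is not a multiple of 8, while `posts_offset_old(4)` is 56, which is. Placing the `uint64_t` array first, or padding with `align_up`, avoids the misaligned access for any communicator size.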

411 commits