openmpi

Author	SHA1	Message	Date
Nathan Hjelm	7f4872d483	osc/rdma: performance improvements and bug fixes This commit is a large update to the osc/rdma component. Included in this commit: - Add support for using hardware atomics for fetch-and-op and single count accumulate when using the accumulate lock. This will improve the performance of these operations even when not setting the single intrinsic info key. - Rework how large accumulates are done. They now block on the get operation to fix some bugs discovered by an IBM one-sided test. I may roll back some of the changes if the underlying bug in the original design is discovered. There appears to be no real difference in performance (on the hardware this was tested with), so it's probably a non-issue. References #2530. - Add support for an additional lock-all algorithm: on-demand. The on-demand algorithm will attempt to acquire the peer lock when starting an RMA operation. The lock algorithm default has not changed. The algorithm can be selected by setting the osc_rdma_locking_mode MCA variable. The valid values are two_level and on_demand. - Make use of the btl_flush function if available. This can improve performance with some btls. - When using btl_flush, do not keep track of the number of put operations. This reduces the number of atomic operations in the critical path. - Make the window buffers more friendly to multi-threaded applications. This was done by dropping support for multiple buffers per MPI window. I intend to re-add that support once the underlying performance bug under the old buffering scheme is fixed. - Fix a bug in request completion in the accumulate, get, and put paths. This also helps with #2530. - General code cleanup and fixes. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-03-15 14:53:53 -06:00
Nathan Hjelm	45db3637af	osc/rdma: bug fixes This commit fixes the following bugs: - Allow a btl to be used for communication if it can communicate with all non-self peers and it supports global atomic visibility. In this case CPU atomics can be used for self and the btl for any other peer. - It was possible to get into a state where different threads of an MPI process could issue conflicting accumulate operations to a remote peer. To eliminate this race we now update the peer flags atomically. - Queue up and re-issue put operations that failed during a BTL callback. This can occur during an accumulate operation. This was an unhandled error case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-29 12:43:58 -07:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Nathan Hjelm	1ce5847e8b	osc/rdma: add support for network AMOs This commit adds support for using network AMOs for MPI_Accumulate, MPI_Fetch_and_op, and MPI_Compare_and_swap. This support is only enabled if the ompi_single_intrinsic info key is specified or the acc_single_intrinsic MCA variable is set. This configuration indicates to this implementation that no long accumulates will be performed since these do not currently mix with the AMO implementation. This commit also cleans up the code somewhat. This includes removing unnecessary struct keywords where the type is also typedef'd. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-09-01 15:47:33 -06:00
Jeff Squyres	33dd8ca81e	osc_rdma_peer: properly include ompi_config.h Thanks to Paul Hargrove for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-03 07:39:55 -07:00
Nathan Hjelm	7bda3eb2dc	osc/rdma: fix global index array calculation This commit fixes a bug that occurs when ranks are either not mapped evenly or by something other than core. Fixes open-mpi/ompi#1599 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-28 19:11:11 -06:00
Gilles Gouaillardet	071ae39a44	osc/rdma: add missing #include <alloca.h>	2015-12-24 14:33:58 +09:00
Ralph Castain	ac6289dca6	Cleanup the warnings from the ompi layer when compiling optimized under Mac OSX Cleanup per George's comments	2015-12-17 17:39:15 -08:00
Nathan Hjelm	9476c7bbca	osc/rdma: use standard verbosity levels Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-21 12:31:41 -06:00
Nathan Hjelm	d8df9d414d	osc/rdma: add true RDMA one-sided component This commit adds support for performing one-sided operations over supported hardware (currently InfiniBand and Cray Gemini/Aries). This component is still undergoing active development. Current features: - Use network atomic operations (fadd, cswap) for implementing locking and PSCW synchronization. - Aggregate small contiguous puts. - Reduced memory footprint by storing window data (pointer, keys, etc.) at the lowest rank on each node. The data is fetched as each process needs to communicate with a new peer. This is a trade-off between the performance of the first operation on a peer and the memory utilization of a window. TODO: - Add support for the accumulate_ops info key. If it is known that the same op or same op/no op is used, it may be possible to use hardware atomics for fetch-and-op and compare-and-swap. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-09-16 15:01:33 -06:00

11 Commits