openmpi

Author	SHA1	Message	Date
Joshua Ladd	aa8f7f4ede	Merge pull request #7893 from bureddy/cuda-ucx UCX: initialize cuda from ucx pml component	2020-07-13 14:18:48 -04:00
bosilca	1f237f5fc9	Merge pull request #7419 from bosilca/topic/avx512 Add support for AVX512/AVX2/SSE/MMX	2020-07-13 11:56:50 -04:00
Devendar Bureddy	2547e24c55	UCX: initialize cuda from ucx pml component Signed-off-by: Devendar Bureddy <devendar@mellanox.com>	2020-07-12 18:41:40 +03:00
dongzhong	14b3c70628	Add support for MPI_OP using AVX512, AVX2 and MMX Add logic to handle different architectural capabilities Detect the compiler flags necessary to build specialized versions of the MPI_OP. Once the different flavors (AVX512, AVX2, AVX) are built, detect at runtime which is the best match with the current processor capabilities. Add validation checks for loadu 256 and 512 bits. Add validation tests for MPI_Op. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: dongzhong <zhongdong0321@hotmail.com> Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-07-10 21:25:35 -04:00
Nathan Hjelm	88f51fbb8e	btl: change argument type of BTL receive callbacks This commit updates the btl interface to change the parameters passed to receive callbacks. The interface used to pass the tag, a btl base descriptor, and the callback context. Most of the values in the btl base descriptor were unused and only helped simplify the callbacks from the self btl. All of the arguments have now been replaced with a single receive callback descriptor. This descriptor contains the incoming endpoint, data segment(s), tag, and callback context. All btls have been updated to use the new callback and the btl interface version has been bumped to v3.2.0. As part of this change the descriptor argument (and the segments contained within it) have been marked as const. They were treated as const before but this change could allow the compiler to make better optimization decisions and will enforce that the callback does not attempt to change the data in the descriptor. Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2020-07-08 07:38:46 -07:00
Austen Lauria	dbc56758b6	Merge pull request #7802 from badgerious/mtl_ofi_cqread_break mtl/ofi: break from progress loop when events are read	2020-07-06 09:20:07 -04:00
Austen Lauria	9b86f1442a	Merge pull request #7823 from jsquyres/pr/put-osc-pt2pt-back Fix typos in OSC RDMA BTL allowlist	2020-06-30 10:55:16 -04:00
Todd Kordenbrock	4358e75a75	Merge pull request #7866 from tkordenbrock/topic/master/portals4.fix-inappropriate-use-of-abort portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components	2020-06-30 08:46:03 -05:00
Austen Lauria	a26e494953	Merge pull request #7882 from devreal/osc-rdma-noncontig-requests osc rdma: check for outstanding fragments before completing a request (II)	2020-06-29 09:51:47 -04:00
Joseph Schuchart	caed3b2eed	osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-26 22:19:21 +02:00
Austen Lauria	5fa7ca7c15	Merge pull request #7858 from tkordenbrock/topic/master/portals4.call-pml-add_procs mtl-portals4: use the active PML to call add_procs()	2020-06-26 14:56:57 -04:00
Joseph Schuchart	2c36d37033	Merge pull request #7871 from devreal/osc-ucx-rget-rput-fetch-alignment OSC UCX: make sure no-op fetch in rget/rput is properly aligned	2020-06-26 15:58:51 +02:00
Joseph Schuchart	1314ef7668	OSC UCX: Remove stale free from merge conflict Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-25 19:01:53 +02:00
Joseph Schuchart	634f67b216	Merge pull request #7843 from devreal/clang-tidy-free Some fixups for issues detected by clang-tidy	2020-06-25 17:30:04 +02:00
Artem Polyakov	907f4e196a	Merge pull request #6980 from devreal/ucx-acc-singel-intrinsics UCX osc: add support for acc_single_intrinsic	2020-06-25 07:39:42 -07:00
Joseph Schuchart	c1f7776341	OSC UCX: make sure no-op fetch in rget/rput is properly aligned Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-25 16:16:58 +02:00
Austen Lauria	7814f4195c	Merge pull request #7845 from devreal/stack-fixes Fix unexpected optimizations detected by STACK	2020-06-25 08:15:09 -04:00
Todd Kordenbrock	04b94637dd	mtl-portals4: replace abort() with ompi_rte_abort() coll-portals4: replace abort() with ompi_rte_abort() Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2020-06-24 11:31:26 -05:00
Joseph Schuchart	e3b417c776	Add missing copyright header Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	e215eff43d	UCX osc: atomic fetch-and-op only on 32 and 64bit values Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	434c9055ee	UCX osc: fall back to get-compare-put for unsupported datatypes Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	7d5a6e3e8b	UCX osc: safely load/store 64bit integer from variable size pointer Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	5f786bcce4	UCX osc: make MPI_Fetch_and_op non-blocking if possible Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	d8696aa8c4	UCX osc: centralize decision on whether to use AMOs Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	427d4bd226	UCX osc: do not acquire accumulate lock if exclusive lock was taken Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	471d76777a	UCX osc: fence active operations before releasing accumulate lock and free memory if required Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	4d7a3856fa	UCX osc: Use accumulate for operations/datatypes that are not covered by UCX Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	899f58cef5	UCX osc: simplify output address computation Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	d888b4fd76	UCX osc: correctly handle MPI_NO_OP Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	7cfc0e71da	UCX osc: allow to asynchronously compare-and-swap Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	557ae80858	UCX osc: allow for overlap with (some) request-based atomic operations Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	1a3c6bbf35	UCX osc: re-use value returned by cswap to save additional get Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	8606a02b87	UCX osc: fix macro parameter name usage in OMPI_OSC_UCX_REQUEST_RETURN Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	d448efd49c	UCX osc: properly clean up requests in case of errors Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Joseph Schuchart	73a183408f	UCX osc: add support for acc_single_intrinsic info key / mca param Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-23 12:41:52 +02:00
Todd Kordenbrock	0a637967fa	Use the active PML to call add_procs() ompi_mtl_portals4_get_endpoint() was incorrectly making a direct call to ompi_mtl_portals4_add_procs(). Instead use the active PML to call add_procs(). If add_procs() fails, call ompi_rte_abort() to terminate the job. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2020-06-22 16:56:16 -05:00
Nathan Hjelm	a3e276fb03	Merge pull request #7829 from devreal/osc-rdma-noncontig-requests osc rdma: check for outstanding transfers before completing a request	2020-06-22 08:43:29 -06:00
Joseph Schuchart	d9d18acd49	Fix unintended optimizations detected by STACK Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-22 10:32:22 +02:00
Joseph Schuchart	d310a20ecb	Add missing free calls to mca_topo_treematch_dist_graph_create Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 14:30:07 +02:00
Joseph Schuchart	e23dcca448	Add missing free calls to osc/ucx Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 14:30:07 +02:00
Joseph Schuchart	ede3c0840a	Add missing free calls to osc/sm component_select Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:33:34 +02:00
Joseph Schuchart	d9b11b29cd	Properly free memory in case of error in mca_common_ompio_prepare_to_group Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:31:14 +02:00
Joseph Schuchart	ed1ca1a84b	Don't free memory escaping mca_common_ompio_prepare_to_group Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:30:38 +02:00
Joseph Schuchart	9a60f5b7fb	Add missing free calls to ompi_coll_base_reduce_intra_basic_linear Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:27:36 +02:00
Joseph Schuchart	8e24c0d532	Add missing free calls to ompi_coll_base_allgather_intra_bruck Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:24:56 +02:00
Jeff Squyres	18cfcc8b70	osc/rdma: update supported BTL list "openib" no longer exists. "tcp" had a typo. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-06-16 09:11:01 -07:00
Joseph Schuchart	85ed26f2f8	osc rdma: check for outstanding fragments before completing a request Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-16 17:45:00 +02:00
Edgar Gabriel	4a8a330bba	common/ompio: use avg. file view size in the aggregator selection logic This is a fix based on a bug report on github/mailing list from CGNS. The core of the problem was that different processes entered different branches of our aggregator selection logic, due to the fact that in some cases processes had a matching file_view size and contiguous chunk size (thus assuming 1-D distribution), and some processes did not (thus assuming 2-D distribution). The fix is to calculate the avg. file view size across all processes and use this value, thus ensuring that all processes enter the same branch. Fixes issue #7809 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2020-06-15 09:17:44 -05:00
Eric Badger	35dbc18df5	mtl/ofi: do not repeat fi_cq_read() after events are read Once any number of events are read, return immediately, rather than waiting for fi_cq_read() to return FI_EAGAIN or an error. This can improve observed latency if the user application is in a blocking call waiting for us to return. Deleting the while loop here also means ofi_progress_event_count serves as an upper bound for the total number of events read in a single call (with the while loop we might read far more, as long as new events continue to arrive). Signed-off-by: Eric Badger <eric@badgerio.us>	2020-06-11 10:07:37 -07:00
Sergey Oblomov	df0f2ac026	OMPI/HCOLL: fixed typo in vars description Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2020-05-29 20:13:35 +03:00

7145 Commits