
7145 Commits

Author SHA1 Message Date
Joshua Ladd
aa8f7f4ede
Merge pull request #7893 from bureddy/cuda-ucx
UCX: initialize cuda from ucx pml component
2020-07-13 14:18:48 -04:00
bosilca
1f237f5fc9
Merge pull request #7419 from bosilca/topic/avx512
Add support for AVX512/AVX2/SSE/MMX
2020-07-13 11:56:50 -04:00
Devendar Bureddy
2547e24c55 UCX: initialize cuda from ucx pml component
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
2020-07-12 18:41:40 +03:00
dongzhong
14b3c70628
Add support for MPI_OP using AVX512, AVX2 and MMX
Add logic to handle different architectural capabilities
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.

Add validation checks for loadu 256 and 512 bits.
Add validation tests for MPI_Op.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-10 21:25:35 -04:00
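The runtime selection described in the commit above (build several flavors, then pick the best match for the current processor) could be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `op_flavor_t` enum, the `CAP_*` masks, and both function names are hypothetical, and the detection relies on the GCC/Clang builtin `__builtin_cpu_supports`, so it reports no capabilities on other compilers or non-x86 targets.

```c
/* Hypothetical flavor identifiers, ordered from least to most capable. */
typedef enum {
    OP_FLAVOR_SCALAR = 0,
    OP_FLAVOR_AVX,
    OP_FLAVOR_AVX2,
    OP_FLAVOR_AVX512
} op_flavor_t;

/* Hypothetical capability bits, one per specialized flavor. */
#define CAP_AVX    0x1u
#define CAP_AVX2   0x2u
#define CAP_AVX512 0x4u

/* Pick the most capable flavor that was both built and is supported by the
 * CPU. Falls back to the scalar implementation when nothing matches. */
op_flavor_t select_flavor(unsigned built, unsigned cpu)
{
    unsigned usable = built & cpu;

    if (usable & CAP_AVX512) return OP_FLAVOR_AVX512;
    if (usable & CAP_AVX2)   return OP_FLAVOR_AVX2;
    if (usable & CAP_AVX)    return OP_FLAVOR_AVX;
    return OP_FLAVOR_SCALAR;
}

/* Query the running processor. Assumption: GCC or Clang on x86; any other
 * configuration reports no SIMD capabilities at all. */
unsigned detect_cpu_caps(void)
{
    unsigned caps = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))     caps |= CAP_AVX;
    if (__builtin_cpu_supports("avx2"))    caps |= CAP_AVX2;
    if (__builtin_cpu_supports("avx512f")) caps |= CAP_AVX512;
#endif
    return caps;
}
```

Keeping the choice in `select_flavor` separate from the hardware query makes the decision logic testable on any machine, since the CPU mask can be supplied by hand.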
Nathan Hjelm
88f51fbb8e btl: change argument type of BTL receive callbacks
This commit updates the btl interface to change the parameters
passed to receive callbacks. The interface used to pass the tag,
a btl base descriptor, and the callback context. Most of the
values in the btl base descriptor were unused and only helped
simplify the callbacks from the self btl. All of the arguments
have now been replaced with a single receive callback descriptor.
This descriptor contains the incoming endpoint, data segment(s),
tag, and callback context. All btls have been updated to use
the new callback and the btl interface version has been bumped
to v3.2.0.

As part of this change the descriptor argument (and the segments
contained within it) have been marked as const. They were treated
as const before but this change could allow the compiler to make
better optimization decisions and will enforce that the callback
does not attempt to change the data in the descriptor.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-07-08 07:38:46 -07:00
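To illustrate the shape of the change described in the commit above, here is a small sketch of a single const receive descriptor carrying the endpoint, data segment(s), tag, and callback context. All type, field, and function names are hypothetical stand-ins chosen for this example; they are not the identifiers used in the Open MPI BTL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data segment: a pointer to received bytes and their length. */
typedef struct {
    const void *addr;
    size_t len;
} sketch_segment_t;

/* Hypothetical receive descriptor bundling everything a callback needs. */
typedef struct {
    const void *endpoint;             /* peer the data arrived from */
    const sketch_segment_t *segments; /* incoming data segment(s) */
    size_t segment_count;
    uint8_t tag;                      /* tag the callback was registered for */
    void *cbdata;                     /* context supplied at registration */
} sketch_recv_descriptor_t;

/* The callback receives one const descriptor instead of separate arguments,
 * so the compiler rejects attempts to modify the descriptor or segments. */
typedef void (*sketch_recv_cb_t)(const sketch_recv_descriptor_t *desc);

/* Example callback: adds the received byte count to a counter passed as
 * the callback context. Only the context is written, never the descriptor. */
void count_bytes_cb(const sketch_recv_descriptor_t *desc)
{
    size_t *total = desc->cbdata;

    for (size_t i = 0; i < desc->segment_count; ++i) {
        total[0] += desc->segments[i].len;
    }
}
```

A line such as `desc->tag = 0;` inside the callback would fail to compile, which is the enforcement the commit message refers to.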
Austen Lauria
dbc56758b6
Merge pull request #7802 from badgerious/mtl_ofi_cqread_break
mtl/ofi: break from progress loop when events are read
2020-07-06 09:20:07 -04:00
Austen Lauria
9b86f1442a
Merge pull request #7823 from jsquyres/pr/put-osc-pt2pt-back
Fix typos in OSC RDMA BTL allowlist
2020-06-30 10:55:16 -04:00
Todd Kordenbrock
4358e75a75
Merge pull request #7866 from tkordenbrock/topic/master/portals4.fix-inappropriate-use-of-abort
portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components
2020-06-30 08:46:03 -05:00
Austen Lauria
a26e494953
Merge pull request #7882 from devreal/osc-rdma-noncontig-requests
osc rdma: check for outstanding fragments before completing a request (II)
2020-06-29 09:51:47 -04:00
Joseph Schuchart
caed3b2eed osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-26 22:19:21 +02:00
Austen Lauria
5fa7ca7c15
Merge pull request #7858 from tkordenbrock/topic/master/portals4.call-pml-add_procs
mtl-portals4: use the active PML to call add_procs()
2020-06-26 14:56:57 -04:00
Joseph Schuchart
2c36d37033
Merge pull request #7871 from devreal/osc-ucx-rget-rput-fetch-alignment
OSC UCX: make sure no-op fetch in rget/rput is properly aligned
2020-06-26 15:58:51 +02:00
Joseph Schuchart
1314ef7668 OSC UCX: Remove stale free from merge conflict
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-25 19:01:53 +02:00
Joseph Schuchart
634f67b216
Merge pull request #7843 from devreal/clang-tidy-free
Some fixups for issues detected by clang-tidy
2020-06-25 17:30:04 +02:00
Artem Polyakov
907f4e196a
Merge pull request #6980 from devreal/ucx-acc-singel-intrinsics
UCX osc: add support for acc_single_intrinsic
2020-06-25 07:39:42 -07:00
Joseph Schuchart
c1f7776341 OSC UCX: make sure no-op fetch in rget/rput is properly aligned
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-25 16:16:58 +02:00
Austen Lauria
7814f4195c
Merge pull request #7845 from devreal/stack-fixes
Fix unexpected optimizations detected by STACK
2020-06-25 08:15:09 -04:00
Todd Kordenbrock
04b94637dd mtl-portals4: replace abort() with ompi_rte_abort()
coll-portals4: replace abort() with ompi_rte_abort()

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2020-06-24 11:31:26 -05:00
Joseph Schuchart
e3b417c776 Add missing copyright header
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
e215eff43d UCX osc: atomic fetch-and-op only on 32 and 64bit values
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
434c9055ee UCX osc: fall back to get-compare-put for unsupported datatypes
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7d5a6e3e8b UCX osc: safely load/store 64bit integer from variable size pointer
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
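One common way to load or store an integer through a pointer whose operand size is only known at run time, as the commit title above describes, is to copy through a correctly sized temporary with `memcpy`. The sketch below assumes operand sizes of 1, 2, 4, or 8 bytes; the function names and return codes are hypothetical and are not taken from the commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load an unsigned integer of `size` bytes from `ptr` into a 64-bit value.
 * memcpy avoids unaligned and type-punned accesses. Returns 0 on success,
 * -1 for an unsupported size. */
int load_u64(const void *ptr, size_t size, uint64_t *out)
{
    switch (size) {
    case 1: { uint8_t v;  memcpy(&v, ptr, sizeof v); *out = v; return 0; }
    case 2: { uint16_t v; memcpy(&v, ptr, sizeof v); *out = v; return 0; }
    case 4: { uint32_t v; memcpy(&v, ptr, sizeof v); *out = v; return 0; }
    case 8: { uint64_t v; memcpy(&v, ptr, sizeof v); *out = v; return 0; }
    default: return -1;
    }
}

/* Store the low `size` bytes of `value` at `ptr`; wider values are
 * truncated. Returns 0 on success, -1 for an unsupported size. */
int store_u64(void *ptr, size_t size, uint64_t value)
{
    switch (size) {
    case 1: { uint8_t v  = (uint8_t)value;  memcpy(ptr, &v, sizeof v); return 0; }
    case 2: { uint16_t v = (uint16_t)value; memcpy(ptr, &v, sizeof v); return 0; }
    case 4: { uint32_t v = (uint32_t)value; memcpy(ptr, &v, sizeof v); return 0; }
    case 8: { memcpy(ptr, &value, sizeof value); return 0; }
    default: return -1;
    }
}
```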
Joseph Schuchart
5f786bcce4 UCX osc: make MPI_Fetch_and_op non-blocking if possible
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d8696aa8c4 UCX osc: centralize decision on whether to use AMOs
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
427d4bd226 UCX osc: do not acquire accumulate lock if exclusive lock was taken
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
471d76777a UCX osc: fence active operations before releasing accumulate lock and free memory if required
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
4d7a3856fa UCX osc: Use accumulate for operations/datatypes that are not covered by UCX
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
899f58cef5 UCX osc: simplify output address computation
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d888b4fd76 UCX osc: correctly handle MPI_NO_OP
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7cfc0e71da UCX osc: allow asynchronous compare-and-swap
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
557ae80858 UCX osc: allow for overlap with (some) request-based atomic operations
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
1a3c6bbf35 UCX osc: re-use value returned by cswap to save additional get
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
8606a02b87 UCX osc: fix macro parameter name usage in OMPI_OSC_UCX_REQUEST_RETURN
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d448efd49c UCX osc: properly clean up requests in case of errors
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
73a183408f UCX osc: add support for acc_single_intrinsic info key / mca param
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Todd Kordenbrock
0a637967fa Use the active PML to call add_procs()
ompi_mtl_portals4_get_endpoint() was incorrectly making a direct
call to ompi_mtl_portals4_add_procs().  Instead use the active PML
to call add_procs().  If add_procs() fails, call ompi_rte_abort()
to terminate the job.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2020-06-22 16:56:16 -05:00
Nathan Hjelm
a3e276fb03
Merge pull request #7829 from devreal/osc-rdma-noncontig-requests
osc rdma: check for outstanding transfers before completing a request
2020-06-22 08:43:29 -06:00
Joseph Schuchart
d9d18acd49 Fix unintended optimizations detected by STACK
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-22 10:32:22 +02:00
Joseph Schuchart
d310a20ecb Add missing free calls to mca_topo_treematch_dist_graph_create
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 14:30:07 +02:00
Joseph Schuchart
e23dcca448 Add missing free calls to osc/ucx
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 14:30:07 +02:00
Joseph Schuchart
ede3c0840a Add missing free calls to osc/sm component_select
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:33:34 +02:00
Joseph Schuchart
d9b11b29cd Properly free memory in case of error in mca_common_ompio_prepare_to_group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:31:14 +02:00
Joseph Schuchart
ed1ca1a84b Don't free memory escaping mca_common_ompio_prepare_to_group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:30:38 +02:00
Joseph Schuchart
9a60f5b7fb Add missing free calls to ompi_coll_base_reduce_intra_basic_linear
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:27:36 +02:00
Joseph Schuchart
8e24c0d532 Add missing free calls to ompi_coll_base_allgather_intra_bruck
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:24:56 +02:00
Jeff Squyres
18cfcc8b70 osc/rdma: update supported BTL list
"openib" no longer exists.

"tcp" had a typo.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-06-16 09:11:01 -07:00
Joseph Schuchart
85ed26f2f8 osc rdma: check for outstanding fragments before completing a request
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-16 17:45:00 +02:00
Edgar Gabriel
4a8a330bba common/ompio: use avg. file view size in the aggregator selection logic
This is a fix based on a bug report on GitHub/mailing list from CGNS.
The core of the problem was that different processes entered different branches of
our aggregator selection logic, due to the fact that in some cases processes had
a matching file_view size and contiguous chunk size (thus assuming 1-D distribution),
and some processes did not (thus assuming 2-D distribution). The fix is to calculate
the avg. file view size across all processes and use this value, thus ensuring that
all processes enter the same branch.

Fixes issue #7809

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-06-15 09:17:44 -05:00
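The problem and fix described in the commit above can be illustrated with a small, MPI-free sketch. An array stands in for the per-process file view sizes; in a real MPI program the sum would come from a collective such as `MPI_Allreduce`. The function names, the `dist_t` enum, and the assumption that every process compares against the same contiguous chunk size are all hypothetical simplifications, not the actual ompio logic.

```c
#include <stddef.h>

/* Hypothetical labels for the two branches of the selection logic. */
typedef enum { DIST_1D, DIST_2D } dist_t;

/* Average file view size across all processes. Assumes nprocs > 0. */
long avg_file_view_size(const long *sizes, size_t nprocs)
{
    long sum = 0;

    for (size_t i = 0; i < nprocs; ++i) {
        sum += sizes[i];
    }
    return sum / (long)nprocs;
}

/* A matching file view size and contiguous chunk size is taken to mean a
 * 1-D distribution; anything else is treated as 2-D. */
dist_t choose_distribution(long file_view_size, long contig_chunk_size)
{
    return file_view_size == contig_chunk_size ? DIST_1D : DIST_2D;
}
```

With sizes of 100, 100, and 40 and a chunk size of 100, deciding on the local size sends the first two processes down the 1-D branch and the third down the 2-D branch. Deciding on the shared average (80) gives every process the same answer.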
Eric Badger
35dbc18df5 mtl/ofi: do not repeat fi_cq_read() after events are read
Once any number of events are read, return immediately, rather than
waiting for fi_cq_read() to return FI_EAGAIN or an error. This can
improve observed latency if the user application is in a blocking call
waiting for us to return. Deleting the while loop here also means
ofi_progress_event_count serves as an upper bound for the total number
of events read in a single call (with the while loop we might read far
more, as long as new events continue to arrive).

Signed-off-by: Eric Badger <eric@badgerio.us>
2020-06-11 10:07:37 -07:00
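The control-flow change described in the commit above, returning as soon as one read yields events rather than looping until the queue is empty, might look roughly like this. The sketch uses a mock completion queue so that it is self-contained; `mock_cq_t`, `mock_cq_read`, `progress_once`, and `SKETCH_EAGAIN` are hypothetical names, and the real code calls `fi_cq_read()` from libfabric instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "no events available" return value. */
#define SKETCH_EAGAIN (-11L)

/* Mock completion queue that only tracks how many events are pending. */
typedef struct {
    size_t pending;
} mock_cq_t;

/* Mock read: hands out up to `count` events, or SKETCH_EAGAIN if empty. */
long mock_cq_read(mock_cq_t *cq, size_t count)
{
    size_t n;

    if (cq->pending == 0) {
        return SKETCH_EAGAIN;
    }
    n = cq->pending < count ? cq->pending : count;
    cq->pending -= n;
    return (long)n;
}

/* One progress call: a single read, with no loop. `event_count` is
 * therefore an upper bound on the events handled per call. Returns the
 * number of events read, 0 if none were available, or a negative error. */
long progress_once(mock_cq_t *cq, size_t event_count)
{
    long ret = mock_cq_read(cq, event_count);

    if (ret > 0) {
        /* Events would be processed here; then return immediately. */
        return ret;
    }
    if (ret == SKETCH_EAGAIN) {
        return 0;
    }
    return ret; /* propagate any other error */
}
```

With ten pending events and an `event_count` of four, successive calls handle four, four, and two events, and the caller regains control between each batch.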
Sergey Oblomov
df0f2ac026 OMPI/HCOLL: fixed typo in vars description
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-05-29 20:13:35 +03:00