Howard Pritchard
e6f81ed6d6
ofi mtl: fix problem with mrecv
...
The ofi mtl mrecv was not properly setting the message in/out
argument of MPI_MRECV to MPI_MESSAGE_NULL.
Signed-off-by: Howard Pritchard <hppritcha@gmail.com>
2020-08-18 15:39:19 -06:00
Tomislav Janjusic
cbfc9a3263
opal/mca/common/ucx: Use new TSD api
...
Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Tomislav Janjusic
27ba4b612f
ompi/osc/ucx: Remove workerpool's global thread storage tables.
...
Co-authored-by: Artem Y. Polyakov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-07-30 00:21:26 +03:00
Brian Barrett
41df122083
Merge pull request #7730 from wckzhang/newdefaults
...
coll/tuned: Change the default collective algorithm selection
2020-07-28 15:27:46 -07:00
William Zhang
ce40cfbaa5
coll/tuned: Change the default collective algorithm selection
...
The default algorithm selections were out of date and not performing
well. After gathering data from OMPI developers, new default algorithm
decisions were selected for:
allgather
allgatherv
allreduce
alltoall
alltoallv
barrier
bcast
gather
reduce
reduce_scatter_block
reduce_scatter
scatter
These results were gathered using the ompi-collectives-tuning package
and then averaged amongst the results gathered from multiple OMPI
developers on their clusters.
You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3
Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-28 10:41:48 -07:00
Jeff Squyres
c07d77fbf2
Merge pull request #7957 from bosilca/fix/avx_alignment
...
Use the unaligned SSE memory access primitive.
2020-07-27 15:50:40 -04:00
Joshua Ladd
366e92ce54
Merge pull request #7860 from vspetrov/hcoll_reduce_scatter
...
Coll/Hcoll: reduce_scatter(block) interface
2020-07-22 09:45:34 -04:00
George Bosilca
b6d71aa893
Use the unaligned SSE memory access primitive.
...
Alter the test to validate misaligned data.
Fixes #7954.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-22 01:19:12 -04:00
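The idea behind the commit above can be illustrated with a minimal sketch. It is not the code from the commit: the helper name `add_int32_any_alignment` is hypothetical, and only the SSE2 intrinsics `_mm_loadu_si128`, `_mm_storeu_si128`, and `_mm_add_epi32` from `<emmintrin.h>` are real. The unaligned load/store variants accept any address, whereas the aligned ones (`_mm_load_si128`, `_mm_store_si128`) require 16-byte alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: inout[i] += in[i] for `count` 32-bit integers.
 * Neither buffer has to be aligned in any way. */
void add_int32_any_alignment(const void *in, void *inout, size_t count)
{
    const unsigned char *src = (const unsigned char *) in;
    unsigned char *dst = (unsigned char *) inout;
    size_t i = 0;

#if defined(__SSE2__)
    /* Vector path: four int32 values per step, using unaligned accesses. */
    for (; i + 4 <= count; i += 4) {
        __m128i a = _mm_loadu_si128((const __m128i *) (src + i * sizeof(int32_t)));
        __m128i b = _mm_loadu_si128((const __m128i *) (dst + i * sizeof(int32_t)));
        _mm_storeu_si128((__m128i *) (dst + i * sizeof(int32_t)),
                         _mm_add_epi32(a, b));
    }
#endif

    /* Scalar tail (and fallback without SSE2): memcpy avoids misaligned
     * int32_t dereferences, which would be undefined behavior in C. */
    for (; i < count; ++i) {
        int32_t a, b;
        memcpy(&a, src + i * sizeof(int32_t), sizeof a);
        memcpy(&b, dst + i * sizeof(int32_t), sizeof b);
        b += a;
        memcpy(dst + i * sizeof(int32_t), &b, sizeof b);
    }
}
```

A test in the spirit of "validate misaligned data" would call the helper on buffers that start at an odd byte offset inside a larger array.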
Joseph Schuchart
60aa97b301
Merge pull request #7948 from devreal/osc-rdma-check-endpoints
...
osc/rdma: fail query_btls if no endpoint for non-local peer is found
2020-07-20 15:14:25 +02:00
bosilca
1139d9ecae
Merge pull request #7931 from bosilca/fix/7928
...
Fix the BTL API conversion for the SMCUDA BTL
2020-07-18 17:35:39 -04:00
Joseph Schuchart
eebc451ec8
osc/rdma: fail query_btls if no endpoint for non-local peer is found
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-07-16 17:06:35 +02:00
George Bosilca
96e8cbe25f
First step on fixing the BTL API conversion for the SMCUDA BTL
...
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-13 14:46:10 -04:00
Joshua Ladd
aa8f7f4ede
Merge pull request #7893 from bureddy/cuda-ucx
...
UCX: initialize cuda from ucx pml component
2020-07-13 14:18:48 -04:00
bosilca
1f237f5fc9
Merge pull request #7419 from bosilca/topic/avx512
...
Add support for AVX512/AVX2/SSE/MMX
2020-07-13 11:56:50 -04:00
Devendar Bureddy
2547e24c55
UCX: initialize cuda from ucx pml component
...
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
2020-07-12 18:41:40 +03:00
dongzhong
14b3c70628
Add support for MPI_OP using AVX512, AVX2 and MMX
...
Add logic to handle different architectural capabilities.
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.
Add validation checks for loadu 256 and 512 bits.
Add validation tests for MPI_Op.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-10 21:25:35 -04:00
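The runtime selection described in the commit above could look roughly like the following sketch. It is an assumption-laden illustration, not the Open MPI implementation: the enum, the function names, and the dispatch table are hypothetical, and the detection relies on the GCC/Clang builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, which are only available on x86 targets. For brevity every table slot points at the same scalar routine; in a real build each slot would point at a version compiled with the matching compiler flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flavor names, ordered from least to most capable. */
typedef enum {
    FLAVOR_SCALAR = 0,
    FLAVOR_AVX,
    FLAVOR_AVX2,
    FLAVOR_AVX512
} op_flavor_t;

/* Pick the most capable flavor the current processor supports. */
op_flavor_t detect_best_flavor(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return FLAVOR_AVX512;
    if (__builtin_cpu_supports("avx2"))    return FLAVOR_AVX2;
    if (__builtin_cpu_supports("avx"))     return FLAVOR_AVX;
#endif
    return FLAVOR_SCALAR;
}

typedef void (*sum_fn_t)(const int32_t *in, int32_t *inout, size_t count);

/* Portable fallback: inout[i] += in[i]. */
void sum_int32_scalar(const int32_t *in, int32_t *inout, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        inout[i] += in[i];
    }
}

/* Map a flavor to an implementation. Assumption: all slots reuse the
 * scalar routine here; specialized builds would replace them. */
sum_fn_t select_sum_int32(op_flavor_t flavor)
{
    static const sum_fn_t table[] = {
        sum_int32_scalar, /* FLAVOR_SCALAR */
        sum_int32_scalar, /* FLAVOR_AVX    */
        sum_int32_scalar, /* FLAVOR_AVX2   */
        sum_int32_scalar  /* FLAVOR_AVX512 */
    };
    return table[flavor];
}
```

Detecting once and caching the resulting function pointer keeps the capability check out of the per-call path.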
Nathan Hjelm
88f51fbb8e
btl: change argument type of BTL receive callbacks
...
This commit updates the btl interface to change the parameters
passed to receive callbacks. The interface used to pass the tag,
a btl base descriptor, and the callback context. Most of the
values in the btl base descriptor were unused and only helped
simplify the callbacks from the self btl. All of the arguments
have now been replaced with a single receive callback descriptor.
This descriptor contains the incoming endpoint, data segment(s),
tag, and callback context. All btls have been updated to use
the new callback and the btl interface version has been bumped
to v3.2.0.
As part of this change the descriptor argument (and the segments
contained within it) have been marked as const. They were treated
as const before but this change could allow the compiler to make
better optimization decisions and will enforce that the callback
does not attempt to change the data in the descriptor.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-07-08 07:38:46 -07:00
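The shape of the callback change described in the commit above can be sketched as follows. All type and field names here are hypothetical stand-ins chosen for illustration, not the identifiers used in the BTL headers. The sketch only shows the pattern: one const descriptor carrying the endpoint, segment(s), tag, and callback context replaces several separate arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical data segment: a pointer/length pair. */
typedef struct {
    const void *addr;
    size_t len;
} example_segment_t;

/* Hypothetical receive descriptor bundling everything a callback needs. */
typedef struct {
    int endpoint_id;                   /* stand-in for an endpoint handle */
    const example_segment_t *segments; /* incoming data segment(s) */
    size_t segment_count;
    uint8_t tag;
    void *cbdata;                      /* callback context */
} example_recv_descriptor_t;

/* The callback takes a single const descriptor, so the compiler rejects
 * any attempt to modify the descriptor or its segments. */
typedef void (*example_recv_cb_t)(const example_recv_descriptor_t *desc);

/* Example callback: add the size of all segments to a counter that is
 * passed in through the callback context. */
void count_bytes_cb(const example_recv_descriptor_t *desc)
{
    size_t *total = (size_t *) desc->cbdata;
    for (size_t i = 0; i < desc->segment_count; ++i) {
        *total += desc->segments[i].len;
    }
}
```

Writing to `desc->tag` or `desc->segments[i].len` inside the callback would fail to compile, which is the enforcement the commit message refers to; the context that `cbdata` points to stays writable.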
Austen Lauria
dbc56758b6
Merge pull request #7802 from badgerious/mtl_ofi_cqread_break
...
mtl/ofi: break from progress loop when events are read
2020-07-06 09:20:07 -04:00
Austen Lauria
9b86f1442a
Merge pull request #7823 from jsquyres/pr/put-osc-pt2pt-back
...
Fix typos in OSC RDMA BTL allowlist
2020-06-30 10:55:16 -04:00
Todd Kordenbrock
4358e75a75
Merge pull request #7866 from tkordenbrock/topic/master/portals4.fix-inappropriate-use-of-abort
...
portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components
2020-06-30 08:46:03 -05:00
Valentin Petrov
d366812030
coll/hcoll: compile warning fix
...
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2020-06-30 09:44:46 +03:00
Valentin Petrov
1d54071fc1
coll/hcoll: reduce_scatter(block) interface
...
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2020-06-30 09:44:46 +03:00
Austen Lauria
a26e494953
Merge pull request #7882 from devreal/osc-rdma-noncontig-requests
...
osc rdma: check for outstanding fragments before completing a request (II)
2020-06-29 09:51:47 -04:00
Joseph Schuchart
caed3b2eed
osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-26 22:19:21 +02:00
Austen Lauria
5fa7ca7c15
Merge pull request #7858 from tkordenbrock/topic/master/portals4.call-pml-add_procs
...
mtl-portals4: use the active PML to call add_procs()
2020-06-26 14:56:57 -04:00
Joseph Schuchart
2c36d37033
Merge pull request #7871 from devreal/osc-ucx-rget-rput-fetch-alignment
...
OSC UCX: make sure no-op fetch in rget/rput is properly aligned
2020-06-26 15:58:51 +02:00
Joseph Schuchart
1314ef7668
OSC UCX: Remove stale free from merge conflict
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-25 19:01:53 +02:00
Joseph Schuchart
634f67b216
Merge pull request #7843 from devreal/clang-tidy-free
...
Some fixups for issues detected by clang-tidy
2020-06-25 17:30:04 +02:00
Artem Polyakov
907f4e196a
Merge pull request #6980 from devreal/ucx-acc-singel-intrinsics
...
UCX osc: add support for acc_single_intrinsic
2020-06-25 07:39:42 -07:00
Joseph Schuchart
c1f7776341
OSC UCX: make sure no-op fetch in rget/rput is properly aligned
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-25 16:16:58 +02:00
Austen Lauria
7814f4195c
Merge pull request #7845 from devreal/stack-fixes
...
Fix unexpected optimizations detected by STACK
2020-06-25 08:15:09 -04:00
Todd Kordenbrock
04b94637dd
mtl-portals4: replace abort() with ompi_rte_abort()
...
coll-portals4: replace abort() with ompi_rte_abort()
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2020-06-24 11:31:26 -05:00
Joseph Schuchart
e3b417c776
Add missing copyright header
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
e215eff43d
UCX osc: atomic fetch-and-op only on 32 and 64bit values
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
434c9055ee
UCX osc: fall back to get-compare-put for unsupported datatypes
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7d5a6e3e8b
UCX osc: safely load/store 64bit integer from variable size pointer
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
5f786bcce4
UCX osc: make MPI_Fetch_and_op non-blocking if possible
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d8696aa8c4
UCX osc: centralize decision on whether to use AMOs
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
427d4bd226
UCX osc: do not acquire accumulate lock if exclusive lock was taken
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
471d76777a
UCX osc: fence active operations before releasing accumulate lock and free memory if required
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
4d7a3856fa
UCX osc: Use accumulate for operations/datatypes that are not covered by UCX
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
899f58cef5
UCX osc: simplify output address computation
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d888b4fd76
UCX osc: correctly handle MPI_NO_OP
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7cfc0e71da
UCX osc: allow asynchronous compare-and-swap
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
557ae80858
UCX osc: allow for overlap with (some) request-based atomic operations
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
1a3c6bbf35
UCX osc: re-use value returned by cswap to save additional get
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
8606a02b87
UCX osc: fix macro parameter name usage in OMPI_OSC_UCX_REQUEST_RETURN
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d448efd49c
UCX osc: properly clean up requests in case of errors
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
73a183408f
UCX osc: add support for acc_single_intrinsic info key / mca param
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Todd Kordenbrock
0a637967fa
Use the active PML to call add_procs()
...
ompi_mtl_portals4_get_endpoint() was incorrectly making a direct
call to ompi_mtl_portals4_add_procs(). Instead use the active PML
to call add_procs(). If add_procs() fails, call ompi_rte_abort()
to terminate the job.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2020-06-22 16:56:16 -05:00