Joseph Schuchart
2c36d37033
Merge pull request #7871 from devreal/osc-ucx-rget-rput-fetch-alignment
...
OSC UCX: make sure no-op fetch in rget/rput is properly aligned
2020-06-26 15:58:51 +02:00
Joseph Schuchart
1314ef7668
OSC UCX: Remove stale free from merge conflict
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-25 19:01:53 +02:00
Joseph Schuchart
634f67b216
Merge pull request #7843 from devreal/clang-tidy-free
...
Some fixups for issues detected by clang-tidy
2020-06-25 17:30:04 +02:00
Artem Polyakov
907f4e196a
Merge pull request #6980 from devreal/ucx-acc-singel-intrinsics
...
UCX osc: add support for acc_single_intrinsic
2020-06-25 07:39:42 -07:00
Joseph Schuchart
c1f7776341
OSC UCX: make sure no-op fetch in rget/rput is properly aligned
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-25 16:16:58 +02:00
Joseph Schuchart
e3b417c776
Add missing copyright header
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
e215eff43d
UCX osc: atomic fetch-and-op only on 32 and 64bit values
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
434c9055ee
UCX osc: fall back to get-compare-put for unsupported datatypes
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7d5a6e3e8b
UCX osc: safely load/store 64bit integer from variable size pointer
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
5f786bcce4
UCX osc: make MPI_Fetch_and_op non-blocking if possible
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d8696aa8c4
UCX osc: centralize decision on whether to use AMOs
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
427d4bd226
UCX osc: do not acquire accumulate lock if exclusive lock was taken
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
471d76777a
UCX osc: fence active operations before releasing accumulate lock and free memory if required
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
4d7a3856fa
UCX osc: Use accumulate for operations/datatypes that are not covered by UCX
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
899f58cef5
UCX osc: simplify output address computation
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d888b4fd76
UCX osc: correctly handle MPI_NO_OP
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
7cfc0e71da
UCX osc: allow to asynchronously compare-and-swap
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
557ae80858
UCX osc: allow for overlap with (some) request-based atomic operations
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
1a3c6bbf35
UCX osc: re-use value returned by cswap to save additional get
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
8606a02b87
UCX osc: fix macro parameter name usage in OMPI_OSC_UCX_REQUEST_RETURN
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d448efd49c
UCX osc: properly clean up requests in case of errors
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
73a183408f
UCX osc: add support for acc_single_intrinsic info key / mca param
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-23 12:41:52 +02:00
Joseph Schuchart
d9d18acd49
Fix unintended optimizations detected by STACK
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-22 10:32:22 +02:00
Joseph Schuchart
e23dcca448
Add missing free calls to osc/ucx
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 14:30:07 +02:00
Joseph Schuchart
a5cc380416
UCX osc: properly release exclusive lock to avoid lockup
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-08-27 23:01:35 +02:00
Tomislav Janjusic
d5f6b088ae
osc/ucx: Fix error path
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-08-12 21:54:01 +03:00
Nysal Jan K.A
3529d44702
osc/ucx: Fix data corruption with non-contiguous accumulates
...
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-07-24 13:07:59 +05:30
Nysal Jan K.A
14808922cf
osc/ucx: Add support for the no_locks info key
...
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
2019-07-18 17:29:01 +05:30
Artem Polyakov
6678ac0f55
osc/ucx: Fix possible win creation/destruction race condition
...
To avoid fully initializing the osc/ucx component for MPI application
that are not using One-Sided functionality, the initialization happens
at the first MPI window creation.
This commit ensures atomicity of global state modifications.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-06-20 09:05:03 -07:00
Artem Polyakov
0857742624
osc/ucx: Fix worker pool finalization
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-06-20 09:05:03 -07:00
Yossi Itigin
9d1994b906
OSC/UCX: Fix deadlock with atomic lock
...
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-19 20:10:09 +03:00
Mikhail Brinskii
2ef5bd8b36
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
...
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-26 14:47:58 +03:00
Valentin Petrov
30970bdfdf
OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
...
Addtional bugfix: origin_addr -> result_addr for no_op, replace_op
and sum_op fetch destination.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:30:21 +03:00
Artem Polyakov
19e2ae2efb
opal/common/ucx: Switch to opal/tsd
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
7984d7d997
opal/common/ucx: Remove unused debugging macro
...
Will be reintroduced later if needed and after adaptation to the OMPI
infrastructure.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
43f16d8796
opal/common/ucx: Remove common_ucx_int.h
...
Place the content of common_ucx_int.h back to the common_ucx.h and
include common_ucx_wpool.h explicitly.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
c6de09940f
ompi/osc/ucx: Switch osc/ucx code to use Worker Pool.
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Yossi Itigin
83cca9d52a
ucx: add owner.txt for components
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-12-01 17:14:03 +02:00
Sergey Oblomov
2d230b3aac
OSC/UCX: set max level value to 60
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-11-27 14:20:28 +02:00
Sergey Oblomov
e91f214982
OSC/UCX: added UCX version evaluation
...
- added UCX version evaluation to set OSC UCX priority
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-11-14 10:03:13 +02:00
Sergey Oblomov
36934a8bb2
OSC: set UCX module used by default
...
- OSC/UCX module set priority to 200 to be used by default
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-11-12 15:08:22 +02:00
Sergey Oblomov
1099d5f023
COMMON/UCX: added error code to log output
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-21 11:37:25 +03:00
Sergey Oblomov
df765595e3
COMMON/UCX: suppressed coverity warnings
...
- suppressed coverity warnings - added log messages on failed calls
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-17 16:11:03 +03:00
Yossi Itigin
b8e1af6fcb
osc_ucx: add worker flush before osc module free
...
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:16 +03:00
Yossi Itigin
bcc48515e4
Revert "osc_ucx: fix hang/timeout in component finalize"
...
This reverts commit 438d13b4ca1e7333b789ca3fb536fda17b0feb38.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:13 +03:00
Yossi Itigin
a012ee91d8
Merge pull request #5886 from yosefe/topic/osc-ucx-fix-finalize-hang
...
osc_ucx: fix hang/timeout in component finalize
2018-10-10 16:29:29 +03:00
Yossi Itigin
dc6809495d
osc_ucx: fix hang/timeout in component finalize
...
Add barrier to make sure all endpoints are destroyed before destroying
the worker.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:38:06 +03:00
Sergey Oblomov
ae6f81983f
OSC/UCX: fixed zero-size window processing
...
- added processing of zero-size MPI window
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-10 13:08:01 +03:00
Brian Barrett
e9e4d2a4bc
Handle asprintf errors with opal_asprintf wrapper
...
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error. However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined. Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Sergey Oblomov
b72dd83f05
MCA/COMMON/UCX: added synonims for common ucx variables
...
- added synonims for atomic/osc modules
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-26 18:25:21 +03:00