1
1

345 Коммитов

Автор SHA1 Сообщение Дата
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144641d2929f830279dcc9d163c38e9e1)
2019-03-21 23:59:58 +02:00
Xin Zhao
580b584179 ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e0414006b0c0a8e9918a4cf8ac4bb819b977ec91)
2019-03-21 23:59:46 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab020227fc18d145379ab29ea86a3cdb66b1)
2019-03-21 23:58:23 +02:00
Xin Zhao
ce54b63b90 ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 48033ac1f43159c053241b65e74a39777e5e31e4)
2019-03-20 23:30:21 +02:00
Xin Zhao
06183a7bec ompi/oshmem/spml/ucx: let shmem_finalize to clean up any ctx left
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9a060009622e9220d6332d9b63f6a1a7328418a0)
2019-03-20 23:30:09 +02:00
Xin Zhao
91793484ed OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 289595e45dc3ebfe5ae1a9dc6f347b1b2d569c4a)
2019-03-20 23:29:53 +02:00
Xin Zhao
f666d75322 ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 79ba7526677bd1641239bb77559a2999c8cd3a4a)
2019-03-20 23:29:38 +02:00
Sergey Oblomov
14c271f993 PML/SPML/UCX: added evaluation of mmap events
- there was a set of UCX related issues reported which caused
  by mmap API hooks conflicts. We added diagnostic of such
  problems to simplify bug-resolving pipeline

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae700d84873c1d5ca9c45c846d7387ed)
2019-03-14 16:48:25 +02:00
Yossi Itigin
ad4b33336d oshmem/scoll: fix shmem_collect32/64 for zero-size length
Fixes scoll_basic failures with shmem_verifier, caused by recent changes
in handling of zero-size collectives.

- Check for zero-size length only for fixed size collect (shmem_fcollect),
  but not for variable-size collect (shmem_collect)
- Add 'nlong_type' parameter to internal broadcast function, to indicate
  whether the 'nlong' parameter is valid on non-root PEs, since it's
  used by shmem_collect algorithm. Before this change, some components
  assumed it's true (scoll_mpi) while others assumed it's false
  (scoll_basic).
- In scoll_basic, if nlong_type==false, do not exit if nlong==0, since
  this parameter may not be the same on all PEs.
- In scoll_mpi, fallback to scoll_basic if nlong_type==false, since MPI
  requires the 'count' argument of MPI_Bcast to be valid on all ranks.

(Picked from master 939162e)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-01-02 12:15:01 +02:00
Sergey Oblomov
5838760a3a OSHMEM/COLL/BCAST: removed unnecessary bcast call
- removed unnecessary bcast call on zero-length request

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit c93927e27a8e4241236d82c0d34ea445aa619aff)
2018-11-27 14:26:56 +02:00
Sergey Oblomov
0a064d8c8d OSHMEM/COLL: optimization on zero-length ops
- removed barrier call on zero-length operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ff2fd0679eb4b31bfd840395d73746612e2670f4)
2018-11-27 14:26:52 +02:00
Sergey Oblomov
dea9cf6b63 OSHMEM: added processing of zero-length collectives
- according spec 1.4, annex C shmem collectives should process
  calls where number of elements is zero independently from pointer
  value
- added zero-count processing - it just call barrier to
  sync ranks

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 9de128afaf5224193a88f00c897f5d5c94336f99)
2018-11-27 14:26:44 +02:00
Yossi Itigin
8a329a797c SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.

Found with shmem_verifier test suite.

(picked from master 6754bf1)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-31 12:22:19 +02:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9994dfd49062822c89cb502274eb464)
2018-09-19 10:47:27 +03:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added synonim to common ucx variables to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba0b1012f954910e39b26f6075f3d006)
2018-08-29 15:17:00 +03:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca300fe353e91c52dc9aa0f657120d4d)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Sergey Oblomov
b64502977a PML/SPML/UCX: init global objects using C99 style
- to avoid value mix used C99 style of object initializations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290ff2fdd10f254f878e92cae6e90c854)
2018-07-28 16:47:43 +03:00
Sergey Oblomov
58b7786b70 MCA/ATOMIC: atomic_init renamed to atomic_startup
- there is C11 naming conflict - atomic_init is C macro
  which cause building issue

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 3295b238008a6b2e27941c60d757e64ad796fea2)
2018-07-24 17:23:42 +03:00
Xin Zhao
c429900cd9 OMPI/OSHMEM: add new functionality of OpenSHMEM v1.4.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-16 12:55:25 -07:00
Yossi Itigin
9d0b3a42aa
Merge pull request #5423 from hoopoepg/topic/bitwise-atomics-renaming
ATOMICS: renamed atomic calls to unsigned datatypes
2018-07-16 19:08:02 +03:00
Sergey Oblomov
bd84165277 ATOMICS: renamed atomic calls to unsigned datatypes
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-13 15:32:16 +03:00
Sergey Oblomov
d51426ff0a ATOMIC/MXM: fixed abstraction violation
- applied workaround for incorrect dynamic module dependency

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-13 14:30:12 +03:00
Sergey Oblomov
0212410187 OSHMEM/ATOMICS: fixed comments errors
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-10 10:13:40 +03:00
Sergey Oblomov
64212a9ff1 OSHMEM/ATOMICS: added C implementation of and/or/xor ops
- added implementation and/or/xor operations for post and
  fetch-op notations
- implemented basic and UCX transports, mxm added
  NON-IMPLEMENTED wrapper
- updated C interfaces only (fortran will be added later)
- existing API is not updated to spec v1.4

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-09 16:18:47 +03:00
Sergey Oblomov
240670152e MCA/COMMON/UCX: code beautify - alignment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 19:40:58 +03:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
586c16be70 ATOMIC/UCX: renamed ATOMIC_PTR_2_INT to OSHMEM_ATOMIC_PTR_2_INT
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 15:07:39 +03:00
Sergey Oblomov
54d97efa84 ATOMIC/UCX: fixed atomic swap/cswap issues
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:41:48 +03:00
Sergey Oblomov
5eb8c99cd7 ATOMIC/UCX: optimization for cswap
- used uint64_t output datatype to avoid branches in
  implementations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:41:46 +03:00
Sergey Oblomov
f574c14e3a ATOMICS/UCX: redefine atomic module API
- now it accepts integer values directily instead of
  pointers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:41:45 +03:00
Sergey Oblomov
a0ea368464 ATOMIC/UCX: minor optimization and code cleaning
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:41:43 +03:00
Sergey Oblomov
eb0abfcf92 SHMEM/ATOMIC: refactoring of module API
- removed atomic-basic-specific operand from module API
- added own calls for add and swap operations
- minor optimization for UCX atomics

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:40:14 +03:00
Sergey Oblomov
13331ba4d8 MCA/COMMON/UCX: code beautify + build fix
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 16:37:03 +03:00
Sergey Oblomov
8a793bb279 MCA/COMMON/UCX: fixed build issues
- fixed fuild issues when used older UCX
- added non-blocking call of ucp_put call

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:58:08 +03:00
Sergey Oblomov
c2bd6af9f2 MCA/COMMON/UCX: minor unification of del_proces calls
- some common functionality of del_procs calls is moved into
  mca_common module
- blocking ucp_put call is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Sergey Oblomov
952fa8ade7 PML/UCX: method mca_spml_ucx_get_mkey_slow is renamed to get_mkey_slow
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-01 20:44:19 +03:00
Sergey Oblomov
c55db78e93 SPML/UCX: get mkey call refactoring
- method mca_spml_ucx_get_mkey_slow is moved into .c module,
  added pointer to this method into mca_spml_ucx_t structure

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-01 18:58:14 +03:00
Sergey Oblomov
910e08f5ef MCA/ATOMICS/UCX: workaround for abstraction violation
- some spml calls are marked as inline to exclude cross-module
  dependency
- updated get-key call to get link to local module

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-29 16:32:04 +03:00
Sergey Oblomov
502d04bf12 UCX/PML/SPML: fixed few coverity issues
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for woker flush failure

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Sergey Oblomov
63e7ba6843 MCA/COMMON/UCX: added parameter for UCX/opal progress
- added parameter to set UCX/opal progresses
- minor refactoring of request wait routines

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 11:00:12 +03:00
Sergey Oblomov
d57ae62dee MCA/UCX: added common module
- implemented non-blocking routines for flush operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-22 16:41:09 +03:00
Sergey Oblomov
319bb376f9 MCA/UCX: branch optimization in cswap call
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-30 07:14:43 +03:00
Sergey Oblomov
daad71f036 MCA/UCX: switch/case optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 22:33:33 +03:00
Sergey Oblomov
6be4066e23 MCA/UCX: cswap call if updated to non-blocking API
- minor fixes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 20:07:51 +03:00
Sergey Oblomov
b668e19cd1 Merge remote-tracking branch 'wg/master' into topic/amo-non-blocking-ucp
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 19:08:48 +03:00
Yossi Itigin
976cd5e307
Merge pull request #5186 from hoopoepg/topic/ucx-amo-error-msg
MCA/UCX: fixed error messages for incorrect msg size
2018-05-29 15:54:02 +03:00
Sergey Oblomov
0c3ed93ef0 MCA/UCX: added opal progress call to wait request routine
- added opal_progress call to wait function to avoid
  possible [dead]lock issues
- wait call is declared as inline
- minor fixes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-05-29 11:34:40 +03:00
Yossi Itigin
705c8a7b9b
Merge pull request #5198 from brminich/shmem_fence
OSHMEM/SMPL/UCX: Add real fence support
2018-05-27 11:25:42 +03:00
Mikhail Brinskii
8e9d401938 OSHMEM/SMPL/UCX: Add real fence support
+ Add quiet method to SPML, so it can have different implementation with
fence.
+ Use ucp_worker_fence for spml_fence method of UCX SPML

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2018-05-25 22:43:06 +03:00