1
1

197 Коммитов

Автор SHA1 Сообщение Дата
Yossi Itigin
fc41c16134 OSHMEM: Add support for shmemx_malloc_with_hint()
- added multiple segments processing
- added shmemx_malloc_with_hint call + set of hints

(picked from master 94b5e91)

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-12 11:42:59 +03:00
Mikhail Brinskii
6861a68de6 SPML/UCS: CR comments p2
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit d4843b165172a9cffbdc967ec92ec1d60d053903)
2019-05-02 21:27:15 +03:00
Mikhail Brinskii
1c56f49a44 SPML/UCX: CR comments p1
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit c4c99457db6577f85b5b1c4c6849b5a04c226a84)
2019-05-02 21:26:55 +03:00
Mikhail Brinskii
e4ee56d1f3 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b3671f1e10caf00d06d66d120eac9c5be)
2019-05-02 21:25:59 +03:00
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144641d2929f830279dcc9d163c38e9e1)
2019-03-21 23:59:58 +02:00
Xin Zhao
580b584179 ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e0414006b0c0a8e9918a4cf8ac4bb819b977ec91)
2019-03-21 23:59:46 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab020227fc18d145379ab29ea86a3cdb66b1)
2019-03-21 23:58:23 +02:00
Xin Zhao
ce54b63b90 ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 48033ac1f43159c053241b65e74a39777e5e31e4)
2019-03-20 23:30:21 +02:00
Xin Zhao
06183a7bec ompi/oshmem/spml/ucx: let shmem_finalize to clean up any ctx left
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9a060009622e9220d6332d9b63f6a1a7328418a0)
2019-03-20 23:30:09 +02:00
Xin Zhao
91793484ed OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 289595e45dc3ebfe5ae1a9dc6f347b1b2d569c4a)
2019-03-20 23:29:53 +02:00
Xin Zhao
f666d75322 ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 79ba7526677bd1641239bb77559a2999c8cd3a4a)
2019-03-20 23:29:38 +02:00
Sergey Oblomov
14c271f993 PML/SPML/UCX: added evaluation of mmap events
- there was a set of UCX related issues reported which caused
  by mmap API hooks conflicts. We added diagnostic of such
  problems to simplify bug-resolving pipeline

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae700d84873c1d5ca9c45c846d7387ed)
2019-03-14 16:48:25 +02:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9994dfd49062822c89cb502274eb464)
2018-09-19 10:47:27 +03:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added synonim to common ucx variables to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba0b1012f954910e39b26f6075f3d006)
2018-08-29 15:17:00 +03:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca300fe353e91c52dc9aa0f657120d4d)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Sergey Oblomov
b64502977a PML/SPML/UCX: init global objects using C99 style
- to avoid value mix used C99 style of object initializations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290ff2fdd10f254f878e92cae6e90c854)
2018-07-28 16:47:43 +03:00
Xin Zhao
c429900cd9 OMPI/OSHMEM: add new functionality of OpenSHMEM v1.4.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-16 12:55:25 -07:00
Sergey Oblomov
d51426ff0a ATOMIC/MXM: fixed abstraction violation
- applied workaround for incorrect dynamic module dependency

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-13 14:30:12 +03:00
Sergey Oblomov
240670152e MCA/COMMON/UCX: code beautify - alignment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 19:40:58 +03:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
13331ba4d8 MCA/COMMON/UCX: code beautify + build fix
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 16:37:03 +03:00
Sergey Oblomov
8a793bb279 MCA/COMMON/UCX: fixed build issues
- fixed fuild issues when used older UCX
- added non-blocking call of ucp_put call

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:58:08 +03:00
Sergey Oblomov
c2bd6af9f2 MCA/COMMON/UCX: minor unification of del_proces calls
- some common functionality of del_procs calls is moved into
  mca_common module
- blocking ucp_put call is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Sergey Oblomov
952fa8ade7 PML/UCX: method mca_spml_ucx_get_mkey_slow is renamed to get_mkey_slow
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-01 20:44:19 +03:00
Sergey Oblomov
c55db78e93 SPML/UCX: get mkey call refactoring
- method mca_spml_ucx_get_mkey_slow is moved into .c module,
  added pointer to this method into mca_spml_ucx_t structure

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-01 18:58:14 +03:00
Sergey Oblomov
910e08f5ef MCA/ATOMICS/UCX: workaround for abstraction violation
- some spml calls are marked as inline to exclude cross-module
  dependency
- updated get-key call to get link to local module

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-29 16:32:04 +03:00
Sergey Oblomov
502d04bf12 UCX/PML/SPML: fixed few coverity issues
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for woker flush failure

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Sergey Oblomov
63e7ba6843 MCA/COMMON/UCX: added parameter for UCX/opal progress
- added parameter to set UCX/opal progresses
- minor refactoring of request wait routines

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 11:00:12 +03:00
Sergey Oblomov
d57ae62dee MCA/UCX: added common module
- implemented non-blocking routines for flush operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-22 16:41:09 +03:00
Yossi Itigin
705c8a7b9b
Merge pull request #5198 from brminich/shmem_fence
OSHMEM/SMPL/UCX: Add real fence support
2018-05-27 11:25:42 +03:00
Mikhail Brinskii
8e9d401938 OSHMEM/SMPL/UCX: Add real fence support
+ Add quiet method to SPML, so it can have different implementation with
fence.
+ Use ucp_worker_fence for spml_fence method of UCX SPML

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2018-05-25 22:43:06 +03:00
Brian Barrett
9fff40647d oshmem: disable if no spmls build
This patch disables the oshmem layer if there are no SPMLs that
will build.  With the limited set of SPMLs available to support
oshmem, many builds end up installing an oshmem library that we
know will not work.  There has been a bit of customer confusion
over oshmem, hopefully this will lead customers in the right
direction.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-05-25 08:48:50 -07:00
Yossi Itigin
385f38ab4e ucx: improve error messages during connection establishment
Also, unite common code calling ucp_ep_create()

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-04-30 15:45:05 +03:00
Gilles Gouaillardet
88e26c63e0 spml/ucx: fix a double free() issue
in mca_spml_ucx_add_procs() error path

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-01-22 13:42:16 +09:00
Alex Mikheev
ae326546f4
ompi/oshmem: ucx is selected over yalla/ikrit by default
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:08:04 +02:00
Yossi Itigin
1193e1eb83 spml_ucx: fix rkey leak
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-12-26 20:47:26 +02:00
Nathan Hjelm
1282e98a01 opal/asm: rename existing arithmetic atomic functions
This commit renames the arithmetic atomic operations in opal to
indicate that they return the new value not the old value. This naming
differentiates these routines from new functions that return the old
value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-11-30 10:41:22 -07:00
Alex Mikheev
7cb7af1685
OSHMEM: add ucx to the list of default spmls
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-10-18 10:41:00 +03:00
Yossi Itigin
3081576124 spml_ucx: fix typo in shmem_quiet() error message.
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-09-24 19:20:55 +03:00
Alina Sklarevich
007b1803ec SPML_UCX: use ompi_proc_world_size() to set the estimated_num_eps value
before this fix, mca_spml_ucx_component_open was using
oshmem_num_procs() to set the value of params.estimated_num_eps for UCX.
The oshmem_num_procs() function uses oshmem_group_all which will be
initialized after the call to mca_spml_ucx_component_open and therefore,
cannot be used there.

Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-09-04 14:46:00 +03:00
Joshua Hursey
e1d079544b mca: Dynamic components link against project lib
* Resolves #3705
 * Components should link against the project level library to better
   support `dlopen` with `RTLD_LOCAL`.
 * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am`
   with the appropriate project level library:
```
MCA components in ompi/
       $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la
MCA components in orte/
       $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la
MCA components in opal/
       $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la
MCA components in oshmem/
       $(top_builddir)/oshmem/liboshmem.la"
```

Note: The changes in this commit were automated by the script in
the commit that proceeds it with the `libadd_mca_comp_update.py`
script. Some components were not included in this change because
they are statically built only.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2017-08-24 11:56:16 -04:00
Alex Mikheev
1b5df76f8b
oshmem: shmem_ptr() implementation
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-08-03 13:56:34 +03:00
Boris Karasev
77c50efb95 Yoda SPML is removed
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-07-14 08:47:16 +03:00
Xin Zhao
ee952fcccd Passing estimated_num_procs to UCX init in PML and SPML.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Pavel Shamis (Pasha)
95c440683b OSHMEM: shmem_wait code cleanup
* updating naming convention for the arguments in order to ensure
that the name aligns with an actual meaning of the argument
* remove local variable references in the macro
* adding volatile for the poll variables

Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>
2017-03-15 21:53:44 +00:00
Alex Mikheev
c63137e1c0 oshmem: sshmem ucx: minor code cleanup
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
132fbd9ae9 oshmem: sshmem: add UCX allocator
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-22 17:48:00 +02:00
Alex Mikheev
ea3ea4835b oshmem: mem use hook: apply code review fixes
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
(cherry picked from commit a422154a141f0be5b92d2b6c26d7b2b4176dfe18)
2017-01-30 11:30:20 +02:00
Alex Mikheev
9da9e6260d
oshmem: spml ucx: on error print ucx error string
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-01-29 10:28:24 +02:00