Dmitry Gladkov
b6a658b24a
SPML/UCX: Fix compilation warnings with GCC
...
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
2020-02-03 05:11:49 -08:00
Sergey Oblomov
8543860689
SPML/UCX: fixed coverity issues
...
- fixed sizeof(char***) by variable datatype
- fixed resource leak in proc_add
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-01-23 14:02:08 +02:00
Tomislav Janjusic
3d6bf9fd8e
oshmem/ucx: improves spml ucx performance for multi-threaded
...
applications.
Improves multi-threaded performance by adding the option to create
multiple ucx workers in threaded applications.
Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-01-22 21:41:09 +02:00
Tomislav Janjusic
1b58e3d073
oshmem/ucx: Improves performance for non-blocking put/get operations.
...
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.
Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2020-01-22 00:59:26 +02:00
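The idea in the commit above (driving progress periodically when many non-blocking operations are posted) can be sketched roughly as follows. This is a minimal illustration, not the SPML/UCX code: the names nb_worker_t, nb_post, nb_worker_progress and the threshold NB_PROGRESS_THRESH are all hypothetical, and the progress call is a stand-in counter rather than a real UCX call.

```c
#include <stddef.h>

/* Hypothetical threshold: drive progress once per this many posted ops. */
#define NB_PROGRESS_THRESH 8

typedef struct {
    unsigned posted_since_progress; /* non-blocking ops posted since last progress */
    unsigned progress_calls;        /* number of times progress was driven */
} nb_worker_t;

/* Stand-in for driving the communication library's progress engine. */
static void nb_worker_progress(nb_worker_t *w)
{
    w->progress_calls++;
    w->posted_since_progress = 0;
}

/* Post one non-blocking operation and progress the worker periodically,
 * so that outstanding operations do not pile up without bound. */
static void nb_post(nb_worker_t *w)
{
    /* ... the real non-blocking put/get would be issued here ... */
    if (++w->posted_since_progress >= NB_PROGRESS_THRESH) {
        nb_worker_progress(w);
    }
}
```

Counting posted operations keeps the common path cheap: most posts only increment a counter, and the cost of progress is paid once per batch.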
Tomislav Janjusic
ebb985dea9
oshmem:ucx, fix race condition and add context recycling
...
1) Race condition: Do not add private contexts to active list.
Private contexts are only visible to the user.
2) Recycled contexts: Destroyed contexts are put on an idle list until
finalize, so continuous context creation can lead to an OOM condition.
Instead, check whether a context from the idle list meets the new context's
requirements and reuse it.
Co-authored with: Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-11-23 04:39:05 +02:00
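The recycling scheme described in point 2 could look roughly like the sketch below. It is an assumption-laden illustration, not the oshmem code: ctx_t, ctx_create, ctx_destroy, ctx_finalize and the single "options" field used as the matching requirement are hypothetical, and the sketch ignores thread safety.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct ctx {
    long        options; /* creation options the context was built with */
    struct ctx *next;    /* link in the idle list */
} ctx_t;

static ctx_t *idle_list = NULL; /* destroyed contexts waiting for reuse */

/* Return a context with the requested options, reusing an idle one
 * when its options match instead of allocating a new one. */
static ctx_t *ctx_create(long options)
{
    ctx_t **pp;
    ctx_t  *c;

    for (pp = &idle_list; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->options == options) {
            c = *pp;
            *pp = c->next; /* unlink from the idle list */
            c->next = NULL;
            return c;
        }
    }

    c = malloc(sizeof(*c));
    if (c != NULL) {
        c->options = options;
        c->next = NULL;
    }
    return c;
}

/* "Destroy" only parks the context on the idle list. */
static void ctx_destroy(ctx_t *c)
{
    c->next = idle_list;
    idle_list = c;
}

/* Actual release happens once, at finalize time. */
static void ctx_finalize(void)
{
    while (idle_list != NULL) {
        ctx_t *c = idle_list;
        idle_list = c->next;
        free(c);
    }
}
```

With this shape, a loop that repeatedly creates and destroys contexts with the same options keeps reusing one allocation rather than growing the idle list.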
Barton Chittenden
47816ef83b
Remove mxm, yalla and ikrit
...
Signed-off-by: Barton Chittenden <bartonski@gmail.com>
2019-11-22 13:40:16 -08:00
Sergey Oblomov
991082abf2
IKRIT: restored compilation
...
- due to some refactoring and adding new functionality compilation
of ikrit module was broken
- this commit restores compilation
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-09-16 19:00:34 +03:00
Sergey Oblomov
01dacaa6a4
SPML/UCX: fixed comment
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-21 12:08:58 +03:00
Sergey Oblomov
182023febb
SPML/UCX: fixed hang in SHMEM_FINALIZE
...
- used MPI_Barrier to synchronize processes
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-21 12:04:46 +03:00
Sergey Oblomov
43186e494b
UCX: added PPN hint for UCX context
...
- added PPN hint for UCX context init
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-05 18:07:06 +03:00
Sergey Oblomov
0b108411f8
SPML/UCX: added synchronized flush on quiet
...
- added synchronized flush operation on quiet call.
- flush is implemented using get operation
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-27 16:07:04 +03:00
Sergey Oblomov
421a7fd47d
SPML/UCX: fixed few compilation warnings
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-20 14:40:24 +03:00
Mikhail Brinskii
d81dc533f6
SPML/UCX: Fix coverity error
...
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-05-14 22:34:01 +03:00
Yossi Itigin
94b5e91194
OSHMEM: Add support for shmemx_malloc_with_hint()
...
- added multiple segments processing
- added shmemx_malloc_with_hint call + set of hints
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-10 20:04:57 +03:00
Mikhail Brinskii
d4843b1651
SPML/UCX: CR comments p2
...
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-30 16:49:11 +03:00
Mikhail Brinskii
c4c99457db
SPML/UCX: CR comments p1
...
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-30 16:26:45 +03:00
Mikhail Brinskii
2ef5bd8b36
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
...
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either with atomic
operations such as shmem_atomic_fetch or with point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-26 14:47:58 +03:00
Xin Zhao
9c3d00b144
ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
...
ompi/oshmem/spml/ucx: optimize spml ucx progress
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e0414006b0
ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e1c1ab0202
ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:37 +02:00
Xin Zhao
48033ac1f4
ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:48:21 +02:00
Xin Zhao
9a06000962
ompi/oshmem/spml/ucx: let shmem_finalize to clean up any ctx left
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:48:07 +02:00
Xin Zhao
289595e45d
OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:47:50 +02:00
Xin Zhao
79ba752667
ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:47:38 +02:00
Xin Zhao
b00209e1f5
Revert "OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx."
...
This reverts commit f1b095c784.
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-15 18:46:56 +02:00
Sergey Oblomov
d8e3562bae
PML/SPML/UCX: added evaluation of mmap events
...
- a set of UCX-related issues was reported that were caused
by mmap API hook conflicts. We added diagnostics for such
problems to simplify the bug-resolution pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Joshua Ladd
e57e18f6cc
Merge pull request #6290 from xinzhao3/topic/oshmem_mkeys
...
OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
2019-02-25 13:09:44 -05:00
Xin Zhao
f1b095c784
OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-25 16:19:08 +02:00
Gilles Gouaillardet
10cb9f6f9e
oshmem: remove unnecessary dependencies to ORTE
...
either use OPAL or OMPI layers, since ORTE layer
is not present when PMIx RTE is used
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Yossi Itigin
83cca9d52a
ucx: add owner.txt for components
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-12-01 17:14:03 +02:00
Brian Barrett
e9e4d2a4bc
Handle asprintf errors with opal_asprintf wrapper
...
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error. However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined. Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
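The guarantee described in the commit above (the output pointer is always NULL on failure) can be illustrated with the sketch below. This is not the opal_asprintf() implementation: safe_asprintf is a hypothetical name, and to stay within ISO C99 the sketch is built on vsnprintf and malloc rather than wrapping asprintf itself.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a freshly allocated string.
 * On success returns the string length and stores the buffer in *ptr.
 * On any failure returns -1 and guarantees *ptr == NULL. */
static int safe_asprintf(char **ptr, const char *fmt, ...)
{
    va_list ap;
    int     len;
    char   *buf;

    *ptr = NULL; /* establish the failure state up front */

    /* First pass: measure the formatted length. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0) {
        return -1;
    }

    buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        return -1;
    }

    /* Second pass: format into the allocated buffer. */
    va_start(ap, fmt);
    len = vsnprintf(buf, (size_t)len + 1, fmt, ap);
    va_end(ap);
    if (len < 0) {
        free(buf);
        return -1;
    }

    *ptr = buf;
    return len;
}
```

Callers can then test either the return code or the pointer, which is the property that code written against the BSD behavior relies on.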
Sergey Oblomov
e00f7a68ba
MCA/COMMON/UCX: added synonym to opal_mem_hook variable
...
- added synonym to opal_mem_hook variable to allow
to print it in opal_info -a
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-21 15:05:12 +03:00
Sergey Oblomov
d204b8a678
PML/SPML/UCX/COMPONENT: applied C99 initialization
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-28 09:44:03 +03:00
Sergey Oblomov
2806504290
PML/SPML/UCX: init global objects using C99 style
...
- used C99-style object initialization to avoid mixing up values
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-25 14:52:45 +03:00
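As a small illustration of why C99 designated initializers help here, compare positional and designated initialization of the same structure. The type component_t and its fields are hypothetical and are not taken from the Open MPI sources.

```c
#include <stddef.h>

/* Hypothetical component descriptor. */
typedef struct {
    const char *name;
    int         priority;
    int         verbose;
    size_t      num_workers;
} component_t;

/* Positional style: correctness depends on matching the field order,
 * so reordering or inserting a field silently shifts the values. */
static component_t comp_positional = { "ucx", 21, 0, 1 };

/* C99 designated style: each value is tied to its field by name. */
static component_t comp_designated = {
    .name        = "ucx",
    .priority    = 21,
    .num_workers = 1
    /* .verbose is omitted and therefore zero-initialized */
};
```

Both objects end up with the same contents, but only the designated form stays correct if the structure layout changes later.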
Sergey Oblomov
920cc2e0d9
MCA/COMMON/UCX: del_procs calls are unified to common module
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-18 07:37:25 +03:00
Xin Zhao
c429900cd9
OMPI/OSHMEM: add new functionality of OpenSHMEM v1.4.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-16 12:55:25 -07:00
Sergey Oblomov
d51426ff0a
ATOMIC/MXM: fixed abstraction violation
...
- applied workaround for incorrect dynamic module dependency
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-13 14:30:12 +03:00
Sergey Oblomov
240670152e
MCA/COMMON/UCX: code beautify - alignment
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 19:40:58 +03:00
Sergey Oblomov
bef47b792c
MCA/COMMON/UCX: unified logging across all UCX modules
...
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d
MCA/COMMON/UCX: changed return type for wait_request
...
- wait_request now returns an OMPI status
- updated callers
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
13331ba4d8
MCA/COMMON/UCX: code beautify + build fix
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 16:37:03 +03:00
Sergey Oblomov
8a793bb279
MCA/COMMON/UCX: fixed build issues
...
- fixed build issues when using an older UCX
- added non-blocking call of ucp_put call
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:58:08 +03:00
Sergey Oblomov
c2bd6af9f2
MCA/COMMON/UCX: minor unification of del_procs calls
...
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Sergey Oblomov
952fa8ade7
PML/UCX: method mca_spml_ucx_get_mkey_slow is renamed to get_mkey_slow
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-01 20:44:19 +03:00
Sergey Oblomov
c55db78e93
SPML/UCX: get mkey call refactoring
...
- method mca_spml_ucx_get_mkey_slow is moved into .c module,
added pointer to this method into mca_spml_ucx_t structure
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-01 18:58:14 +03:00
Sergey Oblomov
910e08f5ef
MCA/ATOMICS/UCX: workaround for abstraction violation
...
- some spml calls are marked as inline to exclude cross-module
dependency
- updated get-key call to get link to local module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-29 16:32:04 +03:00
Sergey Oblomov
502d04bf12
UCX/PML/SPML: fixed few coverity issues
...
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for worker flush failure
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Sergey Oblomov
63e7ba6843
MCA/COMMON/UCX: added parameter for UCX/opal progress
...
- added parameter to set UCX/opal progresses
- minor refactoring of request wait routines
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 11:00:12 +03:00
Sergey Oblomov
d57ae62dee
MCA/UCX: added common module
...
- implemented non-blocking routines for flush operations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-22 16:41:09 +03:00
Yossi Itigin
705c8a7b9b
Merge pull request #5198 from brminich/shmem_fence
...
OSHMEM/SPML/UCX: Add real fence support
2018-05-27 11:25:42 +03:00