Commit graph

374 commits

Author SHA1 Message Date
George Bosilca
0425a7a7d8 Consistent return from all progress functions.
This fix ensures that all progress functions return the number of
completed events.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 72501f8f9c)
2020-03-30 19:00:03 +02:00
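The return convention described in 0425a7a7d8 can be sketched in C as follows. This is only an illustration, not Open MPI code: `progress_fn_t`, `progress_a`, `progress_b`, and `run_progress` are hypothetical names. Each callback reports how many events it completed, so the caller can sum the counts and tell whether any work was done.

```c
#include <stddef.h>

/* Hypothetical callback type: returns the number of events completed. */
typedef int (*progress_fn_t)(void);

static int pending_a = 2;   /* pretend two events are outstanding */

/* Completes everything pending and reports how many events that was. */
static int progress_a(void)
{
    int done = pending_a;
    pending_a = 0;
    return done;
}

/* Nothing to do: returns 0 rather than an unrelated status code. */
static int progress_b(void)
{
    return 0;
}

/* Sums the per-callback counts; a total of 0 means no progress was made. */
static int run_progress(progress_fn_t *fns, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += fns[i]();
    }
    return total;
}
```

With a consistent return value, a caller can treat a total of zero as "idle" without knowing anything about the individual callbacks.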
Artem Polyakov
e5cdf2612a timings: Update/extend OSHMEM timings
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
(cherry picked from commit 0f51ea3fe5)
2020-03-11 21:05:34 -07:00
Sergey Oblomov
45a722ad6a OSHMEM/SEGMENTS: increase number of max segments
- increase the maximum number of segments so that applications can be
  launched on some Ubuntu configurations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f742f289ea)
2020-02-10 07:44:50 +02:00
Sergey Oblomov
91ab0e2191 SPML/UCX: fixed coverity issues
- fixed sizeof(char***) by variable datatype
- fixed resource leak in proc_add

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 8543860689)
2020-01-24 17:29:53 +02:00
Tomislav Janjusic
0daf3df384 oshmem/ucx: improves spml ucx performance for multi-threaded
applications.

Improves multi-threaded performance by adding the option to create
multiple ucx workers in threaded applications.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 3d6bf9fd8e)
2020-01-24 17:29:53 +02:00
Tomislav Janjusic
9e755d3803 oshmem/ucx: Improves performance for non-blocking put/get operations.
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d073)
2020-01-22 21:45:32 +02:00
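The idea in 9e755d3803, driving progress periodically while many non-blocking operations are being posted, might look roughly like the sketch below. All names and the threshold value are assumptions made for illustration; the actual SPML/UCX code is not shown in this log. In a UCX-based implementation, the progress call would be the place where a worker progress routine is invoked.

```c
/* Hypothetical threshold: how many posts are allowed between progress calls. */
#define PROGRESS_THRESHOLD 4

static unsigned posted_since_progress = 0;
static unsigned progress_calls = 0;

/* Stand-in for progressing a communication worker. */
static void worker_progress(void)
{
    progress_calls++;
}

/* Posts one non-blocking operation and, once enough operations have
 * accumulated, calls progress so the backlog does not grow unbounded. */
static void post_nonblocking_put(void)
{
    /* ... the actual non-blocking put would be issued here ... */
    if (++posted_since_progress >= PROGRESS_THRESHOLD) {
        worker_progress();
        posted_since_progress = 0;
    }
}
```

The threshold trades per-operation overhead against the number of outstanding requests; the value used here is arbitrary.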
Tomislav Janjusic
ae30df4bae oshmem/ucx: fixed a build issue
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit cb5ff55b27)
2019-12-19 21:20:05 +02:00
Tomislav Janjusic
b84074d4d0 oshmem:ucx, fix race condition and add context recycling
1) Race condition: Do not add private contexts to the active list.
Private contexts are only visible to the user.
2) Recycled contexts: Destroyed contexts are put on an idle list until
finalize, so continuous context creation would lead to an OOM condition.
Instead, check whether a context from the idle list meets the new context's
requirements and reuse it.

Co-authored with: Artem Y. Polyakov <artemp@mellanox.com>,
                  Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit bd7cdf718488627e7943aab34275c150baf2284a)
2019-12-10 17:52:31 +02:00
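The context recycling described in b84074d4d0 can be sketched as an idle list that is searched before allocating. This is a simplified illustration under stated assumptions: `struct ctx`, `ctx_create`, `ctx_destroy`, and the single `options` field used to decide whether a context "meets the requirements" are all hypothetical, and no locking is shown.

```c
#include <stdlib.h>

/* Hypothetical context: `options` stands in for the creation requirements. */
struct ctx {
    long options;
    struct ctx *next;
};

static struct ctx *idle_list = NULL;  /* destroyed contexts awaiting reuse */
static int created = 0;               /* number of real allocations */

/* Reuses a matching idle context if one exists, otherwise allocates. */
static struct ctx *ctx_create(long options)
{
    struct ctx **link = &idle_list;
    while (*link != NULL) {
        if ((*link)->options == options) {
            struct ctx *reused = *link;
            *link = reused->next;     /* unlink from the idle list */
            reused->next = NULL;
            return reused;
        }
        link = &(*link)->next;
    }

    struct ctx *c = malloc(sizeof(*c));
    if (c == NULL) {
        return NULL;
    }
    c->options = options;
    c->next = NULL;
    created++;
    return c;
}

/* Does not free: parks the context on the idle list until finalize. */
static void ctx_destroy(struct ctx *c)
{
    c->next = idle_list;
    idle_list = c;
}
```

A create/destroy loop with identical options then allocates only once, which is the out-of-memory scenario the commit message describes avoiding.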
Joseph Schuchart
ad86d043cf Shmem: use bitwise and instead of logical and to check for allocator capabilities
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 9f2c6a42c3)
2019-11-06 08:46:06 +01:00
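The class of bug fixed in ad86d043cf is easy to reproduce in isolation. In the sketch below the flag names are hypothetical, not the actual OSHMEM allocator capabilities; it only shows why a logical AND gives the wrong answer when testing a bit mask.

```c
#include <stdbool.h>

/* Hypothetical capability bits. */
enum {
    ALLOC_CAP_HINT    = 1u << 0,
    ALLOC_CAP_REALLOC = 1u << 1
};

/* Buggy: logical AND is true whenever both operands are nonzero,
 * regardless of which bits are set. */
static bool has_cap_logical(unsigned caps, unsigned flag)
{
    return caps && flag;
}

/* Correct: bitwise AND tests whether the requested bit is actually set. */
static bool has_cap_bitwise(unsigned caps, unsigned flag)
{
    return (caps & flag) != 0;
}
```

For an allocator that only advertises `ALLOC_CAP_HINT`, the logical version wrongly reports `ALLOC_CAP_REALLOC` as supported, while the bitwise version does not.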
Sergey Oblomov
f8843bba7c IKRIT: restored compilation
- due to some refactoring and adding new functionality compilation
  of ikrit module was broken
- this commit restores compilation

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 991082abf2)
2019-09-23 15:49:50 +03:00
Sergey Oblomov
1f9fce8955 SPML/UCX: fixed comment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 01dacaa6a4)
2019-08-22 11:42:03 +03:00
Sergey Oblomov
66e18563bf SPML/UCX: fixed hang in SHMEM_FINALIZE
- used MPI_Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 182023febb)
2019-08-22 11:41:52 +03:00
Sergey Oblomov
2fa112c0a6 UCX: added PPN hint for UCX context
- added PPN hint for UCX context init

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 43186e494b)

Conflicts:
	opal/mca/common/ucx/common_ucx_wpool.c
2019-08-09 11:51:30 +03:00
Howard Pritchard
a42977f1c2
Merge pull request #6707 from hoopoepg/topic/alloc-with-hint-realloc-inplace-v4.0
ALLOC_WITH_HINT: added inplace realloc - v4.0
2019-06-06 10:16:57 -07:00
Geoff Paulsen
bd602cc3a0
Merge pull request #6701 from hoopoepg/topic/sshmem-mpi-coll-collect-v4.0
SSHMEM/COLL: added sshmem/mpi implementation for shmem_collect call - v4.0
2019-06-05 13:02:43 -05:00
Sergey Oblomov
69923e78c7 SPML/UCX: added synchronized flush on quiet
- added synchronized flush operation on quiet call.
- flush is implemented using get operation

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 0b108411f8)
2019-05-30 18:08:33 +03:00
Sergey Oblomov
456c5b90ae OSHMEM: minor optimization of realloc in shadow allocator
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d6a0912024)
2019-05-27 11:44:32 +03:00
Sergey Oblomov
748a5f5e73 SHADOW ALLOCATOR: minor code optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a51badd627)
2019-05-27 11:44:26 +03:00
Sergey Oblomov
f75d46faa9 ALLOC_WITH_HINT: added inplace realloc
- in some cases realloc operation may be completed without
  allocation of new buffer (and without additional data copy)
- added logic to reallocate buffer inplace if possible

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 277c2a9e5c)
2019-05-27 11:44:18 +03:00
Sergey Oblomov
c142605566 SSHMEM/COLL: added sshmem/mpi implementation for shmem_collect call
- added MPI based implementation of shmem_collect call

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 7d8cb75b2e)
2019-05-23 15:34:12 +03:00
Geoff Paulsen
c22326e59a
Merge pull request #6652 from yosefe/topic/alloc-with-hint-impl-master-v4.0.x
OSHMEM: Add support for shmemx_malloc_with_hint() - v4.0.x
2019-05-17 15:48:35 -05:00
Yossi Itigin
fbd6798bf8 OSHMEM/MMAP/SYSV: Return ERR_NOT_IMPLEMENTED if segment hint != 0
(picked from master f708674)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-15 17:11:24 +03:00
Mikhail Brinskii
ff9ecc183f SPML/UCX: Fix coverity error
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit d81dc533f6)
2019-05-15 14:20:05 +03:00
Yossi Itigin
fc41c16134 OSHMEM: Add support for shmemx_malloc_with_hint()
- added multiple segments processing
- added shmemx_malloc_with_hint call + set of hints

(picked from master 94b5e91)

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-12 11:42:59 +03:00
Mikhail Brinskii
6861a68de6 SPML/UCS: CR comments p2
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit d4843b1651)
2019-05-02 21:27:15 +03:00
Mikhail Brinskii
1c56f49a44 SPML/UCX: CR comments p1
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit c4c99457db)
2019-05-02 21:26:55 +03:00
Mikhail Brinskii
e4ee56d1f3 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b36)
2019-05-02 21:25:59 +03:00
Valentin Petrov
281f78c6e4 Fixes the O(N^2) loop in the mca_scoll_mpi_comm_query
The new proc group is created from the "world_group" based on the
rank mapping, which can be taken directly from proc_name->vpid.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-15 08:43:09 +03:00
Ben Menadue
001fa5b6ce Add missing nlong_type parameter to call to original broadcast in scoll/fca broadcast.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>
2019-04-03 14:01:41 +11:00
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144)
2019-03-21 23:59:58 +02:00
Xin Zhao
580b584179 ompi/oshmem/spml/ucx:delete oob path of getting rkeys in spml ucx
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e0414006b0)
2019-03-21 23:59:46 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab0202)
2019-03-21 23:58:23 +02:00
Xin Zhao
ce54b63b90 ompi/oshmem: add spml_context back to sshmem_type in memheap, to keep track of ucx_ctx_default's rkeys
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 48033ac1f4)
2019-03-20 23:30:21 +02:00
Xin Zhao
06183a7bec ompi/oshmem/spml/ucx: let shmem_finalize clean up any ctx left
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9a06000962)
2019-03-20 23:30:09 +02:00
Xin Zhao
91793484ed OMPI/OSHMEM: bug-fix: store mkeys for each oshmem ctx.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 289595e45d)
2019-03-20 23:29:53 +02:00
Xin Zhao
f666d75322 ompi/oshmem/spml/ucx: fix eps destroy in shmem_ctx_destroy().
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 79ba752667)
2019-03-20 23:29:38 +02:00
Sergey Oblomov
14c271f993 PML/SPML/UCX: added evaluation of mmap events
- a set of UCX-related issues was reported that were caused
  by mmap API hook conflicts. We added diagnostics for such
  problems to simplify the bug-resolving pipeline

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae)
2019-03-14 16:48:25 +02:00
Yossi Itigin
ad4b33336d oshmem/scoll: fix shmem_collect32/64 for zero-size length
Fixes scoll_basic failures with shmem_verifier, caused by recent changes
in handling of zero-size collectives.

- Check for zero-size length only for fixed size collect (shmem_fcollect),
  but not for variable-size collect (shmem_collect)
- Add 'nlong_type' parameter to internal broadcast function, to indicate
  whether the 'nlong' parameter is valid on non-root PEs, since it's
  used by shmem_collect algorithm. Before this change, some components
  assumed it's true (scoll_mpi) while others assumed it's false
  (scoll_basic).
- In scoll_basic, if nlong_type==false, do not exit if nlong==0, since
  this parameter may not be the same on all PEs.
- In scoll_mpi, fallback to scoll_basic if nlong_type==false, since MPI
  requires the 'count' argument of MPI_Bcast to be valid on all ranks.

(Picked from master 939162e)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-01-02 12:15:01 +02:00
Sergey Oblomov
5838760a3a OSHMEM/COLL/BCAST: removed unnecessary bcast call
- removed unnecessary bcast call on zero-length request

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit c93927e27a)
2018-11-27 14:26:56 +02:00
Sergey Oblomov
0a064d8c8d OSHMEM/COLL: optimization on zero-length ops
- removed barrier call on zero-length operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ff2fd0679e)
2018-11-27 14:26:52 +02:00
Sergey Oblomov
dea9cf6b63 OSHMEM: added processing of zero-length collectives
- according to spec 1.4, annex C, shmem collectives should process
  calls where the number of elements is zero independently of the
  pointer value
- added zero-count processing: it just calls barrier to
  sync ranks

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 9de128afaf)
2018-11-27 14:26:44 +02:00
Yossi Itigin
8a329a797c SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.

Found with shmem_verifier test suite.

(picked from master 6754bf1)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-31 12:22:19 +02:00
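The out-of-bounds access described in 8a329a797c comes down to indexing past a one-element synchronization array. The sketch below is illustrative only: `ALLTOALL_SYNC_SIZE` mirrors the value of 1 stated in the commit message, while `SYNC_VALUE`, `psync`, and `psync_slot` are hypothetical stand-ins.

```c
#include <stddef.h>

#define ALLTOALL_SYNC_SIZE 1     /* the commit states the real constant is 1 */
#define SYNC_VALUE (-1L)         /* hypothetical initial value of each slot */

/* Every valid slot starts out holding SYNC_VALUE. */
static long psync[ALLTOALL_SYNC_SIZE] = { SYNC_VALUE };

/* Returns the slot at `offset`, or NULL when the offset is out of bounds.
 * With a size of 1, offset 1 (the "pSync + 1" case) is already invalid. */
static long *psync_slot(size_t offset)
{
    return (offset < ALLTOALL_SYNC_SIZE) ? &psync[offset] : NULL;
}
```

Memory beyond the array is not guaranteed to hold the expected initial value, which is how the barrier in the commit message ended up reading a stale 0 and never completing.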
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9)
2018-09-19 10:47:27 +03:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonym to opal_mem_hook variable
- added synonym to common ucx variables to allow
  printing them in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
2018-08-29 15:17:00 +03:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Sergey Oblomov
b64502977a PML/SPML/UCX: init global objects using C99 style
- to avoid mixing up values, use C99-style object initialization

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
2018-07-28 16:47:43 +03:00
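C99 designated initializers, the style b64502977a refers to, bind each value to a named field instead of relying on declaration order. The structure below is a hypothetical example, not one of the actual PML/SPML UCX global objects.

```c
/* Hypothetical parameter block for a component. */
struct component_params {
    int priority;
    int num_workers;
    int verbose;
    const char *name;
};

/* Designated initializers: the order written here does not have to match
 * the declaration order, so reordering the struct cannot mix up values. */
static struct component_params params = {
    .name        = "example",
    .priority    = 10,
    .num_workers = 4
    /* .verbose is not listed, so it is zero-initialized */
};
```

A positional initializer such as `{ 10, 4, 0, "example" }` would silently assign values to the wrong fields if a member were inserted or moved.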
Sergey Oblomov
58b7786b70 MCA/ATOMIC: atomic_init renamed to atomic_startup
- there is a C11 naming conflict: atomic_init is a C macro,
  which causes a build issue

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 3295b23800)
2018-07-24 17:23:42 +03:00
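The conflict behind 58b7786b70 arises because `<stdatomic.h>` already provides `atomic_init`. A minimal sketch of the renamed form is shown below; `ready_count` and the body of `atomic_startup` are assumptions for illustration and do not reflect the actual MCA atomic framework code.

```c
#include <stdatomic.h>

static atomic_int ready_count;

/* Naming this function atomic_init would clash with the C11 atomic_init
 * provided by <stdatomic.h>, so it uses a distinct name instead. */
static int atomic_startup(void)
{
    atomic_init(&ready_count, 0);   /* the C11 facility remains usable */
    return 0;
}
```

Any translation unit that includes `<stdatomic.h>` can then call both the project's startup routine and the standard `atomic_init` without ambiguity.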
Xin Zhao
c429900cd9 OMPI/OSHMEM: add new functionality of OpenSHMEM v1.4.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-16 12:55:25 -07:00
Yossi Itigin
9d0b3a42aa
Merge pull request #5423 from hoopoepg/topic/bitwise-atomics-renaming
ATOMICS: renamed atomic calls to unsigned datatypes
2018-07-16 19:08:02 +03:00
Sergey Oblomov
bd84165277 ATOMICS: renamed atomic calls to unsigned datatypes
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-13 15:32:16 +03:00