- there was a set of UCX related issues reported which caused
by mmap API hooks conflicts. We added diagnostic of such
problems to simplify bug-resolving pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae)
pshmem.h now includes shmem.h (since open-mpi/ompi@f46130cd20) and some macros were removed at that time.
Use the OSHMEM_HAVE_C11 macro (defined in shmem.h) instead of the
previous OSHMEMP_HAVE_C11 macrso previously defined in pshmem.h
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 5ea939aa54)
Fixes scoll_basic failures with shmem_verifier, caused by recent changes
in handling of zero-size collectives.
- Check for zero-size length only for fixed size collect (shmem_fcollect),
but not for variable-size collect (shmem_collect)
- Add 'nlong_type' parameter to internal broadcast function, to indicate
whether the 'nlong' parameter is valid on non-root PEs, since it's
used by shmem_collect algorithm. Before this change, some components
assumed it's true (scoll_mpi) while others assumed it's false
(scoll_basic).
- In scoll_basic, if nlong_type==false, do not exit if nlong==0, since
this parameter may not be the same on all PEs.
- In scoll_mpi, fallback to scoll_basic if nlong_type==false, since MPI
requires the 'count' argument of MPI_Bcast to be valid on all ranks.
(Picked from master 939162e)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
- according spec 1.4, annex C shmem collectives should process
calls where number of elements is zero independently from pointer
value
- added zero-count processing - it just call barrier to
sync ranks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 9de128afaf)
- added <cr> to split API groups to simplify human processing
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6e78102089)
- added missing file to profile makefile
- constants SHMEM_CTX_* are shifted into public header
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 4a3e83780c)
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.
Found with shmem_verifier test suite.
(picked from master 6754bf1)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
- added synonim to common ucx variables to allow
to print it in opal_info -a
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
- to avoid value mix used C99 style of object initializations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
- there is C11 naming conflict - atomic_init is C macro
which cause building issue
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 3295b23800)
- added implementation and/or/xor operations for post and
fetch-op notations
- implemented basic and UCX transports, mxm added
NON-IMPLEMENTED wrapper
- updated C interfaces only (fortran will be added later)
- existing API is not updated to spec v1.4
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- removed atomic-basic-specific operand from module API
- added own calls for add and swap operations
- minor optimization for UCX atomics
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>