This fix ensures that all progress functions return the number of
completed events.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 72501f8f9c)
- increase number of max segments to allow application be launched
on some Ubuntu configurations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f742f289ea)
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.
Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d073)
1) Race condition: Do not add private contexts to active list.
Private contexts are only visible to the user.
2) Recycled contexts: Destroyed contexts are put on an idle list until
finalize, continuous context creation will lead to oom condition.
Instead, check if context from idle list meets new context requirements
and reuse it.
Co-authored with: Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit bd7cdf718488627e7943aab34275c150baf2284a)
- due to some refactoring and adding new functionality compilation
of ikrit module was broken
- this commit restores compilation
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 991082abf2)
- added synchronized flush operation on quiet call.
- flush is implemented using get operation
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 0b108411f8)
- in some cases realloc operation may be completed without
allocation of new buffer (and without additional data copy)
- added logic to reallocate buffer inplace if possible
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 277c2a9e5c)
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b36)
The new proc group is created from the "world_group" based on the
ranks mapping which can be directly taken from proc_name->vpid.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
- there was a set of UCX related issues reported which caused
by mmap API hooks conflicts. We added diagnostic of such
problems to simplify bug-resolving pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae)
Fixes scoll_basic failures with shmem_verifier, caused by recent changes
in handling of zero-size collectives.
- Check for zero-size length only for fixed size collect (shmem_fcollect),
but not for variable-size collect (shmem_collect)
- Add 'nlong_type' parameter to internal broadcast function, to indicate
whether the 'nlong' parameter is valid on non-root PEs, since it's
used by shmem_collect algorithm. Before this change, some components
assumed it's true (scoll_mpi) while others assumed it's false
(scoll_basic).
- In scoll_basic, if nlong_type==false, do not exit if nlong==0, since
this parameter may not be the same on all PEs.
- In scoll_mpi, fallback to scoll_basic if nlong_type==false, since MPI
requires the 'count' argument of MPI_Bcast to be valid on all ranks.
(Picked from master 939162e)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
- according spec 1.4, annex C shmem collectives should process
calls where number of elements is zero independently from pointer
value
- added zero-count processing - it just call barrier to
sync ranks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 9de128afaf)
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.
Found with shmem_verifier test suite.
(picked from master 6754bf1)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
- added synonim to common ucx variables to allow
to print it in opal_info -a
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
- to avoid value mix used C99 style of object initializations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
- there is C11 naming conflict - atomic_init is C macro
which cause building issue
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 3295b23800)