Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.
Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d07388c8c63d485fe308589009279c1f4f)
1) Race condition: Do not add private contexts to active list.
Private contexts are only visible to the user.
2) Recycled contexts: Destroyed contexts are put on an idle list until
finalize, continuous context creation will lead to oom condition.
Instead, check if context from idle list meets new context requirements
and reuse it.
Co-authored with: Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit bd7cdf718488627e7943aab34275c150baf2284a)
- added synchronized flush operation on quiet call.
- flush is implemented using get operation
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 0b108411f89727a68cd622f3b04c783efa359b8e)
- in some cases realloc operation may be completed without
allocation of new buffer (and without additional data copy)
- added logic to reallocate buffer inplace if possible
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 277c2a9e5c7711098be826e6c154253747fdad9a)
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b3671f1e10caf00d06d66d120eac9c5be)
- there was a set of UCX related issues reported which caused
by mmap API hooks conflicts. We added diagnostic of such
problems to simplify bug-resolving pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae700d84873c1d5ca9c45c846d7387ed)
- to avoid value mix used C99 style of object initializations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290ff2fdd10f254f878e92cae6e90c854)
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- method mca_spml_ucx_get_mkey_slow is moved into .c module,
added pointer to this method into mca_spml_ucx_t structure
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- some spml calls are marked as inline to exclude cross-module
dependency
- updated get-key call to get link to local module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
+ Add quiet method to SPML, so it can have different implementation with
fence.
+ Use ucp_worker_fence for spml_fence method of UCX SPML
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
The hook is called from memheap when memory range
is going to be allocated by smalloc(), realloc() and others.
ucx spml uses this hook to call ucp_mem_advise in order to speedup
non blocking memory mapping.
Signed-off-by: Alex Mikheev <alexm@mellanox.com>