Sergey Oblomov
43186e494b
UCX: added PPN hint for UCX context
...
- added PPN hint for UCX context init
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-05 18:07:06 +03:00
Sergey Oblomov
ebc457baf5
COMMON/UCX: removed ucs stuff
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 20:56:30 +03:00
Sergey Oblomov
a0a9306066
COMMON/UCX: init memhooks infra on external hooks only
...
- initialize memory hooks infrastructure only in case
if external memory hooks are requested
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 20:13:16 +03:00
Mikhail Brinskii
2ef5bd8b36
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
...
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-26 14:47:58 +03:00
Xin Zhao
9c3d00b144
ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
...
ompi/oshmem/spml/ucx: optimize spml ucx progress
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:45 +02:00
Xin Zhao
e1c1ab0202
ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
...
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-03-21 23:01:37 +02:00
Sergey Oblomov
c319cf9ade
COMMON/UCX: rewording of hooks suggestion
...
- also updated output macro
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-14 11:00:57 +02:00
Sergey Oblomov
d8e3562bae
PML/SPML/UCX: added evaluation of mmap events
...
- there was a set of UCX related issues reported which caused
by mmap API hooks conflicts. We added diagnostic of such
problems to simplify bug-resolving pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Artem Polyakov
91d6115d99
opal/common/ucx: Adjust the threasholds for periodical flushes
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
3aadc2b5e1
opal/common/ucx: Fix periodical flush in the worker pool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
84dfe1277c
opal/common/ucx: Rename wpool recv_worker to dflt_worker
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
8a990c2b64
opal/common/ucx: Add comments clarifying data structures
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
19e2ae2efb
opal/common/ucx: Switch to opal/tsd
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
7984d7d997
opal/common/ucx: Remove unused debugging macro
...
Will be reintroduced later if needed and after adaptation to the OMPI
infrastructure.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
43f16d8796
opal/common/ucx: Remove common_ucx_int.h
...
Place the content of common_ucx_int.h back to the common_ucx.h and
include common_ucx_wpool.h explicitly.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
bb7d360621
opal/common/ucx: add refcnt in tlocal_ctx_tbl entry to keep track of usage
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
101036651b
opal/common/ucx: Fix the bug in wpool's periodical flush
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
bcb52ecade
opal/common/ucx: add winfo ptr into req
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
33517428a1
opal/common/ucx: add periodical flush and counter to opal directory.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
1fa7054041
opal/common/ucx: use trylock in opal_common_progress
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
2d3cffe1a3
opal/common/ucx: replace opal_mutex_t with opal_recursive_mutex_t
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
aa26a724ed
opal/common/ucx: introduce internal UCX request in wpool.
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
07cb4134be
opal/common/ucx: Set of bug fixes in wpool
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
344bb641a1
opal/common/ucx: Minor changes in wpool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
9fb9cfbe8e
opal/common/ucx: Simplify Worker Pool TLS structure
...
Get rid of unneeded context and memory region identifiers
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
1e7bf7085d
opal/common/ucx: Improve/fix debug output macro's
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
fd98ee14eb
opal/common/ucx: Code cleanup
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
f38c9f3e5f
opal/common/ucx: Simplify Worker Pool memory handler
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
6b7acdf21f
opal/common/ucx: Somplify Worker Pool context management
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Xin Zhao
8b7fa927ba
opal/common/ucx: Add fetch primitives to wpool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Xin Zhao
bfbf818fe1
opal/common/ucx: Complete initialization of the Worker Pool
...
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Artem Polyakov
e28fadb048
opal/common/ucx: Introduce Worker Pool (wpool) functionality
...
Worker Pool is an object containing/managing a set of UCX workers
and providing access to those workers through a smal interface
to allow Multi-Threaded applicatoins to access multiple HW contexts.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:06 -08:00
Jeff Squyres
dd20174532
Remove opal/mca/common/ofi.
...
It never lived up to its purpose (and has caused amorphous indirect
errors such as https://github.com/open-mpi/ompi/issues/2519 ), so
delete it.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
3f4af8e51c
opal/common: remove stale common components
...
The verbs and verbs_usnic components are now no longer necessary / no
longer used anywhere in the code base.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Yossi Itigin
83cca9d52a
ucx: add owner.txt for components
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-12-01 17:14:03 +02:00
Sergey Oblomov
1099d5f023
COMMON/UCX: added error code to log output
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-21 11:37:25 +03:00
Sergey Oblomov
df765595e3
COMMON/UCX: suppressed coverity warnings
...
- suppressed coverity warnings - added log messages on failed calls
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-17 16:11:03 +03:00
Brian Barrett
e9e4d2a4bc
Handle asprintf errors with opal_asprintf wrapper
...
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error. However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined. Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Gilles Gouaillardet
db65dbd9a8
ucx: use the c99 __func__ macro instead
...
__FUNCTION__ macro was never standardized and should not be used.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-25 11:19:18 +09:00
Nathan Hjelm
000f9eed4d
opal: add types for atomic variables
...
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 10:48:55 -06:00
Jeff Squyres
05e5f61fe1
common/verbs-usnic: check that it will actually compile
...
If someone specifies --with-verbs-usnic, actually do a configury check
to ensure that it will compile (vs. assuming that it will compile if
someone asks for it).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-30 13:10:49 -07:00
Yossi Itigin
68206a5635
Merge pull request #5569 from hoopoepg/topic/optimize-blocked-calls
...
PML/UCX: blocked calls optimizations
2018-08-29 14:19:09 +03:00
Yossi Itigin
4bb6845888
Merge pull request #5570 from hoopoepg/topic/opal-mem-hooks-syno
...
MCA/COMMON/UCX: added synonym to opal_mem_hook variable
2018-08-29 14:16:33 +03:00
Sergey Oblomov
2cd9e04166
PML/UCX: optimization of mprobe call - renamed vars
...
- renamed of internal variable names
- used unsigned datatypes
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:39 +03:00
Sergey Oblomov
38e908f83e
PML/UCX: optimization of mprobe call
...
- refactoring of opal/UCX progress calls
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:38 +03:00
Sergey Oblomov
b0f87f2235
PML/UCX: blocked calls optimizations
...
- added UCX progress priority
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:38 +03:00
Sergey Oblomov
b72dd83f05
MCA/COMMON/UCX: added synonims for common ucx variables
...
- added synonims for atomic/osc modules
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-26 18:25:21 +03:00
Jeff Squyres
fe0852bcb4
Miscellaneous compiler warning stomps.
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-24 07:39:14 -07:00
Sergey Oblomov
6a7f66d9c2
MCA/COMMON/UCX: renamed synonim to opal_mem_hook variable
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-22 14:12:33 +03:00
Sergey Oblomov
e00f7a68ba
MCA/COMMON/UCX: added synonim to opal_mem_hook variable
...
- added synonim to opal_mem_hook variable to allow
to print it in opal_info -a
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-21 15:05:12 +03:00