Sergey Oblomov
7c621acf1b
OPAL/UCX: enabling new API provided by UCX
...
- added detection of new API into configuration
- added tag_send call implemented using new API
- added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 75bda25ddbeea18bb001f367a712dc72592e1e58)
2020-05-04 10:02:10 +03:00
Joseph Schuchart
7b1beb0f6c
Harmonize return values of progress callbacks
...
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 2c97187ee05e592346206a697ca3d9531d600fcc)
2020-03-30 18:58:57 +02:00
Sergey Oblomov
2fa112c0a6
UCX: added PPN hint for UCX context
...
- added PPN hint for UCX context init
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 43186e494b47ca29e8d5e7a864b6b98b8e873195)
Conflicts:
opal/mca/common/ucx/common_ucx_wpool.c
2019-08-09 11:51:30 +03:00
Nysal Jan K.A
b6da090090
pml/ucx: Fix the max tag and context id values
...
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit fe4ef147f81b2ac56661175005de6c330eace690)
2019-07-03 16:38:07 +03:00
Sergey Oblomov
1edd36638b
PML/UCX: disable PML UCX if MT is requested but not supported
...
- in case if multithreading requested but not supported
disable PML UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a3578d9ece2b40a349529e7b223df50b0aac64aa)
2019-05-20 09:59:59 +03:00
Sergey Oblomov
14c271f993
PML/SPML/UCX: added evaluation of mmap events
...
- there was a set of UCX related issues reported which caused
by mmap API hooks conflicts. We added diagnostic of such
problems to simplify bug-resolving pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae700d84873c1d5ca9c45c846d7387ed)
2019-03-14 16:48:25 +02:00
Mikhail Brinskii
1c514948f6
PML/UCX: Use net worker address for remote peers
...
For remote node peers pack smaller worker address, which contains
network device addresses only. This would reduce amount of OOB traffic
during startup.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 751d88192d05edb7e1912bab4e48643c6f9e1574)
2019-02-21 16:58:20 +02:00
Yossi Itigin
a112d10c93
pml_ucx: initialize req_mpi_object.comm for error handler
...
without this fix, an error handler invoked on pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm
(picked from master f36eeef)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-11-26 11:57:34 +02:00
Yossi Itigin
4a97d6b9fa
pml_ucx: fix return code from mca_pml_ucx_init()
...
(picked from master 40ac9e4)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:49 +03:00
Yossi Itigin
1bffd196ef
pml_ucx: add ompi datatype attribute to release ucp_datatype
...
(picked from master 4763822)
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:26 +03:00
Jeff Squyres
2e37f97a38
Miscellaneous compiler warning stomps.
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4d14a6aaf5a3e1021f60b5be32dd42d)
2018-09-21 14:35:51 -05:00
Sergey Oblomov
3cace87749
MCA/COMMON/UCX: del_procs calls are unified to common module
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9994dfd49062822c89cb502274eb464)
2018-09-19 10:47:27 +03:00
Sergey Oblomov
9215eb9a3b
PML/UCX: blocked calls optimizations
...
- refactoring of opal/UCX progress calls
- added UCX progress priority
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit b0f87f22358914ae9f8fc382daa4052b31ed2aeb)
2018-08-29 14:38:22 +03:00
Boris Karasev
8873d901e8
pmix: added check for pmix fence status
...
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca300fe353e91c52dc9aa0f657120d4d)
Conflicts:
opal/mca/common/ucx/common_ucx.c
opal/mca/common/ucx/common_ucx.h
Modified:
ompi/mca/pml/ucx/pml_ucx.c
oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Sergey Oblomov
b64502977a
PML/SPML/UCX: init global objects using C99 style
...
- to avoid value mix used C99 style of object initializations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290ff2fdd10f254f878e92cae6e90c854)
2018-07-28 16:47:43 +03:00
Sergey Oblomov
bef47b792c
MCA/COMMON/UCX: unified logging across all UCX modules
...
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d
MCA/COMMON/UCX: changed return type for wait_request
...
- for now wait_request returns OMPI status
- updated callers
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
c2bd6af9f2
MCA/COMMON/UCX: minor unification of del_proces calls
...
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Sergey Oblomov
074f30ba27
PML/UCX: suppressed compilation warning
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-27 12:05:07 +03:00
Sergey Oblomov
502d04bf12
UCX/PML/SPML: fixed few coverity issues
...
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for woker flush failure
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Gilles Gouaillardet
edd02b7144
pml/ucx: silence a warning
...
declare 'fenced' volatile in order to silence CID 1437465
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-22 13:11:42 +09:00
Sergey Oblomov
5f03628560
PML/UCX: removed uneeded flush
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-21 12:40:46 +03:00
Sergey Oblomov
2745da7dcc
PML/UCX: use non-blocking fence instead of async progress
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-21 09:46:03 +03:00
Sergey Oblomov
10f2d831ec
PML/UCX: fixed hang on MPI_Finalize
...
- added async UCX progress thread to allow
pending requests to complete
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-20 16:12:05 +03:00
Sergey Oblomov
0a8261f3b0
PML/UCX: fixed hand on MPI_Finalize
...
fixes issue https://github.com/openucx/ucx/issues/2656
added flush for worker object to complete all pending operations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-05 17:22:03 +03:00
Yossi Itigin
385f38ab4e
ucx: improve error messages during connection establishment
...
Also, unite common code calling ucp_ep_create()
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-04-30 15:45:05 +03:00
Alex Mikheev
640e945b9c
ompi: pml/ucx: blocking send using ucp_tag_send_nbr
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:54:18 +02:00
Alex Mikheev
e7bf0617cf
ompi: pml ucx: improve recv latency
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-12-26 16:24:16 +02:00
Yossi Itigin
14a93a5992
pml_ucx: fix tag/context_id layout and upper bounds.
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-08-27 17:15:48 +03:00
Alina Sklarevich
49913c692a
PML UCX: unite the code for all the sending modes.
...
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-26 13:17:06 +03:00
Alina Sklarevich
d93b67257b
PML UCX: handle a synchronous send.
...
MCA_PML_BASE_SEND_SYNCHRONOUS
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 18:11:55 +03:00
Xin Zhao
ee952fcccd
Passing estimated_num_procs to UCX init in PML and SPML.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Xin Zhao
6a99c60fbd
Add multithreading support in PML UCX framework.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-20 19:55:00 +02:00
Alex Mikheev
152f77df59
ompi: pml ucx: fix datatype packing error in bsend
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Alex Mikheev
b015c8bb48
ompi: pml ucx: add support for the buffered send
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Xin Zhao
2d77912c19
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
...
This reverts commit 0ecf3c951c0a87ab5bdd76a541a69852af671ba9.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-19 18:57:48 +02:00
Xin Zhao
0ecf3c951c
PML/SPML/UCX: add UCX MT support to PML and SPML.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-15 23:59:15 +02:00
Alina Sklarevich
e9d2d029c6
PML/SPML/UCX: Adapt to the API changes in the UCX lib.
...
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Ralph Castain
1e2019ce2a
Revert "Update to sync with OMPI master and cleanup to build"
...
This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.
2016-11-22 15:03:20 -08:00
Ralph Castain
cb55c88a8b
Update to sync with OMPI master and cleanup to build
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 14:24:54 -08:00
Yossi Itigin
05ca466c6b
ucx: adapt pml_ucx and spml_ucx to new UCX APIs
...
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
2016-10-12 23:45:45 +03:00
Thananon Patinyasakdikul
60d0fbf683
Removal of ompi_request_lock from pml/ucx.
2016-05-26 12:36:58 -04:00
bosilca
b90c83840f
Refactor the request completion ( #1422 )
...
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
2016-05-24 18:20:51 -05:00
Joshua Ladd
18c5a21562
Fix typo in error handling flow.
2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1
Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891
2016-01-14 19:22:27 +02:00
Tomislav Janjusic
3858bc8e62
Adding support for dynamic endpoint creation
...
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Alina Sklarevich
3ffd8dcd20
PML UCX: fix typo (following 7becc54d).
2015-12-10 13:51:10 +02:00
yosefe
d66b01d380
pml_ucx: implement cancel, and add small optimizations.
2015-11-10 17:40:06 +02:00
yosefe
45c3d04857
pml_ucx: fix request construct/destruct.
...
We should invoke OBJ_CONTRUCT/OBJ_DESTRUCT only on regular requests
(which are embedded inside UCX requests) and for the completed request.
Persistent requests are already constructed/destructed by the free list.
This fixes an assertion in ompi_request_destruct.
2015-11-04 11:03:46 +02:00
yosefe
ae738d0434
pml_ucx: add pmi fence in del_procs
2015-10-28 18:34:36 +02:00