Sergey Oblomov
d8e3562bae
PML/SPML/UCX: added evaluation of mmap events
...
- a set of UCX-related issues was reported that were caused
by mmap API hook conflicts. We added diagnostics for such
problems to simplify the bug-resolution pipeline
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Mikhail Brinskii
751d88192d
PML/UCX: Use net worker address for remote peers
...
For peers on remote nodes, pack a smaller worker address that contains
network device addresses only. This reduces the amount of OOB traffic
during startup.
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-02-14 18:06:36 +02:00
Yossi Itigin
f36eeef4c5
pml_ucx: initialize req_mpi_object.comm for error handler
...
Without this fix, an error handler invoked on a pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-11-25 19:37:54 +02:00
Yossi Itigin
b71e85b8d5
pml_ucx: fix return code from mca_pml_ucx_init() error flow
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-11 18:48:54 +03:00
Yossi Itigin
40ac9e4771
pml_ucx: fix return code from mca_pml_ucx_init()
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:41:05 +03:00
Yossi Itigin
4763822a64
pml_ucx: add ompi datatype attribute to release ucp_datatype
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-09 17:34:34 +03:00
Sergey Oblomov
c201c0abb3
PML/UCX: blocked calls optimizations: removed reset progress count
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:39 +03:00
Sergey Oblomov
2cd9e04166
PML/UCX: optimization of mprobe call - renamed vars
...
- renamed internal variables
- used unsigned datatypes
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:39 +03:00
Sergey Oblomov
38e908f83e
PML/UCX: optimization of mprobe call
...
- refactoring of opal/UCX progress calls
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:38 +03:00
Sergey Oblomov
b0f87f2235
PML/UCX: blocked calls optimizations
...
- added UCX progress priority
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:38 +03:00
Jeff Squyres
fe0852bcb4
Miscellaneous compiler warning stomps.
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-24 07:39:14 -07:00
Sergey Oblomov
2806504290
PML/SPML/UCX: init global objects using C99 style
...
- used C99-style object initialization to avoid mixing up values
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-25 14:52:45 +03:00
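As a minimal sketch of what "C99 style" initialization refers to, the block below contrasts positional and designated initializers. The struct, its fields, and the values are hypothetical and chosen only for illustration; they are not taken from the PML/SPML sources.

```c
#include <stddef.h>

/* Hypothetical module descriptor; the field names are illustrative only. */
typedef struct example_module {
    const char *name;
    int         priority;
    int         verbose;
    size_t      request_size;
} example_module_t;

/* Positional initialization: each value is matched to a field by order.
 * If two int fields are later swapped in the struct definition, the values
 * are silently assigned to the wrong fields. */
example_module_t positional_module = { "ucx", 51, 0, 128 };

/* C99 designated initializers: each value is bound to a field by name,
 * and any field not listed (here: verbose) is zero-initialized. */
example_module_t designated_module = {
    .name         = "ucx",
    .priority     = 51,
    .request_size = 128
};
```

With designated initializers, reordering or adding fields in the struct does not change which value lands in which field.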
Sergey Oblomov
920cc2e0d9
MCA/COMMON/UCX: del_procs calls are unified to common module
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-18 07:37:25 +03:00
Sergey Oblomov
bef47b792c
MCA/COMMON/UCX: unified logging across all UCX modules
...
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d
MCA/COMMON/UCX: changed return type for wait_request
...
- wait_request now returns an OMPI status
- updated callers
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
c2bd6af9f2
MCA/COMMON/UCX: minor unification of del_procs calls
...
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Sergey Oblomov
074f30ba27
PML/UCX: suppressed compilation warning
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-27 12:05:07 +03:00
Sergey Oblomov
502d04bf12
UCX/PML/SPML: fixed a few Coverity issues
...
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for worker flush failure
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Gilles Gouaillardet
edd02b7144
pml/ucx: silence a warning
...
declare 'fenced' volatile in order to silence CID 1437465
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-22 13:11:42 +09:00
Sergey Oblomov
5f03628560
PML/UCX: removed unneeded flush
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-21 12:40:46 +03:00
Sergey Oblomov
2745da7dcc
PML/UCX: use non-blocking fence instead of async progress
...
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-21 09:46:03 +03:00
Sergey Oblomov
10f2d831ec
PML/UCX: fixed hang on MPI_Finalize
...
- added async UCX progress thread to allow
pending requests to complete
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-20 16:12:05 +03:00
Sergey Oblomov
0a8261f3b0
PML/UCX: fixed hang on MPI_Finalize
...
fixes issue https://github.com/openucx/ucx/issues/2656
added flush for worker object to complete all pending operations
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-05 17:22:03 +03:00
Yossi Itigin
385f38ab4e
ucx: improve error messages during connection establishment
...
Also, unite common code calling ucp_ep_create()
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-04-30 15:45:05 +03:00
Alex Mikheev
640e945b9c
ompi: pml/ucx: blocking send using ucp_tag_send_nbr
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2018-01-17 15:54:18 +02:00
Alex Mikheev
e7bf0617cf
ompi: pml ucx: improve recv latency
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-12-26 16:24:16 +02:00
Yossi Itigin
14a93a5992
pml_ucx: fix tag/context_id layout and upper bounds.
...
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2017-08-27 17:15:48 +03:00
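The commit message does not describe the layout itself. As a rough sketch of the general technique, the block below packs a message tag, a source rank, and a context id into one 64-bit matching tag and checks each field against its upper bound. The field widths (24/20/20 bits) and all `EX_`/`ex_` names are assumptions made for illustration, not values confirmed by this commit.

```c
#include <stdint.h>

/* Hypothetical field widths; together they fill a 64-bit tag. */
#define EX_TAG_BITS     24
#define EX_RANK_BITS    20
#define EX_CONTEXT_BITS 20

/* Largest value each field can hold without spilling into its neighbor. */
#define EX_TAG_MAX     ((UINT32_C(1) << EX_TAG_BITS) - 1)
#define EX_RANK_MAX    ((UINT32_C(1) << EX_RANK_BITS) - 1)
#define EX_CONTEXT_MAX ((UINT32_C(1) << EX_CONTEXT_BITS) - 1)

/* Returns 1 if all three fields fit in their bit ranges, 0 otherwise. */
static inline int ex_fields_in_range(uint32_t tag, uint32_t rank,
                                     uint32_t context_id)
{
    return tag <= EX_TAG_MAX && rank <= EX_RANK_MAX &&
           context_id <= EX_CONTEXT_MAX;
}

/* Pack as: | tag (high bits) | rank (middle bits) | context id (low bits) | */
static inline uint64_t ex_make_tag(uint32_t tag, uint32_t rank,
                                   uint32_t context_id)
{
    return ((uint64_t)tag << (EX_RANK_BITS + EX_CONTEXT_BITS)) |
           ((uint64_t)rank << EX_CONTEXT_BITS) |
           (uint64_t)context_id;
}

/* Extract the individual fields from a packed tag. */
static inline uint32_t ex_get_tag(uint64_t packed)
{
    return (uint32_t)(packed >> (EX_RANK_BITS + EX_CONTEXT_BITS));
}

static inline uint32_t ex_get_rank(uint64_t packed)
{
    return (uint32_t)((packed >> EX_CONTEXT_BITS) & EX_RANK_MAX);
}

static inline uint32_t ex_get_context(uint64_t packed)
{
    return (uint32_t)(packed & EX_CONTEXT_MAX);
}
```

Checking the bounds before packing matters because an out-of-range value would overwrite bits belonging to a neighboring field and cause incorrect matching.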
Alina Sklarevich
49913c692a
PML UCX: unite the code for all the sending modes.
...
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-26 13:17:06 +03:00
Alina Sklarevich
d93b67257b
PML UCX: handle a synchronous send.
...
MCA_PML_BASE_SEND_SYNCHRONOUS
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2017-04-13 18:11:55 +03:00
Xin Zhao
ee952fcccd
Passing estimated_num_procs to UCX init in PML and SPML.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-27 20:36:52 +03:00
Xin Zhao
6a99c60fbd
Add multithreading support in PML UCX framework.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2017-03-20 19:55:00 +02:00
Alex Mikheev
152f77df59
ompi: pml ucx: fix datatype packing error in bsend
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-03-01 16:18:19 +02:00
Alex Mikheev
b015c8bb48
ompi: pml ucx: add support for the buffered send
...
Signed-off-by: Alex Mikheev <alexm@mellanox.com>
2017-02-21 17:19:22 +02:00
Xin Zhao
2d77912c19
Revert "PML/SPML/UCX: add UCX MT support to PML and SPML."
...
This reverts commit 0ecf3c951c0a87ab5bdd76a541a69852af671ba9.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-19 18:57:48 +02:00
Xin Zhao
0ecf3c951c
PML/SPML/UCX: add UCX MT support to PML and SPML.
...
Signed-off-by: Xin Zhao <xinz@mellanox.com>
2016-12-15 23:59:15 +02:00
Alina Sklarevich
e9d2d029c6
PML/SPML/UCX: Adapt to the API changes in the UCX lib.
...
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
2016-12-08 11:33:29 +02:00
Ralph Castain
1e2019ce2a
Revert "Update to sync with OMPI master and cleanup to build"
...
This reverts commit cb55c88a8b7817d5891ff06a447ea190b0e77479.
2016-11-22 15:03:20 -08:00
Ralph Castain
cb55c88a8b
Update to sync with OMPI master and cleanup to build
...
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2016-11-22 14:24:54 -08:00
Yossi Itigin
05ca466c6b
ucx: adapt pml_ucx and spml_ucx to new UCX APIs
...
- pass field_mask to ucp_init().
- use non-blocking disconnect.
- recv() with pre-allocated request.
- call opal_progress() from iprobe() and improbe().
- use shift pattern in connect/disconnect.
2016-10-12 23:45:45 +03:00
Thananon Patinyasakdikul
60d0fbf683
Removal of ompi_request_lock from pml/ucx.
2016-05-26 12:36:58 -04:00
bosilca
b90c83840f
Refactor the request completion (#1422)
...
* Remodel the request.
Added the wait sync primitive and integrate it into the PML and MTL
infrastructure. The multi-threaded requests are now significantly
less heavy and less noisy (only the threads associated with completed
requests are signaled).
* Fix the condition to release the request.
2016-05-24 18:20:51 -05:00
Joshua Ladd
18c5a21562
Fix typo in error handling flow.
2016-01-14 22:28:54 +02:00
Joshua Ladd
afa62d8ca1
Addressing reviewers' comments for https://github.com/open-mpi/ompi-release/pull/891
2016-01-14 19:22:27 +02:00
Tomislav Janjusic
3858bc8e62
Adding support for dynamic endpoint creation
...
Signed-off-by: Tomislav Janjusic <tomislavj@mngx-apl-01.mtl.labs.mlnx>
Signed-off-by: Tomislavj Janjusic <tomislavj@mellanox.com>
Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>
2016-01-12 22:17:03 +02:00
Alina Sklarevich
3ffd8dcd20
PML UCX: fix typo (following 7becc54d).
2015-12-10 13:51:10 +02:00
yosefe
d66b01d380
pml_ucx: implement cancel, and add small optimizations.
2015-11-10 17:40:06 +02:00
yosefe
45c3d04857
pml_ucx: fix request construct/destruct.
...
We should invoke OBJ_CONSTRUCT/OBJ_DESTRUCT only on regular requests
(which are embedded inside UCX requests) and for the completed request.
Persistent requests are already constructed/destructed by the free list.
This fixes an assertion in ompi_request_destruct.
2015-11-04 11:03:46 +02:00
yosefe
ae738d0434
pml_ucx: add pmi fence in del_procs
2015-10-28 18:34:36 +02:00
yosefe
a313588337
ompi: Add UCX PML.
2015-10-20 19:46:06 +03:00