openmpi

Author	SHA1	Message	Date
Joseph Schuchart	7b1beb0f6c	Harmonize return values of progress callbacks Signed-off-by: Joseph Schuchart <schuchart@hlrs.de> (cherry picked from commit `2c97187ee0`)	2020-03-30 18:58:57 +02:00
Sergey Oblomov	2fa112c0a6	UCX: added PPN hint for UCX context - added PPN hint for UCX context init Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `43186e494b`) Conflicts: opal/mca/common/ucx/common_ucx_wpool.c	2019-08-09 11:51:30 +03:00
Nysal Jan K.A	b6da090090	pml/ucx: Fix the max tag and context id values Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com> (cherry picked from commit `fe4ef147f8`)	2019-07-03 16:38:07 +03:00
Howard Pritchard	6424857029	Merge pull request #6634 from jsquyres/pr/v4.0.x/ob1-fixes v4.0.x: Cherry pick ob1 fixes from master	2019-06-26 10:49:32 -06:00
Sergey Oblomov	1edd36638b	PML/UCX: disable PML UCX if MT is requested but not supported - in case if multithreading requested but not supported disable PML UCX Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `a3578d9ece`)	2019-05-20 09:59:59 +03:00
George Bosilca	48f824327c	Fix the leak of fragments for persistent sends. The rdma_frag attached to the send request was not correctly released upon request completion, leaking until MPI_Finalize. A quick solution would have been to add RDMA_FRAG_RETURN at different locations on the send request completion, but it would have unnecessarily made the sendreq completion path more complex. Instead, I added the length to the RDMA fragment so that it can be completed during the remote ack. Be more explicit on the comment. The rdma_frag can only be freed once when the peer forced a protocol change (from RDMA GET to send/recv). Otherwise the fragment will be returned once all data pertaining to it has been transferred. NOTE: Had to add a typedef for "opal_atomic_size_t" from master into opal/threads/thread_usage.h into this cherry pick (it is in opal/include/opal_stdatomic.h on master, but that file does not exist here on the v4.0.x branch). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit `a16cf0e4dd`) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-05-03 06:20:02 -07:00
Brelle Emmanuel	c44821aef5	pml/ob1: fixed local handle sent during PUT control message In case of using a btl_put in ob1, the handle of the locally registered memory is sent with a PUT control message. In the current master code the sent handle is necessary the handle in the frag but if the handle has been successfully registered in the request, the frag structure does not have any valid handle and all fragments use the request one. I suggest to check if the handle in the fragment is valid and if not to send the handle from the request. Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net> (cherry picked from commit `e630046a4b`)	2019-05-03 05:53:35 -07:00
Brelle Emmanuel	2a4bc0cb58	pml/ob1: fixed exit from get_frag_fail when falling back on btl_put In the case the btl_get fails Ob1 tries to fallback on btl_put first but the return code was ignored. So the code fell back on both btl_put and btl_send. Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net> (cherry picked from commit `9c689f2225`)	2019-04-22 14:25:34 -07:00
Thananon Patinyasakdikul	5999fdad5a	pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE. We missed an assert to check if ALLOW_OVERTAKE is set or not before validating the sequence number and this will cause deadlock. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu> (cherry picked from commit `0263456cf4`)	2019-04-09 11:24:24 -07:00
Sergey Oblomov	14c271f993	PML/SPML/UCX: added evaluation of mmap events - there was a set of UCX related issues reported which caused by mmap API hooks conflicts. We added diagnostic of such problems to simplify bug-resolving pipeline Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `d8e3562bae`)	2019-03-14 16:48:25 +02:00
Mikhail Brinskii	1c514948f6	PML/UCX: Use net worker address for remote peers For remote node peers pack smaller worker address, which contains network device addresses only. This would reduce amount of OOB traffic during startup. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com> (cherry picked from commit `751d88192d`)	2019-02-21 16:58:20 +02:00
Yossi Itigin	a112d10c93	pml_ucx: initialize req_mpi_object.comm for error handler without this fix, an error handler invoked on pml_ucx request would segfault while trying to dereference requests[i]->req_mpi_object.comm (picked from master `f36eeef`) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-11-26 11:57:34 +02:00
Sergey Oblomov	0846c9d112	COMMON/UCX: added error code to log output Also fixes a PGI compilation error with --enable-debug. Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com> Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `1099d5f023`)	2018-10-30 09:55:25 -05:00
Howard Pritchard	f9d2f3b912	Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x v4.0.x: remove the bfo pml	2018-10-22 09:50:05 -06:00
Howard Pritchard	a806d09450	remove the bfo pml Signed-off-by: Howard Pritchard <howardp@lanl.gov> (cherry picked from commit `7d6774acf8`)	2018-10-17 14:00:11 -06:00
Yossi Itigin	4a97d6b9fa	pml_ucx: fix return code from mca_pml_ucx_init() (picked from master `40ac9e4`) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:23:49 +03:00
Yossi Itigin	1bffd196ef	pml_ucx: add ompi datatype attribute to release ucp_datatype (picked from master `4763822`) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:23:26 +03:00
Jeff Squyres	2e37f97a38	Miscellaneous compiler warning stomps. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit `fe0852bcb4`)	2018-09-21 14:35:51 -05:00
Sergey Oblomov	3cace87749	MCA/COMMON/UCX: del_procs calls are unified to common module Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `920cc2e0d9`)	2018-09-19 10:47:27 +03:00
Gilles Gouaillardet	4bd5c538a2	pml/ob1: plug a memory leak in mca_pml_ob1_component_fini() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (back-ported from commit open-mpi/ompi@fed33c1530)	2018-09-10 09:21:12 +09:00
Geoff Paulsen	3282c61048	Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0 PML/UCX: blocked calls optimizations - v4.0	2018-08-31 14:11:11 -05:00
Sergey Oblomov	028bcb8a73	MCA/COMMON/UCX: added synonym to opal_mem_hook variable - added synonym to common ucx variables to allow to print it in opal_info -a Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `e00f7a68ba`)	2018-08-29 15:17:00 +03:00
Sergey Oblomov	9215eb9a3b	PML/UCX: blocked calls optimizations - refactoring of opal/UCX progress calls - added UCX progress priority Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `b0f87f2235`)	2018-08-29 14:38:22 +03:00
Boris Karasev	8873d901e8	pmix: added check for pmix fence status Signed-off-by: Boris Karasev <karasev.b@gmail.com> (cherry picked from commit `57683366ca`) Conflicts: opal/mca/common/ucx/common_ucx.c opal/mca/common/ucx/common_ucx.h Modified: ompi/mca/pml/ucx/pml_ucx.c oshmem/mca/spml/ucx/spml_ucx.c	2018-08-17 21:33:50 +06:00
Sergey Oblomov	b64502977a	PML/SPML/UCX: init global objects using C99 style - to avoid value mix used C99 style of object initializations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `2806504290`)	2018-07-28 16:47:43 +03:00
Sergey Oblomov	af0e7b190e	PML/UCX: fixed ucp request free on persistent request completion - in some cases persistent request was deleted during completion callback, this causes double free of linked UCX request (assert in debug build or hang in release build) - UCX request is freed prior to completion callback Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `6fe0a73861`)	2018-07-20 22:20:14 +03:00
Sergey Oblomov	1c7ae22dfb	MCA/COMMON/UCX: shift opal memhooks into common UCX Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-17 13:46:38 +03:00
KAWASHIMA Takahiro	0021616984	pml/ob1: Fix data corruption of MPI_BSEND Data transferred by `MPI_BSEND` may corrupt if all of the following conditions are met. - The message size is less than the eager limit. - The `btl_alloc` function in the BTL interface returns `NULL` for some reason. - The MPI program overwrites the send buffer after `MPI_BSEND` returns. The problem is in the way of pending a send request in ob1 PML. The `mca_pml_ob1_send_request_start_copy` function returns `OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns `des = NULL`. In this case, the send request is added to the `send_pending` list and `MPI_BSEND` returns immediately. Next time the `mca_pml_ob1_send_request_start_copy` function tries sending, the user buffer may have been overwritten by the MPI program. Call hierarchy of `MPI_BSEND`: ``` MPI_Bsend mca_pml_ob1_send if (MCA_PML_BASE_SEND_BUFFERED == sendmode) mca_pml_ob1_isend MCA_PML_OB1_SEND_REQUEST_START_W_SEQ mca_pml_ob1_send_request_start_seq mca_pml_ob1_send_request_start_btl if (size <= eager_limit) if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED) mca_pml_ob1_send_request_start_copy mca_bml_base_alloc btl_alloc if (OMPI_ERR_OUT_OF_RESOURCE == rc) add_request_to_send_pending ompi_request_free ``` To solve this problem, we should save the data to the buffer attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`. This problem was introduced by ob1 optimization (commits `2b57f422` and `a06e491c`) in v1.8 series. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:30:58 +09:00
Sergey Oblomov	240670152e	MCA/COMMON/UCX: code beautify - alignment Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 19:40:58 +03:00
Sergey Oblomov	bef47b792c	MCA/COMMON/UCX: unified logging across all UCX modules - added common logging infrastructure for all UCX modules - all UCX modules are switched to new infra Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-05 16:25:39 +03:00
Sergey Oblomov	8080283b3d	MCA/COMMON/UCX: changed return type for wait_request - for now wait_request returns OMPI status - updated callers Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-04 23:29:38 +03:00
Sergey Oblomov	c2bd6af9f2	MCA/COMMON/UCX: minor unification of del_procs calls - some common functionality of del_procs calls is moved into mca_common module - blocking ucp_put call is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-02 15:10:53 +03:00
Sergey Oblomov	074f30ba27	PML/UCX: suppressed compilation warning Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-27 12:05:07 +03:00
Sergey Oblomov	502d04bf12	UCX/PML/SPML: fixed few coverity issues - fixed incorrect pointer manipulation/free - cleaned dead code - minor optimization on process delete routine - fixed error handling - free pointers - added debug output for worker flush failure Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-26 18:52:39 +03:00
Yossi Itigin	ee873f4f79	Merge pull request #5322 from hoopoepg/topic/mca-ucx-common MCA/UCX: added common module	2018-06-26 13:54:12 +03:00
Sergey Oblomov	d57ae62dee	MCA/UCX: added common module - implemented non-blocking routines for flush operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-22 16:41:09 +03:00
Gilles Gouaillardet	edd02b7144	pml/ucx: silence a warning declare 'fenced' volatile in order to silence CID 1437465 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-22 13:11:42 +09:00
Sergey Oblomov	5f03628560	PML/UCX: removed unneeded flush Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-21 12:40:46 +03:00
Sergey Oblomov	2745da7dcc	PML/UCX: use non-blocking fence instead of async progress Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-21 09:46:03 +03:00
Sergey Oblomov	10f2d831ec	PML/UCX: fixed hang on MPI_Finalize - added async UCX progress thread to allow pending requests to complete Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-20 16:12:05 +03:00
Yossi Itigin	564f80d362	pml_ucx: add option to use opal memhooks instead of ucx internal hooks Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-17 15:30:44 +03:00
Thananon Patinyasakdikul	390d72addd	Merge pull request #4885 from davideberius/spc_pr Initial Software-based Performance Counters PR	2018-06-12 14:04:49 -07:00
David Eberius	d377a6b6f4	Added Software-based Performance Counters driver code along with several counters. This code is the implementation of Software-base Performance Counters as described in the paper 'Using Software-Base Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf). More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI. All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined. The internal timer units have been changed to cycles to avoid division operations which was a large source of overhead as discussed in the paper. Added a --with-spc configure option to enable SPCs in the Open MPI build. This defines SOFTWARE_EVENTS_ENABLE. Added an MCA parameter, mpi_spc_enable, for turning on specific counters. Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize. Added an SPC test and example. Signed-off-by: David Eberius <deberius@vols.utk.edu>	2018-06-11 22:48:16 -04:00
Yossi Itigin	fd12540751	Merge pull request #5227 from hoopoepg/topic/pml-ucx-hang-on-finalize PML/UCX: fixed hang on MPI_Finalize	2018-06-08 13:19:49 +03:00
Sergey Oblomov	0a8261f3b0	PML/UCX: fixed hang on MPI_Finalize fixes issue https://github.com/openucx/ucx/issues/2656 added flush for worker object to complete all pending operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-05 17:22:03 +03:00
Jeff Squyres	35438ae9b5	mpi/finalized: revamp INITIALIZED/FINALIZED Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See https://github.com/open-mpi/ompi/issues/5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to set the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-01 13:36:29 -07:00
Sergey Oblomov	5ec26914a6	PML/UCX: do not set offset on ordered data recv Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-21 19:40:07 +03:00
Sergey Oblomov	19607daa32	PML/UCX: create convertor clone instead of stack reset Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-17 16:39:13 +03:00
Sergey Oblomov	7c5de01c57	PML/UCX: reset converter stack on unordered messages Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-17 13:11:02 +03:00
bosilca	2ab628b92e	Merge pull request #5074 from bosilca/topic/remove_warnings Remove warnings identified by clang.	2018-05-15 11:15:23 -04:00
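
Commit `35438ae9b5` above describes replacing the INITIALIZED/FINALIZED flags with a single state enum that is written atomically and read cheaply. The following is a minimal sketch of that idea, not the actual Open MPI code: every `sketch_*` and `SKETCH_*` name is hypothetical, the six states simply mirror the list in the commit message, and the relaxed C11 atomic load stands in for the "plain read" the commit mentions for performance-critical paths.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names; the ordering mirrors the states listed in the commit message. */
typedef enum {
    SKETCH_STATE_NOT_INITIALIZED = 0,
    SKETCH_STATE_INIT_STARTED,
    SKETCH_STATE_INIT_COMPLETED,
    SKETCH_STATE_FINALIZE_STARTED,
    SKETCH_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    SKETCH_STATE_FINALIZE_COMPLETED
} sketch_mpi_state_t;

/* One variable represents the whole lifecycle. */
static _Atomic int sketch_mpi_state = SKETCH_STATE_NOT_INITIALIZED;

/* Writers (the init/finalize paths) publish the new state atomically. */
static void sketch_set_state(sketch_mpi_state_t s)
{
    atomic_store_explicit(&sketch_mpi_state, (int)s, memory_order_release);
}

/* Readers on hot paths use a cheap relaxed load (assumption: this
 * approximates the plain read described in the commit message). */
static int sketch_get_state(void)
{
    return atomic_load_explicit(&sketch_mpi_state, memory_order_relaxed);
}

/* "Initialized" holds once init has completed. */
static bool sketch_initialized(void)
{
    return sketch_get_state() >= SKETCH_STATE_INIT_COMPLETED;
}

/* "Finalized" stays false until COMM_SELF destruction is done, so an
 * attribute destruction callback running early in finalize sees false. */
static bool sketch_finalized(void)
{
    return sketch_get_state() >= SKETCH_STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

Because the states are ordered, each query reduces to a single comparison, and the intermediate "finalize started" state is what keeps `sketch_finalized()` false while COMM_SELF attribute callbacks are still running.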


1374 commits