This commit updates the new custom matching code in pml/ob1 so it can
now be enabled with a configure option. This commit also renames the
fuzzy-matching headers to avoid potential name conflicts and removes
the use of C reserved identifiers.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
- Updated blocking send to call the provider functionality directly
and to set the expected completion events to 0 initially. This allows
an optimization for providers that support fi_tinject up to larger
sizes, and it reduces latency when running the OFI mtl with smaller
sizes without requiring calls to progress, since fi_tinject must
complete the message before returning and never creates an event in
the completion queue (see the sketch after this list).
- Updated non-blocking send to call fi_tsend directly and avoid
fi_tinject, since a non-blocking send must not wait on completions.
This resolves a bug where applications calling MPI_Isend could
overrun the TX buffer with small (inject) messages and deadlock. It
also improves message rates by no longer waiting for messages of any
size to complete in non-blocking sends.
- Created a common ompi_mtl_ofi_ssend_recv function to post the ssend
recv, which is shared between the isend and send code paths.
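A minimal sketch of the resulting send fast path, using a
hypothetical wrapper and an inject limit taken from the endpoint
attributes (the real logic lives in the OFI mtl send path):
```
#include <sys/types.h>
#include <rdma/fi_tagged.h>

/* hypothetical wrapper; ep, dest and tag come from the mtl state */
ssize_t send_sketch(struct fid_ep *ep, const void *buf, size_t len,
                    size_t inject_limit, fi_addr_t dest, uint64_t tag,
                    void *context)
{
    if (len <= inject_limit) {
        /* fi_tinject completes the message before returning and
         * never writes a completion-queue entry, so no progress
         * calls are needed and zero completion events are expected */
        return fi_tinject(ep, buf, len, dest, tag);
    }
    /* larger messages use fi_tsend and wait for the CQ event */
    return fi_tsend(ep, buf, len, NULL, dest, tag, context);
}
```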
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
- If a message for a recv that is being cancelled completes after the
call to fi_cancel, the OFI mtl enters a deadlock state waiting for
ofi_req->super.ompi_req->req_status._cancelled, which will never be
set since the recv finished successfully.
- To resolve this, the OFI mtl now checks ofi_req->req_started inside
the loop that waits for the cancel event, to see whether the request
has already started. If the request is completing, the loop is broken
and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false; (see the
sketch below).
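An illustrative shape of that wait loop, using stand-in types rather
than the real mtl/ofi definitions:
```
#include <stdbool.h>

/* stand-ins for the real mtl/ofi request and status structures */
struct status  { volatile bool _cancelled; };
struct request { volatile bool req_started; struct status st; };

void ompi_mtl_ofi_progress(void);  /* prototype assumed for sketch */

/* fi_cancel() has already been issued for req when this runs */
static void wait_for_cancel_sketch(struct request *req)
{
    while (!req->st._cancelled) {
        if (req->req_started) {
            /* the recv completed after fi_cancel, so the cancel
             * event will never arrive; break out and leave
             * _cancelled false */
            break;
        }
        ompi_mtl_ofi_progress();
    }
}
```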
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
OFI providers may reserve some of the upper bits of the tag for
internal use and expose this via mem_tag_format. Check for that and
adjust the communicator bits as needed (see the sketch below).
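A sketch of one way to count the reserved bits, under the hedged
assumption that leading zero bits in mem_tag_format are unavailable
to the application (see fi_endpoint(3) for the exact semantics):
```
#include <stdint.h>

static unsigned reserved_upper_bits(uint64_t mem_tag_format)
{
    unsigned reserved = 0;
    /* every leading zero bit of mem_tag_format is a tag bit the
     * provider keeps for itself (interpretation hedged) */
    for (int bit = 63; bit >= 0; bit--) {
        if (mem_tag_format & (UINT64_C(1) << bit)) {
            break;
        }
        reserved++;
    }
    return reserved;  /* shrink the communicator-id field by this */
}
```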
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
- In some cases a persistent request was deleted during its
completion callback; this caused a double free of the linked UCX
request (an assert in debug builds or a hang in release builds).
- The UCX request is now freed prior to the completion callback
(sketched below).
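A minimal sketch of the corrected ordering, with a hypothetical
wrapper standing in for the real pml/ucx completion path:
```
#include <ucp/api/ucp.h>

/* free the UCX request before running the completion callback, so a
 * callback that deletes the persistent request cannot trigger a
 * second free of the same UCX request */
static void complete_sketch(void *ucx_req, void (*cb)(void *), void *arg)
{
    ucp_request_free(ucx_req);  /* release the UCX request first */
    cb(arg);                    /* may now delete the persistent
                                 * request safely */
}
```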
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
OMPIO now uses the correct delete function depending on the file
system. Instead of calling POSIX unlink,
mca_common_ompio_file_delete now works this way (sketched after the
list):
- create a minimal file handle with the given file name
- select the best fs component using this file handle
- call the component-specific file delete function
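A purely illustrative outline of that flow; all types and helpers
here are stand-ins, not the real OMPIO/fs API:
```
#include <stddef.h>

typedef struct fs_component {
    int (*fs_file_delete)(const char *filename, void *info);
} fs_component_t;

typedef struct file_handle {
    const char     *f_filename;
    fs_component_t *f_fs;
} file_handle_t;

fs_component_t *select_best_fs_component(file_handle_t *fh);

int file_delete_sketch(const char *filename, void *info)
{
    /* 1. minimal handle carrying only the file name */
    file_handle_t fh = { .f_filename = filename, .f_fs = NULL };

    /* 2. let the fs framework pick the best component for it */
    fh.f_fs = select_best_fs_component(&fh);
    if (NULL == fh.f_fs) {
        return -1;
    }

    /* 3. component-specific delete instead of a blanket unlink() */
    return fh.f_fs->fs_file_delete(filename, info);
}
```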
Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
This is needed because the fs components might be queried as a result
of an MPI_File_delete call, in which case we don't have a
communicator value.
Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
Instead of invoking ompi_request_test_all(), which would end up
calling opal_progress() recursively, manually check the status of the
requests (see the sketch below).
The same method is used in ompi_comm_request_progress().
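A sketch of the manual check, using a stand-in for the real request
type:
```
#include <stdbool.h>
#include <stddef.h>

/* stand-in for ompi_request_t: only the completion flag matters */
typedef struct { volatile bool req_complete; } request_t;

static bool all_complete_sketch(request_t **reqs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!reqs[i]->req_complete) {
            /* not done yet; crucially, no opal_progress() call
             * here, so progress cannot recurse */
            return false;
        }
    }
    return true;
}
```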
Refs open-mpi/ompi#3901
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
- Added support in MTL_OFI_RETRY_UNTIL_DONE for handling -FI_EAGAIN
from the provider by attempting to progress the OFI completion queue
via ompi_mtl_ofi_progress.
- If pending events were blocking OFI operations from being enqueued,
they are completed and the OFI operation is retried once
ompi_mtl_ofi_progress has completed successfully.
- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead
of requiring the existence of a "ret" variable to pass back the
return value of the OFI operation (a hedged sketch follows).
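A reconstruction of the macro's general shape; the real definition in
the OFI mtl may differ in detail:
```
#include <rdma/fi_errno.h>

int ompi_mtl_ofi_progress(void);  /* prototype assumed for sketch */

#define MTL_OFI_RETRY_UNTIL_DONE(FUNC, RETURN)            \
    do {                                                  \
        (RETURN) = (FUNC);                                \
        if (-FI_EAGAIN == (RETURN)) {                     \
            /* events block the op: progress, retry */    \
            ompi_mtl_ofi_progress();                      \
            continue;                                     \
        }                                                 \
        break;                                            \
    } while (1)
```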
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Move the memory hooks initialization (for request-based operations)
in osc/ucx to window creation time, to avoid a performance issue
during MPI initialization.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics but does not support using
the btl for self communication, so we must use the local memory
optimizations in osc/rdma.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Data transferred by `MPI_BSEND` may be corrupted if all of the following
conditions are met.
- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
returns.
The problem is in the way the ob1 PML pends a send request.
The `mca_pml_ob1_send_request_start_copy` function returns
`OMPI_ERR_OUT_OF_RESOURCE` if the `mca_bml_base_alloc` function
returns `des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. By the time
`mca_pml_ob1_send_request_start_copy` next tries sending, the user
buffer may already have been overwritten by the MPI program.
Call hierarchy of `MPI_BSEND`:
```
MPI_Bsend
mca_pml_ob1_send
if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
mca_pml_ob1_isend
MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
mca_pml_ob1_send_request_start_seq
mca_pml_ob1_send_request_start_btl
if (size <= eager_limit)
if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
mca_pml_ob1_send_request_start_copy
mca_bml_base_alloc
btl_alloc
if (OMPI_ERR_OUT_OF_RESOURCE == rc)
add_request_to_send_pending
ompi_request_free
```
To solve this problem, we should save the data to the buffer attached
by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND` (a hedged sketch
follows).
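A sketch of that idea, with a hypothetical allocator standing in for
the real attached-buffer management:
```
#include <stddef.h>
#include <string.h>

void *bsend_buffer_alloc(size_t len);  /* hypothetical: space from
                                        * the MPI_BUFFER_ATTACH pool */

/* snapshot the payload before MPI_BSEND returns, so a later retry
 * from the send_pending list reads the saved copy rather than the
 * (possibly overwritten) user buffer */
int pend_bsend_sketch(const void *user_buf, size_t len, void **saved)
{
    void *dst = bsend_buffer_alloc(len);
    if (NULL == dst) {
        return -1;  /* genuinely out of attached buffer space */
    }
    memcpy(dst, user_buf, len);
    *saved = dst;   /* the pending request now points at the copy */
    return 0;
}
```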
This problem was introduced by ob1 optimizations (commits 2b57f422
and a06e491c) in the v1.8 series.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This commit fixes an issue identified by MTT where two different sets
of processes on the same node could create a shared memory window
using communicators sharing the same CID. To avoid this issue, the
temporary filename now includes the creating process's vpid
(illustrated below).
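An illustration of the idea; the actual file name format used by the
window code may differ:
```
#include <stdio.h>

static void shm_filename_sketch(char *out, size_t n, const char *dir,
                                unsigned jobid, unsigned vpid,
                                unsigned cid)
{
    /* a name keyed only on jobid and CID could collide between two
     * disjoint groups on one node; adding the creating process's
     * vpid makes it unique */
    snprintf(out, n, "%s/osc_sm.%u.%u.%u", dir, jobid, vpid, cid);
}
```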
References #5363
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Use internal pack/unpack subroutines that operate on MPI_Aint instead
of int, which fixes some integer overflows (illustrated below).
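A small demonstration of the class of overflow being fixed, assuming
a 64-bit MPI_Aint: a byte count that fits in MPI_Aint but wraps when
forced through an int:
```
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Aint bytes = (MPI_Aint)3 * 1024 * 1024 * 1024;  /* 3 GiB */
    int as_int = (int)bytes;  /* wraps past INT_MAX */
    printf("MPI_Aint: %lld  int: %d\n", (long long)bytes, as_int);
    MPI_Finalize();
    return 0;
}
```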
Thanks to Clyde Stanfield for reporting this issue.
Refs open-mpi/ompi#5383
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
The current implementation of `coll/base/MPI_Scatter` is based on an
in-order binomial tree. This tree is right-skewed and provides good
performance for an MPI_Gather operation, but for an MPI_Scatter
operation a left-skewed binomial tree is more effective (see the
sketch below).
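A rough illustration, assuming a power-of-two communicator, of how a
binomial scatter from root 0 can hand off the largest sub-block
first:
```
#include <stdio.h>

int main(void)
{
    int size = 8;  /* assume a power-of-two communicator */

    /* root 0 peels off one child per bit; taking the highest bit
     * first means the biggest chunk (half the buffer) starts moving
     * immediately, which is the effect the skewed tree is after */
    printf("root 0 sends to:");
    for (int mask = size >> 1; mask > 0; mask >>= 1) {
        printf("  rank %d (blocks=%d)", mask, mask);
    }
    printf("\n");  /* rank 4 (blocks=4) rank 2 (blocks=2) rank 1 ... */
    return 0;
}
```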
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
- Updated the design for sync send MPI calls to use 2 protocol bits
denoting "sync_send" and "sync_send_ack".
- "Sync_send" is added to the send tag only and is masked out on
receives so that the tag can still be read by the original recv
posted in the send/recv operation.
- "Sync_send_ack" is sent from the recv callback to the send side.
This 0-byte send does not generate a completion entry; instead it
sends the message and immediately completes the opal completion in
the recv.
- Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2
more tag bits per format type due to the reduced protocol bits
required by OMPI (a hypothetical layout is sketched below).
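A hypothetical layout of the two protocol bits; the real masks in the
OFI mtl may place them differently:
```
#include <stdbool.h>
#include <stdint.h>

#define SYNC_SEND      (UINT64_C(1) << 62)  /* set on the send tag */
#define SYNC_SEND_ACK  (UINT64_C(1) << 63)  /* 0-byte ack to sender */
#define PROTO_MASK     (SYNC_SEND | SYNC_SEND_ACK)

/* receives mask the protocol bits out, so the originally posted
 * recv still matches a sync send */
static inline bool tags_match(uint64_t send_tag, uint64_t posted_tag)
{
    return (send_tag & ~PROTO_MASK) == (posted_tag & ~PROTO_MASK);
}
```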
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
Calling MPI_Gather with the sendbuf and sendtype parameters equal to
MPI_IN_PLACE and NULL respectively produces a segmentation fault in
the root process.
The problem is that sendtype is used even when the sendbuf value is
MPI_IN_PLACE, whereas according to the standard (page 150, line 37)
the sendtype and sendcount parameters should be ignored in this case
(see the example below).
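The failing case, expressed as a caller that is valid per the
standard and must not crash:
```
#include <mpi.h>

int gather_in_place(void *buf, int count, MPI_Datatype dtype,
                    int root, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);
    if (rank == root) {
        /* sendtype/sendcount must be ignored here */
        return MPI_Gather(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
                          buf, count, dtype, root, comm);
    }
    /* non-roots ignore the recv arguments instead */
    return MPI_Gather(buf, count, dtype, NULL, 0, MPI_DATATYPE_NULL,
                      root, comm);
}
```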
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
- Added common logging infrastructure for all UCX modules.
- All UCX modules are switched to the new infrastructure.
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>