openmpi

Автор	SHA1	Сообщение	Дата
KAWASHIMA Takahiro	c87a3df0c9	Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings coll/libnbc: Suppress compiler warnings	2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro	37a05e74aa	coll/libnbc: Suppress compiler warnings Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro	0021616984	pml/ob1: Fix data corruption of MPI_BSEND Data transferred by `MPI_BSEND` may corrupt if all of the following conditions are met. - The message size is less than the eager limit. - The `btl_alloc` function in the BTL interface returns `NULL` for some reason. - The MPI program overwrites the send buffer after `MPI_BSEND` returns. The problem is in the way of pending a send request in ob1 PML. The `mca_pml_ob1_send_request_start_copy` function retruns `OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns `des = NULL`. In this case, the send request is added to the `send_pending` list and `MPI_BSEND` returns immediately. Next time the `mca_pml_ob1_send_request_start_copy` function tries sending, the user buffer may have been overwritten by the MPI program. Call hierarchy of `MPI_BSEND`: ``` MPI_Bsend mca_pml_ob1_send if (MCA_PML_BASE_SEND_BUFFERED == sendmode) mca_pml_ob1_isend MCA_PML_OB1_SEND_REQUEST_START_W_SEQ mca_pml_ob1_send_request_start_seq mca_pml_ob1_send_request_start_btl if (size <= eager_limit) if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED) mca_pml_ob1_send_request_start_copy mca_bml_base_alloc btl_alloc if (OMPI_ERR_OUT_OF_RESOURCE == rc) add_request_to_send_pending ompi_request_free ``` To solve this problem, we should save the data to the buffer attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`. This problem was introduced by ob1 optimization (commits 2b57f422 and a06e491c) in v1.8 series. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:30:58 +09:00
Howard Pritchard	34bc77747c	Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing	2018-07-11 18:34:35 -05:00
Nathan Hjelm	35a75a6bf5	osc/sm: avoid filename collision when multiple windows share same CID This commit fixes an issue identified by MTT where we can have two different sets of processes on the same node creating a shared memory window with communicators sharing the same CID. To avoid this issue the temporary filename now includes the creating processes vpid. References #5363 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-11 14:32:27 -06:00
Nathan Hjelm	037656bc1d	osc/rdma: fix bug introduced in b90c838 This commit fixes an bug that was introduced back in 2016 which impacts request-based RMA in some cases. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-10 18:17:55 -06:00
Gilles Gouaillardet	76292951e5	coll/libnbc: fix integer overflow Use internal pack/unpack subroutines that operate on MPI_Aint instead of int and hence solve some integer overflows. Thanks Clyde Stanfield for reporting this issue. Refs open-mpi/ompi#5383 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-09 10:08:33 -06:00
Mikhail Kurnosov	22fa5a8a67	coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-09 10:04:41 -06:00
Yossi Itigin	e77e31b50b	Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging MCA/COMMON/UCX: unified logging across all UCX modules	2018-07-08 12:45:26 +03:00
Mikhail Kurnosov	b9e14cd7d0	coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process. The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-07 20:59:39 +07:00
Sergey Oblomov	240670152e	MCA/COMMON/UCX: code beautify - alignment Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 19:40:58 +03:00
Sergey Oblomov	eb7010933d	OSC/UCX: suppressed compilation warnings - suppressed sing/unsign-compare warnings Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 10:58:09 +03:00
Sergey Oblomov	bef47b792c	MCA/COMMON/UCX: unified logging across all UCX modules - added common logging infrastructure for all UCX modules - all UCX modules are switched to new infra Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-05 16:25:39 +03:00
Sergey Oblomov	8080283b3d	MCA/COMMON/UCX: changed return type for wait_request - for now wait_request returns OMPI status - updated callers Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-04 23:29:38 +03:00
Sergey Oblomov	c2bd6af9f2	MCA/COMMON/UCX: minor unification of del_proces calls - some common functionality of del_procs calls is moved into mca_common module - blocking ucp_put call is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-02 15:10:53 +03:00
Yossi Itigin	09c10d5e09	Merge pull request #5345 from hoopoepg/topic/pml-ucx-suppress-compiler-warning PML/UCX: suppressed compilation warning	2018-07-02 13:41:12 +03:00
Edgar Gabriel	d191ed6b4f	fs/base: move redundant code to fs/base moving some code from fs/ufs into fs/base. The benefit of this approach is that fs components that are fundamentally based on posix I/O (and only differ in some non-posix functionality such as setting stripe size, or which hints are being supported) can avoid having to replicate the same code over and over again. First beneficiary is the lustre fs component, but more are to follow soon. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-07-01 10:20:32 -05:00
Xin Zhao	c1ac0c00c5	Merge pull request #5185 from jjolly/fix-memcpy-size-mismatch - Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c	2018-06-29 19:37:53 -07:00
Sergey Oblomov	074f30ba27	PML/UCX: suppressed compilation warning Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-27 12:05:07 +03:00
Yossi Itigin	aca61a6bfb	Merge pull request #5238 from hoopoepg/topic/fixed-coverity-issues-ucx-pml UCX/PML: fixed few coverity issues	2018-06-27 11:14:06 +03:00
Nathan Hjelm	4c230683e7	osc/sm: fix a typo This commit fixes a typo where a bcast is used instead of the intended collective (barrier). References #5262 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-26 12:53:12 -06:00
Sergey Oblomov	502d04bf12	UCX/PML/SPML: fixed few coverity issues - fixed incorrect pointer manipulation/free - cleaned dead code - minor optimization on process delete routine - fixed error handling - free pointers - added debug output for woker flush failure Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-26 18:52:39 +03:00
Yossi Itigin	ee873f4f79	Merge pull request #5322 from hoopoepg/topic/mca-ucx-common MCA/UCX: added common module	2018-06-26 13:54:12 +03:00
Gilles Gouaillardet	e609cf7bc3	Merge pull request #5337 from ggouaillardet/topic/generalized_requests ompi/requests: implement generalized request extensions	2018-06-26 13:01:04 +09:00
KAWASHIMA Takahiro	a8da78eeaa	Merge pull request #4618 from ggouaillardet/topic/pcoll Add the persistent collectives feature	2018-06-26 12:36:34 +09:00
Gilles Gouaillardet	5c394377d0	io/romio312: use Grequest extensions provided by Open MPI Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	f72922b8b1	io/romio321: do not use removed MPI1 primitives Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	1e5404873f	io/romio321: update .gitignore and remove two files that should have never been commited Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:17 +09:00
Joshua Ladd	256ad707f1	Merge pull request #5293 from yosefe/topic/osc-ucx-on-demand-progress osc_ucx: register progress on-demand	2018-06-25 15:09:11 -04:00
Joshua Ladd	98afc838aa	Merge pull request #5294 from yosefe/topic/coll-hcoll-progress-fn coll_hcoll: register progress callback directly without a proxy	2018-06-25 15:07:26 -04:00
Nathan Hjelm	e4989714c2	osc/rdma: fix data race on teardown The osc/rdma module did not wait for all pending atomics to complete before tearing down. This could lead to weird issues as the target location may no longer be registered or allocated. This commit also fixes an offset calculation issue in ompi_osc_get_data_blocking (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 11:47:34 -06:00
Ralph Castain	3b2390e5d5	Silence coverity warnings, remove/ignore build product Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-25 08:01:28 -07:00
Sergey Oblomov	bf7fd480e9	MCA/COMMON/UCX: added non-blocking implementations of atomics - added implementation of swap/cswap/fadd operations - blocking add64 is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-25 12:25:31 +03:00
Sergey Oblomov	63e7ba6843	MCA/COMMON/UCX: added parameter for UCX/opal progress - added parameter to set UCX/opal progresses - minor refactoring of request wait routines Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-25 11:00:12 +03:00
Yossi Itigin	e3ee11608b	coll_hcoll: register progress callback directly without a proxy Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-24 18:06:07 +03:00
Edgar Gabriel	edfdcb6e82	Merge pull request #5324 from edgargabriel/pr/minor-fixes Pr/minor fixes	2018-06-22 17:20:02 -05:00
Howard Pritchard	8babaad35c	Merge pull request #4520 from ggouaillardet/refresh/romio321 io/romio321: refresh ROMIO based on latest stable MPICH 3.2.1	2018-06-22 16:58:46 -05:00
Edgar Gabriel	cf5cdad40f	fcoll: make vulcan the default component make vulcan the default component except for Lustre file systems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:12:02 -05:00
Edgar Gabriel	fd8c5fba4e	common/ompio: fix the fview based grouping options a bug sneaked into constructing the list of aggregators processes when using the fileview based grouping options Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:01:31 -05:00
Sergey Oblomov	d57ae62dee	MCA/UCX: added common module - implemented non-blocking routines for flush operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-22 16:41:09 +03:00
Gilles Gouaillardet	edd02b7144	pml/ucx: silence a warning declare 'fenced' volatile in order to silence CID 1437465 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-22 13:11:42 +09:00
Edgar Gabriel	743e0dff5a	common/ompio: fix zero size fview issue handle the situation where the user requests a non-zero amount of data but has a zero-size fileview. My instrinct would have been to return an error code, but according to the test that I used it should be MPI_SUCCESS and zero bytes. It is definitely better than segfaulting :-) THis makes another test from the IBM testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 17:02:13 -05:00
Edgar Gabriel	7643ccfbcf	sharedfp/sm and sharedfp/lockedfile: fix seek offset calculation the seek offset calculation did not treat the offset as a multiple of the etype provided. Fixing this makes some more ibm tests pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 14:26:36 -05:00
Mikhail Kurnosov	c500739293	coll/base: Add MPI_Bcast based on a scatter followed by an allgather Implements MPI_Bcast using a binomial tree scatter followed by an recursive doubling allgather. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-21 11:47:07 -06:00
Edgar Gabriel	fb16d40775	Merge pull request #5196 from edgargabriel/topic/cuda io/ompio: introduce initial support for cuda buffers in ompio	2018-06-21 10:14:43 -05:00
Edgar Gabriel	7808379a47	common/ompio: incorporate George's comments incorporate a couple of comments by George as part of the review on github. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:29:49 -05:00
Edgar Gabriel	3c10ed4ed1	common/ompio: use allocator to manage temporary buffers use an allocator to manage temporary buffers when copying unmanaged data from GPU buffer to host. This is necessary, since the buffers have to be pinned for better performance, which is an expensive operation. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	ac79e576ef	fcoll/base: do not use the two_phase compoment with CUDA support the two_phase compoment does not work with some collective I/O operations on CUDA buffers due to the data sieving (i.e. both read and write operations) executed on some buffers, which are not anticipated in the GPU buffer management of the code. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	6a532101aa	io/ompio and common/ompio: add initial support for cuda buffers in ompio this commit adds the initial support for cuda buffers in ompio, for blocking and non-blocking individual read and write operations. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Yossi Itigin	db26c08336	Merge pull request #5307 from hoopoepg/topic/async-progress-on-mpi-fin PML/UCX: fixed hang on MPI_Finalize	2018-06-21 13:44:14 +03:00

1 2 3 4 5 ...

6692 Коммитов