openmpi

Author	SHA1	Message	Date
Sergey Oblomov	de8568c822	MCA/COMMON/UCX: enabled fallback into older UCX API Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-27 19:59:40 +03:00
Sergey Oblomov	1223b05811	MCA/COMMON/UCX: fixed build scripts - updated evaluation of UCX lib - used call from UCX v1.3 - updated makefile compilation flags Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-27 11:10:25 +03:00
Nathan Hjelm	4c230683e7	osc/sm: fix a typo This commit fixes a typo where a bcast is used instead of the intended collective (barrier). References #5262 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-26 12:53:12 -06:00
Nathan Hjelm	b0ac6276a6	btl/ugni: improve multi-threaded RDMA performance This commit improves the injection rate and latency for RDMA operations. This is done by the following improvements: - If C11's _Thread_local keyword is available then always use the same virtual device index for the same thread when using RDMA. If the keyword is not available then attempt to use any device that isn't already in use. The binding support is enabled by default but can be disabled via the btl_ugni_bind_devices MCA variable. - When posting FMA and RDMA operations always attempt to reap completions after posting the operation. This allows us to better balance the work of reaping completions across all application threads. - Limit the total number of outstanding BTE transactions. This fixes a performance bug when using many threads. - Split out RDMA and local SMSG completion queue sizes. The RDMA queue size is better tuned for performance with RMA-MT. - Split out put and get FMA limits. The old btl_ugni_fma_limit MCA variable is deprecated. The new variable names are: btl_ugni_fma_put_limit and btl_ugni_fma_get_limit. - Change how post descriptors are handled. They are no longer allocated separately from the RDMA endpoints. - Some cleanup to move error code out of the critical path. - Disable the FMA sharing flag on the CDM when we detect that there should be enough FMA descriptors for the number of virtual devices we plan to create. If the user sets this flag we will not unset it. This change should improve the small-message RMA performance by ~10%. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-26 11:31:35 -06:00
Ralph Castain	0ddbc75ce5	Merge pull request #4930 from kizill/fix-ipv6 fixed ipv6 OOB connection problems (fix issue #1585)	2018-06-26 09:13:53 -07:00
Nathan Hjelm	abb87f9137	Merge pull request #5338 from ggouaillardet/topic/uct btl/uct: misc fixes	2018-06-26 08:56:40 -06:00
Yossi Itigin	ee873f4f79	Merge pull request #5322 from hoopoepg/topic/mca-ucx-common MCA/UCX: added common module	2018-06-26 13:54:12 +03:00
Gilles Gouaillardet	b40b835a70	btl/uct: remove debug code Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 16:03:16 +09:00
Gilles Gouaillardet	552d0809aa	btl/uct: add missing include file Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 14:53:02 +09:00
Gilles Gouaillardet	e609cf7bc3	Merge pull request #5337 from ggouaillardet/topic/generalized_requests ompi/requests: implement generalized request extensions	2018-06-26 13:01:04 +09:00
KAWASHIMA Takahiro	a8da78eeaa	Merge pull request #4618 from ggouaillardet/topic/pcoll Add the persistent collectives feature	2018-06-26 12:36:34 +09:00
Gilles Gouaillardet	5c394377d0	io/romio312: use Grequest extensions provided by Open MPI Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	f72922b8b1	io/romio321: do not use removed MPI1 primitives Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	383f23bf35	ompi/request: implement MPI Generalized request extensions so that the latest ROMIO can be used with Open MPI. Note this first and naive implementation does not use the wait_fn callback. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	1e5404873f	io/romio321: update .gitignore and remove two files that should never have been committed Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:17 +09:00
Nathan Hjelm	6c089518e7	btl/uct: make uct endpoints array a flexible array member Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 18:14:58 -06:00
Nathan Hjelm	c5c5b42307	btl: add a new btl for the UCT layer in OpenUCX This commit adds a new btl for one-sided and two-sided communication. This btl uses the uct layer in OpenUCX. This btl makes use of multiple uct contexts and per-thread device pinning to provide good performance when using threads and osc/rdma. This btl has been tested extensively with osc/rdma and passes all MTT tests on aries and IB hardware. For now this new component disables itself but can be enabled by setting the btl_uct_memory_domains MCA variable with a comma-delimited list of supported memory domains/transport layers. For example: --mca btl_uct_memory_domains ib/mlx5_0. The specific transports used can be selected using --mca btl_uct_transports. The default is to use any available transport. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 18:14:58 -06:00
Joshua Ladd	256ad707f1	Merge pull request #5293 from yosefe/topic/osc-ucx-on-demand-progress osc_ucx: register progress on-demand	2018-06-25 15:09:11 -04:00
Joshua Ladd	98afc838aa	Merge pull request #5294 from yosefe/topic/coll-hcoll-progress-fn coll_hcoll: register progress callback directly without a proxy	2018-06-25 15:07:26 -04:00
Nathan Hjelm	e4989714c2	osc/rdma: fix data race on teardown The osc/rdma module did not wait for all pending atomics to complete before tearing down. This could lead to weird issues as the target location may no longer be registered or allocated. This commit also fixes an offset calculation issue in ompi_osc_get_data_blocking (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 11:47:34 -06:00
Nathan Hjelm	c9e58cedc1	mpi.h: fix warning with gcc Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 11:45:36 -06:00
Ralph Castain	0efd07623a	Merge pull request #5327 from rhc54/topic/cov Silence coverity warnings, remove/ignore build product	2018-06-25 08:51:27 -07:00
Ralph Castain	3b2390e5d5	Silence coverity warnings, remove/ignore build product Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-25 08:01:28 -07:00
Jeff Squyres	538528f659	Merge pull request #5326 from jsquyres/pr/tcp-btl-use-opal-hash-map-for-kindex btl/tcp: use a hash map for kernel IP interface indexes	2018-06-25 10:50:50 -04:00
Sergey Oblomov	bf7fd480e9	MCA/COMMON/UCX: added non-blocking implementations of atomics - added implementation of swap/cswap/fadd operations - blocking add64 is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-25 12:25:31 +03:00
Sergey Oblomov	63e7ba6843	MCA/COMMON/UCX: added parameter for UCX/opal progress - added parameter to set UCX/opal progresses - minor refactoring of request wait routines Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-25 11:00:12 +03:00
Yossi Itigin	e3ee11608b	coll_hcoll: register progress callback directly without a proxy Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-24 18:06:07 +03:00
Jeff Squyres	3767ce27c0	btl/tcp: trivial whitespace clean No code/logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-23 08:04:12 -07:00
Jeff Squyres	9034717876	btl/tcp: use a hash map for kernel IP interface indexes The giant size of the TCP proc struct is causing a problem in some environments (because it is allocated on the stack), and it was too big, anyway. Instead, use a hash map. That way, it starts small and can grow if it needs to. It also makes no assumptions about the values of the kernel interface indexes. Fixes #5292. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-23 08:03:30 -07:00
Ralph Castain	259d9bd4fe	Merge pull request #5325 from jsquyres/pr/compiler-warning-stomps pmix3/pmix_server.c: minor compiler warning stomp	2018-06-23 07:39:27 -07:00
Jeff Squyres	e3d6c5ce3a	pmix3/pmix_server.c: minor compiler warning stomp Submitted upstream https://github.com/pmix/pmix/pull/776. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-23 06:35:09 -07:00
Edgar Gabriel	edfdcb6e82	Merge pull request #5324 from edgargabriel/pr/minor-fixes Pr/minor fixes	2018-06-22 17:20:02 -05:00
Howard Pritchard	8babaad35c	Merge pull request #4520 from ggouaillardet/refresh/romio321 io/romio321: refresh ROMIO based on latest stable MPICH 3.2.1	2018-06-22 16:58:46 -05:00
Edgar Gabriel	cf5cdad40f	fcoll: make vulcan the default component make vulcan the default component except for Lustre file systems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:12:02 -05:00
Edgar Gabriel	fd8c5fba4e	common/ompio: fix the fview based grouping options a bug sneaked into constructing the list of aggregator processes when using the fileview based grouping options Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:01:31 -05:00
Sergey Oblomov	d57ae62dee	MCA/UCX: added common module - implemented non-blocking routines for flush operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-22 16:41:09 +03:00
Gilles Gouaillardet	45b6e785aa	Merge pull request #5320 from ggouaillardet/topic/ucx_volatile pml/ucx: silence a warning	2018-06-22 14:00:44 +09:00
Gilles Gouaillardet	edd02b7144	pml/ucx: silence a warning declare 'fenced' volatile in order to silence CID 1437465 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-22 13:11:42 +09:00
Edgar Gabriel	d5dd008193	Merge pull request #5319 from edgargabriel/pr/ibm-testsuite-fixes2 Pr/ibm testsuite fixes2	2018-06-21 19:46:22 -05:00
Edgar Gabriel	743e0dff5a	common/ompio: fix zero size fview issue handle the situation where the user requests a non-zero amount of data but has a zero-size fileview. My instinct would have been to return an error code, but according to the test that I used it should be MPI_SUCCESS and zero bytes. It is definitely better than segfaulting :-) This makes another test from the IBM testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 17:02:13 -05:00
Edgar Gabriel	7643ccfbcf	sharedfp/sm and sharedfp/lockedfile: fix seek offset calculation the seek offset calculation did not treat the offset as a multiple of the etype provided. Fixing this makes some more ibm tests pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 14:26:36 -05:00
Mikhail Kurnosov	c500739293	coll/base: Add MPI_Bcast based on a scatter followed by an allgather Implements MPI_Bcast using a binomial tree scatter followed by a recursive doubling allgather. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-21 11:47:07 -06:00
Jeff Squyres	e305e80aff	Merge pull request #5317 from jsquyres/pr/update-bind-to-cpulist-option orterun: use consistent CLI option name for --bind-to	2018-06-21 12:43:18 -04:00
Jeff Squyres	4603852740	orterun: use consistent CLI option name for --bind-to Since the new binding option is tied to the --cpu-list orterun CLI option, make the --bind-to option reflect the same name (vs. the --cpu-set CLI option, which is entirely different). For example: mpirun --bind-to cpu-list:ordered ... Note that "--bind-to cpulist:ordered" is accepted as a synonym, because people will be lazy. Also add some minor updates to the orterun.1in man page for clarification. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-21 08:22:00 -07:00
Edgar Gabriel	fb16d40775	Merge pull request #5196 from edgargabriel/topic/cuda io/ompio: introduce initial support for cuda buffers in ompio	2018-06-21 10:14:43 -05:00
Edgar Gabriel	7808379a47	common/ompio: incorporate George's comments incorporate a couple of comments by George as part of the review on github. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:29:49 -05:00
Edgar Gabriel	3c10ed4ed1	common/ompio: use allocator to manage temporary buffers use an allocator to manage temporary buffers when copying unmanaged data from GPU buffer to host. This is necessary, since the buffers have to be pinned for better performance, which is an expensive operation. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	ac79e576ef	fcoll/base: do not use the two_phase component with CUDA support the two_phase component does not work with some collective I/O operations on CUDA buffers due to the data sieving (i.e. both read and write operations) executed on some buffers, which are not anticipated in the GPU buffer management of the code. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	6a532101aa	io/ompio and common/ompio: add initial support for cuda buffers in ompio this commit adds the initial support for cuda buffers in ompio, for blocking and non-blocking individual read and write operations. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	8c2ea0ef49	opal/datatype: add additional interface to retrieve more details about cuda buffer the existing interfaces in opal_datatype_cuda do not allow to distinguish whether a buffer is a managed or unmanaged cuda buffer. Add an interface that allows to retrieve this information through a convertor, since the information is actually available in the mca_common_cuda_* routines. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00


28751 commits