openmpi

Author	SHA1	Message	Date
Howard Pritchard	41ef5c7a10	Merge pull request #6594 from vspetrov/osc_ucx_rget_rkey_fix OSC/UCX: use correct rkey for atomic_fadd in rget/rput	2019-05-01 11:53:17 -06:00
Mark Allen	c081757462	fixing an unsafe usage of integer disps[] (romio321 gpfs) There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the send/recv data comes from places like req[r].lens, and the send buffer and send displacements for example were being calculated as sbuf = pick one of the reqs: req[bottom].lens sdisps[r] = req[r].lens - req[bottom].lens which might be okay if the .lens was data inside of req[] so they'd all be close to each other. But each .lens field is just a pointer that's malloced, so those addresses can be all over the place, so the integer-sized sdisps[] isn't safe. I changed it to have a new extra array sbuf and rbuf for those two Alltoallv calls, and copied the data into the sbuf from the same locations it used to be setting up the sdisps[] at, and after the Alltoallv I copy the data out of the new rbuf into the same locations it used to be setting up the rdisps[] at. For what it's worth I was able to get this to fail -np 2 on a GPFS filesystem with hints romio_cb_write enable. I didn't whittle the test down to something small, but it was failing in an MPI_File_write_all call. Signed-off-by: Mark Allen <markalle@us.ibm.com> (cherry picked from commit d85cac8f1a11495415b67ecab69d2ae1cd19d155)	2019-04-25 14:22:19 -04:00
Brelle Emmanuel	2a4bc0cb58	pml/ob1: fixed exit from get_frag_fail when falling back on btl_put In the case the btl_get fails Ob1 tries to fallback on btl_put first but the return code was ignored. So the code fell back on both btl_put and btl_send. Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net> (cherry picked from commit 9c689f2225d29aa152627f39bab841afead254af)	2019-04-22 14:25:34 -07:00
Valentin Petrov	2947ab2dbc	OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2019-04-17 10:35:34 +03:00
Valentin Petrov	68c88e86f2	OSC/UCX: use correct rkey for atomic_fadd in rget/rput Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2019-04-16 15:24:57 +03:00
Thananon Patinyasakdikul	5999fdad5a	pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE. We missed an assert to check if ALLOW_OVERTAKE is set or not before validating the sequence number and this will cause deadlock. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu> (cherry picked from commit 0263456cf4e99efc67d38acd100cf948e0399d63)	2019-04-09 11:24:24 -07:00
Sergey Oblomov	14c271f993	PML/SPML/UCX: added evaluation of mmap events - there was a set of UCX-related issues reported which were caused by mmap API hook conflicts. We added diagnostics for such problems to simplify the bug-resolving pipeline Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit d8e3562bae700d84873c1d5ca9c45c846d7387ed)	2019-03-14 16:48:25 +02:00
Howard Pritchard	7aeb65579b	Merge pull request #6395 from brminich/topic/ucx_net_waddr_4.0.x PML/UCX: Use net worker address for remote peers - v4.0.x	2019-02-21 20:29:47 -07:00
Mikhail Brinskii	1c514948f6	PML/UCX: Use net worker address for remote peers For remote node peers pack smaller worker address, which contains network device addresses only. This would reduce amount of OOB traffic during startup. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com> (cherry picked from commit 751d88192d05edb7e1912bab4e48643c6f9e1574)	2019-02-21 16:58:20 +02:00
Gilles Gouaillardet	749f51845b	osc/rdma: correctly handle communications to self mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when the window is dynamically created and use_cpu_atomics is set in order to correctly handle communications to self. Thanks Bart Janssens for reporting this issue. Refs. open-mpi/ompi#6394 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (back-ported from commit open-mpi/ompi@fe05fcc11a)	2019-02-20 13:06:05 +09:00
Howard Pritchard	0b915b7e56	Merge pull request #6333 from jsquyres/pr/v4.0.x/hwloc-macro-conflict-fixes v4.0.x: Various minor hwloc cleanups	2019-02-12 09:13:19 -07:00
Howard Pritchard	5dd63405ce	Merge pull request #6368 from jsquyres/pr/v4.0.x/fix-ofi-configury v4.0.x: fix OFI configury	2019-02-11 13:15:52 -07:00
Jeff Squyres	9ad871fc38	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - Subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy\|CPPFLAGS\|LDFLAGS\|LIBS}. - Update btl/usnic and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over kruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Updated from master cherry pick: the OFI BTL does not exist on the v4.0.x branch. Therefore, did not include the OFI BTL changes on master in this cherry pick. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit f5e1a672ccd5db127e85e1e8f6bcfeb8a8b04527)	2019-02-07 06:36:35 -08:00
René Widera	e30e5b95c6	common/ompio: possible rounding issue Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <r.widera@hzdr.de> (cherry picked from commit a91fab80a1e55e1df15f649e18d247e5d4654eb9)	2019-01-30 12:31:39 -06:00
Edgar Gabriel	d1e8779fe3	common/ompio: fix a floating point division problem This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue #6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks to Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu> (cherry picked from commit c0f8ce0fff4684b670135043dd150abc9d83d988)	2019-01-30 12:31:16 -06:00
Gilles Gouaillardet	a247292275	topo/treematch: silence a hwloc related warning treematch/km_partitioning.c #include "config.h", but there is no such file when the embedded treematch is used. In order to prevent the embedded treematch from incorrectly using the config.h from the embedded hwloc, generate a dummy config.h. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit 0aeb27f77650d3ee97e17e770c9e5aa487d5e1f5)	2019-01-30 07:33:33 -05:00
Jeff Squyres	1a1a932acc	romio321: ensure to distribute ompi_grequestx.h Refs https://github.com/open-mpi/ompi/issues/6227. Thanks to George Marselis for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit 62321be186dd7d3efcedc2e801f226f6660ea0c4)	2018-12-28 13:18:10 -08:00
Matias A Cabral	b2327049c1	MTL/PSM2: add missing default priority Missing default priority after PR #6153 Signed-off-by: Matias Cabral <matias.a.cabral@intel.com> (cherry picked from commit c76c6d8b2801ca43ba33168a0b92522786c7c5bb)	2018-12-07 16:22:59 -08:00
Matias A Cabral	80113a368f	MTL/PSM2: Do not lower the priority when all processes are local. The intention of lowering the priority when all processes are local was to favor Vader BTL. However, in builds including the OFI MTL it gets selected instead. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com> (cherry picked from commit fc8582c5606b7a3d1b711f8f7b6144808290a48f)	2018-12-07 11:11:43 -08:00
Geoff Paulsen	752bbd195f	Merge pull request #6102 from hoopoepg/topic/set-osc-ucx-level-200-v4.0 OSC: set UCX module used by default - v4.0	2018-12-04 10:26:37 -06:00
Geoff Paulsen	bd2990f502	Merge pull request #6131 from devreal/rdma-plug-memleak-v4.0.x v4.0.x: Plug two memory leaks in rdma osc	2018-11-30 13:54:51 -06:00
Joseph Schuchart	c5346751e6	Plug two memory leaks in rdma osc Signed-off-by: Joseph Schuchart <schuchart@hlrs.de> (cherry picked from commit 91885f5876129aa4fb43ed4b3404c9d1ca7e08b8)	2018-11-29 10:19:26 -05:00
Sergey Oblomov	6651672711	OSC/UCX: set max level value to 60 Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit 2d230b3aacce0185f0d46e69f608071b670eeb3c)	2018-11-27 20:35:30 +02:00
Yossi Itigin	a112d10c93	pml_ucx: initialize req_mpi_object.comm for error handler without this fix, an error handler invoked on pml_ucx request would segfault while trying to dereference requests[i]->req_mpi_object.comm (picked from master f36eeef) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-11-26 11:57:34 +02:00
Sergey Oblomov	38a4953707	OSC/UCX: added UCX version evaluation - added UCX version evaluation to set OSC UCX priority Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit e91f214982391b8e1b26be39147c357d32b8380e)	2018-11-22 11:31:53 +02:00
Sergey Oblomov	012e27af77	OSC: set UCX module used by default - OSC/UCX module set priority to 200 to be used by default Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit 36934a8bb2484c3d27d14683d65012ff422334f4)	2018-11-22 10:59:43 +02:00
Howard Pritchard	8adaeb1536	Merge pull request #6007 from aravindksg/coll-tuned-fix-40x coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms	2018-11-19 13:15:40 -07:00
Howard Pritchard	ec79631ba2	Merge pull request #5936 from edgargabriel/pr/testmpio-v4.0.x Pr/testmpio v4.0.x	2018-11-19 13:11:50 -07:00
Howard Pritchard	e851879081	Merge pull request #5994 from rhc54/cmr40/cleanup Remove stale defunct tools	2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan	5a74ddb34d	coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms. But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths as well before calling ompi_datatype_type_size() as otherwise we segfault. MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and Allgatherv operations. So, extending the check to these algorithms as well. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com> (cherry picked from commit 88d781056f43934a93e16db556b340e72cdd3742)	2018-10-31 11:37:29 -07:00
Ralph Castain	ba6ad9fe42	Remove stale defunct tools Signed-off-by: Ralph Castain <rhc@open-mpi.org> (cherry picked from commit 05ac8fa71c0833eeeaa878b72a31503d361e145e)	2018-10-30 08:51:25 -07:00
Sergey Oblomov	0846c9d112	COMMON/UCX: added error code to log output Also fixes a PGI compilation error with --enable-debug. Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com> Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit 1099d5f02327329e0c58d9403e3e0a7f1e1d1920)	2018-10-30 09:55:25 -05:00
Ralph Castain	712ddd326f	Remove the stale orte-dvm code Users should migrate to https://github.com/pmix/prrte Signed-off-by: Ralph Castain <rhc@open-mpi.org> (cherry picked from commit 1bd772e8ebf66f705537b9a6e1af2b6093ef8471)	2018-10-30 07:54:35 -07:00
Howard Pritchard	f9d2f3b912	Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x v4.0.x: remove the bfo pml	2018-10-22 09:50:05 -06:00
Howard Pritchard	a806d09450	remove the bfo pml Signed-off-by: Howard Pritchard <howardp@lanl.gov> (cherry picked from commit 7d6774acf89558c05f415c96c00502429e26e502)	2018-10-17 14:00:11 -06:00
Edgar Gabriel	278ecf2205	io/ompio: add verification for data representations. check for providing a data representation that is actually supported by ompio. Add also one check for a non-NULL pointer in mpi/c/file_set_view for the data representation. Also fixes parts of issue #5643 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:22:48 -05:00
Edgar Gabriel	a07c9e96b1	io/ompio: execute barrier before sync this ensures that all processes are done modifying a file before syncing. Fixes an error in the testmpio testsuite. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:22:35 -05:00
Edgar Gabriel	96c1a5b9dc	common/ompio: check datatypes when setting file view return MPI_ERR_ARG if the size of the fileview is not a multiple of the size of the etype provided. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:22:19 -05:00
Edgar Gabriel	425a71799e	common/ompio: return correct error code for improper access return MPI_ERR_ACCESS if the user tries to read from a file that was opened using MPI_MODE_WRONLY return MPI_ERR_READ_ONLY if the user tries to write a file that was opened using MPI_MODE_RDONLY Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:22:04 -05:00
Edgar Gabriel	c65dda6f5f	io/ompio: fix seek position calculation for SEEK_CUR This commit fixes the calculation of the position where to seek to, in case SEEK_CUR is used. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:21:47 -05:00
Howard Pritchard	7ceb508b93	Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x	2018-10-16 16:29:01 -06:00
Howard Pritchard	e2cf1e3ec5	Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x osc_ucx: fix hang/timeout in component finalize - v4.0	2018-10-16 09:21:50 -06:00
Yossi Itigin	eabc94cab0	osc_ucx: add worker flush before osc module free Make sure all pending communications are done on all ranks before closing the window. This way it will be safe to close the endpoints when closing the component. (picked from master b8e1af6) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 23:02:19 +03:00
Yossi Itigin	4a97d6b9fa	pml_ucx: fix return code from mca_pml_ucx_init() (picked from master 40ac9e4) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:23:49 +03:00
Yossi Itigin	1bffd196ef	pml_ucx: add ompi datatype attribute to release ucp_datatype (picked from master 4763822) Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:23:26 +03:00
Sergey Oblomov	274cbc3c03	OSC/UCX: fixed zero-size window processing - added processing of zero-size MPI window Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit ae6f81983fe354de812ebe2532120fb20ae24d3b)	2018-10-10 16:49:02 +03:00
Brian Barrett	10d0a430c4	mtl ofi: Change from opt-in to opt-out provider selection Change default provider selection logic for the OFI MTL. The old logic was whitelist-only, so any new HPC NIC provider would have to ask users to do extra work or wait for an OMPI release to be whitelisted. The reason for the logic was to avoid selecting a "generic" provider like sockets or shm that would frequently have worse performance than the optimized BTL options Open MPI supports. With the change, we blacklist the (small, relatively static) list of providers that duplicate internal capabilities. Users can use one of these blacklisted providers in two ways: first, they can explicitly request the provider in the include list (which will override the default exclude list) and second, they can set a new empty exclude list. Since most HPC networks require special libraries and therefore an explicit build of libfabric, it is highly unlikely that this change will cause users to use libfabric when they didn't want to do so. It does, however, solve the whitelisting problem. Signed-off-by: Brian Barrett <bbarrett@amazon.com> (cherry picked from commit c5eaa38491c7197f7dbc74c299ade18e09bf5f64)	2018-09-27 18:41:47 +00:00
Geoff Paulsen	3d4164e1e1	Merge pull request #5752 from gpaulsen/misc-warnings-fixes Miscellaneous compiler warning stomps.	2018-09-22 15:01:53 -05:00
Geoff Paulsen	bc798b6135	Merge pull request #5755 from gpaulsen/osc_rdma_cleanup osc/rdma: clean out stale aggregation code	2018-09-22 15:00:21 -05:00
Nathan Hjelm	72fc8acb50	osc/rdma: quiet warning gcc complains about ret possibly being used uninitialized. That will never happen but we should still quiet the warning. This commit sets ret to a valid value. Fixes #5513 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-21 14:44:56 -05:00

6792 commits