openmpi

Author	SHA1	Message	Date
Edgar Gabriel	c0f8ce0fff	common/ompio: fix a floating point division problem This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue #6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks for Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-01-21 17:59:12 -06:00
Brian Barrett	fe25097194	mtl/ofi: Print descriptive error message on modex failure With MTLs, there's no "other transport" when the remote side does not have an active NIC, so we should print a useful error message when the modex failed (indicating lack of a NIC on the remote side). Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-21 23:50:31 +00:00
René Widera	a91fab80a1	common/ompio: possible rounding issue Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <r.widera@hzdr.de>	2019-01-18 14:05:23 +01:00
Yossi Itigin	387b2ff56f	Merge pull request #6260 from hoopoepg/topic/removed-fca COLL: removed FCA component	2019-01-17 00:05:07 +08:00
Aravind Gopalakrishnan	37f9aff2a0	mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts Moving to a model where we have users actively _enable_ SEP feature for use rather than opening SEP by default if provider supports it. This allows us to not regress (either functionally or for performance reasons) any apps that were working correctly on regular endpoints. Also, providing MCA to specify number of OFI contexts to create and default this value to 1 (Given btl/ofi also creates one by default, this reduces the incidence of a scenario where we allocate all available contexts by default and if btl/ofi asks for one more, then provider breaks as it doesn't support it). While at it, spruce up README on SEP content. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-14 09:58:36 -08:00
Ralph Castain	d1fd1f4cce	Merge pull request #6151 from nrspruit/ns_ompi_mtl_ofi_specializations MTL_OFI: Generation of specialized functions at build time	2019-01-14 09:31:54 -08:00
Sergey Oblomov	0759bb8561	COLL: removed FCA component - removed FCA collectives from coll/scoll Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2019-01-09 16:51:40 +02:00
Jeff Squyres	17be4c6d1f	Merge pull request #6229 from jsquyres/pr/fix-enable-grequest-extension-in-a-tarball romio321: ensure to distribute ompi_grequestx.h	2018-12-28 16:15:23 -05:00
Jeff Squyres	62321be186	romio321: ensure to distribute ompi_grequestx.h Refs https://github.com/open-mpi/ompi/issues/6227. Thanks to @georgemarselis for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-27 15:39:47 -08:00
bosilca	96f88052e9	Merge pull request #5948 from mkurnosov/coll-ireduce-silence-coverity coll/libnbc/ireduce: silence Coverity warning CID 1440360	2018-12-24 12:59:16 -05:00
bosilca	593db292da	Merge pull request #5644 from mkurnosov/coll-iallreduce-rabenseifner coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce	2018-12-24 12:58:21 -05:00
Aurelien Bouteiller	bd0d2b832e	Merge pull request #6086 from ICLDisco/export/errors_nbc Manage errors in NBC collective ops	2018-12-21 02:34:00 -05:00
Jeff Squyres	e9a6246b90	treematch: fix global common symbol Despite its name, this symbol doesn't need to be global. So just make it static. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-20 11:06:14 -08:00
Nathan Hjelm	06baa518f7	rte/pmix: fill in opal_process_info when using prrte/pmix This commit fixes a bug when launching with prun where the process info structures used by the btls are not populated. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-12-13 16:04:31 -07:00
bosilca	804a517929	Merge pull request #6146 from bosilca/topic/treematch_update Update to the latest TreeMatch (v1.3).	2018-12-13 13:26:40 -05:00
Spruit, Neil R	bef5f50a42	MTL_OFI: Generation of specialized functions at build time -> Added new targets in Makefile.am to call a new build script generate-opt-funcs.pl to generate specialized functions for each .pm file. -> Added new perl module .pm files for send,isend,irecv,iprobe,improbe which are loaded by generate-opt-funcs.pl to create new source files that correspond to the name of the .pm file to be used as part of MTL OFI. -> Added mtl_ofi_opt.pm.template and updated README with details on the specialization features and how to add additional specialization support. -> Added new opt_common/mtl_ofi_opt_common.pm containing common functions for generating the specialized functions used by all other *.pm modules. -> Added new mtl_ofi.h which includes the definitions for the function symbol table for storing the specialized functions along with the definitions for the initialization functions for the corresponding function pointers. -> Based off the OFI provider capabilities the specialized function pointers are assigned at mtl_ofi_component_init to the corresponding MTL OFI function. -> mca_mtl_ofi_module_t has been updated with the symbol table struct which is assigned at component init. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-12-13 00:35:19 -08:00
Aravind Gopalakrishnan	e5e19dfcf7	Fix for SEP when num local procs is greater than available contexts For cases when the number of local processes is greater than the number of available contexts, the SEP initialization phase would calculate the number of contexts to provision for each rank to be 0 and would eventually crash. Fix the issue here by using regular endpoints in the event the number of local processes is more than available contexts. This fixes issue #6182. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-12 16:49:04 -08:00
Brian Barrett	6e15128d96	mtl/ofi: Fix crash if no providers found Commit `109d0569ff` introduced a crash when an error occurred before ofi_ctxt was allocated, including when no providers passed the selection logic. Properly check that the pointer is not NULL in the error cleanup code before dereferencing the pointer. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-12-11 15:46:18 -08:00
Matias A Cabral	c76c6d8b28	MTL/PSM2: add missing default priority Missing default priority after PR #6153 Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-12-07 14:46:34 -08:00
George Bosilca	74f2365d6e	Remove most (all) warnings from the new TreeMatch. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-12-05 15:38:39 -05:00
Guillaume Mercier	27aa34e53f	New version based on TM 1.3 Optimize_topology is commented for now until bug resolved in TM Signed-off-by: Guillaume Mercier <guillaume.mercier@bordeaux-inp.fr>	2018-12-05 15:38:39 -05:00
Matias A Cabral	fc8582c560	MTL/PSM2: Do not lower the priority when all processes are local. The intention of lowering the priority when all processes are local was to favor Vader BTL. However, in builds including the OFI MTL it gets selected instead. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-12-04 15:31:09 -08:00
Matias Cabral	abd34620f4	Merge pull request #5972 from aravindksg/ofi_sep_master MTL/OFI: Add OFI Scalable Endpoint support	2018-12-04 13:07:44 -08:00
Aravind Gopalakrishnan	109d0569ff	MTL/OFI: Add OFI Scalable Endpoint support OFI MTL supports OFI Scalable Endpoints feature as means to improve multi-threaded application throughput and message rate. Currently the feature is designed to utilize multiple TX/RX contexts exposed by the OFI provider in conjunction with a multi-communicator MPI application model. For more information, refer to README under mtl/ofi. Reviewed-by: Matias Cabral <matias.a.cabral@intel.com> Reviewed-by: Neil Spruit <neil.r.spruit@intel.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-03 09:56:52 -08:00
George Bosilca	c6f73e8883	First step of the integration with the new TreeMatch. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-12-02 20:05:03 -05:00
Yossi Itigin	83cca9d52a	ucx: add owner.txt for components Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-12-01 17:14:03 +02:00
matcabral	6a15712df5	MTL/OFI: revert PR 6082 Revert to avoid issues with dynamic processes. Signed-off-by: matcabral <matias.a.cabral@intel.com>	2018-11-30 13:44:39 -08:00
Matias Cabral	ef5db1b752	Merge pull request #6082 from matcabral/lower_mtl_ofi_p MTL/OFI: Lower priority when all procs are local	2018-11-30 12:05:40 -08:00
Nathan Hjelm	5ebcbe444e	Merge pull request #6083 from devreal/rdma-plug-memleak Plug two memory leaks in rdma osc	2018-11-27 09:56:19 -07:00
Sergey Oblomov	2d230b3aac	OSC/UCX: set max level value to 60 Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-11-27 14:20:28 +02:00
Jeff Squyres	8459d29738	Merge pull request #5979 from mkurnosov/coll-libnbc-cleanup coll/libnbc: remove debug output	2018-11-26 18:10:10 -05:00
Yossi Itigin	f36eeef4c5	pml_ucx: initialize req_mpi_object.comm for error handler without this fix, an error handler invoked on pml_ucx request would segfault while trying to dereference requests[i]->req_mpi_object.comm Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-11-25 19:37:54 +02:00
Yossi Itigin	ed967d867b	Merge pull request #6073 from hoopoepg/topic/set-osc-ucx-level-200 OSC: set UCX module used by default	2018-11-22 10:53:37 +02:00
KAWASHIMA Takahiro	303d7842d9	Merge pull request #6074 from kawashima-fj/pr/remove-c99-type-check Remove `#if HAVE_[TYPE]` for types available in C99	2018-11-20 11:42:13 +09:00
Aurelien Bouteiller	20447be744	Someone left a debug printf in NBC Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-11-16 10:37:04 -05:00
Aurelien Bouteiller	65660e5999	Manage errors in NBC collective ops Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> Correctly bubble up errors in NBC collective operations Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> The error field of requests needs to be rearmed at start, not at create Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-11-15 16:43:56 -05:00
Joseph Schuchart	91885f5876	Plug two memory leaks in rdma osc Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2018-11-14 14:31:54 -05:00
matcabral	5f58453e63	MTL/OFI: Lower priority when all procs are local So far Vader is faster than OFI MTL for doing shared memory. Therefore, let it run by default when all procs are local. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-11-14 11:01:33 -08:00
Sergey Oblomov	e91f214982	OSC/UCX: added UCX version evaluation - added UCX version evaluation to set OSC UCX priority Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-11-14 10:03:13 +02:00
KAWASHIMA Takahiro	cacd6f389c	datatype: Remove `#if HAVE_[TYPE]` for C99 types Now Open MPI requires a C99 compiler. Checking availability of the following types is no more needed. - `long long` (`signed` and `unsigned`) - `long double` - `float _Complex` - `double _Complex` - `long double _Complex` Furthermore, the `#if HAVE_[TYPE]` style checking is not correct. Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`. `AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h` if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]` (instead of defining as `0`) if it is not available. So even if we need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`. I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac` since someone may use `HAVE_[TYPE]` macros somewhere. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-11-14 09:32:52 +09:00
Sergey Oblomov	36934a8bb2	OSC: set UCX module used by default - OSC/UCX module set priority to 200 to be used by default Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-11-12 15:08:22 +02:00
Aravind Gopalakrishnan	5cf43de445	MTL/OFI: Check threshold number of peers allowed per rank When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when this limit is crossed. Check the max allowed number of ranks during add_procs() and return if there is danger of exceeding this threshold. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-11-01 14:03:00 -07:00
Matias Cabral	2da31706bf	Merge pull request #5970 from aravindksg/coll-tuned-fix coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms	2018-10-31 11:20:07 -07:00
Ralph Castain	05ac8fa71c	Remove stale defunct tools Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-10-30 08:48:16 -07:00
Mikhail Kurnosov	64abd0f405	coll/libnbc: remove debug output 1. Remove debug output in iallgather (I have forgotten to remove it). 2. Remove an incorrect comment in description of ibcast Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-26 15:52:02 +07:00
Aravind Gopalakrishnan	88d781056f	coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms. But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths as well before calling ompi_datatype_type_size() as otherwise we segfault. MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and Allgatherv operations. So, extending the check to these algorithms as well. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-10-24 15:31:33 -07:00
Joseph Schuchart	a193ae26bf	Fix regression introduced earlier by re-adding a barrier after shared memory has been registered Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2018-10-24 15:54:19 -04:00
Yossi Itigin	4c442f2601	Merge pull request #5934 from hoopoepg/topic/suppressed-cov-warn-added-log-msg COMMON/UCX: suppressed coverity warnings	2018-10-22 11:00:47 +03:00
Mikhail Kurnosov	8b511c7889	coll/libnbc/ireduce: silence Coverity warning CID 1440360 Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-22 11:20:28 +07:00
Sergey Oblomov	1099d5f023	COMMON/UCX: added error code to log output Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-10-21 11:37:25 +03:00
Nathan Hjelm	a66373454e	Merge pull request #5943 from bosilca/fix/libnbc_warnings Remove few warnings in libnbc identified by clang-1000.11.45.2	2018-10-20 21:24:30 -06:00
bosilca	c3abedbd2c	Merge pull request #5759 from bosilca/fix/monitoring Fix/monitoring	2018-10-19 07:18:41 -07:00
Nathan Hjelm	dbae9c0958	romio/romio321: silence some compiler warnings Some compilers complain when comparing signed and unsigned. romio321 was doing just this. The check is meant to check whether a size (which is an ADIO_Offset-- a signed number) will work with memcpy which takes a size_t. To silence the warning I added a new type (ADIO_Size) which is an unsigned type and cast the ADIO_Offset to this new type. Fixes #5951 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-10-18 13:36:51 -06:00
George Bosilca	dc972f0b92	Fix the PML monitoring. The monitoring PML hides its existence from the OMPI infrastructure by removing itself from the list of PML loaded components, remaining hidden until MPI_Finalize. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-10-18 00:29:23 -04:00
George Bosilca	668aa15dda	Early selection of the best PML. With this patch the best PML is selected earlier, before finalizing the others PML. This provides a simpler mechanism to intercept and highjack the PML (as done in the monitoring PML) Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-10-18 00:29:23 -04:00
Mikhail Kurnosov	73e048b62a	coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce. This algorithm is a combination of a reduce-scatter implemented with recursive vector halving and recursive distance doubling, followed either by an allgather. Limitations: -- count >= 2^{\floor{\log_2 p}} -- commutative operations only -- intra-communicators only Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-18 08:50:16 +07:00
Ralph Castain	1bd772e8eb	Remove the stale orte-dvm code Users should migrate to https://github.com/pmix/prrte Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-10-17 15:11:38 -07:00
George Bosilca	66182a294d	Remove few warnings in libnbc identified by clang-1000.11.45.2 Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-10-17 18:04:39 -04:00
Howard Pritchard	a435bfe1cf	Merge pull request #5933 from hppritcha/topic/remove_bfo_pml remove the bfo pml	2018-10-17 09:39:58 -06:00
Nathan Hjelm	43547ade4c	Merge pull request #5663 from mkurnosov/coll-ireduce-rabenseifner coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce	2018-10-17 09:02:06 -06:00
Nathan Hjelm	979a199b4f	Merge pull request #5896 from mkurnosov/coll-iallgather-recursivedoubling coll/libnbc: add recursive doubling algorithm for MPI_Iallgather	2018-10-17 09:01:14 -06:00
Sergey Oblomov	df765595e3	COMMON/UCX: suppressed coverity warnings - suppressed coverity warnings - added log messages on failed calls Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-10-17 16:11:03 +03:00
Howard Pritchard	7d6774acf8	remove the bfo pml Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2018-10-17 06:50:11 -06:00
Nathan Hjelm	1ff3cfedb6	Merge pull request #5921 from devreal/ompi-rdma-preinit RDMA OSC: initialize segment memory before registering the segment	2018-10-16 15:10:02 -06:00
Joseph Schuchart	d9dcdfdfba	RDMA OSC: initialize segment memory before registering the segment Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2018-10-16 16:12:14 -04:00
Edgar Gabriel	069084e6ad	Merge pull request #5907 from edgargabriel/topic/testmpio-fixes Topic/testmpio fixes	2018-10-16 13:03:22 -07:00
Edgar Gabriel	ba95588332	io/ompio: add verification for data representations. check for providing a data representation that is actually supported by ompio. Add also one check for a non-NULL pointer in mpi/c/file_set_view for the data representation. Also fixes parts of issue #5643 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-16 12:45:33 -05:00
Jeff Squyres	54ca3310ea	ompi: cleanup various string operations Several fixes to string handling: 1. strncpy() -> opal_string_copy() (because opal_string_copy() guarantees to NULL-terminate, and strncpy() does not) 2. Simplify a few places, such as: * Since opal_string_copy() guarantees to NULL terminate, eliminate some memsets(), etc. * Use opal_asprintf() to eliminate multi-step string creation There's more work that could be done; e.g., this commit doesn't attempt to clean up any strcpy() usage. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-10-14 16:10:20 -07:00
Edgar Gabriel	849d0452a0	io/ompio: execute barrier before sync this ensures that all processes are done modifying a file before syncing. Fixes an error in the testmpio testsuite. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 17:39:05 -05:00
Edgar Gabriel	bf058ca6b0	common/ompio: check datatypes when setting file view return MPI_ERR_ARG if the size of the fileview is not a multiple of the size of the etype provided. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:43:32 -05:00
Edgar Gabriel	05d25383c2	common/ompio: return correct error code for improper access return MPI_ERR_ACCESS if the user tries to read from a file that was opened using MPI_MODE_WRONLY return MPI_ERR_READ_ONLY if the user tries to write a file that was opened using MPI_MODE_RDONLY Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:41:58 -05:00
Edgar Gabriel	c0d7b578be	io/ompio: fix seek position calculation for SEEK_CUR This commit fixes the calculation of the position where to seek to, in case SEEK_CUR is used. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:09:03 -05:00
Yossi Itigin	b71e85b8d5	pml_ucx: fix return code from mca_pml_ucx_init() error flow Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-11 18:48:54 +03:00
Mikhail Kurnosov	a7386c1e09	coll/libnbc: add recursive doubling algorithm for MPI_Iallgather Implements recursive doubling algorithm for MPI_Iallgather. The algorithm can be used only for power-of-two number of processes. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-11 21:43:13 +07:00
Yossi Itigin	b8e1af6fcb	osc_ucx: add worker flush before osc module free Make sure all pending communications are done on all ranks before closing the window. This way it will be safe to close the endpoints when closing the component. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:47:16 +03:00
Yossi Itigin	bcc48515e4	Revert "osc_ucx: fix hang/timeout in component finalize" This reverts commit 438d13b4ca1e7333b789ca3fb536fda17b0feb38. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:47:13 +03:00
Yossi Itigin	27d8c8e83c	Merge pull request #5878 from yosefe/topic/pml-ucx-fix-datatype-leak pml_ucx: add ompi datatype attribute to release ucp_datatype	2018-10-10 20:18:39 +03:00
Yossi Itigin	a012ee91d8	Merge pull request #5886 from yosefe/topic/osc-ucx-fix-finalize-hang osc_ucx: fix hang/timeout in component finalize	2018-10-10 16:29:29 +03:00
Yossi Itigin	9a365555b0	Merge pull request #5879 from hoopoepg/topic/fixed-zero-size-window OSC/UCX: fixed zero-size window processing	2018-10-10 16:28:55 +03:00
Yossi Itigin	40ac9e4771	pml_ucx: fix return code from mca_pml_ucx_init() Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 14:41:05 +03:00
Yossi Itigin	dc6809495d	osc_ucx: fix hang/timeout in component finalize Add barrier to make sure all endpoints are destroyed before destroying the worker. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 14:38:06 +03:00
Sergey Oblomov	ae6f81983f	OSC/UCX: fixed zero-size window processing - added processing of zero-size MPI window Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-10-10 13:08:01 +03:00
Yossi Itigin	4763822a64	pml_ucx: add ompi datatype attribute to release ucp_datatype Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-09 17:34:34 +03:00
Mikhail Kurnosov	b0429d25df	coll/libnbc: add knomial tree algorithm for MPI_Ibcast Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-09 20:43:04 +07:00
Mikhail Kurnosov	7bd63e79c8	coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce An implementation of R. Rabenseifner's algorithm for MPI_Ireduce. This algorithm is a combination of a reduce-scatter implemented with recursive vector halving and recursive distance doubling, followed either by a gather. Limitations: -- count >= 2^{\floor{\log_2 p}} -- commutative operations only -- intra-communicators only Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-09 20:27:09 +07:00
Brian Barrett	e9e4d2a4bc	Handle asprintf errors with opal_asprintf wrapper The Open MPI code base assumed that asprintf always behaved like the FreeBSD variant, where ptr is set to NULL on error. However, the C standard (and Linux) only guarantee that the return code will be -1 on error and leave ptr undefined. Rather than fix all the usage in the code, we use opal_asprintf() wrapper instead, which guarantees the BSD-like behavior of ptr always being set to NULL. In addition to being correct, this will fix many, many warnings in the Open MPI code base. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-08 16:43:53 -07:00
Mikhail Kurnosov	9557fa087f	Resolve merge conflicts Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-05 21:40:27 +07:00
Nathan Hjelm	88a560fa3c	Merge pull request #5744 from mkurnosov/coll-iscan-recursivedoubling coll/libnbc: add recursive doubling algorithm for MPI_Iscan	2018-10-03 09:02:05 -06:00
Brian Barrett	2e24e6ec08	coll libnbc: Remove dead code Remove dead code that was causing warnings about unused static functions. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-02 13:35:15 -04:00
Brian Barrett	c5eaa38491	mtl ofi: Change from opt-in to opt-out provider selection Change default provider selection logic for the OFI MTL. The old logic was whitelist-only, so any new HPC NIC provider would have to ask users to do extra work or wait for an OMPI release to be whitelisted. The reason for the logic was to avoid selecting a "generic" provider like sockets or shm that would frequently have worse performance than the optimized BTL options Open MPI supports. With the change, we blacklist the (small, relatively static) list of providers that duplicate internal capabilities. Users can use one of these blacklisted providers in two ways: first, they can explicitly request the provider in the include list (which will override the default exclude list) and second, they can set a new empty exclude list. Since most HPC networks require special libraries and therefore an explicit build of libfabric, it is highly unlikely that this change will cause users to use libfabric when they didn't want to do so. It does, however, solve the whitelisting problem. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-09-27 11:02:18 -07:00
Jeff Squyres	6bb356ab87	Squash a bunch of harmless compiler warnings. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-26 12:15:21 -07:00
Mikhail Kurnosov	dfe203e167	coll/libnbc: add recursive doubling algorithm for MPI_Iexscan Implements recursive doubling algorithm for MPI_Iexscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations. The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic algorithm selection. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-09-23 19:54:27 +07:00
Mikhail Kurnosov	3d43ff0f32	coll/libnbc: add recursive doubling algorithm for MPI_Iscan Implements recursive doubling algorithm for MPI_Iscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations. The MCA parameter coll_libnbc_iscan_algorithm was added for dynamic algorithm selection. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-09-22 21:09:12 +07:00
bosilca	3f598e9e83	Merge pull request #5450 from mkurnosov/coll-base-allgather-fix-in-place coll-base-allgather: fix MPI_IN_PLACE processing	2018-09-21 14:51:45 -04:00
bosilca	17f1684438	Merge pull request #5491 from mkurnosov/coll-base-allgatherv-fix-mpi-in-place coll/base/allgatherv: fix MPI_IN_PLACE processing	2018-09-21 14:48:16 -04:00
bosilca	441727fcb0	Merge pull request #5680 from ggouaillardet/topic/nbc_unpack coll/libnbc: fix NBC_Unpack()	2018-09-19 10:17:21 -04:00
bosilca	2ae3cfd9bc	Merge pull request #5699 from ICLDisco/export/coll_errors Error cases in base collectives	2018-09-19 09:47:24 -04:00
Gilles Gouaillardet	8b51862fb2	coll/libnbc: fix various error paths The parameter passed to NBC_Return_handle() was incorrectly casted and not dereferenced. Thanks Yossi for the bug report. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-18 13:17:43 +09:00
Jeff Squyres	5be0ba0247	common/monitoring: fix include files Move includes to top of file. Set some #defines so that monitoring_prof.c compiles without warning (as identified by gcc 8 on MacOS). Also ensure to include the internal Open MPI "mpi.h" file (not some random system <mpi.h> file). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-15 06:04:13 -07:00
Jeff Squyres	06c1bf73da	libnbc: remove some stale/dead code Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it turns out that we never call this function, anyway. So just remove it. Fixes #5670. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-15 06:04:13 -07:00
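
Note on commits c0f8ce0fff and a91fab80a1 (common/ompio rounding): the following is a minimal C sketch of why rounding up through single precision can go wrong for large byte counts, and how an integer-only round-up avoids the problem. The function names and sample values are illustrative assumptions, not code taken from the Open MPI tree.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: round-up division routed through single precision.
 * A float has a 24-bit significand, so a large byte count is already
 * rounded when it is converted, before the division takes place. */
uint64_t ceil_div_float(uint64_t bytes, uint64_t chunk)
{
    assert(chunk != 0);
    float fb = (float)bytes;   /* may lose low-order bits */
    float fc = (float)chunk;
    float q = fb / fc;
    uint64_t n = (uint64_t)q;  /* truncate toward zero */
    if ((float)n < q) {
        n++;                   /* round up when a fraction remains */
    }
    return n;
}

/* Integer-only round-up: exact for every input, and written so that
 * no intermediate sum can overflow. */
uint64_t ceil_div_int(uint64_t bytes, uint64_t chunk)
{
    assert(chunk != 0);
    return bytes / chunk + (bytes % chunk != 0 ? 1 : 0);
}
```

With bytes = 536870913 (512 MiB plus one byte) and chunk = 536870912, ceil_div_float() returns 1 because the extra byte is lost in the conversion to float, while ceil_div_int() returns 2.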


6918 Commits
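
Note on commit e9e4d2a4bc (opal_asprintf wrapper): the message describes a wrapper that guarantees the output pointer is NULL on error. A rough sketch of that idea in C follows. The helper name sketch_asprintf is hypothetical and this is not the Open MPI implementation; it is built on the C99 vsnprintf() so that it does not depend on a platform asprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not opal_asprintf() itself: formats into a newly
 * allocated string and guarantees *out == NULL on every failure path.
 * Returns the string length on success, -1 on error. */
int sketch_asprintf(char **out, const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    assert(out != NULL && fmt != NULL);
    *out = NULL;                        /* defined value even on failure */

    va_start(ap, fmt);
    va_copy(ap2, ap);                   /* the arguments are walked twice */
    len = vsnprintf(NULL, 0, fmt, ap);  /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }

    buf = malloc((size_t)len + 1);
    if (buf == NULL) {
        va_end(ap2);
        return -1;
    }

    /* second pass: write the formatted text into the buffer */
    if (vsnprintf(buf, (size_t)len + 1, fmt, ap2) != len) {
        va_end(ap2);
        free(buf);
        return -1;
    }
    va_end(ap2);

    *out = buf;                         /* caller releases with free() */
    return len;
}
```

A caller can then test either the return value or the pointer, and calling free() on the pointer is safe on both the success and the failure path.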