openmpi

Author	SHA1	Message	Date
Aravind Gopalakrishnan	6edcc479c4	mtl/ofi: Fix segfault when not using Thread-Grouping feature For the non thread-grouping paths, only the first (0th) OFI context should be used for communication. Otherwise this would access a nonexistent array item and cause a segfault. While at it, clarify some content regarding SEPs in README (Credit to Matias Cabral for README edits). Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-02-07 11:52:53 -08:00
Jeff Squyres	f5e1a672cc	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - Subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy\|CPPFLAGS\|LDFLAGS\|LIBS}. - Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over cruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	aba2571881	mtl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Gilles Gouaillardet	945f830f7a	mtl/ofi: fix configury when VPATH is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-07 06:29:58 -08:00
Matias Cabral	0601b3e982	Merge pull request #6325 from aravindksg/fix_help_reference mtl/ofi: Fix reference to help text object	2019-02-05 07:22:51 -08:00
George Bosilca	e42b573cd3	Fix the PVAR allocation usage. According to the MPI standard the obj_handle is a pointer to an MPI object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example 14.6 highlights this usage. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-02-02 19:03:43 -05:00
KAWASHIMA Takahiro	f8a441957a	mpiext/shortfloat: Add `MPIX_C_FLOAT16` datatype `MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT` if the C compiler supports `_Float16`, which is defined in ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015). This name and meaning are same as that of MPICH. This may be a transitional datatype until the MPI Forum decides a proper name for the type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 14:55:52 +09:00
KAWASHIMA Takahiro	c44599ec13	mpiext/shortfloat: Add `shortfloat` MPI extension This extension provides additional MPI datatypes `MPIX_SHORT_FLOAT`, `MPIX_C_SHORT_FLOAT_COMPLEX`, and `MPIX_CXX_SHORT_FLOAT_COMPLEX` for `short float` (C/C++), `short float _Complex` (C), and `std::complex<short float>` (C++), respectively, or their alternate types like `_Float16`. See `ompi/mpiext/shortfloat/README.txt` for details. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:14 +09:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
Sergey Lebedev	829846dbcc	fp16 hcoll bindings Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	2ad1c09848	opal/datatype: Add `opal_short_float_t` The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14 (C WG), is not supported by most compilers yet. But some compilers (including gcc 7 for AArch64 and clang 6) support `_Float16`, which is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945) as an extension for C. If it is detected in `configure`, it is used as an alternate type of `short float` in Open MPI internal code. This commit adds a `configure` option `--enable-alt-short-float=TYPE`. It can be used to specify a type other than `short float` and `_Float16` as the alternate type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Jeff Squyres	4c64322db4	Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly mpi.h.in: use C++ static_cast<> where appropriate	2019-01-31 07:14:34 -05:00
Jeff Squyres	30afdcead9	mpi.h.in: use C++ static_cast<> where appropriate When compiling mpi.h with a modern C++ compiler and a high degree of pickiness (e.g., -Wold-style-cast), casting using (void) in the OMPI_PREDEFINED_GLOBAL and MPI_STATUS_IGNORE macros will emit warnings. So if we're compiling with a C++ compiler, use C++'s static_cast<> instead of (void*). Thanks to @shadow-fax for identifying the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-01-31 03:22:26 -08:00
Thananon Patinyasakdikul	782ec851ea	Merge pull request #6319 from thananon/pr/allow_overtake pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.	2019-01-30 15:32:04 -05:00
Jeff Squyres	2203f8d900	Merge pull request #6185 from ggouaillardet/topic/hwloc_macros hwloc: remove public hwloc macros from opal_config.h	2019-01-30 07:32:22 -05:00
Gilles Gouaillardet	0aeb27f776	topo/treematch: silence a hwloc related warning treematch/km_partitioning.c #include "config.h", but there is no such file when the embedded treematch is used. In order to prevent the embedded treematch from incorrectly using the config.h from the embedded hwloc, generate a dummy config.h. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-30 14:51:38 +09:00
Aravind Gopalakrishnan	9cabcfdbba	mtl/ofi: Fix reference to help text object When we exceed the threshold number of contexts created, print appropriate help text Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-29 15:10:06 -08:00
Thananon Patinyasakdikul	0263456cf4	pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE. We missed an assert to check if ALLOW_OVERTAKE is set or not before validating the sequence number and this will cause deadlock. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2019-01-29 14:55:06 -05:00
Nathan Hjelm	f9338dac93	Merge pull request #6312 from ggouaillardet/topic/op ompi/op: fix support of non predefined datatypes with predefined oper…	2019-01-29 10:55:00 -07:00
Brian Barrett	23da9fac23	Merge pull request #6294 from bwbarrett/mtl-ofi-no-device-warning mtl/ofi: Print descriptive error message on modex failure	2019-01-29 08:32:49 -08:00
Brian Barrett	1bb7a73a9c	Merge pull request #6302 from bwbarrett/feature/ofi-av-count mtl/ofi: Provide av count hint during initialization	2019-01-29 08:32:24 -08:00
Edgar Gabriel	7023357843	Merge pull request #6286 from edgargabriel/pr/floating-point-division-problem common/ompio: fix a floating point division problem	2019-01-29 10:07:09 -06:00
Gilles Gouaillardet	bc1cab5498	ompi/op: fix support of non predefined datatypes with predefined operators ACCUMULATE, unlike REDUCE, can be used with derived datatypes with predefined operations, with some restrictions outlined in MPI-3:11.3.4. The derived datatype must be composed entirely from one predefined datatype (so you can do all the construction you want, but at the bottom, you can only use one datatype, say, MPI_INT). Refs. open-mpi/ompi#6275 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-29 09:33:39 +09:00
Gilles Gouaillardet	45fb69b2b9	ompi/datatype: fix how we compute the space needed for the args Refs. open-mpi/ompi#6275 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-28 15:26:11 +09:00
Brian Barrett	44be7f139a	mtl/ofi: Provide av count hint during initialization Provide the av_attr.count hint (number of addresses that will be inserted into the address vector through the life of the process) at initialization of the address vector. It's ok to be a bit wrong, but some endpoints (RxR) can benefit by not going through the slow growth realloc churn. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-24 15:47:24 -08:00
Edgar Gabriel	c0f8ce0fff	common/ompio: fix a floating point division problem This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue #6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks to Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-01-21 17:59:12 -06:00
Brian Barrett	fe25097194	mtl/ofi: Print descriptive error message on modex failure With MTLs, there's no "other transport" when the remote side does not have an active NIC, so we should print a useful error message when the modex failed (indicating lack of a NIC on the remote side). Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-21 23:50:31 +00:00
KAWASHIMA Takahiro	352b667323	Merge pull request #6210 from kawashima-fj/pr/mpiext-use-mod Use mpi_f08 module in mpi_f08_ext module	2019-01-21 11:56:41 +09:00
René Widera	a91fab80a1	common/ompio: possible rounding issue Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <r.widera@hzdr.de>	2019-01-18 14:05:23 +01:00
Yossi Itigin	387b2ff56f	Merge pull request #6260 from hoopoepg/topic/removed-fca COLL: removed FCA component	2019-01-17 00:05:07 +08:00
KAWASHIMA Takahiro	b380dd58b5	config/ompi_ext: use mpi module in mpi_ext module If MPI extensions are enabled, all `ompi/mpiext/pcollreq/use-mpi/mpiext__usempi.h` are included in `ompi/mpi/fortran/mpiext-use-mpi/mpi-ext-module.F90` and all `ompi/mpiext/pcollreq/use-mpi/mpiext__usempif08.h` are included in `ompi/mpi/fortran/mpiext-use-mpi-f08/mpi-f08-ext-module.F90` using `#include` directives. In `mpiext__usempi.h` and `mpiext__usempif08.h`, some MPI extension may want to use constants or handles defined in the `mpi` module and the `mpi_f08` module. For example, if you want to define a new datatype in `mpi_f08_ext`, you'll need the definition of `type(mpi_datatype)`. However, putting `use mpi_f08` line in their `mpiext_*_usempif08.h` may cause a compilation error if more than one MPI extension is enabled because the `use` statement must be put prior to any variable declarations. To resolve this problem, this commit puts `use mpi` and `use mpi_f08` as first lines of `mpi-ext-module.F90` and `mpi-f08-ext-module.F90` respectively. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-01-16 11:55:55 +09:00
KAWASHIMA Takahiro	2220623f34	config/ompi_ext: Don't include mpiext__mpifh.h in mpi_f08_ext Including `mpiext__mpifh.h` in the source file of the `mpi_f08_ext` module is not always appropriate. For example, if you want to define a new datatype in an MPI extension, the `include 'mpif-ext.h'` binding defines the datatype as `integer` but the `use mpi_f08_ext` binding defines it as `type(mpi_datatype)`. They conflict. This commit allows each MPI extension to declare whether it wants to include its `mpiext_*_mpifh.h` in `mpi_f08` and `mpi_f08_ext` respectively. The default (no declaration) is 'want'. See `ompi/mpiext/example/configure.m4` for an example. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-01-16 11:55:55 +09:00
Aravind Gopalakrishnan	37f9aff2a0	mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts Moving to a model where we have users actively _enable_ SEP feature for use rather than opening SEP by default if provider supports it. This allows us to not regress (either functionally or for performance reasons) any apps that were working correctly on regular endpoints. Also, providing MCA to specify number of OFI contexts to create and default this value to 1 (Given btl/ofi also creates one by default, this reduces the incidence of a scenario where we allocate all available contexts by default and if btl/ofi asks for one more, then provider breaks as it doesn't support it). While at it, spruce up README on SEP content. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-14 09:58:36 -08:00
Ralph Castain	d1fd1f4cce	Merge pull request #6151 from nrspruit/ns_ompi_mtl_ofi_specializations MTL_OFI: Generation of specialized functions at build time	2019-01-14 09:31:54 -08:00
Sergey Oblomov	0759bb8561	COLL: removed FCA component - removed FCA collectives from coll/scoll Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2019-01-09 16:51:40 +02:00
Risto Toijala	f14a0f4fc9	mpi/fortran: Fix valgrind warnings for type create Valgrind warns that newtype is uninitialized when calling from Fortran as e.g. use mpi integer :: t, err call MPI_Type_create_f90_integer(5, t, err) Since newtype is intent(out), this should not happen. There is no reason to convert the type using PMPI_Type_f2c, only to overwrite it immediately afterwards. The other type_create_ functions did not convert newtype. The valgrind warnings: ==28441== Conditional jump or move depends on uninitialised value(s) ==28441== at 0x581B555: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0) ==28441== by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0) ==28441== by 0x400BA1: MAIN__ (in [...]) ==28441== by 0x400C46: main (in [...]) ==28441== ==28441== Conditional jump or move depends on uninitialised value(s) ==28441== at 0x581B563: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0) ==28441== by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0) ==28441== by 0x400BA1: MAIN__ (in [..]) ==28441== by 0x400C46: main (in [...]) ==28441== ==28441== Use of uninitialised value of size 8 ==28441== at 0x581B577: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0) ==28441== by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0) ==28441== by 0x400BA1: MAIN__ (in [...]) ==28441== by 0x400C46: main (in [...]) ==28441== Signed-off-by: Risto Toijala <risto.toijala@gmail.com>	2019-01-08 22:00:00 +02:00
Aurelien Bouteiller	e54496bf2a	Merge pull request #6087 from ICLDisco/export/errors_cid Manage errors in communicator creations (cid)	2018-12-31 15:01:55 -05:00
Jeff Squyres	17be4c6d1f	Merge pull request #6229 from jsquyres/pr/fix-enable-grequest-extension-in-a-tarball romio321: ensure to distribute ompi_grequestx.h	2018-12-28 16:15:23 -05:00
Jeff Squyres	62321be186	romio321: ensure to distribute ompi_grequestx.h Refs https://github.com/open-mpi/ompi/issues/6227. Thanks to @georgemarselis for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-27 15:39:47 -08:00
bosilca	96f88052e9	Merge pull request #5948 from mkurnosov/coll-ireduce-silence-coverity coll/libnbc/ireduce: silence Coverity warning CID 1440360	2018-12-24 12:59:16 -05:00
bosilca	593db292da	Merge pull request #5644 from mkurnosov/coll-iallreduce-rabenseifner coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce	2018-12-24 12:58:21 -05:00
Jeff Squyres	efcaef74d8	MPI_Type_set_name: fix string length at target opal_string_copy() takes care of all the string computations. Specifically: when we converted to opal_string_copy(), we accidentally left the source length as the argument, not the target length, which resulted in one less character being copied than intended (as was showing up in MTT C++ testing results). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-23 13:00:01 -08:00
Aurelien Bouteiller	bd0d2b832e	Merge pull request #6086 from ICLDisco/export/errors_nbc Manage errors in NBC collective ops	2018-12-21 02:34:00 -05:00
Jeff Squyres	1be5358834	Merge pull request #6212 from jsquyres/pr/fix-treematch-common-symbol treematch: fix global common symbol	2018-12-20 15:20:41 -05:00
Jeff Squyres	e9a6246b90	treematch: fix global common symbol Despite its name, this symbol doesn't need to be global. So just make it static. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-20 11:06:14 -08:00
Jeff Squyres	81bfb5f5e5	Remove some IMPI attributes that were never implemented. This is a holdover from LAM/MPI that was never implemented here in Open MPI (and never will be). Might as well remove this dead code. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-20 10:12:32 -08:00
Nathan Hjelm	4944508603	Merge pull request #6136 from hjelmn/opal_cleanup opal: clean up init/finalize	2018-12-18 15:23:32 -07:00
Nathan Hjelm	a39cb747dd	ompi/datatype: don't call opal_datatype_finalize directly Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-12-18 14:37:04 -07:00
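
The two common/ompio entries in this log (c0f8ce0fff and a91fab80a1) describe a rounding hazard when byte counts are divided in single-precision floating point. The sketch below is only an illustration of that hazard, not the OMPIO code: the helper names `ceil_div_float` and `ceil_div_int` are hypothetical, and it assumes `float` is IEEE 754 binary32 with a 24-bit significand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round-up division done in single precision.
 * Values above 2^24 may not be exactly representable as float, so two
 * large, close operands can collapse to the same value. */
static int64_t ceil_div_float(int64_t num, int64_t den)
{
    assert(num >= 0 && den > 0);
    float fn = (float)num;   /* may lose the low-order bits of num */
    float fd = (float)den;
    float q = fn / fd;
    int64_t r = (int64_t)q;  /* truncates toward zero */
    return ((float)r < q) ? r + 1 : r;
}

/* Hypothetical helper: the same round-up done with integers only.
 * Assumes num + den - 1 does not overflow int64_t. */
static int64_t ceil_div_int(int64_t num, int64_t den)
{
    assert(num >= 0 && den > 0);
    return (num + den - 1) / den;
}
```

With a numerator of 2^29 + 1 bytes and a denominator of 2^29 bytes, the integer version reports two chunks, while the single-precision version rounds the numerator down to 2^29 before dividing and can report only one.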


10463 commits