openmpi

Author	SHA1	Message	Date
Jeff Squyres	14aa5fae3c	Fix many compiler warnings: - Add some missing AC_CHECK_SIZEOF's in configure.ac - Remove some unused variables - Initialize some variables - Fix some parameter types - Cast where appropriate/safe to fix warnings - Move ompi/mca/common/monitoring Fortran bindings to a separate .c file so that they can use different #define's than the C bindings, and therefore compile properly / without warnings. - Fix signedness discrepancies - Who knew? Separated these into multiple #if's, instead: ``` // This is undefined behavior #define HAVE_FOO defined(FOO) #define YOW (HAVE_FOO && defined(BAR)) ``` - Fix some typos in OMPI_BUILD_HOST logic - Don't "2>/dev/null" in OMPI_BUILD_HOST logic; it just hides errors Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2020-11-14 07:20:30 -08:00
bosilca	ce97090673	Merge pull request #7735 from bosilca/coll/han A hierarchical, architecture-aware collective communication module	2020-10-26 00:07:03 -04:00
George Bosilca	cc6432b4a2	Fix partial packing of non data elements. There was a bug allowing for partial packing of non-data elements (such as loop and end_loop markers) during the exit condition of a pack/unpack call. This has basically no meaning. Prevent this bug from happening by making sure the element points to data before trying to partially pack it. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-10-25 18:15:09 -04:00
Mark Allen	bca3c0ed17	make Type_create_resized set FLAG_USER_UB In the below type creation sequence MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1); MPI_Type_contiguous(1, mydt1, &mydt2); I think both mydt1 and mydt2 should have extent 6. The Type_create_resized would add an UB marker into the type map, and the definition of Type_contiguous would maintain the same markers in the new map. The only counter argument I can think of to the above is if we declared that mydt1 is illegal because it's putting data on addresses that don't satisfy the alignment requirement. But in my interpretation of the standard the term "alignment requirement" is a property of the system memory, and MPI defines "extent" in a way to make it easy to create MPI datatypes that support the system's alignment requirements. But the standard isn't saying it's illegal to make MPI datatypes that don't satisfy the system's alignment requirements. I think this is true also because the MPI datatypes might be used in file IO where the requirements are different, so that's my long winded explanation for why I don't think we can declare mydt1 illegal. Complete example: #include <stdio.h> #include <mpi.h> int main() { MPI_Datatype mydt1, mydt2; MPI_Aint lb, ext; MPI_Init(0, 0); MPI_Type_create_resized(MPI_INT, 0, 6, &mydt1); MPI_Type_commit(&mydt1); MPI_Type_contiguous(1, mydt1, &mydt2); MPI_Type_commit(&mydt2); MPI_Type_get_extent(mydt1, &lb, &ext); printf("mydt1 extent %d\n", (int)ext); MPI_Type_get_extent(mydt2, &lb, &ext); printf("mydt2 extent %d\n", (int)ext); MPI_Type_free(&mydt1); MPI_Type_free(&mydt2); MPI_Finalize(); return(0); } % mpicc -o x test.c % mpirun -np 1 ./x Without this PR the output is > mydt1 extent 6 > mydt2 extent 8 With this PR both extents are 6. Fwiw I also tested with mpich and they give 6 for both extents. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2020-10-15 16:10:50 -05:00
Joseph Schuchart	70776b43fe	Remove stale datatype functions from opal header Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-22 15:56:31 +02:00
George Bosilca	3de636dc6f	Swap the 2 fields to maintain the size of the struct. Thanks @devreal for catching this. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-11-07 11:25:03 -05:00
George Bosilca	59fb02618e	Prevent overflow when dealing with datatype count. This patch fixes #7147 by preventing overflow when multiplying the count and the blocklen. The count reflects MPI count and is therefore bound to the size of an int (it is an uint32_t) while the blocklen can be merged together to represent the largest contiguous memory layout and it is therefore promoted to a size_t. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-11-07 11:25:03 -05:00
Austen Lauria	0d4004cc3c	Fix miscellaneous compiler warnings. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2019-10-01 16:27:25 -04:00
George Bosilca	3522916971	Mark predefined empty datatype contiguous. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-09-07 14:40:21 +10:00
George Bosilca	41e6f55807	Small optimization on the datatype commit. This patch fixes the merge of contiguous elements into larger but more compact datatypes, and allows for contiguous elements to have their blocklen increasing instead of the count. The idea is to always maximize the blocklen, aka. the contiguous part of the datatype. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-30 19:56:48 -04:00
George Bosilca	904276bb44	Fix the variable names used for the datatype dump. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-14 10:59:50 -04:00
George Bosilca	daf4338c31	Fix the stack displacement. Fixes the convertor iovec description on the MPI-IO reported by Edgar. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-14 01:16:30 -04:00
George Bosilca	aa17392309	Optimize the pack/unpack. Start optimizing the code. This commit divides the operations in 2 parts, the first, outside the critical part, deals with partial blocks of predefined elements, and the second, inside the critical path, only deals with full blocks of elements. This reduces the number of expensive operations in the critical path and results in a decent performance increase. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-11 14:49:29 -04:00
George Bosilca	3562d70679	Get rid of the division in the critical path. Amazing how a bad instruction scheduling can have such a drastic impact on the code performance. With this change, we get a boost of at least 50% on the performance of data with a small blocklen and/or count. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-10 00:28:29 -04:00
George Bosilca	a80255235a	Rework the datatype commit. Optimize contiguous loops by collapsing them into a single element. During datatype optimization collapse similar elements into larger blocks. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	9ff15efac8	Optimize the position placement. Upon detecting a datatype loop representation skip the entire loop according to the remaining space. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	0a24f0374e	Small improvements on the test. Rework the to_self test to be able to be used as a benchmark. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	75a53976a3	Disable checksum. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	46ddf5460d	Clean and sync the pack and unpack functions. - optimize handling of contiguous with gaps datatypes. - fixes a performance issue for all datatypes with a count of 1. - optimize the pack/unpack of contiguous with gaps datatype. - optimize the case of blocklen == 1 Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	d335eea18f	Optimize the raw representation. Merge contiguous iov in order to minimize the number of returned iovec. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:08 -04:00
George Bosilca	f25674291b	Optimized datatype description. Move toward a base type of vector (count, type, blocklen, extent, disp) with disp and extent applying toward the count repetition and blocklen being a contiguous memory of type type. Implement 2 optimizations on this description used during type_commit: - collapse: successive similar datatype descriptions are collapsed together with an increased count. - fusion: fuse successive datatype descriptions in order to minimize the number of resulting memcpy during pack/unpack. Fixes at the OMPI datatype level including: - Fix the create_hindexed and vector creation. - Fix the handling of [get\|set]_elements and _count. - Correctly compute the displacement for block indexed types. - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:08 -04:00
George Bosilca	d141bf7912	Update the datatype dump to match the actual types. Update the comments to better reflect what is going on. Minor indentations. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-05-10 18:03:57 -04:00
KAWASHIMA Takahiro	8bbd201029	Merge pull request #6205 from kawashima-fj/pr/fp16 Add FP16 datatypes	2019-02-08 14:52:13 +09:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
KAWASHIMA Takahiro	2ad1c09848	opal/datatype: Add `opal_short_float_t` The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14 (C WG), is not supported by most compilers yet. But some compilers (including gcc 7 for AArch64 and clang 6) support `_Float16`, which is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945) as an extension for C. If it is detected in `configure`, it is used as an alternate type of `short float` in Open MPI internal code. This commit adds a `configure` option `--enable-alt-short-float=TYPE`. It can be used to specify a type other than `short float` and `_Float16` as the alternate type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Gilles Gouaillardet	b395342c9f	opal/datatype: reset ptypes in opal_datatype_clone() Reset ptypes when cloning a datatype in order to prevent a double free() in the opal_datatype_t destructor. This fixes a bug introduced in open-mpi/ompi@7c938f070f Fixes open-mpi/ompi#6346 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-01 11:20:13 +09:00
George Bosilca	5a82c4fd07	Provide a better fix for #6285 . The issue was a little complicated due to the internal stack used in the convertor. The main issue was that in the case where we run out of iov space to save the raw description of the data while handling a repetition (loop), instead of saving the current position and bailing out directly we continued reading the next predefined type element. It worked in most cases, except the one identified by the HDF5 test. However, the biggest issue here was the drop in performance for all ensuing calls to the convertor pack/unpack, as instead of handling contiguous loops as a whole (and minimizing the number of memory copies) we copied data description by data description. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-01-31 10:01:48 -05:00
bosilca	29915fc943	Merge pull request #6292 from ggouaillardet/topic/opal_datatype_destruct opal/datatype: plug a memory leak in opal_datatype_t destructor	2019-01-29 17:33:18 -05:00
Gilles Gouaillardet	0832ab5acc	opal/datatype: fix opal_convertor_raw correctly handle the case in which iovec is full and the last accessed element of the datatype is the beginning of a loop Refs. open-mpi/ompi#6285 Thanks Axel Huebl for reporting this Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-23 15:38:43 +09:00
Gilles Gouaillardet	7c938f070f	opal/datatype: plug a memory leak in opal_datatype_t destructor correctly free ptypes if the datatype is not pre-defined. Thanks Axel Huebl for reporting this. Refs. open-mpi/ompi#6291 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-22 10:57:57 +09:00
bosilca	182a2db2a4	Merge pull request #6029 from ggouaillardet/topic/large_datatypes opal/datatype: correctly handle large datatypes	2018-12-24 12:49:52 -05:00
Nathan Hjelm	0edfd328f8	opal: clean up init/finalize This commit contains the following changes: - Remove the unused opal_test_init/opal_test_finalize functions. These functions are not used by anything in the code base or MTT. Tests use opal_init_util/opal_finalize_util instead. - Get rid of gotos in opal_init_util and opal_init. Replaced them with a cleaner solution. - Automatically register cleanup functions in init functions. The cleanup functions are executed in the reverse order of the initialization functions. The cleanup functions are run in opal_finalize_util() before tearing down the class system. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-12-18 14:37:04 -07:00
George Bosilca	88a693bf71	Add a test for very large data. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-12-06 13:30:58 +09:00
Gilles Gouaillardet	fbb5bb8860	opal/datatype: correctly handle large datatypes Always use size_t (instead of converting to an uint32_t) in order to correctly support large datatypes. Thanks Ben Menadue for the initial bug report Refs open-mpi/ompi#6016 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-12-06 13:30:58 +09:00
KAWASHIMA Takahiro	cacd6f389c	datatype: Remove `#if HAVE_[TYPE]` for C99 types Now Open MPI requires a C99 compiler. Checking availability of the following types is no longer needed. - `long long` (`signed` and `unsigned`) - `long double` - `float _Complex` - `double _Complex` - `long double _Complex` Furthermore, the `#if HAVE_[TYPE]` style checking is not correct. Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`. `AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h` if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]` (instead of defining as `0`) if it is not available. So even if we need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`. I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac` since someone may use `HAVE_[TYPE]` macros somewhere. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-11-14 09:32:52 +09:00
Edgar Gabriel	8c2ea0ef49	opal/datatype: add additional interface to retrieve more details about cuda buffer the existing interface in opal_datatype_cuda does not allow distinguishing whether a buffer is a managed or unmanaged cuda buffer. Add an interface that allows retrieving this information through a convertor, since the information is actually available in the mca_common_cuda_* routines. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Jeff Squyres	dec247d96e	opal/datatype: minor compiler warning stomp Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-30 10:08:19 -07:00
Sergey Oblomov	52d5ca048e	CONVERTOR: fixed typos in comments Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-16 22:02:39 +03:00
George Bosilca	cd683e3eec	Allow OPAL DDT to receive size_t count argument. Fixes issue #5069, which relates to a BigMPI bug with the use of MPI_Type_vector to construct very large datatypes (>2GB). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-04-14 15:32:19 -04:00
Jeff Squyres	2713a24009	opal_datatype_module.c: reset opal_cuda_verbose 999de137ce6 accidentally reset opal_cuda_verbose's default value. This commit puts it back. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-13 10:10:15 -07:00
George Bosilca	999de137ce	Fix the datatype debug. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:40:08 +09:00
George Bosilca	7848035195	Update the loop stats. The loop should be updated on each internal iteration. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:18:39 +09:00
Gilles Gouaillardet	1a17cb3b1c	opal/datatype: add opal_datatype_is_monotonic() return true if the datatype has non-negative displacements and monotonically nondecreasing, and false otherwise. Thanks George for the guidance. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-01-09 18:05:14 +09:00
George Bosilca	8a9ef3dc2d	Delay the initialization until necessary. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-11-08 17:32:18 -05:00
Aravind Gopalakrishnan	2e83cf15ce	Add support for GPU buffers for PSM2 MTL PSM2 enables support for GPU buffers and CUDA managed memory and it can directly recognize GPU buffers, handle copies between HFIs and GPUs. Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases. In this patch, we allow the PSM2 MTL to specify when it does not require CUDA convertor support. This allows us to skip CUDA convertor init phases and lets PSM2 handle the memory transfers. This translates to improvements in latency. The patch enables blocking collectives and workloads with GPU contiguous, GPU non-contiguous memory. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-09-01 16:59:03 -07:00
George Bosilca	50f471e31e	Cleanup a set of warnings reported by Ralph. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-08-22 23:00:18 -04:00
Gilles Gouaillardet	a111fc8ff2	opal/datatype: fix opal_dt_swap_long_double if no IEEE754_H Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	8fd08b933a	opal/datatype: add minimal support to convert long double between ieee 754 quadruple precision and extended precision formats. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
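
Commit 14aa5fae3c above notes that producing `defined` through macro expansion inside an `#if` is undefined behavior, and that the checks were split into separate `#if` blocks instead. A minimal sketch of that pattern follows; `FOO` and `BAR` are the placeholder names from the commit message, defined here only for illustration, and this is not the actual Open MPI code.

```c
#include <assert.h>

/* Hypothetical feature macros, defined here only for illustration. */
#define FOO 1
#define BAR 1

/* Evaluate defined() directly in #if, then record a plain 0/1 result.
 * This avoids "#define HAVE_FOO defined(FOO)", where the defined token
 * would be generated by macro replacement. */
#if defined(FOO)
#define HAVE_FOO 1
#else
#define HAVE_FOO 0
#endif

/* HAVE_FOO now expands to an integer constant, so it is safe in #if. */
#if HAVE_FOO && defined(BAR)
#define YOW 1
#else
#define YOW 0
#endif

/* Compile-time check (C11) that both macros are set in this sketch. */
static_assert(HAVE_FOO == 1 && YOW == 1, "FOO and BAR are both defined");
```

Removing either `#define FOO` or `#define BAR` makes `YOW` evaluate to 0, and the `static_assert` then fails at compile time.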

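
Commit 59fb02618e above describes preventing overflow when multiplying a `uint32_t` count by a `size_t` blocklen. The idea can be sketched as below. The helper names `checked_total` and `narrow_total` are hypothetical and not part of Open MPI, and the example values assume a 64-bit `size_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: widen count to size_t before multiplying, and
 * report failure if the product would not fit in size_t. */
static bool checked_total(uint32_t count, size_t blocklen, size_t *total)
{
    if (blocklen != 0 && (size_t)count > SIZE_MAX / blocklen) {
        return false;               /* product would overflow size_t */
    }
    *total = (size_t)count * blocklen;
    return true;
}

/* For contrast: a product computed in 32 bits wraps modulo 2^32. */
static uint32_t narrow_total(uint32_t count, uint32_t blocklen)
{
    return count * blocklen;
}

/* Small self-check; assumes size_t is at least 64 bits wide. */
static void overflow_example(void)
{
    size_t total = 0;
    bool ok = checked_total(3000000000u, 2, &total);
    assert(ok);
    assert(total > UINT32_MAX);     /* result no longer fits in 32 bits */
    assert(narrow_total(3000000000u, 2u) != total);
}
```

Promoting the count before the multiplication gives the full product on a 64-bit system, while the explicit `SIZE_MAX` check covers the remaining case where even `size_t` is too small.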

168 commits