openmpi

Author	SHA1	Message	Date
Austen Lauria	0d4004cc3c	Fix miscellaneous compiler warnings. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2019-10-01 16:27:25 -04:00
George Bosilca	3522916971	Mark predefined empty datatype contiguous. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-09-07 14:40:21 +10:00
George Bosilca	41e6f55807	Small optimization on the datatype commit. This patch fixes the merge of contiguous elements into larger but more compact datatypes, and allows for contiguous elements to have their blocklen increasing instead of the count. The idea is to always maximize the blocklen, aka. the contiguous part of the datatype. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-30 19:56:48 -04:00
George Bosilca	904276bb44	Fix the variable names used for the datatype dump. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-14 10:59:50 -04:00
George Bosilca	daf4338c31	Fix the stack displacement. Fixes the convertor iovec description on the MPI-IO reported by Edgar. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-14 01:16:30 -04:00
George Bosilca	aa17392309	Optimize the pack/unpack. Start optimizing the code. This commit divides the operations in 2 parts, the first, outside the critical part, deals with partial blocks of predefined elements, and the second, inside the critical path, only deals with full blocks of elements. This reduces the number of expensive operations in the critical path and results in a decent performance increase. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-11 14:49:29 -04:00
George Bosilca	3562d70679	Get rid of the division in the critical path. Amazing how a bad instruction scheduling can have such a drastic impact on the code performance. With this change, we get a boost of at least 50% on the performance of data with a small blocklen and/or count. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-10 00:28:29 -04:00
George Bosilca	a80255235a	Rework the datatype commit. Optimize contiguous loops by collapsing them into a single element. During datatype optimization collapse similar elements into larger blocks. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	9ff15efac8	Optimize the position placement. Upon detecting a datatype loop representation skip the entire loop according to the remaining space. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	0a24f0374e	Small improvements on the test. Rework the to_self test to be able to be used as a benchmark. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	75a53976a3	Disable checksum. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	46ddf5460d	Clean and sync the pack and unpack functions. - optimize handling of contiguous with gaps datatypes. - fixes a performance issue for all datatypes with a count of 1. - optimize the pack/unpack of contiguous with gaps datatype. - optimize the case of blocklen == 1 Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:09 -04:00
George Bosilca	d335eea18f	Optimize the raw representation. Merge contiguous iov in order to minimize the number of returned iovec. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:08 -04:00
George Bosilca	f25674291b	Optimized datatype description. Move toward a base type of vector (count, type, blocklen, extent, disp) with disp and extent applying toward the count repetition and blocklen being a contiguous memory of type type. Implement 2 optimizations on this description used during type_commit: - collapse: successive similar datatype descriptions are collapsed together with an increased count. - fusion: fuse successive datatype descriptions in order to minimize the number of resulting memcpy during pack/unpack. Fixes at the OMPI datatype level including: - Fix the create_hindexed and vector creation. - Fix the handling of [get|set]_elements and _count. - Correctly compute the displacement for block indexed types. - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-07-09 14:50:08 -04:00
George Bosilca	d141bf7912	Update the datatype dump to match the actual types. Update the comments to better reflect what is going on. Minor indentations. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-05-10 18:03:57 -04:00
KAWASHIMA Takahiro	8bbd201029	Merge pull request #6205 from kawashima-fj/pr/fp16 Add FP16 datatypes	2019-02-08 14:52:13 +09:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to for `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
KAWASHIMA Takahiro	2ad1c09848	opal/datatype: Add `opal_short_float_t` The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14 (C WG), is not supported by most compilers yet. But some compilers (including gcc 7 for AArch64 and clang 6) support `_Float16`, which is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945) as an extension for C. If it is detected in `configure`, it is used as an alternate type of `short float` in Open MPI internal code. This commit adds a `configure` option `--enable-alt-short-float=TYPE`. It can be used to specify a type other than `short float` and `_Float16` as the alternate type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Gilles Gouaillardet	b395342c9f	opal/datatype: reset ptypes in opal_datatype_clone() Reset ptypes when cloning a datatype in order to prevent a double free() in the opal_datatype_t destructor. This fixes a bug introduced in open-mpi/ompi@7c938f070f Fixes open-mpi/ompi#6346 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-01 11:20:13 +09:00
George Bosilca	5a82c4fd07	Provide a better fix for #6285. The issue was a little complicated due to the internal stack used in the convertor. The main issue was that in the case where we run out of iov space to save the raw description of the data while handling a repetition (loop), instead of saving the current position and bailing out directly we continued reading the next predefined type element. It worked in most cases, except the one identified by the HDF5 test. However, the biggest issue here was the drop in performance for all ensuing calls to the convertor pack/unpack, as instead of handling contiguous loops as a whole (and minimizing the number of memory copies) we copied data description by data description. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-01-31 10:01:48 -05:00
bosilca	29915fc943	Merge pull request #6292 from ggouaillardet/topic/opal_datatype_destruct opal/datatype: plug a memory leak in opal_datatype_t destructor	2019-01-29 17:33:18 -05:00
Gilles Gouaillardet	0832ab5acc	opal/datatype: fix opal_convertor_raw correctly handle the case in which iovec is full and the last accessed element of the datatype is the beginning of a loop Refs. open-mpi/ompi#6285 Thanks Axel Huebl for reporting this Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-23 15:38:43 +09:00
Gilles Gouaillardet	7c938f070f	opal/datatype: plug a memory leak in opal_datatype_t destructor correctly free ptypes if the datatype is not pre-defined. Thanks Axel Huebl for reporting this. Refs. open-mpi/ompi#6291 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-22 10:57:57 +09:00
bosilca	182a2db2a4	Merge pull request #6029 from ggouaillardet/topic/large_datatypes opal/datatype: correctly handle large datatypes	2018-12-24 12:49:52 -05:00
Nathan Hjelm	0edfd328f8	opal: clean up init/finalize This commit contains the following changes: - Remove the unused opal_test_init/opal_test_finalize functions. These functions are not used by anything in the code base or MTT. Tests use opal_init_util/opal_finalize_util instead. - Get rid of gotos in opal_init_util and opal_init. Replaced them with a cleaner solution. - Automatically register cleanup functions in init functions. The cleanup functions are executed in the reverse order of the initialization functions. The cleanup functions are run in opal_finalize_util() before tearing down the class system. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-12-18 14:37:04 -07:00
George Bosilca	88a693bf71	Add a test for very large data. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-12-06 13:30:58 +09:00
Gilles Gouaillardet	fbb5bb8860	opal/datatype: correctly handle large datatypes Always use size_t (instead of converting to an uint32_t) in order to correctly support large datatypes. Thanks Ben Menadue for the initial bug report Refs open-mpi/ompi#6016 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-12-06 13:30:58 +09:00
KAWASHIMA Takahiro	cacd6f389c	datatype: Remove `#if HAVE_[TYPE]` for C99 types Now Open MPI requires a C99 compiler. Checking availability of the following types is no more needed. - `long long` (`signed` and `unsigned`) - `long double` - `float _Complex` - `double _Complex` - `long double _Complex` Furthermore, the `#if HAVE_[TYPE]` style checking is not correct. Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`. `AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h` if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]` (instead of defining as `0`) if it is not available. So even if we need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`. I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac` since someone may use `HAVE_[TYPE]` macros somewhere. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-11-14 09:32:52 +09:00
Edgar Gabriel	8c2ea0ef49	opal/datatype: add additional interface to retrieve more details about cuda buffer The existing interface in opal_datatype_cuda does not allow distinguishing whether a buffer is a managed or unmanaged cuda buffer. Add an interface that allows retrieving this information through a convertor, since the information is actually available in the mca_common_cuda_* routines. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Jeff Squyres	dec247d96e	opal/datatype: minor compiler warning stomp Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-30 10:08:19 -07:00
Sergey Oblomov	52d5ca048e	CONVERTOR: fixed typos in comments Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-16 22:02:39 +03:00
George Bosilca	cd683e3eec	Allow OPAL DDT to receive size_t count argument. Fixes issue #5069, which relates a BigMPI bug with the use of MPI_Type_vector to construct very large datatypes (>2GB). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-04-14 15:32:19 -04:00
Jeff Squyres	2713a24009	opal_datatype_module.c: reset opal_cuda_verbose `999de137ce` accidentally reset opal_cuda_verbose's default value. This commit puts it back. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-13 10:10:15 -07:00
George Bosilca	999de137ce	Fix the datatype debug. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:40:08 +09:00
George Bosilca	7848035195	Update the loop stats. The loop should be updated on each internal iteration. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:18:39 +09:00
Gilles Gouaillardet	1a17cb3b1c	opal/datatype: add opal_datatype_is_monotonic() return true if the datatype has non-negative displacements and monotonically nondecreasing, and false otherwise. Thanks George for the guidance. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-01-09 18:05:14 +09:00
George Bosilca	8a9ef3dc2d	Delay the initialization until necessary. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-11-08 17:32:18 -05:00
Aravind Gopalakrishnan	2e83cf15ce	Add support for GPU buffers for PSM2 MTL PSM2 enables support for GPU buffers and CUDA managed memory and it can directly recognize GPU buffers, handle copies between HFIs and GPUs. Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases. In this patch, we allow the PSM2 MTL to specify when it does not require CUDA convertor support. This allows us to skip CUDA convertor init phases and lets PSM2 handle the memory transfers. This translates to improvements in latency. The patch enables blocking collectives and workloads with GPU contiguous, GPU non-contiguous memory. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-09-01 16:59:03 -07:00
George Bosilca	50f471e31e	Cleanup a set of warnings reported by Ralph. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-08-22 23:00:18 -04:00
Gilles Gouaillardet	a111fc8ff2	opal/datatype: fix opal_dt_swap_long_double if no IEEE754_H Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	8fd08b933a	opal/datatype: add minimal support to convert long double between ieee 754 quadruple precision and extended precision formats. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	9118777b66	opal/ddt: use optimized description when packing contiguous datatypes Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	5a35a8e82c	opal/datatype: do not compute ptypes for OPAL predefined datatypes Fixes open-mpi/ompi#3522 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-15 11:43:48 +09:00
bosilca	cbf03b3113	Topic/datatype (#3441) * Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Optimize the datatype creation. The internal array of counts of predefined types is now only created when needed, which is either in a heterogeneous environment, or when one calls get_elements. It saves space and makes the convertor creation a little faster in some cases. Rearrange the fields in the datatype description structs. The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the static array was only partially created. All predefined types should have the ptypes array created and initialized. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Fix the boundary computation. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * test/datatype: add test for short unpack on heterogeneous cluster Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Trying to reduce the cost of creating a convertor. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Respect the unpack boundaries. As Gilles suggested on #2535 the opal_unpack_general_function was unpacking based on the requested count and not on the amount of packed data provided. Fixes #2535. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-05-09 09:31:40 -04:00
Gilles Gouaillardet	fa5cd0dbe5	use ptrdiff_t instead of OPAL_PTRDIFF_TYPE Since Open MPI now requires a C99 compiler, and the ptrdiff_t type is part of C99, there is no longer any need for the abstract OPAL_PTRDIFF_TYPE type. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-19 13:41:56 +09:00
Gilles Gouaillardet	bf0fc4a84c	opal/datatype: correctly handle zero size datatype or zero count Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-13 15:21:28 +09:00
George Bosilca	295eec7059	Small fix for persistent receives. A minor optimization, a few typos and extra comments	2016-09-16 10:27:32 -04:00
George Bosilca	fd57f5bccd	Remove some of the clang warnings.	2016-08-20 14:21:42 -04:00

161 commits