openmpi

Author	SHA1	Message	Date
George Bosilca	9330dc2a42	Swap the 2 fields to maintain the size of the struct. Thanks @devreal for catching this. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit `3de636dc6f`)	2020-01-07 15:13:14 -08:00
George Bosilca	a1b4e697f5	Prevent overflow when dealing with datatype count. This patch fixes #7147 by preventing overflow when multiplying the count and the blocklen. The count reflects MPI count and is therefore bound to the size of an int (it is an uint32_t) while the blocklen can be merged together to represent the largest contiguous memory layout and it is therefore promoted to a size_t. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit `59fb02618e`)	2020-01-07 15:13:14 -08:00
George Bosilca	e2b154327e	Small optimization on the datatype commit. This patch fixes the merge of contiguous elements into larger but more compact datatypes, and allows for contiguous elements to have their blocklen increasing instead of the count. The idea is to always maximize the blocklen, aka. the contiguous part of the datatype. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit `41e6f55807`)	2019-09-03 15:09:33 -04:00
George Bosilca	8e6e826b54	Fix the variable names used for the datatype dump. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-16 10:27:35 -04:00
George Bosilca	83d40c1e14	Fix the stack displacement. Fixes the convertor iovec description on the MPI-IO reported by Edgar. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-16 10:27:23 -04:00
George Bosilca	f78d3d52cd	Optimize the pack/unpack. Start optimizing the code. This commit divides the operations in 2 parts, the first, outside the critical part, deals with partial blocks of predefined elements, and the second, inside the critical path, only deals with full blocks of elements. This reduces the number of expensive operations in the critical path and results in a decent performance increase. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:53 -04:00
George Bosilca	87299e0b1c	Get rid of the division in the critical path. Amazing how a bad instruction scheduling can have such a drastic impact on the code performance. With this change, we get a boost of at least 50% on the performance of data with a small blocklen and/or count. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:44 -04:00
George Bosilca	fad707d3b0	Rework the datatype commit. Optimize contiguous loops by collapsing them into a single element. During datatype optimization collapse similar elements into larger blocks. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:36 -04:00
George Bosilca	d5cdfe70ef	Optimize the position placement. Upon detecting a datatype loop representation skip the entire loop according to the remaining space. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:27 -04:00
George Bosilca	78cc0ff891	Disable checksum. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:19 -04:00
George Bosilca	012a004806	Clean and sync the pack and unpack functions. - optimize handling of contiguous with gaps datatypes. - fixes a performance issue for all datatypes with a count of 1. - optimize the pack/unpack of contiguous with gaps datatype. - optimize the case of blocklen == 1 Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:11 -04:00
George Bosilca	0a00b02e48	Small improvements on the test. Rework the to_self test to be able to be used as a benchmark. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:39:02 -04:00
George Bosilca	4cdc2155e5	Optimize the raw representation. Merge contiguous iov in order to minimize the number of returned iovec. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:38:52 -04:00
George Bosilca	8b794235b8	Update the datatype dump to match the actual types. Update the comments to better reflect what is going on. Minor indentations. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:37:47 -04:00
George Bosilca	4f754d0156	Optimized datatype description. Move toward a base type of vector (count, type, blocklen, extent, disp) with disp and extent applying toward the count repetition and blocklen being a contiguous memory of type type. Implement 2 optimizations on this description used during type_commit: - collapse: successive similar datatype descriptions are collapsed together with an increased count. - fusion: fuse successive datatype descriptions in order to minimize the number of resulting memcpy during pack/unpack. Fixes at the OMPI datatype level including: - Fix the create_hindexed and vector creation. - Fix the handling of [get\|set]_elements and _count. - Correctly compute the displacement for block indexed types. - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-08-05 09:35:07 -04:00
George Bosilca	e4aae6b5c8	Add a test for very large data. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-03-05 19:43:31 -05:00
Gilles Gouaillardet	320a839be9	opal/datatype: correctly handle large datatypes Always use size_t (instead of converting to an uint32_t) in order to correctly support large datatypes. Thanks Ben Menadue for the initial bug report Refs open-mpi/ompi#6016 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-03-05 19:41:39 -05:00
Howard Pritchard	9e306cee49	Merge pull request #6336 from jsquyres/pr/v4.0.x/fix-datatype-destructor-leak v4.0.x: opal/datatype: plug a memory leak in opal_datatype_t destructor	2019-02-11 13:14:06 -07:00
Gilles Gouaillardet	0ae48475a1	opal/datatype: reset ptypes in opal_datatype_clone() Reset ptypes when cloning a datatype in order to prevent a double free() in the opal_datatype_t destructor. This fixes a bug introduced in open-mpi/ompi@7c938f070f Fixes open-mpi/ompi#6346 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@b395342c9f)	2019-02-01 14:39:49 +09:00
George Bosilca	8acdc53892	Provide a better fix for #6285 . The issue was a little complicated due to the internal stack used in the convertor. The main issue was that in the case where we run out of iov space to save the raw description of the data while handling a repetition (loop), instead of saving the current position and bailing out directly we continued reading the next predefined type element. It worked in most cases, except the one identified by the HDF5 test. However, the biggest issue here was the drop in performance for all ensuing calls to the convertor pack/unpack, as instead of handling contiguous loops as a whole (and minimizing the number of memory copies) we copied data description by data description. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (back-ported from commit open-mpi/ompi@5a82c4fd07)	2019-02-01 09:28:52 +09:00
Gilles Gouaillardet	f7327735a0	opal/datatype: fix opal_convertor_raw correctly handle the case in which iovec is full and the last accessed element of the datatype is the beginning of a loop Refs. open-mpi/ompi#6285 Thanks Axel Huebl for reporting this Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (back-ported from commit open-mpi/ompi@0832ab5acc)	2019-02-01 09:26:30 +09:00
Gilles Gouaillardet	90a9c12fdb	opal/datatype: plug a memory leak in opal_datatype_t destructor correctly free ptypes if the datatype is not pre-defined. Thanks Axel Huebl for reporting this. Refs. open-mpi/ompi#6291 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit `7c938f070f`)	2019-01-30 10:41:14 -08:00
Edgar Gabriel	8c2ea0ef49	opal/dataype: add additional interface to retrieve more details about cuda buffer the existing interface in opal_datatype_cuda does not allow distinguishing whether a buffer is a managed or unmanaged cuda buffer. Add an interface that allows retrieving this information through a convertor, since the information is actually available in the mca_common_cuda_* routines. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Jeff Squyres	dec247d96e	opal/datatype: minor compiler warning stomp Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-30 10:08:19 -07:00
Sergey Oblomov	52d5ca048e	CONVERTOR: fixed typos in comments Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-16 22:02:39 +03:00
George Bosilca	cd683e3eec	Allow OPAL DDT to receive size_t count argument. Fixes issue #5069, which relates a BigMPI bug with the use of MPI_Type_vector to construct very large datatypes (>2GB). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-04-14 15:32:19 -04:00
Jeff Squyres	2713a24009	opal_datatype_module.c: reset opal_cuda_verbose `999de137ce` accidentally reset opal_cuda_verbose's default value. This commit puts it back. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-13 10:10:15 -07:00
George Bosilca	999de137ce	Fix the datatype debug. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:40:08 +09:00
George Bosilca	7848035195	Update the loop stats. The loop should be updated on each internal iteration. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:18:39 +09:00
Gilles Gouaillardet	1a17cb3b1c	opal/datatype: add opal_datatype_is_monotonic() return true if the datatype has non-negative displacements and monotonically nondecreasing, and false otherwise. Thanks George for the guidance. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-01-09 18:05:14 +09:00
George Bosilca	8a9ef3dc2d	Delay the initialization until necessary. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-11-08 17:32:18 -05:00
Aravind Gopalakrishnan	2e83cf15ce	Add support for GPU buffers for PSM2 MTL PSM2 enables support for GPU buffers and CUDA managed memory and it can directly recognize GPU buffers, handle copies between HFIs and GPUs. Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases. In this patch, we allow the PSM2 MTL to specify when it does not require CUDA convertor support. This allows us to skip CUDA convertor init phases and lets PSM2 handle the memory transfers. This translates to improvements in latency. The patch enables blocking collectives and workloads with GPU contiguous, GPU non-contiguous memory. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-09-01 16:59:03 -07:00
George Bosilca	50f471e31e	Cleanup a set of warnings reported by Ralph. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-08-22 23:00:18 -04:00
Gilles Gouaillardet	a111fc8ff2	opal/datatype: fix opal_dt_swap_long_double if no IEEE754_H Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	8fd08b933a	opal/datatype: add minimal support to convert long double between ieee 754 quadruple precision and extended precision formats. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	9118777b66	opal/ddt: use optimized description when packing contiguous datatypes Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-07-12 10:27:45 +09:00
Gilles Gouaillardet	5a35a8e82c	opal/datatype: do not compute ptypes for OPAL predefined datatypes Fixes open-mpi/ompi#3522 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-05-15 11:43:48 +09:00
bosilca	cbf03b3113	Topic/datatype (#3441 ) * Don't overflow the internal datatype count. Change the type of the count to be a size_t (it does not alter the total size of the internal structures, so has no impact on the ABI). Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Optimize the datatype creation. The internal array of counts of predefined types is now only created when needed, which is either in a heterogeneous environment, or when one calls get_elements. It saves space and makes the convertor creation a little faster in some cases. Rearrange the fields in the datatype description structs. The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the static array was only partially created. All predefined types should have the ptypes array created and initialized. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Fix the boundary computation. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * test/datatype: add test for short unpack on heterogeneous cluster Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Trying to reduce the cost of creating a convertor. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> * Respect the unpack boundaries. As Gilles suggested on #2535 the opal_unpack_general_function was unpacking based on the requested count and not on the amount of packed data provided. Fixes #2535. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-05-09 09:31:40 -04:00
Gilles Gouaillardet	fa5cd0dbe5	use ptrdiff_t instead of OPAL_PTRDIFF_TYPE since Open MPI now requires a C99, and ptrdiff_t type is part of C99, there is no more need for the abstract OPAL_PTRDIFF_TYPE type. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-19 13:41:56 +09:00
Gilles Gouaillardet	bf0fc4a84c	opal/datatype: correctly handle zero size datatype or zero count Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-13 15:21:28 +09:00
George Bosilca	295eec7059	Small fix for persistence receives. A minor optimization, few typos and extra comments	2016-09-16 10:27:32 -04:00
George Bosilca	fd57f5bccd	Remove some of the clang warnings.	2016-08-20 14:21:42 -04:00
George Bosilca	e1c6b0e4a7	Some compilers are more than picky.	2016-06-03 09:04:34 +09:00
George Bosilca	87b1d17e7e	Remove warnings. clang 7.0 with the picky option on is extremely verbose, and complains about almost everything. Trying to make him happy, at least regarding the datatype engine.	2016-06-03 00:56:24 +09:00
George Bosilca	d379e23bf7	One less warning. The heterogeneous code needs to gracefully handle the contiguous datatype loops in order to have the "#if 0" code path enabled again. This is a performance issue (the correctness is guaranteed by the current code).	2016-04-21 18:11:29 -04:00
George Bosilca	26fc8533f8	Remove compiler warnings.	2016-04-04 16:34:23 -04:00
Gilles Gouaillardet	cd829e4646	opal/datatype: only use opal_pack_general[_checksum] if CONVERTOR_SEND_CONVERSION && ! CONVERTOR_HOMOGENEOUS	2016-03-30 11:40:18 +09:00
George Bosilca	cf2bb20bac	Always build support for HETEROGENEOUS environment (this is needed to provide external32 support). Add a pack function allowing to provide send conversion (needed on little endian machine in order to pack in the external32 format).	2016-03-30 11:40:18 +09:00
George Bosilca	639f4b1086	Add a small optimization for the vector of predefined datatype.	2016-03-30 11:40:18 +09:00
George Bosilca	1ff2a38b46	Dump also the blockLen.	2016-03-30 11:39:10 +09:00
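
Commit a1b4e697f5 above describes an overflow when multiplying a 32-bit count by a blocklen, fixed by promoting the blocklen to a size_t. The following is only a minimal sketch of that kind of failure and fix, not the code from the patch. The function names `total_elements_narrow` and `total_elements_wide` are hypothetical and do not exist in Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an Open MPI function.
 * Both operands are 32-bit unsigned, so the product is computed modulo 2^32
 * and silently wraps for large count * blocklen. */
static uint32_t total_elements_narrow(uint32_t count, uint32_t blocklen)
{
    return count * blocklen;
}

/* Hypothetical helper, not an Open MPI function.
 * blocklen is a size_t, and count is converted to size_t before the
 * multiplication, so the product is computed at the width of size_t. */
static size_t total_elements_wide(uint32_t count, size_t blocklen)
{
    /* Guard against wrapping even in size_t (matters where size_t is 32 bits). */
    assert(blocklen == 0 || count <= SIZE_MAX / blocklen);
    return (size_t)count * blocklen;
}
```

With a count of 65536 and a blocklen of 65536, the narrow version wraps to 0, while the wide version yields 2^32 on a platform with a 64-bit size_t.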


153 Commits