openmpi

Author	SHA1	Message	Date
Joseph Schuchart	634f67b216	Merge pull request #7843 from devreal/clang-tidy-free Some fixups for issues detected by clang-tidy	2020-06-25 17:30:04 +02:00
Joseph Schuchart	d9b11b29cd	Properly free memory in case of error in mca_common_ompio_prepare_to_group Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:31:14 +02:00
Joseph Schuchart	ed1ca1a84b	Don't free memory escaping mca_common_ompio_prepare_to_group Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:30:38 +02:00
Edgar Gabriel	4a8a330bba	common/ompio: use avg. file view size in the aggregator selection logic This is a fix based on a bugreport on github/mailing list from CGNS. The core of the problem was that different processes entered different branches of our aggregator selection logic, due to the fact that in some cases processes had a matching file_view size and contiguous chunk size (thus assuming 1-D distribution), and some processes did not (thus assuming 2-D distribution). The fix is to calculate the avg. file view size across all processes and use this value, thus ensuring that all processes enter the same branch. Fixes issue #7809 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2020-06-15 09:17:44 -05:00
Nathan Hjelm	160ff188b8	Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf configure: use -iquote for non-system include paths	2020-03-30 09:22:54 -07:00
Noah Evans	ee3517427e	Add threads framework Add a framework to support different types of threading models including user space thread packages such as Qthreads and argobot: https://github.com/pmodels/argobots https://github.com/Qthreads/qthreads The default threading model is pthreads. Alternate thread models are specified at configure time using the --with-threads=X option. The framework is static. The threading model to use is selected at Open MPI configure/build time. mca/threads: implement Argobots threading layer config: fix thread configury - Add double quotations - Change Argobot to Argobots config: implement Argobots check If the poll time is too long, MPI hangs. This quick fix just sets it to 0, but it is not good for the Pthreads version. Need to find a good way to abstract it. Note that even 1 (= 1 millisecond) causes disastrous performance degradation. rework threads MCA framework configury It now works more like the ompi/mca/rte configury, modulo some edge items that are special for threading package linking, etc. qthreads module some argobots cleanup Signed-off-by: Noah Evans <noah.evans@gmail.com> Signed-off-by: Shintaro Iwasaki <siwasaki@anl.gov> Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2020-03-27 10:15:45 -06:00
Gilles Gouaillardet	69bc2e8372	misc: fix <> vs "" includes throughout the ompi codebase This commit fixes an issue with the include usage in some ompi source files. These source files are using the <> form of include when the "" form is correct (as these are internal, not system headers). Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2020-03-09 21:13:49 -04:00
Austen Lauria	4f6978466d	Merge pull request #7284 from bosilca/fix/monitoring_registration Minor cleanup in the monitoring PML.	2020-01-27 13:01:30 -05:00
Charles Shereda	cbc6feaab2	Created opal_gethostname() as safer gethostname substitute. The opal_gethostname() function provides a more robust mechanism to retrieve the hostname than gethostname(), which can return results that are not null-terminated, and which can vary in its behavior from system to system. opal_gethostname() just returns the value in opal_process_info.nodename; this is populated in opal_init_gethostname() inside opal_init.c. -Changed all gethostname calls in opal subtree to opal_gethostname -Changed all gethostname calls in orte subtree to opal_gethostname -Changed all gethostname calls in ompi subdir to opal_gethostname -Changed all gethostname calls in oshmem subdir to opal_gethostname -Changed opal_if.c in test subdir to use opal_gethostname -Changed opal_init.c to include opal_init_gethostname. This function returns an int and directly sets opal_process_info.nodename per jsquyres' modifications. Relates to open-mpi#6801 Signed-off-by: Charles Shereda <cpshereda@lanl.gov>	2020-01-13 08:52:17 -08:00
George Bosilca	05093f9cb1	Minor cleanup in the monitoring PML. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-01-13 09:24:00 -05:00
XuanWang1982	b1dc58eeb2	First version for GPFS module. To be tested Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-11-25 09:01:38 -06:00
Edgar Gabriel	ad5d0df4e9	common/ompio: fix calculation in simple-grouping option This is based on a bug reported on the mailing list using a netcdf testcase. The problem occurs if processes are using a custom file view, but on some of them it appears as if the default file view is being used. Because of that, the simple-grouping option lead to different number of aggregators used on different processes, and ultimately to a deadlock. This patch fixes the problem by not using the file_view size anymore for the calculation in the simple-grouping option, but the contiguous chunk size (which is identical on all processes). Fixes issue #7109 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-10-29 12:30:41 -05:00
Edgar Gabriel	a130f569df	comomn_ompio_file_read/write: fix 2GB limiting issue individual read/write operations exceeding 2GB fail in ompio due to improper conversions from size_t to int in two different locations. This commit fixes an issue reported by Richard Warren from the HDF5 group. Fixes Issue #397 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-10-05 09:50:02 -05:00
George Bosilca	2930bd9d21	Whitespace cleanup No code or logic changes. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-08-14 11:06:47 -04:00
Harald Klimach	e222a04ae5	Suggestion to fix division by zero in file view. In common_ompi_aggregators calc_cost routine: do not cast the real division to an int intermediately. This patch removes the obsolete int variable c and assigns the result of the P_a/P_x division directly to n_as. With the intermediate int c variable, n_as gets 0 if P_a < P_x, resulting in a division by 0 when computing n_s. Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de>	2019-06-13 18:47:32 +02:00
Edgar Gabriel	8eda9f2ecd	common/ompio: fix Coverity warnings this commit fixes Coverity warnings CID 1445198 and CID 1445197 For a reason that is a bit unclear to me, Coverity only complained about the read files, but the write operations had the same issue, so I fixed that within the same commit as well. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-23 13:40:39 -05:00
Edgar Gabriel	27b2ec71a7	common/ompio: add support for read operations and collective I/O external32 data representation is now supported by ompio for everything but non-blocking collective I/O operations. The support can further be improved in a second step to limit the temporary buffer size (at least for blocking operations), but it does work now for many scenarios. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-20 17:56:16 -05:00
Edgar Gabriel	ab56e6f0db	common/ompio: make individual read operations work. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-20 17:22:33 -05:00
Edgar Gabriel	f6b3a0af52	common/ompio: individual write of external32 works both blocking and non-blocking. collective write and read operations not yet. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-20 16:26:14 -05:00
Edgar Gabriel	d955753cb8	common/ompio: abstraction for different convertor types introduce separate convertors for memory vs. file representation. Adjust the interfaces for decode_datatype to provide the convertor to be used for that. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-20 13:35:38 -05:00
Edgar Gabriel	35be18b266	common/ompio: rename ompio_cuda* to ompio_buffer* the infrastructure put in place to manage cuda buffers is actually a lot more generic than just for cuda buffers. Specifically, we can reuse much of the code to implement the external32 data representation. This commit converts the code from common_ompio_cuda* to common_ompio_buffer*. There are just very few places where we actually need to keep the OPAL_CUDA_SUPPORT ifdef in place. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-20 12:50:04 -05:00
Edgar Gabriel	a96efb7620	common/ompio: add comm_ompio_read_all/write_all functions in preparation for adding support for the external32 data representation. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-20 12:49:36 -05:00
Edgar Gabriel	d43427fc76	common/ompio: refactor the build_io_array function abstract out the io_array structure to be used in common_ompio_build_io_array function. This is preparation for a future component that would like to use the same function, but not modify the io_array stored on the file handle itself. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-04-17 14:42:33 -05:00
George Bosilca	e42b573cd3	Fix the PVAR allocation usage. According to the MPI standard the obj_handle is a pointer to an MPI object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example 14.6 highlight this usage. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-02-02 19:03:43 -05:00
Edgar Gabriel	c0f8ce0fff	common/ompio: fix a floating point division problem This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue #6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks for Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-01-21 17:59:12 -06:00
René Widera	a91fab80a1	common/ompio: possible rounding issue Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <r.widera@hzdr.de>	2019-01-18 14:05:23 +01:00
Edgar Gabriel	bf058ca6b0	common/ompio: check datatypes when setting file view return MPI_ERR_ARG if the size of the fileview is not a multiple of the size of the etype provided. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:43:32 -05:00
Edgar Gabriel	05d25383c2	common/ompio: return correct error code for improper access return MPI_ERR_ACCESS if the user tries to read from a file that was opened using MPI_MODE_WRONLY return MPI_ERR_READ_ONLY if the user tries to write a file that was opened using MPI_MODE_RDONLY Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:41:58 -05:00
Brian Barrett	e9e4d2a4bc	Handle asprintf errors with opal_asprintf wrapper The Open MPI code base assumed that asprintf always behaved like the FreeBSD variant, where ptr is set to NULL on error. However, the C standard (and Linux) only guarantee that the return code will be -1 on error and leave ptr undefined. Rather than fix all the usage in the code, we use opal_asprintf() wrapper instead, which guarantees the BSD-like behavior of ptr always being set to NULL. In addition to being correct, this will fix many, many warnings in the Open MPI code base. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-08 16:43:53 -07:00
Jeff Squyres	5be0ba0247	common/monitoring: fix include files Move includes to top of file. Set some #defines so that monitoring_prof.c compiles without warning (as identified by gcc 8 on MacOS). Also ensure to include the internal Open MPI "mpi.h" file (not some random system <mpi.h> file). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-15 06:04:13 -07:00
Nathan Hjelm	000f9eed4d	opal: add types for atomic variables This commit updates the entire codebase to use specific opal types for all atomic variables. This is a change from the prior atomic support which required the use of the volatile keyword. This is the first step towards implementing support for C11 atomics as that interface requires the use of types declared with the _Atomic keyword. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-14 10:48:55 -06:00
Edgar Gabriel	e6a344ba63	Merge pull request #5561 from edgargabriel/pr/file_open_sharedfp_ordering common/ompio: fix an ordering problem during file_open	2018-08-20 10:18:14 -05:00
Edgar Gabriel	2742273ee3	common/ompio: fix an ordering problem during file_open the sharedfp component has to be selected and opened before we set the default file view during file_open. Otherwise there is a spurious error message from the sharefp_file_seek operation that is called during the file_set_view. Fixes Issue #5560 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-20 09:28:29 -05:00
Gaëtan Bossu	ccc96efc2e	DDN's Infinite Memory Engine support for OMPIO Changes made: - Create a new fs component for IME - Create a new fbtl component for IME - Modify the close function of OMPIO to finalize IME if necessary Signed-off-by: Gaëtan Bossu <gbossu@ddn.com> Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>	2018-08-16 11:45:47 +02:00
Gaëtan Bossu	8522ba112c	MCA/IO/OMPIO: fix MPI_File_delete implementation. OMPIO now uses the correct delete function depending on the fs mca_common_ompio_file_delete now works this way instead of calling POSIX unlink: - create a minimal file handle with the given file name - select the best fs component using this file handle - call the component-specific file delete function Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>	2018-07-17 18:17:13 +02:00
Edgar Gabriel	fd8c5fba4e	common/ompio: fix the fview based grouping options a bug sneaked into constructing the list of aggregators processes when using the fileview based grouping options Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:01:31 -05:00
Edgar Gabriel	743e0dff5a	common/ompio: fix zero size fview issue handle the situation where the user requests a non-zero amount of data but has a zero-size fileview. My instinct would have been to return an error code, but according to the test that I used it should be MPI_SUCCESS and zero bytes. It is definitely better than segfaulting :-) This makes another test from the IBM testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 17:02:13 -05:00
Edgar Gabriel	7808379a47	common/ompio: incorporate George's comments incorporate a couple of comments by George as part of the review on github. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:29:49 -05:00
Edgar Gabriel	3c10ed4ed1	common/ompio: use allocator to manage temporary buffers use an allocator to manage temporary buffers when copying unmanaged data from GPU buffer to host. This is necessary, since the buffers have to be pinned for better performance, which is an expensive operation. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	6a532101aa	io/ompio and common/ompio: add initial support for cuda buffers in ompio this commit adds the initial support for cuda buffers in ompio, for blocking and non-blocking individual read and write operations. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	df4431bd48	io/ompio: add support for some info objects add support for the info objects cb_buffer_size and collective_buffering. Also, introduce a new mca parameter that allows to give feedback on whether an info object is recognized (and honored). Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-19 19:34:36 -05:00
Edgar Gabriel	bc0f60dfd9	sharedfp/all components: revamp internal operations this commit revamps the internal operations of the sharedfp components. Specifically, it is focused around removing the second file_open operation for shared file pointers. This makes the code more efficient. Because of that, there is no necessity anymore for the sharedfp_lazy_open mca parameter. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-18 14:34:05 -05:00
Gilles Gouaillardet	cd45c7abb6	ompio: misc renames Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Gilles Gouaillardet	36b35ae0db	ompio: fix abstraction Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Edgar Gabriel	14bd114973	common/ompio: return error code from file_delete operation in file_close in case the user opened a file using the DELETE_ON_CLOSE flag, return the error code generated in the delete operation. Note, that this is however just a partial fix to the e_close_1 test from the ibm testsuite, since the object destructor that triggers the file_close function does not have a mechanism right now to recognize and return an error code. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2018-06-07 19:30:14 -05:00
Edgar Gabriel	8feb497dbe	io/ompio: cleanup the aggregator selection logic and some internal structure elements/components. Along the way, add support for the cb_nodes Info object. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:47:10 -05:00
Edgar Gabriel	529d882ff0	io/ompio and common/ompio: relocate ompio_request code to common since the request code is now being accessed also from the vulcan fcoll component, the request code was relocated into the common/ompio directory to avoid ld load problems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
Edgar Gabriel	6b03cee7f1	io/ompio: erroneous condition in selecting aggregator selection logic fix the logic in the decision which aggregator selection algorithm to use. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-05-24 15:52:19 -05:00
Ninad Prabhukhanolkar	1518d7e003	Updated aggregate_profile.pl The files array was also storing $phase.prof. This was leading to $phase.prof's output getting dumped into itself again and again. Updated code to initialise files array with files other than $phase.prof. Signed-off-by: Ninad Prabhukhanolkar <ninadchess96@gmail.com>	2018-04-26 20:34:24 +05:30
Edgar Gabriel	c4879ec29f	io/ompio: don't reset amode if MODE_SEQUENTIAL is set the ompio module resets the amode from WRONLY to RDWR in order to accommodate data sieving in the two-phase fcoll component. This leads however to an error if MPI_MODE_SEQUENTIAL has been requested by the user, since MODE_SEQUENTIAL is incompatible with MODE_RDWR. Since the change to the amode was done after opening the file for individual file pointers but before opening the file for shared filepointers, this led to an error message in the sharedfp component. Note, that data sieving is never necessary if MODE_SEQUENTIAL is set, so this should not be a problem for any scenario. Fixes #4991 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-30 07:56:47 -05:00

396 commits