openmpi

Author	SHA1	Message	Date
Jeff Squyres	465414953d	Merge pull request #7828 from edgargabriel/pr/v4.1.x-avg-fview-size common/ompio: use avg. file view size in the aggregator selection logic	2020-06-25 11:45:48 -04:00
Edgar Gabriel	eeee011ac0	common/ompio: use avg. file view size in the aggregator selection logic This is a fix based on a bug report on github/mailing list from CGNS. The core of the problem was that different processes entered different branches of our aggregator selection logic, due to the fact that in some cases processes had a matching file_view size and contiguous chunk size (thus assuming 1-D distribution), and some processes did not (thus assuming 2-D distribution). The fix is to calculate the avg. file view size across all processes and use this value, thus ensuring that all processes enter the same branch. Fixes issue #7809 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu> (cherry picked from commit `4a8a330bba`)	2020-06-16 10:21:59 -05:00
raafatfeki	0864b62e12	fs/gpfs: Support of GPFS file system Creation of gpfs module under fs component. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2020-06-12 12:57:18 -04:00
Edgar Gabriel	39acc3a251	common/ompio: fix calculation in simple-grouping option This is based on a bug reported on the mailing list using a netcdf testcase. The problem occurs if processes are using a custom file view, but on some of them it appears as if the default file view is being used. Because of that, the simple-grouping option led to a different number of aggregators used on different processes, and ultimately to a deadlock. This patch fixes the problem by not using the file_view size anymore for the calculation in the simple-grouping option, but the contiguous chunk size (which is identical on all processes). Fixes issue #7109 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu> (cherry picked from commit `ad5d0df4e9`)	2019-11-25 09:04:13 -06:00
Edgar Gabriel	a3e1ecc14b	common_ompio_file_read/write: fix 2GB limiting issue Individual read/write operations exceeding 2GB fail in ompio due to improper conversions from size_t to int in two different locations. This commit fixes an issue reported by Richard Warren from the HDF5 group. Fixes Issue #7045 Cherry-picked from commit `a130f569df` Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-10-22 12:12:55 -05:00
George Bosilca	c9f48e2e77	Whitespace cleanup No code or logic changes. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-08-16 10:27:43 -04:00
Harald Klimach	16e1d74c8f	Suggestion to fix division by zero in file view. In common_ompi_aggregators calc_cost routine: do not cast the real division to an int intermediately. This patch removes the obsolete int variable c and assigns the result of the P_a/P_x division directly to n_as. With the intermediate int c variable, n_as gets 0 if P_a < P_x, resulting in a division by 0 when computing n_s. Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de> (cherry picked from commit `e222a04ae5`)	2019-06-25 09:29:08 -06:00
Edgar Gabriel	c7250cd11d	common/ompio: fix division by zero problem with empty fview When using an empty fileview, a division by zero bug can occur in ompio. Not entirely sure why the problem did not show up previously, but some recent changes trigger that bug in one of our tests. This pr is part of a fix applied in commit `f6b3a0a` Fixes Issue #6703 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-05-23 13:48:57 -05:00
René Widera	e30e5b95c6	common/ompio: possible rounding issue Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - removes floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <r.widera@hzdr.de> (cherry picked from commit `a91fab80a1`)	2019-01-30 12:31:39 -06:00
Edgar Gabriel	d1e8779fe3	common/ompio: fix a floating point division problem This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue #6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks to Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu> (cherry picked from commit `c0f8ce0fff`)	2019-01-30 12:31:16 -06:00
Edgar Gabriel	96c1a5b9dc	common/ompio: check datatypes when setting file view return MPI_ERR_ARG if the size of the fileview is not a multiple of the size of the etype provided. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:22:19 -05:00
Edgar Gabriel	425a71799e	common/ompio: return correct error code for improper access return MPI_ERR_ACCESS if the user tries to read from a file that was opened using MPI_MODE_WRONLY return MPI_ERR_READ_ONLY if the user tries to write a file that was opened using MPI_MODE_RDONLY Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-17 11:22:04 -05:00
Edgar Gabriel	2da601a350	common/ompio: fix an ordering problem during file_open the sharedfp component has to be selected and opened before we set the default file view during file_open. Otherwise there is a spurious error message from the sharefp_file_seek operation that is called during the file_set_view. Fixes Issue #5560 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-20 10:23:32 -05:00
Gaëtan Bossu	8522ba112c	MCA/IO/OMPIO: fix MPI_File_delete implementation. OMPIO now uses the correct delete function depending on the fs mca_common_ompio_file_delete now works this way instead of calling POSIX unlink: - create a minimal file handle with the given file name - select the best fs component using this file handle - call the component-specific file delete function Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>	2018-07-17 18:17:13 +02:00
Edgar Gabriel	fd8c5fba4e	common/ompio: fix the fview based grouping options a bug sneaked into constructing the list of aggregator processes when using the fileview based grouping options Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:01:31 -05:00
Edgar Gabriel	743e0dff5a	common/ompio: fix zero size fview issue handle the situation where the user requests a non-zero amount of data but has a zero-size fileview. My instinct would have been to return an error code, but according to the test that I used it should be MPI_SUCCESS and zero bytes. It is definitely better than segfaulting :-) This makes another test from the IBM testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 17:02:13 -05:00
Edgar Gabriel	7808379a47	common/ompio: incorporate George's comments incorporate a couple of comments by George as part of the review on github. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:29:49 -05:00
Edgar Gabriel	3c10ed4ed1	common/ompio: use allocator to manage temporary buffers use an allocator to manage temporary buffers when copying unmanaged data from GPU buffer to host. This is necessary, since the buffers have to be pinned for better performance, which is an expensive operation. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	6a532101aa	io/ompio and common/ompio: add initial support for cuda buffers in ompio this commit adds the initial support for cuda buffers in ompio, for blocking and non-blocking individual read and write operations. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	df4431bd48	io/ompio: add support for some info objects add support for the info objects cb_buffer_size and collective_buffering. Also, introduce a new mca parameter that allows to give feedback on whether an info object is recognized (and honored). Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-19 19:34:36 -05:00
Edgar Gabriel	bc0f60dfd9	sharedfp/all components: revamp internal operations this commit revamps the internal operations of the sharedfp components. Specifically, it is focused around removing the second file_open operation for shared file pointers. This makes the code more efficient. Because of that, there is no necessity anymore for the sharedfp_lazy_open mca parameter. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-18 14:34:05 -05:00
Gilles Gouaillardet	cd45c7abb6	ompio: misc renames Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Gilles Gouaillardet	36b35ae0db	ompio: fix abstraction Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Edgar Gabriel	14bd114973	common/ompio: return error code from file_delete operation in file_close in case the user opened a file using the DELETE_ON_CLOSE flag, return the error code generated in the delete operation. Note, that this is however just a partial fix to the e_close_1 test from the ibm testsuite, since the object destructor that triggers the file_close function does not have a mechanism right now to recognize and return an error code. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2018-06-07 19:30:14 -05:00
Edgar Gabriel	8feb497dbe	io/ompio: cleanup the aggregator selection logic and some internal structure elements/components. Along the way, add support for the cb_nodes Info object. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:47:10 -05:00
Edgar Gabriel	529d882ff0	io/ompio and common/ompio: relocate ompio_request code to common since the request code is now being accessed also from the vulcan fcoll component, the request code was relocated into the common/ompio directory to avoid ld load problems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
Edgar Gabriel	6b03cee7f1	io/ompio: erroneous condition in selecting aggregator selection logic fix the logic in the decision which aggregator selection algorithm to use. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-05-24 15:52:19 -05:00
Ninad Prabhukhanolkar	1518d7e003	Updated aggregate_profile.pl The files array was also storing $phase.prof. This was leading to $phase.prof's output getting dumped into itself again and again. Updated code to initialise files array with files other than $phase.prof. Signed-off-by: Ninad Prabhukhanolkar <ninadchess96@gmail.com>	2018-04-26 20:34:24 +05:30
Edgar Gabriel	c4879ec29f	io/ompio: don't reset amode if MODE_SEQUENTIAL is set the ompio module resets the amode from WRONLY to RDWR in order to accommodate data sieving in the two-phase fcoll component. This leads however to an error if MPI_MODE_SEQUENTIAL has been requested by the user, since MODE_SEQUENTIAL is incompatible with MODE_RDWR. Since the change to the amode was done after opening the file for individual file pointers but before opening the file for shared filepointers, this led to an error message in the sharedfp component. Note, that data sieving is never necessary if MODE_SEQUENTIAL is set, so this should not be a problem for any scenario. Fixes #4991 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-30 07:56:47 -05:00
Jeff Squyres	9ef0f3d83a	ompi/monitoring: add .sh versioning to common monitoring lib Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-02-20 07:07:23 -08:00
Jeff Squyres	e7f91f8068	Merge pull request #4527 from clementFoyer/osc-no-includes Remove inter-dependencies between OSC modules.	2018-02-09 15:49:56 -05:00
Clement Foyer	f5b4fc05f8	Remove inter-dependencies between OSC modules. The osc monitoring component needed to include other OSC components' headers in order to be able to access the communicator through the component specific ompi_osc__module_t structures. This commit removes the dependency, and resolves the issue #4523. Extend the common monitoring API. Now it's possible to translate from local rank to world rank from both the communicator and the group. * Remove useless hashtable as we directly use the w_group contained in window structure. Add automatic generation at config time. The templates are expanded at configure time. It creates a new header file that generates all the variables/functions needed. Adding this during the autogen automagically generates for each of the available modules the proper functions. Only keep a generated argv-style array. Following Jeff's advice, the configure.m4 file generates a simple array of module variables to be iterated over to find the proper module. Signed-off-by: Clement Foyer <clement.foyer@inria.fr>	2018-02-07 11:52:00 +00:00
Edgar Gabriel	1f151be6d2	io/ompio: introduce a new function to retrieve mca parameter values ompio has the unique problem, that mca parameters set in the io/ompio component have to be accessible from other frameworks as well. This is mostly done to avoid a replication in the parameter names and to reduce the number of mca parameters that an end-user has to worry about. This commit introduces a generic function to retrieve ompio mca parameters, the function pointer is stored on the file handle. It replaces two functions that used the same concept already for one parameter each. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-12-01 10:00:23 -06:00
Nathan Hjelm	1282e98a01	opal/asm: rename existing arithmetic atomic functions This commit renames the arithmetic atomic operations in opal to indicate that they return the new value not the old value. This naming differentiates these routines from new functions that return the old value. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Ralph Castain	3906aaf41a	Silence warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-11-25 11:50:18 -08:00
Edgar Gabriel	75ab006ec0	io/ompio: add a new option to disable amode overwriting ompio has historically changed the WRONLY flag provided by the application to RDWR to allow for the data sieving optimization within the two-phase I/O fcoll component. This change did not have a performance impact on regular UNIX file systems, but seems to hurt performance on NFS (and maybe Lustre?) So provide an option that allows to keep the WRONLY option, and raise an error if the fcoll/two-phase would actually like to use the data sieving. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-11-17 13:13:38 -06:00
Thomas Naughton	86bb6f8bac	Merge pull request #4444 from naughtont3/tjn-fix-plm-monitoring-configury configury: single quote to avoid trouble with BSD	2017-11-08 10:56:37 -05:00
Edgar Gabriel	c9bb049d00	io/ompio: fix a bug in handling large write/read operations This is a bug fix based on a problem reported on the mailing list. For very large read/write operations, ompio breaks the operation down into multiple cycles. The problem was that one of the variables required to maintain its values across the different cycles did not do that, and because of that the calculations of the memory offsets were wrong. Fixes issue #4453 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-11-06 11:48:13 -06:00
Edgar Gabriel	1885d99ac7	fs/ufs: set proper error codes on file_open set proper error codes in mca_fs_ufs_file_open by mapping the errno value to the MPI error code. Fixes an issue reported on the mailing list by Wei-keng Liao Fixes Issue #4443 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-11-03 16:06:31 -05:00
Thomas Naughton	c5dc41ee1a	configury: single quote to avoid trouble with BSD Signed-off-by: Thomas Naughton <naughtont@ornl.gov>	2017-11-03 11:34:28 -04:00
Thomas Naughton	86d282d6dd	fix PML monitoring configury to compile DSOs Signed-off-by: Thomas Naughton <naughtont@ornl.gov>	2017-10-26 15:53:11 -04:00
Edgar Gabriel	be0de21e6f	fs/ufs and fbtl/posix: cleanup lock management This commit looks large, but it's really mostly a cleanup step. 1. introduce proper error handling for the return values of fcntl and the fbtl_posix_lock function 2. rename a parameter to more accurately reflect what it does 3. introduce an mca parameter in the fs/ufs component that allows to control what the level of locking the user would like to enforce 4. move the initialization of the fs_block_size parameter from fs/ufs into the common/ompio component. An fs component might be allowed to overwrite this value, but none of the actual fs components do that. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-10-19 14:56:28 -05:00
Edgar Gabriel	4c0d347412	Merge pull request #4230 from edgargabriel/topic/no-smart-fview io/ompio: add a new grouping option avoiding communication	2017-09-26 10:56:06 -05:00
George Bosilca	64bff0e326	Disable monitoring if we compile statically. Protect all components against compilation on static builds. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-09-25 12:18:23 -04:00
George Bosilca	458ccc12e1	Move the profiling library in common/monitoring Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-09-25 12:18:23 -04:00
Clément FOYER	f334607c34	Simplify the communicator's name caching management (#6 ) Signed-off-by: Clement Foyer <clement.foyer@inria.fr>	2017-09-25 12:18:23 -04:00
bosilca	a680b3ac6d	Merge pull request #3853 from clementFoyer/master OMPI monitoring: Simplify the communicator's name caching management + misc test changes	2017-09-25 12:14:36 -04:00
Gilles Gouaillardet	b9315edb85	configury: remove the --disable-mpi-io option Fixes open-mpi/ompi#2185 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-09-20 14:39:09 +09:00
Edgar Gabriel	76a8c67575	io/ompio: add a new grouping option avoiding communication the new grouping option simple+ performs all calculations used for the aggregator selection as if the default file view would be used, thus avoiding communication in file_set_view altogether. This mode is useful for applications that do not set a file view, but use explicit offset operations on the default file view. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-09-18 12:30:34 -05:00
Nathan Hjelm	4bba8774f4	monitoring: fix MPI_T regression The monitoring code causes MPI_T based tools to segfault when monitoring is disabled. This happens because the performance variables remain registered after the common/monitoring component is dlclosed due to a missing variable registration flag. This commit adds the necessary flag to all the registered performance variables. The issue on github is #4162. Close when applied to master. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-09-06 14:24:35 -06:00

375 commits