openmpi

Author	SHA1	Message	Date
William Bailey	71fe9d78e0	fcoll/two_phase: Compiler warning for wrong variable type used Squash compiler warning. Changed output specifier to match variable type (long int -> long long int). Signed-off-by: William Bailey <wbailey2@nd.edu> (cherry picked from commit `e2718e0196`)	2019-12-08 14:15:29 -05:00
Edgar Gabriel	02da54c174	fcoll/two_phase: fix error in calculating aggregators in 32bit mode In fcoll_two_phase_supprot_fns.c: calculation of the aggregator index failed for large offsets on 32bit machine, due to improper handling of 64bit offsets. Fixes Issue #7110 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu> (cherry picked from commit `ea1355beae`)	2019-11-25 09:06:36 -06:00
Edgar Gabriel	cf5cdad40f	fcoll: make vulcan the default component make vulcan the default component except for Lustre file systems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:12:02 -05:00
Edgar Gabriel	ac79e576ef	fcoll/base: do not use the two_phase component with CUDA support the two_phase component does not work with some collective I/O operations on CUDA buffers due to the data sieving (i.e. both read and write operations) executed on some buffers, which are not anticipated in the GPU buffer management of the code. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	0757cb11a8	fcoll/all components: minor updates two minor updates: - in all components: use the fh->f_bytes_per_agg value (which might have been set by an info object) instead of re-reading the mca parameter - vulcan and dynamic_gen2: replace one allgather operation by an allreduce, since it is used to determine the sum of an array. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-20 07:47:29 -05:00
Edgar Gabriel	df4431bd48	io/ompio: add support for some info objects add support for the info objects cb_buffer_size and collective_buffering. Also, introduce a new mca parameter that allows to give feedback on whether an info object is recognized (and honored). Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-19 19:34:36 -05:00
Gilles Gouaillardet	cd45c7abb6	ompio: misc renames Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Gilles Gouaillardet	36b35ae0db	ompio: fix abstraction Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Edgar Gabriel	2d8a769bfd	fcoll/static: remove component now that we have a shiny new fcoll component, no need to keep the static component around. No use for it anymore. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-08 07:39:46 -05:00
Edgar Gabriel	deaeaa60de	fcoll/vulcan: minor bugfix when creating the groups_per_proc arrays Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 17:52:32 -05:00
Edgar Gabriel	8feb497dbe	io/ompio: cleanup the aggregator selection logic and some internal structure elements/components. Along the way, add support for the cb_nodes Info object. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:47:10 -05:00
Edgar Gabriel	529d882ff0	io/ompio and common/ompio: relocate ompio_request code to common since the request code is now being accessed also from the vulcan fcoll component, the request code was relocated into the common/ompio directory to avoid ld load problems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
raafatfeki	5ecb4a56e3	fcoll/vulcan: Support of asynchronous write in collective writeAll We introduced a new mca_vulcan parameter that specifies the I/O synchronization type (Async/sync I/O) applied within the collective write operation. The user can explicitly choose the async or sync write operation, or let the choice be made automatically. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	4f7172ddf6	fcoll/vulcan: Support of larger offsets For very large offsets, the data chunk size to be written by each aggregator exceeds the capacity of an integer variable. Besides, some variables were not large enough to hold intermediate values. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	4670fe50d7	fcoll/vulcan: Remove unnecessary calls to write Identify the index of each aggregator process in order to restrict the call to write_init function by the specific aggregator. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	bc6431bee9	fcoll/vulcan: use hindexed constructor on the sender side Instead of using a temporary buffer and copy data into the temp buffer before sending, use a derived datatype to describe the data that needs to be sent during a cycle in the collective I/O operation. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
Edgar Gabriel	1c2c110824	fcoll/vulcan: add new fcoll component import of the new vulcan component. It is an enhanced version of the two_phase component, which uses however the ompio internal codes/loops to assemble the data arrays. It is therefore more inline with the dynamic and dynamic_gen2 component, and will be easier to maintain. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
Edgar Gabriel	52bd606294	fcoll/dynamic_gen2: make sure that intermediate variables can hold the offset for very large offsets, some variables used in the fcoll/dynamic_gen2 code base were under certain circumstances not large enough to hold intermediate values. This issue was mostly detected in the vulcan component but could happen in the dynamic_gen2 component as well. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-01 06:53:38 -05:00
Jeff Squyres	25f2d02c61	fcoll/dynamic_gen2: minor compiler warning stomp Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-30 10:08:19 -07:00
raafatfeki	91e028f7fd	fcoll/dynamic_gen2: Reduce number of realloc calls keep track of the size of the blocklen_per_process and displs_per_process on the aggregator data structure to minimize the number of realloc function calls required in the shuffle_init operation. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-04-20 10:13:57 -05:00
raafatfeki	5d99af29cd	fcoll/dynamic_gen2: Formatting fixes Adjust Coding Style to match the 4 space tab rule. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-04-02 17:25:00 -05:00
raafatfeki	92822613ea	fcoll/dynamic_gen2: fix coverity warnings fix warnings for coverity CID 1433655 and CID 1433654 Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-04-02 16:18:07 -05:00
raafatfeki	100677721d	fcoll/dynamic_gen2: use hindexed constructor on the sender side instead of using a temporary buffer and copy data into the temp buffer before sending, use a derived datatype to describe the data that needs to be sent during a cycle in the collective I/O operation. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-03-28 14:37:30 -05:00
Jeff Squyres	6319292170	fcoll/static: fix CID 1413066 local_iov_array is unconditionally allocated, so unconditionally de-allocate it, too. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-26 14:21:21 -07:00
Jeff Squyres	2968ffa296	fcoll/static: remove useless/dead code Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-26 14:21:21 -07:00
Nathan Hjelm	5f7ff5307e	fcoll/two_phase: do not use removed function (MPI_Address) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-03-23 08:43:24 -06:00
Edgar Gabriel	da640f98df	fcoll/two_phase: data sieving has to occur at offset 0 as well data sieving has to occur for any provided offset that is greater than or equal to zero for this implementation to work correctly. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-10 11:23:09 -06:00
Edgar Gabriel	1f151be6d2	io/ompio: introduce a new function to retrieve mca parameter values ompio has the unique problem that mca parameters set in the io/ompio component have to be accessible from other frameworks as well. This is mostly done to avoid a replication in the parameter names and to reduce the number of mca parameters that an end-user has to worry about. This commit introduces a generic function to retrieve ompio mca parameters; the function pointer is stored on the file handle. It replaces two functions that used the same concept already for one parameter each. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-12-01 10:00:23 -06:00
Edgar Gabriel	75ab006ec0	io/ompio: add a new option to disable amode overwriting ompio has historically changed the WRONLY flag provided by the application to RDWR to allow for the data sieving optimization within the two-phase I/O fcoll component. This change did not have a performance impact on regular UNIX file systems, but seems to hurt performance on NFS (and maybe Lustre?) So provide an option that allows keeping the WRONLY option, and raise an error if the fcoll/two-phase would actually like to use the data sieving. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-11-17 13:13:38 -06:00
Gilles Gouaillardet	b9315edb85	configury: remove the --disable-mpi-io option Fixes open-mpi/ompi#2185 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-09-20 14:39:09 +09:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la" ``` Note: The changes in this commit were automated by the script in the commit that proceeds it with the `libadd_mca_comp_update.py` script. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Edgar Gabriel	f258036e06	fcoll/two_phase: adjust aggregator selection to new mapby flag on MPI_COMM_WORLD adjust how the aggregator nodes are selected depending on whether processes have been mapped by node or anything else. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2017-08-15 09:50:41 -05:00
Edgar Gabriel	450ccd439b	fcoll/base: adjust selection table adjust the fcoll selection table to achieve the following: - two_phase should not advertise itself on lustre file systems - two_phase should advertise itself on sequential file systems (stripe_size == 0 ) - priority for dynamic, static and individual is reduced. This will lead to two_phase being selected in scenarios where two or more components indicate willingness to run. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2017-07-25 10:37:22 -05:00
Mark Allen	552216f9ba	scripted symbol name change (ompi_ prefix) Passed the below set of symbols into a script that added ompi_ to them all. Note that if processing a symbol named "foo" the script turns foo into ompi_foo but doesn't turn foobar into ompi_foobar But beyond that the script is blind to C syntax, so it hits strings and comments etc as well as vars/functions. coll_base_comm_get_reqs comm_allgather_pml comm_allreduce_pml comm_bcast_pml fcoll_base_coll_allgather_array fcoll_base_coll_allgatherv_array fcoll_base_coll_bcast_array fcoll_base_coll_gather_array fcoll_base_coll_gatherv_array fcoll_base_coll_scatterv_array fcoll_base_sort_iovec mpit_big_lock mpit_init_count mpit_lock mpit_unlock netpatterns_base_err netpatterns_base_verbose netpatterns_cleanup_narray_knomial_tree netpatterns_cleanup_recursive_doubling_tree_node netpatterns_cleanup_recursive_knomial_allgather_tree_node netpatterns_cleanup_recursive_knomial_tree_node netpatterns_init netpatterns_register_mca_params netpatterns_setup_multinomial_tree netpatterns_setup_narray_knomial_tree netpatterns_setup_narray_tree netpatterns_setup_narray_tree_contigous_ranks netpatterns_setup_recursive_doubling_n_tree_node netpatterns_setup_recursive_doubling_tree_node netpatterns_setup_recursive_knomial_allgather_tree_node netpatterns_setup_recursive_knomial_tree_node pml_v_output_close pml_v_output_open intercept_extra_state_t odls_base_default_wait_local_proc _event_debug_mode_on _evthread_cond_fns _evthread_id_fn _evthread_lock_debugging_enabled _evthread_lock_fns cmd_line_option_t cmd_line_param_t crs_base_self_checkpoint_fn crs_base_self_continue_fn crs_base_self_restart_fn event_enable_debug_output event_global_current_base_ event_module_include eventops sync_wait_mt trigger_user_inc_callback var_type_names var_type_sizes Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-07-11 02:13:23 -04:00
Gilles Gouaillardet	fa5cd0dbe5	use ptrdiff_t instead of OPAL_PTRDIFF_TYPE since Open MPI now requires C99, and the ptrdiff_t type is part of C99, there is no more need for the abstract OPAL_PTRDIFF_TYPE type. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-19 13:41:56 +09:00
George Bosilca	366d64b7e5	Move the collective structure outside the communicator. As we changed the ABI (forcing a major release), we can limit the size of the predefined communicators by moving the collective structure outside the communicator. This might have a minimal, but unnoticeable, impact on performance. This approach has been discussed during the January 2017 devel meeting. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 11:54:17 -06:00
Edgar Gabriel	b10558c3da	fcoll/dynamic_gen2: fix bug exposed by uneven distribution of data This fixes a bug reported in-house occurring with this component. It is triggered if the amount of data assigned to different aggregators differs widely, leading to a different number of internal iterations required to handle it. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2016-11-24 13:02:19 -06:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit `cb55c88a8b`.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Gilles Gouaillardet	6b57b77ecb	configury: add the --disable-io-ompio option --disable-io-ompio is a shortcut that disable the following frameworks and components - fbtl - fcoll - sharedfp - common/ompio - io/ompio Fixes open-mpi/ompi#1934	2016-09-23 09:41:09 +09:00
Edgar Gabriel	19fe5cac50	io/ompio: next step in code-reorganization - move the sort_iovec operations to fcoll/base - move set_view_internal to common/ompio - move set_file_default to common/ompio - remove io_ompio_sort, not used anymore.	2016-08-02 09:18:29 -05:00
Edgar Gabriel	160d9a78c1	Merge pull request #1886 from edgargabriel/pr/ompio-reorg io/ompio: move io/ompio functionality to common/ompio	2016-07-29 12:24:21 -05:00
Edgar Gabriel	ccf76b7791	moving the internal read/write functions to common/ompio and update all fs/fcoll/sharedfp components to use these functions.	2016-07-21 13:08:32 -05:00
Edgar Gabriel	39ae93b87b	modify the fcoll components to use the common/ompio print queues	2016-07-21 13:08:32 -05:00
Edgar Gabriel	a899c0fb38	fcoll/static: fix coverity warnings fix coverity warnings CID 72144, CID 710677, CID 1364164	2016-07-21 13:08:15 -05:00
Edgar Gabriel	195ec89732	fcoll/base: mv coll_array functions to fcoll base the coll_array functions are truly only used by the fcoll modules, so move them to fcoll/base. There is currently one exception to that rule (number of aggregators logic), but that function will also be moved to fcoll/base in the long term.	2016-07-14 08:41:14 -05:00
Nathan Hjelm	70533e6d50	fcoll/static: fix coverity issues Fix CID 72362: Explicit null dereferenced (FORWARD_NULL) From what I can tell the code @ fcoll_static_file_read_all.c:649 should be setting bytes_per_process[i] to 0 not bytes_per_process. Fix CID 72361: Explicit null dereferenced (FORWARD_NULL) Modified check to check for blocklen_per_process non-NULL before trying to free blocklen_per_process[l]. This is sufficient because free (NULL) is safe. Also cleaned up the initialization of this and a couple of other arrays. They were allocated with malloc() then initialized to 0. Changed to use calloc(). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:48:13 -06:00
Nathan Hjelm	8871bdb2f8	fcoll/two_phase: fix coverity issues Fix CID 72296: Resource leak (RESOURCE_LEAK): Changed code to goto exit instead of returning to ensure memory is freed. Fix CID 712589: Out-of-bounds read (OVERRUN): In this loop i and j are identical and always less than iov_count. The CID was triggered because i was incremented if i was < iov_count. This meant that if the loop did go on the next iteration would access an invalid index. Fix CID 741363: Uninitialized scalar variable (UNINIT): Allocate tmp_len with calloc to ensure every index is initialized. Fix CID 741364: Uninitialized pointer read (UNINIT): Allocate recv_types with calloc to ensure all indices are always initialized. Also added a check to not loop and destroy if recv_types is NULL. Also added a NULL check on the allocation of decoded iov. This is not the cause of CID 126784 but should be fixed. Fix CID 712588: Out-of-bounds read (OVERRUN): Similar to CID 712589. Should silence the issue. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:47:41 -06:00
Edgar Gabriel	45003ef78d	fix the data size counter for large ops for the static fcoll component	2016-02-23 08:33:50 -06:00
Edgar Gabriel	92d1b99468	optimize the shuffle step: 1. use communicator collectives if possible for performance reasons 2. combined multiple allgathers into a single one	2016-02-19 11:04:04 -06:00


173 Commits