openmpi

Author	SHA1	Message	Date
Gilles Gouaillardet	b9315edb85	configury: remove the --disable-mpi-io option Fixes open-mpi/ompi#2185 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-09-20 14:39:09 +09:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la ``` Note: The changes in this commit were automated by the `libadd_mca_comp_update.py` script in the commit that precedes it. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Edgar Gabriel	f258036e06	fcoll/two_phase: adjust aggregator selection to new mapby flag on MPI_COMM_WORLD adjust how the aggregator nodes are selected depending on whether processes have been mapped by node or anything else. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2017-08-15 09:50:41 -05:00
Edgar Gabriel	450ccd439b	fcoll/base: adjust selection table adjust the fcoll selection table to achieve the following: - two_phase should not advertise itself on lustre file systems - two_phase should advertise itself on sequential file systems (stripe_size == 0 ) - priority for dynamic, static and individual is reduced. This will lead to two_phase being selected in scenarios where two or more components indicate willingness to run. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2017-07-25 10:37:22 -05:00
Mark Allen	552216f9ba	scripted symbol name change (ompi_ prefix) Passed the below set of symbols into a script that added ompi_ to them all. Note that if processing a symbol named "foo" the script turns foo into ompi_foo but doesn't turn foobar into ompi_foobar But beyond that the script is blind to C syntax, so it hits strings and comments etc as well as vars/functions. coll_base_comm_get_reqs comm_allgather_pml comm_allreduce_pml comm_bcast_pml fcoll_base_coll_allgather_array fcoll_base_coll_allgatherv_array fcoll_base_coll_bcast_array fcoll_base_coll_gather_array fcoll_base_coll_gatherv_array fcoll_base_coll_scatterv_array fcoll_base_sort_iovec mpit_big_lock mpit_init_count mpit_lock mpit_unlock netpatterns_base_err netpatterns_base_verbose netpatterns_cleanup_narray_knomial_tree netpatterns_cleanup_recursive_doubling_tree_node netpatterns_cleanup_recursive_knomial_allgather_tree_node netpatterns_cleanup_recursive_knomial_tree_node netpatterns_init netpatterns_register_mca_params netpatterns_setup_multinomial_tree netpatterns_setup_narray_knomial_tree netpatterns_setup_narray_tree netpatterns_setup_narray_tree_contigous_ranks netpatterns_setup_recursive_doubling_n_tree_node netpatterns_setup_recursive_doubling_tree_node netpatterns_setup_recursive_knomial_allgather_tree_node netpatterns_setup_recursive_knomial_tree_node pml_v_output_close pml_v_output_open intercept_extra_state_t odls_base_default_wait_local_proc _event_debug_mode_on _evthread_cond_fns _evthread_id_fn _evthread_lock_debugging_enabled _evthread_lock_fns cmd_line_option_t cmd_line_param_t crs_base_self_checkpoint_fn crs_base_self_continue_fn crs_base_self_restart_fn event_enable_debug_output event_global_current_base_ event_module_include eventops sync_wait_mt trigger_user_inc_callback var_type_names var_type_sizes Signed-off-by: Mark Allen <markalle@us.ibm.com>	2017-07-11 02:13:23 -04:00
Gilles Gouaillardet	fa5cd0dbe5	use ptrdiff_t instead of OPAL_PTRDIFF_TYPE since Open MPI now requires C99, and the ptrdiff_t type is part of C99, there is no more need for the abstract OPAL_PTRDIFF_TYPE type. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-04-19 13:41:56 +09:00
George Bosilca	366d64b7e5	Move the collective structure outside the communicator. As we changed the ABI (forcing a major release), we can limit the size of the predefined communicators by moving the collective structure outside the communicator. This might have a minimal, but unnoticeable, impact on performance. This approach has been discussed during the January 2017 devel meeting. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 11:54:17 -06:00
Edgar Gabriel	b10558c3da	fcoll/dynamic_gen2: fix bug exposed by uneven distribution of data This fixes a bug reported in-house occurring with this component. It is triggered if the amount of data assigned to different aggregators differs widely, leading to a different number of internal iterations required to handle it. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2016-11-24 13:02:19 -06:00
Ralph Castain	1e2019ce2a	Revert "Update to sync with OMPI master and cleanup to build" This reverts commit `cb55c88a8b`.	2016-11-22 15:03:20 -08:00
Ralph Castain	cb55c88a8b	Update to sync with OMPI master and cleanup to build Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2016-11-22 14:24:54 -08:00
Gilles Gouaillardet	6b57b77ecb	configury: add the --disable-io-ompio option --disable-io-ompio is a shortcut that disables the following frameworks and components - fbtl - fcoll - sharedfp - common/ompio - io/ompio Fixes open-mpi/ompi#1934	2016-09-23 09:41:09 +09:00
Edgar Gabriel	19fe5cac50	io/ompio: next step in code-reorganization - move the sort_iovec operations to fcoll/base - move set_view_internal to common/ompio - move set_file_default to common/ompio - remove io_ompio_sort, not used anymore.	2016-08-02 09:18:29 -05:00
Edgar Gabriel	160d9a78c1	Merge pull request #1886 from edgargabriel/pr/ompio-reorg io/ompio: move io/ompio functionality to common/ompio	2016-07-29 12:24:21 -05:00
Edgar Gabriel	ccf76b7791	moving the internal read/write functions to common/ompio and update all fs/fcoll/sharedfp components to use these functions.	2016-07-21 13:08:32 -05:00
Edgar Gabriel	39ae93b87b	modify the fcoll components to use the common/ompio print queues	2016-07-21 13:08:32 -05:00
Edgar Gabriel	a899c0fb38	fcoll/static: fix Coverity warnings fix Coverity warnings CID 72144, CID 710677, CID 1364164	2016-07-21 13:08:15 -05:00
Edgar Gabriel	195ec89732	fcoll/base: mv coll_array functions to fcoll base the coll_array functions are truly only used by the fcoll modules, so move them to fcoll/base. There is currently one exception to that rule (number of aggregators logic), but that function will also be moved to fcoll/base in the long term.	2016-07-14 08:41:14 -05:00
Nathan Hjelm	70533e6d50	fcoll/static: fix coverity issues Fix CID 72362: Explicit null dereferenced (FORWARD_NULL) From what I can tell the code @ fcoll_static_file_read_all.c:649 should be setting bytes_per_process[i] to 0 not bytes_per_process. Fix CID 72361: Explicit null dereferenced (FORWARD_NULL) Modified check to check for blocklen_per_process non-NULL before trying to free blocklen_per_process[l]. This is sufficient because free (NULL) is safe. Also cleaned up the initialization of this and a couple of other arrays. They were allocated with malloc() then initialized to 0. Changed to use calloc(). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:48:13 -06:00
Nathan Hjelm	8871bdb2f8	fcoll/two_phase: fix coverity issues Fix CID 72296: Resource leak (RESOURCE_LEAK): Changed code to goto exit instead of returning to ensure memory is freed. Fix CID 712589: Out-of-bounds read (OVERRUN): In this loop i and j are identical and always less than iov_count. The CID was triggered because i was incremented if i was < iov_count. This meant that if the loop did go on the next iteration would access an invalid index. Fix CID 741363: Uninitialized scalar variable (UNINIT): Allocate tmp_len with calloc to ensure every index is initialized. Fix CID 741364: Uninitialized pointer read (UNINIT): Allocate recv_types with calloc to ensure all indices are always initialized. Also added a check to not loop and destroy if recv_types is NULL. Also added a NULL check on the allocation of decoded iov. This is not the cause of CID 126784 but should be fixed. Fix CID 712588: Out-of-bounds read (OVERRUN): Similar to CID 712589. Should silence the issue. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-04-19 14:47:41 -06:00
Edgar Gabriel	45003ef78d	fix the data size counter for large ops for the static fcoll component	2016-02-23 08:33:50 -06:00
Edgar Gabriel	92d1b99468	optimize the shuffle step: 1. use communicator collectives if possible for performance reasons 2. combined multiple allgathers into a single one	2016-02-19 11:04:04 -06:00
Edgar Gabriel	e63836c653	clean up the mca parameter handling of the component. Add new parameters for number of sub groups and write chunk size. This will allow to perform a systematic parameter study.	2016-02-19 10:15:28 -06:00
Edgar Gabriel	4f400314e0	add the dynamic_gen2 component into the fcoll selection table.	2016-02-19 09:32:54 -06:00
Edgar Gabriel	268d525053	change the tag to be a positive value. handle 0-byte situations correctly.	2016-02-19 08:28:50 -06:00
Edgar Gabriel	ad79012059	first cut on the version which overlaps the communication/computation of 2 iterations.	2016-02-19 08:28:50 -06:00
Edgar Gabriel	b253d4e887	fix CID 1349739, CID 1349738, CID 1349736 and (probably) CID 1349740 (not entirely sure about the last one, since I don't understand why block[i] is a problem but max_len[i] allocated and treated exactly the same way 1 line later is not).	2016-01-21 08:32:23 -06:00
Edgar Gabriel	a9ca37059a	improve the communication abstraction. This commit also allows all aggregators to work simultaneously, instead of the slightly staggered way of the previous version.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	26c57ef374	separate the size of the buffer used for the shuffle step and the size of the buffer used for a pwritev operation.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	39d5c8c281	further bug fixes silencing a compiler warning and fixing a memory overrun	2016-01-17 09:48:49 -06:00
Edgar Gabriel	2bcae84e11	further debugging	2016-01-17 09:48:49 -06:00
Edgar Gabriel	2bdd6ba17a	correctly free some buffers, and ensure that lustre_stripe_size and stripe_count are always read from the file system.	2016-01-17 09:48:49 -06:00
Edgar Gabriel	d282e94b67	add the new dynamic_gen2 component, designed to coexist for now with the original dynamic component	2016-01-17 09:48:49 -06:00
Gilles Gouaillardet	bfe8e03d9d	fcoll/two_phase: use ompi_mpi_abort instead of PMPI_Abort Thanks Jeff for the review	2015-12-07 11:34:36 +09:00
Gilles Gouaillardet	002c7b8b3a	fcoll/two_phase: use PMPI_* instead of MPI_*	2015-11-20 13:46:19 +09:00
Nathan Hjelm	5122327727	fcoll/two_phase: fix new coverity errors Fix CID 1325467: use after free Remove extra free of aggregator_list. Fix CID 1325466: resource leak Fix typo in prior coverity fix. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2015-10-02 21:38:31 -06:00
Nathan Hjelm	95b95e19af	fcoll/dynamic: fix coverity errors Fixes CID 72320: Explicit NULL dereferenced On error it is possible that the blocklen_per_process array is NULL. Change the NULL check before the free to check for non-NULL on the array not the array element. Also clean up allocation of this array to use calloc instead of malloc + setting each element to NULL. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-01 14:38:09 -06:00
Nathan Hjelm	09df7aa205	fcoll/two_phase: fix coverity errors Fixes CIDs 72300, 72344, 1196764-1196768, 72300: Resource leaks Multiple allocated arrays are going out of scope at the end of mca_fcoll_two_phase_file_write_all. Free these arrays. Also removed the extraneous NULL checks since free (NULL) is safe in C. Change returns to goto exit where the allocated resources are freed. Fixes CIDs 72285-72292, 72297, 72298: Resource leaks Change all appropriate return statements to goto exit to ensure that all resources are freed. Also removed the NULL checks since free (NULL) is safe in C. Fixes CIDs 72295, 72296: Resource leaks Moved free of requests and recv_types to after exit label. This will ensure these are freed on error. Also added a loop and statement to free send_buf which is going out of scope at the end of the function. Fixes CIDs 72336-72240, 735197, 735198: Resource leaks Moved the exit label to before the resources are released and changed all appropriate return statements to goto exit. Also removed extraneous NULL checks because free (NULL) is safe in C. Fixes CIDs 72341, 72343, 1196805-1196809: Resource leaks Free all resources after exit label and change return statements to goto exit to ensure all resources are freed on error. Fixes CID 1269973: Unused value Check return code of ompi_request_wait_all. If it fails jump to the exit. Fixes CID 714119: Dereference before NULL check Wrong value checked in conditional. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2015-10-01 14:38:09 -06:00
Edgar Gabriel	01fcfb08fe	do not set the contiguous flag in two_phase_file_read_all. This optimization needs some more debugging for the two_phase component, and is disabled for two_phase_file_write_all as well.	2015-09-18 09:30:50 -05:00
Gilles Gouaillardet	fe351f6801	io: do not cast away the const modifier when this is not necessary update the io framework and mpi c bindings	2015-09-09 09:18:58 +09:00
Jeff Squyres	bc9e5652ff	whitespace: purge whitespace at end of lines Generated by running "./contrib/whitespace-purge.sh".	2015-09-08 09:47:17 -07:00
Edgar Gabriel	c83e6ad0c8	fix Coverity warnings 1322865 and 72136	2015-09-08 09:15:57 -05:00
Edgar Gabriel	c9710660af	Merge pull request #863 from edgargabriel/topic/fcoll-static-cleanup Topic/fcoll static cleanup	2015-09-03 11:21:02 -05:00
Edgar Gabriel	a96a15a83c	re-enable the contiguous buffer optimization similarly to the dynamic component. Passes all hdf5 tests and our own test suite.	2015-09-03 10:13:03 -05:00
Edgar Gabriel	8007effc93	code cleanup for static component, similarly to the dynamic one	2015-09-03 10:12:45 -05:00
Edgar Gabriel	ac3a01c39c	Silence Coverity warnings 1321702, 1321701, 1321700, 72331, 72330, 72327, 72326, 72325	2015-09-03 09:10:25 -05:00
Edgar Gabriel	82efc23e8d	clean up indenting and tabs/space of fcoll_static_file_read/write_all	2015-09-01 09:39:33 -05:00
Edgar Gabriel	a1778406d6	Re-enable the contiguous buffer optimization to the read_all and the write_all routines. After long debugging, I found last week the reason this optimization originally broke some hdf5 tests. We now pass the hdf5 test suite with the optimization being actively used.	2015-09-01 09:29:07 -05:00
Edgar Gabriel	c2c44b11dc	Code cleanup for dynamic read_all and write_all Specifically: - reduce the number of realloc's and malloc's by moving some arrays out of the cycle loop, if we know that their size is not changing - store the rank of the aggregator in a separate variable to avoid continuous dereferencing - change the wait_all logic in write_all to use a fixed number of requests (even if they are MPI_REQUEST_NULL) - fix the timing to consider the two initial allgather and the one allgatherv operation to be a part of it - add more comments.	2015-09-01 09:29:07 -05:00
Edgar Gabriel	cf1e4e0d35	step 0: clean up indenting and space vs. tabs	2015-09-01 09:29:07 -05:00
Edgar Gabriel	6f2e8d2073	last night's Coverity fix introduced a new Coverity complaint. This commit tries to fix the new complaint by Coverity.	2015-08-25 08:46:38 -05:00


144 Commits