openmpi

Author	SHA1	Message	Date
Edgar Gabriel	bc0f60dfd9	sharedfp/all components: revamp internal operations this commit revamps the internal operations of the sharedfp components. Specifically, it is focused around removing the second file_open operation for shared file pointers. This makes the code more efficient. Because of that, there is no necessity anymore for the sharedfp_lazy_open mca parameter. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-18 14:34:05 -05:00
Yossi Itigin	564f80d362	pml_ucx: add option to use opal memhooks instead of ucx internal hooks Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-17 15:30:44 +03:00
Gilles Gouaillardet	2caf1bf0e5	Merge pull request #5263 from ggouaillardet/topic/ompio_abstraction ompio: fix abstraction	2018-06-16 23:29:29 +09:00
Matias A Cabral	e6674556aa	MTL OFI: add support for FI_REMOTE_CQ_DATA. Extend number of supported ranks with providers that support FI_REMOTE_CQ_DATA. Add README file to OFI MTL Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-06-14 17:17:38 -07:00
Edgar Gabriel	d5bdcf8595	fs/pvfs2: fix compilation problem Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-14 09:30:45 -05:00
Howard Pritchard	7dcab6e4a4	Merge pull request #5269 from hppritcha/topic/squash_gcc7.3.0_warnings topo/treematch - quash compiler warning	2018-06-13 21:13:04 -05:00
Gilles Gouaillardet	cd45c7abb6	ompio: misc renames Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Gilles Gouaillardet	36b35ae0db	ompio: fix abstraction Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Howard Pritchard	64de269cc3	topo/treematch - quash compiler warning quash a compiler warning showing up with gcc 7.3 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2018-06-13 16:34:17 -05:00
Thananon Patinyasakdikul	390d72addd	Merge pull request #4885 from davideberius/spc_pr Initial Software-based Performance Counters PR	2018-06-12 14:04:49 -07:00
David Eberius	d377a6b6f4	Added Software-based Performance Counters driver code along with several counters. This code is the implementation of Software-based Performance Counters as described in the paper 'Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf). More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI. All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined. The internal timer units have been changed to cycles to avoid division operations, which were a large source of overhead as discussed in the paper. Added a --with-spc configure option to enable SPCs in the Open MPI build. This defines SOFTWARE_EVENTS_ENABLE. Added an MCA parameter, mpi_spc_enable, for turning on specific counters. Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize. Added an SPC test and example. Signed-off-by: David Eberius <deberius@vols.utk.edu>	2018-06-11 22:48:16 -04:00
KAWASHIMA Takahiro	a38e9e064f	coll: Update COLL module interface version to 2.3.0 Members for persistent operations are added to the module structure in a prior commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	e12a5056f1	coll/libnbc: Rename internal functions The `nbc_i` functions don't start communication, but create a request. `nbc__init` are appropriate names for them. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	5c21903477	coll/libnbc: Add assertion for `NBC_A2A_DISS` Persistent operation for `NBC_A2A_DISS` is not supported currently. Though the algorithm is not selected at all currently, I put an assertion not to select it by mistake. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	0b8b0f8393	coll/libnbc: Implement `MPI_STARTALL` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	ed0144bad4	coll/libnbc: Adapt local copy for persistent request `NBC_Copy` should not be called in `MPI_*_INIT`. `NBC_Sched_copy` should be called instead. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	5c5de3a4fb	coll/libnbc: Fix handling of completed request Because a persistent request does not free its `schedule` object when the communication completes, the `NBC_Progress` function cannot determine the completion using `schedule`. Without this change, a hang occurs when the `NBC_Progress` function is called recursively through the `NBC_Start_round` function. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	8e5690bf5c	coll/libnbc: Correct persistent request handling Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	e69e99575e	coll: Enable func check in `mca_coll_base_comm_select` Now libnbc COLL supports persistent collectives and all `*_init` functions of the COLL interface are available. So let's enable the check of availability of those functions on a communicator creation. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	a9609b6bf8	coll/libnbc: add persistent collectives implementation Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 09:53:37 +09:00
KAWASHIMA Takahiro	a9fdea51aa	coll: Add persistent collective communication request feature Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	c753e9baff	coll/libnbc: code refactoring prepare the upcoming persistent collectives by pre-factoring some code Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> fixup 808c3c62cd9475edd91ecde9d2d53b12e28b2c04	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	fe0bb6c310	coll/libnbc: misc revamp - merge NBC_Init_handle() into NBC_Schedule_request() - set schedule in NBC_Schedule_request instead of NBC_Start() - update NBC_Start() prototype Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	360a76f440	coll/libnbc: revamp ibcast and use NBC_Schedule_request() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 09:53:37 +09:00
Edgar Gabriel	2d8a769bfd	fcoll/static: remove component now that we have a shiny new fcoll component, no need to keep the static component around. No use for it anymore. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-08 07:39:46 -05:00
Edgar Gabriel	b27a40cdf9	Merge pull request #5246 from edgargabriel/topic/ibm-testsuite-fixes Topic/ibm testsuite fixes	2018-06-08 06:06:49 -05:00
Yossi Itigin	fd12540751	Merge pull request #5227 from hoopoepg/topic/pml-ucx-hang-on-finalize PML/UCX: fixed hang on MPI_Finalize	2018-06-08 13:19:49 +03:00
Edgar Gabriel	a1484ec69a	io/ompio: check error conditions before executing file_sync check for pending I/O operations and invalid modes and return proper error codes before executing MPI_File_sync makes the e_sync_1 test from the ibm testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 19:30:27 -05:00
Edgar Gabriel	14bd114973	common/ompio: return error code from file_delete operation in file_close in case the user opened a file using the DELETE_ON_CLOSE flag, return the error code generated in the delete operation. Note, that this is however just a partial fix to the e_close_1 test from the ibm testsuite, since the object destructor that triggers the file_close function does not have a mechanism right now to recognize and return an error code. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2018-06-07 19:30:14 -05:00
Edgar Gabriel	f7cae7731c	io/ompio: return error code for invalid offset in file_get_byte_offset, return an error code if the offset leads to an invalid position in file. Makes the e_get_byte_offset_1 test from the ibm testsuite pass. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2018-06-07 18:46:17 -05:00
Edgar Gabriel	deaeaa60de	fcoll/vulcan: minor bugfix when creating the groups_per_proc arrays Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 17:52:32 -05:00
Edgar Gabriel	8feb497dbe	io/ompio: cleanup the aggregator selection logic and some internal structure elements/components. Along the way, add support for the cb_nodes Info object. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:47:10 -05:00
Edgar Gabriel	529d882ff0	io/ompio and common/ompio: relocate ompio_request code to common since the request code is now being accessed also from the vulcan fcoll component, the request code was relocated into the common/ompio directory to avoid ld load problems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
raafatfeki	5ecb4a56e3	fcoll/vulcan: Support of asynchronous write in collective writeAll We introduced a new mca_vulcan parameter that specifies the I/O synchronization type (Async/sync I/O) applied within the collective write operation. The user can explicitly choose to use async or sync write operation or let the choice be made automatically. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	4f7172ddf6	fcoll/vulcan: Support of larger offsets For very large offsets, the data chunk size to be written by each aggregator exceeds the capacity of an integer variable. Besides, some variables were not large enough to hold intermediate values. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	4670fe50d7	fcoll/vulcan: Remove unnecessary calls to write Identify the index of each aggregator process in order to restrict the call to write_init function by the specific aggregator. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	bc6431bee9	fcoll/vulcan: use hindexed constructor on the sender side Instead of using a temporary buffer and copy data into the temp buffer before sending, use a derived datatype to describe the data that needs to be sent during a cycle in the collective I/O operation. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
Edgar Gabriel	1c2c110824	fcoll/vulcan: add new fcoll component import of the new vulcan component. It is an enhanced version of the two_phase component, which however uses the ompio internal codes/loops to assemble the data arrays. It is therefore more in line with the dynamic and dynamic_gen2 components, and will be easier to maintain. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
Nathan Hjelm	63ded4d083	Merge pull request #5224 from benmenadue/master io/romio314: Replace deprecated MPI-1 functions	2018-06-06 15:41:53 -06:00
Ralph Castain	86d699d42e	Correct typo in name comparison flags Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-06 12:18:52 -07:00
Ralph Castain	840fb42f93	PMIx rte component does support dynamics Minor cleanups Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-05 21:55:19 -07:00
Nathan Hjelm	64a5baaa28	Merge pull request #5193 from hjelmn/osc_sm_location Use /dev/shm for shared memory files in osc components	2018-06-05 09:42:14 -06:00
Sergey Oblomov	0a8261f3b0	PML/UCX: fixed hang on MPI_Finalize fixes issue https://github.com/openucx/ucx/issues/2656 added flush for worker object to complete all pending operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-05 17:22:03 +03:00
Mikhail Kurnosov	3adf96fdb8	coll/base: add butterfly algorithm for MPI_Reduce_scatter Implements butterfly algorithm for MPI_Reduce_scatter. The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-05 15:53:13 +07:00
Ben Menadue	34ec0bd8ab	Replace MPI_Type_extent with MPI_Type_get_extent in ROMIO. Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>	2018-06-05 15:27:58 +10:00
Ben Menadue	756cc67221	Replace MPI_Address with MPI_Get_address in ROMIO. Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>	2018-06-05 15:27:25 +10:00
Ralph Castain	3020b699f3	Merge pull request #5213 from rhc54/topic/rte Enable the PMIx ompi/rte component	2018-06-03 10:23:40 -07:00
Ralph Castain	55ac526a67	Enable the PMIx ompi/rte component Get the OMPI rte/pmix component working. This was tested using PRRTE as the RM, configuring OMPI using: * autogen --no-orte * with external libevent, external hwloc, and external PMIx master * configuring PMIx master with the same libevent and hwloc * execute the application using PRRTE's "prun" launcher, which has the same cmd line as ORTE's mpirun Note that PMIx master appears to have a bug in the event notification system that caches job termination events. Thus, the first execution runs fine, but subsequent executions cause an "abort" when the OMPI default error handler is invoked upon notification of the prior job's termination. Will work that separately. Signed-off-by: Ralph Castain <rhc@open-mpi.org> (cherry picked from commit 134cca9ac0de092d767999357573a31703f72292)	2018-06-03 07:25:12 -07:00
Jeff Squyres	35438ae9b5	mpi/finalized: revamp INITIALIZED/FINALIZED Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See https://github.com/open-mpi/ompi/issues/5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to set the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-01 13:36:29 -07:00
Edgar Gabriel	52bd606294	fcoll/dynamic_gen2: make sure that intermediate variables can hold the offset for very large offsets, some variables used in the fcoll/dynamic_gen2 code base were under certain circumstances not large enough to hold intermediate values. This issue was more detected in the vulcan component but could happen in the dynamic_gen2 component as well. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-01 06:53:38 -05:00

6669 commits