openmpi

Author	SHA1	Message	Date
Ralph Castain	3a93b535ec	Silence the flood of OSC/RDMA warnings Fixes #4950 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-03-25 16:12:41 -07:00
Jeff Squyres	871e5c76bc	Merge pull request #4960 from jsquyres/pr/warnings-fixes Coverity fix + compiler warning fixes	2018-03-23 14:47:56 -05:00
Jeff Squyres	c3adcb05eb	Miscellaneous compiler warnings fixes Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-23 11:45:30 -07:00
Nathan Hjelm	5f7ff5307e	fcoll/two_phase: do not use removed function (MPI_Address) Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-03-23 08:43:24 -06:00
Edgar Gabriel	36747cca67	io/ompio: disable the fcoll timing by default somehow the flag indicating to gather performance data on collective io operations has changed to 1 accidentally. Should be 0 ( false) by default. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-21 11:34:35 -05:00
Edgar Gabriel	aae8c6c6ad	remove addproc sharedfp component never got to move this sharedfp component into anything usable. Can easily be restored if necessary. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-21 11:27:01 -05:00
Edgar Gabriel	e703ac2da8	remove plfs components plfs components are at this point not utilized by anybody as far as I know. Easy to bring back if we want to. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-21 11:27:01 -05:00
Nathan Hjelm	7f4872d483	osc/rdma: performance improvments and bug fixes This commit is a large update to the osc/rdma component. Included in this commit: - Add support for using hardware atomics for fetch-and-op and single count accumulate when using the accumulate lock. This will improve the performance of these operations even when not setting the single intrinsic info key. - Rework how large accumulates are done. They now block on the get operation to fix some bugs discovered by an IBM one-sided test. I may roll back some of the changes if the underlying bug in the original design is discovered. There appear to be no real difference (on the hardware this was tested with) in performance so its probably a non-issue. References #2530. - Add support for an additional lock-all algorithm: on-demand. The on-demand algorithm will attempt to acquire the peer lock when starting an RMA operation. The lock algorithm default has not changed. The algorithm can be selected by setting the osc_rdma_locking_mode MCA variable. The valid values are two_level and on_demand. - Make use of the btl_flush function if available. This can improve performance with some btls. - When using btl_flush do not keep track of the number of put operations. This reduces the number of atomic operations in the critical path. - Make the window buffers more friendly to multi-threaded applications. This was done by dropping support for multiple buffers per MPI window. I intend to re-add that support once the underlying performance bug under the old buffering scheme is fixed. - Fix a bug in request completion in the accumulate, get, and put paths. This also helps with #2530. - General code cleanup and fixes. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-03-15 14:53:53 -06:00
Edgar Gabriel	da640f98df	fcoll/two_phase: data sieving has to occur at offset 0 as well data sieving has to occur for any offset provided that is larger or equal zero for this implementation to work correctly. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-10 11:23:09 -06:00
Edgar Gabriel	c83b47c266	io/romio314: mark datatypes of size 0 as contiguous this commit fixes an issue observed with romio314 and the hdf5 1.10.x testsuite. The ADIOI_Datatype_iscontig() routine in romio314/src/io_romio314_module.c will now return for a datatype of size 0 that it is contiguous, even if the extent of the datatype is non-zero. This avoids a segmentation fault observed in the ADIOI_Flatten routine, and fixes this particular issue with the hdf5 1.10.x testsuite in OpenMPI with romio314. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-08 09:10:09 -06:00
bosilca	9944d63de1	Merge pull request #4852 from thananon/pr/ob1_oos_fix pml/ob1: fixed out of sequence bug.	2018-02-28 13:02:03 -05:00
Thananon Patinyasakdikul	09cba8b30b	pml/ob1: fixed out of sequence bug. This commit fixes #4795 - Fixed typo that sometimes causes deadlock in change of protocol. - Redesigned out of sequence ordering and address the overflow case of sequence number from uint16_t. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2018-02-27 13:49:40 -05:00
Valentin Petrov	bf4e694a96	coll/hcoll: Fix return codes Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2018-02-22 17:48:29 +02:00
Matias Cabral	0a822f8f99	Merge pull request #4821 from nrspruit/OFI_mtl_multi_event_progress MTL OFI: Added support for reading multiple CQ events in ofi progress	2018-02-20 14:59:47 -08:00
Jeff Squyres	9ef0f3d83a	ompi/monitoring: add .sh versioning to common monitoring lib Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-02-20 07:07:23 -08:00
Spruit, Neil R	e7bff501cd	MTL OFI: Added support for reading multiple CQ events in ofi progress -Updated ompi_mtl_ofi_progress to use an array to read CQ events up to a threshold that can be set by the Open MPI User. -Users can adjust the number of events that can be handled in the ompi_mtl_ofi_progress by setting "--mca mtl_ofi_progress_event_cnt #". -The default value for the number of CQ events that can be read in a single call to ofi progress is 100 which is an average based off workload usecase analysis showing 70-128 as the range of multiple events returned during ofi progress. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-02-15 09:41:14 -05:00
Nathan Hjelm	0e83568466	coll/libnbc: do not take lock in progress if there are no requests This commit fixes a flaw in the progress function for libnbc. The function was unconditionally taking a lock even if there are no requests to process. This lock was showing up in vtune traces of multi-threaded benchmarks. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-02-13 09:51:01 -07:00
Edgar Gabriel	a3a734b6d2	io/ompio: correctly reset the request after performing the final OBJ_RELEASE on the request, reset the user level variable to MPI_REQUEST_NULL. Otherwise the c_2_f translation step in the fortran interface fails. Fixes issue #4807 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-02-13 09:18:25 -06:00
Jeff Squyres	e7f91f8068	Merge pull request #4527 from clementFoyer/osc-no-includes Remove inter-dependencies between OSC modules.	2018-02-09 15:49:56 -05:00
Nathan Hjelm	da9f833f4a	pml/ob1: ignore the eager limit of RDMA-only btls This commit fixes a flaw in the eager limit check in pml/ob1. The check was incorrectly checking if RDMA-only BTLs (BTLs without the send flag) has a valid eager limit. This commit fixes the check by adding an additional check for the send flag on the BTL module. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-02-07 12:42:44 -07:00
Clement Foyer	f5b4fc05f8	Remove inter-dependencies between OSC modules. The osc monitoring component needed to include other OSC components header in order to be able tu access communicator through the component specific ompi_osc__module_t structures. This commit remove the dependency, and resolve the issue #4523. Extend the common monitoring API. Now it's possible to translate from local rank to world rank from both the communicator and the group. * Remove useless hashtable as we directly use the w_group contained in window structure. Add automatic generation at config time. The templates are expanded at configure time. It creates a new header file that generates all the variables/functions needed. Adding this during the autogen automagicaly generates for each of the available modules the proper functions. Only keep a generated argv-style array. Following Jeff's advice, the configure.m4 file generate a simple array of module variables to be iterated over to find the proper module. Signed-off-by: Clement Foyer <clement.foyer@inria.fr>	2018-02-07 11:52:00 +00:00
Ralph Castain	7ddffc627d	Merge pull request #4776 from rhc54/topic/rte Correct abstraction break and update ignores	2018-01-31 04:34:05 -08:00
Ralph Castain	8e8a9aecc5	Correct abstraction break - direct reference to ORTE Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-01-30 21:19:14 -08:00
Gilles Gouaillardet	34b45cc879	osc/sm: fix the osc_free callback If component selection fails, then module->bases might be unallocated when ompi_osc_sm_free() is invoked, so test it before trying to free() module->bases[0]. Thanks Martin Binder for the report. Refs open-mpi/ompi#4770 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-01-31 11:23:21 +09:00
Sergey Oblomov	7a5811d0a8	request/state: update state for canceled request - fixed issue in set state for canceled request Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-01-29 18:26:20 +02:00
Edgar Gabriel	bcf26d419f	fs/ufs and fs/lustre: remove erroneous return statement an erroneous return statement crept in with commit 1885d99, which leads to some processes not resetting stripe_size and stripe_count correctly. This can lead in 3.0.x to different fcoll modules being selected. The impact is not that dramatic on master and 3.1.x, but could lead to problems as well. Fixes #4745 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-01-24 14:07:21 -06:00
Xin Zhao	72ff2b1135	OMPI/OSC/UCX: adding atomic lock for fetch_and_op and compare_and_swap Signed-off-by: Xin Zhao <xinz@mellanox.com>	2018-01-18 00:36:22 +02:00
Yossi Itigin	f2851fd502	Merge pull request #4724 from alex-mikheev/topic/ucx_as_default ompi/oshmem: ucx is selected over yalla/ikrit by default	2018-01-17 17:41:49 +02:00
Alex Mikheev	640e945b9c	ompi: pml/ucx: blocking send using ucp_tag_send_nbr Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-01-17 15:54:18 +02:00
Alex Mikheev	ae326546f4	ompi/oshmem: ucx is selected over yalla/ikrit by default Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-01-17 15:08:04 +02:00
Matias Cabral	8049c06a96	Merge pull request #4580 from matcabral/fix_comments_pr_425_osc_rmda osc/rmda: fix missing opal_argv_free in mtls search.	2018-01-12 09:20:11 -08:00
Matias A Cabral	009ba475e1	osc/rmda: fix missing opal_argv_free in mtls search. Use asprintf in description message to avoid missing default values Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-01-11 14:29:16 -08:00
bosilca	ef38ca5663	Merge pull request #4644 from bosilca/topic/treematch Fix treematch topology assert	2018-01-02 21:21:54 -05:00
Alex Mikheev	e7bf0617cf	ompi: pml ucx: improve recv latency Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2017-12-26 16:24:16 +02:00
Aravind Gopalakrishnan	fb68726baf	MTL OFI: Allow retries in MTL progress for interrupted syscalls This fixes a regression in sockets provider which could return -EINTR value from fi_cq_read() due to a syscall being interrupted. The error value is currently interpreted as fatal condition. Relax the rule so that we can retry fi_cq_read() operation. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-12-20 14:58:49 -08:00
George Bosilca	38455845db	Fix asserts. In both cases we were comparing with the wrong size, it should be either the number of local processes or the number of nodes, and not the size of the communicator. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-12-20 11:51:35 -05:00
George Bosilca	808f865e9d	Force all output to use OMPI infrastructure. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-12-20 11:50:51 -05:00
Matias Cabral	2c86b8723d	Merge pull request #4510 from matcabral/mtl_psm2_shadow_vars New flag for MCA parameters that allows a behaving with a default value of "unset".	2017-12-04 12:25:37 -08:00
Howard Pritchard	b160cf6339	Merge pull request #4533 from hppritcha/topic/ofi_mtl_mprobe_fixes mtl/ofi: fix problem with mprobe/mrecv	2017-12-04 09:11:47 -07:00
Howard Pritchard	2233e44848	Merge pull request #4534 from hppritcha/topic/fix_a_segv_in_request pml/cm: check for request comp. before completing bsend	2017-12-04 09:09:41 -07:00
Gilles Gouaillardet	2f5b1e9fe0	Merge pull request #4551 from ggouaillardet/topic/communicator_mutex_c_lock Make usage of ompi_communicator_t, ompi_file_t and ompi_win_t mutex consistent	2017-12-04 09:20:52 +09:00
Edgar Gabriel	1f151be6d2	io/ompio: introduce a new function to retrieve mca parameter values ompio has the unique problem that mca parameters set in the io/ompio component have to be accessible from other frameworks as well. This is mostly done to avoid a replication in the parameter names and to reduce the number of mca parameters that an end-user has to worry about. This commit introduces a generic function to retrieve ompio mca parameters, the function pointer is stored on the file handle. It replaces two functions that used the same concept already for one parameter each. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2017-12-01 10:00:23 -06:00
Gilles Gouaillardet	5f1a967351	ompi/file: rename ompi_file_t's f_mutex into f_lock in order to use a consistent name between ompi_file_t, ompi_win_t and ompi_communicator_t Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-12-01 16:06:22 +09:00
Nathan Hjelm	7893248c5a	opal/asm: add fetch-and-op atomics This commit adds support for fetch-and-op atomics. This is needed because and and or are irreversible operations so there needs to be a way to get the old value atomically. These are also the only semantics supported by C11 (there is not atomic_op_fetch, just atomic_fetch_op). The old op-and-fetch atomics have been defined in terms of fetch-and-op. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:23 -07:00
Nathan Hjelm	1282e98a01	opal/asm: rename existing arithmetic atomic functions This commit renames the arithmetic atomic operations in opal to indicate that they return the new value not the old value. This naming differentiates these routines from new functions that return the old value. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	9d0b3fe9f4	opal/asm: remove opal_atomic_bool_cmpset functions This commit eliminates the old opal_atomic_bool_cmpset functions. They have been replaced by the opal_atomic_compare_exchange_strong functions. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-30 10:41:22 -07:00
Nathan Hjelm	45db3637af	osc/rdma: bug fixes This commit fixes the following bugs: - Allow a btl to be used for communication if it can communicate with all non-self peers and it supports global atomic visibility. In this case CPU atomics can be used for self and the btl for any other peer. - It was possible to get into a state where different threads of an MPI process could issue conflicting accumulate operations to a remote peer. To eliminate this race we now update the peer flags atomically. - Queue up and re-issue put operations that failed during a BTL callback. This can occur during an accumulate operation. This was an unhandled error case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-11-29 12:43:58 -07:00
Nathan Hjelm	67e26b6e5a	Merge pull request #4482 from matcabral/osc_rdma_skip_mtls osc/rdma: mca parameter to list MTLs that lower osc rdma priority	2017-11-29 09:34:33 -07:00
Nathan Hjelm	647b40f3f2	Merge pull request #4442 from bosilca/topic/ob1_pvar Topic/ob1 pvar	2017-11-29 09:31:07 -07:00
Ralph Castain	7ad6886a30	Add a new OMPI rte component to support direct-launch using PMIx. Cleanup several places where abstraction violations crept into OMPI layer (direct reference of ORTE). Add some missing includes that were exposed by this change. Note that this compiles, but I haven't tested it for execution yet. Handing it over to Noah Evans for completion Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-11-28 12:05:01 -08:00

6706 commits
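
Commits 7893248c5a and 1282e98a01 above distinguish fetch-and-op atomics (which return the old value) from op-and-fetch atomics (which return the new value), and note that the latter are defined in terms of the former. The following is a minimal sketch of that relationship using the standard C11 `<stdatomic.h>` calls. The `sketch_*` helper names are hypothetical and are not the opal atomic API.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the opal_atomic_* API. */

/* fetch-and-op: returns the value held BEFORE the OR was applied.
 * OR is irreversible, so the old value cannot be recovered from the
 * new one; it has to be returned by the atomic operation itself. */
static inline uint32_t sketch_fetch_or_32(_Atomic uint32_t *addr, uint32_t mask)
{
    return atomic_fetch_or(addr, mask);
}

/* op-and-fetch built on fetch-and-op: reapply the operation to the
 * returned old value to obtain the value that was stored. */
static inline uint32_t sketch_or_fetch_32(_Atomic uint32_t *addr, uint32_t mask)
{
    return atomic_fetch_or(addr, mask) | mask;
}

/* Same pattern for addition: old value plus delta is the new value. */
static inline uint32_t sketch_add_fetch_32(_Atomic uint32_t *addr, uint32_t delta)
{
    return atomic_fetch_add(addr, delta) + delta;
}
```

C11 only provides the `atomic_fetch_*` family, which is why the new-value variants are written here as thin wrappers rather than separate primitives.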