openmpi

Author	SHA1	Message	Date
Boris Karasev	dca3dd2ea4	pmix: dstore returned for direct modex Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2018-03-20 04:56:48 +02:00
Ralph Castain	9eb426e288	Merge pull request #4924 from karasevb/pmix_fix_dmdx Sync to PMIx master PR pmix/pmix#697	2018-03-19 06:24:06 -07:00
Boris Karasev	36a0c6a794	pmix: fixed the direct modex request This commit fixes the case where a local client asks for a key from a process on a remote node. The local server doesn't have the commit count for remote ranks; it is maintained by another PMIx server, so the commit count should be ignored for remote requests. Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2018-03-19 11:51:03 +02:00
Brian Barrett	a2d1419185	Merge pull request #4921 from bwbarrett/master-NEWS dist: Sync 2.1.3 NEWS items into master	2018-03-16 21:01:17 -07:00
Brian Barrett	ab19602752	dist: Sync 2.1.3 NEWS items into master Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-03-16 12:28:41 -07:00
bosilca	bf3dd8af19	Merge pull request #4884 from bosilca/topic/fix_wtime Improve the range and accuracy of MPI_Wtime.	2018-03-16 14:09:33 +09:00
Aurelien Bouteiller	e08e580e27	Merge pull request #4916 from abouteiller/topic/scaling.pl-m Scaling.pl: Fix Srun options and wait for DVM launch	2018-03-15 22:06:01 -04:00
Nathan Hjelm	7f4872d483	osc/rdma: performance improvements and bug fixes This commit is a large update to the osc/rdma component. Included in this commit: - Add support for using hardware atomics for fetch-and-op and single count accumulate when using the accumulate lock. This will improve the performance of these operations even when not setting the single intrinsic info key. - Rework how large accumulates are done. They now block on the get operation to fix some bugs discovered by an IBM one-sided test. I may roll back some of the changes if the underlying bug in the original design is discovered. There appears to be no real difference (on the hardware this was tested with) in performance so it's probably a non-issue. References #2530. - Add support for an additional lock-all algorithm: on-demand. The on-demand algorithm will attempt to acquire the peer lock when starting an RMA operation. The lock algorithm default has not changed. The algorithm can be selected by setting the osc_rdma_locking_mode MCA variable. The valid values are two_level and on_demand. - Make use of the btl_flush function if available. This can improve performance with some btls. - When using btl_flush do not keep track of the number of put operations. This reduces the number of atomic operations in the critical path. - Make the window buffers more friendly to multi-threaded applications. This was done by dropping support for multiple buffers per MPI window. I intend to re-add that support once the underlying performance bug under the old buffering scheme is fixed. - Fix a bug in request completion in the accumulate, get, and put paths. This also helps with #2530. - General code cleanup and fixes. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-03-15 14:53:53 -06:00
Aurélien Bouteiller	9e23d24bb4	Scaling.pl: Fix Srun options and wait for DVM launch Flush out the DVM ready notice on stdout Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-03-15 00:00:49 -04:00
Jeff Squyres	5f58e7b961	Merge pull request #4910 from jsquyres/pr/reset-opal-cuda-verbose-value opal_datatype_module.c: reset opal_cuda_verbose	2018-03-13 14:01:34 -04:00
Jeff Squyres	2713a24009	opal_datatype_module.c: reset opal_cuda_verbose 999de137ce6 accidentally reset opal_cuda_verbose's default value. This commit puts it back. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-13 10:10:15 -07:00
Jeff Squyres	695b92ec7b	Merge pull request #4906 from blegat/doctypo Fix typo in MPI_Cart_shift doc	2018-03-13 11:29:44 -04:00
Benoît Legat	00600c7cbb	Fix typo in MPI_Cart_shift doc Signed-off-by: Benoît Legat <benoit.legat@gmail.com>	2018-03-13 15:25:42 +01:00
Josh Hursey	ae1d3183f9	Merge pull request #4891 from jjhursey/fix/mpir-symbol-vis Fix MPIR_proctable structure visibility	2018-03-13 08:09:15 -05:00
Edgar Gabriel	50d07e9622	Merge pull request #4900 from edgargabriel/topic/two_phase_data_sieving_fix fcoll/two_phase: data sieving has to occur at offset 0 as well	2018-03-10 12:18:15 -06:00
Edgar Gabriel	da640f98df	fcoll/two_phase: data sieving has to occur at offset 0 as well Data sieving has to occur for any provided offset that is greater than or equal to zero for this implementation to work correctly. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-10 11:23:09 -06:00
Joshua Hursey	ccb4f43c9b	Fix MPIR_proctable structure visibility * The `MPIR_PROCDESC` structure needs to be visible even in optimized builds so that debuggers can attach to `mpirun` and properly read the `MPIR_proctable`. * In the v2.0.x and v2.x series this structure resided in the `orterun` directory and included the `CFLAGS` fix included here. This code moved in the v3.x series and the `CFLAGS` did not move causing this issue. - Instead of applying the debug `CFLAGS` globally to libopen-rte, only apply them to the `orted_submit.c` compile which contains the MPIR symbols. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2018-03-09 21:15:28 -05:00
Edgar Gabriel	0f345c068a	Merge pull request #4888 from edgargabriel/topic/romio_size0_contiguous_flag io/romio314: mark datatypes of size 0 as contiguous	2018-03-08 13:28:13 -06:00
Jeff Squyres	70c59f78b9	Merge pull request #4883 from bosilca/topic/get_element_fix Topic/get element fix	2018-03-08 10:31:47 -05:00
Edgar Gabriel	c83b47c266	io/romio314: mark datatypes of size 0 as contiguous This commit fixes an issue observed with romio314 and the hdf5 1.10.x testsuite. The ADIOI_Datatype_iscontig() routine in romio314/src/io_romio314_module.c will now report a datatype of size 0 as contiguous, even if the extent of the datatype is non-zero. This avoids a segmentation fault observed in the ADIOI_Flatten routine, and fixes this particular issue with the hdf5 1.10.x testsuite in OpenMPI with romio314. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-03-08 09:10:09 -06:00
George Bosilca	9bced03213	Improve the range and accuracy of MPI_Wtime. As discussed on https://github.com/mpi-forum/mpi-issues/issues/77#issuecomment-369663119 the conversion to double in MPI_Wtime decreases the range and accuracy of the resulting timer. By setting the timer to 0 at the first usage we basically maintain the accuracy for 194 days even for gettimeofday. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 14:26:02 +09:00
George Bosilca	999de137ce	Fix the datatype debug. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:40:08 +09:00
George Bosilca	7848035195	Update the loop stats. The loop should be updated on each internal iteration. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-03-08 03:18:39 +09:00
Jordan Cherry	2f0e8153a5	Merge pull request #4247 from jocherry/btlTcpLinksBugFix tcp btl: Fix multiple-link connection establishment.	2018-03-07 08:40:37 -08:00
Alex Mikheev	04ec013da9	Merge pull request #4847 from alex-mikheev/topic/oshmem_group_cache_refactor oshmem: refactor group cache	2018-03-04 14:36:32 +02:00
Jeff Squyres	3235243d71	Merge pull request #4878 from ThemosTsikas/patch-1 Make more robust in finding NAG Fortran Compiler	2018-03-02 16:16:11 -05:00
Themos Tsikas	a8fc30f95a	configury: Make more robust in finding NAG Fortran Compiler The NAG Fortran check only matched "nagfor" exactly, and failed if a path to nagfor was provided. Also change "-pthread" into "-Wl,-pthread". Signed-off-by: Themos Tsikas <themos.tsikas@nag.co.uk> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-03-02 10:50:37 -08:00
Ralph Castain	f81828456f	Merge pull request #4854 from rhc54/topic/update Update ORTE to support PMIx v3	2018-03-02 05:49:29 -08:00
Ralph Castain	2f85db9791	Always register the nspace for jobs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-03-02 02:00:31 -08:00
Ralph Castain	7241043809	Modify the internal logic for resolve nodes/peers The current code path for PMIx_Resolve_peers and PMIx_Resolve_nodes executes a threadshift in the preg components themselves. This is done to ensure thread safety when called from the user level. However, it causes thread-stall when someone attempts to call the regex functions from _inside_ the PMIx code base should the call occur from within an event. Accordingly, move the threadshift to the client-level functions and make the preg components just execute their algorithms. Create a new pnet/test component to verify that the preg code can be safely accessed - set that component to be selected only when the user directly specifies it. The new component will be used to validate various logical extensions during development, and can then be discarded. Signed-off-by: Ralph Castain <rhc@open-mpi.org> (cherry picked from commit 456ac7f7af3d9ba09888e3c899eb001daaa24aef)	2018-03-02 02:00:31 -08:00
Ralph Castain	17c40f4cea	Implement support for proctable queries Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-03-02 02:00:31 -08:00
Ralph Castain	0434b615b5	Update ORTE to support PMIx v3 This is a point-in-time update that includes support for several new PMIx features, mostly focused on debuggers and "instant on": * initial prototype support for PMIx-based debuggers. For the moment, this is restricted to using the DVM. Supports direct launch of apps under debugger control, and indirect launch using prun as the intermediate launcher. Includes ability for debuggers to control the environment of both the launcher and the spawned app procs. Work continues on completing support for indirect launch * IO forwarding for tools. Output of apps launched under tool control is directed to the tool and output there - includes support for XML formatting and output to files. Stdin can be forwarded from the tool to apps, but this hasn't been implemented in ORTE yet. * Fabric integration for "instant on". Enable collection of network "blobs" to be delivered to network libraries on compute nodes prior to local proc spawn. Infrastructure is in place - implementation will come later. * Harvesting and forwarding of envars. Enable network plugins to harvest envars and include them in the launch msg for setting the environment prior to local proc spawn. Currently, only OmniPath is supported. PMIx MCA params control which envars are included, and also allows envars to be excluded. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-03-02 02:00:31 -08:00
Gilles Gouaillardet	f15d6200af	Merge pull request #4832 from ggouaillardet/topic/vader_process_vm btl/vader: handle unexpected short read/write in process_vm_{read,write}v	2018-03-02 13:59:51 +09:00
Gilles Gouaillardet	9fedf2836e	btl/vader: handle unexpected short read/write in process_vm_{read,write}v Important note : According to the man page "On success, process_vm_readv() returns the number of bytes read and process_vm_writev() returns the number of bytes written. This return value may be less than the total number of requested bytes, if a partial read/write occurred. (Partial transfers apply at the granularity of iovec elements. These system calls won't perform a partial transfer that splits a single iovec element.)" So since we use a single iovec element, the returned size should either be 0 or size, and the do loop should not be needed here. We tried on various Linux kernels with size > 2 GB, and surprisingly, the returned value is always 0x7ffff000 (fwiw, it happens to be the size of the largest number of pages that fits in a signed 32-bit integer). We do not know whether this is a bug in the kernel, the libc or even the man page, but for the time being, we act as if process_vm_readv() could return any value. Thanks Heiko Bauke for the bug report. Refs. open-mpi/ompi#4829 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-03-02 13:20:46 +09:00
Nathan Hjelm	5ed2fc2d48	mca/base: add support for additional variable types This commit adds long, int32_t, uint32_t, int64_t, and uint64_t as possible MCA variable types. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-03-01 20:42:27 -07:00
Josh Hursey	6f589546d3	Merge pull request #4867 from sam6258/prefix_dir_fix Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value…	2018-03-01 10:25:04 -06:00
Scott Miller	d7e594fcff	Fix PATH and LD_LIBRARY_PATH prefixing to use first app context value for ORTE_APP_PREFIX_DIR Signed-off-by: Scott Miller <scott.miller1@ibm.com>	2018-02-28 18:41:47 -05:00
bosilca	9944d63de1	Merge pull request #4852 from thananon/pr/ob1_oos_fix pml/ob1: fixed out of sequence bug.	2018-02-28 13:02:03 -05:00
Thananon Patinyasakdikul	09cba8b30b	pml/ob1: fixed out of sequence bug. This commit fixes #4795 - Fixed typo that sometimes causes deadlock in change of protocol. - Redesigned out of sequence ordering and address the overflow case of sequence number from uint16_t. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2018-02-27 13:49:40 -05:00
Jordan Cherry	d7e7e3acb7	tcp btl: Fix multiple-link connection establishment. Fix case where the btl_tcp_links MCA parameter is used to create multiple TCP connections between peers. Three issues were resulting in hangs during large message transfer: * The 2nd..btl_tcp_link connections were dropped during establishment because the per-process address check was binary, rather than a count * The accept handler would not skip a btl module that was already in use, resulting in all connections for a given address being vectored to a single btl * Multiple addresses in the same subnet caused connections to be stalled, as the receiver would always use the same (first) address found. Binding the outgoing connection solves this issue * Lastly fix race condition created by connections being started at the exact same time by accepting connections not in the closed state, allowing endpoint_accept to resolve dispute Signed-off-by: Jordan Cherry <cherryj@amazon.com>	2018-02-27 16:36:44 +00:00
Nathan Hjelm	5380d7cce5	mpool/hugepage: add missing header Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-02-26 13:35:56 -07:00
Nathan Hjelm	38d9b10db8	rcache/base: update VMA tree to use opal_interval_tree_t This commit replaces the current VMA tree implementation with one that uses the new opal_interval_tree_t class. Since the VMA tree lock is no longer used this commit also updates rcache/grdma and btl/vader to take better care when searching for existing registrations. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-02-26 13:35:56 -07:00
Nathan Hjelm	7163fc98a0	opal/class: add a new class: opal_interval_tree_t This commit adds a new class to opal: opal_interval_tree_t. This is a thread-safe implementation of a 1-dimensional interval tree. The data structure is intended to provide a faster implementation of the registration cache VMA tree. The thread safety is provided by a relativistic red-black tree implementation. This structure provides support for multiple-reader, and single writer. There is one caveat, an item may appear in the tree twice while the tree is being updated. Care needs to be taken to avoid issues associated with this "feature". I don't anticipate a problem with the current VMA tree usage. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-02-26 13:35:56 -07:00
Ralph Castain	5c59876451	Merge pull request #4864 from ggouaillardet/topic/pmix_configury configury: look for PMI header in DIR provided by --with-pmi=DIR	2018-02-25 18:10:39 -08:00
Gilles Gouaillardet	83dd8cd3fc	configury: look for PMI header in DIR provided by --with-pmi=DIR Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-02-26 09:46:26 +09:00
Gilles Gouaillardet	75acb172a8	Merge pull request #4853 from ggouaillardet/topic/configury_pmi configury: fix PMI detection	2018-02-25 19:04:57 +09:00
Gilles Gouaillardet	b86e0f04bf	configury: fix PMI detection and do not end up with -L/usr/lib[64] when PMI libraries are installed in the default location. Thanks Davide Vanzo for the report. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-02-24 01:16:12 +09:00
valentin petrov	40e0ae7326	Merge pull request #4850 from vspetrov/master coll/hcoll: Fix return codes	2018-02-22 21:26:47 +03:00
Valentin Petrov	bf4e694a96	coll/hcoll: Fix return codes Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2018-02-22 17:48:29 +02:00
Alex Mikheev	292d185c30	oshmem: refactor group cache - Use opal hash table instead of list for group lookup. - Code cleanup/refactoring. Group cache is now a part of the proc_group. Signed-off-by: Alex Mikheev <alexm@mellanox.com>	2018-02-22 11:48:06 +02:00


28268 Commits