openmpi

Author	SHA1	Message	Date
Ralph Castain	c1ce233eaf	Merge pull request #4143 from aravindksg/psm2_cuda Add support for GPU buffers for PSM2 MTL	2017-09-01 21:09:55 -07:00
Aravind Gopalakrishnan	2e83cf15ce	Add support for GPU buffers for PSM2 MTL PSM2 enables support for GPU buffers and CUDA managed memory, and it can directly recognize GPU buffers and handle copies between HFIs and GPUs. Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases. In this patch, we allow the PSM2 MTL to specify when it does not require CUDA convertor support. This allows us to skip CUDA convertor init phases and lets PSM2 handle the memory transfers. This translates to improvements in latency. The patch enables blocking collectives and workloads with GPU contiguous and GPU non-contiguous memory. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2017-09-01 16:59:03 -07:00
Ralph Castain	2c723f4338	Roll to track PMIx master Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-09-01 12:30:34 -07:00
Nathan Hjelm	79fc9d54dc	Revert "* Some recent versions of GCC try very hard to make it impossible to" This reverts commit b5ea5e0994a827915107e03d6744e73156534a04 This commit reverts a change that is hopefully not necessary. If this is the case this will fix #4146. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-09-01 08:47:29 -06:00
Gilles Gouaillardet	c9cca771cc	pmix/ext2x: automatically generate ext2x component from pmix2x sources Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-08-30 09:41:31 +09:00
Gilles Gouaillardet	fd08b923d5	pmix: do not invoke PMIX_INFO_CREATE() with a zero size Thanks Lisandro Dalcin for the report Fixes open-mpi/ompi#3854 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-08-28 11:25:58 +09:00
Josh Hursey	ad87aa2674	Merge pull request #4121 from jjhursey/explore/dlopen-local mca: Dynamic components link against project lib	2017-08-25 13:15:51 -05:00
Joshua Hursey	e1d079544b	mca: Dynamic components link against project lib * Resolves #3705 * Components should link against the project level library to better support `dlopen` with `RTLD_LOCAL`. * Extend the `mca_FRAMEWORK_COMPONENT_la_LIBADD` in the `Makefile.am` with the appropriate project level library: ``` MCA components in ompi/ $(top_builddir)/ompi/lib@OMPI_LIBMPI_NAME@.la MCA components in orte/ $(top_builddir)/orte/lib@ORTE_LIB_PREFIX@open-rte.la MCA components in opal/ $(top_builddir)/opal/lib@OPAL_LIB_PREFIX@open-pal.la MCA components in oshmem/ $(top_builddir)/oshmem/liboshmem.la ``` Note: The changes in this commit were automated by the `libadd_mca_comp_update.py` script in the commit that precedes it. Some components were not included in this change because they are statically built only. Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-08-24 11:56:16 -04:00
Ralph Castain	68029b27e4	Fix the orte-dvm operations so that orterun can connect and execute an application. There is a lingering problem, though. The first invocation of orterun succeeds every time. However, subsequent invocations have a high probability of hanging in the OOB connection handshake. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-23 17:31:08 -07:00
Ralph Castain	0561d64748	Continue tracking PMIx v2.1.0 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-23 09:38:27 -07:00
Ralph Castain	e02c39385a	Merge branch 'master' into topic/modex	2017-08-22 20:06:35 -07:00
George Bosilca	50f471e31e	Cleanup a set of warnings reported by Ralph. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-08-22 23:00:18 -04:00
Gilles Gouaillardet	565b516dae	hwloc/base: fix opal_output() usage Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-08-23 10:24:47 +09:00
Ralph Castain	d80b0c7990	If the HWLOC shared memory system is unable to connect, then fall back to providing the topology via XML. Do not automatically provide the XML to every process as that defeats the purpose of the shared memory system. Instead, use PMIx_Query_info_nb to get the info from the server when required. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-22 18:12:26 -07:00
Ralph Castain	38e363c515	Fix the #if check for hwloc version Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-22 14:07:36 -07:00
Ralph Castain	e3213386ec	Fix the internal PMIx installation - matching changes have been upstreamed Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-22 13:49:07 -07:00
Ralph Castain	a1b15c5666	Roll in update to PMIx master. Transfer updates from pmix2x component to ext2x Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-22 13:06:47 -07:00
Brice Goglin	2d242ab9f0	hwloc/shmem: don't abort on failure to load from shmem Adopting can fail if the server-side hole isn't available on the client. We can fall back to other ways to load the topology. Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>	2017-08-21 19:57:38 +02:00
Brice Goglin	ffd209fc2e	hwloc/shmem: dump /proc/self/maps if failed to find a hole and verbosity > 4 Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>	2017-08-21 19:57:38 +02:00
Ralph Castain	d515f48885	The local PMIx server is notifying its clients of all events, but for some reason I don't recall, the broadcast notification was marked for delivery only to non-default event handlers. This creates a discrepancy between the two behaviors, so don't restrict the broadcast notifications. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-18 17:26:11 -07:00
Brian Barrett	c667719a3f	Merge pull request #3955 from mohanasudhan/master Btl tcp: Improved diagnostic output and failure mode	2017-08-18 11:42:27 -07:00
Mohan	fc32ae401e	Btl Tcp: Updated tcp handshake methods This commit has two changes 1. Adding a magic string during the handshake can cause issues when used with older versions of MPI. Hence set the RCVTIMEO parameter to 2 seconds 2. Using a single call during the handshake instead of two calls Signed-off-by: Mohan Gandhi <mohgan@amazon.com>	2017-08-18 10:06:52 -07:00
Mohan	e3dfe11da9	Btl tcp: Improving verbose around tcp As part of the improvements to the tcp btl, we are improving verbose output in general Signed-off-by: Mohan Gandhi <mohgan@amazon.com>	2017-08-17 17:22:16 -07:00
Mohan	4bc7b214dc	Btl tcp: Improving verbose around IPV6 As part of the improvements to tcp btl debugging and verbose output, we are improving verbose output around IPV6 Signed-off-by: Mohan Gandhi <mohgan@amazon.com>	2017-08-17 16:45:14 -07:00
Mohan	0741fad479	Btl tcp: BTL_ERROR to show_help & update func behaviour As part of the improvements to tcp debugging, we are moving a few BTL_ERROR calls to show_help and also updating the behaviour of mca_btl_tcp_endpoint_complete_connect to return SUCCESS and ERROR cases. Signed-off-by: Mohan Gandhi <mohgan@amazon.com>	2017-08-17 16:45:14 -07:00
Mohan	368f9f0dfc	Btl tcp: Using magic string to verify mpi connection As part of the improvements to failure handling in btl tcp, we are using a magic string to verify the mpi connection. If the magic string is mismatched or missing, we can identify that we are trying to connect with some other process. Signed-off-by: Mohan Gandhi <mohgan@amazon.com>	2017-08-17 16:45:13 -07:00
Mohan	c30a42917c	Btl tcp: Refactoring non-blocking send/receive function Moving the non-blocking send/receive functions to btl_tcp will help reuse these functions wherever needed. In this case we plan to reuse the receive function to retrieve the magic string to validate that the established connection is from an mpi process. Signed-off-by: Mohan Gandhi <mohgan@amazon.com>	2017-08-17 16:45:13 -07:00
Ralph Castain	088b6cdeee	Silence coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-17 09:49:35 -07:00
Ralph Castain	41df973359	Add diagnostics for hwloc get_topology Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-16 14:21:27 -07:00
Jeff Squyres	cd8db5313e	Merge pull request #4101 from jsquyres/pr/usnic-restore-configure-summary-line btl/usnic: restore configure usNIC summary line	2017-08-16 16:36:19 -04:00
Jeff Squyres	a591159fb4	btl/usnic: restore configure usNIC summary line Not sure how/when this got deleted, but put back the "Cisco usNIC" line in the transport summary at the end of configure. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-08-16 12:37:59 -07:00
Ralph Castain	c4d5dbfcdc	Change test per recommendation of @jsquyres Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-16 11:19:15 -07:00
Jeff Squyres	ce3a032b5e	rcache_base_frame: fix compiler warning Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-08-16 09:48:31 -07:00
Ralph Castain	eb69df02ae	Update to PMIx v2.1.0rc1 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-15 19:59:15 -07:00
Ralph Castain	65fb6070d9	Update tool support by adding MCA params to direct orted's to drop session and/or system-level tool rendezvous files. Ensure PMIx is enabled for tools Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-15 17:49:47 -07:00
Ralph Castain	98f36711e3	Update hwloc to latest shmem branch. Correct typos in update-my-copyright.pl. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-15 13:32:12 -07:00
Ralph Castain	033a0eb373	Fix the --disable-dlopen --with-devel-headers case by not having libpmix link back to libopen-pal as the latter won't exist in time during this build case Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-15 10:51:35 -07:00
Ralph Castain	daf548b328	Apply patch from @bgoglin Fixes #4027 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-11 07:16:14 -07:00
Ralph Castain	4290247d64	Update to latest PMIx v2.1.0a Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-10 18:48:07 -07:00
Howard Pritchard	6dfb48d866	Merge pull request #4056 from hppritcha/topic/swat_issue_4020 mca/registry: fix problem group_component_register	2017-08-09 10:25:00 -06:00
Jeff Squyres	6889948475	Merge pull request #4058 from thananon/pr/usnic_fix_credit btl/usnic: assign the number of send credit correctly.	2017-08-09 11:46:42 -04:00
Howard Pritchard	55774d1390	mca/registry: fix problem group_component_register Turns out that supplying NULL to group_register in the mca_base_var_group_component_register is not a good idea if one wants ompi_info to work as intended. The ugni and vader btl's both call this before registering component variables. This breaks ompi_info, since NULL is supplied as the project name. So, now supply the project name rather than just NULL to group register. Fixes #4020. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-08-08 19:50:27 -06:00
Thananon Patinyasakdikul	68658e4bab	btl/usnic: assign the number of send credit correctly. usnic endpoints were always created with a default send credit value of 8. This commit assigns the correct number from the hardware instead. Signed-off-by: Thananon Patinyasakdikul <apatinya@cisco.com>	2017-08-08 17:01:16 -07:00
Ralph Castain	53c9270af7	Silence coverity warnings Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-08 06:10:14 -07:00
Nathan Hjelm	b870d150dd	rcache/base: remove erroneous comment Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-08-07 15:17:12 -06:00
Nathan Hjelm	76320a8ba5	opal: rename opal_atomic_init to opal_atomic_lock_init This function is used to initialize an opal atomic lock. The old name was confusing. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-08-07 14:15:11 -06:00
Ralph Castain	9921237f99	Merge pull request #4012 from rhc54/topic/p3 Cover the use-cases for OPAL_PREFIX and PMIX_INSTALL_PREFIX options	2017-08-07 11:42:53 -07:00
Ralph Castain	9499acc56a	Merge pull request #4043 from rhc54/topic/libpmix Fix libpmix linking	2017-08-07 11:28:15 -07:00
Ralph Castain	d593e5a4ce	When we specify --with-devel-headers, we also emit a copy of libpmix. However, that library was built against the OPAL libevent component, which means all the libevent functions are prefixed with OPAL names. So ensure that the emitted libpmix is linked back against libopen-pal so those symbols will be resolved. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-08-07 09:36:16 -07:00
Nathan Hjelm	813762334e	memory/patcher: hook madvise It is not possible to use the patcher based memory hooks without hooking madvise (MADV_DONTNEED). This commit updates the patcher memory hooks to always hook madvise. This should be safe with recent rcache updates. References #3685. Close when merged into v2.0.x, v2.x, and v3.0.x. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-08-07 10:29:45 -06:00

4900 Commits