openmpi

Author	SHA1	Message	Date
Ralph Castain	4b6d220a83	You cannot include both pmi.h and pmi2.h as they have conflicting defines in them. Thanks to Kilian Cavalotti for pointing it out Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-19 11:53:54 -07:00
Jeff Squyres	ce0e1cd32c	Merge pull request #3201 from hppritcha/jjhursey-topic/timer-gettimeofday Jjhursey topic/timer gettimeofday	2017-03-18 20:12:36 -04:00
Jeff Squyres	b8dfd49e97	hwloc: re-enable use of autogen.pl in a tarball Commit `fec519a793` broke the ability to run autogen.pl in a distribution tarball. This commit restores that ability by also distributing opal/mca/hwloc/autogen.options in the tarball. Skipping CI because CI does not test this functionality: [skip ci] bot:notest Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-17 11:41:17 -07:00
Jeff Squyres	5219054d29	Merge pull request #3185 from jsquyres/pr/master/compiler-warning-squashes Compiler warning squashes	2017-03-16 10:12:08 -04:00
Jeff Squyres	b51c4e2797	memory/patcher: fix a compiler warning Don't define the madvise intercept functions since we're not currently intercepting madvise. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-16 05:43:51 -07:00
Jeff Squyres	616f20c52c	timer/linux: rename component-specific functions Several component-specific functions were named with a prefix of "opal_timer_base", which was quite confusing. Rename them to have a prefix "opal_timer_linux" to make it clear that they are here in this component (and different than actual opal_timer_base symbols). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-15 21:03:13 -05:00
Jeff Squyres	290d4598df	timer/linux: remove global variable This variable is only used in one file, so make it static. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-15 21:03:06 -05:00
Howard Pritchard	db2e1298fb	OSx: remove built-in atomics support It was decided to remove support for os-x builtin atomics Fixes #2668 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2017-03-15 12:45:33 -06:00
Nathan Hjelm	6b210fa2c4	btl/ugni: do not return a frag from sendi if an endpoint is waitlisted This fixes a hang that can occur when running bandwidth tests. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-14 10:14:13 -06:00
Nathan Hjelm	2e42b0afbd	btl/ugni: move connection check into sync event This commit makes datagram checks time based and reduces their frequency when only the wildcard datagram is posted. This change improves latency on knl systems. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-14 10:10:05 -06:00
Nathan Hjelm	d5aaeb74b6	btl/ugni: return a descriptor from sendi Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:56:54 -06:00
Nathan Hjelm	a19e7023d1	btl/ugni: always check local SMSG CQ This commit removes the local operation count check from the local SMSG completion queue. This check was leading to hangs due to an undocumented feature of the ugni library. The local SMSG CQ is used to send credit return messages back to the sender. The ugni library never checks for the completion itself but relies on the SMSG user to periodically check the CQ. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:56:54 -06:00
Nathan Hjelm	d5cdeb81d0	btl/ugni: improve multi-threaded performance This commit updates the ugni btl to make use of multiple device contexts to improve the multi-threaded RMA performance. This commit contains the following: - Cleanup the endpoint structure by removing unnecessary field. The structure now also contains all the fields originally handled by the common/ugni endpoint. - Clean up the fragment allocation code to remove the need to initialize the my_list member of the fragment structure. This member is not initialized by the free list initializer function. - Remove the (now unused) common/ugni component. btl/ugni no longer needs the component. common/ugni was originally split out of btl/ugni to support bcol/ugni. As that component exists there is no reason to keep this component. - Create wrappers for the ugni functionality required by btl/ugni. This was done to ease supporting multiple device contexts. The wrappers are thread safe and currently use a spin lock instead of a mutex. This produces better performance when using multiple threads spread over multiple cores. In the future this lock may be replaced by another serialization mechanism. The wrappers are located in a new file: btl_ugni_device.h. - Remove unnecessary device locking from serial parts of the ugni btl. This includes the first add-procs and module finalize. - Clean up fragment wait list code by moving enqueue into common function. - Expose the communication domain flags as an MCA variable. The defaults have been updated to reflect the recommended setting for knl and haswell. - Avoid allocating fragments for communication with already overloaded peers. - Allocate RDMA endpoints dynamically. This is needed to support spreading RMA operations across multiple contexts. - Add support for spreading RMA communication over multiple ugni device contexts. This should greatly improve the threading performance when communicating with multiple peers. 
By default the number of virtual devices depends on 1) whether opal_using_threads() is set, 2) how many local processes are in the job, and 3) how many bits are available in the pid. The last is used to ensure that each CDM is created with a unique id. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:46:06 -06:00
Nathan Hjelm	12bf38a25c	btl/ugni: add MPI_T performance variables for ugni counters This commit exposes ugni statistics for use with MPI_T. There is no overhead to providing these counters. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-13 14:42:58 -06:00
Ralph Castain	c6bc3ccb76	Sync to latest PMIx master and PMIx reference server Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-11 12:50:38 -08:00
Nathan Hjelm	3caeda21dc	memory/patcher: do not hook madvise It is not possible to hook madvise at this time due to a deadlock when using glibc. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-03-07 16:26:53 -07:00
Joshua Ladd	e2ba60b778	Merge pull request #3111 from jladd-mlnx/topic/cx5-device-param Adding latest ConnectX-5 adapter vendor part id to OpenIB device params.	2017-03-07 13:55:46 -05:00
Nathan Hjelm	15ea9c5524	Merge pull request #3013 from hjelmn/rcache_lifo rcache/base: do not free memory with the vma lock held	2017-03-07 09:11:04 -07:00
Jeff Squyres	c2adf359cf	Merge pull request #3083 from ggouaillardet/topic/hwloc_v15 hwloc: add support for hwloc v1.5	2017-03-07 10:01:24 -05:00
Joshua Ladd	b28647857f	Adding latest ConnectX-5 adapter vendor part id to OpenIB device params. Signed-off-by: Joshua Ladd <jladd.mlnx@gmail.com>	2017-03-07 00:19:54 +02:00
Ralph Castain	aca7091114	Fix some minor compatibility issues by ensuring job-level data gets stored against wildcard rank in the cray, s1, and s2 components, and that the ext1 component translates all wildcard rank requests into the peer's rank since v1.x of PMIx doesn't understand wildcard ranks Closes #3101 Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-05 10:30:59 -08:00
Ralph Castain	1de72ff023	Silence an unnecessary error log Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-02 17:18:34 -08:00
Gilles Gouaillardet	7e01be60d9	hwloc: add support for hwloc v1.5 hwloc v1.5 does not support HWLOC_OBJ_OSDEV_COPROC nor hwloc_topology_dup(), so for this version : - do not search for coprocessors - do not try hwloc_topology_dup(), note this is not used anywhere in the code base Thanks Jeff for helping with the wording Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-03-03 09:39:24 +09:00
Ralph Castain	83199979ba	Remove the stale opal/sec framework Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-03-02 15:41:56 -08:00
Jeff Squyres	5b484c91f4	btl/tcp: use show_help to print the dropped-TCP warning Make the message more friendly / more detailed, and de-duplicate it (just in case it happens a lot). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-03-01 16:31:29 -08:00
George Bosilca	b0f8d2c460	Never free the statically allocated buffer. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-03-01 13:21:03 -05:00
George Bosilca	ec4a235e6a	Allow a TCP proc release during the create. This is mostly for error cases, where we need to release the newly created proc. Currently the code deadlocks because the endpoint lock is held at the release and the lock is not recursive. Also added some code to print the IP addresses that don't match during the TCP connection step. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2017-03-01 13:17:54 -05:00
Jeff Squyres	d5266aba90	Merge pull request #2955 from jsquyres/pr/hwloc-external-fixes Fix --with-hwloc=external	2017-02-28 14:57:07 -05:00
Josh Hursey	0006f0d7c5	Merge pull request #2773 from jjhursey/topic/hook-fwk Add a 'hook' framework	2017-02-28 12:29:50 -06:00
Jeff Squyres	fec519a793	hwloc: rename opal/mca/hwloc/hwloc.h -> hwloc-internal.h Per a prior commit, the presence of "hwloc.h" can cause ambiguity when using --with-hwloc=external (i.e., whether to include opal/mca/hwloc/hwloc.h or whether to include the system-installed hwloc.h). This commit: 1. Renames opal/mca/hwloc/hwloc.h to hwloc-internal.h. 2. Adds opal/mca/hwloc/autogen.options to tell autogen.pl to expect to find hwloc-internal.h (instead of hwloc.h) in opal/mca/hwloc. 3. s@opal/mca/hwloc/hwloc.h@opal/mca/hwloc/hwloc-internal.h@g in the rest of the code base. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-02-28 07:48:42 -08:00
Joshua Hursey	c10bbfded6	ompi/hook: Add the hook/license framework * Include a 'demo' component that shows some of the features. * Currently has hooks for: - MPI_Initialized - top, bottom - MPI_Init_thread - top, bottom - MPI_Finalized - top, bottom - MPI_Init - top (pre-opal_init), top (post-opal_init), error, bottom - MPI_Finalize - top, bottom * Other places in ompi can 'register' to hook into any one of these places by passing back a component structure filled with function pointers. * Add a `MCA_BASE_COMPONENT_FLAG_REQUIRED` flag to the MCA structure that is checked by the `hook` framework. If a required, static component has been excluded then the `hook` framework will fail to initialize. - See note in `opal/mca/mca.h` as to why this is checked in the `hook` framework and not in `opal/mca/base/mca_base_component_find.c` Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2017-02-27 12:05:53 -05:00
Gilles Gouaillardet	af0b5cffb4	asm: rename the AMD64 into X86_64 in this context, AMD64 really means amd64 or em64t, so let's rename this into X86_64 in order to avoid any confusion Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-27 15:10:50 +09:00
Jeff Squyres	d7dd4d769e	openmpi-mca-params.conf: Fix comment Make sure to specify "--level 9" to ompi_info to see all MCA params. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2017-02-24 07:09:06 -08:00
Clement Foyer	f371cc0a43	Fix minor typo Return value in comment about opal_list_item_compare_fn_t typedef when a < b is indicated to be 11 instead of -1. Signed-off-by: Clement Foyer <clement.foyer@inria.fr>	2017-02-23 16:10:32 +01:00
Ralph Castain	e86a0dbf39	Update to PMIx master to include dlopen fixes and addition of libltdl support Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-22 11:54:33 -08:00
Nathan Hjelm	60ad9d1817	rcache/base: do not free memory with the vma lock held This commit makes the vma tree garbage collection list a lifo. This way we can avoid having to hold any lock when releasing vmas. In theory this should finally fix the hold-and-wait deadlock detailed in #1654. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2017-02-21 21:04:46 -07:00
Ralph Castain	8cffdcf127	Ensure that the pmix headers and lib get installed when --with-devel-headers is given so that PMIx applications can be built and executed against the "embedded" PMIx version Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-21 13:46:46 -08:00
Gilles Gouaillardet	4184c01be5	Merge pull request #2393 from bosilca/topic/no_predefined_ddt_refcount Don't refcount the predefined datatypes.	2017-02-21 09:38:11 +09:00
Gilles Gouaillardet	bb2481a84b	pmix2x: synchronize to the latest PMIx master pmix/master@f57d9b2953 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2017-02-20 10:45:17 +09:00
Ralph Castain	f49118eaab	Fix some pmix configuration code Remove stale file reference that caused a check to always fail. Update psm2 function check to new libs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-16 10:54:47 -08:00
Howard Pritchard	b272f87926	Merge pull request #2968 from hjelmn/pmix_cray pmix/cray: performance improvements and cleanup	2017-02-16 11:41:59 -07:00
Ralph Castain	201f8571ca	Ensure we retain the peer object until we are done with it, then detect that the socket has closed due to a lost connection and cleanly release the message event Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-15 18:30:55 -08:00
Ralph Castain	223495325d	Fix binding policy bug and support pe=1 modifier Allow someone to specify the "pe=N" modifier to a mapping policy when N=1. This equates to just "bind-to core", but helps people who use a script to set the PE policy. Fix a bug where setting the binding policy left a lingering "if-supported" flag that shouldn't be there. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-15 14:55:17 -08:00
Ralph Castain	9cd7349d7c	Instead of completely free'ing the event base, pause the PMIx progress thread before tearing down the infrastructure, and then release the event base at the end of the procedure. This allows any infrastructure objects holding events to delete them prior to free'ing the event base. Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-15 05:02:43 -08:00
Ralph Castain	f7fe2f7189	Merge pull request #2977 from rhc54/topic/spawn Fix comm_spawn by registering nspace info only when needed	2017-02-15 04:31:54 -08:00
Ralph Castain	68b53e2179	Fix comm_spawn by registering nspace info only when needed - either when we have local procs, or when job-level info is required by connecting jobs Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-14 19:47:56 -08:00
Ralph Castain	404fe327be	Merge pull request #2973 from rhc54/topic/cleanups Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base.	2017-02-14 17:38:18 -08:00
Ralph Castain	0c8609ca16	Update to newest PMIx master (includes configuration cleanups). Silence trivial Coverity warning in hwloc base. Cleanup a race condition segfault during finalize by ensuring the PMIx progress thread is stopped prior to starting to tear down the messaging components Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2017-02-14 15:14:00 -08:00
Nathan Hjelm	8562b87ad3	Merge pull request #2967 from hjelmn/auto_bool mca/base: add new base enumerator (auto_bool)	2017-02-14 12:25:56 -07:00
Nathan Hjelm	5683e7836f	Merge pull request #2965 from hjelmn/deprecated_fix mca/base: fix deprecated variable help message	2017-02-14 12:22:11 -07:00

4653 Commits