openmpi

Author	SHA1	Message	Date
Jeff Squyres	b7e937fea5	Merge pull request #1778 from thananon/usnic_thread_safe Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-14 18:43:04 -04:00
Jeff Squyres	5071602c59	PSM/PSM2: Disable signal handler hijacking by default Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some subsequent phone calls and off-issue email discussions), the PSM library is hijacking signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor for this library will hijack various signal handlers for the purpose of invoking its own error reporting mechanisms. This may be a bit surprising, but is not a problem, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked). This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used. Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems in the unloading of its signal handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath). This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries). This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set): `IPATH_NO_BACKTRACE=1` and `HFI_NO_BACKTRACE=1`. If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-14 11:45:23 -07:00
Thananon Patinyasakdikul	ee85204c12	Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-13 13:47:06 -07:00
Nathan Hjelm	253c91972e	arm64: add atomic swap function This commit adds the opal_atomic_swap_32 and opal_atomic_swap_64 functions. This should improve the performance of btl/vader. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-11 09:46:29 -06:00
Nathan Hjelm	109389dce2	Merge pull request #1634 from hjelmn/cma cma: add support for MIPS and ARM	2016-06-11 09:20:28 -06:00
Ralph Castain	d58da99dbc	Shift to memcpy to avoid Solaris issues	2016-06-09 12:07:17 -07:00
Gilles Gouaillardet	1f651d17c1	opal/util/ethtool: fix (infamous) strncpy usage The infamous strncpy does not NULL-terminate the destination when the string is truncated, so do it ourselves. Fixes CID 1362576	2016-06-09 09:54:50 +09:00
Ralph Castain	8fa935534b	Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)	2016-06-08 10:12:43 -07:00
Nathan Hjelm	f8957f24af	Merge pull request #1768 from hjelmn/cq_fix btl/openib: fix cq resize calculation	2016-06-07 21:34:36 -06:00
Nathan Hjelm	dd519c55b1	btl/openib: fix cq resize calculation Before dynamic add_procs, openib_btl_size_queues was called exactly once for non-dynamic jobs. Now the function is called on each new connection, so the calculation was wrong. Rewrote the function to correctly calculate the CQ size and only attempt to adjust the CQ if the requested size has changed. This fixes a bug when using the openib btl on psm2 hardware that is caused by the time needed to resize a CQ. The overhead was causing udcm to time out and fail. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-07 16:05:56 -06:00
Nathan Hjelm	e082ed752a	opal/progress: fix warnings This commit fixes several warnings introduced by open-mpi/ompi@fc26d9c69f. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-06 22:18:24 -06:00
Nathan Hjelm	4a2bd83302	opal/cma: improve Linux CMA detection This commit improves the CMA detection when the installed glibc doesn't have support for CMA. In this case we need to verify that the syscall numbers in opal/include/opal/sys/cma.h are valid for the architecture. This verification is done by attempting to use CMA while including the internal header. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-05 22:29:07 -06:00
Gilles Gouaillardet	b707d138fe	pmix114/pmix1_client: fix misc memory leaks Fixes CID 1325146-1325149	2016-06-06 09:33:35 +09:00
Nathan Hjelm	0084ad0d1b	opal: add armv8 support This commit adds assembly support for aarch64. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-03 10:32:21 -06:00
Nathan Hjelm	6169d03ea3	btl: adjust values of new atomic flags Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 19:21:34 -06:00
Nathan Hjelm	9f43b23725	Merge pull request #1710 from hjelmn/ugni_atomics Additional ugni atomics	2016-06-02 18:25:49 -06:00
George Bosilca	e1c6b0e4a7	Some compilers are more than picky.	2016-06-03 09:04:34 +09:00
Nathan Hjelm	d9fc855955	Merge pull request #1743 from hjelmn/gcc_atomics_fix atomic/gcc: add check for 128-bit CAS being lock-free	2016-06-02 16:55:31 -06:00
Nathan Hjelm	d86e41ea13	atomic/gcc: add check for 128-bit CAS being lock-free Compiler implementations are free to include support for atomics that use locks. Unfortunately, lock-free and lock-based atomics do not mix. Older versions of llvm on OS X use locks to provide __atomic_compare_exchange on 128-bit values but are lock-free on 64-bit values. This screws up our lifo implementation, which mixes 64-bit and 128-bit atomics on the same values to improve performance. This commit adds a configure-time check for whether 128-bit atomics are lock-free. If they are not, then the 128-bit __atomic CAS is disabled and we check for the __sync version as a fallback. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-02 15:59:05 -06:00
Nathan Hjelm	5aab4b2d51	Merge pull request #1662 from ggouaillardet/topic/amd64_atomic amd64/atomic: silence warnings	2016-06-02 14:10:20 -06:00
George Bosilca	87b1d17e7e	Remove warnings. clang 7.0 with the picky option on is extremely verbose, and complains about almost everything. Trying to make it happy, at least regarding the datatype engine.	2016-06-03 00:56:24 +09:00
rhc54	483b9c370a	Merge pull request #1741 from rhc54/topic/pmix114 Update to 1.1.4rc3	2016-06-02 06:57:37 -07:00
Nathan Hjelm	fc26d9c69f	Merge pull request #1734 from hjelmn/progress_threading opal/progress: make progress function registration mt safe	2016-06-02 06:35:59 -06:00
Ralph Castain	ecea1e3bb5	Update to 1.1.4rc3	2016-06-01 20:56:07 -07:00
Nathan Hjelm	2fad3b9bc6	opal/progress: make progress function registration mt safe This commit fixes a bug in opal progress registration that can cause crashes when a progress function is registered while another thread is in opal_progress(). Before this commit realloc is used to allocate more space for progress functions but it is possible for a thread in opal_progress() to try to read from the array that is freed by realloc before the array is re-assigned when realloc returns. To prevent this race use malloc + memcpy to fill the new array and atomically swap out the old and new array pointers. Per suggestion we now allocate a default of 8 slots for callbacks and double the current number when we run out of space. This commit also fixes leaking the callbacks_lp array. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 20:57:19 -06:00
George Bosilca	d9fb59bea5	Update the synchronization primitive Add comments and make sure we correctly return the status of the synchronization primitive, especially if it was completed with error.	2016-06-02 11:53:56 +09:00
Nathan Hjelm	f33bbfd381	atomic: add support for __atomic builtins (#1735) * atomic: add support for __atomic builtins This commit adds support for the gcc __atomic builtins. The __sync builtins are deprecated and have been replaced by these atomics. In addition, the new atomics support atomic exchange which was not supported by __sync. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * atomic: add support for transactional memory This commit adds support for using transactional memory when using opal atomic locks. This feature is enabled if the __HLE__ feature is available and the gcc builtin atomics are in use. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 21:23:47 -04:00
rhc54	b85a5e62ab	Merge pull request #1739 from rhc54/topic/pmix Split the pmix external component into one for the 1.1.4 release, and…	2016-06-01 16:24:44 -07:00
Ralph Castain	12ecf972af	Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program. NOTE: the changes for the 2.0 series are not yet in the PMIx master.	2016-06-01 14:15:24 -07:00
Nathan Hjelm	ceb2912838	Merge pull request #1736 from hjelmn/ugni_fixes ugni BTL fixes	2016-06-01 14:59:55 -06:00
Jeff Squyres	d175fd692d	README.ompi: track patches added to hwloc Track post-v1.11.3-release patches applied to the hwloc copy embedded in Open MPI. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-01 07:17:05 -07:00
Jeff Squyres	3867bd3640	hwloc.m4: only check for valgrind in non-embedded mode This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the case where the outer project has its own check for <valgrind/valgrind.h>, but also supplements CPPFLAGS (to find Valgrind's header files) before doing that check. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Ideally, we would tell OMPI to disable autoconf's caching of our valgrind check result so that its check gets the right result after adding CPPFLAGS. Not sure if we can do that. For now, just disable our Valgrind code in embedded mode. This will keep the x86 backend enabled under Valgrind but it will auto-disable itself when finding identical APIC ids anyway (because CPUID returns same outputs for all PUs). Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Fixes open-mpi/ompi#1732 (cherry picked from commit open-mpi/hwloc@8b44fb1c81)	2016-06-01 06:58:53 -07:00
Gilles Gouaillardet	57978a75d0	Merge pull request #1717 from ggouaillardet/topic/lex_cleanup configury: clean the flex generated .c files	2016-06-01 13:06:21 +09:00
Nathan Hjelm	5d4bcce042	Merge pull request #1700 from shamisp/topic/cma_config CMA: Fixing logic for CMA system call detection	2016-05-31 20:33:48 -06:00
Nathan Hjelm	340152a635	Merge pull request #1720 from shamisp/topic/vader/max_addr VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.	2016-05-31 20:33:28 -06:00
Gilles Gouaillardet	5f565dfec3	configury: clean the flex generated .c files	2016-06-01 11:13:31 +09:00
Nathan Hjelm	bf10d79914	btl/ugni: remove erroneous unlock The endpoint lock was being released twice in mca_btl_ugni_get_ep. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:53 -06:00
Nathan Hjelm	cc96097873	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:09 -06:00
Jeff Squyres	5cfee95ea4	hwloc1113: add missing file to Makefile.am Lack of this file causes a failure when you run autogen.pl on a distribution tarball. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 09:57:50 -07:00
Nathan Hjelm	60519c2b4e	cma: add support for MIPS and ARM Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-30 12:13:20 -06:00
George Bosilca	d2abff583e	Fix race condition during BTL TCP tear-down. bot:label:bug bot:assign:@hjelmn	2016-05-30 10:47:14 -05:00
Jeff Squyres	e126d2cd18	Merge pull request #1584 from bgoglin/master Update hwloc to v1.11.3	2016-05-28 11:01:54 -04:00
Ralph Castain	55923eacd3	Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize) Rename temp vars in .m4 to avoid conflict with Travis	2016-05-27 08:06:31 -07:00
Nathan Hjelm	28dfa36a3f	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	c19426ac1b	btl/ugni: add support for additional atomic operations This commit adds support for Cray Aries atomic operations. This includes 32-bit and floating point support. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	23fe19a956	btl: add support for more atomics This commit adds support for more atomic operations and types. The operations added are logical and, logical or, logical xor, swap, min, and max. New types are 32-bit int by using the MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both flags. Floating point numbers are supported by packing the number in as an int64_t or int32_t. We will update the btl interface in the future to make this less confusing. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	d25b846c01	Merge pull request #1704 from hpcraink/pr/configure_framework Fix configure for FreePGI on OSX	2016-05-26 17:01:08 -06:00
Nathan Hjelm	8c9292d5d1	Merge pull request #1721 from hjelmn/xrc_fix btl/openib: fix XRC WQE calculation	2016-05-26 17:00:31 -06:00
Nathan Hjelm	56bdcd0888	btl/openib: fix XRC WQE calculation Before dynamic add_procs support was committed to master, we called add_procs with every proc in the job. The XRC code in the openib btl was taking advantage of this and setting the number of work queue entries (WQE) based on all the procs on a remote node. Since that is no longer the case, we cannot simply increment the sd_wqe field on the queue pair. To fix the issue, a new field has been added to the xrc queue pair structure to keep track of how many wqes there are in total on the queue pair. If a new endpoint is added that increases the number of wqes and the xrc queue pair is already connected, the code will attempt to modify the number of wqes on the queue pair. A failure is ignored because all that will happen is the number of active send work requests on an XRC queue pair will be more limited. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-26 15:58:31 -06:00
Aurelien Bouteiller	49bd28d0ac	Merge pull request #1714 from hjelmn/scif_exclusivity btl/scif: reduce default exclusivity	2016-05-26 17:53:11 -04:00


4193 Commits
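Commit 5071602c59 sets `IPATH_NO_BACKTRACE=1` and `HFI_NO_BACKTRACE=1` only when the user has not already set them. A minimal sketch of that "set unless already set" rule using POSIX `setenv()`, whose third argument `0` leaves an existing value untouched. The helper name `disable_psm_backtrace` is hypothetical; this is not Open MPI's actual code.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper (not Open MPI's actual code): ask PSM/PSM2 not to
 * install their own signal handlers. With overwrite == 0, setenv() keeps
 * any value the user already exported, so user preferences win. */
int disable_psm_backtrace(void)
{
    /* PSM / Intel TrueScale */
    if (setenv("IPATH_NO_BACKTRACE", "1", 0) != 0) {
        return -1;
    }
    /* PSM2 / Intel OmniPath */
    if (setenv("HFI_NO_BACKTRACE", "1", 0) != 0) {
        return -1;
    }
    return 0;
}
```

Because the handlers are installed by the library constructor, the variables have to be in the environment before the PSM/PSM2 library is loaded.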
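Commit 1f651d17c1 works around the fact that `strncpy()` does not terminate the destination when the source does not fit. A minimal sketch of the usual fix, which copies at most `size - 1` bytes and always writes the terminator into the last byte. The name `copy_ifname` is hypothetical and is not the code in `opal/util/ethtool`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into a dst buffer of dst_size bytes.
 * strncpy() leaves dst unterminated when strlen(src) >= the count it is
 * given, so the last byte is set explicitly (long names are truncated). */
void copy_ifname(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```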
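Commit 8fa935534b abstracts `strnlen()` for environments such as Solaris 10 that do not provide it. A portable fallback only has to stop at the first NUL or after `maxlen` bytes, whichever comes first. The name `my_strnlen` is hypothetical; the commit message does not show the actual wrapper.

```c
#include <stddef.h>

/* Hypothetical portable fallback for strnlen(): return the number of
 * bytes before the first NUL, never examining more than maxlen bytes. */
size_t my_strnlen(const char *s, size_t maxlen)
{
    size_t len = 0;
    while (len < maxlen && s[len] != '\0') {
        len++;
    }
    return len;
}
```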
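Commit 2fad3b9bc6 replaces `realloc()` in progress-function registration with malloc + memcpy and an atomic swap of the array pointer, starting with 8 slots and doubling. A minimal C11 sketch of that pattern, not the `opal_progress` code: all names are hypothetical, registrations are assumed to be serialized by a lock the caller holds, and retired arrays are simply not freed, because the commit message does not describe how (or when) old arrays are reclaimed.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

typedef int (*progress_cb_t)(void);

/* Hypothetical registry. Readers never lock; registrations are assumed
 * to be serialized by a lock held by the caller. */
static _Atomic(progress_cb_t *) callbacks = NULL;
static atomic_size_t num_callbacks = 0;
static size_t capacity = 0; /* written only by the serialized registrar */

int register_progress(progress_cb_t cb)
{
    size_t n = atomic_load(&num_callbacks);
    progress_cb_t *cur = atomic_load(&callbacks);

    if (n == capacity) {
        /* Grow into a fresh array instead of realloc(): a reader may still
         * be walking cur, so cur must stay valid. Start at 8, then double. */
        size_t new_cap = capacity ? 2 * capacity : 8;
        progress_cb_t *tmp = malloc(new_cap * sizeof *tmp);
        if (tmp == NULL) {
            return -1;
        }
        if (n > 0) {
            memcpy(tmp, cur, n * sizeof *tmp);
        }
        tmp[n] = cb;
        /* Publish the complete new array before the larger count. The old
         * array is retired, not freed: safe reclamation is out of scope. */
        progress_cb_t *old = atomic_exchange(&callbacks, tmp);
        (void) old;
        capacity = new_cap;
    } else {
        cur[n] = cb; /* slot n stays invisible to readers until count grows */
    }
    atomic_store(&num_callbacks, n + 1);
    return 0;
}

/* Reader side: load the count first, then the array it indexes into. */
int run_progress(void)
{
    size_t n = atomic_load(&num_callbacks);
    progress_cb_t *cbs = atomic_load(&callbacks);
    int events = 0;
    for (size_t i = 0; i < n; i++) {
        events += cbs[i]();
    }
    return events;
}
```

Ordering is what makes this safe: a reader that observes the new count is guaranteed to see an array at least that long, and a reader holding a stale count only indexes slots that every published array already contains.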
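Commit f33bbfd381 moves to the GCC `__atomic` builtins, noting that, unlike `__sync`, they provide atomic exchange. A minimal sketch of a 64-bit swap and a strong compare-and-swap built on the GCC/Clang builtins `__atomic_exchange_n` and `__atomic_compare_exchange_n`. The wrapper names are hypothetical and are not Open MPI's `opal_atomic_*` functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrappers over the GCC/Clang __atomic builtins. */
static inline int64_t swap_64(int64_t *addr, int64_t newval)
{
    /* Atomically store newval and return the value it replaced. */
    return __atomic_exchange_n(addr, newval, __ATOMIC_SEQ_CST);
}

static inline bool cas_64(int64_t *addr, int64_t *expected, int64_t desired)
{
    /* Strong CAS: on failure, *expected receives the value found at addr. */
    return __atomic_compare_exchange_n(addr, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```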
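Commit 23fe19a956 passes floating-point operands to the BTL atomic interface packed into an `int64_t` (64-bit float) or `int32_t` (32-bit float). One well-defined way to do such packing in C is to copy the bit pattern with `memcpy`, which avoids the aliasing problems of pointer casts. The helper names are hypothetical and are not part of the BTL interface.

```c
#include <stdint.h>
#include <string.h>

/* The bit copies below assume these sizes match. */
_Static_assert(sizeof(double) == sizeof(int64_t), "double must be 64 bits");
_Static_assert(sizeof(float) == sizeof(int32_t), "float must be 32 bits");

/* Hypothetical helpers: carry a double/float through an integer-typed
 * operand by copying its bit pattern (no numeric conversion happens). */
int64_t pack_double(double d)
{
    int64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

double unpack_double(int64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

int32_t pack_float(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

float unpack_float(int32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```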