openmpi

Author	SHA1	Message	Date
Artem Polyakov	8d011ea403	Fix Mellanox copyright.	2016-06-26 21:01:19 -06:00
Nathan Hjelm	fb455f0802	opal/sync: fix race condition This commit fixes a race condition discovered by @artpol84. The race happens when a signalling thread decrements the sync count to 0 then goes to sleep. If the waiting thread runs and detects the count == 0 before going to sleep on the condition variable it will destroy the condition variable while the signalling thread is potentially still processing the completion. The fix is to add a non-atomic member to the sync structure that indicates another process is handling completion. Since the member will only be set to false by the initiating thread and the completing thread the variable does not need to be protected. When destroying a condition variable the waiting thread needs to wait until the signalling thread is finished. Thanks to @artpol84 for tracking this down. Fixes #1813 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-26 20:14:01 -06:00
Nathan Hjelm	2992d6d238	Merge pull request #1808 from abjoshi-brcm/timer_arm64 arm64: add timer support	2016-06-23 07:10:56 -06:00
Abhishek Joshi	f06f7eb3e6	arm64: add timer support Signed-off-by: Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ramesh@broadcom.com>	2016-06-23 11:01:00 +00:00
Ralph Castain	08b1438f15	Add missing PMIx range value so OPAL and PMIx align again	2016-06-22 22:03:25 -07:00
Nathan Hjelm	55d1933a89	opal/sync: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 15:03:21 -06:00
Nathan Hjelm	e4f920f6f9	opal/progress: improve performance when there are no LP callbacks This commit adds another check to the low-priority callback conditional that short-circuits the atomic-add if there are no low-priority callbacks. This should improve performance in the common case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Nathan Hjelm	143a93f379	opal/sync: remove usage of OPAL_ENABLE_MULTI_THREADS The OPAL_ENABLE_MULTI_THREADS macro is always defined as 1. This was causing us to always use the multi-thread path for synchronization objects. The code has been updated to use the opal_using_threads() function. When MPI_THREAD_MULTIPLE support is disabled at build time (2.x only) this function is a macro evaluating to false so the compiler will optimize out the MT-path in this case. The OPAL_ATOMIC_ADD_32 macro has been removed and replaced by the existing OPAL_THREAD_ADD32 macro. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Gilles Gouaillardet	bf133c401e	pmix2x: fix a typo in dereg_event_hdlr() This bug has been fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c but it was not fixed in open-mpi/ompi	2016-06-22 13:45:29 +09:00
Jeff Squyres	af614afedf	Merge pull request #1800 from thananon/common_sym_fix Fixed common symbol error in btl/usnic.	2016-06-21 20:11:52 -04:00
Ralph Castain	441739b5a4	Cleanup a lagging message that generates an annoying (but seemingly harmless) warning	2016-06-20 12:23:27 -07:00
Thananon Patinyasakdikul	afe07cd5d5	Fixed common symbol in btl/usnic - This commit fixes the accidental common symbol btl_usnic_lock - It also moves the btl_usnic_lock declaration to btl_usnic.h	2016-06-20 10:05:44 -07:00
Howard Pritchard	1bed9fdb59	Merge pull request #1799 from hppritcha/topic/help_aries_with_knl common/ugni: help out knl with aries	2016-06-20 08:09:24 -06:00
Ralph Castain	0ba02821e6	Add requested key and job-level info	2016-06-19 18:22:31 -07:00
Ralph Castain	0a29f5cb77	Sigh - missed two typos	2016-06-18 20:57:53 -07:00
Ralph Castain	dd38cf1fed	Fix typo	2016-06-18 20:56:43 -07:00
Howard Pritchard	8b53487977	common/ugni: help out knl with aries The way the gni btl is currently coded, it will run completely out of gas on KNL at 123 processes/node. Since there are bound to be those who try to run a MPI process/hyperthread on KNL nodes, the fma sharing mode needs to be requested. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-06-18 15:09:05 -05:00
Ralph Castain	dde69e1be2	Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761. Fixes #1792	2016-06-18 12:28:46 -07:00
Jeff Squyres	7a8d7fb948	openib: fix compiler warnings Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-18 07:15:11 -07:00
Jeff Squyres	c332ee5884	Merge pull request #1784 from thananon/fix_usnic_thread Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-17 11:15:14 -04:00
Nathan Hjelm	f59c2fce6b	Merge pull request #1786 from hjelmn/32_fix opal/progress: use 32-bit atomics for call counter	2016-06-17 08:54:41 -06:00
Nathan Hjelm	2e4141f20a	Merge pull request #1787 from hjelmn/asm_fix opal/asm: fix syntax of timer code for ia32	2016-06-17 08:50:57 -06:00
Ralph Castain	044c561cba	Roll to latest PMIx master	2016-06-16 17:30:30 -07:00
Nathan Hjelm	9c709966f7	opal/asm: fix syntax of timer code for ia32 Thanks to Paul Hargrove for pointing this out. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 16:55:01 -06:00
rhc54	702a982271	Merge pull request #1767 from rhc54/topic/pmix2 Enable the PMIx event notification capability	2016-06-16 15:27:43 -07:00
Nathan Hjelm	7349ddc937	patcher/overwrite: use OPAL_ASSEMBLY_ARCH to determine architecture Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 10:00:00 -06:00
Nathan Hjelm	dbd8369485	opal/progress: use 32-bit atomics for call counter This commit fixes a compile error on 32-bit platforms. The low-priority call counter was always using 64-bit atomics which will not work if 64-bit atomic math is not available. Updated to use 32-bit instead. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 09:01:19 -06:00
Thananon Patinyasakdikul	7bd18214a7	Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-15 07:42:55 -07:00
Jeff Squyres	b7e937fea5	Merge pull request #1778 from thananon/usnic_thread_safe Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-14 18:43:04 -04:00
Ralph Castain	5d330d5220	Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler. Add PMIx 2.0 Remove PMIx 1.1.4 Cleanup copying of component Add missing file Touchup a typo in the Makefile.am Update the pmix ext114 component Minor cleanups and resync to master Update to latest PMIx 2.x Update to the PMIx event notification branch latest changes	2016-06-14 13:08:41 -07:00
Jeff Squyres	5071602c59	PSM/PSM2: Disable signal handler hijacking by default Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some subsequent phone calls and off-issue email discussions), the PSM library is hijacking signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor for this library will hijack various signal handlers for the purpose of invoking its own error reporting mechanisms. This may be a bit surprising, but is not a problem, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked). This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used. Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems in the unloading of its signal handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath). This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries). This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set): * IPATH_NO_BACKTRACE=1 * HFI_NO_BACKTRACE=1 If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-14 11:45:23 -07:00
Thananon Patinyasakdikul	ee85204c12	Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-13 13:47:06 -07:00
Nathan Hjelm	253c91972e	arm64: add atomic swap function This commit adds the opal_atomic_swap_32 and opal_atomic_swap_64 functions. This should improve the performance of btl/vader. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-11 09:46:29 -06:00
Nathan Hjelm	109389dce2	Merge pull request #1634 from hjelmn/cma cma: add support for MIPS and ARM	2016-06-11 09:20:28 -06:00
Ralph Castain	d58da99dbc	Shift to memcpy to avoid Solaris issues	2016-06-09 12:07:17 -07:00
Gilles Gouaillardet	1f651d17c1	opal/util/ethtool: fix (infamous) strncpy usage The infamous strncpy does not NUL-terminate the destination when the string is truncated, so do it ourselves. Fix CID 1362576	2016-06-09 09:54:50 +09:00
Ralph Castain	8fa935534b	Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)	2016-06-08 10:12:43 -07:00
Nathan Hjelm	f8957f24af	Merge pull request #1768 from hjelmn/cq_fix btl/openib: fix cq resize calculation	2016-06-07 21:34:36 -06:00
Nathan Hjelm	dd519c55b1	btl/openib: fix cq resize calculation Before dynamic add_procs the openib_btl_size_queues was called exactly once for non-dynamic jobs. Now the function is called on each new connection so the calculation was wrong. Re-wrote the function to correctly calculate the CQ size and only attempt to adjust the CQ if the requested size has changed. This fixes a bug when using the openib btl on psm2 hardware that is caused by the time needed to resize a CQ. The overhead was causing udcm to timeout and fail. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-07 16:05:56 -06:00
Nathan Hjelm	e082ed752a	opal/progress: fix warnings This commit fixes several warnings introduced by open-mpi/ompi@fc26d9c69f. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-06 22:18:24 -06:00
Nathan Hjelm	4a2bd83302	opal/cma: improve Linux CMA detection This commit improves the CMA detection when the installed glibc doesn't have support for CMA. In this case we need to verify that the syscall numbers in opal/include/opal/sys/cma.h are valid for the architecture. This verification is done by attempting to use CMA while including the internal header. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-05 22:29:07 -06:00
Gilles Gouaillardet	b707d138fe	pmix114/pmix1_client: fix misc memory leaks Fixes CID 1325146-1325149	2016-06-06 09:33:35 +09:00
Nathan Hjelm	0084ad0d1b	opal: add armv8 support This commit adds assembly support for aarch64. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-03 10:32:21 -06:00
Nathan Hjelm	6169d03ea3	btl: adjust values of new atomic flags Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 19:21:34 -06:00
Nathan Hjelm	9f43b23725	Merge pull request #1710 from hjelmn/ugni_atomics Additional ugni atomics	2016-06-02 18:25:49 -06:00
George Bosilca	e1c6b0e4a7	Some compilers are more than picky.	2016-06-03 09:04:34 +09:00
Nathan Hjelm	d9fc855955	Merge pull request #1743 from hjelmn/gcc_atomics_fix atomic/gcc: add check for 128-bit CAS being lock-free	2016-06-02 16:55:31 -06:00
Nathan Hjelm	d86e41ea13	atomic/gcc: add check for 128-bit CAS being lock-free Compiler implementations are free to include support for atomics that use locks. Unfortunately lock-free and lock atomics do not mix. Older versions of llvm on OS X use locks to provide __atomic_compare_exchange on 128-bit values but are lock-free on 64-bit values. This screws up our lifo implementation which mixes 64-bit and 128-bit atomics on the same values to improve performance. This commit adds a configure-time check if 128-bit atomics are lock free. If they are not then the 128-bit __atomic CAS is disabled and we check for the __sync version as a fallback. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-02 15:59:05 -06:00
Nathan Hjelm	5aab4b2d51	Merge pull request #1662 from ggouaillardet/topic/amd64_atomic amd64/atomic: silence warnings	2016-06-02 14:10:20 -06:00
George Bosilca	87b1d17e7e	Remove warnings. clang 7.0 with the picky option on is extremely verbose, and complains about almost everything. Trying to make him happy, at least regarding the datatype engine.	2016-06-03 00:56:24 +09:00
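
Commit 1f651d17c1 in the log above describes a strncpy call that left the destination unterminated on truncation. The following is a minimal sketch of that general pattern, not the actual opal/util/ethtool code; the helper name `copy_truncated` and its signature are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Open MPI): copy src into dst and
 * guarantee a terminating NUL even when src does not fit. */
static void copy_truncated(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* strncpy copies at most dst_size - 1 bytes here. If src is that
     * long or longer, it writes no terminator of its own. */
    strncpy(dst, src, dst_size - 1);

    /* Terminate explicitly so dst is always a valid C string. */
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte destination, copying a longer interface name keeps only its first three characters followed by the terminator, and shorter names are copied unchanged.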

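Commit 5071602c59 in the log above says the two PSM/PSM2 variables are set only when the user has not already set them. One way to get that behavior is the POSIX `setenv` call with its overwrite argument set to 0. This is a sketch under that assumption; the log does not show how Open MPI implements it, and the helper name `set_default_env` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Open MPI): give name a default value
 * while leaving any value the user already exported untouched.
 * Returns 0 on success, -1 on failure, as setenv does. */
static int set_default_env(const char *name, const char *value)
{
    assert(name != NULL && value != NULL);

    /* overwrite == 0 tells setenv to keep an existing value. */
    return setenv(name, value, 0);
}
```

Calling `set_default_env("IPATH_NO_BACKTRACE", "1")` and `set_default_env("HFI_NO_BACKTRACE", "1")` early in startup would apply the defaults named in the commit message while honoring a user's own settings.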

4222 Commits