openmpi

Author	SHA1	Message	Date
rhc54	60f789dca1	Merge pull request #1948 from rhc54/topic/pmixtool Update to include extended tool support, new datatypes	2016-08-09 16:17:28 -07:00
Nathan Hjelm	19be439998	Merge pull request #1949 from hjelmn/ugni_fix btl/ugni: fix another connection race	2016-08-09 08:32:40 -06:00
Nathan Hjelm	38f18eed22	Merge pull request #1941 from ggouaillardet/topic/memory_patcher_configury configury: make memory/patcher symbol detection more robust	2016-08-09 07:06:38 -06:00
Gilles Gouaillardet	13009aa290	opal/alfg: have opal_random() wrapper always return a positive int	2016-08-09 17:12:30 +09:00
Gilles Gouaillardet	6f6b3ac68a	configury: standardize memory/patcher symbol detection and make it more robust By default, Sun compilers optimize out the original test, and hence fail to detect that a symbol is missing.	2016-08-09 09:35:52 +09:00
Nathan Hjelm	adb668209b	btl/ugni: fix another connection race This commit fixes a race that can occur when two threads are in the ugni progress function at the same time. This race occurs when one thread calls GNI_PostDataProbeById then goes to sleep then another thread calls GNI_PostDataProbeById then GNI_EpPostDataWaitById before the other thread wakes up. If this happens the first thread will print a warning on GNI_EpPostDataWaitById about no matching post. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-08 15:38:11 -06:00
Ralph Castain	527b5c692a	Update to include extended tool support, new datatypes	2016-08-08 13:39:46 -07:00
Todd Kordenbrock	b90da992c8	Merge pull request #1895 from PDeveze/Patchs-on-btl-portals4 btl/portals4: Take into account the limitation of portals4 (max_msg_s…	2016-08-08 15:12:50 -05:00
Nathan Hjelm	5ced037488	Merge pull request #1939 from hjelmn/ugni_fix btl/ugni: protect against re-entry and races in connections	2016-08-08 08:55:30 -06:00
Artem Polyakov	b24ec3e3b9	pmix/s2: fix indentation (only)	2016-08-06 16:31:19 +06:00
Artem Polyakov	2cb923a413	pmix/s1: fix indentation (only)	2016-08-06 16:30:45 +06:00
Artem Polyakov	8aa3ef7799	pmix/s2: fix s2 component data placement Use wildcard for the information related to the job-level data. Fixes s2 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.	2016-08-06 15:49:16 +06:00
Artem Polyakov	81063f1717	pmix/s1: fix s1 component data placement Use wildcard for the information related to the job-level data. Fixes s1 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.	2016-08-06 15:45:46 +06:00
Nathan Hjelm	14b36d4503	btl/ugni: protect against re-entry and races in connections This commit fixes two issues that can occur during a connection: - Re-entry to connection progress from modex lookup. Added an additional endpoint state that will keep the code from re-entering the common endpoint create. - Fixed a race between a process posting a directed datagram through a send and a connection being progressed through opal_progress(). The progress code was not obtaining the endpoint lock before attempting to update the endpoint. To limit the amount of code changed for 2.0.1 this commit makes the endpoint lock recursive. In a future update this may be changed. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-08-04 16:08:01 -06:00
Jeff Squyres	c42d8867e6	Merge pull request #1925 from jsquyres/pr/warnings-fixes hwloc: fix Valgrind warning	2016-08-04 08:48:50 -07:00
Jeff Squyres	36555b7a1d	Merge pull request #1933 from thananon/fix_random Make libevent use internal random	2016-08-04 08:27:56 -07:00
Boris Karasev	9d6a4b3b2d	configury/libevent: fix incorrect drop of OPAL_HAVE_WORKING_EVENTOPS Fixes PR https://github.com/open-mpi/ompi/pull/1687 The code that sets OPAL_HAVE_WORKING_EVENTOPS for internal libevent was executed even if the external libevent component was configured. As a result, libevent progress wasn't called in opal_progress, which, for example, caused ring_c to hang when pml/ob1 was used.	2016-08-04 16:37:37 +06:00
Gilles Gouaillardet	30f98cd9d0	pmix: redefine OPAL_PMIX_ARCH macro Architecture is set by the ompi layer after job startup, so the key cannot have the "pmix" prefix since the optimizations in open-mpi/ompi@01a653d50a; otherwise the architecture cannot be retrieved	2016-08-04 13:31:28 +09:00
Thananon Patinyasakdikul	b3e9dadff2	libevent: use opal_random() instead of rand(3) This commit changes rand(3) and family in libevent to use the internal random function provided in opal to avoid perturbing the user's random seed. Fixes open-mpi/ompi#1877	2016-08-03 09:18:12 -07:00
Howard Pritchard	08266a1a56	mpool/hugepage mntent intro fallout On Cray, PR #1846 introduced a double free situation which led to all kinds of random memory corruption problems. This commit fixes this problem. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-08-02 05:52:31 -05:00
Jeff Squyres	7bea563e02	hwloc: fix Valgrind warning Cherry picked from open-mpi/hwloc@d4565c351e Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-08-01 18:50:40 -07:00
Gilles Gouaillardet	21e7f31dbe	pmix2x: fix unpack sequence in PMIx_Get callback first unpack the nspace (PMIX_STRING) before unpacking the various keys (PMIX_KVAL)	2016-08-01 14:21:22 +09:00
Howard Pritchard	477f6cb6a8	Merge pull request #1846 from ggouaillardet/topic/mntent mpool/hugepage: set mntent API instead of manually parsing /proc/mounts	2016-07-31 20:17:37 -06:00
Gilles Gouaillardet	1778e5b586	atomic/sparcv9: fix a typo in the comment, no code change	2016-08-01 10:34:02 +09:00
Ralph Castain	16fccd4964	Establish a way for ORTE to tell PMIx the base tmpdir to use, and update PMIx to understand such directives	2016-07-29 09:52:36 -07:00
Nathan Hjelm	325c9ba4cc	opal/thread: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-29 07:04:19 -06:00
Nathan Hjelm	1da558407c	Merge pull request #1911 from hjelmn/threads opal/thread: clean up and add additional OPAL_THREAD macros	2016-07-29 06:44:11 -06:00
Gilles Gouaillardet	273e56096b	configury: capture configury command line configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro. it can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or {orte-info,ompi_info,oshmem_info} --all --parseable | grep ^config:cli:	2016-07-29 09:14:09 +09:00
Ralph Castain	cacb582ecd	Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application	2016-07-28 14:09:06 -07:00
Nathan Hjelm	c281bd3c7f	Merge pull request #1908 from hjelmn/udreg_fix rcache/udreg: make reference count thread safe	2016-07-28 09:27:16 -06:00
Nathan Hjelm	aac611237b	opal/thread: clean up and add additional OPAL_THREAD macros This commit expands the OPAL_THREAD macros to include 32- and 64-bit atomic swap. Additionally, macro declarations have been updated to include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the former was used with add and the latter with cmpset. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 09:23:14 -06:00
Nathan Hjelm	a8c3699484	Fix performance regression caused by enabling opal thread support This commit adds opal_using_threads() protection around the atomic operation in OBJ_RETAIN/OBJ_RELEASE. This resolves the performance issues seen when running psm with MPI_THREAD_SINGLE. To avoid issues with header dependencies opal_using_threads() has been moved to a new header (thread_usage.h). The OPAL_THREAD_ADD* and OPAL_THREAD_CMPSET* macros have also been relocated to this header. This commit is cherry-picked off a fix that was submitted for the v1.8 release series but never applied to master. This fixes part of the problem reported by @nysal in #1902. (cherry picked from commit open-mpi/ompi-release@ce91307918) Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-07-28 07:01:27 -06:00
Nathan Hjelm	4658b761e4	rcache/udreg: make reference count thread safe Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-27 13:40:35 -06:00
Nathan Hjelm	1eb4ef438e	Merge pull request #1903 from hjelmn/openib_fixes btl/openib: set send flags only after endpoint is connected	2016-07-27 09:01:49 -06:00
Howard Pritchard	1dc7e9ed8f	Merge pull request #1904 from hppritcha/topic/fix_cray_srun_native_launch pmix/cray: switch to using wildcards for some	2016-07-27 07:12:02 -06:00
Howard Pritchard	b65bbe017f	pmix/cray: switch to using wildcards for some items so that at least srun native launch on cray works again. More issues to fix when using alps. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-07-26 17:07:58 -05:00
Nathan Hjelm	5e13e1ab7d	btl/openib: set send flags only after endpoint is connected The max inline send size on a queue pair is not available until after the endpoint is connected. Before this commit the send flags (including the inline flag) were set before this value was initialized. This commit moves setting the send_flags down to mca_btl_openib_put_internal which is only called after the endpoint is connected. This fixes a bug when using osc/rdma. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-26 16:01:11 -06:00
Gilles Gouaillardet	91ccec342c	btl/openib: remove some dead code remove useless call to opal_mem_hooks_support_level() and the value local variable.	2016-07-22 09:26:33 +09:00
Gilles Gouaillardet	1b3be0ac8c	configury + btl/openib: fix a typo test for existence of struct ibv_exp_device_attr.exp_atomic_cap. That was previously mistyped struct ibv_exp_device_attr.ext_atomic_cap	2016-07-22 09:26:33 +09:00
Ralph Castain	71de03fc67	Cleanup the new naming requirements to ensure that info is correctly retrieved Cleanup permissions Restore singleton operations	2016-07-21 09:46:03 -07:00
Ralph Castain	2b55ee8118	Cleanup Coverity warnings	2016-07-20 20:31:58 -07:00
Ralph Castain	01a653d50a	Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found". Remove stale file reference Restore autogen pass thru pmix Remove generated file	2016-07-20 00:58:19 -07:00
Pascal Deveze	6d6ec66705	btl/portals4: Take into account the limitation of portals4 (max_msg_size)	2016-07-19 15:19:29 +02:00
Nathan Hjelm	03bce91de8	pmix/pmix2x: add missing increment in loop This commit fixes a bug in the pmix2x client code where a loop variable is not correctly incremented. This was leading to hangs and crashes when creating intercommunicators. Also fixed two double increments in other loops. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-18 10:35:05 -06:00
Jeff Squyres	72f41d4490	pmix: replace all tabs with spaces No code or logic changes Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-17 15:08:33 -04:00
Jeff Squyres	1c32742c66	pmix_ext20: fix syntax error Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-17 15:04:12 -04:00
Ralph Castain	99f7096031	Fix permissions	2016-07-16 21:03:55 -07:00
Ralph Castain	d4071fbd1c	Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling	2016-07-16 13:20:41 -07:00
Ralph Castain	1ceb35ba5c	Fix singletons - do not include the PMIx tool URI in the environment provided to child processes	2016-07-13 17:33:34 -07:00
Ralph Castain	20a91c2baf	Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program. Cleanup debug message	2016-07-13 15:28:33 -07:00
Artem Polyakov	72585a905f	opal/pmix: add blocking Fence to SLURM components. Blocking fence is used in yalla del proc. Native pmix exposes this functionality. We need to expose it for SLURM's s1/s2 components as well. Also this commit fixes uninitialized `rc` in fencenb's of both components.	2016-07-11 09:43:15 +03:00
Artem Polyakov	8e16f47492	Merge pull request #1688 from artpol84/fix_base64 Fix base64 implementation in pmix framework.	2016-07-07 10:47:50 +06:00
Gilles Gouaillardet	1ba7e2b20b	mpool/hugepage: set mntent API instead of manually parsing /proc/mounts Refs open-mpi/ompi#1822	2016-07-06 15:00:19 +09:00
Gilles Gouaillardet	acda07472a	configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c	2016-07-06 11:59:51 +09:00
Gilles Gouaillardet	846360fd4c	configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components Thanks Jeff for the guidance Fixes open-mpi/ompi#1683 note: in order to keep this commit easy to review, some AS_IF([...]) were replaced with AS_IF([false], ...) or AS_IF_([true], ...) these will be removed and re-indented in a subsequent commit	2016-07-06 11:57:24 +09:00
Ralph Castain	ee56d9dc1a	Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field	2016-07-05 14:59:50 -07:00
Ralph Castain	7e0af3f4f0	Update pmix2x to track upstream changes	2016-07-05 11:54:22 -07:00
Gilles Gouaillardet	267821f0dd	pmix2x/pmix: fix a typo in PMIx_tool_init() and remove now useless local variable i	2016-07-05 13:47:50 +09:00
Gilles Gouaillardet	efce8cc734	pmix2x/pmix: add missing include files pmix cannot be built on alpine linux because of some missing includes. uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h is not indirectly pulled under alpine linux, so do it manually. Thanks N.L.K Nguyen for the report (back-ported from upstream pmix/master@c8d55350a9)	2016-07-05 09:03:14 +09:00
Ralph Castain	c9ada8e095	Silence Coverity warnings	2016-07-03 20:45:08 -07:00
Ralph Castain	673f82e2b6	Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors	2016-07-03 08:23:33 -07:00
Nathan Hjelm	01d6da31af	btl/openib: fix rdmacm locking bug This commit fixes a long standing bug in rdmacm. It is required that the thread that calls mca_btl_openib_endpoint_cpc_complete holds the endpoint lock. This was not the case for rdmacm. This causes debug builds to abort. This change also required changing mca_btl_openib_endpoint_send_cts to require the endpoint lock to be held when calling. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-30 15:50:07 -06:00
Nathan Hjelm	cc2b3e0c3f	Merge pull request #1830 from hjelmn/rdmacm_test Test for rdmacm hang fix	2016-06-30 10:41:46 -06:00
Nathan Hjelm	960fcd292c	btl/openib: fix rdma hang This commit is an attempt to fix a hang in finalize of rdmacm. This fixes a path where no rdmacm client is found for an endpoint. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-29 20:31:26 -06:00
Ralph Castain	6e434d6785	Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information Update to match PMIx RFC Fix configury to point to correct libevent and hwloc locations	2016-06-29 19:19:19 -07:00
Jeff Squyres	f18d6606da	Merge pull request #1824 from hjelmn/rdmacm_fix btl/openib: fix segmentation fault	2016-06-28 18:10:35 -04:00
Nathan Hjelm	8128c8eb29	btl/openib: fix segmentation fault This commit fixes a segmentation fault that occurs if a device can be initialized but not used. In this case the devices_count is not equal to the number of usable devices in the devices pointer array. Thanks to @artpol84 for tracking this down. Fixes open-mpi/ompi#1823 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-28 10:31:32 -06:00
Nathan Hjelm	955269b4f1	Merge pull request #1816 from hjelmn/request_perfm_regression opal/sync: fix race condition	2016-06-28 09:12:00 -06:00
Artem Polyakov	541715572f	Fix MPI_Waitany and MPI_Waitsome (request handling related)	2016-06-28 16:40:00 +03:00
Artem Polyakov	8d011ea403	Fix Mellanox copyright.	2016-06-26 21:01:19 -06:00
Nathan Hjelm	fb455f0802	opal/sync: fix race condition This commit fixes a race condition discovered by @artpol84. The race happens when a signalling thread decrements the sync count to 0 then goes to sleep. If the waiting thread runs and detects the count == 0 before going to sleep on the condition variable it will destroy the condition variable while the signalling thread is potentially still processing the completion. The fix is to add a non-atomic member to the sync structure that indicates another process is handling completion. Since the member will only be set to false by the initiating thread and the completing thread the variable does not need to be protected. When destroying a condition variable the waiting thread needs to wait until the signalling thread is finished. Thanks to @artpol84 for tracking this down. Fixes #1813 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-26 20:14:01 -06:00
Nathan Hjelm	dac9201f3b	Merge pull request #1770 from hjelmn/rdma_wth btl/openib: fix rdmacm	2016-06-24 22:46:53 -06:00
Nathan Hjelm	2992d6d238	Merge pull request #1808 from abjoshi-brcm/timer_arm64 arm64: add timer support	2016-06-23 07:10:56 -06:00
Abhishek Joshi	f06f7eb3e6	arm64: add timer support Signed-off-by: Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ramesh@broadcom.com>	2016-06-23 11:01:00 +00:00
Ralph Castain	08b1438f15	Add missing PMIx range value so OPAL and PMIx align again	2016-06-22 22:03:25 -07:00
Nathan Hjelm	55d1933a89	opal/sync: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 15:03:21 -06:00
Nathan Hjelm	e4f920f6f9	opal/progress: improve performance when there are no LP callbacks This commit adds another check to the low-priority callback conditional that short-circuits the atomic-add if there are no low-priority callbacks. This should improve performance in the common case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Nathan Hjelm	143a93f379	opal/sync: remove usage of OPAL_ENABLE_MULTI_THREADS The OPAL_ENABLE_MULTI_THREADS macro is always defined as 1. This was causing us to always use the multi-thread path for synchronization objects. The code has been updated to use the opal_using_threads() function. When MPI_THREAD_MULTIPLE support is disabled at build time (2.x only) this function is a macro evaluating to false so the compiler will optimize out the MT-path in this case. The OPAL_ATOMIC_ADD_32 macro has been removed and replaced by the existing OPAL_THREAD_ADD32 macro. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Gilles Gouaillardet	bf133c401e	pmix2x: fix a typo in dereg_event_hdlr() This bug has been fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c but it was not fixed in open-mpi/ompi	2016-06-22 13:45:29 +09:00
Jeff Squyres	af614afedf	Merge pull request #1800 from thananon/common_sym_fix Fixed common symbol error in btl/usnic.	2016-06-21 20:11:52 -04:00
Ralph Castain	441739b5a4	Cleanup a lagging message that generates an annoying (but seemingly harmless) warning	2016-06-20 12:23:27 -07:00
Thananon Patinyasakdikul	afe07cd5d5	Fixed common symbol in btl/usnic - This commit fixes the accidental common symbol btl_usnic_lock - It also moves the btl_usnic_lock declaration to btl_usnic.h	2016-06-20 10:05:44 -07:00
Howard Pritchard	1bed9fdb59	Merge pull request #1799 from hppritcha/topic/help_aries_with_knl common/ugni: help out knl with aries	2016-06-20 08:09:24 -06:00
Ralph Castain	0ba02821e6	Add requested key and job-level info	2016-06-19 18:22:31 -07:00
Ralph Castain	0a29f5cb77	Sigh - missed two typos	2016-06-18 20:57:53 -07:00
Ralph Castain	dd38cf1fed	Fix typo	2016-06-18 20:56:43 -07:00
Howard Pritchard	8b53487977	common/ugni: help out knl with aries The way the gni btl is currently coded, it will run completely out of gas on KNL at 123 processes/node. Since there are bound to be those who try to run an MPI process per hyperthread on KNL nodes, the fma sharing mode needs to be requested. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-06-18 15:09:05 -05:00
Ralph Castain	dde69e1be2	Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761. Fixes #1792	2016-06-18 12:28:46 -07:00
Jeff Squyres	7a8d7fb948	openib: fix compiler warnings Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-18 07:15:11 -07:00
Jeff Squyres	c332ee5884	Merge pull request #1784 from thananon/fix_usnic_thread Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-17 11:15:14 -04:00
Nathan Hjelm	f59c2fce6b	Merge pull request #1786 from hjelmn/32_fix opal/progress: use 32-bit atomics for call counter	2016-06-17 08:54:41 -06:00
Nathan Hjelm	2e4141f20a	Merge pull request #1787 from hjelmn/asm_fix opal/asm: fix syntax of timer code for ia32	2016-06-17 08:50:57 -06:00
Ralph Castain	044c561cba	Roll to latest PMIx master	2016-06-16 17:30:30 -07:00
Nathan Hjelm	9c709966f7	opal/asm: fix syntax of timer code for ia32 Thanks to Paul Hargrove for pointing this out. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 16:55:01 -06:00
rhc54	702a982271	Merge pull request #1767 from rhc54/topic/pmix2 Enable the PMIx event notification capability	2016-06-16 15:27:43 -07:00
Nathan Hjelm	7349ddc937	patcher/overwrite: use OPAL_ASSEMBLY_ARCH to determine architecture Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 10:00:00 -06:00
Nathan Hjelm	dbd8369485	opal/progress: use 32-bit atomics for call counter This commit fixes a compile error on 32-bit platforms. The low-priority call counter was always using 64-bit atomics which will not work if 64-bit atomic math is not available. Updated to use 32-bit instead. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-16 09:01:19 -06:00
Thananon Patinyasakdikul	7bd18214a7	Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-15 07:42:55 -07:00
Jeff Squyres	b7e937fea5	Merge pull request #1778 from thananon/usnic_thread_safe Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-14 18:43:04 -04:00
Ralph Castain	5d330d5220	Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler. Add PMIx 2.0 Remove PMIx 1.1.4 Cleanup copying of component Add missing file Touchup a typo in the Makefile.am Update the pmix ext114 component Minor cleanups and resync to master Update to latest PMIx 2.x Update to the PMIx event notification branch latest changes	2016-06-14 13:08:41 -07:00
Jeff Squyres	5071602c59	PSM/PSM2: Disable signal handler hijacking by default Per discussion on https://github.com/open-mpi/ompi/pull/1767 (and some subsequent phone calls and off-issue email discussions), the PSM library is hijacking signal handlers by default. Specifically: unless the environment variable `IPATH_NO_BACKTRACE=1` (for PSM / Intel TrueScale) is set, the library constructor for this library will hijack various signal handlers for the purpose of invoking its own error reporting mechanisms. This may be a bit surprising, but is not a problem, per se. The real problem is that older versions of at least the PSM library do not unregister these signal handlers upon being unloaded from memory. Hence, a segv can actually result in a double segv (i.e., the original segv and then another segv when the now-non-existent signal handler is invoked). This PSM signal hijacking subverts Open MPI's own signal reporting mechanism, which may be a bit surprising for some users (particularly those who do not have Intel TrueScale). As such, we disable it by default so that Open MPI's own error-reporting mechanisms are used. Additionally, there is a typo in the library destructor for the PSM2 library that may cause problems in the unloading of its signal handlers. This problem can be avoided by setting `HFI_NO_BACKTRACE=1` (for PSM2 / Intel OmniPath). This is further compounded by the fact that the PSM / PSM2 libraries can be loaded by the OFI MTL and the usNIC BTL (because they are loaded by libfabric), even when there is no Intel networking hardware present. Having the PSM/PSM2 libraries behave this way when no Intel hardware is present is clearly undesirable (and is likely to be fixed in future releases of the PSM/PSM2 libraries). This commit sets the following two environment variables to disable this behavior from the PSM/PSM2 libraries (if they are not already set): * IPATH_NO_BACKTRACE=1 * HFI_NO_BACKTRACE=1 If the user has set these variables before invoking Open MPI, we will not override their values (i.e., their preferences will be honored). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-14 11:45:23 -07:00
Thananon Patinyasakdikul	ee85204c12	Added MPI_THREAD_MULTIPLE support for btl/usnic.	2016-06-13 13:47:06 -07:00
Nathan Hjelm	253c91972e	arm64: add atomic swap function This commit adds the opal_atomic_swap_32 and opal_atomic_swap_64 functions. This should improve the performance of btl/vader. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-11 09:46:29 -06:00
Nathan Hjelm	109389dce2	Merge pull request #1634 from hjelmn/cma cma: add support for MIPS and ARM	2016-06-11 09:20:28 -06:00
Ralph Castain	d58da99dbc	Shift to memcpy to avoid Solaris issues	2016-06-09 12:07:17 -07:00
Gilles Gouaillardet	1f651d17c1	opal/util/ethtool: fix (infamous) strncpy usage strncpy does not NUL-terminate the destination when the source is truncated, so terminate it ourselves. Fixes CID 1362576	2016-06-09 09:54:50 +09:00
Ralph Castain	8fa935534b	Abstract the strnlen function for environments that do not have it (e.g., Solaris 10)	2016-06-08 10:12:43 -07:00
Nathan Hjelm	f8957f24af	Merge pull request #1768 from hjelmn/cq_fix btl/openib: fix cq resize calculation	2016-06-07 21:34:36 -06:00
Nathan Hjelm	17ae1aceeb	btl/openib: fix rdmacm The rdma_disconnect function specifies that both the server and client should call rdma_disconnect. The code was not calling rdma_disconnect on an endpoint if the event came before the endpoint finalization. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-07 17:53:58 -06:00
Nathan Hjelm	dd519c55b1	btl/openib: fix cq resize calculation Before dynamic add_procs the openib_btl_size_queues was called exactly once for non-dynamic jobs. Now the function is called on each new connection so the calculation was wrong. Re-wrote the function to correctly calculate the CQ size and only attempt to adjust the CQ if the requested size has changed. This fixes a bug when using the openib btl on psm2 hardware that is caused by the time needed to resize a CQ. The overhead was causing udcm to timeout and fail. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-07 16:05:56 -06:00
Nathan Hjelm	e082ed752a	opal/progress: fix warnings This commit fixes several warnings introduced by open-mpi/ompi@fc26d9c69f. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-06 22:18:24 -06:00
Nathan Hjelm	4a2bd83302	opal/cma: improve Linux CMA detection This commit improves the CMA detection when the installed glibc doesn't have support for CMA. In this case we need to verify that the syscall numbers in opal/include/opal/sys/cma.h are valid for the architecture. This verification is done by attempting to use CMA while including the internal header. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-05 22:29:07 -06:00
Gilles Gouaillardet	b707d138fe	pmix114/pmix1_client: fix misc memory leaks Fixes CID 1325146-1325149	2016-06-06 09:33:35 +09:00
Nathan Hjelm	0084ad0d1b	opal: add armv8 support This commit adds assembly support for aarch64. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-03 10:32:21 -06:00
Nathan Hjelm	6169d03ea3	btl: adjust values of new atomic flags Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-06-02 19:21:34 -06:00
Nathan Hjelm	9f43b23725	Merge pull request #1710 from hjelmn/ugni_atomics Additional ugni atomics	2016-06-02 18:25:49 -06:00
George Bosilca	e1c6b0e4a7	Some compilers are more than picky.	2016-06-03 09:04:34 +09:00
Nathan Hjelm	d9fc855955	Merge pull request #1743 from hjelmn/gcc_atomics_fix atomic/gcc: add check for 128-bit CAS being lock-free	2016-06-02 16:55:31 -06:00
Nathan Hjelm	d86e41ea13	atomic/gcc: add check for 128-bit CAS being lock-free Compiler implementations are free to include support for atomics that use locks. Unfortunately lock-free and lock atomics do not mix. Older versions of llvm on OS X use locks to provide __atomic_compare_exchange on 128-bit values but are lock-free on 64-bit values. This screws up our lifo implementation which mixes 64-bit and 128-bit atomics on the same values to improve performance. This commit adds a configure-time check if 128-bit atomics are lock free. If they are not then the 128-bit __atomic CAS is disabled and we check for the __sync version as a fallback. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-02 15:59:05 -06:00
Nathan Hjelm	5aab4b2d51	Merge pull request #1662 from ggouaillardet/topic/amd64_atomic amd64/atomic: silence warnings	2016-06-02 14:10:20 -06:00
George Bosilca	87b1d17e7e	Remove warnings. clang 7.0 with the picky option on is extremely verbose, and complains about almost everything. Trying to make it happy, at least regarding the datatype engine.	2016-06-03 00:56:24 +09:00
rhc54	483b9c370a	Merge pull request #1741 from rhc54/topic/pmix114 Update to 1.1.4rc3	2016-06-02 06:57:37 -07:00
Nathan Hjelm	fc26d9c69f	Merge pull request #1734 from hjelmn/progress_threading opal/progress: make progress function registration mt safe	2016-06-02 06:35:59 -06:00
Ralph Castain	ecea1e3bb5	Update to 1.1.4rc3	2016-06-01 20:56:07 -07:00
Nathan Hjelm	2fad3b9bc6	opal/progress: make progress function registration mt safe This commit fixes a bug in opal progress registration that can cause crashes when a progress function is registered while another thread is in opal_progress(). Before this commit realloc is used to allocate more space for progress functions but it is possible for a thread in opal_progress() to try to read from the array that is freed by realloc before the array is re-assigned when realloc returns. To prevent this race use malloc + memcpy to fill the new array and atomically swap out the old and new array pointers. Per suggestion we now allocate a default of 8 slots for callbacks and double the current number when we run out of space. This commit also fixes leaking the callbacks_lp array. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 20:57:19 -06:00
George Bosilca	d9fb59bea5	Update the synchronization primitive Add comments and make sure we correctly return the status of the synchronization primitive, especially if it was completed with error.	2016-06-02 11:53:56 +09:00
Nathan Hjelm	f33bbfd381	atomic: add support for __atomic builtins (#1735) * atomic: add support for __atomic builtins This commit adds support for the gcc __atomic builtins. The __sync builtins are deprecated and have been replaced by these atomics. In addition, the new atomics support atomic exchange which was not supported by __sync. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> * atomic: add support for transactional memory This commit adds support for using transactional memory when using opal atomic locks. This feature is enabled if the __HLE__ feature is available and the gcc builtin atomics are in use. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-01 21:23:47 -04:00
rhc54	b85a5e62ab	Merge pull request #1739 from rhc54/topic/pmix Split the pmix external component into one for the 1.1.4 release, and…	2016-06-01 16:24:44 -07:00
Ralph Castain	12ecf972af	Split the pmix external component into one for the 1.1.4 release, and another for the upcoming 2.0 release. Clean up the configury so the components look for a series-specific function instead of running a program. NOTE: the changes for the 2.0 series are not yet in the PMIx master.	2016-06-01 14:15:24 -07:00
Nathan Hjelm	ceb2912838	Merge pull request #1736 from hjelmn/ugni_fixes ugni BTL fixes	2016-06-01 14:59:55 -06:00
Jeff Squyres	d175fd692d	README.ompi: track patches added to hwloc Track post-v1.11.3-release patches applied to the hwloc copy embedded in Open MPI. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-01 07:17:05 -07:00
Jeff Squyres	3867bd3640	hwloc.m4: only check for valgrind in non-embedded mode This fixes https://github.com/open-mpi/ompi/issues/1732: i.e., the case where the outer project has its own check for <valgrind/valgrind.h>, but also supplements CPPFLAGS (to find Valgrind's header files) before doing that check. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Ideally, we would tell OMPI to disable autoconf's caching of our valgrind check result so that its check gets the right result after adding CPPFLAGS. Not sure if we can do that. For now, just disable our Valgrind code in embedded mode. This will keep the x86 backend enabled under Valgrind but it will auto-disable itself when finding identical APIC ids anyway (because CPUID returns same outputs for all PUs). Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr> Fixes open-mpi/ompi#1732 (cherry picked from commit open-mpi/hwloc@8b44fb1c81)	2016-06-01 06:58:53 -07:00
Gilles Gouaillardet	57978a75d0	Merge pull request #1717 from ggouaillardet/topic/lex_cleanup configury: clean the flex generated .c files	2016-06-01 13:06:21 +09:00
Nathan Hjelm	5d4bcce042	Merge pull request #1700 from shamisp/topic/cma_config CMA: Fixing logic for CMA system call detection	2016-05-31 20:33:48 -06:00
Nathan Hjelm	340152a635	Merge pull request #1720 from shamisp/topic/vader/max_addr VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms.	2016-05-31 20:33:28 -06:00
Gilles Gouaillardet	5f565dfec3	configury: clean the flex generated .c files	2016-06-01 11:13:31 +09:00
Nathan Hjelm	bf10d79914	btl/ugni: remove erroneous unlock The endpoint lock was being released twice in mca_btl_ugni_get_ep. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:53 -06:00
Nathan Hjelm	cc96097873	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-31 16:52:09 -06:00
Jeff Squyres	5cfee95ea4	hwloc1113: add missing file to Makefile.am Lack of this file causes a failure when you run autogen.pl on a distribution tarball. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-31 09:57:50 -07:00
Nathan Hjelm	60519c2b4e	cma: add support for MIPS and ARM Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-30 12:13:20 -06:00
George Bosilca	d2abff583e	Fix race condition during BTL TCP tear-down. bot:label:bug bot:assign:@hjelmn	2016-05-30 10:47:14 -05:00
Jeff Squyres	e126d2cd18	Merge pull request #1584 from bgoglin/master Update hwloc to v1.11.3	2016-05-28 11:01:54 -04:00
Ralph Castain	55923eacd3	Stealing some pieces of Josh Hursey's PR #1583 and modifying a bit, allow the opal/pmix external component to handle both PMIx 1.1.4 and PMIx 2.0 versions. Automatically detect the version of the target external library and adjust the only two APIs that changed (PMIx_Init and PMIx_Finalize) Rename temp vars in .m4 to avoid conflict with Travis	2016-05-27 08:06:31 -07:00
Nathan Hjelm	28dfa36a3f	btl/ugni: fix bug when attempting unaligned get on aries This commit fixes a programming error when using an aries nic. The documentation of ugni shows that only the local alignment restriction for get was lifted on aries. There is still a remote address alignment restriction. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	c19426ac1b	btl/ugni: add support for additional atomic operations This commit adds support for Cray Aries atomic operations. This includes 32-bit and floating point support. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	23fe19a956	btl: add support for more atomics This commit adds support for more atomic operations and types. The operations added are logical and, logical or, logical xor, swap, min, and max. New types are 32-bit int by using the MCA_BTL_ATOMIC_FLAG_32BIT flag, 64-bit float by using the MCA_BTL_ATOMIC_FLAG_FLOAT flag, and 32-bit float by using both flags. Floating point numbers are supported by packing the number in as an int64_t or int32_t. We will update the btl interface in the future to make this less confusing. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-27 08:22:13 -06:00
Nathan Hjelm	d25b846c01	Merge pull request #1704 from hpcraink/pr/configure_framework Fix configure for FreePGI on OSX	2016-05-26 17:01:08 -06:00
Nathan Hjelm	8c9292d5d1	Merge pull request #1721 from hjelmn/xrc_fix btl/openib: fix XRC WQE calculation	2016-05-26 17:00:31 -06:00
Nathan Hjelm	56bdcd0888	btl/openib: fix XRC WQE calculation Before dynamic add_procs support was committed to master we called add_procs with every proc in the job. The XRC code in the openib btl was taking advantage of this and setting the number of work queue entries (WQE) based on all the procs on a remote node. Since that is no longer the case we can not simply increment the sd_wqe field on the queue pair. To fix the issue a new field has been added to the xrc queue pair structure to keep track of how many wqes there are total on the queue pair. If a new endpoint is added that increases the number of wqes and the xrc queue pair is already connected the code will attempt to modify the number of wqes on the queue pair. A failure is ignored because all that will happen is the number of active send work requests on an XRC queue pair will be more limited. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-26 15:58:31 -06:00
Aurelien Bouteiller	49bd28d0ac	Merge pull request #1714 from hjelmn/scif_exclusivity btl/scif: reduce default exclusivity	2016-05-26 17:53:11 -04:00
Pavel Shamis (Pasha)	60fd25f3fb	VADER: Adjusting VADER_MAX_ADDRESS for non x86 platforms. The original VADER_MAX_ADDRESS was tuned for x86_64 platforms only. For non x86_64 platforms we can use XPMEM_MAXADDR_SIZE. Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>	2016-05-26 16:38:04 -05:00
Nathan Hjelm	99627319f0	btl/ugni: reduce overhead of progress function This commit reduces the overhead of calling the ugni progress function. It does the following: - Check for new connections once every eight calls. - Do not call remote smsg progress unless we are connected to at least one remote peer. - Do not call rdma progress unless at least one rdma fragment is outstanding. - Check endpoint wait list size before obtaining a lock. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:27:34 -06:00
Nathan Hjelm	5caf12cd9b	btl/scif: reduce default exclusivity This commit reduces the default exclusivity so that btl/scif is not used for send/recv over other shared memory transports. Fixes open-mpi/ompi#1712 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-25 14:25:07 -06:00
Rainer Keller	3727cba9bb	Fix compilation for FreePGI on OSX Our checks and the ones of libevent are somewhat flawed. If adding multiple "-framework" to CXXFLAGS or CFLAGS, we strip the keyword from the command-line, not good. libevent however assumes plain gcc without testing properly that the compiler supports -Wno-deprecated-declarations.	2016-05-25 09:12:39 +02:00
Nathan Hjelm	461ca1203b	Merge pull request #1703 from hjelmn/grdma_cuda_fix rcache/grdma: fix typo in cuda code	2016-05-24 18:51:22 -06:00
bosilca	b90c83840f	Refactor the request completion (#1422) * Remodel the request. Added the wait sync primitive and integrate it into the PML and MTL infrastructure. The multi-threaded requests are now significantly less heavy and less noisy (only the threads associated with completed requests are signaled). * Fix the condition to release the request.	2016-05-24 18:20:51 -05:00
Nathan Hjelm	af52dad8f8	rcache/grdma: fix typo in cuda code Fixes open-mpi/ompi#1702 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-24 15:56:39 -06:00
Pavel Shamis (Pasha)	d984b4b3f9	CMA: Fixing logic for CMA system call detection The OPAL_CMA_NEED_SYSCALL_DEFS is always defined/set to 0 or 1. Therefore instead of checking if the macro is defined, we have to look at the value itself. Signed-off-by: Pavel Shamis (Pasha) <pasharesearch@gmail.com>	2016-05-24 14:53:25 -05:00
Ralph Castain	80f4e3b872	Fix the --tune problem by searching the argv for MCA params in advance of opal_init_util. Only search the first app_context as we historically have done - we can debate whether or not to search all app_contexts	2016-05-23 21:09:44 -07:00
Nathan Hjelm	37e9e2c660	mca/base: fix typo in flag enumeration This commit fixes a typo in flag enumeration that can cause the parser to miss valid flags or crash. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-23 12:21:34 -06:00
Artem Polyakov	725eea2819	Fix base64 implementation in pmix framework. In the commit `80f07b65f1` setting of '-' marker used as the string termination sign was moved from base64 code: from: `80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67L491)` to: `80f07b65f1 (diff-1b10896c267d2591dc2c08fd0542ab67R189)` However the decoding function wasn't fixed and still expects one extra byte at the end of the encoded string which leads to data truncation during extraction (was noticed on standalone code that was using base64 from OMPI).	2016-05-23 23:30:31 +06:00
Gilles Gouaillardet	d5a2ac6f2f	btl/openib: fix #if vs #ifdef	2016-05-23 14:27:33 +09:00
Gilles Gouaillardet	5a8cbe5a8f	btl/openib: remove obsolete reference to MEMORY_LINUX_MALLOC_ALIGN_ENABLED macro	2016-05-23 14:12:21 +09:00
Gilles Gouaillardet	8466a3daf3	pmix: update .gitignore git ignore opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in git rm opal/mca/pmix/pmix114/pmix/include/pmix/autogen/config.h.in git ignore opal/mca/pmix/pmix*/...	2016-05-23 11:58:07 +09:00
Nathan Hjelm	31bfeede82	bml/r2: always add btl progress function This commit changes the behavior of bml/r2 from conditionally registering btl progress functions to always registering progress functions. Any progress function belonging to a btl that is not yet in use is registered as low-priority. As soon as a proc is added that will make use of the btl it is re-registered normally. This works around an issue with some btls. In order to progress a first message from an unknown peer both ugni and openib need to have their progress functions called. If either btl is not in use after the first call to add_procs the callback was never happening. This commit ensures the btl progress function is called at some point but the number of progress callbacks is reduced from normal to ensure lower overhead when a btl is not used. The current ratio is 1 low priority progress callback for every 8 calls to opal_progress(). Fixes open-mpi/ompi#1676 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-21 15:54:04 -04:00
Ralph Castain	4e0749f03d	Remove verbose error messages	2016-05-20 10:04:26 -07:00
Ralph Castain	42ecffb6d0	Move the registration of MCA params out of the init of the var system - put them in with the rest of the OPAL MCA param registrations Take another shot at untangling the spaghetti orterun: fix for command line parsing orte-submit calls opal_init_util () before parsing out MCA command line options (-mca, -am, etc). This prevents mpirun from setting opal MCA variables for some frameworks as well as the MCA base. This is because when a framework is opened all of its variables are set to read-only. Eventually we want to lift this restriction on some MCA variables but since -mca is affected we must parse out the MCA command line options before opal_init_util(). This commit fixes the bug by adding a new option to opal_cmd_line_parse (ignore unknown option) so orte-submit can pre-parse the command line for MCA options. Signed-off-by: Nathan Hjelm <hjelmn@me.com> Minor cleanups to avoid releasing/recreating the cmd line	2016-05-20 09:59:50 -07:00
Brice Goglin	ca621330a6	Update hwloc to v1.11.3 Remove contrib/windows/ Merge hwlocXYZ/hwloc/README-ompi.txt back into hwlocXYZ/README-ompi.txt instead of having both. Add README.txt in new automake-required directory contrib/systemd/ Keep the following patches applied since they are not in 1.11.3 linux: actually enable libudev based on the result of AC_CHECK_LIB (cherry picked from open-mpi/hwloc@9549fd59af) configure: check the actual may_alias syntax that we use (cherry picked from open-mpi/hwloc@0ab7af5e90)	2016-05-20 07:20:16 +02:00
Gilles Gouaillardet	5ec1eedbae	Merge pull request #1682 from ggouaillardet/topic/fix-ethtool-again opal/util/ethtool: use system ethtool_cmd_speed when available	2016-05-20 10:30:43 +09:00
Gilles Gouaillardet	cbbdce05b1	pmix/pmix114: silence a warning	2016-05-20 09:35:26 +09:00
Gilles Gouaillardet	ed3fd1775f	rcache/grdma: silence a warning	2016-05-20 09:30:29 +09:00
Gilles Gouaillardet	a01a5487a8	opal/util/ethtool: use system ethtool_cmd_speed when available Refs: open-mpi/ompi#1679	2016-05-20 09:05:09 +09:00
rhc54	99d3c283f5	Merge pull request #1681 from rhc54/topic/pmixupdate Update PMIx 114 to current release candidate	2016-05-19 13:50:16 -07:00
Ralph Castain	6f743f81b6	Update PMIx 114 to current release candidate	2016-05-19 12:55:05 -07:00
Jeff Squyres	87233aae49	ethtool: better handle portability Be sure to handle the case where we don't have ethtool support at all. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-19 10:57:14 -07:00
Gilles Gouaillardet	fd93d236b1	opal/util/ethtool: fix compilation on older Linux when struct ethtool_cmd has no speed_hi field Refs: open-mpi/ompi#1628	2016-05-19 11:58:04 +09:00
Jeff Squyres	66f53ec29a	Merge pull request #1628 from kmroz/wip-btl-tcp-ethtool-speed btl/tcp: autodetect bandwidth and latency if unset by the user	2016-05-18 12:12:55 -04:00
Nathan Hjelm	9371a6a52d	Merge pull request #1673 from hjelmn/fix_rcache_deadlock rcache: fix deadlock in multi-threaded environments	2016-05-18 08:32:21 -07:00
Karol Mroz	ca6ddf3270	btl/tcp: autodetect bandwidth and latency if unset Fixes open-mpi/ompi#120 Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:52 +02:00
Karol Mroz	b9c6c43c6b	btl/tcp: add default defines for bandwidth and latency Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:52 +02:00
Karol Mroz	31e33a64f9	opal/util: add function to obtain interface speed If kernel ethtool_cmd_speed() is not available, use copies if possible. Signed-off-by: Karol Mroz <mroz.karol@gmail.com>	2016-05-18 16:25:51 +02:00
Nathan Hjelm	ab8ed177f5	rcache: fix deadlock in multi-threaded environments This commit fixes several bugs in the registration cache code: - Fix a programming error in the grdma invalidation function that can cause an infinite loop if more than 100 registrations are associated with a munmapped region. This happens because the mca_rcache_base_vma_find_all function returns the same 100 registrations on each call. This has been fixed by adding an iterate function to the vma tree interface. - Always obtain the vma lock when needed. This is required because there may be other threads in the system even if opal_using_threads() is false. Additionally, since it is safe to do so (the vma lock is recursive) the vma interface has been made thread safe. - Avoid calling free() while holding a lock. This avoids race conditions with locks held outside the Open MPI code. Fixes open-mpi/ompi#1654. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-05-17 09:02:40 -06:00
Nathan Hjelm	f6938868bd	Merge pull request #1659 from hjelmn/sync_64 sync_builtin: check for 64-bit atomic support	2016-05-17 05:40:04 -07:00
rhc54	8b534e9897	Merge pull request #1668 from rhc54/topic/slurm When direct launching applications, we must allow the MPI layer to pr…	2016-05-16 12:23:19 -07:00
Howard Pritchard	1a676e5b35	pmix/cray: fix some breakage Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-05-16 12:45:05 -05:00
Gilles Gouaillardet	4e21933a74	memory/patcher: declare __curbrk as extern in order not to generate an (uninitialized) common symbol	2016-05-16 09:30:11 +09:00
Ralph Castain	01ba861f2a	When direct launching applications, we must allow the MPI layer to progress during RTE-level barriers. Neither SLURM nor Cray provide non-blocking fence functions, so push those calls into a separate event thread (use the OPAL async thread for this purpose so we don't create another one) and let the MPI thread spin in wait_for_completion. This also restores the "lazy" completion during MPI_Finalize to minimize cpu utilization. Update external as well Revise the change: we still need the MPI_Barrier in MPI_Finalize when we use a blocking fence, but do use the "lazy" wait for completion. Replace the direct logic in MPI_Init with a cleaner macro	2016-05-14 16:37:00 -07:00
Gilles Gouaillardet	456b73da69	btl/openib: fix error path in init_one_device() do not explicitly release ib verbs components since they will be released in the object destructor Thanks Durga for the report	2016-05-13 09:03:48 +09:00
Gilles Gouaillardet	5dae7a47ff	amd64/atomic: silence warnings Solaris Studio compilers issue (tons of) warnings because one argument of several __asm__ __volatile__ sections is not needed	2016-05-11 11:26:50 +09:00
Jeff Squyres	30f913f217	Merge pull request #1652 from jsquyres/pr/remove-aix-timer timer/aix: remove stale code	2016-05-10 15:47:02 -04:00
Jeff Squyres	eccf0ff4cd	hwloc/external: set WRAPPER_EXTRA_* vars in proper location WRAPPER_EXTRA flags are checked before the POST_CONFIG macro is invoked. So set them in the main CONFIG macro. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-10 07:34:56 -07:00
Josh Hursey	44d95cb610	Merge pull request #1657 from bgoglin/hwloc-for-2.0 configure: check the actual may_alias syntax that we use	2016-05-09 13:37:08 -05:00
Ralph Castain	7767882346	Per user request, add some missing data and definitions: OPAL_PMIX_UNIV_RANK - synonym for OPAL_PMIX_GLOBAL_RANK OPAL_PMIX_APP_SIZE - #ranks in the application of this proc	2016-05-09 08:39:01 -07:00
Nathan Hjelm	d99a9786b6	sync_builtin: check for 64-bit atomic support This commit adds an additional check for 64-bit atomic support for __sync builtins. If 64-bit support is not available the opal_atomic_*_64 atomics are disabled. Signed-off-by: Nathan Hjelm <hjelmn@me.com>	2016-05-09 03:17:51 -06:00
Brice Goglin	6839d928c2	configure: check the actual may_alias syntax that we use xlc 13.1.0 crashes because of our may_alias attributes in nolibxml.c on Power7. libxml.c and nolibxml.c are the only may_alias users for now, so change our configure check to match the actual code using it. Thanks to Paul Hargrove for reporting and debugging the issue, and providing the patch. https://www.open-mpi.org/community/lists/devel/2016/05/18918.php (cherry picked from open-mpi/hwloc@0ab7af5e90)	2016-05-08 22:22:30 +02:00
Ralph Castain	7594b95e4b	Ensure the hwloc external header is included when --with-devel-headers is given	2016-05-08 10:18:14 -07:00
Jeff Squyres	acbd2c608d	memory/patcher: check for <sys/syscall.h> Thanks to Paul Hargrove for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-07 09:48:14 -07:00
Jeff Squyres	b4982d7725	timer/aix: remove stale code Per discussion on the mailing list and with IBM, remove the AIX timer code (since AIX is no longer supported). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-05-07 09:31:34 -07:00
Ralph Castain	7e5ef6a240	Fix the env_list support - the MCA param was being set way too early, so provide a "backdoor" way of providing the value	2016-05-06 15:38:39 -07:00
Ralph Castain	58dd41facf	Repair the processing of cmd line options that mapped to MCA params. This was responsible for breaking things like map-by <foo>. Remove debug, let orterun send terminate cmd to DVM Recover the DVM support	2016-05-06 13:14:03 -07:00

4444 commits