openmpi

Author	SHA1	Message	Date
Ralph Castain	01a653d50a	Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found". Remove stale file reference Restore autogen pass thru pmix Remove generated file	2016-07-20 00:58:19 -07:00
Nathan Hjelm	03bce91de8	pmix/pmix2x: add missing increment in loop This commit fixes a bug in the pmix2x client code where a loop variable is not correctly incremented. This was leading to hangs and crashes when creating intercommunicators. Also fixed two double increments in other loops. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-07-18 10:35:05 -06:00
Jeff Squyres	72f41d4490	pmix: replace all tabs with spaces No code or logic changes Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-17 15:08:33 -04:00
Jeff Squyres	1c32742c66	pmix_ext20: fix syntax error Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-07-17 15:04:12 -04:00
Ralph Castain	99f7096031	Fix permissions	2016-07-16 21:03:55 -07:00
Ralph Castain	d4071fbd1c	Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling	2016-07-16 13:20:41 -07:00
Ralph Castain	1ceb35ba5c	Fix singletons - do not include the PMIx tool URI in the environment provided to child processes	2016-07-13 17:33:34 -07:00
Ralph Castain	20a91c2baf	Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program. Cleanup debug message	2016-07-13 15:28:33 -07:00
Artem Polyakov	72585a905f	opal/pmix: add blocking Fence to SLURM components. Blocking fence is used in yalla del proc. Native pmix exposes this functionality. We need to expose it for SLURM's s1/s2 components as well. Also this commit fixes uninitialized `rc` in fencenb's of both components.	2016-07-11 09:43:15 +03:00
Artem Polyakov	8e16f47492	Merge pull request #1688 from artpol84/fix_base64 Fix base64 implementation in pmix framework.	2016-07-07 10:47:50 +06:00
Gilles Gouaillardet	acda07472a	configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c	2016-07-06 11:59:51 +09:00
Gilles Gouaillardet	846360fd4c	configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components Thanks Jeff for the guidance Fixes open-mpi/ompi#1683 note: in order to keep this commit easy to review, some AS_IF([...]) were replaced with AS_IF([false], ...) or AS_IF_([true], ...) these will be removed and re-indented in a subsequent commit	2016-07-06 11:57:24 +09:00
Ralph Castain	ee56d9dc1a	Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field	2016-07-05 14:59:50 -07:00
Ralph Castain	7e0af3f4f0	Update pmix2x to track upstream changes	2016-07-05 11:54:22 -07:00
Gilles Gouaillardet	267821f0dd	pmix2x/pmix: fix a typo in PMIx_tool_init() and remove now useless local variable i	2016-07-05 13:47:50 +09:00
Gilles Gouaillardet	efce8cc734	pmix2x/pmix: add missing include files pmix cannot be built on alpine linux because of some missing includes. uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h is not indirectly pulled under alpine linux, so do it manually. Thanks N.L.K Nguyen for the report (back-ported from upstream pmix/master@c8d55350a9)	2016-07-05 09:03:14 +09:00
Ralph Castain	c9ada8e095	Silence Coverity warnings	2016-07-03 20:45:08 -07:00
Ralph Castain	673f82e2b6	Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors	2016-07-03 08:23:33 -07:00
Nathan Hjelm	01d6da31af	btl/openib: fix rdmacm locking bug This commit fixes a long standing bug in rdmacm. It is required that the thread that calls mca_btl_openib_endpoint_cpc_complete holds the endpoint lock. This was not the case for rdmacm. This causes debug builds to abort. This change also required changing mca_btl_openib_endpoint_send_cts to require the endpoint lock to be held when calling. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-30 15:50:07 -06:00
Nathan Hjelm	cc2b3e0c3f	Merge pull request #1830 from hjelmn/rdmacm_test Test for rdmacm hang fix	2016-06-30 10:41:46 -06:00
Nathan Hjelm	960fcd292c	btl/openib: fix rdma hang This commit is an attempt to fix a hang in finalize of rdmacm. This fixes a path where no rdmacm client is found for an endpoint. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-29 20:31:26 -06:00
Ralph Castain	6e434d6785	Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information Update to match PMIx RFC Fix configury to point to correct libevent and hwloc locations	2016-06-29 19:19:19 -07:00
Jeff Squyres	f18d6606da	Merge pull request #1824 from hjelmn/rdmacm_fix btl/openib: fix segmentation fault	2016-06-28 18:10:35 -04:00
Nathan Hjelm	8128c8eb29	btl/openib: fix segmentation fault This commit fixes a segmentation fault that occurs if a device can be initialized but not used. In this case the devices_count is not equal to the number of usable devices in the devices pointer array. Thanks to @artpol84 for tracking this down. Fixes open-mpi/ompi#1823 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-28 10:31:32 -06:00
Nathan Hjelm	955269b4f1	Merge pull request #1816 from hjelmn/request_perfm_regression opal/sync: fix race condition	2016-06-28 09:12:00 -06:00
Artem Polyakov	541715572f	Fix MPI_Waitany and MPI_Waitsome (request handling related)	2016-06-28 16:40:00 +03:00
Artem Polyakov	8d011ea403	Fix Mellanox copyright.	2016-06-26 21:01:19 -06:00
Nathan Hjelm	fb455f0802	opal/sync: fix race condition This commit fixes a race condition discovered by @artpol84. The race happens when a signalling thread decrements the sync count to 0 then goes to sleep. If the waiting thread runs and detects the count == 0 before going to sleep on the condition variable it will destroy the condition variable while the signalling thread is potentially still processing the completion. The fix is to add a non-atomic member to the sync structure that indicates another process is handling completion. Since the member will only be set to false by the initiating thread and the completing thread the variable does not need to be protected. When destroying a condition variable the waiting thread needs to wait until the signalling thread is finished. Thanks to @artpol84 for tracking this down. Fixes #1813 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-26 20:14:01 -06:00
Nathan Hjelm	dac9201f3b	Merge pull request #1770 from hjelmn/rdma_wth btl/openib: fix rdmacm	2016-06-24 22:46:53 -06:00
Nathan Hjelm	2992d6d238	Merge pull request #1808 from abjoshi-brcm/timer_arm64 arm64: add timer support	2016-06-23 07:10:56 -06:00
Abhishek Joshi	f06f7eb3e6	arm64: add timer support Signed-off-by: Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ramesh@broadcom.com>	2016-06-23 11:01:00 +00:00
Ralph Castain	08b1438f15	Add missing PMIx range value so OPAL and PMIx align again	2016-06-22 22:03:25 -07:00
Nathan Hjelm	55d1933a89	opal/sync: fix warnings Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 15:03:21 -06:00
Nathan Hjelm	e4f920f6f9	opal/progress: improve performance when there are no LP callbacks This commit adds another check to the low-priority callback conditional that short-circuits the atomic-add if there are no low-priority callbacks. This should improve performance in the common case. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Nathan Hjelm	143a93f379	opal/sync: remove usage of OPAL_ENABLE_MULTI_THREADS The OPAL_ENABLE_MULTI_THREADS macro is always defined as 1. This was causing us to always use the multi-thread path for synchronization objects. The code has been updated to use the opal_using_threads() function. When MPI_THREAD_MULTIPLE support is disabled at build time (2.x only) this function is a macro evaluating to false so the compiler will optimize out the MT-path in this case. The OPAL_ATOMIC_ADD_32 macro has been removed and replaced by the existing OPAL_THREAD_ADD32 macro. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2016-06-22 09:52:37 -06:00
Gilles Gouaillardet	bf133c401e	pmix2x: fix a typo in dereg_event_hdlr() This bug has been fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c but it was not fixed in open-mpi/ompi	2016-06-22 13:45:29 +09:00
Jeff Squyres	af614afedf	Merge pull request #1800 from thananon/common_sym_fix Fixed common symbol error in btl/usnic.	2016-06-21 20:11:52 -04:00
Ralph Castain	441739b5a4	Cleanup a lagging message that generates an annoying (but seemingly harmless) warning	2016-06-20 12:23:27 -07:00
Thananon Patinyasakdikul	afe07cd5d5	Fixed common symbol in btl/usnic - This commit fixes the accidental common symbol btl_usnic_lock - It also moves the btl_usnic_lock declaration to btl_usnic.h	2016-06-20 10:05:44 -07:00
Howard Pritchard	1bed9fdb59	Merge pull request #1799 from hppritcha/topic/help_aries_with_knl common/ugni: help out knl with aries	2016-06-20 08:09:24 -06:00
Ralph Castain	0ba02821e6	Add requested key and job-level info	2016-06-19 18:22:31 -07:00
Ralph Castain	0a29f5cb77	Sigh - missed two typos	2016-06-18 20:57:53 -07:00
Ralph Castain	dd38cf1fed	Fix typo	2016-06-18 20:56:43 -07:00
Howard Pritchard	8b53487977	common/ugni: help out knl with aries The way the gni btl is currently coded, it will run completely out of gas on KNL at 123 processes/node. Since there are bound to be those who try to run a MPI process/hyperthread on KNL nodes, the fma sharing mode needs to be requested. Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2016-06-18 15:09:05 -05:00
Ralph Castain	dde69e1be2	Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761. Fixes #1792	2016-06-18 12:28:46 -07:00
Jeff Squyres	7a8d7fb948	openib: fix compiler warnings Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2016-06-18 07:15:11 -07:00
Jeff Squyres	c332ee5884	Merge pull request #1784 from thananon/fix_usnic_thread Fix btl/usnic deadlock when the connectivity check is turned off.	2016-06-17 11:15:14 -04:00
Nathan Hjelm	f59c2fce6b	Merge pull request #1786 from hjelmn/32_fix opal/progress: use 32-bit atomics for call counter	2016-06-17 08:54:41 -06:00
Nathan Hjelm	2e4141f20a	Merge pull request #1787 from hjelmn/asm_fix opal/asm: fix syntax of timer code for ia32	2016-06-17 08:50:57 -06:00
Ralph Castain	044c561cba	Roll to latest PMIx master	2016-06-16 17:30:30 -07:00


4251 commits