Ralph Castain
71de03fc67
Cleanup the new naming requirements to ensure that info is correctly retrieved
...
Cleanup permissions
Restore singleton operations
2016-07-21 09:46:03 -07:00
Ralph Castain
2b55ee8118
Cleanup Coverity warnings
2016-07-20 20:31:58 -07:00
Ralph Castain
01a653d50a
Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found".
...
Remove stale file reference
Restore autogen pass thru pmix
Remove generated file
2016-07-20 00:58:19 -07:00
Nathan Hjelm
03bce91de8
pmix/pmix2x: add missing increment in loop
...
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00
Jeff Squyres
72f41d4490
pmix: replace all tabs with spaces
...
No code or logic changes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Jeff Squyres
1c32742c66
pmix_ext20: fix syntax error
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:04:12 -04:00
Ralph Castain
99f7096031
Fix permissions
2016-07-16 21:03:55 -07:00
Ralph Castain
d4071fbd1c
Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling
2016-07-16 13:20:41 -07:00
Ralph Castain
1ceb35ba5c
Fix singletons - do not include the PMIx tool URI in the environment provided to child processes
2016-07-13 17:33:34 -07:00
Ralph Castain
20a91c2baf
Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
...
Cleanup debug message
2016-07-13 15:28:33 -07:00
Artem Polyakov
72585a905f
opal/pmix: add blocking Fence to SLURM components.
...
Blocking fence is used in yalla del proc. Native pmix exposes this functionality.
We need to expose it for SLURM's s1/s2 components as well.
Also this commit fixes uninitialized `rc` in fencenb's of both
components.
2016-07-11 09:43:15 +03:00
Artem Polyakov
8e16f47492
Merge pull request #1688 from artpol84/fix_base64
...
Fix base64 implementation in pmix framework.
2016-07-07 10:47:50 +06:00
Gilles Gouaillardet
acda07472a
configury: revamp and re-ident sub configure.m4 after open-mpi/ompi@846360fd4c
2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c
configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
...
Thanks Jeff for the guidance
Fixes open-mpi/ompi#1683
note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF_([true], ...)
these will be removed and re-idented in a subsequent commit
2016-07-06 11:57:24 +09:00
Ralph Castain
ee56d9dc1a
Shorten the session directory name as some OS's are now providing unusually long temp directory names, causing us to overflow the sockaddr field
2016-07-05 14:59:50 -07:00
Ralph Castain
7e0af3f4f0
Update pmix2x to track upstream changes
2016-07-05 11:54:22 -07:00
Gilles Gouaillardet
267821f0dd
pmix2x/pmix: fix a typo in PMIx_tool_init()
...
and remove now useless local variable i
2016-07-05 13:47:50 +09:00
Gilles Gouaillardet
efce8cc734
pmix2x/pmix: add missing include files
...
pmix cannot be built on alpine linux because of some missing includes.
uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h
is not indirectly pulled under alpine linux, so do it manually.
Thanks N.L.K Nguyen for the report
(back-ported from upstream pmix/master@c8d55350a9 )
2016-07-05 09:03:14 +09:00
Ralph Castain
c9ada8e095
Silence Coverity warnings
2016-07-03 20:45:08 -07:00
Ralph Castain
673f82e2b6
Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors
2016-07-03 08:23:33 -07:00
Nathan Hjelm
01d6da31af
btl/openib: fix rdmacm locking bug
...
This commit fixes a long standing bug in rdmacm. It is required that
the thread that calls mca_btl_openib_endpoint_cpc_complete holds the
endpoint lock. This was not the case for rdmacm. This causes debug
builds to abort. This change also required changing
mca_btl_openib_endpoint_send_cts to require the endpoint lock to be
held when calling.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-30 15:50:07 -06:00
Nathan Hjelm
cc2b3e0c3f
Merge pull request #1830 from hjelmn/rdmacm_test
...
Test for rdmacm hang fix
2016-06-30 10:41:46 -06:00
Nathan Hjelm
960fcd292c
btl/openib: fix rdma hang
...
This commit is an attempt to fix a hang in finalize of rdmacm. This fixes
a path where no rdmacm client is found for an endpoint.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-29 20:31:26 -06:00
Ralph Castain
6e434d6785
Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information
...
Update to match PMIx RFC
Fix configury to point to correct libevent and hwloc locations
2016-06-29 19:19:19 -07:00
Jeff Squyres
f18d6606da
Merge pull request #1824 from hjelmn/rdmacm_fix
...
btl/openib: fix segmentation fault
2016-06-28 18:10:35 -04:00
Nathan Hjelm
8128c8eb29
btl/openib: fix segmentation fault
...
This commit fixes a segmentation fault that occurs if a device can be
initialized but not used. In this case the devices_count is not equal
to the number of usable devices in the devices pointer array.
Thanks to @artpol84 for tracking this down.
Fixes open-mpi/ompi#1823
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-28 10:31:32 -06:00
Nathan Hjelm
dac9201f3b
Merge pull request #1770 from hjelmn/rdma_wth
...
btl/openib: fix rdmacm
2016-06-24 22:46:53 -06:00
Nathan Hjelm
2992d6d238
Merge pull request #1808 from abjoshi-brcm/timer_arm64
...
arm64: add timer support
2016-06-23 07:10:56 -06:00
Abhishek Joshi
f06f7eb3e6
arm64: add timer support
...
Signed-off-by: Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ramesh@broadcom.com>
2016-06-23 11:01:00 +00:00
Ralph Castain
08b1438f15
Add missing PMIx range value so OPAL and PMIx align again
2016-06-22 22:03:25 -07:00
Gilles Gouaillardet
bf133c401e
pmix2x: fix a typo in dereg_event_hdlr()
...
This bug has been fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c
but it was not fixed in open-mpi/ompi
2016-06-22 13:45:29 +09:00
Jeff Squyres
af614afedf
Merge pull request #1800 from thananon/common_sym_fix
...
Fixed common symbol error in btl/usnic.
2016-06-21 20:11:52 -04:00
Ralph Castain
441739b5a4
Cleanup a lagging message that generates an annoying (but seemingly harmless) warning
2016-06-20 12:23:27 -07:00
Thananon Patinyasakdikul
afe07cd5d5
Fixed common symbol in btl/usnic
...
- This commit fixes the accidental common symbol btl_usnic_lock
- It also moves the btl_usnic_lock declaration to btl_usnic.h
2016-06-20 10:05:44 -07:00
Howard Pritchard
1bed9fdb59
Merge pull request #1799 from hppritcha/topic/help_aries_with_knl
...
common/ugni: help out knl with aries
2016-06-20 08:09:24 -06:00
Ralph Castain
0ba02821e6
Add requested key and job-level info
2016-06-19 18:22:31 -07:00
Ralph Castain
0a29f5cb77
Sigh - missed two typos
2016-06-18 20:57:53 -07:00
Ralph Castain
dd38cf1fed
Fix typo
2016-06-18 20:56:43 -07:00
Howard Pritchard
8b53487977
common/ugni: help out knl with aries
...
The way the gni btl is currently coded,
it will run completely out of gas on KNL at
123 processes/node. Since there are bound to be
those who try to run a MPI process/hyperthread
on KNL nodes, the fma sharing mode needs to be requested.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-06-18 15:09:05 -05:00
Ralph Castain
dde69e1be2
Cleanup CIDs 1362763, 1362762, 1362760, 1362759, 1362758, 1362757, 1362756, 1362755, 1362754. Unsure how to resolve 1362761.
...
Fixes #1792
2016-06-18 12:28:46 -07:00
Jeff Squyres
7a8d7fb948
openib: fix compiler warnings
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-06-18 07:15:11 -07:00
Jeff Squyres
c332ee5884
Merge pull request #1784 from thananon/fix_usnic_thread
...
Fix btl/usnic deadlock when the connectivity check is turned off.
2016-06-17 11:15:14 -04:00
Nathan Hjelm
f59c2fce6b
Merge pull request #1786 from hjelmn/32_fix
...
opal/progress: use 32-bit atomics for call counter
2016-06-17 08:54:41 -06:00
Ralph Castain
044c561cba
Roll to latest PMIx master
2016-06-16 17:30:30 -07:00
rhc54
702a982271
Merge pull request #1767 from rhc54/topic/pmix2
...
Enable the PMIx event notification capability
2016-06-16 15:27:43 -07:00
Nathan Hjelm
7349ddc937
patcher/overwrite: use OPAL_ASSEMBLY_ARCH to determine architecture
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-16 10:00:00 -06:00
Thananon Patinyasakdikul
7bd18214a7
Fix btl/usnic deadlock when the connectivity check is turned off.
2016-06-15 07:42:55 -07:00
Jeff Squyres
b7e937fea5
Merge pull request #1778 from thananon/usnic_thread_safe
...
Added MPI_THREAD_MULTIPLE support for btl/usnic.
2016-06-14 18:43:04 -04:00
Ralph Castain
5d330d5220
Enable the PMIx event notification capability and use that for all error notifications, including debugger release. This capability requires use of PMIx 2.0 or above as the features are not available with earlier PMIx releases. When OMPI master is built against an earlier external version, it will fallback to the prior behavior - i.e., debugger will be released via RML and all notifications will go strictly to the default error handler.
...
Add PMIx 2.0
Remove PMIx 1.1.4
Cleanup copying of component
Add missing file
Touchup a typo in the Makefile.am
Update the pmix ext114 component
Minor cleanups and resync to master
Update to latest PMIx 2.x
Update to the PMIx event notification branch latest changes
2016-06-14 13:08:41 -07:00
Thananon Patinyasakdikul
ee85204c12
Added MPI_THREAD_MULTIPLE support for btl/usnic.
2016-06-13 13:47:06 -07:00