Howard Pritchard
477f6cb6a8
Merge pull request #1846 from ggouaillardet/topic/mntent
...
mpool/hugepage: use mntent API instead of manually parsing /proc/mounts
2016-07-31 20:17:37 -06:00
Ralph Castain
16fccd4964
Establish a way for ORTE to tell PMIx the base tmpdir to use, and update PMIx to understand such directives
2016-07-29 09:52:36 -07:00
Ralph Castain
cacb582ecd
Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application
2016-07-28 14:09:06 -07:00
Nathan Hjelm
4658b761e4
rcache/udreg: make reference count thread safe
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-27 13:40:35 -06:00
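A minimal sketch of what a thread-safe reference count can look like, for readers unfamiliar with the pattern named in the commit above. It uses C11 `<stdatomic.h>` as a stand-in; the actual change presumably relies on Open MPI's own atomic wrappers, and `example_reg_t` and the `example_reg_*` functions are hypothetical names that do not appear in rcache/udreg.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical registration handle, not the actual rcache/udreg structure. */
typedef struct {
    atomic_int ref_count;
} example_reg_t;

/* A new registration starts with one reference held by its creator. */
void example_reg_init(example_reg_t *reg)
{
    atomic_init(&reg->ref_count, 1);
}

/* Take an additional reference; returns the new count. */
int example_reg_retain(example_reg_t *reg)
{
    return atomic_fetch_add(&reg->ref_count, 1) + 1;
}

/* Drop a reference; returns true when the caller released the last one
 * and is therefore responsible for freeing the registration. */
bool example_reg_release(example_reg_t *reg)
{
    return atomic_fetch_sub(&reg->ref_count, 1) == 1;
}
```

Because the decrement and the test for "last reference" happen in one atomic operation, two threads releasing at the same time cannot both conclude that they own the final reference.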
Nathan Hjelm
1eb4ef438e
Merge pull request #1903 from hjelmn/openib_fixes
...
btl/openib: set send flags only after endpoint is connected
2016-07-27 09:01:49 -06:00
Howard Pritchard
1dc7e9ed8f
Merge pull request #1904 from hppritcha/topic/fix_cray_srun_native_launch
...
pmix/cray: switch to using wildcards for some
2016-07-27 07:12:02 -06:00
Howard Pritchard
b65bbe017f
pmix/cray: switch to using wildcards for some
...
items so that at least srun native launch on
cray works again.
More issues to fix when using alps.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-07-26 17:07:58 -05:00
Nathan Hjelm
5e13e1ab7d
btl/openib: set send flags only after endpoint is connected
...
The max inline send size on a queue pair is not available until after
the endpoint is connected. Before this commit the send flags
(including the inline flag) were set before this value was
initialized. This commit moves setting the send_flags down to
mca_btl_openib_put_internal which is only called after the endpoint is
connected. This fixes a bug when using osc/rdma.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-26 16:01:11 -06:00
Gilles Gouaillardet
91ccec342c
btl/openib: remove some dead code
...
remove useless call to opal_mem_hooks_support_level() and the value local variable.
2016-07-22 09:26:33 +09:00
Gilles Gouaillardet
1b3be0ac8c
configury + btl/openib: fix a typo
...
test for existence of struct ibv_exp_device_attr.exp_atomic_cap.
That was previously mistyped struct ibv_exp_device_attr.ext_atomic_cap
2016-07-22 09:26:33 +09:00
Ralph Castain
71de03fc67
Cleanup the new naming requirements to ensure that info is correctly retrieved
...
Cleanup permissions
Restore singleton operations
2016-07-21 09:46:03 -07:00
Ralph Castain
2b55ee8118
Cleanup Coverity warnings
2016-07-20 20:31:58 -07:00
Ralph Castain
01a653d50a
Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found".
...
Remove stale file reference
Restore autogen pass thru pmix
Remove generated file
2016-07-20 00:58:19 -07:00
Pascal Deveze
6d6ec66705
btl/portals4: Take into account the limitation of portals4 (max_msg_size)
2016-07-19 15:19:29 +02:00
Nathan Hjelm
03bce91de8
pmix/pmix2x: add missing increment in loop
...
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00
Jeff Squyres
72f41d4490
pmix: replace all tabs with spaces
...
No code or logic changes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Jeff Squyres
1c32742c66
pmix_ext20: fix syntax error
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:04:12 -04:00
Ralph Castain
99f7096031
Fix permissions
2016-07-16 21:03:55 -07:00
Ralph Castain
d4071fbd1c
Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling
2016-07-16 13:20:41 -07:00
Ralph Castain
1ceb35ba5c
Fix singletons - do not include the PMIx tool URI in the environment provided to child processes
2016-07-13 17:33:34 -07:00
Ralph Castain
20a91c2baf
Add a new --continuous flag to mpirun that directs ORTE to let a job continue running as app procs terminate. Don't attempt to restart them. Add event notification of abnormally terminating procs, and demonstrate that in the mpi_spin test program.
...
Cleanup debug message
2016-07-13 15:28:33 -07:00
Artem Polyakov
72585a905f
opal/pmix: add blocking Fence to SLURM components.
...
Blocking fence is used in yalla del proc. Native pmix exposes this functionality.
We need to expose it for SLURM's s1/s2 components as well.
Also this commit fixes uninitialized `rc` in fencenb's of both
components.
2016-07-11 09:43:15 +03:00
Artem Polyakov
8e16f47492
Merge pull request #1688 from artpol84/fix_base64
...
Fix base64 implementation in pmix framework.
2016-07-07 10:47:50 +06:00
Gilles Gouaillardet
1ba7e2b20b
mpool/hugepage: use mntent API instead of manually parsing /proc/mounts
...
Refs open-mpi/ompi#1822
2016-07-06 15:00:19 +09:00
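For context on the commit above, here is a rough sketch of how the mntent API (`setmntent`, `getmntent`, `endmntent` from `<mntent.h>`, available on Linux) can replace hand-written parsing of /proc/mounts. The helper name `example_find_mount` and its signature are assumptions made for illustration; this is not the code from mpool/hugepage.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the first mount of type fstype listed in
 * mounts_file (for example "/proc/mounts") and copy its mount point into
 * dir. Returns 0 on success, -1 if nothing matched or dir is too small. */
int example_find_mount(const char *mounts_file, const char *fstype,
                       char *dir, size_t dir_len)
{
    FILE *fp = setmntent(mounts_file, "r");
    if (NULL == fp) {
        return -1;
    }

    int rc = -1;
    struct mntent *ent;
    /* getmntent() splits each line into device, mount point, type, options */
    while (NULL != (ent = getmntent(fp))) {
        if (0 == strcmp(ent->mnt_type, fstype)) {
            int n = snprintf(dir, dir_len, "%s", ent->mnt_dir);
            if (n >= 0 && (size_t) n < dir_len) {
                rc = 0;
            }
            break;
        }
    }

    endmntent(fp);
    return rc;
}
```

A caller interested in huge pages might invoke `example_find_mount("/proc/mounts", "hugetlbfs", buf, sizeof(buf))`; the library handles field splitting and escaped characters that a manual parser would have to get right on its own.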
Gilles Gouaillardet
acda07472a
configury: revamp and re-indent sub configure.m4 after open-mpi/ompi@846360fd4c
2016-07-06 11:59:51 +09:00
Gilles Gouaillardet
846360fd4c
configury: correctly perform make distclean when {libevent,hwloc,pmix} are external components
...
Thanks Jeff for the guidance
Fixes open-mpi/ompi#1683
note:
in order to keep this commit easy to review, some AS_IF([...]) were replaced with
AS_IF([false], ...) or AS_IF([true], ...)
these will be removed and re-indented in a subsequent commit
2016-07-06 11:57:24 +09:00
Ralph Castain
ee56d9dc1a
Shorten the session directory name, as some OSes are now providing unusually long temp directory names, causing us to overflow the sockaddr field
2016-07-05 14:59:50 -07:00
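To illustrate the kind of limit the commit above runs into: if the "sockaddr field" is the `sun_path` member of `struct sockaddr_un` (an assumption, since the message does not name it), the path of a Unix domain socket must fit in a small fixed-size buffer, typically around 100 bytes. The check below is a sketch with a hypothetical function name, not code from ORTE or PMIx.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Returns true if path, including its terminating NUL, fits in sun_path. */
bool example_fits_sun_path(const char *path)
{
    struct sockaddr_un addr;
    return strlen(path) < sizeof(addr.sun_path);
}
```

A rendezvous file placed under a long temporary directory can fail this check even when every path component looks reasonable, which is why shortening the session directory name helps.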
Ralph Castain
7e0af3f4f0
Update pmix2x to track upstream changes
2016-07-05 11:54:22 -07:00
Gilles Gouaillardet
267821f0dd
pmix2x/pmix: fix a typo in PMIx_tool_init()
...
and remove now useless local variable i
2016-07-05 13:47:50 +09:00
Gilles Gouaillardet
efce8cc734
pmix2x/pmix: add missing include files
...
pmix cannot be built on alpine linux because of some missing includes.
uid_t and gid_t are defined in unistd.h or sys/types.h, and unistd.h
is not indirectly pulled in under alpine linux, so include it manually.
Thanks N.L.K Nguyen for the report
(back-ported from upstream pmix/master@c8d55350a9 )
2016-07-05 09:03:14 +09:00
Ralph Castain
c9ada8e095
Silence Coverity warnings
2016-07-03 20:45:08 -07:00
Ralph Castain
673f82e2b6
Update the PMIx listener to avoid leaking sockets into children, and better handle race condition errors
2016-07-03 08:23:33 -07:00
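The commit above does not say how the listener avoids leaking sockets into children. One common technique, shown here purely as an assumed illustration, is to mark each descriptor close-on-exec with `fcntl` so that it is closed automatically when a child calls `exec`. The function name `example_set_cloexec` is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Mark fd close-on-exec so it is closed automatically in exec'ed children.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int example_set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) ? -1 : 0;
}
```

Reading the current flags first preserves any other descriptor flags that may already be set on the socket.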
Nathan Hjelm
01d6da31af
btl/openib: fix rdmacm locking bug
...
This commit fixes a long-standing bug in rdmacm. It is required that
the thread that calls mca_btl_openib_endpoint_cpc_complete holds the
endpoint lock. This was not the case for rdmacm. This causes debug
builds to abort. This change also required changing
mca_btl_openib_endpoint_send_cts to require the endpoint lock to be
held when calling.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-30 15:50:07 -06:00
Nathan Hjelm
cc2b3e0c3f
Merge pull request #1830 from hjelmn/rdmacm_test
...
Test for rdmacm hang fix
2016-06-30 10:41:46 -06:00
Nathan Hjelm
960fcd292c
btl/openib: fix rdma hang
...
This commit is an attempt to fix a hang in finalize of rdmacm. This fixes
a path where no rdmacm client is found for an endpoint.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-29 20:31:26 -06:00
Ralph Castain
6e434d6785
Add support for PMIx tool connections and queries. Initially only support a request to list all known namespaces (jobids) from ORTE, but other folks will extend that support to include additional information
...
Update to match PMIx RFC
Fix configury to point to correct libevent and hwloc locations
2016-06-29 19:19:19 -07:00
Jeff Squyres
f18d6606da
Merge pull request #1824 from hjelmn/rdmacm_fix
...
btl/openib: fix segmentation fault
2016-06-28 18:10:35 -04:00
Nathan Hjelm
8128c8eb29
btl/openib: fix segmentation fault
...
This commit fixes a segmentation fault that occurs if a device can be
initialized but not used. In this case the devices_count is not equal
to the number of usable devices in the devices pointer array.
Thanks to @artpol84 for tracking this down.
Fixes open-mpi/ompi#1823
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-06-28 10:31:32 -06:00
Nathan Hjelm
dac9201f3b
Merge pull request #1770 from hjelmn/rdma_wth
...
btl/openib: fix rdmacm
2016-06-24 22:46:53 -06:00
Nathan Hjelm
2992d6d238
Merge pull request #1808 from abjoshi-brcm/timer_arm64
...
arm64: add timer support
2016-06-23 07:10:56 -06:00
Abhishek Joshi
f06f7eb3e6
arm64: add timer support
...
Signed-off-by: Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ramesh@broadcom.com>
2016-06-23 11:01:00 +00:00
Ralph Castain
08b1438f15
Add missing PMIx range value so OPAL and PMIx align again
2016-06-22 22:03:25 -07:00
Gilles Gouaillardet
bf133c401e
pmix2x: fix a typo in dereg_event_hdlr()
...
This bug was fixed when open-mpi/ompi@dde69e1be2 was backported into upstream pmix in pmix/master@5e5577778c,
but it was not fixed in open-mpi/ompi
2016-06-22 13:45:29 +09:00
Jeff Squyres
af614afedf
Merge pull request #1800 from thananon/common_sym_fix
...
Fixed common symbol error in btl/usnic.
2016-06-21 20:11:52 -04:00
Ralph Castain
441739b5a4
Cleanup a lagging message that generates an annoying (but seemingly harmless) warning
2016-06-20 12:23:27 -07:00
Thananon Patinyasakdikul
afe07cd5d5
Fixed common symbol in btl/usnic
...
- This commit fixes the accidental common symbol btl_usnic_lock
- It also moves the btl_usnic_lock declaration to btl_usnic.h
2016-06-20 10:05:44 -07:00
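As background for the commit above: a global variable defined in a header without `extern` and without an initializer is a tentative definition, and toolchains that allow common symbols emit it as a "common" symbol in every object file that includes the header. The usual remedy is to declare the variable `extern` in the header and define it in exactly one source file. The sketch below folds both files into one block and uses a hypothetical `int` counter; the real `btl_usnic_lock` is a lock object, not an integer.

```c
/* example_lock.h (hypothetical header): declaration only, no storage. */
extern int example_lock_depth;

/* example_lock.c (hypothetical source): the single definition. The explicit
 * initializer also means this is not a tentative definition, so it is not
 * emitted as a common symbol. */
int example_lock_depth = 0;

/* Every file that includes the header refers to the one object above. */
int example_lock_enter(void)
{
    return ++example_lock_depth;
}
```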
Howard Pritchard
1bed9fdb59
Merge pull request #1799 from hppritcha/topic/help_aries_with_knl
...
common/ugni: help out knl with aries
2016-06-20 08:09:24 -06:00
Ralph Castain
0ba02821e6
Add requested key and job-level info
2016-06-19 18:22:31 -07:00
Ralph Castain
0a29f5cb77
Sigh - missed two typos
2016-06-18 20:57:53 -07:00
Ralph Castain
dd38cf1fed
Fix typo
2016-06-18 20:56:43 -07:00