Ralph Castain
7da9793fef
Support the PMIX_TIMEOUT key at the PMIx server when timeout=0; this indicates that the user does not want any data looked up from the host RM.
2016-08-17 16:26:58 -05:00
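A minimal sketch of the lookup rule described above, assuming a server that first checks its own store and only otherwise asks the host RM. All names (example_server_get, example_ask_host, the EX_* codes) are hypothetical and are not the PMIx API; only the "timeout=0 means do not consult the host RM" behavior comes from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; not the real PMIx values. */
enum { EX_SUCCESS = 0, EX_ERR_NOT_FOUND = -1 };

/* Hypothetical locally cached key/value pair. */
typedef struct {
    const char *key;
    int value;
} example_kv_t;

/* Counts how often the (hypothetical) host RM was queried. */
static int example_host_queries = 0;

/* Hypothetical stand-in for an upcall to the host resource manager. */
static int example_ask_host(const char *key, int *value) {
    (void)key;
    example_host_queries++;
    *value = 42; /* pretend the RM always knows the answer */
    return EX_SUCCESS;
}

/* Look up a key locally; fall back to the host RM unless timeout == 0. */
int example_server_get(const example_kv_t *local, size_t nlocal,
                       const char *key, int timeout, int *value) {
    for (size_t i = 0; i < nlocal; i++) {
        if (strcmp(local[i].key, key) == 0) {
            *value = local[i].value;
            return EX_SUCCESS;
        }
    }
    /* timeout == 0: the caller does not want the host RM consulted. */
    if (timeout == 0) {
        return EX_ERR_NOT_FOUND;
    }
    return example_ask_host(key, value);
}
```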
Gilles Gouaillardet
3126ff77e2
pmix2x: common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-08-16 11:29:06 +09:00
Ralph Castain
ecbedee8bb
Fix typo
2016-08-15 07:32:00 -07:00
Gilles Gouaillardet
483685eb6a
update .gitignore
Remove the autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in
2016-08-15 17:00:20 +09:00
Ralph Castain
be8424b691
Provide backward-compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
Add a contrib script to clean up permissions incorrectly modified by things like SMB mounts
2016-08-13 12:13:04 -07:00
rhc54
ddde154d28
Merge pull request #1962 from rhc54/topic/notify
Ensure we properly convert pmix status to ORTE state before activatin…
2016-08-13 06:59:50 -07:00
Ralph Castain
48d35a9627
Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Clean up some conversion issues in notification info. Add a new orte_notify.c test program.
2016-08-12 21:14:29 -07:00
Ralph Castain
4a4c9703a9
Set up the job list in the PMIx integration so that static ports can run
2016-08-12 13:27:10 -07:00
Ralph Castain
1d44f0c0e2
Silence Coverity warnings
2016-08-11 21:22:01 -07:00
Ralph Castain
73544d2e00
Rename symbol
2016-08-11 13:06:46 -07:00
Ralph Castain
b0cc9b0bc8
Update to the latest PMIx toolext branch
Fix indentation
Update the ext20 component to match the latest PMIx master.
Clean up name conflicts and uninitialized variables
2016-08-11 12:29:48 -07:00
rhc54
60f789dca1
Merge pull request #1948 from rhc54/topic/pmixtool
Update to include extended tool support, new datatypes
2016-08-09 16:17:28 -07:00
Nathan Hjelm
19be439998
Merge pull request #1949 from hjelmn/ugni_fix
btl/ugni: fix another connection race
2016-08-09 08:32:40 -06:00
Gilles Gouaillardet
6f6b3ac68a
configury: standardize memory/patcher symbol detection and make it more robust
By default, Sun compilers optimize out the original test and hence fail to detect that a symbol is missing.
2016-08-09 09:35:52 +09:00
Nathan Hjelm
adb668209b
btl/ugni: fix another connection race
This commit fixes a race that can occur when two threads are in the
ugni progress function at the same time. The race occurs when one
thread calls GNI_PostDataProbeById and then goes to sleep, and another
thread calls GNI_PostDataProbeById and then GNI_EpPostDataWaitById before
the first thread wakes up. If this happens, the first thread prints
a warning from GNI_EpPostDataWaitById about no matching post.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-08 15:38:11 -06:00
Ralph Castain
527b5c692a
Update to include extended tool support, new datatypes
2016-08-08 13:39:46 -07:00
Todd Kordenbrock
b90da992c8
Merge pull request #1895 from PDeveze/Patchs-on-btl-portals4
btl/portals4: Take into account the limitation of portals4 (max_msg_s…
2016-08-08 15:12:50 -05:00
Nathan Hjelm
5ced037488
Merge pull request #1939 from hjelmn/ugni_fix
btl/ugni: protect against re-entry and races in connections
2016-08-08 08:55:30 -06:00
Artem Polyakov
b24ec3e3b9
pmix/s2: fix indentation (only)
2016-08-06 16:31:19 +06:00
Artem Polyakov
2cb923a413
pmix/s1: fix indentation (only)
2016-08-06 16:30:45 +06:00
Artem Polyakov
8aa3ef7799
pmix/s2: fix s2 component data placement
Use a wildcard for the information related to job-level data.
Fixes the s2 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.
2016-08-06 15:49:16 +06:00
Artem Polyakov
81063f1717
pmix/s1: fix s1 component data placement
Use a wildcard for the information related to job-level data.
Fixes the s1 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.
2016-08-06 15:45:46 +06:00
Nathan Hjelm
14b36d4503
btl/ugni: protect against re-entry and races in connections
This commit fixes two issues that can occur during a connection:
- Re-entry into connection progress from the modex lookup. Added an
additional endpoint state that keeps the code from re-entering
the common endpoint create.
- Fixed a race between a process posting a directed datagram through
a send and a connection being progressed through opal_progress().
The progress code was not obtaining the endpoint lock before
attempting to update the endpoint. To limit the amount of code
changed for 2.0.1, this commit makes the endpoint lock recursive. In
a future update this may be changed.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-04 16:08:01 -06:00
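A sketch of the recursive-lock approach using POSIX threads. The endpoint type and functions are hypothetical stand-ins, not the btl/ugni code: the point is that a thread already holding a PTHREAD_MUTEX_RECURSIVE lock can re-acquire it from the progress path without deadlocking.

```c
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_RECURSIVE on strict builds */
#include <pthread.h>

/* Hypothetical endpoint; the real btl/ugni endpoint has many more fields. */
typedef struct {
    pthread_mutex_t lock;
    int state;
} example_endpoint_t;

/* Initialize the endpoint with a recursive mutex; returns 0 on success. */
int example_endpoint_init(example_endpoint_t *ep) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&ep->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    ep->state = 0;
    return rc;
}

/* Progress path: takes the endpoint lock before touching endpoint state. */
static void example_progress(example_endpoint_t *ep) {
    pthread_mutex_lock(&ep->lock);
    ep->state++;
    pthread_mutex_unlock(&ep->lock);
}

/* Send path: already holds the lock when it drives progress. With a normal
 * (non-recursive) mutex the nested lock above would deadlock. */
int example_send(example_endpoint_t *ep) {
    pthread_mutex_lock(&ep->lock);
    example_progress(ep);
    ep->state++;
    int snapshot = ep->state;
    pthread_mutex_unlock(&ep->lock);
    return snapshot;
}
```

The trade-off noted in the commit applies here too: a recursive lock is the smallest change that makes re-entry safe, but it can hide lock-ordering problems that a restructured progress path would avoid.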
Jeff Squyres
c42d8867e6
Merge pull request #1925 from jsquyres/pr/warnings-fixes
hwloc: fix Valgrind warning
2016-08-04 08:48:50 -07:00
Jeff Squyres
36555b7a1d
Merge pull request #1933 from thananon/fix_random
Make libevent use internal random
2016-08-04 08:27:56 -07:00
Boris Karasev
9d6a4b3b2d
configury/libevent: fix incorrect drop of OPAL_HAVE_WORKING_EVENTOPS
Fixes PR https://github.com/open-mpi/ompi/pull/1687
The code that sets OPAL_HAVE_WORKING_EVENTOPS for the internal libevent
was executed even if the external libevent component was configured.
As a result, libevent progress was not called in opal_progress, which,
for example, caused ring_c to hang when pml/ob1 was used.
2016-08-04 16:37:37 +06:00
Gilles Gouaillardet
30f98cd9d0
pmix: redefine OPAL_PMIX_ARCH macro
Architecture is set by the ompi layer *after* job startup, so the key cannot
have the "pmix" prefix: because of the optimizations in open-mpi/ompi@01a653d50a,
the architecture could not be retrieved otherwise.
2016-08-04 13:31:28 +09:00
Thananon Patinyasakdikul
b3e9dadff2
libevent: use opal_random() instead of rand(3)
This commit changes rand(3) and related functions in libevent to use the internal
random function provided by opal, to avoid perturbing the user's random seed.
Fixes open-mpi/ompi#1877
2016-08-03 09:18:12 -07:00
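A sketch of why a private generator avoids perturbing the user's seed: it keeps its own state instead of sharing the hidden state behind rand(3)/srand(3). This uses a standard xorshift32 generator purely for illustration; it is not a claim about how opal_random() is implemented, and the example_rng_* names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical private generator state, independent of rand(3). */
typedef struct {
    uint32_t s;
} example_rng_t;

/* Seed the generator; xorshift state must never be zero. */
void example_rng_seed(example_rng_t *r, uint32_t seed) {
    r->s = seed ? seed : 0x9e3779b9u;
}

/* Marsaglia xorshift32 step: only this struct's state changes, so a
 * user's srand()/rand() sequence is left untouched. */
uint32_t example_rng_next(example_rng_t *r) {
    uint32_t x = r->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->s = x;
    return x;
}
```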
Howard Pritchard
08266a1a56
mpool/hugepage mntent intro fallout
On Cray, PR #1846 introduced a double free
which led to all kinds of random memory
corruption problems.
This commit fixes the problem.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-02 05:52:31 -05:00
Jeff Squyres
7bea563e02
hwloc: fix Valgrind warning
Cherry picked from open-mpi/hwloc@d4565c351e
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-01 18:50:40 -07:00
Gilles Gouaillardet
21e7f31dbe
pmix2x: fix unpack sequence in PMIx_Get callback
First unpack the nspace (PMIX_STRING), then the various keys (PMIX_KVAL).
2016-08-01 14:21:22 +09:00
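The fix is about ordering: the receiver must unpack fields in the same order the sender packed them. A sketch under an assumed, hypothetical wire format (each string is a 4-byte native-endian length followed by its bytes); the real PMIx buffers carry typed values such as PMIX_KVAL, and the example_* names are not PMIx functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical read cursor over a packed buffer. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t off;
} example_cursor_t;

/* Append a length-prefixed string at offset off; returns the new end offset. */
size_t example_pack_string(uint8_t *buf, size_t off, const char *s) {
    uint32_t n = (uint32_t)strlen(s);
    memcpy(buf + off, &n, sizeof n);
    memcpy(buf + off + sizeof n, s, n);
    return off + sizeof n + n;
}

/* Read the next string into out (NUL-terminated). Returns 0 on success,
 * -1 if the buffer is exhausted, truncated, or out is too small. */
int example_unpack_string(example_cursor_t *c, char *out, size_t outsz) {
    uint32_t n;
    if (c->len - c->off < sizeof n) {
        return -1;
    }
    memcpy(&n, c->buf + c->off, sizeof n);
    if (c->len - c->off - sizeof n < n || n >= outsz) {
        return -1;
    }
    memcpy(out, c->buf + c->off + sizeof n, n);
    out[n] = '\0';
    c->off += sizeof n + n;
    return 0;
}
```

Usage: if the sender packs the nspace first and then the keys, the first example_unpack_string call must be read as the nspace; reading it as a key would silently misinterpret every later field.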
Howard Pritchard
477f6cb6a8
Merge pull request #1846 from ggouaillardet/topic/mntent
mpool/hugepage: use the mntent API instead of manually parsing /proc/mounts
2016-07-31 20:17:37 -06:00
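A minimal sketch of reading a mount table through the mntent API (setmntent/getmntent/endmntent from <mntent.h>, available on glibc-based systems) instead of parsing /proc/mounts by hand. The example_count_mounts helper is hypothetical; it only counts entries of a given filesystem type, e.g. hugetlbfs, and is not the mpool/hugepage code.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Count entries in a mount table (e.g. "/proc/mounts") whose filesystem
 * type equals fstype. Returns -1 if the table cannot be opened. */
int example_count_mounts(const char *table, const char *fstype) {
    FILE *fp = setmntent(table, "r");
    if (fp == NULL) {
        return -1;
    }
    int count = 0;
    struct mntent *ent;
    /* getmntent() handles field splitting and escape sequences for us. */
    while ((ent = getmntent(fp)) != NULL) {
        if (strcmp(ent->mnt_type, fstype) == 0) {
            count++;
        }
    }
    endmntent(fp);
    return count;
}
```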
Ralph Castain
16fccd4964
Establish a way for ORTE to tell PMIx the base tmpdir to use, and update PMIx to understand such directives
2016-07-29 09:52:36 -07:00
Ralph Castain
cacb582ecd
Support timeout values when performing connect/accept operations. Bump the default timeout to 10 minutes so folks have time to start the partner application
2016-07-28 14:09:06 -07:00
Nathan Hjelm
4658b761e4
rcache/udreg: make reference count thread safe
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-27 13:40:35 -06:00
Nathan Hjelm
1eb4ef438e
Merge pull request #1903 from hjelmn/openib_fixes
btl/openib: set send flags only after endpoint is connected
2016-07-27 09:01:49 -06:00
Howard Pritchard
1dc7e9ed8f
Merge pull request #1904 from hppritcha/topic/fix_cray_srun_native_launch
pmix/cray: switch to using wildcards for some
2016-07-27 07:12:02 -06:00
Howard Pritchard
b65bbe017f
pmix/cray: switch to using wildcards for some
items so that at least srun native launch on
Cray works again.
There are more issues to fix when using ALPS.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-07-26 17:07:58 -05:00
Nathan Hjelm
5e13e1ab7d
btl/openib: set send flags only after endpoint is connected
The max inline send size on a queue pair is not available until after
the endpoint is connected. Before this commit, the send flags
(including the inline flag) were set before this value was
initialized. This commit moves setting the send_flags down to
mca_btl_openib_put_internal, which is only called after the endpoint is
connected. This fixes a bug when using osc/rdma.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-26 16:01:11 -06:00
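A sketch of the ordering constraint, with hypothetical stand-in flag values and endpoint type (libibverbs' real flags are IBV_SEND_SIGNALED and IBV_SEND_INLINE; their values are not assumed here). The inline flag is only chosen when the flags are computed at put time on a connected endpoint, where max_inline_data is known.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for IBV_SEND_SIGNALED / IBV_SEND_INLINE. */
enum { EX_SEND_SIGNALED = 1u << 1, EX_SEND_INLINE = 1u << 3 };

/* Hypothetical endpoint: max_inline_data is only valid once connected. */
typedef struct {
    int connected;
    uint32_t max_inline_data;
} example_qp_endpoint_t;

/* Compute send flags at put time rather than at endpoint creation. */
unsigned example_put_send_flags(const example_qp_endpoint_t *ep, size_t size) {
    unsigned flags = EX_SEND_SIGNALED;
    /* Before connection max_inline_data is not yet known, so never inline. */
    if (ep->connected && size <= ep->max_inline_data) {
        flags |= EX_SEND_INLINE;
    }
    return flags;
}
```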
Gilles Gouaillardet
91ccec342c
btl/openib: remove some dead code
Remove a useless call to opal_mem_hooks_support_level() and the local variable "value".
2016-07-22 09:26:33 +09:00
Gilles Gouaillardet
1b3be0ac8c
configury + btl/openib: fix a typo
Test for the existence of struct ibv_exp_device_attr.exp_atomic_cap,
which was previously mistyped as struct ibv_exp_device_attr.ext_atomic_cap.
2016-07-22 09:26:33 +09:00
Ralph Castain
71de03fc67
Clean up the new naming requirements to ensure that info is correctly retrieved
Clean up permissions
Restore singleton operations
2016-07-21 09:46:03 -07:00
Ralph Castain
2b55ee8118
Clean up Coverity warnings
2016-07-20 20:31:58 -07:00
Ralph Castain
01a653d50a
Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic, which improves performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang or result in "not found".
Remove stale file reference
Restore autogen pass-through for pmix
2016-07-20 00:58:19 -07:00
Pascal Deveze
6d6ec66705
btl/portals4: Take into account the limitation of portals4 (max_msg_size)
2016-07-19 15:19:29 +02:00
Nathan Hjelm
03bce91de8
pmix/pmix2x: add missing increment in loop
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00
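A sketch of the bug class, using a hypothetical helper rather than the pmix2x client code: a while loop whose index must advance exactly once per iteration. Omitting the increment spins forever on the same element, and a second increment silently skips elements.

```c
#include <stddef.h>

/* Hypothetical helper: count entries in ids[] equal to target. */
size_t example_count_matches(const int *ids, size_t n, int target) {
    size_t i = 0;
    size_t count = 0;
    while (i < n) {
        if (ids[i] == target) {
            count++;
            /* A second "i++" here would be a double increment. */
        }
        i++; /* exactly one increment per iteration; omitting it hangs */
    }
    return count;
}
```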
Jeff Squyres
72f41d4490
pmix: replace all tabs with spaces
No code or logic changes
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:08:33 -04:00
Jeff Squyres
1c32742c66
pmix_ext20: fix syntax error
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-07-17 15:04:12 -04:00
Ralph Castain
99f7096031
Fix permissions
2016-07-16 21:03:55 -07:00
Ralph Castain
d4071fbd1c
Fix dynamic operations by ensuring that we only fire the debugger release if the debugger is attached, and that the OPAL pmix key for directing events to non-default handlers matches the PMIx spelling
2016-07-16 13:20:41 -07:00