Commit graph

4300 commits

Author SHA1 Message Date
Ralph Castain
48d35a9627 Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Cleanup some conversion issues on notification info. Add a new orte_notify.c test program 2016-08-12 21:14:29 -07:00
Ralph Castain
0e58609327 Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those. 2016-08-12 11:28:57 -07:00
Ralph Castain
1d44f0c0e2 Silence Coverity warnings 2016-08-11 21:22:01 -07:00
Ralph Castain
73544d2e00 Rename symbol 2016-08-11 13:06:46 -07:00
Ralph Castain
b0cc9b0bc8 Update to latest PMIx toolext branch
Fix indentations

Update the ext20 component to match latest PMIx master.

Cleanup name conflicts and uninit vars
2016-08-11 12:29:48 -07:00
Gilles Gouaillardet
dfbf2b7be4 opal/threads: add OPAL_THREAD_SUB_SIZE_T macro
-1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1),
simply use OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy
2016-08-10 13:37:36 +09:00
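The point of the commit above can be illustrated with a small sketch: decrementing an unsigned counter through a dedicated subtract helper avoids passing -1 where a size_t is expected. The names `sketch_thread_add_size_t` and `sketch_thread_sub_size_t` are hypothetical stand-ins built on C11 atomics; they are not the actual OPAL macro definitions, which are not shown in this log.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an "add" helper on a size_t counter.
 * Returns the value after the addition. */
static inline size_t sketch_thread_add_size_t(atomic_size_t *addr, size_t delta)
{
    assert(addr != NULL);
    /* atomic_fetch_add returns the previous value, so add delta again */
    return atomic_fetch_add(addr, delta) + delta;
}

/* Hypothetical stand-in for a "sub" helper on a size_t counter.
 * The caller passes a positive delta, so no negative literal is ever
 * converted to size_t. Returns the value after the subtraction. */
static inline size_t sketch_thread_sub_size_t(atomic_size_t *addr, size_t delta)
{
    assert(addr != NULL);
    return atomic_fetch_sub(addr, delta) - delta;
}
```

With helpers of this shape, a call site would write `sketch_thread_sub_size_t(&counter, 1)` rather than adding `(size_t)-1`, which is what strict compilers tend to warn about.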
rhc54
60f789dca1 Merge pull request #1948 from rhc54/topic/pmixtool
Update to include extended tool support, new datatypes
2016-08-09 16:17:28 -07:00
Nathan Hjelm
19be439998 Merge pull request #1949 from hjelmn/ugni_fix
btl/ugni: fix another connection race
2016-08-09 08:32:40 -06:00
Nathan Hjelm
38f18eed22 Merge pull request #1941 from ggouaillardet/topic/memory_patcher_configury
configury: make memory/patcher symbol detection more robust
2016-08-09 07:06:38 -06:00
Gilles Gouaillardet
13009aa290 opal/alfg: have opal_random() wrapper always return a positive int 2016-08-09 17:12:30 +09:00
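One way a wrapper can guarantee a non-negative int from a wider raw random value is to mask off everything above INT_MAX. This is only a sketch of that idea; `sketch_to_nonneg_int` is a hypothetical name, and the log does not show how opal_random() actually does it.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: reduce a raw 64-bit generator output to an int
 * in the range [0, INT_MAX] by keeping only the bits that fit. */
static inline int sketch_to_nonneg_int(uint64_t raw)
{
    int value = (int)(raw & (uint64_t)INT_MAX);
    assert(value >= 0); /* the mask clears the sign bit and above */
    return value;
}
```

Masking keeps the low bits unchanged, so the result stays uniformly distributed over the reduced range as long as the raw bits are.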
Gilles Gouaillardet
6f6b3ac68a configury: standardize memory/patcher symbol detection and make it more robust
By default, Sun compilers optimize out the original test and hence fail to detect that a symbol is missing.
2016-08-09 09:35:52 +09:00
Nathan Hjelm
adb668209b btl/ugni: fix another connection race
This commit fixes a race that can occur when two threads are in the
ugni progress function at the same time. This race occurs when one
thread calls GNI_PostDataProbeById then goes to sleep then another
thread calls GNI_PostDataProbeById then GNI_EpPostDataWaitById before
the other thread wakes up. If this happens the first thread will print
a warning on GNI_EpPostDataWaitById about no matching post.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-08 15:38:11 -06:00
Ralph Castain
527b5c692a Update to include extended tool support, new datatypes 2016-08-08 13:39:46 -07:00
Todd Kordenbrock
b90da992c8 Merge pull request #1895 from PDeveze/Patchs-on-btl-portals4
btl/portals4: Take into account the limitation of portals4 (max_msg_s…
2016-08-08 15:12:50 -05:00
Nathan Hjelm
5ced037488 Merge pull request #1939 from hjelmn/ugni_fix
btl/ugni: protect against re-entry and races in connections
2016-08-08 08:55:30 -06:00
Artem Polyakov
b24ec3e3b9 pmix/s2: fix indentation (only) 2016-08-06 16:31:19 +06:00
Artem Polyakov
2cb923a413 pmix/s1: fix indentation (only) 2016-08-06 16:30:45 +06:00
Artem Polyakov
8aa3ef7799 pmix/s2: fix s2 component data placement
Use wildcard for the information related to the job-level data.
Fixes s2 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.
2016-08-06 15:49:16 +06:00
Artem Polyakov
81063f1717 pmix/s1: fix s1 component data placement
Use wildcard for the information related to the job-level data.
Fixes s1 component with regard to PR https://github.com/open-mpi/ompi/pull/1897.
2016-08-06 15:45:46 +06:00
Nathan Hjelm
14b36d4503 btl/ugni: protect against re-entry and races in connections
This commit fixes two issues that can occur during a connection:

 - Re-entry to connection progress from modex lookup. Added an
   additional endpoint state that will keep the code from re-entering
   the common endpoint create.

 - Fixed a race between a process posting a directed datagram through
   a send and a connection being progressed through opal_progress().
   The progress code was not obtaining the endpoint lock before
   attempting to update the endpoint. To limit the amount of code
   changed for 2.0.1 this commit makes the endpoint lock recursive. In
   a future update this may be changed.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-04 16:08:01 -06:00
Jeff Squyres
c42d8867e6 Merge pull request #1925 from jsquyres/pr/warnings-fixes
hwloc: fix Valgrind warning
2016-08-04 08:48:50 -07:00
Jeff Squyres
36555b7a1d Merge pull request #1933 from thananon/fix_random
Make libevent use internal random
2016-08-04 08:27:56 -07:00
Boris Karasev
9d6a4b3b2d configury/libevent: fix incorrect drop of OPAL_HAVE_WORKING_EVENTOPS
Fixes PR https://github.com/open-mpi/ompi/pull/1687
The code that sets OPAL_HAVE_WORKING_EVENTOPS for internal libevent
was executed even if the external libevent component was configured.

As the result libevent progress wasn't called in opal_progress which
for example caused ring_c to hang when pml/ob1 was used.
2016-08-04 16:37:37 +06:00
Gilles Gouaillardet
30f98cd9d0 pmix: redefine OPAL_PMIX_ARCH macro
Architecture is set by the ompi layer *after* job startup, so the key cannot
have the "pmix" prefix: since the optimizations in open-mpi/ompi@01a653d50a,
the architecture cannot otherwise be retrieved
2016-08-04 13:31:28 +09:00
Thananon Patinyasakdikul
b3e9dadff2 libevent: use opal_random() instead of rand(3)
This commit changes rand(3) and family in libevent to use the internal
random function provided in opal, to avoid perturbing the user's random seed.

Fixes open-mpi/ompi#1877
2016-08-03 09:18:12 -07:00
Howard Pritchard
08266a1a56 mpool/hugepage mntent intro fallout
On Cray, PR #1846 introduced a double free
situation which led to all kinds of random memory
corruption problems.

This commit fixes this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-02 05:52:31 -05:00
Jeff Squyres
7bea563e02 hwloc: fix Valgrind warning
Cherry picked from open-mpi/hwloc@d4565c351e

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-01 18:50:40 -07:00
Gilles Gouaillardet
21e7f31dbe pmix2x: fix unpack sequence in PMIx_Get callback
first unpack the nspace (PMIX_STRING) before unpacking the various keys (PMIX_KVAL)
2016-08-01 14:21:22 +09:00
Howard Pritchard
477f6cb6a8 Merge pull request #1846 from ggouaillardet/topic/mntent
mpool/hugepage: use mntent API instead of manually parsing /proc/mounts
2016-07-31 20:17:37 -06:00
Gilles Gouaillardet
1778e5b586 atomic/sparcv9: fix a typo in the comment, no code change 2016-08-01 10:34:02 +09:00
Ralph Castain
16fccd4964 Establish a way for ORTE to tell PMIx the base tmpdir to use, and update PMIx to understand such directives 2016-07-29 09:52:36 -07:00
Nathan Hjelm
325c9ba4cc opal/thread: fix warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-29 07:04:19 -06:00
Nathan Hjelm
1da558407c Merge pull request #1911 from hjelmn/threads
opal/thread: clean up and add additional OPAL_THREAD macros
2016-07-29 06:44:11 -06:00
Gilles Gouaillardet
273e56096b configury: capture configury command line
The configury command line is quoted and made available via the OPAL_CONFIGURE_CLI macro.
It can be retrieved via {orte-info,ompi_info,oshmem_info} -c, or
{orte-info,ompi_info,oshmem_info} --all --parseable | grep ^config:cli:
2016-07-29 09:14:09 +09:00
Ralph Castain
cacb582ecd Support timeout values when performing connect/accept operations. Bump default timeout to 10 minutes so folks have time to start the partnering application 2016-07-28 14:09:06 -07:00
Nathan Hjelm
c281bd3c7f Merge pull request #1908 from hjelmn/udreg_fix
rcache/udreg: make reference count thread safe
2016-07-28 09:27:16 -06:00
Nathan Hjelm
aac611237b opal/thread: clean up and add additional OPAL_THREAD macros
This commit expands the OPAL_THREAD macros to include 32- and 64-bit
atomic swap. Additionally, macro declarations have been updated to
include both OPAL_THREAD_* and OPAL_ATOMIC_*. Before this commit the
former was used with add and the latter with cmpset.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-07-28 09:23:14 -06:00
Nathan Hjelm
a8c3699484 Fix performance regression caused by enabling opal thread support
This commit adds opal_using_threads() protection around the atomic
operation in OBJ_RETAIN/OBJ_RELEASE. This resolves the performance
issues seen when running psm with MPI_THREAD_SINGLE.

To avoid issues with header dependencies opal_using_threads() has been
moved to a new header (thread_usage.h). The OPAL_THREAD_ADD* and
OPAL_THREAD_CMPSET* macros have also been relocated to this header.

This commit is cherry-picked off a fix that was submitted for the v1.8
release series but never applied to master. This fixes part of the
problem reported by @nysal in #1902.

(cherry picked from commit open-mpi/ompi-release@ce91307918)

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-07-28 07:01:27 -06:00
Nathan Hjelm
4658b761e4 rcache/udreg: make reference count thread safe
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-27 13:40:35 -06:00
Nathan Hjelm
1eb4ef438e Merge pull request #1903 from hjelmn/openib_fixes
btl/openib: set send flags only after endpoint is connected
2016-07-27 09:01:49 -06:00
Howard Pritchard
1dc7e9ed8f Merge pull request #1904 from hppritcha/topic/fix_cray_srun_native_launch
pmix/cray: switch to using wildcards for some
2016-07-27 07:12:02 -06:00
Howard Pritchard
b65bbe017f pmix/cray: switch to using wildcards for some
items so that at least srun native launch on
cray works again.

More issues to fix when using alps.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-07-26 17:07:58 -05:00
Nathan Hjelm
5e13e1ab7d btl/openib: set send flags only after endpoint is connected
The max inline send size on a queue pair is not available until after
the endpoint is connected. Before this commit the send flags
(including the inline flag) were set before this value was
initialized. This commit moves setting the send_flags down to
mca_btl_openib_put_internal which is only called after the endpoint is
connected. This fixes a bug when using osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-26 16:01:11 -06:00
Gilles Gouaillardet
91ccec342c btl/openib: remove some dead code
Remove a useless call to opal_mem_hooks_support_level() and the value local variable.
2016-07-22 09:26:33 +09:00
Gilles Gouaillardet
1b3be0ac8c configury + btl/openib: fix a typo
Test for the existence of struct ibv_exp_device_attr.exp_atomic_cap.
This was previously mistyped as struct ibv_exp_device_attr.ext_atomic_cap
2016-07-22 09:26:33 +09:00
Ralph Castain
71de03fc67 Cleanup the new naming requirements to ensure that info is correctly retrieved
Cleanup permissions

Restore singleton operations
2016-07-21 09:46:03 -07:00
Ralph Castain
2b55ee8118 Cleanup Coverity warnings 2016-07-20 20:31:58 -07:00
Ralph Castain
01a653d50a Remove a debug print in comm_cid.c. Update PMIx2 to include the revised PMIx_Get logic for higher performance by reducing the number of hash table lookups. Fix a bug where requests for data from a proc in another nspace could hang, or result in "not found".
Remove stale file reference

Restore autogen pass thru pmix

Remove generated file
2016-07-20 00:58:19 -07:00
Pascal Deveze
6d6ec66705 btl/portals4: Take into account the limitation of portals4 (max_msg_size) 2016-07-19 15:19:29 +02:00
Nathan Hjelm
03bce91de8 pmix/pmix2x: add missing increment in loop
This commit fixes a bug in the pmix2x client code where a loop
variable is not correctly incremented. This was leading to hangs and
crashes when creating intercommunicators. Also fixed two double
increments in other loops.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-07-18 10:35:05 -06:00