Gilles Gouaillardet
6b7bc64101
spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
...
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world
Thanks Debendra Das for the report and Josh Ladd for the guidance
Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
LANL OMPI Bot
96c7762050
Merge pull request #1942 from hppritcha/topic/minor_ofi_fix
...
mtl/ofi: use mca param to set av type
2016-08-16 14:14:12 -06:00
Nathan Hjelm
2e1378596f
Merge pull request #1953 from hjelmn/pt2pt_fixes
...
osc/pt2pt updates
2016-08-16 08:00:24 -06:00
rhc54
d7cd802426
Merge pull request #1971 from rhc54/topic/sesdir
...
Update the session dir structure. Restore the creation of a top-level…
2016-08-16 03:14:08 -05:00
Ralph Castain
ae2af61ee3
Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs.
2016-08-15 22:46:46 -05:00
rhc54
dd05f085e9
Merge pull request #1968 from rhc54/topic/rsh
...
Further cleanup getpwuid usage - try it first (unless completely disa…
2016-08-15 22:11:21 -05:00
Gilles Gouaillardet
3126ff77e2
pmix2x: common syms: whitelist bison-generated common symbols
...
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-08-16 11:29:06 +09:00
Ralph Castain
9f43db7303
Further cleanup getpwuid usage - try it first (unless completely disabled), and then silently failover to try other methods.
2016-08-15 07:51:36 -07:00
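Illustrative sketch (not the actual ORTE code) of the "try getpwuid first, then silently fail over" pattern described in the commit above. The helper name lookup_user_name, the allow_getpwuid flag, and the choice of fallbacks (the USER and LOGNAME environment variables, then the numeric uid) are assumptions made for this example; the commit does not say which other methods are tried.

```c
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return a user name for the calling process.
 * getpwuid() is tried first unless the caller disables it; any failure
 * falls through silently to the next method. */
static const char *lookup_user_name(int allow_getpwuid)
{
    static char fallback[32];
    const char *name;

    if (allow_getpwuid) {
        struct passwd *pw = getpwuid(getuid());
        if (pw != NULL && pw->pw_name != NULL) {
            return pw->pw_name;
        }
        /* no error message here: fail over silently */
    }

    /* Assumed fallbacks, chosen only for illustration */
    name = getenv("USER");
    if (name == NULL) {
        name = getenv("LOGNAME");
    }
    if (name != NULL) {
        return name;
    }

    /* Last resort: the numeric uid rendered as a string */
    snprintf(fallback, sizeof(fallback), "%lu", (unsigned long)getuid());
    return fallback;
}
```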
Ralph Castain
ecbedee8bb
Fix typo
2016-08-15 07:32:00 -07:00
Gilles Gouaillardet
483685eb6a
update .gitignore
...
remove autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in
2016-08-15 17:00:20 +09:00
rhc54
2228d2efc2
Merge pull request #1965 from rhc54/topic/pmixfix
...
Provide backward compatible keys so that the non-PMIx components in t…
2016-08-13 13:48:12 -07:00
Ralph Castain
be8424b691
Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
...
Add a contrib script to clean up permissions incorrectly modified due to things like smb mounts
2016-08-13 12:13:04 -07:00
rhc54
d12e50b2d6
Merge pull request #1963 from rhc54/topic/pmixfix
...
Fix shared memory rendezvous
2016-08-13 09:59:14 -07:00
Ralph Castain
08a0644df5
Fix shared memory rendezvous
2016-08-13 08:14:50 -07:00
rhc54
ddde154d28
Merge pull request #1962 from rhc54/topic/notify
...
Ensure we properly convert pmix status to ORTE state before activatin…
2016-08-13 06:59:50 -07:00
Ralph Castain
48d35a9627
Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Clean up some conversion issues on notification info. Add a new orte_notify.c test program.
2016-08-12 21:14:29 -07:00
rhc54
9868093bef
Merge pull request #1961 from rhc54/topic/static
...
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 15:17:31 -07:00
rhc54
9eed451916
Merge pull request #1960 from rhc54/topic/rsh
...
Restore the rsh template creation code
2016-08-12 13:38:43 -07:00
rhc54
8d67f753ca
Merge pull request #1959 from rhc54/topic/nodeid
...
The node index isn't normally passed with the packed node object, so …
2016-08-12 13:30:10 -07:00
Ralph Castain
4a4c9703a9
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 13:27:10 -07:00
rhc54
1ef3c86d44
Merge pull request #1931 from hjelmn/ess_fix
...
ess/base: set up nidmap after pmix
2016-08-12 13:10:30 -07:00
Ralph Castain
5717b75b45
Restore the rsh template creation code
2016-08-12 12:43:40 -07:00
rhc54
ee1ee2086c
Merge pull request #1958 from rhc54/topic/path
...
Fix a bug where we were requiring that all paths in $PATH be absolute
2016-08-12 12:31:43 -07:00
Ralph Castain
d4327fd973
The node index isn't normally passed with the packed node object, so we need to set it on the remote end as the orted needs to pass it down to the procs. Refactor the registration code to better package proc-level info - we will separate out the node and app levels in a subsequent change.
2016-08-12 12:06:23 -07:00
Ralph Castain
0e58609327
Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those.
2016-08-12 11:28:57 -07:00
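Illustrative sketch of what respecting relative $PATH entries can look like, as described in the commit above. The helper join_path_entry and its signature are hypothetical and are not taken from the Open MPI sources: a relative entry is resolved against the current working directory instead of being rejected.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the candidate path for one $PATH entry.
 * Absolute entries are used as-is; relative entries are resolved
 * against cwd rather than being treated as an error.
 * Returns 0 on success, -1 if the result does not fit in out. */
static int join_path_entry(const char *cwd, const char *entry,
                           const char *prog, char *out, size_t outlen)
{
    int n;

    if (entry[0] == '\0') {
        entry = ".";   /* POSIX: an empty $PATH entry means the cwd */
    }
    if (entry[0] == '/') {
        n = snprintf(out, outlen, "%s/%s", entry, prog);
    } else {
        n = snprintf(out, outlen, "%s/%s/%s", cwd, entry, prog);
    }
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

A caller would then test each candidate for existence and execute permission, for example with access(candidate, X_OK).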
rhc54
163999bce0
Merge pull request #1957 from rhc54/topic/rsh
...
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 11:18:28 -07:00
Ralph Castain
1c44543854
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 07:46:39 -07:00
rhc54
397faad46b
Merge pull request #1954 from rhc54/topic/covpmix
...
Silence Coverity warnings
2016-08-12 06:38:04 -07:00
Ralph Castain
1d44f0c0e2
Silence Coverity warnings
2016-08-11 21:22:01 -07:00
Nathan Hjelm
9444df1eb7
osc/pt2pt: make lock_all locking on-demand
...
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly, the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-11 15:33:07 -06:00
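Illustrative sketch of the on-demand locking idea from the commit above. This is a toy model, not the osc/pt2pt implementation: struct window, NPEERS, and the function names are hypothetical, and a counter stands in for sending a lock message to a peer.

```c
#include <stdbool.h>
#include <stddef.h>

#define NPEERS 8   /* toy communicator size, for illustration only */

struct window {
    bool lock_all_active;
    bool peer_locked[NPEERS];
    int  lock_msgs_sent;       /* stands in for real lock messages */
};

/* lock_all only records the epoch; no peer is contacted here */
static void win_lock_all(struct window *w)
{
    size_t i;

    w->lock_all_active = true;
    for (i = 0; i < NPEERS; i++) {
        w->peer_locked[i] = false;
    }
}

/* Lock the target the first time it is actually used in this epoch */
static void ensure_locked(struct window *w, int target)
{
    if (w->lock_all_active && !w->peer_locked[target]) {
        w->peer_locked[target] = true;
        w->lock_msgs_sent++;
    }
}

/* Any communication call locks its target on demand first */
static void win_put(struct window *w, int target)
{
    ensure_locked(w, target);
    /* ... the actual data transfer would go here ... */
}
```

In this model the number of lock messages grows with the number of distinct targets rather than with the communicator size.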
Nathan Hjelm
7589a25377
osc/pt2pt: do not repost receive from request callback
...
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine because the recursion depth stays small, but
at larger scale we can quickly exhaust the stack while processing frag requests.
To fix the issue, the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.
As part of this change, osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-11 15:33:07 -06:00
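Illustrative sketch of the deferral pattern described in the commit above: the completion callback only enqueues the request, and a separate progress function drains the queue. All type and function names are hypothetical; the real code uses Open MPI's own request and list types and would need locking when several threads are involved.

```c
#include <stddef.h>

struct request {
    struct request *next;
};

struct pending_list {
    struct request *head;
    struct request *tail;
};

/* Completion callback: append the request and return immediately.
 * Nothing is processed or reposted here, so the callback cannot
 * recurse into itself. */
static void req_complete_cb(struct pending_list *pl, struct request *req)
{
    req->next = NULL;
    if (pl->tail != NULL) {
        pl->tail->next = req;
    } else {
        pl->head = req;
    }
    pl->tail = req;
}

/* Progress function: drain the list at a fixed stack depth.
 * Returns the number of requests handled. */
static int progress(struct pending_list *pl)
{
    int handled = 0;
    struct request *req;

    while ((req = pl->head) != NULL) {
        pl->head = req->next;
        if (pl->head == NULL) {
            pl->tail = NULL;
        }
        /* ... process the fragment and repost the receive here ... */
        handled++;
    }
    return handled;
}
```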
rhc54
82240f579a
Merge pull request #1952 from rhc54/topic/pmixcov
...
Update to latest PMIx toolext branch
2016-08-11 14:24:13 -07:00
Ralph Castain
73544d2e00
Rename symbol
2016-08-11 13:06:46 -07:00
Ralph Castain
b0cc9b0bc8
Update to latest PMIx toolext branch
...
Fix indentations
Update the ext20 component to match latest PMIx master.
Cleanup name conflicts and uninit vars
2016-08-11 12:29:48 -07:00
George Bosilca
8d0baf140f
If the RTE fails to deliver the daemon information,
...
gracefully fall back to a non-reordered communicator.
Optimize the loops building the process hierarchy.
2016-08-11 13:04:27 -04:00
Howard Pritchard
e46eee3fcb
mtl/ofi: use mca param to set av type
...
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-10 16:10:17 -06:00
Ralph Castain
23886754f0
Trim the coverity build line to packages available on this machine
2016-08-10 13:55:55 -07:00
Ralph Castain
55551a4fb7
Complete debug of the nightly coverity submittal
2016-08-10 12:05:21 -07:00
Ralph Castain
375f04b277
Update the nightly builds to submit to coverity
2016-08-10 08:45:18 -07:00
Gilles Gouaillardet
dfbf2b7be4
opal/threads: add OPAL_THREAD_SUB_SIZE_T macro
...
-1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1),
simply use OPAL_THREAD_SUB_SIZE_T(..., 1) to keep picky compilers happy
2016-08-10 13:37:36 +09:00
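Illustrative sketch of why an explicit subtract is cleaner than adding -1 to a size_t. It uses C11 <stdatomic.h> as a stand-in and is not the OPAL macro itself; counter_sub is a hypothetical helper. Converting -1 to size_t yields SIZE_MAX, so an "add -1" only works through unsigned wraparound, which strict compilers warn about.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: atomically subtract delta from *addr and
 * return the new value. The delta stays a genuine, non-negative
 * size_t, so no negative constant is ever converted. */
static size_t counter_sub(atomic_size_t *addr, size_t delta)
{
    /* atomic_fetch_sub returns the value held before the subtraction */
    return atomic_fetch_sub(addr, delta) - delta;
}
```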
Nathan Hjelm
799104f688
Merge pull request #1947 from hjelmn/perf
...
pml/ob1: be more selective when using rdma capable btls
2016-08-09 22:15:09 -06:00
Nathan Hjelm
4079eec974
pml/ob1: be more selective when using rdma capable btls
...
This commit updates the btl selection logic for the RDMA and RDMA
pipeline protocols to use a btl iff: 1) the btl is also used for eager
messages (high exclusivity), or 2) no other RDMA btl is available on
an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This
fixes a performance regression with shared memory when an RDMA capable
network is available.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-09 20:54:42 -06:00
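Illustrative sketch of the selection rule from the commit above, written as a single predicate. The function and parameter names are hypothetical; only the two conditions come from the commit message, and the real pml/ob1 code works on btl and endpoint structures rather than plain flags.

```c
#include <stdbool.h>

/* Hypothetical predicate: should this btl be used for the RDMA and
 * RDMA pipeline protocols on a given endpoint?
 *
 *  1) yes, if the btl is also used for eager messages, or
 *  2) yes, if no other RDMA btl is available on the endpoint and
 *     the pml_ob1_use_all_rdma MCA variable is true. */
static bool use_btl_for_rdma(bool used_for_eager,
                             bool other_rdma_btl_available,
                             bool use_all_rdma)
{
    return used_for_eager ||
           (!other_rdma_btl_available && use_all_rdma);
}
```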
rhc54
60f789dca1
Merge pull request #1948 from rhc54/topic/pmixtool
...
Update to include extended tool support, new datatypes
2016-08-09 16:17:28 -07:00
Nathan Hjelm
19be439998
Merge pull request #1949 from hjelmn/ugni_fix
...
btl/ugni: fix another connection race
2016-08-09 08:32:40 -06:00
Nathan Hjelm
38f18eed22
Merge pull request #1941 from ggouaillardet/topic/memory_patcher_configury
...
configury: make memory/patcher symbol detection more robust
2016-08-09 07:06:38 -06:00
Gilles Gouaillardet
13009aa290
opal/alfg: have opal_random() wrapper always return a positive int
2016-08-09 17:12:30 +09:00
Gilles Gouaillardet
50966673a9
configury: fix sed expression in libtool's patch for NAG compiler
2016-08-09 11:02:46 +09:00
Gilles Gouaillardet
6f6b3ac68a
configury: standardize memory/patcher symbol detection and make it more robust
...
By default, Sun compilers optimize out the original test and hence fail to detect that a symbol is missing.
2016-08-09 09:35:52 +09:00
Nathan Hjelm
adb668209b
btl/ugni: fix another connection race
...
This commit fixes a race that can occur when two threads are in the
ugni progress function at the same time. This race occurs when one
thread calls GNI_PostDataProbeById and then goes to sleep, and another
thread calls GNI_PostDataProbeById and then GNI_EpPostDataWaitById before
the first thread wakes up. If this happens, the first thread will print
a warning on GNI_EpPostDataWaitById about no matching post.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-08 15:38:11 -06:00
Ralph Castain
527b5c692a
Update to include extended tool support, new datatypes
2016-08-08 13:39:46 -07:00