Commit graph

25603 commits

Author SHA1 Message Date
Gilles Gouaillardet
6b7bc64101 spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world

Thanks Debendra Das for the report and Josh Ladd for the guidance

Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
LANL OMPI Bot
96c7762050 Merge pull request #1942 from hppritcha/topic/minor_ofi_fix
mtl/ofi: use mca param to set av type
2016-08-16 14:14:12 -06:00
Nathan Hjelm
2e1378596f Merge pull request #1953 from hjelmn/pt2pt_fixes
osc/pt2pt updates
2016-08-16 08:00:24 -06:00
rhc54
d7cd802426 Merge pull request #1971 from rhc54/topic/sesdir
Update the session dir structure. Restore the creation of a top-level…
2016-08-16 03:14:08 -05:00
Ralph Castain
ae2af61ee3 Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs. 2016-08-15 22:46:46 -05:00
rhc54
dd05f085e9 Merge pull request #1968 from rhc54/topic/rsh
Further cleanup getpwuid usage - try it first (unless completely disa…
2016-08-15 22:11:21 -05:00
Gilles Gouaillardet
3126ff77e2 pmix2x: common syms: whitelist bison-generated common symbols
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-08-16 11:29:06 +09:00
Ralph Castain
9f43db7303 Further cleanup getpwuid usage - try it first (unless completely disabled), and then silently failover to try other methods. 2016-08-15 07:51:36 -07:00
Ralph Castain
ecbedee8bb Fix typo 2016-08-15 07:32:00 -07:00
Gilles Gouaillardet
483685eb6a update .gitignore
remove autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in
2016-08-15 17:00:20 +09:00
rhc54
2228d2efc2 Merge pull request #1965 from rhc54/topic/pmixfix
Provide backward compatible keys so that the non-PMIx components in t…
2016-08-13 13:48:12 -07:00
Ralph Castain
be8424b691 Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
Add a contrib script to cleanup permissions incorrectly modified due to things like smb mounts

dd
2016-08-13 12:13:04 -07:00
rhc54
d12e50b2d6 Merge pull request #1963 from rhc54/topic/pmixfix
Fix shared memory rendezvous
2016-08-13 09:59:14 -07:00
Ralph Castain
08a0644df5 Fix shared memory rendezvous 2016-08-13 08:14:50 -07:00
rhc54
ddde154d28 Merge pull request #1962 from rhc54/topic/notify
Ensure we properly convert pmix status to ORTE state before activatin…
2016-08-13 06:59:50 -07:00
Ralph Castain
48d35a9627 Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Cleanup some conversion issues on notification info. Add a new orte_notify.c test program 2016-08-12 21:14:29 -07:00
rhc54
9868093bef Merge pull request #1961 from rhc54/topic/static
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 15:17:31 -07:00
rhc54
9eed451916 Merge pull request #1960 from rhc54/topic/rsh
Restore the rsh template creation code
2016-08-12 13:38:43 -07:00
rhc54
8d67f753ca Merge pull request #1959 from rhc54/topic/nodeid
The node index isn't normally passed with the packed node object, so …
2016-08-12 13:30:10 -07:00
Ralph Castain
4a4c9703a9 Setup the job list in the PMIx integration so that static ports can run 2016-08-12 13:27:10 -07:00
rhc54
1ef3c86d44 Merge pull request #1931 from hjelmn/ess_fix
ess/base: set up nidmap after pmix
2016-08-12 13:10:30 -07:00
Ralph Castain
5717b75b45 Restore the rsh template creation code 2016-08-12 12:43:40 -07:00
rhc54
ee1ee2086c Merge pull request #1958 from rhc54/topic/path
Fix a bug where we were requiring that all paths in $PATH be absolute
2016-08-12 12:31:43 -07:00
Ralph Castain
d4327fd973 The node index isn't normally passed with the packed node object, so we need to set it on the remote end as the orted needs to pass it down to the procs. Refactor the registration code to better package proc-level info - we will separate out the node and app levels in a subsequent change. 2016-08-12 12:06:23 -07:00
Ralph Castain
0e58609327 Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those. 2016-08-12 11:28:57 -07:00
rhc54
163999bce0 Merge pull request #1957 from rhc54/topic/rsh
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 11:18:28 -07:00
Ralph Castain
1c44543854 If the ssh agent hasn't been given, then check for qrsh and friends 2016-08-12 07:46:39 -07:00
rhc54
397faad46b Merge pull request #1954 from rhc54/topic/covpmix
Silence Coverity warnings
2016-08-12 06:38:04 -07:00
Ralph Castain
1d44f0c0e2 Silence Coverity warnings 2016-08-11 21:22:01 -07:00
Nathan Hjelm
9444df1eb7 osc/pt2pt: make lock_all locking on-demand
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-11 15:33:07 -06:00
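
The on-demand locking described in the commit above can be illustrated with a minimal sketch. It assumes a per-peer flag array and a counter standing in for lock messages; every name in it (`lazy_win_t`, `lazy_lock_all`, `lazy_touch_peer`, and so on) is hypothetical and is not taken from the osc/pt2pt sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical window state; not the real osc/pt2pt data structures. */
typedef struct {
    size_t npeers;          /* peers in the communicator */
    bool   lock_all_active; /* true while inside a lock_all epoch */
    bool  *peer_locked;     /* per-peer "lock already requested" flag */
    size_t lock_msgs_sent;  /* stand-in for real lock messages */
} lazy_win_t;

int lazy_win_init(lazy_win_t *win, size_t npeers)
{
    win->peer_locked = calloc(npeers, sizeof(bool));
    if (NULL == win->peer_locked) {
        return -1;
    }
    win->npeers = npeers;
    win->lock_all_active = false;
    win->lock_msgs_sent = 0;
    return 0;
}

/* Start the epoch: constant time, no per-peer messages are sent. */
void lazy_lock_all(lazy_win_t *win)
{
    win->lock_all_active = true;
}

/* Called before each operation that targets `peer`. */
int lazy_touch_peer(lazy_win_t *win, size_t peer)
{
    if (!win->lock_all_active || peer >= win->npeers) {
        return -1;
    }
    if (!win->peer_locked[peer]) {
        /* A real implementation would send the lock request here. */
        win->peer_locked[peer] = true;
        win->lock_msgs_sent++;
    }
    return 0;
}

/* End the epoch: only peers that were locked need an unlock.
 * Returns the number of peers unlocked. */
size_t lazy_unlock_all(lazy_win_t *win)
{
    size_t unlocked = 0;
    for (size_t i = 0; i < win->npeers; ++i) {
        if (win->peer_locked[i]) {
            win->peer_locked[i] = false;
            ++unlocked;
        }
    }
    win->lock_all_active = false;
    return unlocked;
}

void lazy_win_fini(lazy_win_t *win)
{
    free(win->peer_locked);
    win->peer_locked = NULL;
}
```

With this shape, the number of lock messages tracks the number of distinct targets rather than the communicator size, which is the scaling property the commit message describes.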
Nathan Hjelm
7589a25377 osc/pt2pt: do not repost receive from request callback
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine as the recursion depth stays small but
at larger scale we can quickly exhaust the stack processing frag requests.
To fix the issue the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.

As part of this change osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.

Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-11 15:33:07 -06:00
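
The fix described in the commit above (the callback only queues the request, and the progress function does the processing) can be sketched as follows. This is a single-threaded illustration under stated assumptions: the types and function names are hypothetical, there is no locking, and the actual processing and reposting are reduced to a counter.

```c
#include <stddef.h>

/* Hypothetical completed-request record; not the osc/pt2pt request type. */
typedef struct pending_req {
    int id;
    struct pending_req *next;
} pending_req_t;

/* FIFO of completed requests waiting to be handled by the progress loop. */
typedef struct {
    pending_req_t *head;
    pending_req_t *tail;
    int processed;
} pending_queue_t;

/* Completion callback: append the request and return immediately.
 * It never processes or reposts, so it cannot recurse. */
void req_complete_cb(pending_queue_t *queue, pending_req_t *req)
{
    req->next = NULL;
    if (NULL != queue->tail) {
        queue->tail->next = req;
    } else {
        queue->head = req;
    }
    queue->tail = req;
}

/* Progress function: drain the queue iteratively. Returns the number of
 * requests handled in this call. */
int progress(pending_queue_t *queue)
{
    int count = 0;
    while (NULL != queue->head) {
        pending_req_t *req = queue->head;
        queue->head = req->next;
        if (NULL == queue->head) {
            queue->tail = NULL;
        }
        /* Processing and reposting of `req` would happen here. If a repost
         * completes right away, its callback only enqueues again. */
        queue->processed++;
        count++;
    }
    return count;
}
```

Because the callback does constant work, the stack depth stays flat no matter how many fragment requests complete back to back.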
rhc54
82240f579a Merge pull request #1952 from rhc54/topic/pmixcov
Update to latest PMIx toolext branch
2016-08-11 14:24:13 -07:00
Ralph Castain
73544d2e00 Rename symbol 2016-08-11 13:06:46 -07:00
Ralph Castain
b0cc9b0bc8 Update to latest PMIx toolext branch
Fix indentations

Update the ext20 component to match latest PMIx master.

Cleanup name conflicts and uninit vars
2016-08-11 12:29:48 -07:00
George Bosilca
8d0baf140f If the RTE fails to deliver the daemon information,
gracefully fallback to a non-reordered communicator.
Optimize the loops building the process hierarchy.
2016-08-11 13:04:27 -04:00
Howard Pritchard
e46eee3fcb mtl/ofi: use mca param to set av type
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-10 16:10:17 -06:00
Ralph Castain
23886754f0 Trim the coverity build line to packages available on this machine 2016-08-10 13:55:55 -07:00
Ralph Castain
55551a4fb7 Complete debug of the nightly coverity submittal 2016-08-10 12:05:21 -07:00
Ralph Castain
375f04b277 Update the nightly builds to submit to coverity 2016-08-10 08:45:18 -07:00
Gilles Gouaillardet
dfbf2b7be4 opal/threads: add OPAL_THREAD_SUB_SIZE_T macro
-1 is not a valid size_t, so instead of OPAL_THREAD_ADD_SIZE_T(..., -1),
simply OPAL_THREAD_SUB_SIZE_T(..., 1) and keep picky compilers happy
2016-08-10 13:37:36 +09:00
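
The reasoning in the commit above can be shown with a small sketch: decrementing an unsigned counter through a dedicated subtract helper avoids passing -1 where a size_t is expected. The sketch uses C11 `<stdatomic.h>` and hypothetical helper names; it is not the definition of the OPAL macros, and returning the new value is an assumption made for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helpers, not the OPAL_THREAD_*_SIZE_T macros. */

/* Atomically add `delta` and return the new value. */
static inline size_t thread_add_size_t(atomic_size_t *addr, size_t delta)
{
    return atomic_fetch_add(addr, delta) + delta;
}

/* Atomically subtract `delta` and return the new value. The caller passes
 * a plain positive amount, so no signed-to-unsigned conversion occurs. */
static inline size_t thread_sub_size_t(atomic_size_t *addr, size_t delta)
{
    return atomic_fetch_sub(addr, delta) - delta;
}
```

A call such as `thread_sub_size_t(&count, 1)` states the intent directly, whereas `thread_add_size_t(&count, -1)` relies on -1 wrapping to SIZE_MAX and triggers conversion warnings on strict compilers.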
Nathan Hjelm
799104f688 Merge pull request #1947 from hjelmn/perf
pml/ob1: be more selective when using rdma capable btls
2016-08-09 22:15:09 -06:00
Nathan Hjelm
4079eec974 pml/ob1: be more selective when using rdma capable btls
This commit updates the btl selection logic for the RDMA and RDMA
pipeline protocols to use a btl iff: 1) the btl is also used for eager
messages (high exclusivity), or 2) no other RDMA btl is available on
an endpoint and the pml_ob1_use_all_rdma MCA variable is true. This
fixes a performance regression with shared memory when an RDMA capable
network is available.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-09 20:54:42 -06:00
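
The selection rule stated in the commit above reduces to a short predicate. The sketch below is a literal transcription of the two conditions; the struct, its field names, and the function name are hypothetical, and only `pml_ob1_use_all_rdma` is named in the commit message.

```c
#include <stdbool.h>

/* Hypothetical per-endpoint view of one btl; not an ob1 data structure. */
typedef struct {
    bool used_for_eager;           /* btl is also used for eager messages */
    bool other_rdma_btl_available; /* endpoint has another RDMA btl */
} btl_candidate_t;

/* Returns true if the btl should be used for the RDMA and RDMA pipeline
 * protocols. `use_all_rdma` mirrors the pml_ob1_use_all_rdma MCA variable. */
bool btl_usable_for_rdma(const btl_candidate_t *btl, bool use_all_rdma)
{
    if (btl->used_for_eager) {
        return true;                       /* condition 1 */
    }
    return !btl->other_rdma_btl_available  /* condition 2 */
           && use_all_rdma;
}
```

Under this reading, a btl that is not used for eager messages is selected only when the endpoint has no other RDMA btl and the MCA variable is true.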
rhc54
60f789dca1 Merge pull request #1948 from rhc54/topic/pmixtool
Update to include extended tool support, new datatypes
2016-08-09 16:17:28 -07:00
Nathan Hjelm
19be439998 Merge pull request #1949 from hjelmn/ugni_fix
btl/ugni: fix another connection race
2016-08-09 08:32:40 -06:00
Nathan Hjelm
38f18eed22 Merge pull request #1941 from ggouaillardet/topic/memory_patcher_configury
configury: make memory/patcher symbol detection more robust
2016-08-09 07:06:38 -06:00
Gilles Gouaillardet
13009aa290 opal/alfg: have opal_random() wrapper always return a positive int 2016-08-09 17:12:30 +09:00
Gilles Gouaillardet
50966673a9 configury: fix sed expression in libtool's patch for NAG compiler 2016-08-09 11:02:46 +09:00
Gilles Gouaillardet
6f6b3ac68a configury: standardize memory/patcher symbol detection and make it more robust
By default, Sun compilers optimize out the original test and hence fail to detect that a symbol is missing.
2016-08-09 09:35:52 +09:00
Nathan Hjelm
adb668209b btl/ugni: fix another connection race
This commit fixes a race that can occur when two threads are in the
ugni progress function at the same time. This race occurs when one
thread calls GNI_PostDataProbeById then goes to sleep then another
thread calls GNI_PostDataProbeById then GNI_EpPostDataWaitById before
the other thread wakes up. If this happens the first thread will print
a warning on GNI_EpPostDataWaitById about no matching post.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-08 15:38:11 -06:00
Ralph Castain
527b5c692a Update to include extended tool support, new datatypes 2016-08-08 13:39:46 -07:00