Jeff Squyres
5ac7b3c6d2
.mailmap: Remove stale SVN references
...
Also explain the true purpose of this file.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-18 07:10:27 -05:00
Artem Polyakov
6ea8cccdab
Merge pull request #1969 from artpol84/pmix_jobid_fix
...
Pmix jobid fix
2016-08-18 17:24:58 +07:00
Nathan Hjelm
6aa658ae33
ompi/request: change semantics of ompi request callbacks
...
This commit changes the semantics of ompi request callbacks. If a
request's callback has freed or re-posted (using start) a request
the callback must return 1 instead of OMPI_SUCCESS. This indicates
to ompi_request_complete that the request should not be modified
further. This fixes a race condition in osc/pt2pt that could lead
to the req_state being inconsistent if a request is freed between
the callback and setting the request as complete.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-17 20:14:01 -06:00
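A minimal sketch of the callback convention described in the ompi/request commit above, not the actual Open MPI code. All `demo_*` names and the struct layout are hypothetical stand-ins; only the rule itself (return 1 instead of success when the callback has freed or re-posted the request, so the completion path stops touching it) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Open MPI types. */
#define DEMO_SUCCESS 0

typedef struct demo_request demo_request_t;
typedef int (*demo_complete_cb_t)(demo_request_t *req);

struct demo_request {
    demo_complete_cb_t complete_cb;
    int complete; /* set by demo_request_complete unless the callback took over */
    int freed;    /* set by a callback that releases the request */
};

/* Callback that only observes the request: it returns success, so the
 * completion path goes on to mark the request complete. */
static int demo_cb_observe(demo_request_t *req)
{
    (void) req;
    return DEMO_SUCCESS;
}

/* Callback that frees (or re-posts) the request: it returns 1 to tell
 * the completion path that the request must not be modified further. */
static int demo_cb_free(demo_request_t *req)
{
    req->freed = 1; /* real code would release or restart the request here */
    return 1;
}

/* Completion path: runs the callback once and marks the request complete
 * only if the callback did not take ownership of it. Returns 1 when the
 * callback took ownership, 0 otherwise. */
static int demo_request_complete(demo_request_t *req)
{
    assert(req != NULL);
    if (req->complete_cb != NULL) {
        demo_complete_cb_t cb = req->complete_cb;
        req->complete_cb = NULL;
        if (cb(req) != DEMO_SUCCESS) {
            return 1; /* request may be gone or re-posted: do not touch it */
        }
    }
    req->complete = 1;
    return 0;
}
```

In this sketch the early return is what closes the window in which a freed request could still have its state overwritten.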
Sylvain Jeaugey
61e900eea5
Fix typo calling allreduce with the allgather module.
...
That was causing CUDA collective to crash.
2016-08-17 17:05:13 -07:00
rhc54
394e23d179
Merge pull request #1981 from rhc54/topic/timeout
...
Support the PMIX_TIMEOUT key at the PMIx server when timeout=0 - this…
2016-08-17 17:40:43 -05:00
Edgar Gabriel
e14c23ba79
Merge pull request #1980 from edgargabriel/topic/coverty-cleanup
...
io/ompio: Topic/coverty cleanup
2016-08-17 17:27:51 -05:00
Ralph Castain
7da9793fef
Support the PMIX_TIMEOUT key at the PMIx server when timeout=0 - this indicates that the user doesn't want a lookup of any data from the host RM.
2016-08-17 16:26:58 -05:00
Edgar Gabriel
2c8437ce62
fs/pvfs2: fix a common symbol
2016-08-17 13:10:32 -05:00
Edgar Gabriel
eba5293586
fix Coverity warning CID 1369021
2016-08-17 13:02:45 -05:00
Nathan Hjelm
cdbc94e34e
Merge pull request #1977 from hjelmn/osc_pt2pt_fix
...
osc/pt2pt: make receive count an unsigned int
2016-08-17 09:38:33 -06:00
Nathan Hjelm
40b70889e5
osc/pt2pt: make receive count an unsigned int
...
This receive_count MCA variable should never be negative. Change it
to an unsigned int.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-17 08:14:24 -06:00
Jeff Squyres
ce0124603d
.gitignore: add test executable to ignore
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-17 04:27:51 -07:00
Gilles Gouaillardet
8faa1edafa
osc/pt2pt: silence misc warnings
2016-08-17 14:24:14 +09:00
Gilles Gouaillardet
6b7bc64101
spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
...
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world
Thanks Debendra Das for the report and Josh Ladd for the guidance
Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
LANL OMPI Bot
96c7762050
Merge pull request #1942 from hppritcha/topic/minor_ofi_fix
...
mtl/ofi: use mca param to set av type
2016-08-16 14:14:12 -06:00
Nathan Hjelm
2e1378596f
Merge pull request #1953 from hjelmn/pt2pt_fixes
...
osc/pt2pt updates
2016-08-16 08:00:24 -06:00
Jeff Squyres
71ec5cfb43
rsh: robustify the check for plm_rsh_agent default value
...
Don't strcmp against the default value -- the default value may change
over time. Instead, check to see if the MCA var source is not
DEFAULT.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-16 06:58:20 -05:00
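The idea in the rsh commit above can be illustrated with a small sketch. The `demo_*` types and the literal default string are assumptions made for illustration only; they are not the MCA variable API. The point is that comparing against a hard-coded default breaks when the default changes or when the user explicitly sets the same string, whereas checking where the value came from does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a configuration variable that records where
 * its current value came from. */
typedef enum {
    DEMO_SOURCE_DEFAULT,
    DEMO_SOURCE_ENV,
    DEMO_SOURCE_FILE,
    DEMO_SOURCE_COMMAND_LINE
} demo_var_source_t;

typedef struct {
    const char *value;
    demo_var_source_t source;
} demo_var_t;

/* Fragile check: assumes the default string never changes and that a
 * user never explicitly asks for the same string. */
static bool demo_user_set_fragile(const demo_var_t *var)
{
    assert(var != NULL && var->value != NULL);
    return strcmp(var->value, "ssh : rsh") != 0; /* illustrative default */
}

/* Robust check: the user set the variable if and only if its source is
 * anything other than the built-in default. */
static bool demo_user_set(const demo_var_t *var)
{
    assert(var != NULL);
    return var->source != DEMO_SOURCE_DEFAULT;
}
```

A user who exports the default string through the environment is reported as "not set" by the fragile check but is detected correctly by the source-based one.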
rhc54
d7cd802426
Merge pull request #1971 from rhc54/topic/sesdir
...
Update the session dir structure. Restore the creation of a top-level…
2016-08-16 03:14:08 -05:00
Ralph Castain
ae2af61ee3
Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs.
2016-08-15 22:46:46 -05:00
rhc54
dd05f085e9
Merge pull request #1968 from rhc54/topic/rsh
...
Further cleanup getpwuid usage - try it first (unless completely disa…
2016-08-15 22:11:21 -05:00
Gilles Gouaillardet
3126ff77e2
pmix2x: common syms: whitelist bison-generated common symbols
...
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-08-16 11:29:06 +09:00
Artem Polyakov
c5a91c5c9d
opal/pmix: fix pmix jobid calculation if external PMIx server is used.
2016-08-15 21:13:51 +03:00
Ralph Castain
9f43db7303
Further cleanup getpwuid usage - try it first (unless completely disabled), and then silently failover to try other methods.
2016-08-15 07:51:36 -07:00
Ralph Castain
ecbedee8bb
Fix typo
2016-08-15 07:32:00 -07:00
Artem Polyakov
f3c816b52e
opal/pmix: fix indentation in some files.
2016-08-15 18:21:50 +07:00
Gilles Gouaillardet
483685eb6a
update .gitignore
...
remove autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in
2016-08-15 17:00:20 +09:00
rhc54
2228d2efc2
Merge pull request #1965 from rhc54/topic/pmixfix
...
Provide backward compatible keys so that the non-PMIx components in t…
2016-08-13 13:48:12 -07:00
Ralph Castain
be8424b691
Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
...
Add a contrib script to cleanup permissions incorrectly modified due to things like smb mounts
2016-08-13 12:13:04 -07:00
rhc54
d12e50b2d6
Merge pull request #1963 from rhc54/topic/pmixfix
...
Fix shared memory rendezvous
2016-08-13 09:59:14 -07:00
Ralph Castain
08a0644df5
Fix shared memory rendezvous
2016-08-13 08:14:50 -07:00
rhc54
ddde154d28
Merge pull request #1962 from rhc54/topic/notify
...
Ensure we properly convert pmix status to ORTE state before activatin…
2016-08-13 06:59:50 -07:00
Ralph Castain
48d35a9627
Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Cleanup some conversion issues on notification info. Add a new orte_notify.c test program
2016-08-12 21:14:29 -07:00
rhc54
9868093bef
Merge pull request #1961 from rhc54/topic/static
...
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 15:17:31 -07:00
rhc54
9eed451916
Merge pull request #1960 from rhc54/topic/rsh
...
Restore the rsh template creation code
2016-08-12 13:38:43 -07:00
rhc54
8d67f753ca
Merge pull request #1959 from rhc54/topic/nodeid
...
The node index isn't normally passed with the packed node object, so …
2016-08-12 13:30:10 -07:00
Ralph Castain
4a4c9703a9
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 13:27:10 -07:00
rhc54
1ef3c86d44
Merge pull request #1931 from hjelmn/ess_fix
...
ess/base: set up nidmap after pmix
2016-08-12 13:10:30 -07:00
Ralph Castain
5717b75b45
Restore the rsh template creation code
2016-08-12 12:43:40 -07:00
rhc54
ee1ee2086c
Merge pull request #1958 from rhc54/topic/path
...
Fix a bug where we were requiring that all paths in $PATH be absolute
2016-08-12 12:31:43 -07:00
Ralph Castain
d4327fd973
The node index isn't normally passed with the packed node object, so we need to set it on the remote end as the orted needs to pass it down to the procs. Refactor the registration code to better package proc-level info - we will separate out the node and app levels in a subsequent change.
2016-08-12 12:06:23 -07:00
Ralph Castain
0e58609327
Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those.
2016-08-12 11:28:57 -07:00
rhc54
163999bce0
Merge pull request #1957 from rhc54/topic/rsh
...
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 11:18:28 -07:00
Ralph Castain
1c44543854
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 07:46:39 -07:00
rhc54
397faad46b
Merge pull request #1954 from rhc54/topic/covpmix
...
Silence Coverity warnings
2016-08-12 06:38:04 -07:00
Ralph Castain
1d44f0c0e2
Silence Coverity warnings
2016-08-11 21:22:01 -07:00
Nathan Hjelm
9444df1eb7
osc/pt2pt: make lock_all locking on-demand
...
The original lock_all algorithm in osc/pt2pt sent a lock message to
each peer in the communicator even if the peer is never the target of
an operation. Since this scales very poorly the implementation has
been replaced by one that locks the remote peer on first communication
after a call to MPI_Win_lock_all.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-11 15:33:07 -06:00
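A rough sketch of the on-demand locking idea from the lock_all commit above, under simplifying assumptions: a fixed-size peer table, a counter standing in for lock messages, and no real communication. The `demo_*` names are hypothetical and do not correspond to osc/pt2pt symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PEERS 8

/* Hypothetical window state: which peers hold a lock in this epoch. */
typedef struct demo_window {
    bool lock_all_active;
    bool peer_locked[DEMO_MAX_PEERS];
    int lock_messages_sent; /* stands in for actual network traffic */
} demo_window_t;

/* lock_all only records that an epoch is open; no peer is contacted. */
static void demo_lock_all(demo_window_t *win)
{
    assert(win != NULL);
    win->lock_all_active = true;
    for (size_t i = 0; i < DEMO_MAX_PEERS; ++i) {
        win->peer_locked[i] = false;
    }
}

/* Lock a peer the first time it becomes the target of an operation. */
static void demo_ensure_locked(demo_window_t *win, size_t peer)
{
    assert(win != NULL && peer < DEMO_MAX_PEERS);
    if (win->lock_all_active && !win->peer_locked[peer]) {
        win->peer_locked[peer] = true;
        win->lock_messages_sent++; /* real code would send a lock message */
    }
}

/* Any communication call first makes sure its target is locked. */
static void demo_put(demo_window_t *win, size_t peer)
{
    demo_ensure_locked(win, peer);
    /* ... the data transfer itself would go here ... */
}
```

With this shape the number of lock messages tracks the number of distinct targets actually used, not the size of the communicator.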
Nathan Hjelm
7589a25377
osc/pt2pt: do not repost receive from request callback
...
This commit fixes an issue that can occur if a target gets overwhelmed with
requests. This can cause osc/pt2pt to go into deep recursion with a stack
like req_complete_cb -> ompi_osc_pt2pt_callback -> start -> req_complete_cb
-> ... . At small scale this is fine as the recursion depth stays small but
at larger scale we can quickly exhaust the stack processing frag requests.
To fix the issue the request callback now simply puts the request on a
list and returns. The osc/pt2pt progress function then handles the
processing and reposting of the request.
As part of this change osc/pt2pt can now post multiple fragment receive
requests per window. This should help prevent a target from being overwhelmed.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-11 15:33:07 -06:00
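The fix described in the repost commit above follows a common pattern: the completion callback only queues work, and a separate progress function drains the queue, so the call stack never grows with the number of pending fragments. The sketch below is single-threaded and uses hypothetical `demo_*` names; a real implementation would need locking or an atomic list and would process and repost actual receive requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment request with an intrusive list link. */
typedef struct demo_frag {
    struct demo_frag *next;
    int processed;
} demo_frag_t;

/* FIFO of completed fragments waiting to be handled. */
typedef struct demo_pending {
    demo_frag_t *head;
    demo_frag_t *tail;
} demo_pending_t;

/* Completion callback: append the fragment to the list and return.
 * It never processes or reposts, so it cannot recurse. */
static void demo_frag_complete_cb(demo_pending_t *q, demo_frag_t *frag)
{
    assert(q != NULL && frag != NULL);
    frag->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = frag;
    } else {
        q->head = frag;
    }
    q->tail = frag;
}

/* Progress function: drain the list iteratively. Returns the number of
 * fragments handled in this call. */
static int demo_progress(demo_pending_t *q)
{
    int handled = 0;
    assert(q != NULL);
    while (q->head != NULL) {
        demo_frag_t *frag = q->head;
        q->head = frag->next;
        if (q->head == NULL) {
            q->tail = NULL;
        }
        frag->processed = 1; /* real code would process and repost here */
        handled++;
    }
    return handled;
}
```

Because the loop in `demo_progress` is iterative, stack depth stays constant no matter how many fragments complete between progress calls.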
rhc54
82240f579a
Merge pull request #1952 from rhc54/topic/pmixcov
...
Update to latest PMIx toolext branch
2016-08-11 14:24:13 -07:00
Ralph Castain
73544d2e00
Rename symbol
2016-08-11 13:06:46 -07:00
Ralph Castain
b0cc9b0bc8
Update to latest PMIx toolext branch
...
Fix indentations
Update the ext20 component to match latest PMIx master.
Cleanup name conflicts and uninit vars
2016-08-11 12:29:48 -07:00