Edgar Gabriel
c3d4ee3f73
ompi/file: add a mutex to the ompi_file_t structure
...
Adding a mutex to the ompi_file_t structure allows a per-file-handle
mutex lock for both ROMIO and OMPIO. I double-checked that the size of the
ompi_file_t structure is still below the size of the predefined_file_t structure,
so we should be good from the backward compatibility perspective.
2016-08-21 16:09:12 -05:00
Edgar Gabriel
bc042259bc
make initialization of the io framework thread safe.
...
Also, remove the lock/unlock in the file_open ompi-interface routines of romio314.
The global lock in the romio component probably does not work: it is easy to construct a test case where two threads perform collective I/O operations on different file handles, and with a global lock such a case can easily deadlock. The lock has to be at least per file handle.
Move the mutex to file/file.c to avoid a duplicate symbol problem in file_open.c and pfile_open.c.
2016-08-21 16:09:00 -05:00
Nathan Hjelm
f3e9a72f1a
Merge pull request #1987 from hjelmn/cid
...
comm/cid: fix threaded CID allocation
2016-08-19 14:26:39 -06:00
Nathan Hjelm
fbbf743c36
comm/cid: fix threaded CID allocation
...
This commit should restore the pre-non-blocking behavior of the CID
allocator when threads are used. There are two primary changes: 1)
do not hold the cid allocator lock past the end of a request callback,
and 2) if a lower id communicator is detected during CID allocation
back off and let the lower id communicator finish before continuing.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-19 11:47:19 -06:00
Jeff Squyres
bb6b87f086
Merge pull request #1972 from jsquyres/pr/rsh-robustify-default-check
...
rsh: robustify the check for plm_rsh_agent default value
2016-08-19 13:33:38 -04:00
Ralph Castain
c9dc286f25
Update the hwloc coverity submission script
2016-08-19 09:20:48 -07:00
Nathan Hjelm
e5c7512692
Merge pull request #1983 from hjelmn/request_cb
...
ompi/request: change semantics of ompi request callbacks
2016-08-18 08:31:56 -06:00
Artem Polyakov
6ea8cccdab
Merge pull request #1969 from artpol84/pmix_jobid_fix
...
Pmix jobid fix
2016-08-18 17:24:58 +07:00
Nathan Hjelm
6aa658ae33
ompi/request: change semantics of ompi request callbacks
...
This commit changes the semantics of ompi request callbacks. If a
request's callback has freed or re-posted (using start) a request
the callback must return 1 instead of OMPI_SUCCESS. This indicates
to ompi_request_complete that the request should not be modified
further. This fixes a race condition in osc/pt2pt that could lead
to the req_state being inconsistent if a request is freed between
the callback and setting the request as complete.
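The callback rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions: sketch_request_t, SKETCH_SUCCESS, cb_keep, cb_free, and sketch_request_complete are hypothetical stand-ins invented for this example, not the actual Open MPI types or functions, and the "freed" flag merely simulates freeing a request.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Open MPI definitions. */
#define SKETCH_SUCCESS 0

typedef struct sketch_request {
    int complete;                                 /* 1 once marked complete */
    int freed;                                    /* 1 once "freed" (simulated) */
    int (*complete_cb)(struct sketch_request *);  /* optional completion callback */
} sketch_request_t;

/* Callback that leaves the request alone, so it returns SKETCH_SUCCESS. */
int cb_keep(sketch_request_t *req)
{
    (void) req;
    return SKETCH_SUCCESS;
}

/* Callback that frees (or would re-post) the request, so it returns 1
 * to tell the caller that the request must not be modified further. */
int cb_free(sketch_request_t *req)
{
    req->freed = 1;
    return 1;
}

/* Completion routine following the described rule: if the callback
 * reports that it freed or re-posted the request, return without
 * touching it. Returns 1 if the request was marked complete here,
 * 0 if it was left untouched. */
int sketch_request_complete(sketch_request_t *req)
{
    if (NULL != req->complete_cb) {
        int (*cb)(sketch_request_t *) = req->complete_cb;
        req->complete_cb = NULL;
        if (SKETCH_SUCCESS != cb(req)) {
            return 0;  /* callback took over the request */
        }
    }
    req->complete = 1;
    return 1;
}
```

In this sketch a request whose callback is cb_free is never marked complete by sketch_request_complete, which is the property the commit relies on to avoid the inconsistent state.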
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-17 20:14:01 -06:00
Sylvain Jeaugey
61e900eea5
Fix typo calling allreduce with the allgather module.
...
That was causing CUDA collective to crash.
2016-08-17 17:05:13 -07:00
rhc54
394e23d179
Merge pull request #1981 from rhc54/topic/timeout
...
Support the PMIX_TIMEOUT key at the PMIx server when timeout=0 - this…
2016-08-17 17:40:43 -05:00
Edgar Gabriel
e14c23ba79
Merge pull request #1980 from edgargabriel/topic/coverty-cleanup
...
io/ompio: Topic/coverty cleanup
2016-08-17 17:27:51 -05:00
Ralph Castain
7da9793fef
Support the PMIX_TIMEOUT key at the PMIx server when timeout=0 - this indicates that the user doesn't want a lookup of any data from the host RM.
2016-08-17 16:26:58 -05:00
Edgar Gabriel
2c8437ce62
fs/pvfs2: fix a common symbol
2016-08-17 13:10:32 -05:00
Edgar Gabriel
eba5293586
fix Coverity warning CID 1369021
2016-08-17 13:02:45 -05:00
Nathan Hjelm
cdbc94e34e
Merge pull request #1977 from hjelmn/osc_pt2pt_fix
...
osc/pt2pt: make receive count an unsigned int
2016-08-17 09:38:33 -06:00
Nathan Hjelm
40b70889e5
osc/pt2pt: make receive count an unsigned int
...
This receive_count MCA variable should never be negative. Change it
to an unsigned int.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2016-08-17 08:14:24 -06:00
Jeff Squyres
ce0124603d
.gitignore: add test executable to ignore
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-17 04:27:51 -07:00
Gilles Gouaillardet
8faa1edafa
osc/pt2pt: silence misc warnings
2016-08-17 14:24:14 +09:00
Gilles Gouaillardet
6b7bc64101
spml/yoda: MCA_PML(add_procs) all procs from oshmem_comm_world
...
and fix oshmem_group_proc_{init,create} so they use the number of procs in oshmem_comm_world
Thanks Debendra Das for the report and Josh Ladd for the guidance
Fixes open-mpi/ompi#1966
2016-08-17 14:24:02 +09:00
LANL OMPI Bot
96c7762050
Merge pull request #1942 from hppritcha/topic/minor_ofi_fix
...
mtl/ofi: use mca param to set av type
2016-08-16 14:14:12 -06:00
Nathan Hjelm
2e1378596f
Merge pull request #1953 from hjelmn/pt2pt_fixes
...
osc/pt2pt updates
2016-08-16 08:00:24 -06:00
Jeff Squyres
71ec5cfb43
rsh: robustify the check for plm_rsh_agent default value
...
Don't strcmp against the default value -- the default value may change
over time. Instead, check to see if the MCA var source is not
DEFAULT.
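A minimal sketch of the difference between the two checks, assuming a simplified variable record. The names sketch_var_t, sketch_source_t, user_set_agent_by_strcmp, and user_set_agent_by_source are hypothetical and exist only for this illustration; they are not the MCA variable API.

```c
#include <string.h>

/* Hypothetical model of where a variable's value came from. */
typedef enum {
    SKETCH_SOURCE_DEFAULT,  /* value is the built-in default */
    SKETCH_SOURCE_ENV,      /* value was set in the environment */
    SKETCH_SOURCE_FILE      /* value was set in a parameter file */
} sketch_source_t;

/* Hypothetical variable record: current value plus its source. */
typedef struct {
    const char *value;
    sketch_source_t source;
} sketch_var_t;

/* Fragile check: compares against a hard-coded copy of the default.
 * It gives the wrong answer as soon as the real default changes, or
 * when the user explicitly sets the same string as the default. */
int user_set_agent_by_strcmp(const sketch_var_t *var, const char *assumed_default)
{
    return 0 != strcmp(var->value, assumed_default);
}

/* Robust check: asks only whether the value came from somewhere
 * other than the built-in default. */
int user_set_agent_by_source(const sketch_var_t *var)
{
    return SKETCH_SOURCE_DEFAULT != var->source;
}
```

With a stale hard-coded default, the strcmp variant reports an untouched variable as user-set, while the source-based variant does not depend on the default string at all.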
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-16 06:58:20 -05:00
rhc54
d7cd802426
Merge pull request #1971 from rhc54/topic/sesdir
...
Update the session dir structure. Restore the creation of a top-level…
2016-08-16 03:14:08 -05:00
Ralph Castain
ae2af61ee3
Update the session dir structure. Restore the creation of a top-level dir based on userid so that everything is contained under the user's top-level dir. Make the next level down (the "job family" level) be either the pid (indicated by a name of "pid.N") or the job family if not launched by mpirun. This allows for proper rendezvous by direct-launched procs.
2016-08-15 22:46:46 -05:00
rhc54
dd05f085e9
Merge pull request #1968 from rhc54/topic/rsh
...
Further cleanup getpwuid usage - try it first (unless completely disa…
2016-08-15 22:11:21 -05:00
Gilles Gouaillardet
3126ff77e2
pmix2x: common syms: whitelist bison-generated common symbols
...
Bison generates some common symbols that we can't do anything about,
so whitelist them.
2016-08-16 11:29:06 +09:00
Artem Polyakov
c5a91c5c9d
opal/pmix: fix pmix jobid calculation if external PMIx server is used.
2016-08-15 21:13:51 +03:00
Ralph Castain
9f43db7303
Further cleanup getpwuid usage - try it first (unless completely disabled), and then silently fall back to other methods.
2016-08-15 07:51:36 -07:00
Ralph Castain
ecbedee8bb
Fix typo
2016-08-15 07:32:00 -07:00
Artem Polyakov
f3c816b52e
opal/pmix: fix indentation in some files.
2016-08-15 18:21:50 +07:00
Gilles Gouaillardet
483685eb6a
update .gitignore
...
remove autogenerated opal/mca/pmix/pmix2x/pmix/src/include/pmix_config.h.in
2016-08-15 17:00:20 +09:00
rhc54
2228d2efc2
Merge pull request #1965 from rhc54/topic/pmixfix
...
Provide backward compatible keys so that the non-PMIx components in t…
2016-08-13 13:48:12 -07:00
Ralph Castain
be8424b691
Provide backward compatible keys so that the non-PMIx components in the opal/pmix framework don't have to adjust as we continue to work on finalizing the PMIx reference scheme. Activate and utilize the new PMIx show_help capability to provide more meaningful error output when the server cannot start.
...
Add a contrib script to clean up permissions incorrectly modified due to things like smb mounts
2016-08-13 12:13:04 -07:00
rhc54
d12e50b2d6
Merge pull request #1963 from rhc54/topic/pmixfix
...
Fix shared memory rendezvous
2016-08-13 09:59:14 -07:00
Ralph Castain
08a0644df5
Fix shared memory rendezvous
2016-08-13 08:14:50 -07:00
rhc54
ddde154d28
Merge pull request #1962 from rhc54/topic/notify
...
Ensure we properly convert pmix status to ORTE state before activatin…
2016-08-13 06:59:50 -07:00
Ralph Castain
48d35a9627
Ensure we properly convert pmix status to ORTE state before activating an error state upon notification. Clean up some conversion issues on notification info. Add a new orte_notify.c test program.
2016-08-12 21:14:29 -07:00
rhc54
9868093bef
Merge pull request #1961 from rhc54/topic/static
...
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 15:17:31 -07:00
rhc54
9eed451916
Merge pull request #1960 from rhc54/topic/rsh
...
Restore the rsh template creation code
2016-08-12 13:38:43 -07:00
rhc54
8d67f753ca
Merge pull request #1959 from rhc54/topic/nodeid
...
The node index isn't normally passed with the packed node object, so …
2016-08-12 13:30:10 -07:00
Ralph Castain
4a4c9703a9
Setup the job list in the PMIx integration so that static ports can run
2016-08-12 13:27:10 -07:00
rhc54
1ef3c86d44
Merge pull request #1931 from hjelmn/ess_fix
...
ess/base: set up nidmap after pmix
2016-08-12 13:10:30 -07:00
Ralph Castain
5717b75b45
Restore the rsh template creation code
2016-08-12 12:43:40 -07:00
rhc54
ee1ee2086c
Merge pull request #1958 from rhc54/topic/path
...
Fix a bug where we were requiring that all paths in $PATH be absolute
2016-08-12 12:31:43 -07:00
Ralph Castain
d4327fd973
The node index isn't normally passed with the packed node object, so we need to set it on the remote end as the orted needs to pass it down to the procs. Refactor the registration code to better package proc-level info - we will separate out the node and app levels in a subsequent change.
2016-08-12 12:06:23 -07:00
Ralph Castain
0e58609327
Fix a bug where we were requiring that all paths in $PATH be absolute. Some users provide relative paths in their environment, and we should respect those.
2016-08-12 11:28:57 -07:00
rhc54
163999bce0
Merge pull request #1957 from rhc54/topic/rsh
...
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 11:18:28 -07:00
Ralph Castain
1c44543854
If the ssh agent hasn't been given, then check for qrsh and friends
2016-08-12 07:46:39 -07:00
rhc54
397faad46b
Merge pull request #1954 from rhc54/topic/covpmix
...
Silence Coverity warnings
2016-08-12 06:38:04 -07:00