Nathan Hjelm
f3d4eaeaf7
Merge pull request #2013 from hjelmn/osc_rdma_fix
...
osc/rdma: fix bug in dynamic memory window tracking code
2016-08-25 13:42:27 -07:00
rhc54
19b0f4db9f
Merge pull request #1995 from rhc54/topic/pe-per-rank
...
Change the behavior of cpus-per-rank.
2016-08-25 14:38:12 -05:00
Nathan Hjelm
e53de7ecbe
osc/rdma: fix bug in dynamic memory window tracking code
...
This commit fixes an ordering bug in the code that keeps track of all
attached memory windows. The code is intended to keep the memory
regions sorted but was often inserting at the wrong index. Thanks to
Christoph Niethammer for reporting the issue. The reproducer will be
added to nightly MTT testing.
Fixes open-mpi/ompi#2012
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-25 12:08:46 -06:00
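The ordering fix described in e53de7ecbe above (keeping attached regions sorted by inserting at the correct index) can be illustrated with a minimal sketch. This is not the osc/rdma code: `region_t`, `region_insert_index`, and `region_insert` are hypothetical names, and sorting by base address in a fixed-capacity array is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical region descriptor; NOT the actual osc/rdma structure. */
typedef struct {
    uintptr_t base;  /* start address of the attached region */
    size_t    len;   /* length of the region in bytes */
} region_t;

/* Binary search for the first index whose base is >= the new base.
 * Inserting at this index keeps the array sorted by base address. */
static size_t region_insert_index(const region_t *regions, size_t count,
                                  uintptr_t base)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (regions[mid].base < base) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return lo;
}

/* Insert r into the sorted array and return the index that was used.
 * The caller must guarantee room for one more element. */
static size_t region_insert(region_t *regions, size_t *count,
                            size_t capacity, region_t r)
{
    assert(*count < capacity);
    size_t idx = region_insert_index(regions, *count, r.base);
    /* Shift the tail up by one slot to open a gap at idx. */
    memmove(&regions[idx + 1], &regions[idx],
            (*count - idx) * sizeof(region_t));
    regions[idx] = r;
    ++*count;
    return idx;
}
```

Computing the index with a lower-bound search and shifting only the tail is one common way to avoid the "wrong index" class of bug; the real component may track regions differently.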
Nathan Hjelm
c082068953
Merge pull request #2006 from hjelmn/osc_pt2pt_fix
...
osc/pt2pt: fix several bugs
2016-08-25 09:19:29 -06:00
rhc54
17a210f7f0
Merge pull request #2008 from rhc54/topic/binding
...
Correct the binding algorithm to decouple it from oversubscribe.
2016-08-25 09:25:33 -05:00
Jeff Squyres
0d19cc4a13
README: fix a bunch of typos
...
Thanks to Paul Hargrove for pointing them out. Really.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-25 09:15:27 -04:00
rhc54
b563c9e303
Merge pull request #2003 from rhc54/topic/sync
...
Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default
2016-08-24 23:18:58 -05:00
Ralph Castain
440eae90ec
Correct the binding algorithm to decouple it from oversubscribe.
...
Oversubscribe stipulates that we allow more procs on the node than assigned slots - it has nothing to do with the number of available pe's. Let overload directives handle the pe situation.
2016-08-24 21:17:22 -07:00
George Bosilca
3adff9d323
Fixes #1793.
...
Reshape the tearing down process (connection close) to prevent race
conditions between the main thread and the progress thread.
Minor cleanups.
2016-08-24 22:45:19 -04:00
Nathan Hjelm
70f8a6e792
osc/pt2pt: fix several bugs
...
This commit fixes some bugs uncovered during thread testing of
2.0.1rc1. With these fixes the component is running cleanly with
threads.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-24 14:35:45 -06:00
Nathan Hjelm
6de64ddbc1
Merge pull request #2005 from hjelmn/ugni_fix
...
btl/ugni: actually make the endpoint lock recursive
2016-08-24 11:05:27 -06:00
Nathan Hjelm
83062db7cb
btl/ugni: actually make the endpoint lock recursive
...
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-24 10:36:08 -06:00
Ralph Castain
bcf5ac3971
Set the default value of both barrier counters to zero, thus ensuring the coll/sync component is off by default
2016-08-24 07:51:32 -07:00
Gilles Gouaillardet
2eec8970ff
opal/util: fix (again) incorrect type casting in opal_path_df
...
This fixes the previous commit open-mpi/ompi@a439afce5b.
2016-08-24 12:50:15 +09:00
rhc54
b12e43fc03
Merge pull request #2001 from ggouaillardet/topic/pmix2x_sec_native
...
fix sec/native module under Solaris and other misc issues
2016-08-23 22:47:05 -05:00
Gilles Gouaillardet
02847d9e7b
pmix2x: dstore: add missing <fcntl.h> include file in pmix_esh.c
...
(back-ported from upstream pmix/master@5c66ffe0f0)
2016-08-24 11:18:46 +09:00
Gilles Gouaillardet
c11e8163f8
pmix2x: sec/native: fix the pmix_native module under solaris by using getpeerucred()
...
and fail with a user-friendly message if no method is available:
"sec: native cannot validate_cred on this system"
(back-ported from upstream pmix/master@c474a1fc60)
2016-08-24 11:18:40 +09:00
Gilles Gouaillardet
e91292aa41
pmix2x: configury: add missing check for <netdb.h> header file
...
(back-ported from upstream pmix/master@e54ce6d423)
2016-08-24 11:18:32 +09:00
Gilles Gouaillardet
a439afce5b
opal/util: fix incorrect type casting in opal_path_df
2016-08-24 10:26:13 +09:00
Ralph Castain
22844b0dc6
Balance priorities to ensure something is below sync
2016-08-23 17:33:45 -07:00
Ralph Castain
540f23c4dd
Adjust priority of coll/sync downwards
2016-08-23 17:12:48 -07:00
Jeff Squyres
b5d03c6eea
Merge pull request #1996 from bharatpotnuri/patching
...
Add Chelsio T6 adapter device parameters.
2016-08-23 13:04:26 -04:00
Jeff Squyres
a0a1849101
README: restrict OS X and Oracle Studio compiler versions
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-23 09:46:30 -07:00
Jeff Squyres
065b93600d
AUTHORS: Fix minor typos
...
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2016-08-23 06:32:57 -07:00
Potnuri Bharat Teja
9b7f9ece20
Add Chelsio T6 adapter device parameters.
...
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
2016-08-23 10:38:13 +05:30
Ralph Castain
92102304b6
Minor typo - init the job_data stdin_target field to 0 for default behavior. Add test.
2016-08-22 21:03:45 -07:00
Gilles Gouaillardet
93e73841f9
ess/singleton: push all PMIX_* environment variables, regardless of how many there are
2016-08-23 09:46:55 +09:00
Gilles Gouaillardet
a1e8e58a8a
ess/singleton: expects 4 PMIX_* environment variables or more
2016-08-23 09:34:03 +09:00
Howard Pritchard
696121cc4a
Merge pull request #1988 from hppritcha/topic/another_ofi_fix
...
mtl/ofi: fix a botched assignment of av_type
2016-08-22 17:59:59 -06:00
Ralph Castain
7de4d6922b
Change the behavior of cpus-per-rank. We previously counted each cpu against the #slots. However, IBM has pointed out that "slot" is equated to the number of processes allowed to run on each node, and not the number of cpus on the node. This has been a continuing source of confusion, so make the distinction a "hard" one.
...
Each process occupies a "slot". We automatically set #slots = #cpus if nothing else is told to us. If you want to run more procs than slots, you must tell us to allow oversubscription.
A process can utilize multiple pe's if that option is given. If you try to bind more than one proc to a given pe, then we will error out unless you tell us to allow overloading.
2016-08-22 15:54:41 -07:00
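The slot and pe rules described in 7de4d6922b above can be read as two independent checks. The sketch below is a simplified model and not the ORTE mapper: `node_t`, `map_status_t`, and `check_mapping` are hypothetical names, and it assumes each process is bound to `pes_per_proc` distinct pe's, so that overloading is unavoidable exactly when the total demand exceeds the cpus on the node.

```c
#include <stdbool.h>

/* Hypothetical result codes for this illustration only. */
typedef enum {
    MAP_OK,
    MAP_ERR_OVERSUBSCRIBED,  /* more procs than slots */
    MAP_ERR_OVERLOADED       /* more than one proc would share a pe */
} map_status_t;

/* Hypothetical node description. */
typedef struct {
    int cpus;   /* processing elements (pe's) on the node */
    int slots;  /* 0 means "not specified": defaults to cpus */
} node_t;

static map_status_t check_mapping(node_t node, int nprocs, int pes_per_proc,
                                  bool allow_oversubscribe,
                                  bool allow_overload)
{
    /* #slots = #cpus when nothing else was specified. */
    int slots = node.slots > 0 ? node.slots : node.cpus;

    /* Each process occupies exactly one slot, however many pe's it uses. */
    if (nprocs > slots && !allow_oversubscribe) {
        return MAP_ERR_OVERSUBSCRIBED;
    }

    /* Assumption: if total pe demand exceeds the cpus available, at least
     * one pe must host more than one proc, which counts as overloading. */
    if (nprocs * pes_per_proc > node.cpus && !allow_overload) {
        return MAP_ERR_OVERLOADED;
    }

    return MAP_OK;
}
```

Under this model, 4 procs with 2 pe's each fit on an 8-cpu node with default slots, because only 4 slots are consumed. The same check also reflects the decoupling in 440eae90ec: allowing oversubscription does not by itself permit overloading.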
Ralph Castain
6549c878a9
Silence the warnings
2016-08-22 15:35:27 -07:00
rhc54
aa21013da3
Merge pull request #1994 from rhc54/topic/unify
...
Unify the PMIx2x components and minor cleanup of coll/sync
2016-08-22 17:20:24 -05:00
Ralph Castain
639dbdb7ea
For maintainability, fold the external PMIx 2.x integration into the internal PMIx 2.x library component. This ensures the two always stay in sync, since keeping them aligned is becoming a problem.
2016-08-22 13:28:55 -07:00
Ralph Castain
871bedb103
Add missing "const" qualifiers
2016-08-22 12:54:24 -07:00
Jeff Squyres
17ca44b25e
Merge pull request #1984 from jsquyres/pr/auto-generate-AUTHORS
...
Be able to auto-generate AUTHORS and preserve org affiliations
2016-08-22 15:37:22 -04:00
Edgar Gabriel
a76f4d7c69
Merge pull request #1990 from edgargabriel/topic/mt-io
...
steps towards making file I/O operations thread safe
2016-08-22 08:19:33 -05:00
Joshua Ladd
deae1ab375
Merge pull request #1985 from vspetrov/master
...
coll/hcoll: Fixes predefined types mapping
2016-08-22 09:18:59 -04:00
Edgar Gabriel
c3d4ee3f73
ompi/file: add a mutex to the ompi_file_t structure
...
Adding a mutex to the ompi_file_t structure allows a per-file-handle
mutex lock for both ROMIO and OMPIO. I double-checked that the size of the
ompi_file_t structure is still below the size of the predefined_file_t structure,
so we should be good from the backward compatibility perspective.
2016-08-21 16:09:12 -05:00
Edgar Gabriel
bc042259bc
Make initialization of the io framework thread safe.
...
Also, remove the lock/unlock in the file_open ompi-interface routines of romio314.
The global lock in the romio component probably does not work: it is easy to construct a test case where two threads perform collective I/O operations on different file handles, and with a global lock that can easily deadlock. The lock has to be at least per file handle.
Move the mutex to file/file.c to avoid a duplicate symbol problem in file_open.c and pfile_open.c.
2016-08-21 16:09:00 -05:00
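The per-file-handle locking argued for in bc042259bc and c3d4ee3f73 above can be sketched with plain POSIX threads. This is an illustration only: `file_handle_t` and its helper functions are hypothetical, the "write" only updates a counter, and the actual ompi_file_t uses Open MPI's own mutex type rather than `pthread_mutex_t`.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical file handle carrying its own lock; NOT ompi_file_t. */
typedef struct {
    pthread_mutex_t lock;          /* protects only this handle's state */
    long            bytes_written; /* stand-in for per-file state */
} file_handle_t;

/* Initialize the handle and its private mutex; returns 0 on success. */
static int file_handle_init(file_handle_t *fh)
{
    fh->bytes_written = 0;
    return pthread_mutex_init(&fh->lock, NULL);
}

/* Record a write under the handle's own lock and return the new total.
 * A thread holding a different handle's lock is never blocked here. */
static long file_handle_write(file_handle_t *fh, long nbytes)
{
    pthread_mutex_lock(&fh->lock);
    fh->bytes_written += nbytes;
    long total = fh->bytes_written;
    pthread_mutex_unlock(&fh->lock);
    return total;
}

/* Release the mutex when the handle is closed; returns 0 on success. */
static int file_handle_destroy(file_handle_t *fh)
{
    return pthread_mutex_destroy(&fh->lock);
}
```

With one lock per handle, a thread inside an operation on one file cannot stall a second thread working on a different file, which is the scenario the commit message identifies as deadlock-prone under a single global lock.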
George Bosilca
b96ec77e40
This variable belongs to the tuned modules and not to base.
2016-08-20 15:37:55 -04:00
George Bosilca
e8425eb1f5
Rename an OMPI internal variable (ticket #1955).
2016-08-20 15:37:55 -04:00
rhc54
102d3afe2c
Merge pull request #1992 from rhc54/topic/sync
...
Restore the coll/sync module and provide a test to verify its operation
2016-08-20 13:33:28 -05:00
George Bosilca
fd57f5bccd
Remove some of the clang warnings.
2016-08-20 14:21:42 -04:00
Ralph Castain
9888615e75
Restore the coll/sync module and provide a test to verify its operation
2016-08-20 10:14:52 -07:00
rhc54
60d74be156
Merge pull request #1991 from rhc54/topic/pmixsm
...
Roll in the latest PMIx version - includes shared memory datastore and reduced memory footprint
2016-08-20 12:00:20 -05:00
Ralph Castain
61ffba668b
Roll in the latest PMIx version - includes shared memory datastore and reduced memory footprint
2016-08-20 07:53:06 -07:00
Ralph Castain
700ad84243
Send the pmix build results to me
2016-08-20 07:32:06 -07:00
Howard Pritchard
61d62b6821
mtl/ofi: fix a botched assignment of av_type
...
Well now the av_type is being assigned correctly
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2016-08-19 17:01:02 -05:00
Nathan Hjelm
f3e9a72f1a
Merge pull request #1987 from hjelmn/cid
...
comm/cid: fix threaded CID allocation
2016-08-19 14:26:39 -06:00
Nathan Hjelm
fbbf743c36
comm/cid: fix threaded CID allocation
...
This commit should restore the pre-non-blocking behavior of the CID
allocator when threads are used. There are two primary changes: 1)
do not hold the cid allocator lock past the end of a request callback,
and 2) if a lower id communicator is detected during CID allocation,
back off and let the lower id communicator finish before continuing.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2016-08-19 11:47:19 -06:00