1
1
Граф коммитов

27910 Коммитов

Автор SHA1 Сообщение Дата
Gilles Gouaillardet
d19a8351c8 memory/patcher: #ifdef out some parts when SYS_munmap is not defined
so memory/patcher can work under cygwin

Thanks Marco Atzeri for bringing this to our attention

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-07 16:44:40 +09:00
Ralph Castain
ec6b2e1cf1
Merge pull request #4450 from rhc54/topic/rmaps
We now add all nodes to the job data object when we map, so don't do it twice
2017-11-05 18:41:55 -08:00
Ralph Castain
dcf389b6fa We now add all nodes to the job data object when we map, so don't do it twice
Fixes #4449

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-05 17:17:50 -08:00
Edgar Gabriel
8fe2c372af
Merge pull request #4448 from edgargabriel/topic/fs-error-codes
fs/ufs error codes
2017-11-04 06:34:52 -05:00
Edgar Gabriel
91881c4fdc fs/lustre: update component to match fs/ufs
the fs/lustre component has missed out on a number of updates to the fs/ufs component.
This commit tries to import all the changes performed on the fs/ufs component
w.r.t to the file_open operation, including updates on how the amode is set,
error is propegated and setting the fs_block_size value (which is required for
locking purposes).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-03 16:11:46 -05:00
Edgar Gabriel
1885d99ac7 fs/ufs: set proper error codes on file_open
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code. Fixes an issue reported on the mailing by Wei-keng Liao

Fixes Issue #4443

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-03 16:06:31 -05:00
bosilca
3e1f771045
Merge pull request #4403 from naughtont3/tjn-fix-plm-monitoring-configury
fix PML monitoring configury to compile DSOs
2017-11-02 19:08:20 -04:00
bosilca
63e8a8c608
Merge pull request #4431 from hjelmn/asm_cleanup
opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
2017-11-02 18:45:56 -04:00
Matias Cabral
c8aa22ee22
Merge pull request #4304 from aravindksg/master
Fix OFI MTL to recognize correct CQ empty scenario and improve error reporting
2017-11-02 11:56:57 -07:00
Ralph Castain
e7990e7e75
Merge pull request #4438 from rhc54/topic/pinter
Correct copy/paste error in example
2017-11-02 12:35:49 -05:00
Ralph Castain
b97caf8f05 Correct copy/paste error in example
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-02 10:33:28 -07:00
George Bosilca
b2b3da3046
Do not access the frag after returning it.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-31 16:39:23 -04:00
Nathan Hjelm
3ff34af355 opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 12:47:23 -06:00
Nathan Hjelm
055f413d1b opal/asm: add support for and, or, and xor atomics
This commit adds additional atomics math operations that are needed
throughout the codebase. The semantics of the new operations are
consistent with the existing atomics (op then fetch).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 11:39:50 -06:00
Ralph Castain
e87d3dc47a
Merge pull request #4428 from rhc54/topic/threads
Revert the MPI_Init fence operations to use volatile bool instead of thread macros
2017-10-31 11:54:52 -05:00
Ralph Castain
27f3d417ca Revert the MPI_Init fence operations to use volatile bool instead of thread macros.
The problem is that the waiting thread is cycling using OMPI_LAZY_WAIT_FOR_COMPLETION so it can exercise opal_progress. This probably isn't as critical for the modex step, but definitely necessary for the barrier at the end of mpi_init. The problem this creates is that the lazy macro exits as soon as "active" becomes false, and then we destruct the lock.

However, wakeup_thread sets "active" to false - and then calls the condition broadcast to wakeup any waiting threads. So there is a race condition between that broadcast and the lock destruct.

Add OPAL_ACQUIRE_OBJECT and OPAL_POST_OBJECT memory barriers to help protect against thread race conditions on some platforms

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-31 08:09:02 -07:00
Ralph Castain
1ae8d59404
Merge pull request #4424 from rhc54/topic/pup
Sync to PMIx v3.0 (master)
2017-10-30 18:49:38 -05:00
Ralph Castain
7839dc91a8 Sync to PMIx v3.0 (master)
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-30 13:06:41 -07:00
Aravind Gopalakrishnan
285fc42b4e Fix OFI MTL to recognize correct CQ empty scenario
Currently, the progress function is incorrectly interpreting any error
value other than a positive value or -FI_EAVAIL to mean CQ is empty.
CQ is empty only if fi_cq_read() call returned -EAGAIN error
code. Fix that here.

While at it, fix help text output for calls made to OFI API.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-10-30 12:13:44 -07:00
Gilles Gouaillardet
582aa6c0f0
Merge pull request #4422 from ggouaillardet/topic/configury_zlib_misc_fixes
configury: misc fixes in zlib detection
2017-10-30 17:10:31 +09:00
Gilles Gouaillardet
48a39e34f2 configury: misc fixes in zlib detection
- push extra local variables
 - remove unnecessary AC_MSG_RESULT

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-10-30 16:19:37 +09:00
Ralph Castain
56067f38e3
Merge pull request #4418 from rhc54/topic/pinterlib
Add another test program for cross-lib coordination, this one based on native PMIx commands
2017-10-29 16:41:40 -05:00
Ralph Castain
6be74bfa7e Add another test program for cross-lib coordination, this one based on native PMIx commands
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-29 11:33:25 -07:00
Ralph Castain
21474a6933
Merge pull request #4413 from rhc54/topic/event
Address external libevent once again
2017-10-29 13:16:26 -05:00
Ralph Castain
36d7e752b6 I think we have all concluded that there is no good answer to locating the external libevent library, so surrender to the situation and simply remove that requirement. Users wanting to utilize the embedded PMIx library can install it, but will have to use mpicc _and_ add an explicit -lpmix to their cmd line to compile their application.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-29 07:39:02 -07:00
Ralph Castain
ae4f310c0b
Merge pull request #4406 from rhc54/topic/update
Sync to PMIx master (now v3.0)
2017-10-27 12:22:44 -05:00
Ralph Castain
df48ddd2a1 Merge pull request #4323 from aravindksg/fix_help_text
Move help text output regarding PSM2_CUDA environment variable
2017-10-27 10:10:01 -05:00
Gilles Gouaillardet
5d208a177d Merge pull request #4407 from ggouaillardet/topic/configury_zlib
configury: fix handling of --with-zlib=DIR and --with-zlib-libdir=DIR…
2017-10-27 15:54:55 +09:00
Gilles Gouaillardet
5c61a4e3a5 configury: fix handling of external libevent library
Search external libevent library in both DIR/lib64 and DIR/lib
when --with-libevent=DIR is specified but --with-libevent-libdir=DIR is not

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-10-27 15:52:18 +09:00
Gilles Gouaillardet
5a816a98df configury: fix handling of --with-zlib=DIR and --with-zlib-libdir=DIR option
- if --with-zlib=DIR --with-zlib-libdir=LIBDIR are given, do not search
   libs in DIR/lib[64], and do not abort if libs are not there
 - if --with-zlib=DIR is given but not --with-zlib-libdir, then do append
   -LDIR/lib[64] to LDFLAGS

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-10-27 14:47:47 +09:00
Ralph Castain
ea3508b26b Sync to PMIx master (now v3.0)
Fix an apparent typo in external libevent configury
Require external libevent for install of separate libpmix

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-26 21:05:17 -07:00
Aravind Gopalakrishnan
bea4503f95 Move help text output regarding PSM2_CUDA envvar to component init phase
The messages should be printed only in the event of CUDA builds and in the
presence of supporting hardware and when PSM2 MTL has actually been selected
for use. To this end, move help text output to component init phase.

Also use opal_setenv/unsetenv() for safer setting, unsetting of the environment
variable and sanitize the help text message.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2017-10-26 16:01:01 -07:00
Thomas Naughton
86d282d6dd fix PML monitoring configury to compile DSOs
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-10-26 15:53:11 -04:00
Ralph Castain
01ed7548c4 Update to PMIx v3.0a
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-25 12:25:27 -07:00
Ralph Castain
05f98d66a6 Merge pull request #4396 from rhc54/topic/pmixconfig
Alter the PMIx embedded configuration
2017-10-25 10:10:50 -05:00
Gilles Gouaillardet
c4650b5904 Merge pull request #4383 from ggouaillardet/topic/configury_ucx
configury: revamp ucx detection
2017-10-25 15:33:38 +09:00
Ralph Castain
8fbfe68754 Alter the PMIx embedded configuration so that we can build static with devel headers - if the builder requests that we install a separate libpmix, then don't prefix the PMIx variables.
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-24 21:45:27 -07:00
Ralph Castain
cf3bc4f55b Merge pull request #4346 from matcabral/psm2_mtl_mq_thread_fix
MTL PSM2: add a thread lock while peeking and completing the psm2 requests.
2017-10-24 16:41:29 -05:00
Ralph Castain
14a0701949 Merge pull request #4391 from rhc54/topic/scale
Add timeout option to scaling script
2017-10-24 14:34:48 -05:00
Ralph Castain
e7c6718d29 Add timeout option to scaling script
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-24 12:33:22 -07:00
Ralph Castain
987aac1268 Merge pull request #4387 from rhc54/topic/dmodx
We should never block when requesting dmodex data from the PMIx server
2017-10-24 10:46:04 -05:00
Ralph Castain
292983261a We should never block when requesting dmodex data from the PMIx server as this will block it from being able to accept connections from local clients. Do not deregister standing dmodx requests when a fence completes unless we actually collected the data in the fence
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-24 07:51:10 -07:00
Ralph Castain
70c455938b Merge pull request #4382 from rhc54/topic/scaling
Update the scaling script to avoid use of "system" command
2017-10-23 22:43:25 -05:00
Ralph Castain
0353be9704 Update MPI init to properly skip barriers
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 19:28:34 -07:00
Gilles Gouaillardet
af03f55aa8 configury: revamp ucx detection
- when --with-ucx=DIR is not set, try the default path and fallback to /opt/ucx
 - when --with-ucx-libdir is not set, try lib64 and then lib directories
 - do not handle --with-ucx-libdir (this is a user mistake, no need to over-complicate our logic)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-10-24 09:27:57 +09:00
Ralph Castain
3b71be4db4 Update the scaling script to avoid use of "system" command, thus ensuring that each command sees the same environment. Fix prun to pickup and propagate OMPI MCA params
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 16:27:41 -07:00
bosilca
ac348da13a Merge pull request #4374 from bosilca/topic/osx_syslog
Topic/osx syslog
2017-10-23 18:06:36 -04:00
Ralph Castain
0721d933fc Merge pull request #4376 from rhc54/topic/interlib
Update the interlib example to show an alternative method for model declaration
2017-10-23 14:53:13 -05:00
Ralph Castain
e33f319380 Update example to show tests of various APIs
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 12:02:54 -07:00
Ralph Castain
6ea3c8a0bd Update the interlib example to show an alternative method for model declaration. Add a missing range value to the OPAL layer. Make it easier to see OMPI model callbacks
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-10-23 11:27:42 -07:00