1
1
Граф коммитов

27946 Коммитов

Автор SHA1 Сообщение Дата
Wojtek Wasko
276de13a1e Make interface's kernel index an int instead of int16_t
Sometimes, the ethernet interfaces can get quite high kernel indices. struct
ifreq (see netdevice(7)) defines ifr_ifindex to be int's. The OOB component
used int16_t internally for matching (in case of -mca oob_tcp_if_[in|ex]clude)
which meant that any interface index > 32767 would never be matched because the
integer would be truncated to int16_t upon return from the function. OOB would
then refuse to work because it didn't find any usable interfaces and MPI job
would abort.

Signed-off-by: Wojtek Wasko <wwasko@nvidia.com>
2017-11-15 04:32:26 -05:00
Jeff Squyres
97d0469719
Merge pull request #4500 from jsquyres/pr/pmix-fix
pmix: pack pointer to object (vs. pointer to pointer)
2017-11-13 14:19:50 -07:00
Jeff Squyres
c19822dad4 pmix: pack pointer to object (vs. pointer to pointer)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-13 09:50:44 -08:00
Matias Cabral
d1869a725a
Merge pull request #4467 from matcabral/master
mtl/ofi: Set data and control progress options default values to FI_PROGRESS_UNSPEC
2017-11-13 07:35:39 -08:00
Ralph Castain
6eb3c124e1
Merge pull request #4498 from rhc54/topic/pmixup
Some minor cleanups of the DVM
2017-11-12 19:01:15 -08:00
Ralph Castain
4381b2c60f Add ability to multiply number of nodes when running scaling tests
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-12 16:38:37 -08:00
Ralph Castain
9c84e1485b Some minor cleanups of the DVM
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-12 16:27:37 -08:00
Jeff Squyres
6d6f0beb62
Merge pull request #4494 from jsquyres/pr/ofi-mtl-updates
MTL OFI updates
2017-11-12 11:30:15 -05:00
Ralph Castain
64e838c1ac
Merge pull request #4495 from rhc54/topic/pmixup
Sync to PMIx master
2017-11-11 18:25:45 -08:00
Ralph Castain
d75d0bc5f6 Sync to PMIx master
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-11 17:06:41 -08:00
Jeff Squyres
a8686a6813 mtl ofi: squelch compiler warnings
gcc 5.2 complains:

```
mtl_ofi_component.c: In function ‘ompi_mtl_ofi_finalize’:
mtl_ofi_component.c:613:5: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
     if (ret = fi_close((fid_t)ompi_mtl_ofi.fabric)) {
     ^
```

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:11 -08:00
Jeff Squyres
5a6ddf42d6 mtl ofi: it is not an error to return no data from fi_getinfo()
Before this commit, the presence of usNIC devices -- which will
(currently) return no data when fi_getinfo() is queried for tagged
matching providers -- would cause an error message to be displayed.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:11 -08:00
Jeff Squyres
f910f554f7 mtl ofi: show the positive value of the error
The value of ret is negative (e.g., -61), but it is displayed in the
help message as `%zd`, which renders as unsigned (i.e., a giant
positive value).  So make sure to negate the negative value before
rendering it (e.g., so we display "61", not "4294967235").

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:11 -08:00
Jeff Squyres
e8c13ef286 mtl ofi: fix trivial comment whitespace
No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:10 -08:00
Jeff Squyres
bed1930df8 mtl ofi: fix formatting of help message
No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-11 05:07:05 -08:00
bosilca
3f43e2c8df
Merge pull request #4419 from thananon/pr_newob1
pml/ob1: match callback will now queue wrong sequence frag and return.
2017-11-10 20:10:53 -05:00
Jeff Squyres
c3ac3f753c
Merge pull request #4488 from jsquyres/pr/usnic-show-unknown-frames-in-verbose-mode
usnic: only output unknown frames in verbose mode
2017-11-10 15:20:42 -05:00
Jeff Squyres
99662757e2 usnic: only output unknown frames in verbose mode
Per
https://www.mail-archive.com/users@lists.open-mpi.org/msg31758.html,
only output unknown frames when we're outputting verbose BTL messages.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-10 11:14:05 -08:00
Gilles Gouaillardet
7a1c65007a atomics: always #include <stdbool.h>
so the bool type is defined when using old compilers that do not support gcc builtin atomics (such as gcc 4.1.x from CentOS 5)

Fixes open-mpi/ompi#4478

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-10 14:05:46 +09:00
Joshua Ladd
6d9bd1746f
Merge pull request #4480 from karasevb/fix/rmaps/mindist
rmaps/mindist: reworked the job map binding
2017-11-09 14:27:52 -05:00
Boris Karasev
d2a568afa5 rmaps/mindist: reworked the job map binding
The following issues have been fixed for `mindist`:
- computing the job map on the backend nodes
- using slots count (`-host node1:<s1>,nodeN:<sN>`)
- fixed `dist:span` job mapping method
- fixed `oversubcribe` option with `-host`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2017-11-09 08:56:44 +02:00
George Bosilca
8a9ef3dc2d
Delay the initialization until necessary.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-11-08 17:32:18 -05:00
George Bosilca
409638bdf4 Keep the out-of-sequence fragment ordered.
Rework the logic to handle the out-of-sequence fragments on the receiver
side. A large number of OOS messages are still arriving even in single
threaded scenarios.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-11-08 14:27:13 -05:00
Matias Cabral
b76bb42ac1 mtl/ofi: Set data and control progress options default
values to FI_PROGRESS_UNSPEC so each provider will use its default.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2017-11-08 08:24:33 -08:00
Thomas Naughton
86bb6f8bac
Merge pull request #4444 from naughtont3/tjn-fix-plm-monitoring-configury
configury: single quote to avoid trouble with BSD
2017-11-08 10:56:37 -05:00
Jeff Squyres
c925be0ebb
Merge pull request #4474 from jsquyres/pr/ompi-info-ipv6-flag
ompi_info: add ipv6 config param
2017-11-08 10:48:39 -05:00
Jeff Squyres
5ac1ea4512 ompi_info: add ipv6 config param
Allow ompi_info to show whether IPv6 support was enabled in Open MPI
or not.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2017-11-08 07:02:27 -08:00
Gilles Gouaillardet
cfdf042d89
Merge pull request #4461 from ggouaillardet/topic/cygwin_fixes
memory/patcher: #ifdef out some parts when SYS_munmap is not defined
2017-11-08 13:24:12 +09:00
Ralph Castain
4b5be96af0
Merge pull request #4469 from rhc54/topic/pmup
Sync with PMIx master
2017-11-07 19:46:35 -08:00
Ralph Castain
d4b83cc951 Sync with PMIx master
Implement direct modex protection to turn off PMIx dstore when direct modex scenario is detected

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-07 18:10:56 -08:00
Edgar Gabriel
0c3aa44ba3
Merge pull request #4454 from edgargabriel/topic/large-file-bug-fix
io/ompio: fix a bug in handling large write/read operations
2017-11-07 09:46:47 -06:00
Gilles Gouaillardet
291d22c0a4
Merge pull request #4460 from ggouaillardet/topic/fs-error-codes
fs/ufs: add some more error codes on file_open
2017-11-07 16:48:41 +09:00
Gilles Gouaillardet
d19a8351c8 memory/patcher: #ifdef out some parts when SYS_munmap is not defined
so memory/patcher can work under cygwin

Thanks Marco Atzeri for bringing this to our attention

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-07 16:44:40 +09:00
Gilles Gouaillardet
18d3897479 fs/ufs: add some more error codes on file_open
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code.

Refs. open-mpi/ompi#4443

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2017-11-07 11:09:32 +09:00
Edgar Gabriel
c9bb049d00 io/ompio: fix a bug in handling large write/read operations
This is a bug fix based on a problem reported on the mailing list.
For very large read/write operations, ompio breaks the operation
down into multiple cycles. The problem was that
one of the variables required to maintain its values
across the different cycles did not do that, and because
of that the calculations of the memory offsets was wrong.

Fixes issue #4453

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-06 11:48:13 -06:00
Ralph Castain
ec6b2e1cf1
Merge pull request #4450 from rhc54/topic/rmaps
We now add all nodes to the job data object when we map, so don't do it twice
2017-11-05 18:41:55 -08:00
Ralph Castain
dcf389b6fa We now add all nodes to the job data object when we map, so don't do it twice
Fixes #4449

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-05 17:17:50 -08:00
Edgar Gabriel
8fe2c372af
Merge pull request #4448 from edgargabriel/topic/fs-error-codes
fs/ufs error codes
2017-11-04 06:34:52 -05:00
Edgar Gabriel
91881c4fdc fs/lustre: update component to match fs/ufs
the fs/lustre component has missed out on a number of updates to the fs/ufs component.
This commit tries to import all the changes performed on the fs/ufs component
w.r.t to the file_open operation, including updates on how the amode is set,
error is propegated and setting the fs_block_size value (which is required for
locking purposes).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-03 16:11:46 -05:00
Edgar Gabriel
1885d99ac7 fs/ufs: set proper error codes on file_open
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code. Fixes an issue reported on the mailing by Wei-keng Liao

Fixes Issue #4443

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2017-11-03 16:06:31 -05:00
Thomas Naughton
c5dc41ee1a configury: single quote to avoid trouble with BSD
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
2017-11-03 11:34:28 -04:00
bosilca
3e1f771045
Merge pull request #4403 from naughtont3/tjn-fix-plm-monitoring-configury
fix PML monitoring configury to compile DSOs
2017-11-02 19:08:20 -04:00
bosilca
63e8a8c608
Merge pull request #4431 from hjelmn/asm_cleanup
opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
2017-11-02 18:45:56 -04:00
Matias Cabral
c8aa22ee22
Merge pull request #4304 from aravindksg/master
Fix OFI MTL to recognize correct CQ empty scenario and improve error reporting
2017-11-02 11:56:57 -07:00
Ralph Castain
e7990e7e75
Merge pull request #4438 from rhc54/topic/pinter
Correct copy/paste error in example
2017-11-02 12:35:49 -05:00
Ralph Castain
b97caf8f05 Correct copy/paste error in example
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2017-11-02 10:33:28 -07:00
George Bosilca
b2b3da3046
Do not access the frag after returning it.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2017-10-31 16:39:23 -04:00
Nathan Hjelm
3ff34af355 opal: rename opal_atomic_cmpset* to opal_atomic_bool_cmpset*
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 12:47:23 -06:00
Nathan Hjelm
055f413d1b opal/asm: add support for and, or, and xor atomics
This commit adds additional atomics math operations that are needed
throughout the codebase. The semantics of the new operations are
consistent with the existing atomics (op then fetch).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2017-10-31 11:39:50 -06:00
Ralph Castain
e87d3dc47a
Merge pull request #4428 from rhc54/topic/threads
Revert the MPI_Init fence operations to use volatile bool instead of thread macros
2017-10-31 11:54:52 -05:00