Commit graph

29325 commits

Author SHA1 Message Date
Aravind Gopalakrishnan
5cf43de445 MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-11-01 14:03:00 -07:00
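The threshold check described in the commit above can be illustrated with a minimal sketch. It assumes the rank is stored in a fixed number of tag bits; the helper names and the `rank_bits` parameter are hypothetical and are not taken from the MTL/OFI source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many distinct ranks fit in rank_bits bits. */
static uint64_t max_ranks_for_bits(unsigned rank_bits)
{
    if (rank_bits >= 64) {
        return UINT64_MAX;          /* avoid an undefined 64-bit shift */
    }
    return (uint64_t)1 << rank_bits;
}

/* Hypothetical check in the spirit of add_procs(): true when every rank
 * in a job of nprocs processes is still representable in the tag. */
static bool rank_count_fits(size_t nprocs, unsigned rank_bits)
{
    return (uint64_t)nprocs <= max_ranks_for_bits(rank_bits);
}
```

A caller would return an error instead of adding peers when `rank_count_fits()` is false, rather than letting ranks alias each other in the tag.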
Aurelien Bouteiller
37954b5fda
Merge pull request #6010 from ICLDisco/export/orte_crashfini
Export/orte crashfini
2018-11-01 13:04:42 -04:00
Yossi Itigin
241b424bd3
Merge pull request #6000 from hoopoepg/topic/added-missing-amo-datatypes
OSHMEM/AMO: added missing C11 macro datatypes
2018-11-01 15:29:56 +02:00
Sergey Oblomov
6e78102089 OSHMEM/AMO: code beautify
- added <cr> to split API groups to simplify human processing

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-11-01 11:33:34 +02:00
Aurélien Bouteiller
43bd232fd0
Resolve a recursive destruct on the iof proct in finalize
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
2018-10-31 16:38:42 -04:00
Aurelien Bouteiller
348bf8e13f
Prevent errmgr invocation from crashing in finalize
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-10-31 16:28:04 -04:00
Matias Cabral
2da31706bf
Merge pull request #5970 from aravindksg/coll-tuned-fix
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
2018-10-31 11:20:07 -07:00
Yossi Itigin
bbe5da483a
Merge pull request #5985 from hoopoepg/topic/fixed-oshmem-profile-build
OSHMEM/PROFILE: fixed oshmem profile build
2018-10-31 14:23:46 +02:00
Yossi Itigin
c8d3ad0d48
Merge pull request #5983 from yosefe/topic/scoll-basic-fix-pSync
SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
2018-10-31 11:56:33 +02:00
Sergey Oblomov
f63d6da6d7 OSHMEM/AMO: added missing C11 macro datatypes
- added signed datatypes for atomic_add calls
- added unsigned datatypes for atomic put/inc/get/fetch calls
- fixed incorrect SHMEM_CTX_DEFAULT macro, added
  external declaration of oshmem_ctx_default variable

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-30 23:15:26 +02:00
Brian Barrett
a1e85b03aa btl tcp: Fix compile error in IPv6
In 457f058 I broke the TCP BTL with --enable-ipv6.  This patch
fixes the compile error, so IPv6 works again.

Fixed #5996

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-30 12:31:04 -07:00
Ralph Castain
2e599663ca
Merge pull request #5993 from rhc54/topic/cleanup
Remove stale defunct tools
2018-10-30 10:54:58 -07:00
Ralph Castain
05ac8fa71c Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-30 08:48:16 -07:00
Sergey Oblomov
4a3e83780c OSHMEM/PROFILE: fixed profile build
- added missing file to profile makefile
- constants SHMEM_CTX_* are shifted into public header

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-29 23:50:26 +02:00
Howard Pritchard
9c44e8c904
Merge pull request #5977 from ggouaillardet/topic/libevent_configury
event/external: misc configury fixes
2018-10-29 15:19:17 -06:00
Yossi Itigin
6754bf1465 SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location. In particular, this location
was not initialized to _SHMEM_SYNC_VALUE, which broke the barrier
algorithm and it did not complete: One PE could read 0 from its peer and
assume the peer already started the barrier, and then write 1 to the
peer. Then, the peer entered the barrier and overwrote the 1 with 0, and
then it waited forever to see '1' in its pSync.

Found with shmem_verifier test suite.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-29 12:21:34 +02:00
Gilles Gouaillardet
b205039205 event/external: fix version requirement
Only default to the external component if its version is
greater than or equal to that of the internal libevent (2.0.22)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-29 10:12:56 +09:00
Gilles Gouaillardet
35e77a286c event/external: misc configury fixes
- Always use the external component when configure'd with --with-libevent=external
- Fix the external libevent library version detection
  by testing _EVENT_NUMERIC_VERSION and EVENT__NUMERIC_VERSION macros
- Use the event2/event.h header (event.h is deprecated since libevent 2.0)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-29 10:12:56 +09:00
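The version requirement from the two event/external commits above could be sketched as a numeric comparison. This is only an illustration: it assumes the numeric version packs major, minor and patch into the three high bytes of a 32-bit value (so 2.0.22 would be 0x02001600), and both function names are hypothetical, not part of the configure logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing of a version triple, assuming a 0xMMmmpp00 layout. */
static uint32_t pack_version(uint8_t major, uint8_t minor, uint8_t patch)
{
    return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
           ((uint32_t)patch << 8);
}

/* True when the external library is at least as new as the internal one. */
static bool external_is_new_enough(uint32_t external, uint32_t internal)
{
    return external >= internal;
}
```

Packing the triple into one integer keeps the check to a single comparison, which is why a numeric version macro is easier to test than a version string.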
Aurelien Bouteiller
f2e6d7891e
Merge pull request #5975 from ICLDisco/export/pmix-fini-threadinterlock
Avoid a double lock interlock when calling pmix_finalize
2018-10-26 10:01:23 -04:00
Nathan Hjelm
6b6b153d10
Merge pull request #5978 from ggouaillardet/topic/ucx_configury
btl/uct: fix AC_CHECK_DECLS usage
2018-10-26 06:44:10 -06:00
Gilles Gouaillardet
b715dd2657 btl/uct: fix AC_CHECK_DECLS usage
AC_CHECK_DECLS takes a comma-separated list of macros/symbols,
so replace the whitespace separator with a comma.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-26 15:36:02 +09:00
Aurelien Bouteiller
50cf707f3f
Avoid a double lock interlock when calling pmix_finalize
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-10-25 16:36:58 -04:00
Ralph Castain
1a97555478
Merge pull request #5947 from rhc54/topic/mpir
Provide deprecation warning of MPIR debugger
2018-10-25 08:47:40 -07:00
Nathan Hjelm
049bfb6c3c
Merge pull request #5971 from devreal/ompi-rdma-preinit-fixbarrier
Fix regression in RDMA shared memory registration
2018-10-25 09:45:36 -06:00
Ralph Castain
2cb271716b Provide deprecation warning of MPIR debugger
If we detect that we are being debugged by an MPIR-based debugger, then
print a warning that OMPI's MPIR support has been deprecated and will be
removed in a subsequent release.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-10-25 07:59:07 -07:00
Aravind Gopalakrishnan
88d781056f coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

The MPI spec also stipulates that sendcount and sendtype are ignored for
Alltoall and Allgatherv operations, so the check is extended to these
algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-10-24 15:31:33 -07:00
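The guard described in the coll/tuned commit above can be illustrated without MPI itself. The sketch below uses stand-in types: `DEMO_IN_PLACE`, `demo_datatype_t` and `demo_send_type_size()` are hypothetical names that mimic MPI_IN_PLACE, a datatype handle and a type-size query. Falling back to the receive type is an assumption of this sketch, not a statement about the actual patch.

```c
#include <stddef.h>

/* Stand-in for MPI_IN_PLACE: a unique address used as a sentinel. */
static char demo_in_place_marker;
#define DEMO_IN_PLACE ((const void *)&demo_in_place_marker)

/* Stand-in for a datatype handle; only the element size matters here. */
typedef struct {
    size_t size;
} demo_datatype_t;

/* Query the element size without touching the send type when the send
 * buffer is the in-place sentinel, since the send arguments are then
 * ignored and may not be valid. */
static size_t demo_send_type_size(const void *sbuf,
                                  const demo_datatype_t *sdtype,
                                  const demo_datatype_t *rdtype)
{
    const demo_datatype_t *dt = (sbuf == DEMO_IN_PLACE) ? rdtype : sdtype;
    return dt->size;
}
```

The point of the pattern is ordering: the sentinel test happens before any dereference of the send datatype, which is where the segfault mentioned in the commit would otherwise occur.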
Joseph Schuchart
a193ae26bf Fix regression introduced earlier by re-adding a barrier after shared memory has been registered
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2018-10-24 15:54:19 -04:00
Ralph Castain
c186004e5e
Merge pull request #5968 from ICLDisco/export/overspawn
Correctly propagate the oversubscribe flag to the spawnees
2018-10-24 09:21:49 -07:00
Aurélien Bouteiller
2820aef551
Correctly propagate the oversubscribe flag to the spawnees
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
2018-10-23 23:02:36 -04:00
Yossi Itigin
4fdf57a6ee
Merge pull request #5955 from amaslenn/mlnx-no-uct
platform/mellanox: disable btl-uct by default
2018-10-23 08:21:02 +03:00
Nathan Hjelm
7b34c9b3fe
Merge pull request #5958 from hjelmn/btl_uct_sync
btl/uct: update for UCT_CB_FLAG_SYNC removal
2018-10-22 23:17:27 -06:00
Andrey Maslennikov
074e9cc92c platform/mellanox: disable btl-uct by default
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2018-10-22 12:23:40 +03:00
Yossi Itigin
4c442f2601
Merge pull request #5934 from hoopoepg/topic/suppressed-cov-warn-added-log-msg
COMMON/UCX: suppressed coverity warnings
2018-10-22 11:00:47 +03:00
Nathan Hjelm
1b37328ba8 btl/uct: update for UCT_CB_FLAG_SYNC removal
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2018-10-21 18:57:42 -06:00
bosilca
690024917d
Merge pull request #5938 from bwbarrett/bugfix/tcp-multi-address
Always use same IP address for module and modex
2018-10-21 09:19:43 -07:00
Sergey Oblomov
1099d5f023 COMMON/UCX: added error code to log output
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-21 11:37:25 +03:00
Nathan Hjelm
a66373454e
Merge pull request #5943 from bosilca/fix/libnbc_warnings
Remove a few warnings in libnbc identified by clang-1000.11.45.2
2018-10-20 21:24:30 -06:00
bosilca
c3abedbd2c
Merge pull request #5759 from bosilca/fix/monitoring
Fix/monitoring
2018-10-19 07:18:41 -07:00
Nathan Hjelm
8db5aaa33a
Merge pull request #5952 from hjelmn/romio_warnings
romio/romio321: silence some compiler warnings
2018-10-18 18:37:23 -06:00
Nathan Hjelm
dbae9c0958 romio/romio321: silence some compiler warnings
Some compilers complain when comparing signed and unsigned. romio321
was doing just this. The check is meant to check whether a size (which
is an ADIO_Offset-- a signed number) will work with memcpy which takes
a size_t. To silence the warning I added a new type (ADIO_Size) which
is an unsigned type and cast the ADIO_Offset to this new type.

Fixes #5951

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-18 13:36:51 -06:00
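The cast described in the romio321 commit above might look roughly like the following. The typedefs are stand-ins for ADIO_Offset and ADIO_Size with assumed 64-bit widths, and the explicit negative check is an addition of this sketch rather than something the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t  demo_offset_t;  /* stand-in for ADIO_Offset (signed)  */
typedef uint64_t demo_size_t;    /* stand-in for ADIO_Size (unsigned)  */

/* True when len can be passed to memcpy() as a size_t. Casting both
 * sides to the same unsigned type avoids a signed/unsigned comparison. */
static bool fits_in_size_t(demo_offset_t len)
{
    if (len < 0) {
        return false;            /* a negative length is never valid */
    }
    return (demo_size_t)len <= (demo_size_t)SIZE_MAX;
}
```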
George Bosilca
dc972f0b92
Fix the PML monitoring.
The monitoring PML hides its existence from the OMPI infrastructure by
removing itself from the list of PML loaded components, remaining hidden
until MPI_Finalize.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-18 00:29:23 -04:00
George Bosilca
668aa15dda
Early selection of the best PML.
With this patch the best PML is selected earlier, before finalizing
the other PMLs. This provides a simpler mechanism to intercept and
hijack the PML (as done in the monitoring PML).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-18 00:29:23 -04:00
Brian Barrett
457f058e73 btl tcp: Simplify modex address selection
Simplify selection of the address to publish for a given BTL TCP
module in the module exchange code.  Rather than looping through
all IP addresses associated with a node, looking for one that
matches the kindex of a module, loop over the modules and
use the address stored in the module structure.  This also
happens to be the address that the source will use to bind()
in a connect() call, so this should eliminate any confusion
(read: bugs) when an interface has multiple IPs associated with
it.

Refs #5818

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-18 02:23:21 +00:00
Ralph Castain
6213d23f0b
Merge pull request #5944 from rhc54/topic/psrvr
Remove the stale orte-dvm code
2018-10-17 16:12:14 -07:00
Ralph Castain
1bd772e8eb Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-17 15:11:38 -07:00
Howard Pritchard
7730db9982
Merge pull request #5937 from hppritcha/topic/remove_crs_components
remove some dead crs components
2018-10-17 16:06:36 -06:00
George Bosilca
66182a294d
Remove a few warnings in libnbc identified by clang-1000.11.45.2
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-17 18:04:39 -04:00
Jeff Squyres
c2608fb597
Merge pull request #5939 from jsquyres/pr/coverity-cid-1440332
opal/os_path: fix minor string overrun
2018-10-17 14:24:42 -07:00
Howard Pritchard
6564d3d217 remove some dead crs components
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-17 10:29:00 -06:00
Brian Barrett
4f19221af2 btl tcp: Simplify module address storage
Today, a btl tcp module is associated with exactly one IP
address (IPv4 or IPv6).  There's no need to reserve space
for both an IPv4 and IPv6 address in the module structure,
since the module will only be associated with one or the
other.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-17 16:21:17 +00:00