
29741 Commits

Author SHA1 Message Date
Joseph Schuchart
08da2f5ea5 Correctly set baseptr in contiguous shared memory window with local size zero
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 06bbcf4fd63dd184cf22f8bcad007c4b8b991a3c)
2020-02-20 20:46:29 +01:00
Howard Pritchard
43ecbb1734
Merge pull request #7392 from awlauria/pgcc18_v4.0.x
v4.0.x: Fix pgcc18 support.
2020-02-14 14:16:42 -06:00
Austen Lauria
ff6b068b93 Fix pgcc18 support.
- pgcc18 defines __GNUC__ similar to Intel compilers. So we must
  check for pgi higher up, or else configury will mistake
  it for gcc.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 14785deb3c6609cb3f6763d0e07a49e86588c4da)
2020-02-12 15:11:03 -05:00
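The ordering problem described in ff6b068b93 can be illustrated with a minimal C sketch. The function name `compiler_family` is hypothetical, and `__PGI` is assumed here to be the macro that identifies PGI compilers; the actual fix lives in Open MPI's configure logic, not in code like this.

```c
/* Hypothetical helper: report the compiler family.
 * Vendor-specific macros are tested before __GNUC__ because some
 * vendor compilers also define __GNUC__ for GCC compatibility. */
static const char *compiler_family(void)
{
#if defined(__PGI)                /* assumed PGI-specific macro */
    return "pgi";
#elif defined(__INTEL_COMPILER)   /* Intel also defines __GNUC__ */
    return "intel";
#elif defined(__GNUC__)           /* checked last among these */
    return "gnu";
#else
    return "unknown";
#endif
}
```

If the `__GNUC__` branch came first, a compiler that defines both macros would be reported as "gnu", which is the kind of misdetection the commit message describes.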
Geoff Paulsen
a1259e6a14
Merge pull request #7356 from hppritcha/topic/pr7201_to_v40x
Topic/pr7201 to v40x
2020-02-12 14:06:31 -06:00
Brice Goglin
f136804c45 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 1.11
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 907ad854b46b42ae7cb1e9c87238691a5cc25e36)
2020-02-11 18:37:18 -06:00
Geoff Paulsen
3894e5760c
Merge pull request #7380 from gpaulsen/topic/v4.0.x/VERSION_v4.0.3rc4
Reving to VERSION v4.0.3rc4
2020-02-11 18:28:45 -06:00
Howard Pritchard
eddb0ef626
Merge pull request #7382 from gpaulsen/topic/v4.0.x/pmix_v3.1.5rc2
Adding PMIx v3.1.5rc2
2020-02-11 10:05:38 -06:00
Geoffrey Paulsen
81ad9bfdb6 Adding PMIx v3.1.5rc2
Adding PMIx v3.1.5rc2 from:
  https://github.com/openpmix/openpmix/releases/tag/v3.1.5rc2

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-02-10 17:05:53 -06:00
Geoffrey Paulsen
aff4fa6c8f Reving to VERSION v4.0.3rc4
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-02-10 15:56:28 -06:00
Brice Goglin
6702a4febb opal/hwloc: remove some unused variables when building with hwloc < 1.7
Refs #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 329d4451a6cdd544e532a29f594f6e5ee63e06da)
2020-02-10 15:54:48 -06:00
Geoff Paulsen
d79fe7fe10
Merge pull request #7376 from hoopoepg/topic/oshmem-inc-max-segments-v4.0
OSHMEM/SEGMENTS: increase max number of segments - v4.0
2020-02-10 15:52:24 -06:00
Sergey Oblomov
45a722ad6a OSHMEM/SEGMENTS: increase number of max segments
- increase number of max segments to allow applications to be launched
  on some Ubuntu configurations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f742f289ea32a0f3dfe5f769fb318213f1a74c37)
2020-02-10 07:44:50 +02:00
Howard Pritchard
42acf4fe6f
Merge pull request #7360 from jsquyres/pr/v4.0.x/fortran-you-win-again
v4.0.x: Fortran fixes
2020-02-09 10:02:04 -06:00
Jeff Squyres
fbeebdb9a0 fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ab398f4b9a340b54a88b83021b66911fe46d5862)
2020-02-04 05:15:40 -08:00
Jeff Squyres
85ce373730 fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4a47a5a8e4e3f2c902807d75e211f7f500f802b)
2020-02-04 04:26:11 -08:00
Howard Pritchard
bed0ce70a7 fix a problem with opal_asprintf
not being defined.

related to #7201

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-02-03 14:22:59 -07:00
Brice Goglin
82567996f7 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 2.0
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look at the special NUMA level to get that locality info.

I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.

Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html

It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit ea80a20e108cb69efc67ad04ad968da7b85772af)
2020-02-03 13:29:41 -07:00
Howard Pritchard
a26cd349b9
Merge pull request #7355 from jsquyres/pr/v4.0.x/fortran-sentinel-linker-black-magic
v4.0.x: Make C and Fortran types for MPI sentinels agree in size
2020-02-03 13:25:29 -07:00
Fangrui Song
02f3795299 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentinel values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd11fb90d18122c521686f692a32cdd8).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5609268e90cb0ff7b2431d29041c10a700fd6996)
2020-02-02 13:57:50 -08:00
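The size mismatch and the address-only comparison described in 02f3795299 can be sketched in C as follows. All `demo_` names are hypothetical stand-ins, not Open MPI symbols, and the sketch only models the C side of the shared symbol.

```c
#include <stddef.h>

/* Hypothetical stand-in for a sentinel symbol shared with Fortran,
 * where it would be declared as "integer, dimension(1)".
 * After the fix the C side is a plain int, so both languages
 * expect the same symbol size. */
int demo_errcodes_ignore;

/* Sentinels are recognized by address, never by value or type,
 * which is why the old size mismatch went unnoticed at run time. */
static int demo_is_errcodes_ignore(const void *arg)
{
    return arg == (const void *)&demo_errcodes_ignore;
}

/* Symbol size implied by each C declaration. */
static size_t demo_size_before_fix(void) { return sizeof(int *); }
static size_t demo_size_after_fix(void)  { return sizeof(int); }
```

On a typical 64-bit platform `demo_size_before_fix()` yields 8 while `demo_size_after_fix()` yields 4, matching the sizes quoted in the commit message.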
Geoff Paulsen
2f42a125be
Merge pull request #7352 from hppritcha/topic/minor_news_update_v4.0.x
NEWS: tweak for v4.0.3 release
2020-01-31 13:46:48 -06:00
Howard Pritchard
89e3a2ba02 NEWS: tweak for v4.0.3 release
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-31 12:38:41 -07:00
Geoff Paulsen
731721119e
Merge pull request #7346 from gpaulsen/topic/v4.0.x/VERSION_4.0.3rc3_part2
Actually Updating VERSION to v4.0.3rc3
2020-01-28 16:53:28 -06:00
Geoffrey Paulsen
80950480a9 Actually Updating VERSION to v4.0.3rc3
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-28 15:16:06 -06:00
Geoff Paulsen
c79e841921
Merge pull request #7342 from gpaulsen/topic/v4.0.x/VERSION_4.0.3rc3
Updating VERSION to v4.0.3rc3
2020-01-28 10:44:55 -06:00
Geoffrey Paulsen
44c1b6fb98 Updating VERSION to v4.0.3rc3
We tried doing an RC2 build without updating the greek,
and found where that failed in build automation.
Revving again for rc3, as we've already applied the rc2 tag.

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-28 10:42:04 -05:00
Geoff Paulsen
b9d54dadb6
Merge pull request #7341 from hppritcha/topic/news_for_rc4.0.3rc2
NEWS: updates for 4.0.3rc2
2020-01-27 14:52:58 -06:00
Howard Pritchard
7147a8c3bb NEWS: updates for 4.0.3rc2
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-27 13:50:35 -07:00
Howard Pritchard
0ea96ec4db
Merge pull request #7340 from jjhursey/v4-no-ssh-core
plm/rsh: Fix segv on missing agent.
2020-01-27 13:08:40 -07:00
Joshua Hursey
05d003b109
plm/rsh: Fix segv on missing agent.
* Additionally, fixes the `NULL` option to `OMPI_MCA_plm_rsh_agent`,
   which would also lead to a segv. Now it operates as intended by
   disqualifying the `rsh` component and falling back onto the `isolated`
   component.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 62d0058738e8a111cd099199bc5f1886f13aa8ec)
2020-01-27 10:34:28 -06:00
Howard Pritchard
5f40b47088
Merge pull request #7338 from hppritcha/topic/fix_6539_v4.0.x
Topic/fix 6539 v4.0.x
2020-01-26 12:39:38 -07:00
Howard Pritchard
297505592a Fix a problem with fortran configure test.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-24 15:42:00 -06:00
Geoff Paulsen
2549ba2e47
Merge pull request #7329 from janjust/v4.0.x-oshmem-perf-multi-worker
V4.0.x: oshmem/ucx: improves spml ucx performance for multi-threaded applications.
2020-01-24 13:41:13 -06:00
Howard Pritchard
d12e0fdf32 make mpifort obey disable-wrapper-runpath
related to #6539

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 37b3e2f3fa7a4971dda64d8d2ff933dc4d4c807d)
2020-01-24 10:47:22 -06:00
Sergey Oblomov
91ab0e2191 SPML/UCX: fixed coverity issues
- fixed sizeof(char***) by variable datatype
- fixed resource leak in proc_add

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 8543860689029dc09b5edfa25afafa087fe8603b)
2020-01-24 17:29:53 +02:00
Tomislav Janjusic
0daf3df384 oshmem/ucx: improves spml ucx performance for multi-threaded
applications.

Improves multi-threaded performance by adding the option to create
multiple ucx workers in threaded applications.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 3d6bf9fd8ec729d1c07470600e2c92c0f1580830)
2020-01-24 17:29:53 +02:00
Howard Pritchard
0b2b9d7660
Merge pull request #7325 from hppritcha/topic/pr_7304_to_v4.0.x
btl/vader: modify how the max attachment address is determined
2020-01-24 08:00:36 -07:00
Howard Pritchard
686f2debda
Merge pull request #7327 from janjust/v4.0.x-oshmem-perf-progress
v4.0.x: oshmem/ucx: Fix progress in iput/iget: periodically poke progress to prevent hardware stalls when using DCT transport.
2020-01-24 07:58:54 -07:00
Howard Pritchard
0f54228535
Merge pull request #7321 from hppritcha/topic/pr_2551_to_v4.0.x
Topic/pr 7283 to v4.0.x
2020-01-23 16:36:36 -07:00
Tomislav Janjusic
9e755d3803 oshmem/ucx: Improves performance for non-blocking put/get operations.
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d07388c8c63d485fe308589009279c1f4f)
2020-01-22 21:45:32 +02:00
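The idea in 9e755d3803 of periodically calling progress while non-blocking operations pile up can be sketched as below. This is not the OSHMEM or UCX code: the `demo_` types, the counter, and the interval of 32 are assumptions chosen for illustration.

```c
/* Hypothetical stand-ins; none of these are UCX or OSHMEM APIs. */
#define DEMO_PROGRESS_INTERVAL 32u

typedef struct {
    unsigned posted;          /* non-blocking ops posted since last progress */
    unsigned progress_calls;  /* number of times progress has run */
} demo_worker_t;

/* Stand-in for driving the transport's progress engine. */
static void demo_progress(demo_worker_t *w)
{
    w->progress_calls++;
    w->posted = 0;
}

/* Post one non-blocking operation and poke progress every
 * DEMO_PROGRESS_INTERVAL posts so the queue cannot grow unbounded. */
static void demo_post_nb_op(demo_worker_t *w)
{
    /* ... the put/get itself would be enqueued here ... */
    if (++w->posted >= DEMO_PROGRESS_INTERVAL) {
        demo_progress(w);
    }
}
```

The interval trades per-operation overhead against how long completions can sit unprocessed; a real implementation would presumably tune or expose it.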
Nathan Hjelm
66684bbda3 btl/vader: modify how the max attachment address is determined
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.

Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 728d51f9f3f2df6577e5f9729b9d6a0fe9441d37)
2020-01-19 15:05:41 -08:00
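The approach in 66684bbda3 of taking the largest address listed in /proc/self/maps can be sketched as follows. The function names are hypothetical and this is not the vader code; the sketch assumes the usual Linux line format "start-end perms ..." with hexadecimal addresses and lines shorter than the buffer.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one line of the form "start-end perms ..." and store the end
 * address. Returns 1 on success, 0 if the line does not match. */
static int demo_parse_map_end(const char *line, uint64_t *end)
{
    uint64_t start;
    return sscanf(line, "%" SCNx64 "-%" SCNx64, &start, end) == 2;
}

/* Largest end address in a stream formatted like /proc/self/maps,
 * or 0 if nothing could be parsed. */
static uint64_t demo_max_mapped_address(FILE *maps)
{
    char line[4096];
    uint64_t max_addr = 0, end;

    while (fgets(line, sizeof(line), maps) != NULL) {
        if (demo_parse_map_end(line, &end) && end > max_addr) {
            max_addr = end;
        }
    }
    return max_addr;
}
```

On Linux the stream would come from `fopen("/proc/self/maps", "r")`. Because each process computes its own value, the commit sends it to peers through the modex rather than assuming a shared constant.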
Nathan Hjelm
a64a7c8a0a btl/vader: fix issues with xpmem registration invalidation
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References #6524
References #7030
Closes #6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805be1145ace46b570c5c518555b38e58cee)
2020-01-19 13:42:00 -08:00
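The general idea behind a64a7c8a0a, that only one thread may win the right to invalidate a registration, can be sketched with a C11 atomic fetch-or. This is a different mechanism from the commit, which sets the flag during the VMA tree traversal; the `demo_` names and the flag value are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_REG_FLAG_INVALID 0x1u   /* hypothetical flag bit */

typedef struct {
    atomic_uint flags;
} demo_reg_t;

/* Returns true only for the single caller that flips the flag from
 * unset to set; that caller alone may release the registration.
 * Later callers see the flag already set and must not free or reuse it. */
static bool demo_reg_claim_invalidate(demo_reg_t *reg)
{
    unsigned old = atomic_fetch_or(&reg->flags, DEMO_REG_FLAG_INVALID);
    return (old & DEMO_REG_FLAG_INVALID) == 0;
}
```

Since exactly one caller observes the flag as previously unset, the double-free described in the commit message cannot occur in this model.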
Nathan Hjelm
76002ada84 opal: make interval tree resilient to similar intervals
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.

If the high, low, and data of a new insertion exactly matches an
existing interval then an assertion is raised.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 1145abc0b790f82ea25e24a3becad91ff502769c)
2020-01-19 13:40:57 -08:00
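The tie-breaking order described in 76002ada84 (low value, then high value, then interval data) can be sketched as a three-level comparator. The struct layout and names are hypothetical, not the opal interval tree's own.

```c
#include <stdint.h>

/* Hypothetical interval record. */
typedef struct {
    uint64_t low;
    uint64_t high;
    void *data;
} demo_interval_t;

/* Three-way comparison that avoids overflow from subtraction. */
static int demo_cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* Order by low, break ties by high, then by the data pointer.
 * A result of 0 means all three fields match exactly. */
static int demo_interval_compare(const demo_interval_t *a,
                                 const demo_interval_t *b)
{
    int c = demo_cmp_u64(a->low, b->low);
    if (c != 0) {
        return c;
    }
    c = demo_cmp_u64(a->high, b->high);
    if (c != 0) {
        return c;
    }
    return demo_cmp_u64((uint64_t)(uintptr_t)a->data,
                        (uint64_t)(uintptr_t)b->data);
}
```

With this ordering, two registrations covering the same range remain distinguishable in the tree as long as their data differs, so deleting one cannot remove the other. The zero result corresponds to the exact-duplicate case for which the commit raises an assertion.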
Geoff Paulsen
629d0efa15
Merge pull request #7314 from hppritcha/topic/NEWS_v403
NEWS: update for 4.0.3
2020-01-17 12:41:05 -06:00
Geoff Paulsen
b21c475df6
Merge pull request #7313 from hppritcha/topic/version_for_4.0.3
VERSION - update for v4.0.3
2020-01-17 12:40:52 -06:00
Howard Pritchard
9d32fedd55 NEWS: update for 4.0.3
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 11:29:08 -07:00
Howard Pritchard
baf1b06c9e VERSION - update for v4.0.3
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 10:21:40 -07:00
Howard Pritchard
1cdcce7f89
Merge pull request #7296 from michaellass/v4.0.x-fix-dims_create
dims_create: fix calculation of factors for odd squares (v4.0.x)
2020-01-14 09:10:40 -07:00
Geoff Paulsen
3da939b124
Merge pull request #7248 from wckzhang/v4.0.x
MTL/OFI: Check threshold number of peers allowed per rank
2020-01-13 14:19:51 -06:00
Geoff Paulsen
6985a5560f
Merge pull request #7291 from gpaulsen/topic/v4.0.x/from_pr7190_7192
Topic/v4.0.x/from pr7190 7192
2020-01-13 14:03:16 -06:00
Michael Lass
ff85c82151 dims_create: fix calculation of factors for odd squares
Until now sqrt(n) was missed as a factor for odd square numbers n. This
led to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.

Refs: #7186

Signed-off-by: Michael Lass <bevan@bi-co.net>
(cherry picked from commit 67490118adb8372d2aefe1d2d923432e51e100cd)
2020-01-10 16:07:40 +01:00
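The effect of including sqrt(n) in the factor search, as described in ff85c82151, can be seen in a small trial-division sketch. This is not the MPI_Dims_create implementation; `demo_prime_factors` is a hypothetical helper written only to show the inclusive loop bound.

```c
/* Collect the prime factors of n (with multiplicity) into out[] and
 * return how many were stored, at most max_out.
 * The bound p * p <= n is inclusive, so p == sqrt(n) is tested for
 * odd squares such as 9, 25 and 49. */
static int demo_prime_factors(int n, int *out, int max_out)
{
    int count = 0;

    /* Try 2, then odd candidates 3, 5, 7, ... */
    for (int p = 2; (long)p * p <= n; p += (p == 2) ? 1 : 2) {
        while (n % p == 0 && count < max_out) {
            out[count++] = p;
            n /= p;
        }
    }
    /* Whatever remains above 1 is itself prime. */
    if (n > 1 && count < max_out) {
        out[count++] = n;
    }
    return count;
}
```

In this sketch, a strict bound (`p * p < n`) would skip p = 3 for n = 9 and let 9 fall through as if it were prime, which is the kind of missed factor the commit message refers to.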