Commit graph

29929 commits

Author SHA1 Message Date
Jeff Squyres
fbeebdb9a0 fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ab398f4b9a)
2020-02-04 05:15:40 -08:00
Jeff Squyres
85ce373730 fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4a47a5a8e)
2020-02-04 04:26:11 -08:00
Howard Pritchard
bed0ce70a7 fix a problem with opal_asprintf
not being defined.

related to #7201

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-02-03 14:22:59 -07:00
Brice Goglin
82567996f7 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 2.0
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look at the special NUMA level to get that locality info.

I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.

Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html

It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit ea80a20e10)
2020-02-03 13:29:41 -07:00
Howard Pritchard
a26cd349b9
Merge pull request #7355 from jsquyres/pr/v4.0.x/fortran-sentinel-linker-black-magic
v4.0.x: Make C and Fortran types for MPI sentinels agree in size
2020-02-03 13:25:29 -07:00
Fangrui Song
02f3795299 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentinel values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5609268e90)
2020-02-02 13:57:50 -08:00
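
The size mismatch described in the commit above can be illustrated with a minimal sketch. It is not Open MPI code: the helper `sizes_agree` is hypothetical, and it assumes that a default Fortran INTEGER has the same size as a C `int`, which holds on common ABIs but is not guaranteed.

```python
import ctypes

# Size C expected while the sentinel was declared as a pointer, e.g. (int *).
pointer_size = ctypes.sizeof(ctypes.POINTER(ctypes.c_int))

# Size of one C int; assumed here to equal a default Fortran INTEGER,
# i.e. the size of "integer, dimension(1)".
int_size = ctypes.sizeof(ctypes.c_int)


def sizes_agree(c_size, fortran_size):
    """Hypothetical helper: True when both languages expect the same symbol size."""
    return c_size == fortran_size


# Before the fix: pointer-sized C symbol vs. integer-sized Fortran symbol.
before_fix = sizes_agree(pointer_size, int_size)

# After the fix: the C type is (int), so both sides use the same size.
after_fix = sizes_agree(int_size, int_size)

print("pointer:", pointer_size, "int:", int_size)
print("agree before fix:", before_fix, "agree after fix:", after_fix)
```

On a typical 64-bit platform the two sizes differ before the fix, which is the condition that triggered the linker warnings.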
Geoff Paulsen
2f42a125be
Merge pull request #7352 from hppritcha/topic/minor_news_update_v4.0.x
NEWS: tweak for v4.0.3 release
2020-01-31 13:46:48 -06:00
Howard Pritchard
89e3a2ba02 NEWS: tweak for v4.0.3 release
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-31 12:38:41 -07:00
Artem Ryabov
0f1f13c67a Enabled Mellanox CI for release branches.
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-01-31 21:59:51 +03:00
Geoff Paulsen
731721119e
Merge pull request #7346 from gpaulsen/topic/v4.0.x/VERSION_4.0.3rc3_part2
Actually Updating VERSION to v4.0.3rc3
2020-01-28 16:53:28 -06:00
Geoffrey Paulsen
80950480a9 Actually Updating VERSION to v4.0.3rc3
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-28 15:16:06 -06:00
Geoff Paulsen
c79e841921
Merge pull request #7342 from gpaulsen/topic/v4.0.x/VERSION_4.0.3rc3
Updating VERSION to v4.0.3rc3
2020-01-28 10:44:55 -06:00
Geoffrey Paulsen
44c1b6fb98 Updating VERSION to v4.0.3rc3
We tried doing an RC2 build without updating the greek,
and found where that failed in build automation.
Revving again for rc3, as we've already applied the rc2 tag.

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-28 10:42:04 -05:00
Geoff Paulsen
b9d54dadb6
Merge pull request #7341 from hppritcha/topic/news_for_rc4.0.3rc2
NEWS: updates for 4.0.3rc2
2020-01-27 14:52:58 -06:00
Howard Pritchard
7147a8c3bb NEWS: updates for 4.0.3rc2
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-27 13:50:35 -07:00
Howard Pritchard
0ea96ec4db
Merge pull request #7340 from jjhursey/v4-no-ssh-core
plm/rsh: Fix segv on missing agent.
2020-01-27 13:08:40 -07:00
Joshua Hursey
05d003b109
plm/rsh: Fix segv on missing agent.
* Additionally, fixes the `NULL` option to `OMPI_MCA_plm_rsh_agent`,
   which would also lead to a segv. Now it operates as intended by
   disqualifying the `rsh` component and falling back onto the `isolated`
   component.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 62d0058738)
2020-01-27 10:34:28 -06:00
Howard Pritchard
5f40b47088
Merge pull request #7338 from hppritcha/topic/fix_6539_v4.0.x
Topic/fix 6539 v4.0.x
2020-01-26 12:39:38 -07:00
Howard Pritchard
297505592a Fix a problem with fortran configure test.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-24 15:42:00 -06:00
Geoff Paulsen
2549ba2e47
Merge pull request #7329 from janjust/v4.0.x-oshmem-perf-multi-worker
V4.0.x: oshmem/ucx: improves spml ucx performance for multi-threaded applications.
2020-01-24 13:41:13 -06:00
Howard Pritchard
d12e0fdf32 make mpifort obey disable-wrapper-runpath
related to #6539

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 37b3e2f3fa)
2020-01-24 10:47:22 -06:00
Sergey Oblomov
91ab0e2191 SPML/UCX: fixed coverity issues
- fixed sizeof(char***) by variable datatype
- fixed resource leak in proc_add

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 8543860689)
2020-01-24 17:29:53 +02:00
Tomislav Janjusic
0daf3df384 oshmem/ucx: improves spml ucx performance for multi-threaded
applications.

Improves multi-threaded performance by adding the option to create
multiple ucx workers in threaded applications.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 3d6bf9fd8e)
2020-01-24 17:29:53 +02:00
Howard Pritchard
0b2b9d7660
Merge pull request #7325 from hppritcha/topic/pr_7304_to_v4.0.x
btl/vader: modify how the max attachment address is determined
2020-01-24 08:00:36 -07:00
Howard Pritchard
686f2debda
Merge pull request #7327 from janjust/v4.0.x-oshmem-perf-progress
v4.0.x: oshmem/ucx: Fix progress in iput/iget: periodically poke progress to prevent hardware stalls when using DCT transport.
2020-01-24 07:58:54 -07:00
Howard Pritchard
0f54228535
Merge pull request #7321 from hppritcha/topic/pr_2551_to_v4.0.x
Topic/pr 7283 to v4.0.x
2020-01-23 16:36:36 -07:00
Tomislav Janjusic
9e755d3803 oshmem/ucx: Improves performance for non-blocking put/get operations.
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d073)
2020-01-22 21:45:32 +02:00
Nathan Hjelm
66684bbda3 btl/vader: modify how the max attachment address is determined
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.

Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 728d51f9f3)
2020-01-19 15:05:41 -08:00
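
As a rough illustration of the idea in the commit above, the sketch below derives the highest mapped address from text in the `/proc/self/maps` format. The function name `max_mapped_address` and the sample text are hypothetical; the actual change is written in C and may treat special mappings differently.

```python
def max_mapped_address(maps_text):
    """Return the highest end bound found in /proc/<pid>/maps style text.

    Each line starts with "start-end" in hexadecimal; the end bound is
    exclusive, so the last usable byte is one below the returned value.
    """
    highest = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        _start_hex, _, end_hex = fields[0].partition("-")
        if not end_hex:
            continue  # not an address range, skip it
        highest = max(highest, int(end_hex, 16))
    return highest


# Made-up sample in the maps format, used so the sketch runs anywhere.
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
7ffd1c000000-7ffd1c021000 rw-p 00000000 00:00 0 [stack]
"""

print(hex(max_mapped_address(SAMPLE_MAPS)))
```

On Linux the same function can be fed the contents of `/proc/self/maps` read at run time.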
Nathan Hjelm
a64a7c8a0a btl/vader: fix issues with xpmem registration invalidation
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References #6524
References #7030
Closes #6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805be1)
2020-01-19 13:42:00 -08:00
Nathan Hjelm
76002ada84 opal: make interval tree resilient to similar intervals
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.

If the high, low, and data of a new insertion exactly match an
existing interval then an assertion is raised.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 1145abc0b7)
2020-01-19 13:40:57 -08:00
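
The tie-breaking order described in the commit above (low value, then high value, then interval data) can be sketched as a comparison function. The `Interval` type and `compare` function are hypothetical stand-ins, not the opal interval tree API, and `data` is an integer standing in for a registration pointer.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: int
    high: int
    data: int  # stand-in for the pointer stored with the interval


def compare(a, b):
    """Order intervals by low, then high, then data.

    Returns -1 or 1 for an ordering and 0 only for an exact duplicate,
    which is the case the commit says triggers an assertion on insert.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0


same_range_a = Interval(low=0, high=10, data=1)
same_range_b = Interval(low=0, high=10, data=2)

# Same low and high: the data field breaks the tie.
print(compare(same_range_a, same_range_b))
```

Because identical ranges still compare as different when their data differs, a delete can locate the exact registration it was asked to remove.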
Geoff Paulsen
629d0efa15
Merge pull request #7314 from hppritcha/topic/NEWS_v403
NEWS: update for 4.0.3
2020-01-17 12:41:05 -06:00
Geoff Paulsen
b21c475df6
Merge pull request #7313 from hppritcha/topic/version_for_4.0.3
VERSION - update for v4.0.3
2020-01-17 12:40:52 -06:00
Howard Pritchard
9d32fedd55 NEWS: update for 4.0.3
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 11:29:08 -07:00
Howard Pritchard
baf1b06c9e VERSION - update for v4.0.3
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 10:21:40 -07:00
Howard Pritchard
1cdcce7f89
Merge pull request #7296 from michaellass/v4.0.x-fix-dims_create
dims_create: fix calculation of factors for odd squares (v4.0.x)
2020-01-14 09:10:40 -07:00
Geoff Paulsen
3da939b124
Merge pull request #7248 from wckzhang/v4.0.x
MTL/OFI: Check threshold number of peers allowed per rank
2020-01-13 14:19:51 -06:00
Geoff Paulsen
6985a5560f
Merge pull request #7291 from gpaulsen/topic/v4.0.x/from_pr7190_7192
Topic/v4.0.x/from pr7190 7192
2020-01-13 14:03:16 -06:00
Michael Lass
ff85c82151 dims_create: fix calculation of factors for odd squares
Until now sqrt(n) was missed as a factor for odd square numbers n. This
led to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.

Refs: #7186

Signed-off-by: Michael Lass <bevan@bi-co.net>
(cherry picked from commit 67490118ad)
2020-01-10 16:07:40 +01:00
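
The effect of including sqrt(n) in the factor search can be shown with a small two-dimensional sketch. It is not the MPI_Dims_create implementation: `factor_pairs` and `most_square_pair` are hypothetical helpers that only illustrate why an inclusive upper bound matters.

```python
import math


def factor_pairs(n):
    """Return every (d, n // d) with d * d <= n, including d == sqrt(n)."""
    pairs = []
    # isqrt(n) + 1 keeps d == sqrt(n) inside the range; an exclusive
    # bound here would skip 3 for n == 9, 5 for n == 25, and so on.
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            pairs.append((d, n // d))
    return pairs


def most_square_pair(n):
    """Pick the pair whose two factors are closest to each other."""
    return factor_pairs(n)[-1]


for n in (9, 25, 49):
    print(n, most_square_pair(n))
```

If the search stops before sqrt(n), the only pair found for 9 is (1, 9), which matches the suboptimal results the commit describes.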
Howard Pritchard
8df9f53bdc
Merge pull request #7288 from gpaulsen/topic/v4.0.x/from_pr7191
v4.0.x: Add the missing code to check a return code
2020-01-09 18:01:01 -07:00
Geoff Paulsen
e8834629af
Merge pull request #7294 from gpaulsen/topic/v4.0.x/from_pr7183
lustre: squash some compiler warnings
2020-01-09 08:41:02 -06:00
Howard Pritchard
90cc1f1cf0
Merge pull request #7287 from hppritcha/topic/support_for_cray_fortran_v4.0.x
cray ftn: modify fortran module loc checker
2020-01-08 19:51:13 -07:00
Geoff Paulsen
c18d74bd70
Merge pull request #7282 from jsquyres/pr/v4.0.x/large-count-ddt-fixes
v4.0.x: Large count DDT fixes
2020-01-08 15:01:23 -06:00
Howard Pritchard
e06595d7f6 lustre: squash some compiler warnings
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre.  Squash these warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit e66a7cef11)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 16:00:01 -05:00
Geoffroy Vallee
836ce83c9a Fix typo in comment: contiaing -> containing
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit 127573cf44)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:49:15 -05:00
Geoffroy Vallee
d59faea868 Fix a typo in comments: insertted -> inserted
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit 98de17c6da)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:48:40 -05:00
Geoffroy Vallee
a479beeeae Add the missing code to check a return code
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit de6f130b4a)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:27:38 -05:00
Howard Pritchard
dc8246b8ae
Merge pull request #7279 from jsquyres/pr/v4.0.x/rpm-specfile-fix
v4.0.x: openmpi.spec: update modulefile_path behavior
2020-01-08 12:27:37 -07:00
Howard Pritchard
9582b76168 cray ftn: modify fortran module loc checker
to support the Cray Fortran compiler.  The Cray Fortran compiler does not
put all symbol info in the module file, so we have to link with the *.o
created as part of module file compilation.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 441bad9a75)
2020-01-08 13:20:02 -06:00
George Bosilca
9330dc2a42 Swap the 2 fields to maintain the size of the struct.
Thanks @devreal for catching this.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 3de636dc6f)
2020-01-07 15:13:14 -08:00
George Bosilca
a1b4e697f5 Prevent overflow when dealing with datatype count.
This patch fixes #7147 by preventing overflow when multiplying
the count and the blocklen. The count reflects MPI count and is
therefore bound to the size of an int (it is a uint32_t) while the
blocklen can be merged together to represent the largest contiguous
memory layout and it is therefore promoted to a size_t.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 59fb02618e)
2020-01-07 15:13:14 -08:00
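
The overflow that the commit above guards against can be reproduced with masked integer arithmetic. This is a sketch rather than the datatype engine code: `product_32bit` and `product_wide` are hypothetical names, and a 64-bit size_t is assumed.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF  # assumes size_t is 64 bits wide


def product_32bit(count, blocklen):
    """count * blocklen truncated to 32 bits, mimicking the overflow."""
    return (count * blocklen) & UINT32_MASK


def product_wide(count, blocklen):
    """count * blocklen computed after promotion to a 64-bit size_t."""
    return (count * blocklen) & UINT64_MASK


count = 65536
blocklen = 65536

print(product_32bit(count, blocklen))  # wraps around
print(product_wide(count, blocklen))   # keeps the full value
```

With both operands at 65536 the true product is 2**32, which does not fit in 32 bits, so the truncated result wraps to zero while the promoted one is preserved.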