Commit Graph

29772 Commits

Author SHA1 Message Date
George Bosilca
0425a7a7d8 Consistent return from all progress functions.
This fix ensures that all progress functions return the number of
completed events.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 72501f8f9c)
2020-03-30 19:00:03 +02:00
Joseph Schuchart
7b1beb0f6c Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 2c97187ee0)
2020-03-30 18:58:57 +02:00
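The convention described by the two progress-related commits above can be illustrated with a minimal C sketch. All names here (`progress_fn_t`, `run_progress`, and the two sample callbacks) are hypothetical and are not taken from the Open MPI sources. The only assumption is the one stated in the commit message: each progress function returns the number of events it completed, so a caller can sum the results.

```c
#include <stddef.h>

/* Hypothetical callback type: returns the number of completed events. */
typedef int (*progress_fn_t)(void);

/* Sample callbacks standing in for real progress functions. */
int progress_two_events(void) { return 2; }
int progress_idle(void) { return 0; }

/* Call every registered callback and sum the completed events. */
int run_progress(const progress_fn_t *callbacks, size_t count)
{
    int completed = 0;
    for (size_t i = 0; i < count; ++i) {
        completed += callbacks[i]();
    }
    return completed;
}
```

With a consistent return convention, the caller can treat a zero total as "nothing progressed" without knowing which callback ran.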
Howard Pritchard
34c4f934e1
Merge pull request #7536 from rhc54/cmr40/hwloc
Support hwloc retrieval using legacy key
2020-03-23 08:57:32 -06:00
Ralph Castain
898b4f2210
Support hwloc retrieval using legacy key
Re-enable support for Slurm plugin using earlier PMIx_LOCAL_TOPO key

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-03-17 08:48:04 -07:00
Geoff Paulsen
e40b5edf68
Merge pull request #7527 from artpol84/topic/v4.0.x/timings_update
timings: Update/extend OSHMEM timings
2020-03-16 14:42:09 -05:00
Artem Polyakov
e5cdf2612a timings: Update/extend OSHMEM timings
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
(cherry picked from commit 0f51ea3fe5)
2020-03-11 21:05:34 -07:00
Artem Polyakov
253502b1b1 timings: Fix timings when 'prefix' is used
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
(cherry picked from commit 7c17a38c96)
2020-03-11 21:05:30 -07:00
Howard Pritchard
65219ebaa6
Merge pull request #7415 from hjelmn/v4.0.x_osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap
v4.0.x: osc/rdma: modify attach to check for region overlap
2020-03-07 11:00:10 -07:00
Geoff Paulsen
734c5107bc
Merge pull request #7508 from awlauria/v4.0.x_fix_btl_vader_osc_sm_segv
v4.0.x: Fix segv in btl/vader.
2020-03-06 11:03:08 -06:00
Austen Lauria
f7979fbc82 Fix segv in btl/vader.
Keep track of the connected procs in vader_add_procs().
Otherwise, the same rank will reconnect the same shmem
segment (rank 0+...) multiple times instead of the next
one as intended.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit f69c8d6819)
2020-03-06 11:21:24 -05:00
Howard Pritchard
e00fc61dcf
Merge pull request #7457 from artemry-mlnx/artemry-mlnx/reduce_mellanox_ci_time_v4
Mellanox Open MPI CI: optimized git checkout step to reduce CI duration (v4.0.x)
2020-03-06 08:38:54 -07:00
Geoff Paulsen
8b4a8cd34c
Merge pull request #7482 from hppritcha/topic/news_update_for_4.0.3rc4
news update for 4.0.3rc4
2020-02-27 16:59:12 -06:00
Howard Pritchard
faa1bdc8c8
Merge pull request #7465 from artemry-mlnx/artemry-mlnx/disable-per-commit-ci-v4.0.x
Disabled Mellanox Open MPI per-commit CI (as redundant) - v4.0.x
2020-02-26 20:55:04 -07:00
Howard Pritchard
0fce596ffb
Merge pull request #7467 from tjahns/v4.0.x
v4.0.x: Fix incorrect argument in manual page.
2020-02-26 20:54:00 -07:00
Geoff Paulsen
a4b2e92b91
Merge pull request #7468 from artemry-mlnx/artemry-mlnx/revert-test-behavior-for-v4.0.x
Use Mellanox CI tests specific for Open MPI v4.0.x
2020-02-26 14:30:00 -06:00
Howard Pritchard
c5a2bc8f81 news update for 4.0.3rc4
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-02-26 12:51:46 -07:00
Howard Pritchard
7a989fe33f
Merge pull request #7478 from yanagibashi/pr/v4.0.x/fix-info-key-object
v4.0.x: osc/sm: fix typo and minor correction
2020-02-26 11:45:06 -07:00
Geoff Paulsen
2a3ec874f4
Merge pull request #7463 from gpaulsen/topic/v4.0.x/update_pmix_to_v3.1.5
Updating OMPI v4.0.x to PMIx v3.1.5 released
2020-02-26 08:46:29 -06:00
Geoff Paulsen
d94feecc56
Merge pull request #7462 from edgargabriel/pr/individual-as-dummy-module-v4.0.x
sharedfp/individual: defer error when not being able to open datafile
2020-02-26 08:43:06 -06:00
Tsubasa Yanagibashi
a2c850e02d osc/sm: fix typo and minor correction
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
(cherry picked from commit a07a83d189)
2020-02-26 10:54:08 +09:00
Artem Ryabov
7dbf41969c Use Mellanox CI tests specific for Open MPI v4.0.x
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-02-25 14:28:49 +03:00
Thomas Jahns
1392fcd51e Fix incorrect argument in manual page.
Signed-off-by: Thomas Jahns <jahns@dkrz.de>
2020-02-25 09:20:58 +01:00
Artem Ryabov
00c3cc143d Disabled Mellanox Open MPI per-commit CI (as redundant).
The CI is triggered only upon a PR creation or by special PR comments.

Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-02-25 01:01:37 +03:00
Geoffrey Paulsen
11d79d1f6e Updating OMPI v4.0.x to PMIx v3.1.5 released
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-02-24 12:13:20 -05:00
Edgar Gabriel
e14e84aceb sharedfp/individual: defer error when not being able to open datafile
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.

If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.

Fixes issue #7429

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit df6e3e503a)
2020-02-24 08:39:53 -06:00
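The deferred-error behavior described in the commit above might look roughly like the following C sketch. The struct, the function names, and the return codes are hypothetical placeholders, not the sharedfp component's real interface. The sketch only assumes what the message states: a failure to create the helper files is recorded at open time and reported later, when a shared file pointer operation is first attempted.

```c
#include <stdbool.h>

#define SKETCH_SUCCESS 0
#define SKETCH_ERROR   (-1)

/* Hypothetical per-file state for a shared file pointer component. */
struct sketch_sharedfp {
    bool datafile_ok;   /* false if the data or metadata file could not be created */
};

/* Open does not fail because of the helper files; it only records the outcome. */
int sketch_file_open(struct sketch_sharedfp *fp, bool can_create_files)
{
    fp->datafile_ok = can_create_files;
    return SKETCH_SUCCESS;
}

/* Shared-pointer operations report the failure that was deferred at open time. */
int sketch_write_shared(const struct sketch_sharedfp *fp)
{
    return fp->datafile_ok ? SKETCH_SUCCESS : SKETCH_ERROR;
}
```

Applications that never call the shared-pointer operation never see the error, which matches the intent stated in the commit message.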
Geoff Paulsen
625f27a715
Merge pull request #7438 from devreal/shmwin_contig_v4.0.x
Correctly set baseptr in contiguous shared memory window with local size zero (v4.0.x)
2020-02-24 07:14:59 -06:00
Artem Ryabov
d5434545a9 Mellanox Open MPI CI: optimized git checkout step to reduce CI duration
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-02-23 00:42:43 +03:00
Ralph Castain
af72065ca7
Merge pull request #7351 from artemry-mlnx/artemry/enable_mlnx_ci_for_release_branches_v4
Enabled Mellanox CI for release branches (changes for v4.0.x branch).
2020-02-22 13:31:33 -08:00
Joseph Schuchart
08da2f5ea5 Correctly set baseptr in contiguous shared memory window with local size zero
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 06bbcf4fd6)
2020-02-20 20:46:29 +01:00
Nathan Hjelm
f4bc0f46d6 osc/rdma: bump the default max dynamic attachments to 64
This commit increases the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachment value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 54c8233f4f)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 13:49:36 -08:00
Nathan Hjelm
eeb6821550 osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookkeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 6649aef8bd)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 13:45:55 -08:00
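The two cases distinguished in the commit above can be made concrete with a small C sketch. The function names and the `SKETCH_PAGE_SIZE` constant are hypothetical; the 4096-byte value is assumed from the "4k alignment" mentioned in the message. This is not the osc/rdma implementation, only an illustration of the difference between attached regions that overlap (case 1) and page-aligned registrations that overlap while the attached regions themselves do not (case 2).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((uintptr_t) 4096)  /* assumed 4k registration alignment */

/* Case 1: true if half-open ranges [a, a+a_len) and [b, b+b_len) intersect. */
bool ranges_overlap(uintptr_t a, size_t a_len, uintptr_t b, size_t b_len)
{
    return a < b + b_len && b < a + a_len;
}

/* Round an address down or up to a page boundary. */
uintptr_t page_floor(uintptr_t addr)
{
    return addr & ~(SKETCH_PAGE_SIZE - 1);
}

uintptr_t page_ceil(uintptr_t addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Case 2: true if the page-aligned registrations of the two ranges intersect. */
bool registrations_overlap(uintptr_t a, size_t a_len, uintptr_t b, size_t b_len)
{
    uintptr_t a_lo = page_floor(a), a_hi = page_ceil(a + a_len);
    uintptr_t b_lo = page_floor(b), b_hi = page_ceil(b + b_len);
    return a_lo < b_hi && b_lo < a_hi;
}
```

For example, two 256-byte regions at 0x1000 and 0x1800 do not overlap as attachments, but both fall inside the same 4k page, so their registrations do.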
Howard Pritchard
43ecbb1734
Merge pull request #7392 from awlauria/pgcc18_v4.0.x
v4.0.x: Fix pgcc18 support.
2020-02-14 14:16:42 -06:00
Austen Lauria
ff6b068b93 Fix pgcc18 support.
- pgcc18 defines __GNUC__ similar to Intel compilers. So we must
  check for pgi higher up, or else configury will mistake
  it for gcc.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 14785deb3c)
2020-02-12 15:11:03 -05:00
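The ordering problem described in the commit above can be shown with a short C sketch. The function name and returned labels are hypothetical, and the actual change lives in the project's configure scripts rather than in C code like this. The sketch assumes only that PGI compilers define `__PGI` and Intel compilers define `__INTEL_COMPILER`, and that both may also define `__GNUC__`, so the vendor-specific macros have to be tested first.

```c
/* Return a label for the compiler family, testing vendor macros first. */
const char *compiler_family(void)
{
#if defined(__PGI)
    return "pgi";             /* must precede the __GNUC__ test */
#elif defined(__INTEL_COMPILER)
    return "intel";           /* also defines __GNUC__ */
#elif defined(__GNUC__)
    return "gnu-compatible";
#else
    return "unknown";
#endif
}
```

If the `__GNUC__` branch came first, a compiler that defines it for compatibility would be misidentified as gcc.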
Geoff Paulsen
a1259e6a14
Merge pull request #7356 from hppritcha/topic/pr7201_to_v40x
Topic/pr7201 to v40x
2020-02-12 14:06:31 -06:00
Brice Goglin
f136804c45 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 1.11
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 907ad854b4)
2020-02-11 18:37:18 -06:00
Geoff Paulsen
3894e5760c
Merge pull request #7380 from gpaulsen/topic/v4.0.x/VERSION_v4.0.3rc4
Reving to VERSION v4.0.3rc4
2020-02-11 18:28:45 -06:00
Howard Pritchard
eddb0ef626
Merge pull request #7382 from gpaulsen/topic/v4.0.x/pmix_v3.1.5rc2
Adding PMIx v3.1.5rc2
2020-02-11 10:05:38 -06:00
Geoffrey Paulsen
81ad9bfdb6 Adding PMIx v3.1.5rc2
Adding PMIx v3.1.5rc2 from:
  https://github.com/openpmix/openpmix/releases/tag/v3.1.5rc2

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-02-10 17:05:53 -06:00
Geoffrey Paulsen
aff4fa6c8f Reving to VERSION v4.0.3rc4
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-02-10 15:56:28 -06:00
Brice Goglin
6702a4febb opal/hwloc: remove some unused variables when building with hwloc < 1.7
Refs #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit 329d4451a6)
2020-02-10 15:54:48 -06:00
Geoff Paulsen
d79fe7fe10
Merge pull request #7376 from hoopoepg/topic/oshmem-inc-max-segments-v4.0
OSHMEM/SEGMENTS: increase max number of segments - v4.0
2020-02-10 15:52:24 -06:00
Sergey Oblomov
45a722ad6a OSHMEM/SEGMENTS: increase number of max segments
- increase number of max segments to allow applications to be launched
  on some Ubuntu configurations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit f742f289ea)
2020-02-10 07:44:50 +02:00
Howard Pritchard
42acf4fe6f
Merge pull request #7360 from jsquyres/pr/v4.0.x/fortran-you-win-again
v4.0.x: Fortran fixes
2020-02-09 10:02:04 -06:00
Jeff Squyres
fbeebdb9a0 fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ab398f4b9a)
2020-02-04 05:15:40 -08:00
Jeff Squyres
85ce373730 fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4a47a5a8e)
2020-02-04 04:26:11 -08:00
Howard Pritchard
bed0ce70a7 fix a problem with opal_asprintf
not being defined.

related to #7201

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-02-03 14:22:59 -07:00
Brice Goglin
82567996f7 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 2.0
Both opal_hwloc_base_get_relative_locality() and _get_locality_string()
iterate over hwloc levels to build the proc locality information.
Unfortunately, NUMA nodes are not in those normal levels anymore since 2.0.
We have to explicitly look at the special NUMA level to get that locality info.

I am factorizing the core of the iterations inside dedicated "_by_depth"
functions and calling them again for the NUMA level at the end of the loops.

Thanks to Hatem Elshazly for reporting the NUMA communicator split failure
at https://www.mail-archive.com/users@lists.open-mpi.org/msg33589.html

It looks like only the opal_hwloc_base_get_locality_string() part is needed
to fix that split, but there's no reason not to fix get_relative_locality()
as well.

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
(cherry picked from commit ea80a20e10)
2020-02-03 13:29:41 -07:00
Howard Pritchard
a26cd349b9
Merge pull request #7355 from jsquyres/pr/v4.0.x/fortran-sentinel-linker-black-magic
v4.0.x: Make C and Fortran types for MPI sentinels agree in size
2020-02-03 13:25:29 -07:00
Fangrui Song
02f3795299 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentinel values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5609268e90)
2020-02-02 13:57:50 -08:00
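The size mismatch and the address-only comparison described in the commit above can be sketched in C as follows. The symbol `sketch_errcodes_ignore` and the helper functions are hypothetical stand-ins, not Open MPI's real definitions. The sketch assumes only what the message states: the sentinel is recognized by its address, while the linker sees the size of the symbol's declared type, which was `sizeof(int *)` before the fix and `sizeof(int)` after it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sentinel symbol shared with Fortran.
 * Declaring it as a plain int gives the symbol an int-sized footprint. */
int sketch_errcodes_ignore;

/* Detection compares addresses only, never the contents or the size. */
bool is_errcodes_ignore(const int *errcodes)
{
    return errcodes == &sketch_errcodes_ignore;
}

/* Symbol sizes implied by the old (pointer) and new (int) C types. */
size_t old_symbol_size(void) { return sizeof(int *); }
size_t new_symbol_size(void) { return sizeof(int); }
```

On a typical 64-bit platform the old size is 8 and the new size is 4, which is the disagreement the commit message describes; the run-time check behaves the same either way because it never looks at the size.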
Geoff Paulsen
2f42a125be
Merge pull request #7352 from hppritcha/topic/minor_news_update_v4.0.x
NEWS: tweak for v4.0.3 release
2020-01-31 13:46:48 -06:00