1
1

10380 Коммитов

Автор SHA1 Сообщение Дата
Sergey Oblomov
7c621acf1b OPAL/UCX: enabling new API provided by UCX
- added detection of new API into configuration
- added tag_send call implemented using new API
- added MPI_Send/MPI_Isend/MPI_Recv/MPI_Irecv implementations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 75bda25ddbeea18bb001f367a712dc72592e1e58)
2020-05-04 10:02:10 +03:00
Geoff Paulsen
3bd394ee9f
Merge pull request #7619 from jsquyres/pr/v4.0.x/fix-mpit-err-codes
v4.0.x: Follow the MPI_T guidelines on return errors.
2020-04-24 14:02:03 -05:00
Joseph Schuchart
2785cbbc04 OSC base: fix typos in documentation
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 07d1011afeb59d6ccaf401b30f6af7403da83f1f)
2020-04-24 09:20:05 -06:00
Joseph Schuchart
a0cdd9f597 OSC base: do not retain datatype by default
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 154cf571b6a6996ff6ba9e0860f5d6a6fcf2d425)
2020-04-24 09:19:51 -06:00
Joseph Schuchart
2503b5f10f RDMA osc: remove extra retain on pending_op
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit de67ada44251b2792f3fef19d890dc7aad39dd73)
2020-04-24 09:16:57 -06:00
Geoff Paulsen
ed358e5df2
Merge pull request #7622 from ggouaillardet/topic/v4.0.x/restore_f08_compatibility
fortran/use-mpi-f08: restore ABI compatibility
2020-04-20 14:17:53 -05:00
Ralph Castain
299aa6f31e
Fix intercomm operations
The locality for remote procs is not provided as it is only a local
concept. Thus, you must always use modex_recv_optional to ensure you
don't hang waiting for a response until dmodex times out.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-04-15 17:08:21 -07:00
Gilles Gouaillardet
dfd34eb4e3 fortran/use-mpi-f08: restore ABI compatibility
An incorrect backport in open-mpi/ompi#7360 removed
constants.c from ompi/mpi/fortran/use-mpi-f08/base/Makefile.am

This one off commit fixes that, and move constants.h from
ompi/mpi/fortran/use-mpi-f08 to ompi/mpi/fortran/use-mpi-f08/base

Fixes open-mpi/ompi#7616

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-04-14 10:25:20 +09:00
George Bosilca
a073ef9ee8 Follow the MPI_T guidelines on return errors.
As indicated in the MPI3.2 document 14.3.10 page 599 line 1, the only
MPI error code possible is MPI_SUCCESS. All other errors must be in the
error class MPI_T_ERR*.
Fix the return of few pvar/cvar function that failed to correctly
convert to an MPI error code.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit f4af1848c9e5ea962a560e1f0b6bdab3d8ef0b6a)
2020-04-12 14:13:20 -04:00
Geoff Paulsen
b1eb3d7530
Merge pull request #7579 from devreal/progress-returns-v4.0.x
Harmonize return values of progress callbacks (v4.0.x)
2020-04-03 13:37:56 -05:00
George Bosilca
0425a7a7d8 Consistent return from all progress functions.
This fix ensures that all progress functions return the number of
completed events.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 72501f8f9c37b8db8dd08b274abd9774687a60cb)
2020-03-30 19:00:03 +02:00
Joseph Schuchart
7b1beb0f6c Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 2c97187ee05e592346206a697ca3d9531d600fcc)
2020-03-30 18:58:57 +02:00
Mikhail Kurnosov
d7857d000a Fix Bcast scatter_allgather (issue #7410)
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 66b6b8d34e9bb50d34145096a9a2b210290510ca)
2020-03-30 22:07:46 +07:00
Artem Polyakov
253502b1b1 timings: Fix timings when 'prefix' is used
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
(cherry picked from commit 7c17a38c96db6da1c28d6c46d9293124dee6a23c)
2020-03-11 21:05:30 -07:00
Howard Pritchard
65219ebaa6
Merge pull request #7415 from hjelmn/v4.0.x_osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap
v4.0.x: osc/rdma: modify attach to check for region overlap
2020-03-07 11:00:10 -07:00
Howard Pritchard
0fce596ffb
Merge pull request #7467 from tjahns/v4.0.x
v4.0.x: Fix incorrect argument in manual page.
2020-02-26 20:54:00 -07:00
Howard Pritchard
7a989fe33f
Merge pull request #7478 from yanagibashi/pr/v4.0.x/fix-info-key-object
v4.0.x: osc/sm: fix typo and minor correction
2020-02-26 11:45:06 -07:00
Tsubasa Yanagibashi
a2c850e02d osc/sm: fix typo and minor correction
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
(cherry picked from commit a07a83d1899f8b65bbcdb71e681f4ca8f2109375)
2020-02-26 10:54:08 +09:00
Thomas Jahns
1392fcd51e Fix incorrect argument in manual page.
Signed-off-by: Thomas Jahns <jahns@dkrz.de>
2020-02-25 09:20:58 +01:00
Edgar Gabriel
e14e84aceb sharedfp/individual: defer error when not being able to open datafile
This commit changes the behavior of the individual sharedfp component. If
the component cannot create either the datafile or the metadatafile during File_open,
no error is being raised going forward. This allows applications that do not use shared
file pointer operations to continue execution without any issue.

If the user however subsequently calls MPI_File_write_shared or similar operations, an error
will be raised.

Fixes issue #7429

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit df6e3e503aee6954807a6bdc3a73dfa9a7d030af)
2020-02-24 08:39:53 -06:00
Joseph Schuchart
08da2f5ea5 Correctly set baseptr in contiguous shared memory window with local size zero
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 06bbcf4fd63dd184cf22f8bcad007c4b8b991a3c)
2020-02-20 20:46:29 +01:00
Nathan Hjelm
f4bc0f46d6 osc/rdma: bump the default max dynamic attachments to 64
This commit increaes the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachement value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 54c8233f4f670ee43e59d95316b8dc68f8258ba0)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 13:49:36 -08:00
Nathan Hjelm
eeb6821550 osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 6649aef8bde750d2f1f28e52ff295919face3e0b)
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 13:45:55 -08:00
Jeff Squyres
fbeebdb9a0 fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ab398f4b9a340b54a88b83021b66911fe46d5862)
2020-02-04 05:15:40 -08:00
Jeff Squyres
85ce373730 fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4a47a5a8e4e3f2c902807d75e211f7f500f802b)
2020-02-04 04:26:11 -08:00
Fangrui Song
02f3795299 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd11fb90d18122c521686f692a32cdd8).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5609268e90cb0ff7b2431d29041c10a700fd6996)
2020-02-02 13:57:50 -08:00
Howard Pritchard
1cdcce7f89
Merge pull request #7296 from michaellass/v4.0.x-fix-dims_create
dims_create: fix calculation of factors for odd squares (v4.0.x)
2020-01-14 09:10:40 -07:00
Geoff Paulsen
3da939b124
Merge pull request #7248 from wckzhang/v4.0.x
MTL/OFI: Check threshold number of peers allowed per rank
2020-01-13 14:19:51 -06:00
Michael Lass
ff85c82151 dims_create: fix calculation of factors for odd squares
Until now sqrt(n) was missed as a factor for odd square numbers n. This
lead to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.

Refs: #7186

Signed-off-by: Michael Lass <bevan@bi-co.net>
(cherry picked from commit 67490118adb8372d2aefe1d2d923432e51e100cd)
2020-01-10 16:07:40 +01:00
Howard Pritchard
e06595d7f6 lustre: squash some compiler warnings
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre.  Squash these warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit e66a7cef11c51f64b7766080e1cef34b1395c4da)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 16:00:01 -05:00
Geoff Paulsen
93c879962e
Merge pull request #7168 from wbailey2/pr/fix-yield_when_idle
v4.0.x: schizo/ompi: correctly handle the yield_when_idle option
2020-01-03 14:06:36 -06:00
Robert Wespetal
47c435e531 mtl/ofi: ignore case when comparing provider names
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
(cherry picked from commit 9b72e9465da3f2891ac13ed0443db44136506a1a)
2020-01-03 08:52:24 -08:00
Howard Pritchard
6a739f8357
Merge pull request #7243 from jsquyres/pr/v4.0.x/neighbor-alltoall-fix
v4.0.x: neighbor alltoall fix
2019-12-23 13:40:28 -07:00
Aravind Gopalakrishnan
1bee429a8d MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cf43de44538b818b014bdad0490439e2d212395)
2019-12-19 22:36:43 +00:00
Howard Pritchard
f6914ee35c
Merge pull request #7229 from tkordenbrock/topic/v4.0.x/portals4.fix.flowcontrol.bugs
v4.0.x: portals4: fix flow control bugs
2019-12-18 08:35:46 -07:00
George Bosilca
be58cf7982 Fix the communication ordering for all cartesian neighbor collectives.
This work is rooted in the [MPI Forum issue
153](https://github.com/mpi-forum/mpi-issues/issues/153).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 86acdee4606c1ac3b38070d1b7973a00a991f1d6)
2019-12-17 14:25:22 -08:00
Nathan Hjelm
21221eb70a coll/basic: fix neighbor alltoall message ordering
This commit updates the coll/basic component to correctly order sends
and receives for cartesian communicators with cyclic boundaries. This
addresses an issue identified by mpi-forum/mpi-issues#153. This issue
occurs when the size in any dimension is 1. This gives the same
neighbor in the positive and negative directions. The old code was
sending and receiving in the same order so the -1 buffer contained
the +1 result and vise-versa. The problem is addressed by using
unique tags for each send. This should cover both the case where
overtaking is allowed and is not allowed. The former case will be
possible is a MPI_Cart_create_with_info() call is added to the
standard.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 196a91e604885d7aae9ac9dfbd9b2e846b3015b7)
2019-12-17 14:25:22 -08:00
Howard Pritchard
3f752f1d4f
Merge pull request #7237 from mcoil1/pr/v4.0.x/wbailey2-fixes
v4.0.x/two fixes
2019-12-17 09:22:07 -07:00
William Bailey
c01a71fbe9 romio: fix uninitialized variable
Squash compiler warning.

ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.

Signed-off-by: William Bailey <wbailey2@nd.edu>
(cherry picked from commit 30bda56bcef6f56823ac07f0418fd33e1eff837f)
2019-12-14 17:18:57 -05:00
Maxwell Coil
6fdd902d3f romio: Update ADIOI_R_Exchange_data function
Squash compiler warning due to whitespace/brace problems.

The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 8c237e268472a763ad4aa55e24d25ee7ca64b888)
2019-12-14 12:25:51 -05:00
Maxwell Coil
84a67bd6cf libnbc: fixed uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 52241dbbcdcbf3605c8098d0cfbcf3c5a75a1c9c)
2019-12-14 12:25:18 -05:00
Maxwell Coil
879a25c239 ompi/dpm/dpm.c: Fix uninititalized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 3ced33c2ebf403206c242c89bb70b6b9a0fc8968)
2019-12-14 12:24:57 -05:00
Howard Pritchard
59d8d62555
Merge pull request #7116 from ggouaillardet/topic/v4.0.x/f08_bind_c_constants_revamp
v4.0.x: fortran/use-mpi-f08: revamp mpi_f08 constants
2019-12-13 08:07:42 -07:00
Todd Kordenbrock
1f5a79bbd4 mtl-portals4: don't finalize flow control if Portals4 was not initialized
This commit fixes a segfault in mtl-portals4 finalize().  The segfault
occurs if finalize() is called without any calls to add_procs().  This
commit resolves the segfault by skipping the flow control fini() call if
Portals4 was not initialized.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit e7b867c044f8b776b75f3c6d917745c06237743e)
2019-12-12 08:43:20 -06:00
William Bailey
71fe9d78e0 fcoll/two_phase: Compiler warning for wrong variable type used
Squash compiler warning. Changed output specifier to match variable type (long int -> long long int).

Signed-off-by: William Bailey <wbailey2@nd.edu>
(cherry picked from commit e2718e01961782bffeef0bdfdb19ea286da0db2a)
2019-12-08 14:15:29 -05:00
Gilles Gouaillardet
b004f4c391 schizo/ompi: correctly handle the yield_when_idle option
in schizo/ompi, sets the new OMPI_MCA_mpi_oversubscribe environment
variable according to the node oversubscription state.

This MCA parameter is used to set the default value of the
mpi_yield_when_idle parameter.

This two steps tango is needed so the mpi_yield_when_idle setting
is always honored when set in a config file.

Refs. open-mpi/ompi#6433

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry-picked from cc97c0f6116ae62feef)
2019-12-02 17:18:26 -05:00
Howard Pritchard
d4dd837a3c
Merge pull request #7194 from edgargabriel/pr/two-phase-aggr-calc-32bits-bug-v4.0.x
fcoll/two_phase: fix error in calculating aggregators in 32bit mode
2019-11-27 12:20:34 -07:00
Edgar Gabriel
02da54c174 fcoll/two_phase: fix error in calculating aggregators in 32bit mode
In fcoll_two_phase_supprot_fns.c: calculation of the aggregator index
failed for large offsets on 32bit machine, due to improper handling of
64bit offsets.

Fixes Issue #7110

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit ea1355beae918b3acd67d5c0ccc44afbcc5b7ca9)
2019-11-25 09:06:36 -06:00
Edgar Gabriel
39acc3a251 common/ompio: fix calculation in simple-grouping option
This is based on a bug reported on the mailing list using a netcdf testcase.
The problem occurs if processes are using a custom file view, but on some
of them it appears as if the default file view is being used. Because of that,
the simple-grouping option lead to different number of aggregators used on different
processes, and ultimately to a deadlock. This patch fixes the problem by not using
the file_view size anymore for the calculation in the simple-grouping option,
but the contiguous chunk size (which is identical on all processes).

Fixes issue #7109

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit ad5d0df4e91d66efa5b69e8388f7ed0b4b6a1d09)
2019-11-25 09:04:13 -06:00
Gilles Gouaillardet
02c79ac0c8 fortran/use-mpi-f08: misc fixes
- fix typos from open-mpi/ompi@b10a60a5a9
 - remove remaining references to OMPI_PROTECTED from open-mpi/ompi@df6d763a53

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@fda4d040da)
2019-11-06 10:10:22 +09:00