Commit graph

29947 commits

Author SHA1 Message Date
Brian Barrett
0d1af43851
Merge pull request #7924 from hkuno/hkuno/cherry-pick_5655d64b
mpi/c: fix param checks in [I]Neighbor_alltoall{v,w}
2020-07-13 12:02:12 -07:00
Brian Barrett
4fd3cdc848
Merge pull request #7923 from jsquyres/pr/v4.1.x/disallow-when-cint-not-equal-to-finteger
v4.1.x: fortran.m4: disallow when sizeof(int) != sizeof(INTEGER)
2020-07-13 12:00:56 -07:00
Gilles Gouaillardet
79a737ca94 mpi/c: fix param checks in [I]Neighbor_alltoall{v,w}
do not check some input parameters when an {in,out}degree is zero

Thanks Junchao Zhang for analyzing and reporting this issue.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 5655d64bd3)
2020-07-10 16:58:52 -06:00
Jeff Squyres
94922937c2 fortran.m4: disallow when sizeof(int) != sizeof(INTEGER)
NOTE: This is intentionally not a cherry pick from master.  Instead,
this is a cherry-pick from the equivalent commit on the v4.0.x branch.
See below.

There is a problem with the mpi_f08 module when sizeof(int) !=
sizeof(INTEGER): the size of TYPE(MPI_Status) is too small.  This
causes buffer overruns when Open MPI is configured with (for example)
sizeof(int)==4 and sizeof(INTEGER)==8, and then you call the mpi_f08
MPI_RECV subroutine.  This will end up copying the resulting C
MPI_Status to the buffer pointing to the Fortran status, but the code
does not know if the Fortran status is an mpif.h status or a
TYPE(MPI_Status) -- it just blindly copies over as if the Fortran
status is an INTEGER array of length MPI_STATUS_SIZE.  Unfortunately,
TYPE(MPI_Status) is actually smaller than this, so we overrun the
buffer.  Hilarity ensues.

The simple fix for this is to make TYPE(MPI_Status) the same size as
INTEGER(MPI_STATUS_SIZE), but we can't do that here on the release
branch because it will break ABI.

This commit does the following:

- checks to see if we're in a sizeof(int) != sizeof(INTEGER) scenario
- if so, if the user has not specifically excluded building the
  mpi_f08 module, display a Giant Error Message (GEM) and abort
  configure.

This is unusual; we don't usually abort configure when feature XYZ
can't be built -- if the user didn't specifically ask for XYZ, we
just emit a notice that we won't build XYZ and continue.

This situation is a little different because we're on a release
branch: prior releases have built mpi_f08 by default -- even in this
"bad" scenario.  Hence, in this case, we explicitly tell the user that
this is now a known-bad scenario and abort.  In the GEM, we give the
user two options:

1. Change their compiler flags so that sizeof(int) == sizeof(INTEGER)
   and re-run configure, or
2. Explicitly disable the mpi_f08 module via --enable-mpi-fortran=usempi

Thanks to @ahaichen for reporting the issue.

Note: the proper fix has been implemented on master (i.e., what will
become v5.0.0), but since that breaks ABI, we can't cherry pick it
back here to an existing release branch series. Hence, we
cherry-picked this fix from the v4.0.x branch.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 27836a614b9c29d7636cdf1a9b838b1532281a8a)
2020-07-10 14:39:34 -07:00
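The configure-time decision described in the commit message above can be modeled as a small function. This is only a sketch under stated assumptions: the real check is an Autoconf macro in fortran.m4, and the function name `configure_decision`, its parameters, and its return strings are hypothetical names chosen for illustration.

```python
def configure_decision(sizeof_int: int, sizeof_integer: int, f08_excluded: bool) -> str:
    """Hypothetical model of the check described in the commit message.

    sizeof_int     -- size of a C int, in bytes
    sizeof_integer -- size of a Fortran INTEGER, in bytes
    f08_excluded   -- True if the user explicitly disabled the mpi_f08 module
    """
    if f08_excluded:
        # The user opted out of mpi_f08, so the size mismatch is harmless.
        return "continue without mpi_f08"
    if sizeof_int != sizeof_integer:
        # Known-bad scenario: TYPE(MPI_Status) would be too small.
        return "abort with error message"
    return "continue with mpi_f08"


if __name__ == "__main__":
    print(configure_decision(4, 8, f08_excluded=False))
    print(configure_decision(4, 8, f08_excluded=True))
    print(configure_decision(4, 4, f08_excluded=False))
```

The ordering of the two tests mirrors the message: the abort happens only when the sizes differ and the module has not been explicitly excluded.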
Brian Barrett
64c5e2158c
Merge pull request #7911 from bwbarrett/dist/v4.1.x
Prep for v4.1.0rc1 release
2020-07-06 14:15:20 -07:00
Brian Barrett
9dc3a9e85d dist: Update NEWS file for 4.1.0rc1
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-07-06 19:35:45 +00:00
Brian Barrett
55eab422b5 dist: Move version to 4.1.0rc1
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-07-06 19:35:38 +00:00
Jeff Squyres
80eebbee58
Merge pull request #7896 from rhc54/cmr41/rk
v4.1.x: Increment the vpid after assignment
2020-07-06 13:38:32 -04:00
Jeff Squyres
d17685a6c9
Merge pull request #7890 from tkordenbrock/topic/v4.1.x/portals4.call-pml-add_procs
v4.1.x: mtl-portals4: use the active PML to call add_procs()
2020-07-06 07:34:41 -04:00
Jeff Squyres
13b7844513
Merge pull request #7881 from cniethammer/uct-supported-version-update
v4.1.x: Accept UCX 1.8 in configure of btl/uct
2020-07-06 07:34:17 -04:00
Jeff Squyres
a878569386
Merge pull request #7895 from tkordenbrock/topic/v4.1.x/portals4.fix-inappropriate-use-of-abort
v4.1.x: portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components
2020-07-06 07:32:32 -04:00
Ralph Castain
12468349d2
Increment the vpid after assignment
Fix the rank-by operation

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-06-30 06:54:04 -07:00
Jeff Squyres
bc6587d3fa
Merge pull request #7873 from devreal/osc-ucx-rget-rput-fetch-alignment-v4.1.x
OSC UCX: make sure no-op fetch in rget/rput is properly aligned (v4.1.x)
2020-06-29 15:20:12 -04:00
Jeff Squyres
249c57a4bc
Merge pull request #7889 from devreal/osc-rdma-noncontig-requests-v4.1.x
osc rdma: check for outstanding fragments before completing a request (II) (v4.1.x)
2020-06-29 15:19:49 -04:00
Todd Kordenbrock
20f9ed98f2 mtl-portals4: replace abort() with ompi_rte_abort()
coll-portals4: replace abort() with ompi_rte_abort()

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit 04b94637dd)
2020-06-29 10:06:12 -05:00
Todd Kordenbrock
540b14fc32 Use the active PML to call add_procs()
ompi_mtl_portals4_get_endpoint() was incorrectly making a direct
call to ompi_mtl_portals4_add_procs().  Instead use the active PML
to call add_procs().  If add_procs() fails, call ompi_rte_abort()
to terminate the job.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>

(cherry picked from commit 0a637967fa)
2020-06-29 09:55:23 -05:00
Joseph Schuchart
2d3f862f1d osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit caed3b2eed)
2020-06-29 15:44:44 +02:00
Jeff Squyres
220f9ed99f
Merge pull request #7885 from jsquyres/pr/v4.1.x/die-enable-install-libpmix-die-die-die
v4.1.x: pmix3x: Remove --enable-install-libpmix option
2020-06-28 10:41:37 -04:00
Jeff Squyres
b4106e94e1 pmix3x: Remove --enable-install-libpmix option
This option is problematic, and has never worked in an Open MPI v4.0.x
release tarball.  Given that PMIx is now available elsewhere, it isn't
worth fixing this option.

See https://github.com/open-mpi/ompi/issues/6228 for more detail.

NOTE: This is a v4.0.x-specific commit because this option no longer
exists on master because we deleted the entire pmix3x component.
Hence, it's not possible to cherry-pick anything from master back to
the v4.0.x branch.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 447b14061880e218371f9eb0cbe427b8358d45b8)
2020-06-27 10:01:23 -07:00
Brian Barrett
a6d97e2d6d
Merge pull request #7815 from raafatfeki/topic/ime
Topic/ime: Bring over IME component from master to v4.1
2020-06-26 12:40:35 -07:00
Christoph Niethammer
ad1d427d60 Accept UCX 1.8 in configure of btl/uct
The configure script for the btl uct component reports an error for
the new UCX 1.8.0 versions because it only accepted versions up to UCX 1.7.

This fixes #7612

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
(cherry picked from commit 9b10f46126)
2020-06-26 21:20:49 +02:00
Brian Barrett
25abbb219a
Merge pull request #7808 from bwbarrett/backports/v4.1.x-collectives-updates
Backport Collective changes from master to v4.1.x
2020-06-26 12:01:51 -07:00
raafatfeki
6e145188d9 fs/ime & fbtl/ime: Support of IME file system
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-06-26 12:26:51 -04:00
Jeff Squyres
7987a7f56e common_ofi: fix preprocessor macro typo
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f64c30e93c)
2020-06-26 07:56:53 -07:00
Brian Barrett
339ee6378a dist: Add Collectives backports to NEWS
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:07:26 +00:00
William Zhang
db6ed187b2 coll/tuned: Add NULL check to prevent segfault
Signed-off-by: William Zhang <wilzhang@amazon.com>

cr https://code.amazon.com/reviews/CR-23837553

(cherry picked from commit 771f9c011d)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
William Zhang
03758b1ef7 coll/tuned: Fix typos
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 50640402ab)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Brinskii
7eb94164a0 COLL/TUNED: Add linear scatter using isend for mlnx platform
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit f2cbd4806e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Gilles Gouaillardet
221fad6862 coll/cuda: remove unnecessary references to ORTE
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 531171ca50)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Tomislav Janjusic
f51bd8ca0c Coll/hcoll: adding scatterv interface
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
(cherry picked from commit 6ea920e225)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Alex Anenkov
2891a23329 coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce
Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>
(cherry picked from commit 77d466edf3)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
ba11f31fc8 coll/libnbc: remove debug output
1. Remove debug output in iallgather (I have forgotten to remove it).
2. Remove an incorrect comment in description of ibcast

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 64abd0f405)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
bf1c8bb394 coll/libnbc/ireduce: silence Coverity warning CID 1440360
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 8b511c7889)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
5ee1fb62b9 coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce
An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce.

This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed by an allgather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 73e048b62a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
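The three limitations listed in the commit message above can be expressed as a single eligibility test. The sketch below is an illustration only, not the libnbc selection code; the function name `rabenseifner_applicable` and its parameters are hypothetical.

```python
def rabenseifner_applicable(count: int, nprocs: int,
                            commutative: bool, intra_communicator: bool) -> bool:
    """Return True if the stated limitations are all satisfied (hypothetical helper)."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    # Largest power of two not exceeding nprocs, i.e. 2^floor(log2 p).
    pof2 = 1 << (nprocs.bit_length() - 1)
    return count >= pof2 and commutative and intra_communicator


if __name__ == "__main__":
    # With 6 processes the threshold is 4 elements.
    print(rabenseifner_applicable(4, 6, True, True))
    print(rabenseifner_applicable(3, 6, True, True))
```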
George Bosilca
fd29cce114 Remove a few warnings in libnbc identified by clang-1000.11.45.2
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 66182a294d)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
91a4b4c799 coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit a7386c1e09)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
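To illustrate why recursive doubling is tied to a power-of-two number of processes, here is a sequential simulation of the general technique. It is a sketch, not the libnbc implementation: the function name `allgather_recursive_doubling` is hypothetical, and message exchanges are modeled by copying between per-rank dictionaries.

```python
def allgather_recursive_doubling(blocks):
    """Simulate an allgather over len(blocks) ranks using recursive doubling.

    blocks[r] is the contribution of rank r. Returns, for every rank,
    the list of all contributions in rank order.
    """
    p = len(blocks)
    if p == 0 or (p & (p - 1)) != 0:
        raise ValueError("recursive doubling needs a power-of-two number of ranks")

    # gathered[r] maps source rank -> block currently held by rank r.
    gathered = [{r: blocks[r]} for r in range(p)]
    mask = 1
    while mask < p:
        # Snapshot the state so every pair exchanges pre-step data.
        snapshot = [dict(g) for g in gathered]
        for rank in range(p):
            partner = rank ^ mask  # the partner differs in exactly one bit
            gathered[rank].update(snapshot[partner])
        mask <<= 1  # the amount of data held doubles each round
    return [[g[src] for src in range(p)] for g in gathered]


if __name__ == "__main__":
    print(allgather_recursive_doubling(["a", "b", "c", "d"])[0])
```

Each rank pairs with `rank XOR mask`, so with a non-power-of-two count some ranks would have no partner in some rounds; this simple form therefore rejects such inputs.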
Mikhail Kurnosov
6971dab943 coll/libnbc: add knomial tree algorithm for MPI_Ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit b0429d25df)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
a318f117f6 coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed by a gather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 7bd63e79c8)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Brian Barrett
6f6d8180a3 coll libnbc: Remove dead code
Remove dead code that was causing warnings about unused static
functions.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 2e24e6ec08)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
de5e435dee coll/libnbc: add recursive doubling algorithm for MPI_Iexscan
Implements recursive doubling algorithm for MPI_Iexscan.
The algorithm preserves order of operations so it can be used both
by commutative and non-commutative operations.

The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic
algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit dfe203e167)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
65990af3ad coll/libnbc: add recursive doubling algorithm for MPI_Iscan
Implements recursive doubling algorithm for MPI_Iscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations.

The MCA parameter coll_libnbc_iscan_algorithm was added for dynamic algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 3d43ff0f32)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
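The order-preserving property mentioned in the commit message above can be seen in a sequential simulation of a recursive doubling inclusive scan. This is a sketch of the general technique as it is commonly described, not the libnbc code; the function name `scan_recursive_doubling` is hypothetical, and string concatenation stands in for a non-commutative operation.

```python
def scan_recursive_doubling(values, op):
    """Simulate an inclusive scan over len(values) ranks.

    result[r] accumulates the prefix for rank r; total[r] accumulates the
    reduction over the block of ranks that r has communicated with so far.
    """
    p = len(values)
    result = list(values)
    total = list(values)
    mask = 1
    while mask < p:
        snapshot = list(total)  # what each rank sends in this round
        for rank in range(p):
            partner = rank ^ mask
            if partner >= p:
                continue  # no such rank; skip this round
            received = snapshot[partner]
            if partner < rank:
                # Data from lower ranks goes on the left to keep the order.
                result[rank] = op(received, result[rank])
                total[rank] = op(received, total[rank])
            else:
                # Data from higher ranks only extends the block total.
                total[rank] = op(total[rank], received)
        mask <<= 1
    return result


if __name__ == "__main__":
    print(scan_recursive_doubling(["a", "b", "c", "d", "e"], lambda x, y: x + y))
```

Because received data is always combined on the side that matches the sender's rank order, the operation never needs to be commutative.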
Jeff Squyres
547fb3d933 libnbc: remove some stale/dead code
Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it
turns out that we never call this function, anyway.  So just remove
it.

Fixes #5670.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 06c1bf73da)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Aurelien Bouteiller
2692840d40 Always return a valid error code from collective operations
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
(cherry picked from commit 466217fadd)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Gilles Gouaillardet
d9d84d5dd6 coll/libnbc: fix NBC_Unpack()
always initialize 'size'.

Only the a2a_sched_diss() alltoall algorithm is impacted,
and this algo is currently unused, so there is no need
to backport nor update the NEWS file for now.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ff48e92864)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
ba221e1a08 coll/base/allgatherv: fix MPI_IN_PLACE processing
Calling MPI_Allgatherv with sendbuf set to MPI_IN_PLACE and sendtype set to NULL causes a segmentation fault.

The problem is that sendtype is used even when sendbuf is MPI_IN_PLACE. According to the standard, the sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit b45e190e66)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
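The rule behind this fix can be sketched as follows: when the send buffer is the in-place marker, the send type and send count must not be touched at all. This is a simplified model, not the coll/base code; `IN_PLACE`, `place_own_block`, and the list-based buffers are hypothetical stand-ins for MPI_IN_PLACE and the real buffers.

```python
IN_PLACE = object()  # hypothetical stand-in for MPI_IN_PLACE


def place_own_block(sendbuf, sendcount, sendtype, recvbuf, displs, rank):
    """Put this rank's contribution into its slot of recvbuf."""
    if sendbuf is IN_PLACE:
        # The contribution already sits in recvbuf at displs[rank];
        # sendcount and sendtype are ignored and may be invalid.
        return recvbuf
    if sendtype is None:
        raise ValueError("sendtype is required when sendbuf is not IN_PLACE")
    start = displs[rank]
    recvbuf[start:start + sendcount] = sendbuf[:sendcount]
    return recvbuf


if __name__ == "__main__":
    # In-place call with a missing send type must not fail.
    print(place_own_block(IN_PLACE, 0, None, [0, 0, 7, 8, 0], [0, 2, 4], rank=1))
```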
Jeff Squyres
465414953d
Merge pull request #7828 from edgargabriel/pr/v4.1.x-avg-fview-size
common/ompio: use avg. file view size in the aggregator selection logic
2020-06-25 11:45:48 -04:00
Brian Barrett
173142bf32
Merge pull request #7824 from hoopoepg/topic/ucx-test-external-events-v4.1
COMMON/UCX: improved missing events test - v4.1
2020-06-25 08:10:54 -07:00
Joseph Schuchart
d4219f8144 OSC UCX: make sure no-op fetch in rget/rput is properly aligned
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c1f7776341)
2020-06-25 17:05:48 +02:00
Jeff Squyres
a3258afad9
Merge pull request #7814 from raafatfeki/topic/gpfs_4.1.x
fs/gpfs: Bring over GPFS component from master to v4.1
2020-06-25 10:47:39 -04:00
Jeff Squyres
470b1c518d
Merge pull request #7812 from hppritcha/topic/cobalt_for_orte
RAS:ALPS add support for ANL Cobalt
2020-06-25 10:40:57 -04:00