1
1
Граф коммитов

29933 Коммитов

Автор SHA1 Сообщение Дата
Ralph Castain
12468349d2
Increment the vpid after assignment
Fix the rank-by operation

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-06-30 06:54:04 -07:00
Jeff Squyres
bc6587d3fa
Merge pull request #7873 from devreal/osc-ucx-rget-rput-fetch-alignment-v4.1.x
OSC UCX: make sure no-op fetch in rget/rput is properly aligned (v4.1.x)
2020-06-29 15:20:12 -04:00
Jeff Squyres
249c57a4bc
Merge pull request #7889 from devreal/osc-rdma-noncontig-requests-v4.1.x
osc rdma: check for outstanding fragments before completing a request (II) (v4.1.x)
2020-06-29 15:19:49 -04:00
Joseph Schuchart
2d3f862f1d osc rdma: check for outstanding fragments before completing a request in ompi_osc_rdma_put_complete_flush as well
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit caed3b2eed)
2020-06-29 15:44:44 +02:00
Jeff Squyres
220f9ed99f
Merge pull request #7885 from jsquyres/pr/v4.1.x/die-enable-install-libpmix-die-die-die
v4.1.x: pmix3x: Remove --enable-install-libpmix option
2020-06-28 10:41:37 -04:00
Jeff Squyres
b4106e94e1 pmix3x: Remove --enable-install-libpmix option
This option is problematic, and has never worked in an Open MPI v4.0.x
release tarball.  Given that PMIx is now available elsewhere, it isn't
worth fixing this option.

See https://github.com/open-mpi/ompi/issues/6228 for more detail.

NOTE: This is a v4.0.x-specific commit because this option no longer
exists on master because we deleted the entire pmix3x component.
Hence, it's not possible to cherry-pick anything from master back to
the v4.0.x branch.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 447b14061880e218371f9eb0cbe427b8358d45b8)
2020-06-27 10:01:23 -07:00
Brian Barrett
a6d97e2d6d
Merge pull request #7815 from raafatfeki/topic/ime
Topic/ime: Bring over IME component from master to v4.1
2020-06-26 12:40:35 -07:00
Brian Barrett
25abbb219a
Merge pull request #7808 from bwbarrett/backports/v4.1.x-collectives-updates
Backport Collective changes from master to v4.1.x
2020-06-26 12:01:51 -07:00
raafatfeki
6e145188d9 fs/ime & fbtl/ime: Support of IME file system
Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2020-06-26 12:26:51 -04:00
Jeff Squyres
7987a7f56e common_ofi: fix preprocessor macro typo
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f64c30e93c)
2020-06-26 07:56:53 -07:00
Brian Barrett
339ee6378a dist: Add Collectives backports to NEWS
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:07:26 +00:00
William Zhang
db6ed187b2 coll/tuned: Add NULL check to prevent segfault
Signed-off-by: William Zhang <wilzhang@amazon.com>

cr https://code.amazon.com/reviews/CR-23837553

(cherry picked from commit 771f9c011d)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
William Zhang
03758b1ef7 coll/tuned: Fix typos
Signed-off-by: William Zhang <wilzhang@amazon.com>
(cherry picked from commit 50640402ab)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Brinskii
7eb94164a0 COLL/TUNED: Add linear scatter using isend for mlnx platform
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit f2cbd4806e)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Gilles Gouaillardet
221fad6862 coll/cuda: remove unnecessary references to ORTE
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 531171ca50)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Tomislav Janjusic
f51bd8ca0c Coll/hcoll: adding scatterv interface
Signed-off-by: Valentin Petrov valentinp@mellanox.com
(cherry picked from commit 6ea920e225)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Alex Anenkov
2891a23329 coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce
Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>
(cherry picked from commit 77d466edf3)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
ba11f31fc8 coll/libnbc: remove debug output
1. Remove debug output in iallgather (I have forgotten to remove it).
2. Remove an incorrect comment in description of ibcast

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 64abd0f405)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
bf1c8bb394 coll/libnbc/ireduce: silence Coverity warning CID 1440360
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 8b511c7889)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
5ee1fb62b9 coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce
An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce.

This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by an allgather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 73e048b62a)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
George Bosilca
fd29cce114 Remove few warnings in libnbc identified by clang-1000.11.45.2
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 66182a294d)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
91a4b4c799 coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit a7386c1e09)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
6971dab943 coll/libnbc: add knomial tree algorithm for MPI_Ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit b0429d25df)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
a318f117f6 coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 7bd63e79c8)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Brian Barrett
6f6d8180a3 coll libnbc: Remove dead code
Remove dead code that was causing warnings about unused static
functions.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 2e24e6ec08)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
de5e435dee coll/libnbc: add recursive doubling algorithm for MPI_Iexscan
Implements recursive doubling algorithm for MPI_Iexscan.
The algorithm preserves order of operations so it can be used both
by commutative and non-commutative operations.

The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic
algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit dfe203e167)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
65990af3ad coll/libnbc: add recursive doubling algorithm for MPI_Iscan
Implements recursive doubling algorithm for MPI_Iscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations.

The MCA parameter coll_libnbc_iscan_algorithm was added for dynamic algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 3d43ff0f32)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Jeff Squyres
547fb3d933 libnbc: remove some stale/dead code
Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it
turns out that we never call this function, anyway.  So just remove
it.

Fixes #5670.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 06c1bf73da)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Aurelien Bouteiller
2692840d40 Always return a valid error code from collective operations
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
(cherry picked from commit 466217fadd)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Gilles Gouaillardet
d9d84d5dd6 coll/libnbc: fix NBC_Unpack()
always initialize 'size'.

Only the a2a_sched_diss() alltoall algorithm is impacted,
and this algo is currently unused, so there is no need
to backport nor update the NEWS file for now.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit ff48e92864)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Mikhail Kurnosov
ba221e1a08 coll/base/allgatherv: fix MPI_IN_PLACE processing
The call of MPI_Allgatherv with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit b45e190e66)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-25 23:06:51 +00:00
Jeff Squyres
465414953d
Merge pull request #7828 from edgargabriel/pr/v4.1.x-avg-fview-size
common/ompio: use avg. file view size in the aggregator selection logic
2020-06-25 11:45:48 -04:00
Brian Barrett
173142bf32
Merge pull request #7824 from hoopoepg/topic/ucx-test-external-events-v4.1
COMMON/UCX: improved missing events test - v4.1
2020-06-25 08:10:54 -07:00
Joseph Schuchart
d4219f8144 OSC UCX: make sure no-op fetch in rget/rput is properly aligned
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c1f7776341)
2020-06-25 17:05:48 +02:00
Jeff Squyres
a3258afad9
Merge pull request #7814 from raafatfeki/topic/gpfs_4.1.x
fs/gpfs: Bring over GPFS component from master to v4.1
2020-06-25 10:47:39 -04:00
Jeff Squyres
470b1c518d
Merge pull request #7812 from hppritcha/topic/cobalt_for_orte
RAS:ALPS add support for ANL Cobalt
2020-06-25 10:40:57 -04:00
Jeff Squyres
2ba26c746d
Merge pull request #7831 from hppritcha/backports/v4.1.x-psm2-updates
Backports/v4.1.x psm2 updates
2020-06-25 10:40:38 -04:00
Jeff Squyres
4e95b7ae60
Merge pull request #7836 from jsquyres/pr/v4.1.x/fix-test-asm-run-tests-script
v4.1.x: tests/asm/run_tests: fix basename usage
2020-06-25 10:40:18 -04:00
Brian Barrett
9ffe9535f3
Merge pull request #7837 from markalle/interception_early_toc_read_v41x
noinline to avoid compiler reading TOC before PATCHER_BEGIN
2020-06-24 10:55:35 -07:00
Brian Barrett
9550cb2a97
Merge pull request #7863 from hppritcha/topic/patch_for_ofi_v41x
add a common ofi whitelist/blacklist
2020-06-24 10:55:03 -07:00
Howard Pritchard
8fcf9cee3b add a common ofi whitelist/blacklist
also add common verbose variable.

Note the verbosity thing is a little tricky owing to the way the MCA frameworks and components are registered and
and initialized.  The BTL's are registered/initialized prior to the MTL components even getting registered.

Here's the change in ofi mtl mca parameters.  Before commit:

            MCA mtl ofi: parameter "mtl_ofi_provider_include" (current value: "psm2", data source: environment, level: 1 user/basic, type: string)
                          Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
             MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.

After commit:

             MCA btl ofi: parameter "btl_ofi_provider_include" (current value: "", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_include)
                          Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
             MCA btl ofi: parameter "btl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
             MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string, synonym of: opal_common_ofi_provider_exclude)
                          Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
             MCA mtl ofi: parameter "mtl_ofi_verbose" (current value: "0", data source: default, level: 3 user/all, type: int, synonym of: opal_common_ofi_verbose)

related to #7755

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 9f1081a07a)
(cherry picked from commit 45b643d0cf)
2020-06-23 14:02:33 -06:00
Jeff Squyres
6d550a32ca
Merge pull request #7856 from hjelmn/v4.1.x_add_a_missing_osc_rdma_commit_for_dynamic_memory_windows_that_was_forgotten_before_the_last_release_doh
osc/rdma: fix bug in attach for non-debug builds
2020-06-22 17:16:42 -04:00
Jeff Squyres
4a466e4f08
Merge pull request #7853 from devreal/osc-rdma-noncontig-requests-v4.1.x
osc rdma: check for outstanding fragments before completing a request (v4.1.x)
2020-06-22 11:53:48 -04:00
Jeff Squyres
c8e5227e1a
Merge pull request #7173 from jjhursey/v4-jsm-schizo
Add detection for JSM direct launch
2020-06-22 11:51:52 -04:00
Nathan Hjelm
059c9614a6 osc/rdma: fix bug in attach for non-debug builds
This commit fixes an issue with non-debug builds where adding an
attachment to the attachment list doesn't actually happen. This
causes all MPI_Win_detach calls to fail. The call was within an
assert which is optimized out in optimized builds.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 8ee80d8855)
2020-06-22 08:34:43 -06:00
Joseph Schuchart
1af51d07d6 osc rdma: check for outstanding fragments before completing a request
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 85ed26f2f8)
2020-06-22 08:32:14 +02:00
Mark Allen
1f5adc65bc noinline to avoid compiler reading TOC before PATCHER_BEGIN
This bug was first seen in a different product that's using the same
interception code as OMPI.  But I think it's potentially in OMPI too.

In my vanilla build of OMPI master on RH8 if I "gdb libopen-pal.so" and
"disassemble intercept_brk", I'm seeing a suspicious extra instruction
in front of PATCHER_BEGIN:
   0x00000000000d6778 <+40>:    std     r2,24(r1) // something gcc put in front
   0x00000000000d677c <+44>:    std     r2,96(r1) // PATCHER_BEGIN's toc_save
   0x00000000000d6780 <+48>:    nop               // NOPs from PATCHER_BEGIN
   0x00000000000d6784 <+52>:    nop               // that get replaced
   0x00000000000d6788 <+56>:    nop               // by instructions that
   0x00000000000d678c <+60>:    nop               // change r2
   0x00000000000d6790 <+64>:    nop               //

Later there are loads from that location like
   0x000000000019e0e4 <+132>:   ld      r2,24(r1)
that make me nervous since that's the pre-updated value.

I believe this is the same thing Nathan is describing way back in a9bc692d
and his solution was to put a second call around each interception, where
the outer call is just
    intercept_brk():
        PATCHER_BEGIN
        _intercept_brk()
        PATCHER_END
and the inner call _intercept_brk() is where the bulk of the code goes.

What I'm seeing is that _intercept_brk() is being inlined and probably
negating Nathan's fix.  So I want to add __opal_attribute_noinline__ to
restore the fix.

With this commit in place, the disassembly of intercept_brk becomes tiny
because it's no longer inlining _intercept_brk() and the susipicious
early save of r2 is gone.  I made the same fix to all the intercept_*
functions, although intercept_brk was the only one that had a suspicious
save of r2.

As far as empirical failures though, we only have those from the non-OMPI
product that's using the same patcher code.  I'm not actually getting OMPI
to fail from the above suspicious data being saved in r1+24.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit ddd1f578ec)
2020-06-17 18:45:54 -04:00
Brian Barrett
a4e8f2b2cb
Merge pull request #7807 from bwbarrett/backports/v4.1.x-ofi-updates
Backport OFI changes from master to v4.1.x
2020-06-17 15:07:32 -07:00
Brian Barrett
204922fff6 dist: Add OFI backports to NEWS
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00
Harumi Kuno
61b11e4101 mtl_btl_ofi_rcache_init() before creating domain
mtl_btl_ofi_rcache_init() initializes patcher which should only take
place things are single threaded.  OFI providers may start spawn threads,
so initialize the rcache before creating OFI objects to prevent races.

Authored-by: John L. Byrne <john.l.byrne@hpe.com>
Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
(cherry picked from commit f1b21cb776)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2020-06-17 20:49:19 +00:00