1
1

1109 Коммитов

Автор SHA1 Сообщение Дата
Todd Kordenbrock
36369f9133 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f3f2a826b40cc0d4a45a63614835162ec6eef78e)
2018-08-07 11:23:51 -05:00
Mikhail Kurnosov
c540dfb18c coll-base-allgather: fix MPI_IN_PLACE processing
The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 540c2d1)
2018-07-25 08:11:28 +07:00
Gilles Gouaillardet
1a41482720 coll/libnbc: do not recursively call opal_progress()
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs open-mpi/ompi#3901

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 09:45:08 -06:00
Mikhail Kurnosov
ba83cc91eb coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather
Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-16 08:56:09 -06:00
KAWASHIMA Takahiro
37a05e74aa coll/libnbc: Suppress compiler warnings
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:42:39 +09:00
Howard Pritchard
34bc77747c
Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place
coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
2018-07-11 18:34:35 -05:00
Gilles Gouaillardet
76292951e5 coll/libnbc: fix integer overflow
Use internal pack/unpack subroutines that operate on MPI_Aint instead of int
and hence solve some integer overflows.

Thanks Clyde Stanfield for reporting this issue.

Refs open-mpi/ompi#5383

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-09 10:08:33 -06:00
Mikhail Kurnosov
22fa5a8a67 coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree
Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-09 10:04:41 -06:00
Mikhail Kurnosov
b9e14cd7d0 coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-07 20:59:39 +07:00
KAWASHIMA Takahiro
a8da78eeaa
Merge pull request #4618 from ggouaillardet/topic/pcoll
Add the persistent collectives feature
2018-06-26 12:36:34 +09:00
Joshua Ladd
98afc838aa
Merge pull request #5294 from yosefe/topic/coll-hcoll-progress-fn
coll_hcoll: register progress callback directly without a proxy
2018-06-25 15:07:26 -04:00
Yossi Itigin
e3ee11608b coll_hcoll: register progress callback directly without a proxy
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-24 18:06:07 +03:00
Mikhail Kurnosov
c500739293 coll/base: Add MPI_Bcast based on a scatter followed by an allgather
Implements MPI_Bcast using a binomial tree scatter followed by
an recursive doubling allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-21 11:47:07 -06:00
Mikhail Kurnosov
66bc86a25b Change the tree_next to a flexible array member
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-19 13:01:26 -06:00
Mikhail Kurnosov
6547b58316 coll/base: add knomial tree algorithm for MPI_Bcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-19 13:01:26 -06:00
KAWASHIMA Takahiro
a38e9e064f coll: Update COLL module interface version to 2.3.0
Members for persistent operations are added to the module structure
in a prior commit.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e12a5056f1 coll/libnbc: Rename internal functions
The `nbc_i*` functions don't start communication, but create a request.
`nbc_*_init` are appropriate names for them.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
5c21903477 coll/libnbc: Add assertion for NBC_A2A_DISS
Persistent operation for `NBC_A2A_DISS` is not supported currently.
Though the algorithm is not selected at all currently, I put an
assertion not to select it by mistake.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
0b8b0f8393 coll/libnbc: Implement MPI_STARTALL
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
ed0144bad4 coll/libnbc: Adapt local copy for persistent request
`NBC_Copy` shoud not be called in `MPI_*_INIT`.
`NBC_Sched_copy` should be called instead.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
5c5de3a4fb coll/libnbc: Fix handling of completed request
Because a persistent reuqest does not free its `schedule` object
when the communication completes, the `NBC_Progress` function cannot
determine the completion using `schedule`.

Without this change, a hang occurs when the `NBC_Progress` function
is called recursively through the `NBC_Start_round` function.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
8e5690bf5c coll/libnbc: Correct persistent request handling
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e69e99575e coll: Enable func check in mca_coll_base_comm_select
Now libnbc COLL supports persistent collectives and all `*_init`
functions of the COLL interface are available. So let's enable the
check of availability of those functions on a communicator creation.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
a9609b6bf8 coll/libnbc: add persistent collectives implementation
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
KAWASHIMA Takahiro
a9fdea51aa coll: Add persistent collective communication request feature
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
c753e9baff coll/libnbc: code refactoring
prepare the upcoming persistent collectives by pre-factoring some code

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

fixup 808c3c62cd9475edd91ecde9d2d53b12e28b2c04
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
fe0bb6c310 coll/libnbc: misc revamp - merge NBC_Init_handle() into NBC_Schedule_request() - set schedule in NBC_Schedule_request instead of NBC_Start() - update NBC_Start() prototype
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
360a76f440 coll/libnbc: revamp ibcast and use NBC_Schedule_request()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
Mikhail Kurnosov
3adf96fdb8 coll/base: add butterfly algorithm for MPI_Reduce_scatter
Implements butterfly algorithm for MPI_Reduce_scatter.
The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-05 15:53:13 +07:00
Jeff Squyres
35438ae9b5 mpi/finalized: revamp INITIALIZED/FINALIZED
Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be
invoked during an attribute destruction callback (e.g., during the
destruction of keyvals on MPI_COMM_SELF during the very beginning of
MPI_FINALIZE).  In such cases, MPI_FINALIZED must return "false".

Prior to this commit, we hung in FINALIZED if it were invoked during
a COMM_SELF attribute destruction callback in FINALIZE.  See
https://github.com/open-mpi/ompi/issues/5084.

This commit converts the MPI_INITIALIZED / MPI_FINALIZED
infrastructure to use a single enum (ompi_mpi_state, set atomically)
to represent the state of MPI:

- not initialized
- init started
- init completed
- finalize started
- finalize past COMM_SELF destruction
- finalize completed

The "finalize past COMM_SELF destruction" state is what allows us to
return "false" from MPI_FINALIZED before COMM_SELF has been fully
destroyed / all attribute callbacks have been invoked.

Since this state is checked at nearly every MPI API call (to see if
we're outside of the INIT/FINALIZE epoch), care was taken to use
atomics to *set* the ompi_mpi_state value in ompi_mpi_init() and
ompi_mpi_finalize(), but performance-critical code paths can simply
read the variable without needing to use a slow call to an
opal_atomic_*() function.

Thanks to @AndrewGaspar for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-01 13:36:29 -07:00
Mikhail Kurnosov
28d5837dd9 coll: reduce_scatter_block: add butterfly algorithm
Implements butterfly algorithm for MPI_Reduce_scatter_block.
The algorithm can be used both by commutative and non-commutative
operations, for power-of-two and non-power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-05-27 14:17:41 +07:00
George Bosilca
7191ea120c
Fix merge conflict related to function renaming.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-05-15 11:34:20 -04:00
bosilca
d13b9a2e25
Merge pull request #5156 from ggouaillardet/topic/reduce_scatter_block
coll: reduce_scatter_block: rename and MCA parameter description fix
2018-05-15 11:13:26 -04:00
Mikhail Kurnosov
82299a9c04 coll: reduce_scatter_block: add recursive halving algorithm
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-05-15 08:20:32 +07:00
Gilles Gouaillardet
ce7b3113f6 coll: reduce_scatter_block: rename and MCA parameter description fix
- rename ompi_coll_base_reduce_scatter_block_basic to
   more self descriptive ompi_coll_base_reduce_scatter_block_basic_linear
 - fix the description of the coll_tuned_reduce_scatter_block_algorithm
   MCA param

this fixes and documents previous open-mpi/ompi@0e8b35b615

MPI_Reduce_scatter_block used to be implemented by the coll/basic module only.
A new algo (recursive doubling) was recently introduced and can be used via the coll/tuned module,
but we never intended to make it the default algo.
In order to "restore" the previous default, the initial algo was moved from coll/basic to coll/base,
and is now used by default by coll/tuned.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-05-09 08:54:48 +09:00
Jeff Squyres
b39bbfb3c0
Merge pull request #5142 from mkurnosov/base-reduce-remove-warnings
coll/base/reduce: remove warning identified by Coverity Scan
2018-05-07 15:49:56 -04:00
Gilles Gouaillardet
0e8b35b615 coll/tuned: use basic algo for reduce_scatter_block by default
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-05-07 16:11:44 +09:00
Gilles Gouaillardet
32095be0d6 coll/{base,basic}: move reduce_scatter_block from basic to base
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-05-07 16:11:38 +09:00
Mikhail Kurnosov
ba968e4490 coll/base/reduce: Remove warning identified by Coverity Scan
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-05-04 20:48:37 +07:00
Mikhail Kurnosov
8cf8553abd Resolve merge conflicts
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-05-03 07:28:32 +07:00
Mikhail Kurnosov
787ec8929b Rename function rounddown into ompi_rounddown
FreeBSD 11 `/sys/param.h` has declaration of `rounddown`

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-04-24 07:59:49 +07:00
Mikhail Kurnosov
4cbcff7fcd coll/base: add recursive doubling algorithm for MPI_Reduce_scatter_block
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-04-23 11:02:31 +07:00
Mikhail Kurnosov
82a3a5bdb5 Fix dynamic decision for Scan and bug in Allreduce
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-04-06 11:03:17 +07:00
Gilles Gouaillardet
e85fa469f3 coll/tuned: add recursive doubling algo for [ex]scan
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-04 14:56:23 +09:00
Gilles Gouaillardet
393376bbd9 coll/basic: move [ex]scan from coll/basic to coll/base
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-04 13:41:01 +09:00
Gilles Gouaillardet
65fa0b59c3 coll/tuned: add Rabenseifner algo for [all]reduce
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-04-04 13:25:41 +09:00
Mikhail Kurnosov
177c6ce51f Move algorithms from coll/spacc to coll/base and remove coll/spacc
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-04-04 10:21:06 +07:00
Mikhail Kurnosov
1d2d43bdf0 Fix compile error with dtype
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-04-01 08:27:34 +07:00
Mikhail Kurnosov
50ec214d42 Add recursive doubling algorithm for MPI_Scan and MPI_Exscan to coll/base
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-03-30 10:12:51 +07:00
Mikhail Kurnosov
bd12e2b1c6 Add recursive doubling algorithm for Scan and Exscan
Implements recursive doubling algorithm for MPI_Scan and MPI_Exscan.
The algorithm preserves order of operations so it can be used both
by commutative and non-commutative operations.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-03-28 16:27:11 +07:00