1
1

1204 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
ee405ccaa5 coll/adapt and coll/han: fix trivial compiler warnings
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-10-30 10:41:14 -04:00
George Bosilca
154117515a Fix HAN issues reported by Coverity.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-28 15:44:16 -04:00
George Bosilca
16b49dc5b3 A complete overhaul of the HAN code.
Among many other things:
- Fix an imbalance bug in MPI_allgather
- Accept more human readable configuration files. We can now specify
  the collective by name instead of a magic number, and the component
  we want to use also by name.
- Add the capability to have optional arguments in the collective
  communication configuration file. Right now the capability exists
  for segment lengths, but is yet to be connected with the algorithms.
- Redo the initialization of all HAN collectives.

Cleanup the fallback collective support.
- In case the module is unable to deliver the expected result, it will fallback
  executing the collective operation on another collective component. This change
  make the support for this fallback simpler to use.
- Implement a fallback allowing a HAN module to remove itself as
  potential active collective module, and instead fallback to the
  next module in line.
- Completely disable the HAN modules on error. From the moment an error is
  encountered they remove themselves from the communicator, and in case some
  other modules calls them simply behave as a pass-through.

Communicator: provide ompi_comm_split_with_info to split and provide info at the same time
Add ompi_comm_coll_preference info key to control collective component selection

COLL HAN: use info keys instead of component-level variable to communicate topology level between abstraction layers
- The info value is a comma-separated list of entries, which are chosen with
  decreasing priorities. This overrides the priority of the component,
  unless the component has disqualified itself.
  An entry prefixed with ^ starts the ignore-list. Any entry following this
  character will be ingnored during the collective component selection for the
  communicator.
  Example: "sm,libnbc,^han,adapt" gives sm the highest preference, followed
  by libnbc. The components han and adapt are ignored in the selection process.
- Allocate a temporary buffer for all lower-level leaders (length 2 segments)
- Fix the handling of MPI_IN_PLACE for gather and scatter.

COLL HAN: Fix topology handling
 - HAN should not rely on node names to determine the ordering of ranks.
   Instead, use the node leaders as identifiers and short-cut if the
   node-leaders agree that ranks are consecutive. Also, error out if
   the rank distribution is imbalanced for now.

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-25 18:13:16 -04:00
bsergentm
220b997a58 Coll/han Bull
* first import of Bull specific modifications to HAN

* Cleaning, renaming and compilation fixing Changed all future into han.

* Import BULL specific modifications in coll/tuned and coll/base

* Fixed compilation issues in Han

* Changed han_output to directly point to coll framework output.

* The verbosity MCA parameter was removed as a duplicated of coll verbosity

* Add fallback in han reduce when op cannot commute and ppn are imbalanced

* Added fallback wfor han bcast when nodes do not have the same number of process

* Add fallback in han scatter when ppn are imbalanced

+ fixed missing scatter_fn pointer in the module interface

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
Co-authored-by: a700850 <pierre.lemarinier@atos.net>
Co-authored-by: germainf <florent.germain@atos.net>
2020-10-09 14:17:46 -04:00
Xi Luo
182c333b21 Initial import of the HAN collective module
a hierarchical, architecture-aware collective communication module.

Add Reduce and remove up_seg_size and low_seg_size in Bcast
Increase HAN's priority

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-10-09 14:17:46 -04:00
George Bosilca
96fea22cdd
Don't allow some rank to don't count the collective if they have no data
to exchange.

This is the same logic as in 77eaa5c applied to ialltoallw.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-24 13:29:01 -04:00
George Bosilca
77eaa5c8b8
Keep the non-blocking collective tags globally in sync.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-09-18 12:52:14 -04:00
George Bosilca
c98e387a53
Many fixes and improvements to ADAPT
- Add support for fallback to previous coll module on non-commutative operations (#30)
- Replace mutexes by atomic operations.
- Use the correct nbc request type (for both ibcast and ireduce)
  * coll/base: document type casts in ompi_coll_base_retain_*
- add module-wide topology cache
- use standard instead of synchronous send and add mca parameter to control mode of initial send in ireduce/ibcast
- reduce number of memory allocations
- call the default request completion.
  - Remove the requests from the Fortran lookup conversion tables before completing
    and free it.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>

Co-authored-by: Joseph Schuchart <schuchart@hlrs.de>
2020-09-18 12:50:17 -04:00
Jeff Squyres
560ebc5780
Merge pull request #7716 from bosilca/coll/adapt
ADAPT: Event-driven collective implementation
2020-09-01 11:29:53 -04:00
William Zhang
57b95bcb45 coll/tuned: Revert RSB and RS default algorithms
Reduce scatter block and reduce scatter algorithms were hitting
correctness issues for non commutative strided tests. We will revert to
the original default algorithms for those two collectives (basic linear
and non overlapping respectively) in the non commutative op case.

See #8010

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-08-25 08:44:24 -07:00
George Bosilca
ee592f3672 Address the comments on the PR.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Xi Luo
e59bde912e Remove the code handling zero count cases in ADAPT.
Set request in ibcast.c to empty when the count is 0.

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
c2970a3695 Correctly handle non-blocking collectives tags
As it is possible to have multiple outstanding non-blocking collectives
provided by different collective modules, we need a consistent
mechanism to allow them to select unique tags for each instance of a
collective.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
George Bosilca
d71264569e Fix the atomic management of the bcast and reduce freelist
API consistent with other collective modules
Add comments
Other minor cleanups.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
bsergentm
a4be3bb93d Coll/adapt Bull (#15)
* piggybacking Bull functionalities

* coll/adapt: Fix naming conventions and C11 atomic use

This commit fixes some naming convention issues, such as function names
which should follow the naming ompi_coll_adapt instead of
mca_coll_adapt, reserved for component and module naming (cf. tuned
collective component);

It also fixes the use of _Atomic construct, which is only valid in C11.
OPAL constructs have already been adapted to that use, so use
opal_atomic_* types instead.

* coll/adapt: Remove unused component field in module

This commit removes an unneeded field referencing the component in the
module of adapt, as it is already available through the
mca_coll_adapt_component global variable.

Signed-off-by: Marc Sergent <marc.sergent@atos.net>
Co-authored-by: Lemarinier, Pierre <pierre.lemarinier@atos.net>
Co-authored-by: pierrele <31764860+pierrele@users.noreply.github.com>
2020-08-24 12:13:38 -07:00
Xi Luo
fe73586808 Add ADAPT module
Add comments in the ADAPT module

Signed-off-by: Xi Luo <xluo12@vols.utk.edu>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-08-24 12:13:38 -07:00
Brian Barrett
41df122083
Merge pull request #7730 from wckzhang/newdefaults
coll/tuned: Change the default collective algorithm selection
2020-07-28 15:27:46 -07:00
William Zhang
ce40cfbaa5 coll/tuned: Change the default collective algorithm selection
The default algorithm selections were out of date and not performing
well. After gathering data from OMPI developers, new default algorithm
decisions were selected for:

    allgather
    allgatherv
    allreduce
    alltoall
    alltoallv
    barrier
    bcast
    gather
    reduce
    reduce_scatter_block
    reduce_scatter
    scatter

These results were gathered using the ompi-collectives-tuning package
and then averaged amongst the results gathered from multiple OMPI
developers on their clusters.

You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-07-28 10:41:48 -07:00
Joshua Ladd
366e92ce54
Merge pull request #7860 from vspetrov/hcoll_reduce_scatter
Coll/Hcoll: reduce_scatter(block) interface
2020-07-22 09:45:34 -04:00
Todd Kordenbrock
4358e75a75
Merge pull request #7866 from tkordenbrock/topic/master/portals4.fix-inappropriate-use-of-abort
portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components
2020-06-30 08:46:03 -05:00
Valentin Petrov
d366812030 coll/hcoll: compile warning fix
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2020-06-30 09:44:46 +03:00
Valentin Petrov
1d54071fc1 coll/hcoll: reduce_scatter(block) interface
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2020-06-30 09:44:46 +03:00
Todd Kordenbrock
04b94637dd mtl-portals4: replace abort() with ompi_rte_abort()
coll-portals4: replace abort() with ompi_rte_abort()

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2020-06-24 11:31:26 -05:00
Joseph Schuchart
9a60f5b7fb Add missing free calls to ompi_coll_base_reduce_intra_basic_linear
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:27:36 +02:00
Joseph Schuchart
8e24c0d532 Add missing free calls to ompi_coll_base_allgather_intra_bruck
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-06-19 12:24:56 +02:00
Sergey Oblomov
df0f2ac026 OMPI/HCOLL: fixed typo in vars description
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-05-29 20:13:35 +03:00
William Zhang
50823fe9a9 coll/tuned: Fix dynamic message size for gather and scatter
The gather and scatter operations did not use the correct message size
(Only did datatype size * com size). This did not correctly reflect the
total message size and prevents fine tuning within a com size. This
patch multiplies the value by the number of elements sent.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-05-14 12:17:52 -07:00
William Zhang
771f9c011d coll/tuned: Add NULL check to prevent segfault
Signed-off-by: William Zhang <wilzhang@amazon.com>

cr https://code.amazon.com/reviews/CR-23837553
2020-04-21 17:53:46 +00:00
William Zhang
50640402ab coll/tuned: Fix typos
Signed-off-by: William Zhang <wilzhang@amazon.com>
2020-04-21 17:39:37 +00:00
Nathan Hjelm
160ff188b8
Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf
configure: use -iquote for non-system include paths
2020-03-30 09:22:54 -07:00
Mikhail Kurnosov
66b6b8d34e Fix Bcast scatter_allgather
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2020-03-11 12:47:47 +07:00
Gilles Gouaillardet
69bc2e8372 misc: fix <> vs "" includes throught the ompi codebase
This commit fixes an issue with the include usage in some
ompi source files. These source files are using the <> form
of include when the "" form is correct (as these are internal,
**not** system headers).

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-03-09 21:13:49 -04:00
bosilca
c4d36859ec
Merge pull request #7228 from devreal/progress-returns
Harmonize return values of progress callbacks
2020-02-28 20:15:37 -05:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
Joseph Schuchart
2c97187ee0 Harmonize return values of progress callbacks
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2020-01-28 20:15:03 +01:00
Austen Lauria
969eb0286c
Merge pull request #6970 from ggouaillardet/topic/cuda_no_orte
coll/cuda: remove unnecessary references to ORTE
2020-01-27 13:12:32 -05:00
Austen Lauria
b65ec27307 Fix some compiler warnings.
Silence unused variables, incompatible pointer types,
un-initialized variables, and signed/unsigned comparisons.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-01-10 13:10:53 -05:00
Jeff Squyres
8b424c3863
Merge pull request #7232 from bosilca/hjelmn_neighbor_alltoall_fix
Neighbor alltoall fix
2019-12-17 17:24:05 -05:00
George Bosilca
86acdee460
Fix the communication ordering for all cartesian neighbor collectives.
This work is rooted in the [MPI Forum issue
153](https://github.com/mpi-forum/mpi-issues/issues/153).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-12-11 12:40:38 -05:00
Maxwell Coil
52241dbbcd libnbc: fixed uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
2019-12-08 14:03:48 -05:00
Mikhail Brinskii
f2cbd4806e COLL/TUNED: Add linear scatter using isend for mlnx platform
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-11-07 11:04:39 +02:00
Nathan Hjelm
196a91e604 coll/basic: fix neighbor alltoall message ordering
This commit updates the coll/basic component to correctly order sends
and receives for cartesian communicators with cyclic boundaries. This
addresses an issue identified by mpi-forum/mpi-issues#153. This issue
occurs when the size in any dimension is 1. This gives the same
neighbor in the positive and negative directions. The old code was
sending and receiving in the same order so the -1 buffer contained
the +1 result and vise-versa. The problem is addressed by using
unique tags for each send. This should cover both the case where
overtaking is allowed and is not allowed. The former case will be
possible is a MPI_Cart_create_with_info() call is added to the
standard.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2019-10-08 21:10:49 -07:00
Gilles Gouaillardet
531171ca50 coll/cuda: remove unnecessary references to ORTE
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-09-10 13:55:59 +09:00
Valentin Petrov
a0d99ad190 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 Introduced
    a bug in coll/hcoll component. The ompi_requests allocated by
    libhcoll would be treated as coll_base_nbc_request during
    ompi_coll_base_retain_<> call. Afterwards this would lead to a
    segv in the request cleanup.

    Fix: since libhcoll interface does not distinguish between the
    blocling/non-blocking requests use coll_base_nbc_request all the
    time and initialize it properly in
    coll/hcoll/get_coll_handle(). It is still within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:22:58 +03:00
Gilles Gouaillardet
63d3ccde9d coll/base: only retain datatypes/op if the request has not yet completed
a non blocking collective might return ompi_request_null, so we should not
retain anything in that case.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-09 09:57:56 +09:00
Gilles Gouaillardet
0862c409f1 coll/base: cleanup ompi_coll_base_nbc_request_t elements
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned into a "clean" state.
So cleanup some data in the callback completion subroutines.

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-08 10:48:06 +09:00
Gilles Gouaillardet
f8eef0fde9 coll/libnbc: fixes ompi ompi_coll_libnbc_request_t parent
base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t
to correctly support the retention of datatypes/operators

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-08 10:47:48 +09:00
Yossi Itigin
98d0ecfe14
Merge pull request #6814 from brminich/tuned_all2all_select
COLL/TUNED: Update alltoall selection rule for mellanox platform
2019-07-25 17:51:55 +03:00
Mikhail Brinskii
65618f8db8 COLL/TUNED: Minor var names/comments fixes
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-07-24 10:23:38 +00:00
Mikhail Brinskii
404c480068 COLL/TUNED: Update alltoall selection rule for mlx
Use linear with sync alltoall algorithm for certain message/comm size
ranges. Does not affect default fixed decision, unless HPCX (with its
custom parameters) is used or corresponding mca is set.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-07-13 23:27:40 +03:00