openmpi

Author	SHA1	Message	Date
Jeff Squyres	560ebc5780	Merge pull request #7716 from bosilca/coll/adapt ADAPT: Event-driven collective implementation	2020-09-01 11:29:53 -04:00
William Zhang	57b95bcb45	coll/tuned: Revert RSB and RS default algorithms Reduce scatter block and reduce scatter algorithms were hitting correctness issues for non commutative strided tests. We will revert to the original default algorithms for those two collectives (basic linear and non overlapping respectively) in the non commutative op case. See #8010 Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-08-25 08:44:24 -07:00
George Bosilca	ee592f3672	Address the comments on the PR. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-08-24 12:13:38 -07:00
Xi Luo	e59bde912e	Remove the code handling zero count cases in ADAPT. Set request in ibcast.c to empty when the count is 0. Signed-off-by: Xi Luo <xluo12@vols.utk.edu> Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-08-24 12:13:38 -07:00
George Bosilca	c2970a3695	Correctly handle non-blocking collectives tags As it is possible to have multiple outstanding non-blocking collectives provided by different collective modules, we need a consistent mechanism to allow them to select unique tags for each instance of a collective. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-08-24 12:13:38 -07:00
George Bosilca	d71264569e	Fix the atomic management of the bcast and reduce freelist API consistent with other collective modules Add comments Other minor cleanups. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-08-24 12:13:38 -07:00
bsergentm	a4be3bb93d	Coll/adapt Bull (#15 ) * piggybacking Bull functionalities * coll/adapt: Fix naming conventions and C11 atomic use This commit fixes some naming convention issues, such as function names which should follow the naming ompi_coll_adapt instead of mca_coll_adapt, reserved for component and module naming (cf. tuned collective component); It also fixes the use of _Atomic construct, which is only valid in C11. OPAL constructs have already been adapted to that use, so use opal_atomic_* types instead. * coll/adapt: Remove unused component field in module This commit removes an unneeded field referencing the component in the module of adapt, as it is already available through the mca_coll_adapt_component global variable. Signed-off-by: Marc Sergent <marc.sergent@atos.net> Co-authored-by: Lemarinier, Pierre <pierre.lemarinier@atos.net> Co-authored-by: pierrele <31764860+pierrele@users.noreply.github.com>	2020-08-24 12:13:38 -07:00
Xi Luo	fe73586808	Add ADAPT module Add comments in the ADAPT module Signed-off-by: Xi Luo <xluo12@vols.utk.edu> Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2020-08-24 12:13:38 -07:00
Brian Barrett	41df122083	Merge pull request #7730 from wckzhang/newdefaults coll/tuned: Change the default collective algorithm selection	2020-07-28 15:27:46 -07:00
William Zhang	ce40cfbaa5	coll/tuned: Change the default collective algorithm selection The default algorithm selections were out of date and not performing well. After gathering data from OMPI developers, new default algorithm decisions were selected for: allgather allgatherv allreduce alltoall alltoallv barrier bcast gather reduce reduce_scatter_block reduce_scatter scatter These results were gathered using the ompi-collectives-tuning package and then averaged amongst the results gathered from multiple OMPI developers on their clusters. You can access the graphs and averaged data here: https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3 Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-07-28 10:41:48 -07:00
Joshua Ladd	366e92ce54	Merge pull request #7860 from vspetrov/hcoll_reduce_scatter Coll/Hcoll: reduce_scatter(block) interface	2020-07-22 09:45:34 -04:00
Todd Kordenbrock	4358e75a75	Merge pull request #7866 from tkordenbrock/topic/master/portals4.fix-inappropriate-use-of-abort portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components	2020-06-30 08:46:03 -05:00
Valentin Petrov	d366812030	coll/hcoll: compile warning fix Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2020-06-30 09:44:46 +03:00
Valentin Petrov	1d54071fc1	coll/hcoll: reduce_scatter(block) interface Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2020-06-30 09:44:46 +03:00
Todd Kordenbrock	04b94637dd	mtl-portals4: replace abort() with ompi_rte_abort() coll-portals4: replace abort() with ompi_rte_abort() Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2020-06-24 11:31:26 -05:00
Joseph Schuchart	9a60f5b7fb	Add missing free calls to ompi_coll_base_reduce_intra_basic_linear Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:27:36 +02:00
Joseph Schuchart	8e24c0d532	Add missing free calls to ompi_coll_base_allgather_intra_bruck Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-06-19 12:24:56 +02:00
Sergey Oblomov	df0f2ac026	OMPI/HCOLL: fixed typo in vars description Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2020-05-29 20:13:35 +03:00
William Zhang	50823fe9a9	coll/tuned: Fix dynamic message size for gather and scatter The gather and scatter operations did not use the correct message size (Only did datatype size * com size). This did not correctly reflect the total message size and prevents fine tuning within a com size. This patch multiplies the value by the number of elements sent. Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-05-14 12:17:52 -07:00
William Zhang	771f9c011d	coll/tuned: Add NULL check to prevent segfault Signed-off-by: William Zhang <wilzhang@amazon.com> cr https://code.amazon.com/reviews/CR-23837553	2020-04-21 17:53:46 +00:00
William Zhang	50640402ab	coll/tuned: Fix typos Signed-off-by: William Zhang <wilzhang@amazon.com>	2020-04-21 17:39:37 +00:00
Nathan Hjelm	160ff188b8	Merge pull request #7169 from hjelmn/fix_what_wg21_calls_our_problem_not_theirs_seriously__in_some_ways_they_are_correct_but_wtf configure: use -iquote for non-system include paths	2020-03-30 09:22:54 -07:00
Mikhail Kurnosov	66b6b8d34e	Fix Bcast scatter_allgather Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2020-03-11 12:47:47 +07:00
Gilles Gouaillardet	69bc2e8372	misc: fix <> vs "" includes throughout the ompi codebase This commit fixes an issue with the include usage in some ompi source files. These source files are using the <> form of include when the "" form is correct (as these are internal, not system headers). Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2020-03-09 21:13:49 -04:00
bosilca	c4d36859ec	Merge pull request #7228 from devreal/progress-returns Harmonize return values of progress callbacks	2020-02-28 20:15:37 -05:00
Gilles Gouaillardet	174e967dbc	Remove ORTE project Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build without reference to ORTE. Setup opal/pmix framework to be static. Remove support for all PMI-1 and PMI-2 libraries. Add support for "external" pmix component as well as internal v4 one. remove orte: misc fixes - UCX fixes - VPATH issue - oshmem fixes - remove useless definition - Add PRRTE submodule - Get autogen.pl to traverse PRRTE submodule - Remove stale orcm reference - Configure embedded PRRTE - Correctly pass the prefix to PRRTE - Correctly set the OMPI_WANT_PRRTE am_conditional - Move prrte configuration to the end of OMPI's configure.ac - Make mpirun a symlink to prun, when available - Fix makedist with --no-orte/--no-prrte option - Add a `--no-prrte` option which is the same as the legacy `--no-orte` option. - Remove embedded PMIx tarball. Replace it with new submodule pointing to OpenPMIx master repo's master branch - Some cleanup in PRRTE integration and add config summary entry - Correctly set the hostname - Fix locality - Fix singleton operations - Fix support for "tune" and "am" options Signed-off-by: Ralph Castain <rhc@pmix.org> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>	2020-02-07 18:20:06 -08:00
Joseph Schuchart	2c97187ee0	Harmonize return values of progress callbacks Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2020-01-28 20:15:03 +01:00
Austen Lauria	969eb0286c	Merge pull request #6970 from ggouaillardet/topic/cuda_no_orte coll/cuda: remove unnecessary references to ORTE	2020-01-27 13:12:32 -05:00
Austen Lauria	b65ec27307	Fix some compiler warnings. Silence unused variables, incompatible pointer types, un-initialized variables, and signed/unsigned comparisons. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2020-01-10 13:10:53 -05:00
Jeff Squyres	8b424c3863	Merge pull request #7232 from bosilca/hjelmn_neighbor_alltoall_fix Neighbor alltoall fix	2019-12-17 17:24:05 -05:00
George Bosilca	86acdee460	Fix the communication ordering for all cartesian neighbor collectives. This work is rooted in the [MPI Forum issue 153](https://github.com/mpi-forum/mpi-issues/issues/153). Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-12-11 12:40:38 -05:00
Maxwell Coil	52241dbbcd	libnbc: fixed uninitialized variable Squash compiler warning. Signed-off-by: Maxwell Coil <mcoil@nd.edu>	2019-12-08 14:03:48 -05:00
Mikhail Brinskii	f2cbd4806e	COLL/TUNED: Add linear scatter using isend for mlnx platform Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-11-07 11:04:39 +02:00
Nathan Hjelm	196a91e604	coll/basic: fix neighbor alltoall message ordering This commit updates the coll/basic component to correctly order sends and receives for cartesian communicators with cyclic boundaries. This addresses an issue identified by mpi-forum/mpi-issues#153. This issue occurs when the size in any dimension is 1. This gives the same neighbor in the positive and negative directions. The old code was sending and receiving in the same order so the -1 buffer contained the +1 result and vice versa. The problem is addressed by using unique tags for each send. This should cover both the case where overtaking is allowed and is not allowed. The former case will be possible if a MPI_Cart_create_with_info() call is added to the standard. Signed-off-by: Nathan Hjelm <hjelmn@google.com>	2019-10-08 21:10:49 -07:00
Gilles Gouaillardet	531171ca50	coll/cuda: remove unnecessary references to ORTE Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-09-10 13:55:59 +09:00
Valentin Petrov	a0d99ad190	Coll/hcoll: fixes hcoll non-blocking colls support open-mpi/ompi@0fe756d416 Introduced a bug in coll/hcoll component. The ompi_requests allocated by libhcoll would be treated as coll_base_nbc_request during ompi_coll_base_retain_<> call. Afterwards this would lead to a segv in the request cleanup. Fix: since libhcoll interface does not distinguish between the blocking/non-blocking requests use coll_base_nbc_request all the time and initialize it properly in coll/hcoll/get_coll_handle(). It is still within 2 cache lines. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2019-08-27 17:22:58 +03:00
Gilles Gouaillardet	63d3ccde9d	coll/base: only retain datatypes/op if the request has not yet completed a non blocking collective might return ompi_request_null, so we should not retain anything in that case. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-08-09 09:57:56 +09:00
Gilles Gouaillardet	0862c409f1	coll/base: cleanup ompi_coll_base_nbc_request_t elements Since ompi_coll_base_nbc_request_t is to be used in an opal_free_list_t, it must be returned into a "clean" state. So cleanup some data in the callback completion subroutines. This fixes a regression introduced in open-mpi/ompi@0fe756d416 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-08-08 10:48:06 +09:00
Gilles Gouaillardet	f8eef0fde9	coll/libnbc: fixes ompi ompi_coll_libnbc_request_t parent base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t to correctly support the retention of datatypes/operators This fixes a regression introduced in open-mpi/ompi@0fe756d416 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-08-08 10:47:48 +09:00
Yossi Itigin	98d0ecfe14	Merge pull request #6814 from brminich/tuned_all2all_select COLL/TUNED: Update alltoall selection rule for mellanox platform	2019-07-25 17:51:55 +03:00
Mikhail Brinskii	65618f8db8	COLL/TUNED: Minor var names/comments fixes Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-07-24 10:23:38 +00:00
Mikhail Brinskii	404c480068	COLL/TUNED: Update alltoall selection rule for mlx Use linear with sync alltoall algorithm for certain message/comm size ranges. Does not affect default fixed decision, unless HPCX (with its custom parameters) is used or corresponding mca is set. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-07-13 23:27:40 +03:00
Gilles Gouaillardet	0fe756d416	mpi: retain operation and datatype in non blocking collectives MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd after a call to a non blocking collective and before the non-blocking collective completes. Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is invoked, and set a request callback so they are free'd when the MPI_Request completes. Thanks Thomas Ponweiser for reporting this Fixes open-mpi/ompi#2151 Fixes open-mpi/ompi#1304 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-07-12 09:15:45 +09:00
Jeff Squyres	7c3aeb3061	Merge pull request #6686 from alex-anenkov/coll-iallreduce-recursivedoubling coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce	2019-06-10 10:09:51 -04:00
Mikhail Brinskii	79006f4e5a	COLL/BASE: Fix linear sync all2all Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-06-06 19:22:42 +03:00
Tomislav Janjusic	6ea920e225	Coll/hcoll: adding scatterv interface Signed-off-by: Valentin Petrov valentinp@mellanox.com	2019-05-27 12:27:43 +03:00
Valentin Petrov	f19f6f432a	Coll/hcoll: don't init opal memhooks unless explicitly requested by user If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal memory framework and register a mem release cb. Otherwise, rely on ucx. Signed-off-by: Valentin Petrov <valentinp@mellanox.com>	2019-05-20 11:17:44 +03:00
Alex Anenkov	77d466edf3	coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>	2019-05-19 18:39:11 +07:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to for `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00

1196 commits