1
1

10320 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
5f3dbdb5c8 mtl/ofi: replace OMPI_UNLIKELY with OPAL version
one off patch for v4.0.x.  for some reason commit on master
didn't have this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-09-26 16:01:28 -05:00
Geoff Paulsen
32984ceb65
Merge pull request #7005 from mwheinz/REFS6976-4.0.x
v4.0.x: REF6976 Silent failure of OMPI over OFI with large messages sizes
2019-09-24 12:25:43 -05:00
Michael Heinz
89be953cfd REF6976 Silent failure of OMPI over OFI with large messages sizes
INTERNAL: STL-59403

The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.

This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.

(cherry-picked from commit 3aca4af548a3d781b6b52f89f4d6c7e66d379609)
Change-Id: Ie50445e5edfb0f30916de0836db0edc64ecf7c60
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
Reviewed-by: Adam Goldman <adam.goldman@intel.com>
Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
2019-09-23 17:19:10 -04:00
guserav
9bf1873215 Fix osc sm posts when only 32 bit atomics support
Signed-off-by: guserav <erik.zeiske@hpe.com>
(cherry picked from commit 3c9f4e682369e6fd5860b46ba81d79f2d1599a35)
2019-08-31 12:31:19 -07:00
Geoff Paulsen
2d515f747f
Merge pull request #6934 from devreal/osc-ucx-excl-lock-v4.0.x
UCX osc: properly release exclusive lock to avoid lockup (v4.0.x)
2019-08-29 13:41:03 -05:00
Joseph Schuchart
8d130e1964 UCX osc: properly release exclusive lock to avoid lockup
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 08cb6389e034c1a70368671f745f20904c774a1e)
2019-08-27 23:12:56 +02:00
Valentin Petrov
83a2518994 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 Introduced
    a bug in coll/hcoll component. The ompi_requests allocated by
    libhcoll would be treated as coll_base_nbc_request during
    ompi_coll_base_retain_<> call. Afterwards this would lead to a
    segv in the request cleanup.

    Fix: since libhcoll interface does not distinguish between the
    blocling/non-blocking requests use coll_base_nbc_request all the
    time and initialize it properly in
    coll/hcoll/get_coll_handle(). It is still within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:23:52 +03:00
Howard Pritchard
e4adbeefe7
Merge pull request #6905 from edgargabriel/pr/file-seek-end-fix-v4.0.x
io_ompio_file_open: fix offset calculation with SEEK_END
2019-08-23 13:11:33 -06:00
Geoff Paulsen
390e0bc5b2
Merge pull request #6863 from bosilca/topic/backport_6695
Refresh of the datatype engine from Topic/backport 6695
2019-08-21 10:49:37 -05:00
Howard Pritchard
f96994b12f
Merge pull request #6865 from rhc54/cmr40/locality
Provide locality for all procs on node
2019-08-19 13:26:59 -06:00
Howard Pritchard
7b09c15b90
Merge pull request #6892 from janjust/v4.0.x-osc_fix
v4.0.x: osc/ucx: Fix possible win creation/destruction race condition
2019-08-19 13:26:32 -06:00
George Bosilca
c9f48e2e77
Whitespace cleanup
No code or logic changes.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-16 10:27:43 -04:00
Edgar Gabriel
d72d39bfee io_ompio_file_open: fix offset calculation with SEEK_END
and SEEK_CUR. fixes an issue reported by Wei-keng Liao

Fixes Issue #6858

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-08-16 09:03:10 -05:00
Ralph Castain
e17203b4f7
Silence Coverity warning
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-12 12:42:41 -07:00
Ralph Castain
14f3fbb8c1
Provide locality for all procs on node
Update PMIx to latest master to get supporting updates. For
connect/accept (part of comm_spawn as well), lookup locality for all
participating procs on the node and compute the relative locality so it
can be used for MPI operations.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d202e10c1407d2f9177e9b871eadde1f25526676)
2019-08-12 12:42:40 -07:00
Tomislav Janjusic
e9a0343780 osc/ucx: Fix possible win creation/destruction race condition
To avoid fully initializing the osc/ucx component for MPI application
that are not using One-Sided functionality, the initialization happens
at the first MPI window creation.

This commit ensures atomicity of global state modifications.

ported from: 6678ac0f557935b291ec2310216b7ea46e0c13b1
Signed-off-by: Artem Polyakov <artpol84@gmail.com>

fix alignment, and fix error path
2019-08-12 22:23:17 +03:00
Gilles Gouaillardet
39ec580b76 coll/base: only retain datatypes/op if the request has not yet completed
a non blocking collective might return ompi_request_null, so we should not
retain anything in that case.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@63d3ccde9d)
2019-08-13 00:13:40 +09:00
Gilles Gouaillardet
ae26957619 coll/base: cleanup ompi_coll_base_nbc_request_t elements
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned into a "clean" state.
So cleanup some data in the callback completion subroutines.

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0862c409f1)
2019-08-13 00:13:40 +09:00
Gilles Gouaillardet
b37c85dcca coll/libnbc: fixes ompi ompi_coll_libnbc_request_t parent
base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t
to correctly support the retention of datatypes/operators

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@f8eef0fde9)
2019-08-13 00:13:40 +09:00
Sergey Oblomov
2fa112c0a6 UCX: added PPN hint for UCX context
- added PPN hint for UCX context init

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 43186e494b47ca29e8d5e7a864b6b98b8e873195)

Conflicts:
	opal/mca/common/ucx/common_ucx_wpool.c
2019-08-09 11:51:30 +03:00
George Bosilca
8b794235b8
Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:37:47 -04:00
George Bosilca
4f754d0156
Optimized datatype description.
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repertition and blocklen
being a contiguous memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.

Fixes at the OMPI datatype level including:
 - Fix the create_hindexed and vector creation.
 - Fix the handling of [get|set]_elements and _count.
 - Correctly compute the dispacement for block indexed types.
 - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:35:07 -04:00
George Bosilca
f68b06e9ee
Fix incorrect behavior with length == 0
Fixes #6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:33:28 -04:00
Howard Pritchard
e547a2b94d
Merge pull request #6838 from ggouaillardet/topic/v4.0.x/misc_fortran_bindings
v4.0.x: misc Fortran related backports
2019-08-02 13:00:31 -06:00
Howard Pritchard
31aa52f11a
Merge pull request #6846 from nysal/topic/v4.0.x/ucx_accumulate_fix
v4.0.x: osc/ucx: Fix data corruption with non-contiguous accumulates
2019-08-02 12:43:40 -06:00
Nysal Jan K.A
359cdf2b53 osc/ucx: Fix data corruption with non-contiguous accumulates
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit 3529d447020684ab305411caa97423826bb40906)
2019-07-26 14:41:08 +05:30
Mikhail Brinskii
b9998a14dc COLL/TUNED: Minor var names/comments fixes
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 65618f8db848613c95cbe112033df94721d326a8)
2019-07-26 11:29:12 +03:00
Mikhail Brinskii
3d5b7b4a1b COLL/TUNED: Update alltoall selection rule for mlx
Use linear with sync alltoall algorithm for certain message/comm size
ranges. Does not affect default fixed decision, unless HPCX (with its
custom parameters) is used or corresponding mca is set.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 404c4800688548b021bda68bdf10792424e6b1c5)
2019-07-26 11:28:47 +03:00
KAWASHIMA Takahiro
1ffb9b10bb pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
These issues were introduced in the recent commit b71af0eca0.
This commit fixes Coverity CID 1451661 and 1451660.

Though `c_info` part was an actual bug, the `c_sendtypes` part was not.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit open-mpi/ompi@facf8c5e98)
2019-07-24 17:12:10 +09:00
Gilles Gouaillardet
13ba2b0d75 pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b71af0eca0)
2019-07-24 17:11:44 +09:00
Gilles Gouaillardet
5ab26e490a fortran/mpif-h: fix [i]alltoallw bindings
Fix a regression introduced in open-mpi/ompi@cdaed89d04

Fixes CID 1451610, 1451611 and 1451612

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@ed703bec1b)
2019-07-24 17:10:58 +09:00
Gilles Gouaillardet
fbf7d31fd1 fortran/mpif-h: fix MPI_[I]Alltoallw() binding
- ignore sendcounts, sendispls and sendtypes arguments when MPI_IN_PLACE is used
 - use the right size when an inter-communicator is used.

Thanks Markus Geimer for reporting this.

Refs. open-mpi/ompi#5459

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@cdaed89d04)
2019-07-24 17:10:27 +09:00
Gilles Gouaillardet
aae73d9cf7 fortran/mpif-h: fix C to Fortran error code conversion
- remove incorrect use of OMPI_INT_2_FINT()
 - use homogenous syntax (e.g. c_ierr = PMPI_...())

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@223e6cc537)
2019-07-24 17:10:00 +09:00
Howard Pritchard
667aba9913
Merge pull request #6810 from janjust/v4.0.x
v4.0.x OSC: Reset external request to NULL
2019-07-23 09:05:03 -06:00
Tomislav Janjusic
63605fc466 v4.0.x OSC: Reset external request to NULL to avoid double request
completion
Co-authored with Artem Polyakov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-07-12 22:49:34 +03:00
Gilles Gouaillardet
c9e4240e70 mpi: retain operation and datatype in non blocking collectives
MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is
invoked, and set a request callback so they are free'd when the MPI_Request
completes.

Thanks Thomas Ponweiser for reporting this

Fixes open-mpi/ompi#2151
Fixes open-mpi/ompi#1304

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0fe756d416)
2019-07-12 10:27:04 +09:00
Aurelien Bouteiller
9499dcfe41 Manage errors in NBC collective ops
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

Correctly bubble up errors in NBC collective operations

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

The error field of requests needs to be rearmed at start, not at create

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@65660e5999)
2019-07-12 10:26:08 +09:00
Nysal Jan K.A
b6da090090 pml/ucx: Fix the max tag and context id values
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit fe4ef147f81b2ac56661175005de6c330eace690)
2019-07-03 16:38:07 +03:00
Geoff Paulsen
514e273968
Merge pull request #6770 from devreal/osc_winalloc_err_v4.0.x
OSC rdma win allocate: propagate errors to avoid deadlocks (v4.0.x)
2019-06-28 14:04:12 -05:00
Howard Pritchard
6424857029
Merge pull request #6634 from jsquyres/pr/v4.0.x/ob1-fixes
v4.0.x: Cherry pick ob1 fixes from master
2019-06-26 10:49:32 -06:00
Harald Klimach
16e1d74c8f Suggestion to fix division by zero in file view.
In common_ompi_aggregators calc_cost routine:
do not cast the real division to an int intermediately.
This patch removes the obsolete int variable c and assigns
the result of the P_a/P_x division directly to n_as.

With the intermediate int c variable, n_as gets 0 if P_a < P_x,
resulting in a division by 0 when computing n_s.

Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de>
(cherry picked from commit e222a04ae57e5d09b8559f3c111de1f10a47246a)
2019-06-25 09:29:08 -06:00
Howard Pritchard
28d300915f
Merge pull request #6725 from bosilca/cherrypick/6683
Cherrypick/6683
2019-06-24 13:24:02 -06:00
Joseph Schuchart
c5cf3432b9 OSC rdma win allocate: synchronize error codes across shared memory group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 8f27cc26d9845b5b207979b2a4621ef1089d1afb)
2019-06-24 17:49:26 +02:00
Howard Pritchard
73c4aac12d
Merge pull request #6750 from brminich/topic/all2all_linear_sync_fix_v4.0
COLL/BASE: Fix linear sync all2all - v4.0.x
2019-06-17 13:45:52 -06:00
Howard Pritchard
cb8dd569ff
Merge pull request #6747 from devreal/rdma-fetchop-local-v4.0.x
OSC rdma: make sure accumulating in shared memory is safe
2019-06-13 18:55:53 -06:00
Mikhail Brinskii
adba7f55f7 COLL/BASE: Fix linear sync all2all
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 79006f4e5a578d32bfa08de7b98e747ae18706f6)
2019-06-09 21:31:19 +03:00
Joseph Schuchart
900f0fa21f OSC rdma: make sure accumulating in shared memory is safe
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c67e2291937a09947c421dc84c6b3a8d07bec07f)
2019-06-07 12:45:00 +02:00
Tsubasa Yanagibashi
5dd8830dca mpiext/pcollreq: Add _f08 to procedure names
The procedure names don't contain "_f08" of Fortran 2008 bindings of
Persistent Collective Operations(mpiext/pcollreq/use-mpi-f08).
This fix adds "_f08" to the procedure names of pcollreq/use-mpi-f08,
same as other Fortran 2008 routines in `ompi/mpi/fortran/use-mpi-f08/mod`.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
(cherry picked from commit 3148b0cfaa04843e7219acb8c7e04f43f6d219fe)
2019-06-07 10:59:01 +09:00
Geoff Paulsen
a04f5f0c70
Merge pull request #6692 from vspetrov/v4.0.x
V4.0.x Coll/hcoll: don't init opal memhooks unless explicitely requested
2019-06-03 15:00:36 -05:00
George Bosilca
a8d5da67db
Fix the man pages for some of the MPI_T_* functions.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-31 00:19:14 -04:00