Commit graph

10501 commits

Author SHA1 Message Date
Howard Pritchard
e06595d7f6 lustre: squash some compiler warnings
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre.  Squash these warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit e66a7cef11)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 16:00:01 -05:00
Geoff Paulsen
93c879962e
Merge pull request #7168 from wbailey2/pr/fix-yield_when_idle
v4.0.x: schizo/ompi: correctly handle the yield_when_idle option
2020-01-03 14:06:36 -06:00
Robert Wespetal
47c435e531 mtl/ofi: ignore case when comparing provider names
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
(cherry picked from commit 9b72e9465d)
2020-01-03 08:52:24 -08:00
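The provider-name fix above comes down to comparing names without regard to case. A minimal sketch of that idea follows; the function names and the provider names in the usage line are illustrative assumptions, not taken from the mtl/ofi source.

```python
def is_listed(provider_name, name_list):
    """Return True if provider_name appears in name_list, ignoring case."""
    wanted = provider_name.casefold()
    return any(wanted == entry.casefold() for entry in name_list)


def select_providers(available, exclude):
    """Keep only the providers that are not in the exclude list."""
    return [name for name in available if not is_listed(name, exclude)]


if __name__ == "__main__":
    # Hypothetical names: the provider reports "UDP", the user excluded "udp".
    print(select_providers(["UDP", "sockets"], exclude=["udp"]))
```

With a case-sensitive comparison, "UDP" would not match "udp" and would stay selected, which is the symptom the commit message describes.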
Howard Pritchard
6a739f8357
Merge pull request #7243 from jsquyres/pr/v4.0.x/neighbor-alltoall-fix
v4.0.x: neighbor alltoall fix
2019-12-23 13:40:28 -07:00
Aravind Gopalakrishnan
1bee429a8d MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cf43de445)
2019-12-19 22:36:43 +00:00
Howard Pritchard
f6914ee35c
Merge pull request #7229 from tkordenbrock/topic/v4.0.x/portals4.fix.flowcontrol.bugs
v4.0.x: portals4: fix flow control bugs
2019-12-18 08:35:46 -07:00
George Bosilca
be58cf7982 Fix the communication ordering for all cartesian neighbor collectives.
This work is rooted in the [MPI Forum issue
153](https://github.com/mpi-forum/mpi-issues/issues/153).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 86acdee460)
2019-12-17 14:25:22 -08:00
Nathan Hjelm
21221eb70a coll/basic: fix neighbor alltoall message ordering
This commit updates the coll/basic component to correctly order sends
and receives for cartesian communicators with cyclic boundaries. This
addresses an issue identified by mpi-forum/mpi-issues#153. This issue
occurs when the size in any dimension is 1. This gives the same
neighbor in the positive and negative directions. The old code was
sending and receiving in the same order so the -1 buffer contained
the +1 result and vice versa. The problem is addressed by using
unique tags for each send. This should cover both the case where
overtaking is allowed and is not allowed. The former case will be
possible if a MPI_Cart_create_with_info() call is added to the
standard.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 196a91e604)
2019-12-17 14:25:22 -08:00
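The ordering problem in the coll/basic commit above can be reproduced with a toy model of message matching. The sketch below is not Open MPI code: `match_receives` and the tag values are assumptions made for illustration. It models a single rank in a periodic dimension of size 1, where the rank is its own neighbor in both directions.

```python
def match_receives(sends, recvs):
    """Match posted receives to sent messages between one pair of ranks.

    sends: list of (tag, payload) in send order.
    recvs: list of (tag, buffer_name) in posting order.
    Each receive takes the earliest unmatched message with the same tag,
    which models the non-overtaking matching rule.
    """
    pending = list(sends)
    buffers = {}
    for tag, buffer_name in recvs:
        for i, (sent_tag, payload) in enumerate(pending):
            if sent_tag == tag:
                buffers[buffer_name] = payload
                del pending[i]
                break
    return buffers


# Data sent toward +1 must land in the buffer for the -1 neighbor, and vice versa.

# Same tag for every message: the buffers end up swapped.
same_tag = match_receives(
    sends=[(0, "sent toward -1"), (0, "sent toward +1")],
    recvs=[(0, "from -1"), (0, "from +1")],
)

# Hypothetical per-direction tags: each receive names the tag of the send it wants.
TAG_TOWARD_MINUS, TAG_TOWARD_PLUS = 1, 2
unique_tags = match_receives(
    sends=[(TAG_TOWARD_MINUS, "sent toward -1"), (TAG_TOWARD_PLUS, "sent toward +1")],
    recvs=[(TAG_TOWARD_PLUS, "from -1"), (TAG_TOWARD_MINUS, "from +1")],
)
```

Because the per-direction tags decide the match, the result no longer depends on the order in which sends and receives are posted.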
Howard Pritchard
3f752f1d4f
Merge pull request #7237 from mcoil1/pr/v4.0.x/wbailey2-fixes
v4.0.x/two fixes
2019-12-17 09:22:07 -07:00
William Bailey
c01a71fbe9 romio: fix uninitialized variable
Squash compiler warning.

ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.

Signed-off-by: William Bailey <wbailey2@nd.edu>
(cherry picked from commit 30bda56bce)
2019-12-14 17:18:57 -05:00
Maxwell Coil
6fdd902d3f romio: Update ADIOI_R_Exchange_data function
Squash compiler warning due to whitespace/brace problems.

The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 8c237e2684)
2019-12-14 12:25:51 -05:00
Maxwell Coil
84a67bd6cf libnbc: fixed uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 52241dbbcd)
2019-12-14 12:25:18 -05:00
Maxwell Coil
879a25c239 ompi/dpm/dpm.c: Fix uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 3ced33c2eb)
2019-12-14 12:24:57 -05:00
Howard Pritchard
59d8d62555
Merge pull request #7116 from ggouaillardet/topic/v4.0.x/f08_bind_c_constants_revamp
v4.0.x: fortran/use-mpi-f08: revamp mpi_f08 constants
2019-12-13 08:07:42 -07:00
Todd Kordenbrock
1f5a79bbd4 mtl-portals4: don't finalize flow control if Portals4 was not initialized
This commit fixes a segfault in mtl-portals4 finalize().  The segfault
occurs if finalize() is called without any calls to add_procs().  This
commit resolves the segfault by skipping the flow control fini() call if
Portals4 was not initialized.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit e7b867c044)
2019-12-12 08:43:20 -06:00
William Bailey
71fe9d78e0 fcoll/two_phase: Compiler warning for wrong variable type used
Squash compiler warning. Changed output specifier to match variable type (long int -> long long int).

Signed-off-by: William Bailey <wbailey2@nd.edu>
(cherry picked from commit e2718e0196)
2019-12-08 14:15:29 -05:00
Gilles Gouaillardet
b004f4c391 schizo/ompi: correctly handle the yield_when_idle option
In schizo/ompi, set the new OMPI_MCA_mpi_oversubscribe environment
variable according to the node oversubscription state.

This MCA parameter is used to set the default value of the
mpi_yield_when_idle parameter.

This two-step tango is needed so the mpi_yield_when_idle setting
is always honored when set in a config file.

Refs. open-mpi/ompi#6433

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry-picked from cc97c0f611)
2019-12-02 17:18:26 -05:00
Howard Pritchard
d4dd837a3c
Merge pull request #7194 from edgargabriel/pr/two-phase-aggr-calc-32bits-bug-v4.0.x
fcoll/two_phase: fix error in calculating aggregators in 32bit mode
2019-11-27 12:20:34 -07:00
Edgar Gabriel
02da54c174 fcoll/two_phase: fix error in calculating aggregators in 32bit mode
In fcoll_two_phase_support_fns.c: calculation of the aggregator index
failed for large offsets on 32-bit machines, due to improper handling of
64-bit offsets.

Fixes Issue #7110

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit ea1355beae)
2019-11-25 09:06:36 -06:00
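The class of bug described in the fcoll/two_phase commit above can be modeled as follows. This is a sketch under stated assumptions: the real aggregator formula is not shown in the commit message, so the simple `offset // chunk_size` below is a stand-in, and `to_int32` only imitates what happens when a 64-bit offset is squeezed into a 32-bit C `int`.

```python
def to_int32(value):
    """Wrap an integer to the range of a signed 32-bit C int."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def aggregator_index_32(offset, chunk_size):
    """Buggy variant: the 64-bit offset is truncated before the division."""
    return to_int32(offset) // chunk_size


def aggregator_index_64(offset, chunk_size):
    """Fixed variant: the full 64-bit offset is used."""
    return offset // chunk_size


if __name__ == "__main__":
    GIB = 2**30
    offset = 5 * GIB  # does not fit in 32 bits
    print(aggregator_index_32(offset, GIB), aggregator_index_64(offset, GIB))
```

For a 5 GiB offset and 1 GiB chunks, the truncated variant selects index 1 while the full-width variant selects index 5.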
Edgar Gabriel
39acc3a251 common/ompio: fix calculation in simple-grouping option
This is based on a bug reported on the mailing list using a netcdf testcase.
The problem occurs if processes are using a custom file view, but on some
of them it appears as if the default file view is being used. Because of that,
the simple-grouping option led to a different number of aggregators used on different
processes, and ultimately to a deadlock. This patch fixes the problem by not using
the file_view size anymore for the calculation in the simple-grouping option,
but the contiguous chunk size (which is identical on all processes).

Fixes issue #7109

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit ad5d0df4e9)
2019-11-25 09:04:13 -06:00
Gilles Gouaillardet
02c79ac0c8 fortran/use-mpi-f08: misc fixes
- fix typos from open-mpi/ompi@b10a60a5a9
 - remove remaining references to OMPI_PROTECTED from open-mpi/ompi@df6d763a53

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@fda4d040da)
2019-11-06 10:10:22 +09:00
Gilles Gouaillardet
0ab61c9b74 fortran/use-mpi-f08: remove unused references to OMPI_PROTECTED
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@df6d763a53)
2019-11-06 10:10:22 +09:00
Gilles Gouaillardet
23ed2f44a2 fortran/use-mpi-f08: revamp constant declarations
In order to work around an issue with flang based compilers,
avoid declaring bind(C) constants and use plain Fortran parameter
instead.

For example,
type(MPI_Comm), bind(C, name="ompi_f08_mpi_comm_world") OMPI_PROTECTED :: MPI_COMM_WORLD
is changed to
type(MPI_Comm), parameter :: MPI_COMM_WORLD = MPI_Comm(OMPI_MPI_COMM_WORLD)

Note that in order to preserve ABI compatibility, ompi/mpi/fortran/use-mpi-f08/constants.{c,h}
have been kept even though their symbols are no longer referenced by Open MPI.

Refs. open-mpi/ompi#7091

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@b10a60a5a9)
2019-11-06 10:10:18 +09:00
KAWASHIMA Takahiro
56d2865cf6 fortran/use-mpi-f08: Add C++ datatypes and MPI_NO_OP
Though the MPI standard does not have `MPI_CXX_COMPLEX`, `mpi.h`,
`mpif.h`, and `mpi.mod` have it. So I added it for consistency.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit open-mpi/ompi@63ecf01610)
2019-11-06 09:49:17 +09:00
KAWASHIMA Takahiro
792b5a01a5 fortran/use-mpi-f08: Remove unnecessary ;
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit open-mpi/ompi@e0c5bad195)
2019-11-06 09:48:58 +09:00
Gilles Gouaillardet
628883b38b fortran/use-mpi-f08: add MPI C types
Refs. open-mpi/ompi#5801

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@69f1a19c5d)
2019-11-06 09:48:11 +09:00
Geoff Paulsen
524960dcdd
Merge pull request #7119 from devreal/grequestx-progress-v4.0.x
Ensure that grequestx continuously make progress (v4.0.x)
2019-11-01 14:12:48 -05:00
Howard Pritchard
608502ff85
Merge pull request #7102 from edgargabriel/pr/v4.0.x-romio321-status-set-elements-fix
MPIR_Status_set_bytes: fix for large count sizes
2019-11-01 13:08:35 -06:00
Joseph Schuchart
b7f5c17d83 Ensure that grequestx continuously make progress
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 37e6bbb1e1)
2019-10-29 10:31:23 +01:00
Edgar Gabriel
a3e1ecc14b common_ompio_file_read/write: fix 2GB limiting issue
Individual read/write operations exceeding 2GB fail in ompio
due to improper conversions from size_t to int in two different
locations. This commit fixes an issue reported by Richard Warren
from the HDF5 group.

Fixes Issue #7045

Cherry-picked from commit a130f569df

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-10-22 12:12:55 -05:00
Edgar Gabriel
6185fa1946 MPIR_Status_set_bytes: fix for large count sizes
Change the ncounts argument to MPI_Count and use
MPI_Status_set_elements_x for enabling read/write operations beyond
the 2GB limit.

Thanks to Richard Warren from the HDF5 group for reporting the issue
and providing the suggested fix for romio.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit 8a3abbf803)
2019-10-22 09:51:29 -05:00
Howard Pritchard
5f3dbdb5c8 mtl/ofi: replace OMPI_UNLIKELY with OPAL version
One-off patch for v4.0.x. For some reason the commit on master
didn't have this problem.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-09-26 16:01:28 -05:00
Geoff Paulsen
32984ceb65
Merge pull request #7005 from mwheinz/REFS6976-4.0.x
v4.0.x: REF6976 Silent failure of OMPI over OFI with large message sizes
2019-09-24 12:25:43 -05:00
Michael Heinz
89be953cfd REF6976 Silent failure of OMPI over OFI with large message sizes
INTERNAL: STL-59403

The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.

This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.

(cherry-picked from commit 3aca4af548)
Change-Id: Ie50445e5edfb0f30916de0836db0edc64ecf7c60
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
Reviewed-by: Adam Goldman <adam.goldman@intel.com>
Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
2019-09-23 17:19:10 -04:00
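The length check described in the OFI commit above can be sketched as below. The exception type, the function name, and the idea of passing the limit as a parameter are assumptions for illustration; the commit message only says that a `max_msg_size` field was added and that the low-level send routines now check the length.

```python
class MessageTooLong(Exception):
    """Hypothetical error raised instead of failing silently."""


def checked_send(payload, max_msg_size):
    """Refuse to send a payload larger than the provider's advertised limit."""
    if len(payload) > max_msg_size:
        raise MessageTooLong(
            f"message of {len(payload)} bytes exceeds limit of {max_msg_size}"
        )
    # A real send would hand the payload to the transport here.
    return len(payload)
```

The point of the check is that an oversized message produces an explicit error at the call site rather than a silent failure further down.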
guserav
9bf1873215 Fix osc sm posts when only 32-bit atomics are supported
Signed-off-by: guserav <erik.zeiske@hpe.com>
(cherry picked from commit 3c9f4e6823)
2019-08-31 12:31:19 -07:00
Geoff Paulsen
2d515f747f
Merge pull request #6934 from devreal/osc-ucx-excl-lock-v4.0.x
UCX osc: properly release exclusive lock to avoid lockup (v4.0.x)
2019-08-29 13:41:03 -05:00
Joseph Schuchart
8d130e1964 UCX osc: properly release exclusive lock to avoid lockup
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 08cb6389e034c1a70368671f745f20904c774a1e)
2019-08-27 23:12:56 +02:00
Valentin Petrov
83a2518994 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 introduced a bug in the coll/hcoll component.
The ompi_requests allocated by libhcoll would be treated as
coll_base_nbc_request during the ompi_coll_base_retain_<> call. Afterwards
this would lead to a segv in the request cleanup.

Fix: since the libhcoll interface does not distinguish between
blocking/non-blocking requests, use coll_base_nbc_request all the
time and initialize it properly in coll/hcoll/get_coll_handle().
It is still within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:23:52 +03:00
Howard Pritchard
e4adbeefe7
Merge pull request #6905 from edgargabriel/pr/file-seek-end-fix-v4.0.x
io_ompio_file_open: fix offset calculation with SEEK_END
2019-08-23 13:11:33 -06:00
Geoff Paulsen
390e0bc5b2
Merge pull request #6863 from bosilca/topic/backport_6695
Refresh of the datatype engine from Topic/backport 6695
2019-08-21 10:49:37 -05:00
Howard Pritchard
f96994b12f
Merge pull request #6865 from rhc54/cmr40/locality
Provide locality for all procs on node
2019-08-19 13:26:59 -06:00
Howard Pritchard
7b09c15b90
Merge pull request #6892 from janjust/v4.0.x-osc_fix
v4.0.x: osc/ucx: Fix possible win creation/destruction race condition
2019-08-19 13:26:32 -06:00
George Bosilca
c9f48e2e77
Whitespace cleanup
No code or logic changes.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-16 10:27:43 -04:00
Edgar Gabriel
d72d39bfee io_ompio_file_open: fix offset calculation with SEEK_END
and SEEK_CUR. Fixes an issue reported by Wei-keng Liao

Fixes Issue #6858

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-08-16 09:03:10 -05:00
Ralph Castain
e17203b4f7
Silence Coverity warning
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-12 12:42:41 -07:00
Ralph Castain
14f3fbb8c1
Provide locality for all procs on node
Update PMIx to latest master to get supporting updates. For
connect/accept (part of comm_spawn as well), lookup locality for all
participating procs on the node and compute the relative locality so it
can be used for MPI operations.

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d202e10c14)
2019-08-12 12:42:40 -07:00
Tomislav Janjusic
e9a0343780 osc/ucx: Fix possible win creation/destruction race condition
To avoid fully initializing the osc/ucx component for MPI applications
that are not using One-Sided functionality, the initialization happens
at the first MPI window creation.

This commit ensures atomicity of global state modifications.

ported from: 6678ac0f55
Signed-off-by: Artem Polyakov <artpol84@gmail.com>

fix alignment, and fix error path
2019-08-12 22:23:17 +03:00
Gilles Gouaillardet
39ec580b76 coll/base: only retain datatypes/op if the request has not yet completed
A non-blocking collective might return ompi_request_null, so we should not
retain anything in that case.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@63d3ccde9d)
2019-08-13 00:13:40 +09:00
Gilles Gouaillardet
ae26957619 coll/base: cleanup ompi_coll_base_nbc_request_t elements
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned into a "clean" state.
So cleanup some data in the callback completion subroutines.

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0862c409f1)
2019-08-13 00:13:40 +09:00
Gilles Gouaillardet
b37c85dcca coll/libnbc: fixes ompi ompi_coll_libnbc_request_t parent
base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t
to correctly support the retention of datatypes/operators

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@f8eef0fde9)
2019-08-13 00:13:40 +09:00
Sergey Oblomov
2fa112c0a6 UCX: added PPN hint for UCX context
- added PPN hint for UCX context init

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 43186e494b)

Conflicts:
	opal/mca/common/ucx/common_ucx_wpool.c
2019-08-09 11:51:30 +03:00
George Bosilca
8b794235b8
Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:37:47 -04:00
George Bosilca
4f754d0156
Optimized datatype description.
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repetition and blocklen
being a contiguous memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.

Fixes at the OMPI datatype level including:
 - Fix the create_hindexed and vector creation.
 - Fix the handling of [get|set]_elements and _count.
 - Correctly compute the displacement for block indexed types.
 - Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:35:07 -04:00
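The "collapse" optimization named in the datatype commit above can be illustrated with a small model. This is a sketch, not the Open MPI datatype engine: the `Desc` tuple mirrors the (count, type, blocklen, extent, disp) vector from the commit message, but the exact rule for when two entries count as "similar" is an assumption made here (same type, blocklen, and extent, with the second entry starting where the first one's repetitions end).

```python
from typing import NamedTuple


class Desc(NamedTuple):
    count: int      # number of repetitions
    elem_type: str  # base type name
    blocklen: int   # contiguous elements per repetition
    extent: int     # byte stride between repetitions
    disp: int       # byte displacement of the first repetition


def collapse(descs):
    """Merge successive similar descriptions into one with a larger count."""
    out = []
    for d in descs:
        if out:
            p = out[-1]
            same_shape = (p.elem_type, p.blocklen, p.extent) == (
                d.elem_type, d.blocklen, d.extent)
            continues = d.disp == p.disp + p.count * p.extent
            if same_shape and continues:
                out[-1] = p._replace(count=p.count + d.count)
                continue
        out.append(d)
    return out
```

For example, two `int` entries with counts 2 and 3 that follow each other in memory become a single entry with count 5, so fewer descriptions have to be walked during pack and unpack.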
George Bosilca
f68b06e9ee
Fix incorrect behavior with length == 0
Fixes #6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-05 09:33:28 -04:00
Howard Pritchard
e547a2b94d
Merge pull request #6838 from ggouaillardet/topic/v4.0.x/misc_fortran_bindings
v4.0.x: misc Fortran related backports
2019-08-02 13:00:31 -06:00
Howard Pritchard
31aa52f11a
Merge pull request #6846 from nysal/topic/v4.0.x/ucx_accumulate_fix
v4.0.x: osc/ucx: Fix data corruption with non-contiguous accumulates
2019-08-02 12:43:40 -06:00
Nysal Jan K.A
359cdf2b53 osc/ucx: Fix data corruption with non-contiguous accumulates
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit 3529d44702)
2019-07-26 14:41:08 +05:30
Mikhail Brinskii
b9998a14dc COLL/TUNED: Minor var names/comments fixes
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 65618f8db8)
2019-07-26 11:29:12 +03:00
Mikhail Brinskii
3d5b7b4a1b COLL/TUNED: Update alltoall selection rule for mlx
Use linear with sync alltoall algorithm for certain message/comm size
ranges. Does not affect default fixed decision, unless HPCX (with its
custom parameters) is used or corresponding mca is set.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 404c480068)
2019-07-26 11:28:47 +03:00
KAWASHIMA Takahiro
1ffb9b10bb pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
These issues were introduced in the recent commit b71af0eca0.
This commit fixes Coverity CID 1451661 and 1451660.

Though `c_info` part was an actual bug, the `c_sendtypes` part was not.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit open-mpi/ompi@facf8c5e98)
2019-07-24 17:12:10 +09:00
Gilles Gouaillardet
13ba2b0d75 pcollreq/mpif-h: fix MPIX_Alltoallw_init() binding
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b71af0eca0)
2019-07-24 17:11:44 +09:00
Gilles Gouaillardet
5ab26e490a fortran/mpif-h: fix [i]alltoallw bindings
Fix a regression introduced in open-mpi/ompi@cdaed89d04

Fixes CID 1451610, 1451611 and 1451612

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@ed703bec1b)
2019-07-24 17:10:58 +09:00
Gilles Gouaillardet
fbf7d31fd1 fortran/mpif-h: fix MPI_[I]Alltoallw() binding
- ignore sendcounts, sendispls and sendtypes arguments when MPI_IN_PLACE is used
 - use the right size when an inter-communicator is used.

Thanks Markus Geimer for reporting this.

Refs. open-mpi/ompi#5459

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@cdaed89d04)
2019-07-24 17:10:27 +09:00
Gilles Gouaillardet
aae73d9cf7 fortran/mpif-h: fix C to Fortran error code conversion
- remove incorrect use of OMPI_INT_2_FINT()
 - use homogeneous syntax (e.g. c_ierr = PMPI_...())

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@223e6cc537)
2019-07-24 17:10:00 +09:00
Howard Pritchard
667aba9913
Merge pull request #6810 from janjust/v4.0.x
v4.0.x OSC: Reset external request to NULL
2019-07-23 09:05:03 -06:00
Tomislav Janjusic
63605fc466 v4.0.x OSC: Reset external request to NULL to avoid double request
completion
Co-authored with Artem Polyakov <artemp@mellanox.com>
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-07-12 22:49:34 +03:00
Gilles Gouaillardet
c9e4240e70 mpi: retain operation and datatype in non blocking collectives
MPI standard states a user MPI_Op and/or user MPI_Datatype can be free'd
after a call to a non blocking collective and before the non-blocking
collective completes.
Retain user (only) MPI_Op and MPI_Datatype when the non blocking call is
invoked, and set a request callback so they are free'd when the MPI_Request
completes.

Thanks Thomas Ponweiser for reporting this

Fixes open-mpi/ompi#2151
Fixes open-mpi/ompi#1304

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0fe756d416)
2019-07-12 10:27:04 +09:00
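The retain-until-completion scheme in the commit above can be sketched with plain reference counting. The classes below are hypothetical stand-ins for `MPI_Op`, `MPI_Datatype`, and `MPI_Request`, and the sketch ignores the distinction the commit makes between user-defined and predefined objects.

```python
class RefCounted:
    """Stand-in for a user-defined op or datatype handle."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1
        self.destroyed = False

    def retain(self):
        self.refcount += 1

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True


class Request:
    """Stand-in for a request that runs callbacks on completion."""

    def __init__(self):
        self.callbacks = []

    def on_complete(self, callback):
        self.callbacks.append(callback)

    def finish(self):
        for callback in self.callbacks:
            callback()


def start_nonblocking(op, datatype):
    """Retain both objects and release them when the request completes."""
    request = Request()
    for obj in (op, datatype):
        obj.retain()
        request.on_complete(obj.release)
    return request
```

If the user releases the op right after starting the collective, the extra reference keeps it alive; it is only destroyed when `finish()` runs the completion callbacks.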
Aurelien Bouteiller
9499dcfe41 Manage errors in NBC collective ops
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

Correctly bubble up errors in NBC collective operations

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

The error field of requests needs to be rearmed at start, not at create

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@65660e5999)
2019-07-12 10:26:08 +09:00
Nysal Jan K.A
b6da090090 pml/ucx: Fix the max tag and context id values
Signed-off-by: Nysal Jan K.A <jnysal@in.ibm.com>
(cherry picked from commit fe4ef147f8)
2019-07-03 16:38:07 +03:00
Geoff Paulsen
514e273968
Merge pull request #6770 from devreal/osc_winalloc_err_v4.0.x
OSC rdma win allocate: propagate errors to avoid deadlocks (v4.0.x)
2019-06-28 14:04:12 -05:00
Howard Pritchard
6424857029
Merge pull request #6634 from jsquyres/pr/v4.0.x/ob1-fixes
v4.0.x: Cherry pick ob1 fixes from master
2019-06-26 10:49:32 -06:00
Harald Klimach
16e1d74c8f Suggestion to fix division by zero in file view.
In common_ompi_aggregators calc_cost routine:
do not cast the real division to an int intermediately.
This patch removes the obsolete int variable c and assigns
the result of the P_a/P_x division directly to n_as.

With the intermediate int c variable, n_as gets 0 if P_a < P_x,
resulting in a division by 0 when computing n_s.

Signed-off-by: Harald Klimach <harald.klimach@uni-siegen.de>
(cherry picked from commit e222a04ae5)
2019-06-25 09:29:08 -06:00
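The effect of the intermediate integer described in the commit above can be shown with a short model. The names `P_a`, `P_x`, and `n_as` come from the commit message; `per_aggregator_share` is a hypothetical placeholder for the later step that divides by `n_as`, since the actual `n_s` formula is not given.

```python
def n_as_via_int(p_a, p_x):
    """Old behaviour: the quotient passes through an integer first."""
    c = int(p_a / p_x)  # truncates to 0 whenever p_a < p_x
    return float(c)


def n_as_direct(p_a, p_x):
    """Fixed behaviour: keep the real-valued quotient."""
    return p_a / p_x


def per_aggregator_share(total, n_as):
    """Hypothetical follow-up step that divides by n_as."""
    return total / n_as
```

With `p_a = 2` and `p_x = 8`, the old path yields `n_as == 0.0`, and the follow-up division raises `ZeroDivisionError` in Python; the direct path yields 0.25 and the division is well defined.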
Howard Pritchard
28d300915f
Merge pull request #6725 from bosilca/cherrypick/6683
Cherrypick/6683
2019-06-24 13:24:02 -06:00
Joseph Schuchart
c5cf3432b9 OSC rdma win allocate: synchronize error codes across shared memory group
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 8f27cc26d9)
2019-06-24 17:49:26 +02:00
Howard Pritchard
73c4aac12d
Merge pull request #6750 from brminich/topic/all2all_linear_sync_fix_v4.0
COLL/BASE: Fix linear sync all2all - v4.0.x
2019-06-17 13:45:52 -06:00
Howard Pritchard
cb8dd569ff
Merge pull request #6747 from devreal/rdma-fetchop-local-v4.0.x
OSC rdma: make sure accumulating in shared memory is safe
2019-06-13 18:55:53 -06:00
Mikhail Brinskii
adba7f55f7 COLL/BASE: Fix linear sync all2all
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 79006f4e5a)
2019-06-09 21:31:19 +03:00
Joseph Schuchart
900f0fa21f OSC rdma: make sure accumulating in shared memory is safe
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c67e229193)
2019-06-07 12:45:00 +02:00
Tsubasa Yanagibashi
5dd8830dca mpiext/pcollreq: Add _f08 to procedure names
The procedure names don't contain "_f08" of Fortran 2008 bindings of
Persistent Collective Operations(mpiext/pcollreq/use-mpi-f08).
This fix adds "_f08" to the procedure names of pcollreq/use-mpi-f08,
same as other Fortran 2008 routines in `ompi/mpi/fortran/use-mpi-f08/mod`.

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
(cherry picked from commit 3148b0cfaa)
2019-06-07 10:59:01 +09:00
Geoff Paulsen
a04f5f0c70
Merge pull request #6692 from vspetrov/v4.0.x
V4.0.x Coll/hcoll: don't init opal memhooks unless explicitly requested
2019-06-03 15:00:36 -05:00
George Bosilca
a8d5da67db
Fix the man pages for some of the MPI_T_* functions.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-31 00:19:14 -04:00
George Bosilca
dbf89404d7
Fix the SPC initialization.
Use the PVAR ctx to save the SPC index, so that no lookup nor
restriction on the SPC vars position is imposed.
Make sure the PVAR are always registered.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-31 00:19:14 -04:00
George Bosilca
cadf315ca9
Fixed SPC/MPI_T initialization error.
Signed-off-by: Yong Qin <yongq@mellanox.com>
2019-05-30 17:54:26 -04:00
Howard Pritchard
e78851a6c7
Merge pull request #6704 from edgargabriel/pr/v4.0.x-empty-fileview-fix
common/ompio: fix division by zero problem with empty fview
2019-05-26 09:45:52 -06:00
Howard Pritchard
386ed07d54
Merge pull request #6689 from hoopoepg/topic/suppressed-pml-ucx-mt-warning-v4.0
PML/UCX: disable PML UCX if MT is requested but not supported - v4.0
2019-05-26 09:44:05 -06:00
Edgar Gabriel
c7250cd11d common/ompio: fix division by zero problem with empty fview
When using an empty fileview, a division by zero bug can occur in ompio. Not entirely sure why the problem did not show up previously, but some recent changes trigger that bug in one of our tests.

This PR is part of a fix applied in commit f6b3a0a

Fixes Issue #6703

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-23 13:48:57 -05:00
Valentin Petrov
8f82c899bc Coll/hcoll: don't init opal memhooks unless explicitly requested by user
If the user sets HCOLL_EXTERNAL_UCM_EVENTS=1, then we try to init the opal
memory framework and register a mem release cb. Otherwise, rely on ucx.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-05-20 14:00:50 +03:00
Sergey Oblomov
1edd36638b PML/UCX: disable PML UCX if MT is requested but not supported
- if multithreading is requested but not supported,
  disable PML UCX

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a3578d9ece)
2019-05-20 09:59:59 +03:00
Yossi Itigin
4f9fb3e9ce OSC/UCX: Fix deadlock with atomic lock
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.

(picked from master 9d1994b)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-20 09:54:01 +03:00
George Bosilca
4946570b24 Remove a few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@6d11a45f44)
2019-05-11 16:38:31 +09:00
Geoff Paulsen
73f9bcc374
Merge pull request #6632 from brminich/topic/shmem_all2all_put_4.0.x
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h 4.0.x
2019-05-07 08:05:01 -05:00
Howard Pritchard
8e968f16a6
Merge pull request #6626 from ggouaillardet/topic/v4.0.x/mpi_combiner_xyz_integer
v4.0.x: mpi: mark MPI_COMBINER_{HVECTOR,HINDEXED,STRUCT}_INTEGER removed
2019-05-04 07:25:40 -06:00
George Bosilca
48f824327c Fix the leak of fragments for persistent sends.
The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been transferred.

NOTE: Had to add a typedef for "opal_atomic_size_t" from master into
opal/threads/thread_usage.h into this cherry pick (it is in
opal/include/opal_stdatomic.h on master, but that file does not exist
here on the v4.0.x branch).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a16cf0e4dd)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-03 06:20:02 -07:00
Brelle Emmanuel
c44821aef5 pml/ob1: fixed local handle sent during PUT control message
In case of using a btl_put in ob1, the handle of the locally registered
memory is sent with a PUT control message. In the current master code
the sent handle is necessarily the handle in the frag but if the handle
has been successfully registered in the request, the frag structure does
not have any valid handle and all fragments use the request one.

I suggest checking whether the handle in the fragment is valid and, if not,
sending the handle from the request.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit e630046a4b)
2019-05-03 05:53:35 -07:00
Mikhail Brinskii
e4ee56d1f3 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b36)
2019-05-02 21:25:59 +03:00
Howard Pritchard
3cafd02c7f
Merge pull request #6572 from markalle/v40x_fortran_macro
in-place conversion macro writes into INPUT argument
2019-05-01 11:54:12 -06:00
Howard Pritchard
41ef5c7a10
Merge pull request #6594 from vspetrov/osc_ucx_rget_rkey_fix
OSC/UCX: use correct rkey for atomic_fadd in rget/rput
2019-05-01 11:53:17 -06:00
Gilles Gouaillardet
e2638dbbf2 mpi: mark MPI_COMBINER_{HVECTOR,HINDEXED,STRUCT}_INTEGER removed
unless configure'd with --enable-mpi1-compatibility

This is a one-off commit for the v4.0.x branch since these symbols were
simply removed from master.

Thanks Lisandro Dalcin for reporting this.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-05-01 10:50:57 +09:00
Mark Allen
c081757462 fixing an unsafe usage of integer disps[] (romio321 gpfs)
There are a couple MPI_Alltoallv calls in ad_gpfs_aggrs.c where the
send/recv data comes from places like req[r].lens, and the send
buffer and send displacements for example were being calculated as
    sbuf = pick one of the reqs: req[bottom].lens
    sdisps[r] = req[r].lens - req[bottom].lens
which might be okay if the .lens was data inside of req[] so they'd
all be close to each other. But each .lens field is just a pointer
that's malloced, so those addresses can be all over the place, so the
integer-sized sdisps[] isn't safe.

I changed it to have a new extra array sbuf and rbuf for those two
Alltoallv calls, and copied the data into the sbuf from the same
locations it used to be setting up the sdisps[] at, and after the
Alltoallv I copy the data out of the new rbuf into the same
locations it used to be setting up the rdisps[] at.

For what it's worth I was able to get this to fail -np 2 on a GPFS
filesystem with hints romio_cb_write enable. I didn't whittle the
test down to something small, but it was failing in an
MPI_File_write_all call.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit d85cac8f1a)
2019-04-25 14:22:19 -04:00
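
The copy-into-a-contiguous-buffer approach described in commit c081757462 above can be sketched as follows. This is a minimal illustration, not the ROMIO code: the function name `pack_contiguous`, the `long long` element type, and the parameter layout are all assumptions. The point is that once the data lives in one buffer, each displacement is a small element offset rather than a difference between unrelated heap addresses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy per-peer arrays, each allocated separately,
 * into one contiguous buffer and record each peer's element offset in
 * displs[]. Returns the new buffer (caller frees) or NULL on failure. */
static long long *pack_contiguous(long long *const *lens, const int *counts,
                                  int npeers, int *displs)
{
    size_t total = 0;
    for (int r = 0; r < npeers; r++) {
        assert(counts[r] >= 0);
        total += (size_t)counts[r];
    }

    /* Allocate at least one element so a zero total still yields a valid pointer. */
    long long *buf = malloc((total > 0 ? total : 1) * sizeof *buf);
    if (buf == NULL) {
        return NULL;
    }

    size_t off = 0;
    for (int r = 0; r < npeers; r++) {
        displs[r] = (int)off;   /* offset inside buf, not an address difference */
        if (counts[r] > 0) {
            memcpy(buf + off, lens[r], (size_t)counts[r] * sizeof *buf);
        }
        off += (size_t)counts[r];
    }
    return buf;
}
```

The receive side would mirror this: receive into one contiguous buffer, then copy each peer's slice back out to its original location.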
Brelle Emmanuel
2a4bc0cb58 pml/ob1: fixed exit from get_frag_fail when falling back on btl_put
When btl_get fails, ob1 first tries to fall back on btl_put, but
the return code was ignored, so the code fell back on both btl_put and
btl_send.

Signed-off-by: Brelle Emmanuel <emmanuel.brelle@atos.net>
(cherry picked from commit 9c689f2225)
2019-04-22 14:25:34 -07:00
Valentin Petrov
2947ab2dbc OSC/UCX: correctly handle NULL origin addr and MPI_NO_OP
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-17 10:35:34 +03:00
Valentin Petrov
68c88e86f2 OSC/UCX: use correct rkey for atomic_fadd in rget/rput
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-04-16 15:24:57 +03:00
Thananon Patinyasakdikul
5999fdad5a pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
We missed an assert to check if ALLOW_OVERTAKE is set or not before
validating the sequence number and this will cause deadlock.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
(cherry picked from commit 0263456cf4)
2019-04-09 11:24:24 -07:00
Mark Allen
36583df689 in-place conversion macro writes into INPUT argument
In fint_2_int.h there are some conversion macros for logicals. It has
one path for OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT where a new array
would be allocated and the conversions then might expand to
    c_array[i] = (array[i] == 0 ? 0 : 1)
and another path for OMPI_SIZEOF_FORTRAN_LOGICAL == SIZEOF_INT where it
does things "in place", so the same conversion there would just be
    array[i] = (array[i] == 0 ? 0 : 1)

The problem is some of the logical arrays being converted are INPUT
arguments. And it's possible for some compilers to even put the argument
in read-only memory so the above "in place" conversion SEGV's.  A
testcase I have used
    call MPI_CART_SUB(oldcomm, (/.true.,.false./), newcomm, ierr)
and gfortran put the second arg in read-only mem.

In cart_sub_f.c you can trace the ompi_fortran_logical_t *remain_dims arg.
remain_dims[] is for input only, but the file uses
    OMPI_LOGICAL_ARRAY_NAME_DECL(remain_dims);
    OMPI_ARRAY_LOGICAL_2_INT(remain_dims, ndims);
    PMPI_Cart_sub(..., OMPI_LOGICAL_ARRAY_NAME_CONVERT(remain_dims), ...);
    OMPI_ARRAY_INT_2_LOGICAL(remain_dims, ndims);
to convert it to C ints, make a C call, then restore it to Fortran logicals
before returning.

It's not always wrong to convert purely in-place, eg cart_get_f.c has
a periods[] that's exclusively for OUTPUT and it would be fine with the
macros as they were. But I still say the macros are invalid because they
don't distinguish whether they're being used on INPUT or OUTPUT args and
thus they can't be used in a way that's legal for both cases.

It might be possible to fix the macros by adding more of them so that
cart_create_f.c and cart_get_f.c would use different macros that give
more context. But my fix here is just to turn off the first block and
make all paths run as if OMPI_SIZEOF_FORTRAN_LOGICAL != SIZEOF_INT.

The main macros that get enlarged by this change are
    define OMPI_ARRAY_LOGICAL_2_INT_ALLOC : mallocs now
    define OMPI_ARRAY_LOGICAL_2_INT : also mallocs now
But these are only used in 4 places, three of which are the purpose of
this checkin, to avoid the former in-place expansion of an INPUT arg:
    cart_create_f.c
    cart_map_f.c
    cart_sub_f.c
and one of which is an OUTPUT arg that was fine and that gets
unnecessarily expanded into a separate array by this checkin.
    cart_get_f.c

So I think an unnecessary malloc in cart_get_f.c is the only downside
to this change, where the logicals array argument could have been used
and converted in place.

Signed-off-by: Mark Allen <markalle@us.ibm.com>

Update provided by Gilles Gouaillardet to keep the in-place option
if OMPI_FORTRAN_VALUE_TRUE == 1 where no conversion is needed.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0a7f1e3cc5)
2019-04-05 13:34:09 -04:00
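
The difference between converting in place and converting into a separate array, as discussed in commit 36583df689 above, can be sketched in plain C. This is only an illustration under assumptions: `logicals_to_ints` is a hypothetical name, and Fortran logicals are modelled as `int` values where any nonzero value means true. The actual macros in fint_2_int.h are not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build a fresh array of C truth values (0 or 1)
 * from an input-only array of logicals. The input is declared const and
 * is never written, so it may safely live in read-only memory.
 * Returns a malloc'ed array (caller frees) or NULL on failure. */
static int *logicals_to_ints(const int *logicals, int n)
{
    assert(n >= 0);
    int *out = malloc((n > 0 ? (size_t)n : 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (int i = 0; i < n; i++) {
        out[i] = (logicals[i] == 0) ? 0 : 1;
    }
    return out;
}
```

The cost is one allocation per call, which matches the downside the commit message names for the output-only case.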
Howard Pritchard
702199f39e
Merge pull request #6545 from bertwesarg/v4.0.x-fix-cpp-condition
Fix use of bitwise operation in CPP condition (v4.0.x)
2019-04-05 07:58:09 -06:00
James Clark
d8dc69feb5 Add a compilation flag that adds unwind info to all files that are present in the stack starting from MPI_Init.
This is so when a debugger attaches using MPIR, it can step out of this stack back into main.
This cannot be done with certain aggressive optimisations and missing debug information.

Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

Co-authored-by: Jeff Squyres <jsquyres@cisco.com>

(cherry-picked from 20f5840)
2019-04-01 11:10:04 +01:00
Bert Wesarg
7f65e5b720 Fix use of bitwise operation in CPP condition
Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
(cherry picked from commit 18525ce39b)
2019-03-29 10:17:09 +01:00
Sergey Oblomov
14c271f993 PML/SPML/UCX: added evaluation of mmap events
- A set of reported UCX-related issues was caused by conflicts between
  mmap API hooks. Added diagnostics for such conflicts to simplify
  debugging.

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae)
2019-03-14 16:48:25 +02:00
Austen Lauria
8138cdbb49 Fix integer overflows with indexed datatype creation.
The types of count, disp, and extent passed into
ompi_datatype_add() should be size_t, ptrdiff_t and ptrdiff_t,
respectively. This prevents integer overflows and errors in
computing the size of large indexed datatypes.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit b61e6242d3)
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-03-13 14:20:26 -04:00
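
The overflow addressed by commit 8138cdbb49 above comes from doing count-times-extent arithmetic in `int`. A minimal sketch of the wider arithmetic follows; `span_bytes` and `fits_in_int` are hypothetical helpers, not Open MPI functions, and the sketch assumes a 64-bit `ptrdiff_t`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes spanned by `count` elements of `extent` bytes each, computed in
 * ptrdiff_t so the product is not truncated to int. Assumes the product
 * itself fits in ptrdiff_t. */
static ptrdiff_t span_bytes(size_t count, ptrdiff_t extent)
{
    assert(count <= (size_t)PTRDIFF_MAX);
    return (ptrdiff_t)count * extent;
}

/* Nonzero when the value could have been held in an int without overflow. */
static int fits_in_int(ptrdiff_t value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

For example, one million elements with an extent of 4096 bytes span 4,096,000,000 bytes, which exceeds `INT_MAX` on platforms with a 32-bit `int`.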
Howard Pritchard
5f7454a224 ompi_info: report whether MPI1 compat is enabled
It's so easy to misspell compatability (sic) that we need
to have ompi_info help us out.

Related to #6470

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit a5ba48c21839e0aab4c96afa97466a10f8bdc721)
2019-03-11 13:13:29 -06:00
Bert Wesarg
73134ab9e7 v4.0.x: Allow user to overwrite OMPI_ENABLE_MPI1_COMPAT
Follow-up to #6120.

As mentioned in [1], it may be desirable to nevertheless get the hidden
MPI 1 prototypes, for users who know what they are doing, i.e., the tools
guys. @ggouaillardet mentioned in [2], that `-DOMPI_OMIT_MPI1_COMPAT_DECLS=0`
should work, but it does not, as then we only get redefinition warnings.
See [3].

This topic does not relate to master, as we can remove the actual symbols
there, but here in v4.0.x land, the symbols are always there.

[1] https://github.com/open-mpi/ompi/pull/6120#issuecomment-443104700
[2] https://github.com/open-mpi/ompi/pull/6120#issuecomment-443117892
[3] https://github.com/open-mpi/ompi/pull/6120#issuecomment-468962596

Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
2019-03-07 09:54:20 +01:00
Geoffrey Paulsen
6df6a3f4bc mpi.h.in: Revamp MPI-1 removed function warnings
Refs https://github.com/open-mpi/ompi/issues/6278.

This commit is intended to be cherry-picked to v4.0.x and
the following commit will amend this functionality for
master's removal.

Changes the prototypes for MPI removed functions in the
following ways:

There are 4 cases:

 1) User wants MPI-1 compatibility (--enable-mpi1-compatibility)

    MPI_Address (and friends) are declared in mpi.h with
    deprecation notice

 2) User does not want MPI-1 compatibility, and has a C11-capable
    compiler

    Declare an MPI_Address (etc.) macro in mpi.h, which will
    cause a compile-time error using _Static_assert C11 feature

 3) User does not want MPI-1 compatibility, and does not have a
    C11-capable compiler, but the compiler supports error function
    attributes.

    Declare an MPI_Address (etc.) macro in mpi.h, which will
    cause a compile-time error using error function attribute.

 4) User does not want MPI-1 compatibility, and does not have a
    C11-capable compiler, or a compiler that supports error
    function attributes.

    Do not declare MPI_Address (etc.) in mpi.h at all.
    Unless the user is compiling with something like -Werror,
    this will allow the user's code to compile. We are
    choosing this because it seems like a losing battle to
    make some kind of compile time error that is friendly to
    the user (and doesn't make it look like mpi.h itself is broken).

    On v4.0.x, this will allow the user code to both compile
    (albeit with a warning) and link (because the MPI_Address
    will be in the MPI library because we are preserving ABI
    back to 3.0.x).

    On master/v5.0.x, this will allow the user code to compile,
    but it will fail to link (because the MPI_Address symbol will
    not be in the MPI library).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
(cherry-picked from 3136a1706c)
2019-02-27 08:25:23 -08:00
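
The four cases listed in commit 6df6a3f4bc above can be sketched as a preprocessor ladder. Every name here is hypothetical: `my_old_fn` stands in for a removed routine such as MPI_Address, `my_new_fn` for its replacement, and `MY_ENABLE_COMPAT` for the compatibility switch. Case 3 is simplified to a plain function declaration carrying the GCC-style `error` attribute; the real mpi.h macros are not reproduced.

```c
#if defined(MY_ENABLE_COMPAT)
/* Case 1: keep the prototype, flagged as deprecated where supported. */
#  define MY_REMOVAL_CASE 1
#  if defined(__GNUC__)
int my_old_fn(int x) __attribute__((deprecated));
#  else
int my_old_fn(int x);
#  endif
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
/* Case 2: any use of my_old_fn(...) expands to a failing static assert. */
#  define MY_REMOVAL_CASE 2
#  define my_old_fn(...) \
       _Static_assert(0, "my_old_fn was removed; use my_new_fn instead")
#elif defined(__GNUC__)
/* Case 3: calling the function trips the compiler's error attribute. */
#  define MY_REMOVAL_CASE 3
int my_old_fn(int x)
    __attribute__((error("my_old_fn was removed; use my_new_fn instead")));
#else
/* Case 4: no declaration at all; a call compiles with at most a warning. */
#  define MY_REMOVAL_CASE 4
#endif
```

The header compiles cleanly in every case as long as `my_old_fn` is not used; only code that still calls the removed routine sees the diagnostic.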
Howard Pritchard
056d7ad0a3
Merge pull request #6419 from hppritcha/topic/fix_pgi_usempif08_4.0.x
fortran:use mpif08  fix for PGI linking
2019-02-25 15:54:15 -07:00
Geoff Paulsen
1920769946
Merge pull request #6423 from abouteiller/pr6417to4.0.x
v4.x: Cart/Graph create would not run the next_cid  algorithm
2019-02-22 16:25:38 -06:00
Aurelien Bouteiller
d6e8d51d5f
Cart/Graph create would not run the next_cid algorithm and create
disjoint communicator with inconsistent cid.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2019-02-22 15:11:56 -05:00
Howard Pritchard
6596277ee8 fortran:use mpif08 fix for PGI linking
commit c6070fd2e broke building fortran bindings
with PGI compilers.  Turns out PGI compilers need
to link in the *.o from a module file whether or
not there are module subroutines defined in
the module file.

Related to #6411

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 266bc3aced)
2019-02-22 11:47:40 -07:00
Howard Pritchard
7aeb65579b
Merge pull request #6395 from brminich/topic/ucx_net_waddr_4.0.x
PML/UCX: Use net worker address for remote peers - v4.0.x
2019-02-21 20:29:47 -07:00
Mikhail Brinskii
1c514948f6 PML/UCX: Use net worker address for remote peers
For peers on remote nodes, pack a smaller worker address that contains
network device addresses only. This reduces the amount of OOB traffic
during startup.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 751d88192d)
2019-02-21 16:58:20 +02:00
Howard Pritchard
83cb9ca51e
Merge pull request #6404 from ggouaillardet/topic/v4.0.x/osc_rdma_self
osc/rdma: correctly handle communications to self
2019-02-20 09:53:50 -07:00
KAWASHIMA Takahiro
7b71369632 man: fix more typos in MPI_Win_attach man page
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

[skip ci]
bot:notest

(cherry picked from commit open-mpi/ompi@7095ad10a5)
2019-02-20 13:26:48 +09:00
Gilles Gouaillardet
3ab227df30 man: fix typos in MPI_Win_{attach,detach} man pages
no code change

[skip ci]
bot:notest

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@7c0596819b)
2019-02-20 13:25:12 +09:00
Gilles Gouaillardet
749f51845b osc/rdma: correctly handle communications to self
mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when
the window is dynamically created and use_cpu_atomics is set
in order to correctly handle communications to self.

Thanks Bart Janssens for reporting this issue.

Refs. open-mpi/ompi#6394

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fe05fcc11a)
2019-02-20 13:06:05 +09:00
Howard Pritchard
40db950c7d
Merge pull request #6340 from jsquyres/pr/v4.0.x/make-mpi.h-a-little-friendlier-to-c++
v4.0.x: mpi.h.in: use C++ static_cast<> where appropriate
2019-02-14 17:06:47 -07:00
Howard Pritchard
d2745ad0ad
Merge pull request #6327 from ggouaillardet/topic/v4.0.x/op
ompi/op: fix support of non predefined datatypes with predefined oper…
2019-02-14 17:05:32 -07:00
Howard Pritchard
0b915b7e56
Merge pull request #6333 from jsquyres/pr/v4.0.x/hwloc-macro-conflict-fixes
v4.0.x: Various minor hwloc cleanups
2019-02-12 09:13:19 -07:00
Howard Pritchard
5dd63405ce
Merge pull request #6368 from jsquyres/pr/v4.0.x/fix-ofi-configury
v4.0.x: fix OFI configury
2019-02-11 13:15:52 -07:00
Howard Pritchard
8552d0e608
Merge pull request #6330 from ggouaillardet/topic/v4.0.x/ompi_datatype_set_args
ompi/datatype: fix how we compute the space needed for the args
2019-02-08 14:44:08 -07:00
Jeff Squyres
9ad871fc38 ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - On subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/usnic and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over cruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Updated from master cherry pick: the OFI BTL does not exist on the
v4.0.x branch.  Therefore, did not include the OFI BTL changes on
master in this cherry pick.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672cc)
2019-02-07 06:36:35 -08:00
Jeff Squyres
c39426ec91 mpi.h.in: use C++ static_cast<> where appropriate
When compiling mpi.h with a modern C++ compiler and a high degree of
pickiness (e.g., -Wold-style-cast), casting using (void*) in the
OMPI_PREDEFINED_GLOBAL and MPI_STATUS*_IGNORE macros will emit
warnings.  So if we're compiling with a C++ compiler, use C++'s
static_cast<> instead of (void*).

Thanks to @shadow-fax for identifying the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 30afdcead9)
2019-01-31 04:16:07 -08:00
René Widera
e30e5b95c6 common/ompio: possible rounding issue
Similar to #6286: converting a number of bytes to a single-precision floating-point value in order to round up the result of a division is risky because of rounding errors.

- remove floating-point operations for `round up`
- remove floating-point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <r.widera@hzdr.de>
(cherry picked from commit a91fab80a1)
2019-01-30 12:31:39 -06:00
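
The integer-only rounding that commit e30e5b95c6 above switches to can be sketched like this. The helper names are hypothetical and the code is not taken from common/ompio; it only shows that both rounding directions are available without any floating-point conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Round-down division: plain integer division already truncates. */
static size_t div_round_down(size_t bytes, size_t chunk)
{
    assert(chunk > 0);
    return bytes / chunk;
}

/* Round-up division without floating point. Using the remainder avoids
 * the overflow that (bytes + chunk - 1) / chunk can hit near SIZE_MAX. */
static size_t div_round_up(size_t bytes, size_t chunk)
{
    assert(chunk > 0);
    return bytes / chunk + (bytes % chunk != 0 ? 1 : 0);
}
```

Because every intermediate value is an exact integer, the result does not depend on how many significant bits a floating-point type happens to carry.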
Edgar Gabriel
d1e8779fe3 common/ompio: fix a floating point division problem
This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks to Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit c0f8ce0fff)
2019-01-30 12:31:16 -06:00
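
Why single precision breaks down around the 512 MB mark mentioned in commit d1e8779fe3 above: an IEEE 754 `float` has a 24-bit significand, so near 2^29 (512 MB) adjacent representable values are 64 apart, and nearby byte counts collapse to the same value. The sketch below demonstrates this with hypothetical helper names and assumes IEEE 754 `float` and `double`; it is not the ompio code.

```c
#include <stdint.h>

/* Nonzero when two byte counts become indistinguishable after conversion
 * to single precision. volatile forces a genuine float store, so excess
 * intermediate precision cannot hide the effect. */
static int collide_as_float(int64_t a, int64_t b)
{
    volatile float fa = (float)a;
    volatile float fb = (float)b;
    return fa == fb;
}

/* Same check in double precision, which represents every integer up to
 * 2^53 exactly. */
static int collide_as_double(int64_t a, int64_t b)
{
    volatile double da = (double)a;
    volatile double db = (double)b;
    return da == db;
}
```

With 536870912 and 536870913 as inputs, the float comparison reports a collision while the double comparison does not.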
Gilles Gouaillardet
a247292275 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0aeb27f776)
2019-01-30 07:33:33 -05:00
Gilles Gouaillardet
fd157a960a ompi/datatype: fix how we compute the space needed for the args
Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@45fb69b2b9)
2019-01-30 11:01:11 +09:00
Gilles Gouaillardet
f76c81a758 ompi/op: fix support of non predefined datatypes with predefined operators
ACCUMULATE, unlike REDUCE, can be used with derived
datatypes and predefined operations, with some
restrictions outlined in MPI-3:11.3.4.  The derived
datatype must be composed entirely from one predefined
datatype (so you can do all the construction you want,
but at the bottom, you can only use one datatype, say,
MPI_INT).

Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@bc1cab5498)
2019-01-30 10:29:39 +09:00
Howard Pritchard
c9764f661b
Merge pull request #6263 from jsquyres/pr/v4.0.x/minor-fortran-valgrind-fix
v4.0.x: mpi/fortran: Fix valgrind warnings for type create
2019-01-13 12:31:46 -07:00
Howard Pritchard
bc58e22b03
Merge pull request #6120 from gpaulsen/topic/v4.0.x/re-add-deprecated-oops
v4.0.x: Re-add removed deprecate-only MPI-2.0 symbols
2019-01-09 20:10:02 -07:00
Risto Toijala
979b401936 mpi/fortran: Fix valgrind warnings for type create
Valgrind warns that *newtype is uninitialized when calling from
Fortran as e.g.
    use mpi
    integer :: t, err
    call MPI_Type_create_f90_integer(5, t, err)

Since newtype is intent(out), this should not happen. There is
no reason to convert the type using PMPI_Type_f2c, only to over-
write it immediately afterwards. The other type_create_* functions
did not convert newtype.

The valgrind warnings:
==28441== Conditional jump or move depends on uninitialised value(s)
==28441==    at 0x581B555: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0)
==28441==    by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0)
==28441==    by 0x400BA1: MAIN__ (in [...])
==28441==    by 0x400C46: main (in [...])
==28441==
==28441== Conditional jump or move depends on uninitialised value(s)
==28441==    at 0x581B563: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0)
==28441==    by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0)
==28441==    by 0x400BA1: MAIN__ (in [..])
==28441==    by 0x400C46: main (in [...])
==28441==
==28441== Use of uninitialised value of size 8
==28441==    at 0x581B577: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0)
==28441==    by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0)
==28441==    by 0x400BA1: MAIN__ (in [...])
==28441==    by 0x400C46: main (in [...])
==28441==

Signed-off-by: Risto Toijala <risto.toijala@gmail.com>
(cherry picked from commit f14a0f4fc9)
2019-01-09 07:24:22 -08:00
Jeff Squyres
1a1a932acc romio321: ensure to distribute ompi_grequestx.h
Refs https://github.com/open-mpi/ompi/issues/6227.  Thanks to
George Marselis for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 62321be186)
2018-12-28 13:18:10 -08:00
Geoffrey Paulsen
4aa91e1ffb Return MPI1 function implementations to build list
Adding the implementations of the functions that were removed
from the MPI standard to the build list, regardless of the
state of the OMPI_ENABLE_MPI1_COMPAT.

According to the README, we want the OMPI_ENABLE_MPI1_COMPAT
configure flag to control which MPI prototypes are exposed in
mpi.h, NOT which are built into the MPI library.  Those will
remain in the MPI library until a future major release (5.0?)

NOTE: for the Fortran implementations, we define
      OMPI_OMIT_MPI1_COMPAT_DECLS to 0 instead of defining
      OMPI_ENABLE_MPI1_COMPAT to 1.  I'm not sure why, but
      this seems to work correctly.

Also changing the removed MPI_Errhandler_create implementation
to use the non removed MPI_Comm_errhandler_function prototype
(prototype remains unchanged from MPI_Comm_errhandler_fn)

NOTE: This commit is *NOT* a cherry-pick from master, because
      on master, we are no longer building those symbols by
      default, but on v4.0.x we _ARE_ still building these
      symbols by default.   This is because the v4.0.x branch
      is to remain backwards compatible with v3.0.x, while at
      the same time removing the "removed" symbols from mpi.h
      (unless the user configures with --enable-mpi1-compatibility)

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-12-20 12:22:04 -06:00
Howard Pritchard
4be4282312
Merge pull request #6128 from ggouaillardet/topic/v4.0.x/mpiext_short_path
mpiext: keep paths short
2018-12-17 13:22:19 -07:00
Howard Pritchard
71b83e8a09
Merge pull request #6193 from kawashima-fj/pr/v4.0.x/fix-type-create-f90
v4.0.x: mpi/c: Fix MPI_TYPE_CREATE_F90_{REAL,COMPLEX}
2018-12-17 13:21:21 -07:00
KAWASHIMA Takahiro
8eb90ae9aa mpi/c: Fix MPI_TYPE_CREATE_F90_{REAL,COMPLEX}
This commit fixes edge cases of `r = 38` and `r = 308`.

As defined in the MPI standard, `TYPE_CREATE_F90_REAL` and
`TYPE_CREATE_F90_COMPLEX` must be consistent with the Fortran
`SELECTED_REAL_KIND` function. The `SELECTED_REAL_KIND` function is
defined based on the `RANGE` function. The `RANGE` function returns
`INT(MIN(LOG10(HUGE(X)), -LOG10(TINY(X))))` for a real value `X`.

The old code considers only `INT(LOG10(HUGE(X)))` using `*_MAX_10_EXP`.
This commit adds `INT(-LOG10(TINY(X)))` part using `*_MIN_10_EXP`.

This bug affected the following `p`-`r` combinations.

| p             | r   | expected  | returned  | expected  | returned  |
| :------------ | --: | :-------- | :-------- | :-------  | :-------- |
| MPI_UNDEFINED |  38 | REAL8     | REAL4     | COMPLEX16 | COMPLEX8  |
| 0 <= p <= 6   |  38 | REAL8     | REAL4     | COMPLEX16 | COMPLEX8  |
| MPI_UNDEFINED | 308 | REAL16    | REAL8     | COMPLEX32 | COMPLEX16 |
| 0 <= p <= 15  | 308 | REAL16    | REAL8     | COMPLEX32 | COMPLEX16 |

MPICH returns the same result as Open MPI with this fix.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 6fb01f64fe)
2018-12-13 16:01:56 +09:00
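
The `RANGE` computation described in commit 8eb90ae9aa above maps directly onto the `<float.h>` limits. The sketch below uses hypothetical helper names, assumes IEEE 754 `float` and `double` of 4 and 8 bytes, and ignores the precision argument `p` entirely; it is an illustration, not the Open MPI implementation.

```c
#include <float.h>

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Fortran RANGE(X) = INT(MIN(LOG10(HUGE(X)), -LOG10(TINY(X)))),
 * expressed with the decimal exponent limits from <float.h>. */
static int range_float(void)
{
    return min_int(FLT_MAX_10_EXP, -FLT_MIN_10_EXP);
}

static int range_double(void)
{
    return min_int(DBL_MAX_10_EXP, -DBL_MIN_10_EXP);
}

/* Size in bytes of the smallest of {float, double} whose decimal range
 * covers r, or 0 when neither does (a wider type would be needed). */
static int real_size_for_range(int r)
{
    if (r <= range_float()) {
        return 4;
    }
    if (r <= range_double()) {
        return 8;
    }
    return 0;
}
```

Under the IEEE 754 assumption the ranges come out as 37 and 307, which is why `r = 38` needs the 8-byte type and `r = 308` needs something wider still, in line with the table in the commit message.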
Gilles Gouaillardet
a79ce7d17f mpiext: updates for header file locations
Per discussion on https://github.com/open-mpi/ompi/pull/6030
and https://github.com/open-mpi/ompi/pull/6145, move
around where MPI extension header files are installed (specifically:
the installation tree path does not need to match the source tree
path).

For reference, header files were installed like this:

 - <prefix>/include/openmpi/ompi/mpiext/pcollreq/mpif-h/mpiext_pcollreq_mpifh.h
 - <prefix>/include/openmpi/ompi/mpiext/pcollreq/c/mpiext_pcollreq_c.h

and they are now installed like this:

 - <prefix>/include/openmpi/mpiext/mpiext_pcollreq_mpifh.h
 - <prefix>/include/openmpi/mpiext/mpiext_pcollreq_c.h

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@975e3cd0c9)
2018-12-12 09:24:45 +09:00
Gilles Gouaillardet
0ade49c286 mpi/c: add back (some more) deprecated subroutines
- MPI_NULL_DELETE_FN
- MPI_NULL_COPY_FN
- MPI_DUP_FN

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 5a968306d6)
2018-12-11 09:55:33 -06:00
Bert Wesarg
5e4a6db23b Re-add removed deprecate-only MPI-2.0 symbols
See #6114

Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
(cherry picked from commit b3f3281290)
2018-12-11 09:55:33 -06:00
Matias A Cabral
b2327049c1 MTL/PSM2: add missing default priority
Missing default priority after PR #6153

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
(cherry picked from commit c76c6d8b28)
2018-12-07 16:22:59 -08:00
Matias A Cabral
80113a368f MTL/PSM2: Do not lower the priority when all processes are local.
The intention of lowering the priority when all processes are local
was to favor Vader BTL. However, in builds including the OFI MTL it
gets selected instead.

Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
(cherry picked from commit fc8582c560)
2018-12-07 11:11:43 -08:00
Howard Pritchard
804f65f247
Merge pull request #6035 from ggouaillardet/topic/v4.0.x/mpiext_cuda
mpiext/cuda: do not include automatically generated file into dist ta…
2018-12-04 09:26:55 -07:00
Geoff Paulsen
752bbd195f
Merge pull request #6102 from hoopoepg/topic/set-osc-ucx-level-200-v4.0
OSC: set UCX module used by default - v4.0
2018-12-04 10:26:37 -06:00
Geoff Paulsen
03cf3e4400
Merge pull request #6112 from kawashima-fj/pr/v4.0.x/update-pcoll-doc
v4.0.x: README & man: Update pcollreq documentation
2018-11-30 13:58:34 -06:00
Geoff Paulsen
bd2990f502
Merge pull request #6131 from devreal/rdma-plug-memleak-v4.0.x
v4.0.x: Plug two memory leaks in rdma osc
2018-11-30 13:54:51 -06:00
Joseph Schuchart
c5346751e6 Plug two memory leaks in rdma osc
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 91885f5876)
2018-11-29 10:19:26 -05:00
Sergey Oblomov
6651672711 OSC/UCX: set max level value to 60
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2d230b3aac)
2018-11-27 20:35:30 +02:00
Yossi Itigin
a112d10c93 pml_ucx: initialize req_mpi_object.comm for error handler
without this fix, an error handler invoked on pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm

(picked from master f36eeef)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-11-26 11:57:34 +02:00
KAWASHIMA Takahiro
6f68483fd5 README & man: Update pcollreq documentation
The persistent collectives feature was approved at the Sept. 2018
MPI Forum meeting, and the 2018 Draft Specification of the MPI standard
was published during SC18.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 5f0fcf0f45)
2018-11-26 18:28:08 +09:00
Sergey Oblomov
38a4953707 OSC/UCX: added UCX version evaluation
- added UCX version evaluation to set OSC UCX priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e91f214982)
2018-11-22 11:31:53 +02:00
Sergey Oblomov
012e27af77 OSC: set UCX module used by default
- OSC/UCX module set priority to 200 to be used by default

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 36934a8bb2)
2018-11-22 10:59:43 +02:00
Howard Pritchard
8adaeb1536
Merge pull request #6007 from aravindksg/coll-tuned-fix-40x
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
2018-11-19 13:15:40 -07:00
Howard Pritchard
ec79631ba2
Merge pull request #5936 from edgargabriel/pr/testmpio-v4.0.x
Pr/testmpio v4.0.x
2018-11-19 13:11:50 -07:00
Gilles Gouaillardet
9366c6eb2e mpiext/cuda: do not include automatically generated file into dist tarball
ompi/mpiext/cuda/c/mpiext_cuda_c.h is automatically generated from
ompi/mpiext/cuda/c/mpiext_cuda_c.h.in at configure time.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@f8318f0a8f)
(cherry picked from commit open-mpi/ompi@b3ce25af95)
2018-11-13 00:09:01 -06:00
Jeff Squyres
d0efdfd9c8 MPI_Type_get_envelope: remove MPI-1 deleted names
Several names are now no longer returned by MPI_Type_get_envelope.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 65eb118e08)
2018-11-06 10:07:05 -08:00
Geoffrey Paulsen
2d3b4bb91a mpi.h: restore some MPI-deprecated items to default builds
Commit 89da9651b inadvertently #if'ed out both deprecated *and*
removed items from mpi.h.  The intent was only to #if out items that
have been *removed* from the MPI specification and leave all items
that are merely deprecated.

This commit also re-orders the deleted typedef+functions to be in the
same order as they are listed in MPI-3.1 chapter 17, just to make
verifying/checking the code easier.

Note that --enable-mpi1-compatibility can still be used to restore
prototypes for the items that have been removed from the MPI
specification (e.g., MPI_Address()).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b03a39d359)
2018-11-02 14:07:26 -05:00
Howard Pritchard
e851879081
Merge pull request #5994 from rhc54/cmr40/cleanup
Remove stale defunct tools
2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan
5a74ddb34d coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

The MPI spec also stipulates that sendcount and sendtype are ignored for
in-place Alltoall and Allgatherv operations, so the check is extended to
these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f)
2018-10-31 11:37:29 -07:00
Ralph Castain
ba6ad9fe42 Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 05ac8fa71c)
2018-10-30 08:51:25 -07:00
Sergey Oblomov
0846c9d112 COMMON/UCX: added error code to log output
Also fixes a PGI compilation error with --enable-debug.

Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 1099d5f023)
2018-10-30 09:55:25 -05:00
Ralph Castain
712ddd326f Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8eb)
2018-10-30 07:54:35 -07:00
Howard Pritchard
f9d2f3b912
Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x
v4.0.x: remove the bfo pml
2018-10-22 09:50:05 -06:00
Howard Pritchard
a806d09450 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 7d6774acf8)
2018-10-17 14:00:11 -06:00
Edgar Gabriel
278ecf2205 io/ompio: add verification for data representations.
Check that the data representation provided is actually supported
by ompio.

Also add a check for a non-NULL data representation pointer in
mpi/c/file_set_view.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:48 -05:00
Edgar Gabriel
a07c9e96b1 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:35 -05:00
Edgar Gabriel
96c1a5b9dc common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:19 -05:00
Edgar Gabriel
425a71799e common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:04 -05:00
Edgar Gabriel
c65dda6f5f io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the target seek
position when SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:21:47 -05:00
Howard Pritchard
cd7d70156c
Merge pull request #5899 from jsquyres/pr/v4.0.x/fix-c99-comments-in-mpih
v4.0.x: mpi.h.in: remove C99-style comments
2018-10-16 16:33:08 -06:00
Howard Pritchard
2752d43f65
Merge pull request #5875 from kawashima-fj/pr/v4.0.x/javadoc-tag
v4.0.x: java: Fix javadoc build failure with OpenJDK 11
2018-10-16 16:29:34 -06:00
Howard Pritchard
7ceb508b93
Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x
pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x
2018-10-16 16:29:01 -06:00
Howard Pritchard
e2cf1e3ec5
Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x
osc_ucx: fix hang/timeout in component finalize - v4.0
2018-10-16 09:21:50 -06:00
Howard Pritchard
753087ab17
Merge pull request #5888 from hoopoepg/topic/fixed-zero-size-window-v4.0
OSC/UCX: fixed zero-size window processing - v4.0.x
2018-10-16 09:21:08 -06:00
Gilles Gouaillardet
2b5a7ca816 fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@e4001040b4)
2018-10-12 14:10:45 +09:00
Jeff Squyres
600967d2ed mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4b3ccabf7)
2018-10-11 11:54:30 -04:00
Yossi Itigin
eabc94cab0 osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

(picked from master b8e1af6)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 23:02:19 +03:00
Yossi Itigin
4a97d6b9fa pml_ucx: fix return code from mca_pml_ucx_init()
(picked from master 40ac9e4)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:49 +03:00
Yossi Itigin
1bffd196ef pml_ucx: add ompi datatype attribute to release ucp_datatype
(picked from master 4763822)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:26 +03:00
Sergey Oblomov
274cbc3c03 OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ae6f81983f)
2018-10-10 16:49:02 +03:00
Howard Pritchard
d18ea98263
Merge pull request #5843 from kawashima-fj/pr/v4.0.x/correct-f08-signatures
v4.0.x: fortran/use-mpi-f08: Correct f08 routine signatures
2018-10-09 10:22:07 -05:00
KAWASHIMA Takahiro
dd1b3eac1e java: Fix javadoc build failure with OpenJDK 11
OpenJDK 11 changed the default javadoc output HTML version to HTML 5
from HTML 4.01. This causes an error when building Open MPI configured
with `--enable-mpi-java` (default: disabled). This fix is compatible
with older OpenJDK.

I don't know whether this problem exists with other vendor's JDKs.
But this fix should be compatible with other JDKs because the new
syntax is used in other places in the same file.

Thanks to Siegmar Gross for the bug report.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit b491b454dc)
2018-10-09 21:48:10 +09:00
Geoff Paulsen
499ddedd7c
Merge pull request #5844 from kawashima-fj/pr/v4.0.x/pcollreq-f08-signatures
v4.0.x: mpiext/pcollreq: Correct f08 routine signatures
2018-10-05 13:42:35 -05:00
KAWASHIMA Takahiro
4dd21111f0 mpiext/pcollreq: Add Fortran bindings in man
Fortran bindings were added to persistent collectives in 9e0115c980
but man was not updated.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 43d85dbc81)
2018-10-05 09:43:39 +09:00
KAWASHIMA Takahiro
092cf1937d man: Correct markup of MPI_Neighbor_allgather
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 994b345253)
2018-10-05 09:43:39 +09:00
KAWASHIMA Takahiro
080c52f906 mpiext/pcollreq: Add missing f08 asynchronous
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit be91a26fd8)
2018-10-05 09:33:17 +09:00
KAWASHIMA Takahiro
fcc698f27f mpiext/pcollreq: Correct f08 routine signatures
Changes of nonblocking collectives in e98d794e8b and f750c6932c
are applied to persistent collectives.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 357531847e)
2018-10-05 09:33:16 +09:00
KAWASHIMA Takahiro
b9316d3136 fortran/use-mpi-f08: Correct f08 routine signatures
Following the commit f750c6932c, I compared
`ompi/mpi/fortran/use-mpi-f08/*.F90` and
`ompi/mpi/fortran/use-mpi-f08/profile/p*.F90`, and
`ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and
`ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`.

There are many differences. Some are bugs of `MPI_*`, some are
bugs of `PMPI_*`. I'm not sure how these bugs affect applications.

To make it easy to compare these files in the future, I also removed
editorial differences.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit cf6d28cb66)
2018-10-05 09:04:17 +09:00
Geoff Paulsen
c0796664b1
Merge pull request #5780 from jsquyres/pr/v4.0.x/moar-fortran-fixes
v4.0.x: Fortran 08 bindings fixes
2018-10-04 16:08:30 -05:00
Geoff Paulsen
5cae0ec25b
Merge pull request #5794 from bwbarrett/v4.0.x-ofi-mtl-selection
mtl ofi: Change from opt-in to opt-out provider selection
2018-10-03 08:31:07 -05:00
Jeff Squyres
46dd266e45 mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
When --enable-mpi1-compatibility was not specified, the ompi_mpi_ub/lb
symbols were #if'ed out of mpi.h.  But the #defines for MPI_UB/LB
still remained.  This commit also #if's out the MPI_UB/LB macros when
--enable-mpi1-compatibility is not specified.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7223334d4d)
2018-09-28 10:01:48 -07:00
Brian Barrett
10d0a430c4 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of these blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, they can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit c5eaa38491)
2018-09-27 18:41:47 +00:00
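The opt-out selection rule from 10d0a430c4 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the OFI MTL code: the function names are hypothetical, the default exclude list contains only the two providers the commit message names as examples (sockets and shm), lists are comma-separated strings, and matching is exact and case-sensitive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if name appears as a complete item in a comma-separated list. */
bool sketch_list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (p != NULL && *p != '\0') {
        const char *comma = strchr(p, ',');
        size_t item_len = comma ? (size_t)(comma - p) : strlen(p);

        if (item_len == name_len && strncmp(p, name, name_len) == 0) {
            return true;
        }
        p = comma ? comma + 1 : NULL;
    }
    return false;
}

/* include == NULL: no include list was given.
 * exclude == NULL: fall back to the default exclude list. */
bool sketch_provider_allowed(const char *provider,
                             const char *include, const char *exclude)
{
    /* Assumption: illustrative default taken from the commit text. */
    static const char default_exclude[] = "sockets,shm";

    if (include != NULL) {
        /* An explicit include list overrides the default exclude list. */
        return sketch_list_contains(include, provider);
    }
    if (exclude == NULL) {
        exclude = default_exclude;
    }
    return !sketch_list_contains(exclude, provider);
}
```

Under these assumptions a provider that is not on the default exclude list is accepted with no configuration, while `sockets` is accepted only when it is named in the include list or when the exclude list is set to an empty string.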
Gilles Gouaillardet
ce5959ba6c fortran/use-mpi-f08: Corrections to PMPI signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit f750c6932c)
2018-09-26 12:34:46 -07:00
Philipp Otte
e98eae3da6 fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected
the C-bindings in Fortran accordingly

Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com>
(cherry picked from commit e98d794e8b)
2018-09-26 12:34:46 -07:00
Geoff Paulsen
9d9ae9286c
Merge pull request #5753 from gpaulsen/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-09-23 09:01:19 -05:00