1
1
Граф коммитов

10372 Коммитов

Автор SHA1 Сообщение Дата
René Widera
e30e5b95c6 common/ompio: possible rounding issue
Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <r.widera@hzdr.de>
(cherry picked from commit a91fab80a1)
2019-01-30 12:31:39 -06:00
Edgar Gabriel
d1e8779fe3 common/ompio: fix a floating point division problem
This commit fixes  a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks for Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit c0f8ce0fff)
2019-01-30 12:31:16 -06:00
Gilles Gouaillardet
a247292275 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0aeb27f776)
2019-01-30 07:33:33 -05:00
Gilles Gouaillardet
fd157a960a ompi/datatype: fix how we compute the space needed for the args
Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@45fb69b2b9)
2019-01-30 11:01:11 +09:00
Gilles Gouaillardet
f76c81a758 ompi/op: fix support of non predefined datatypes with predefined operators
ACCUMULATE, unlike REDUCE, can use with derived
datatypes with predefinied operations, with some
restrictions outlined in MPI-3:11.3.4.  The derived
datatype must be composed entierly from one predefined
datatype (so you can do all the construction you want,
but at the bottom, you can only use one datatype, say,
MPI_INT).

Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@bc1cab5498)
2019-01-30 10:29:39 +09:00
Howard Pritchard
c9764f661b
Merge pull request #6263 from jsquyres/pr/v4.0.x/minor-fortran-valgrind-fix
v4.0.x: mpi/fortran: Fix valgrind warnings for type create
2019-01-13 12:31:46 -07:00
Howard Pritchard
bc58e22b03
Merge pull request #6120 from gpaulsen/topic/v4.0.x/re-add-deprecated-oops
v4.0.x: Re-add removed deprecate-only MPI-2.0 symbols
2019-01-09 20:10:02 -07:00
Risto Toijala
979b401936 mpi/fortran: Fix valgrind warnings for type create
Valgrind warns that *newtype is uninitialized when calling from
Fortran as e.g.
    use mpi
    integer :: t, err
    call MPI_Type_create_f90_integer(5, t, err)

Since newtype is intent(out), this should not happen. There is
no reason to convert the type using PMPI_Type_f2c, only to over-
write it immediately afterwards. The other type_create_* functions
did not convert newtype.

The valgrind warnings:
==28441== Conditional jump or move depends on uninitialised value(s)
==28441==    at 0x581B555: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0)
==28441==    by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0)
==28441==    by 0x400BA1: MAIN__ (in [...])
==28441==    by 0x400C46: main (in [...])
==28441==
==28441== Conditional jump or move depends on uninitialised value(s)
==28441==    at 0x581B563: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0)
==28441==    by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0)
==28441==    by 0x400BA1: MAIN__ (in [..])
==28441==    by 0x400C46: main (in [...])
==28441==
==28441== Use of uninitialised value of size 8
==28441==    at 0x581B577: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0)
==28441==    by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0)
==28441==    by 0x400BA1: MAIN__ (in [...])
==28441==    by 0x400C46: main (in [...])
==28441==

Signed-off-by: Risto Toijala <risto.toijala@gmail.com>
(cherry picked from commit f14a0f4fc9)
2019-01-09 07:24:22 -08:00
Jeff Squyres
1a1a932acc romio321: ensure to distribute ompi_grequestx.h
Refs https://github.com/open-mpi/ompi/issues/6227.  Thanks to
George Marselis for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 62321be186)
2018-12-28 13:18:10 -08:00
Geoffrey Paulsen
4aa91e1ffb Return MPI1 function implementations to build list
Adding the implementations of the functions that were removed
from the MPI standard to the build list, regardless of the
state of the OMPI_ENABLE_MPI1_COMPAT.

According to the README, we want the OMPI_ENABLE_MPI1_COMPAT
configure flag to control which MPI prototypes are exposed in
mpi.h, NOT, which are built into the mpi library.  Those will
remain in the mpi library until a future major release (5.0?)

NOTE: for the Fortran implementations, we instead define
      OMPI_OMIT_MPI1_COMPAT_DECLS to 0 instead of
      OMPI_ENABLE_MPI1_COMPAT to 1.  I'm not sure why, but
      this seems to work correctly.

Also changing the removed MPI_Errhandler_create implementation
to use the non removed MPI_Comm_errhandler_function prototype
(prototype remains unchanged from MPI_Comm_errhandler_fn)

NOTE: This commit is *NOT* a cherry-pick from master, because
      on master, we are no longer building those symbols by
      default, but on v4.0.x we _ARE_ still building these
      symbols by default.   This is because the v4.0.x branch
      is to remain backwards compatible with v3.0.x, while at
      the same time removing the "removed" symbols from mpi.h
      (unless the user configures with --enable-mpi1-compatibility)

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-12-20 12:22:04 -06:00
Howard Pritchard
4be4282312
Merge pull request #6128 from ggouaillardet/topic/v4.0.x/mpiext_short_path
mpiext: keep paths short
2018-12-17 13:22:19 -07:00
Howard Pritchard
71b83e8a09
Merge pull request #6193 from kawashima-fj/pr/v4.0.x/fix-type-create-f90
v4.0.x: mpi/c: Fix MPI_TYPE_CREATE_F90_{REAL,COMPLEX}
2018-12-17 13:21:21 -07:00
KAWASHIMA Takahiro
8eb90ae9aa mpi/c: Fix MPI_TYPE_CREATE_F90_{REAL,COMPLEX}
This commit fixes edge cases of `r = 38` and `r = 308`.

As defined in the MPI standard, `TYPE_CREATE_F90_REAL` and
`TYPE_CREATE_F90_COMPLEX` must be consistent with the Fortran
`SELECTED_REAL_KIND` function. The `SELECTED_REAL_KIND` function is
defined based on the `RANGE` function. The `RANGE` function returns
`INT(MIN(LOG10(HUGE(X)), -LOG10(TINY(X))))` for a real value `X`.

The old code considers only `INT(LOG10(HUGE(X)))` using `*_MAX_10_EXP`.
This commit adds `INT(-LOG10(TINY(X)))` part using `*_MIN_10_EXP`.

This bug affected the following `p`-`r` combinations.

| p             | r   | expected  | returned  | expected  | returned  |
| :------------ | --: | :-------- | :-------- | :-------  | :-------- |
| MPI_UNDEFINED |  38 | REAL8     | REAL4     | COMPLEX16 | COMPLEX8  |
| 0 <= p <= 6   |  38 | REAL8     | REAL4     | COMPLEX16 | COMPLEX8  |
| MPI_UNDEFINED | 308 | REAL16    | REAL8     | COMPLEX32 | COMPLEX16 |
| 0 <= p <= 15  | 308 | REAL16    | REAL8     | COMPLEX32 | COMPLEX16 |

MPICH returns the same result as Open MPI with this fix.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 6fb01f64fe)
2018-12-13 16:01:56 +09:00
Gilles Gouaillardet
a79ce7d17f mpiext: updates for header file locations
Per discussion on https://github.com/open-mpi/ompi/pull/6030
and https://github.com/open-mpi/ompi/pull/6145, move
around where MPI extension header files are installed (specifically:
the installation tree path does not need to match the source tree
path).

For reference, header files were installed like this :

 - <prefix>/include/openmpi/ompi/mpiext/pcollreq/mpif-h/mpiext_pcollreq_mpifh.h
 - <prefix>/include/openmpi/ompi/mpiext/pcollreq/c/mpiext_pcollreq_c.h

and they are now installed like this :

 - <prefix>/include/openmpi/mpiext/mpiext_pcollreq_mpifh.h
 - <prefix>/include/openmpi/mpiext/mpiext_pcollreq_c.h

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@975e3cd0c9)
2018-12-12 09:24:45 +09:00
Gilles Gouaillardet
0ade49c286 mpi/c: add back (some more) deprecated subroutines
- MPI_NULL_DELETE_FN
 - MPI_NULL_COPY_FN
 - MPI_DUP_FN

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 5a968306d6)
2018-12-11 09:55:33 -06:00
Bert Wesarg
5e4a6db23b Re-add removed deprecate-only MPI-2.0 symbols
See #6114

Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
(cherry picked from commit b3f3281290)
2018-12-11 09:55:33 -06:00
Matias A Cabral
b2327049c1 MTL/PSM2: add missing default priority
Missing default priority after PR #6153

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
(cherry picked from commit c76c6d8b28)
2018-12-07 16:22:59 -08:00
Matias A Cabral
80113a368f MTL/PSM2: Do not lower the priority when all processes are local.
The intention of lowering the priority when all processes are local
was to favor Vader BTL. However, in builds including the OFI MTL it
gets selected instead.

Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
(cherry picked from commit fc8582c560)
2018-12-07 11:11:43 -08:00
Howard Pritchard
804f65f247
Merge pull request #6035 from ggouaillardet/topic/v4.0.x/mpiext_cuda
mpiext/cuda: do not include automatically generated file into dist ta…
2018-12-04 09:26:55 -07:00
Geoff Paulsen
752bbd195f
Merge pull request #6102 from hoopoepg/topic/set-osc-ucx-level-200-v4.0
OSC: set UCX module used by default - v4.0
2018-12-04 10:26:37 -06:00
Geoff Paulsen
03cf3e4400
Merge pull request #6112 from kawashima-fj/pr/v4.0.x/update-pcoll-doc
v4.0.x: README & man: Update pcollreq documentation
2018-11-30 13:58:34 -06:00
Geoff Paulsen
bd2990f502
Merge pull request #6131 from devreal/rdma-plug-memleak-v4.0.x
v4.0.x: Plug two memory leaks in rdma osc
2018-11-30 13:54:51 -06:00
Joseph Schuchart
c5346751e6 Plug two memory leaks in rdma osc
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 91885f5876)
2018-11-29 10:19:26 -05:00
Sergey Oblomov
6651672711 OSC/UCX: set max level value to 60
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2d230b3aac)
2018-11-27 20:35:30 +02:00
Yossi Itigin
a112d10c93 pml_ucx: initialize req_mpi_object.comm for error handler
without this fix, an error handler invoked on pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm

(picked from master f36eeef)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-11-26 11:57:34 +02:00
KAWASHIMA Takahiro
6f68483fd5 README & man: Update pcollreq documentation
The feature of persistent collectives is approved in the Sept. 2018
MPI Forum meeting and 2018 Draft Specification of the MPI standard is
published during SC18.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 5f0fcf0f45)
2018-11-26 18:28:08 +09:00
Sergey Oblomov
38a4953707 OSC/UCX: added UCX version evaluation
- added UCX version evaluation to set OSC UCX priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e91f214982)
2018-11-22 11:31:53 +02:00
Sergey Oblomov
012e27af77 OSC: set UCX module used by default
- OSC/UCX module set priority to 200 to be used by default

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 36934a8bb2)
2018-11-22 10:59:43 +02:00
Howard Pritchard
8adaeb1536
Merge pull request #6007 from aravindksg/coll-tuned-fix-40x
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
2018-11-19 13:15:40 -07:00
Howard Pritchard
ec79631ba2
Merge pull request #5936 from edgargabriel/pr/testmpio-v4.0.x
Pr/testmpio v4.0.x
2018-11-19 13:11:50 -07:00
Gilles Gouaillardet
9366c6eb2e mpiext/cuda: do not include automatically generated file into dist tarball
ompi/mpiext/cuda/c/mpiext_cuda_c.h is automatically generated from
ompi/mpiext/cuda/c/mpiext_cuda_c.h.in at configure time.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@f8318f0a8f)
(cherry picked from commit open-mpi/ompi@b3ce25af95)
2018-11-13 00:09:01 -06:00
Jeff Squyres
d0efdfd9c8 MPI_Type_get_envelope: remove MPI-1 deleted names
Several names are now no longer returned by MPI_Type_get_envelope.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 65eb118e08)
2018-11-06 10:07:05 -08:00
Geoffrey Paulsen
2d3b4bb91a mpi.h: restore some MPI-deprecated items to default builds
Commit 89da9651b inadvertantly #if'ed out both deprecated *and*
removed items from mpi.h.  The intent was only to #if out items that
have been *removed* from the MPI specification and leave all items
that are merely deprecated.

This commit also re-orders the deleted typedef+functions to be in the
same order as they are listed in MPI-3.1 chapter 17, just to make
verifying/checking the code easier.

Note that --enable-mpi1-compatibility can still be used to restore
prototypes for the items that have been removed from the MPI
specification (e.g., MPI_Address()).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b03a39d359)
2018-11-02 14:07:26 -05:00
Howard Pritchard
e851879081
Merge pull request #5994 from rhc54/cmr40/cleanup
Remove stale defunct tools
2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan
5a74ddb34d coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f)
2018-10-31 11:37:29 -07:00
Ralph Castain
ba6ad9fe42 Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 05ac8fa71c)
2018-10-30 08:51:25 -07:00
Sergey Oblomov
0846c9d112 COMMON/UCX: added error code to log output
Also fixes a PGI compilation error with --enable-debug.

Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 1099d5f023)
2018-10-30 09:55:25 -05:00
Ralph Castain
712ddd326f Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8eb)
2018-10-30 07:54:35 -07:00
Howard Pritchard
f9d2f3b912
Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x
v4.0.x: remove the bfo pml
2018-10-22 09:50:05 -06:00
Howard Pritchard
a806d09450 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 7d6774acf8)
2018-10-17 14:00:11 -06:00
Edgar Gabriel
278ecf2205 io/ompio: add verification for data representations.
check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:48 -05:00
Edgar Gabriel
a07c9e96b1 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:35 -05:00
Edgar Gabriel
96c1a5b9dc common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:19 -05:00
Edgar Gabriel
425a71799e common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from  a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:04 -05:00
Edgar Gabriel
c65dda6f5f io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position where to
seek to, in case SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:21:47 -05:00
Howard Pritchard
cd7d70156c
Merge pull request #5899 from jsquyres/pr/v4.0.x/fix-c99-comments-in-mpih
v4.0.x: mpi.h.in: remove C99-style comments
2018-10-16 16:33:08 -06:00
Howard Pritchard
2752d43f65
Merge pull request #5875 from kawashima-fj/pr/v4.0.x/javadoc-tag
v4.0.x: java: Fix javadoc build failure with OpenJDK 11
2018-10-16 16:29:34 -06:00
Howard Pritchard
7ceb508b93
Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x
pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x
2018-10-16 16:29:01 -06:00
Howard Pritchard
e2cf1e3ec5
Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x
osc_ucx: fix hang/timeout in component finalize - v4.0
2018-10-16 09:21:50 -06:00
Howard Pritchard
753087ab17
Merge pull request #5888 from hoopoepg/topic/fixed-zero-size-window-v4.0
OSC/UCX: fixed zero-size window processing - v4.0.x
2018-10-16 09:21:08 -06:00
Gilles Gouaillardet
2b5a7ca816 fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@e4001040b4)
2018-10-12 14:10:45 +09:00
Jeff Squyres
600967d2ed mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4b3ccabf7)
2018-10-11 11:54:30 -04:00
Yossi Itigin
eabc94cab0 osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

(picked from master b8e1af6)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 23:02:19 +03:00
Yossi Itigin
4a97d6b9fa pml_ucx: fix return code from mca_pml_ucx_init()
(picked from master 40ac9e4)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:49 +03:00
Yossi Itigin
1bffd196ef pml_ucx: add ompi datatype attribute to release ucp_datatype
(picked from master 4763822)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:26 +03:00
Sergey Oblomov
274cbc3c03 OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ae6f81983f)
2018-10-10 16:49:02 +03:00
Howard Pritchard
d18ea98263
Merge pull request #5843 from kawashima-fj/pr/v4.0.x/correct-f08-signatures
v4.0.x: fortran/use-mpi-f08: Correct f08 routine signatures
2018-10-09 10:22:07 -05:00
KAWASHIMA Takahiro
dd1b3eac1e java: Fix javadoc build failure with OpenJDK 11
OpenJDK 11 changed the default javadoc output HTML version to HTML 5
from HTML 4.01. It causes an error on building Open MPI configured
with `--enable-mpi-java` (default: disable). This fix is compatible
with older OpenJDK.

I don't know whether this problem exists with other vender's JDKs.
But this fix should be compatible with other JDKs because the new
syntax is used in other places in the same file.

Thanks to Siegmar Gross for the bug report.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit b491b454dc)
2018-10-09 21:48:10 +09:00
Geoff Paulsen
499ddedd7c
Merge pull request #5844 from kawashima-fj/pr/v4.0.x/pcollreq-f08-signatures
v4.0.x: mpiext/pcollreq: Correct f08 routine signatures
2018-10-05 13:42:35 -05:00
KAWASHIMA Takahiro
4dd21111f0 mpiext/pcollreq: Add Fortran bindings in man
Fortran bindings were added to persistent collectives in 9e0115c980
but man was not updated.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 43d85dbc81)
2018-10-05 09:43:39 +09:00
KAWASHIMA Takahiro
092cf1937d man: Correct markup of MPI_Neighbor_allgather
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 994b345253)
2018-10-05 09:43:39 +09:00
KAWASHIMA Takahiro
080c52f906 mpiext/pcollreq: Add missing f08 asynchronous
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit be91a26fd8)
2018-10-05 09:33:17 +09:00
KAWASHIMA Takahiro
fcc698f27f mpiext/pcollreq: Correct f08 routine signatures
Changes of nonblocking collectives in e98d794e8b and f750c6932c
are applied to persistent collectives.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 357531847e)
2018-10-05 09:33:16 +09:00
KAWASHIMA Takahiro
b9316d3136 fortran/use-mpi-f08: Correct f08 routine signatures
Following the commit f750c6932c, I compared
`ompi/mpi/fortran/use-mpi-f08/*.F90` and
`ompi/mpi/fortran/use-mpi-f08/profile/p*.F90`, and
`ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and
`ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`.

There are many differences. Some are bugs of `MPI_*`, some are
bugs of `PMPI_*`. I'm not sure how these bugs affect applications.

To make it easy to compare these files future, I also removed
editorial differences.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit cf6d28cb66)
2018-10-05 09:04:17 +09:00
Geoff Paulsen
c0796664b1
Merge pull request #5780 from jsquyres/pr/v4.0.x/moar-fortran-fixes
v4.0.x: Fortran 08 bindings fixes
2018-10-04 16:08:30 -05:00
Geoff Paulsen
5cae0ec25b
Merge pull request #5794 from bwbarrett/v4.0.x-ofi-mtl-selection
mtl ofi: Change from opt-in to opt-out provider selection
2018-10-03 08:31:07 -05:00
Jeff Squyres
46dd266e45 mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
When --enable-mpi1-compatibility was specified, the ompi_mpi_ub/lb
symbols were #if'ed out of mpi.h.  But the #defines for MPI_UB/LB
still remained.  This commit also #if's out the MPI_UB/LB macros when
--enable-mpi1-compatibility is specified.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7223334d4d)
2018-09-28 10:01:48 -07:00
Brian Barrett
10d0a430c4 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of thse blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, the can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit c5eaa38491)
2018-09-27 18:41:47 +00:00
Gilles Gouaillardet
ce5959ba6c fortran/use-mpi-f08: Corrections to PMPI signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit f750c6932c)
2018-09-26 12:34:46 -07:00
Philipp Otte
e98eae3da6 fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected
the C-bindings in Fortran accordingly

Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com>
(cherry picked from commit e98d794e8b)
2018-09-26 12:34:46 -07:00
Geoff Paulsen
9d9ae9286c
Merge pull request #5753 from gpaulsen/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-09-23 09:01:19 -05:00
Jeff Squyres
c83b30755a Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

(cherry picked from commit 89773c41)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-22 15:11:06 -05:00
Geoff Paulsen
3d4164e1e1
Merge pull request #5752 from gpaulsen/misc-warnings-fixes
Miscellaneous compiler warning stomps.
2018-09-22 15:01:53 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4)
2018-09-21 14:35:51 -05:00
Geoff Paulsen
4688da0631
Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0
MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0
2018-09-20 18:12:25 -05:00
Geoff Paulsen
1a65b0ab66
Merge pull request #5741 from ggouaillardet/topic/v4.0.x/use_mpi_f08_bindings
v4.0.x: fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings
2018-09-20 18:10:11 -05:00
Gilles Gouaillardet
d0a0fe818f fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c4ce01d104)
2018-09-20 10:37:33 +09:00
Gilles Gouaillardet
afb66d222b fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling
- do not generate bindings for pompi_FOO_f symbols
   (they are simply not used anywhere)

 - move ompi_FOO_f bindings out of mpi_f08.mod into
   ompi_mpifh_bindings.mod that is only used at build time

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c6070fd2e0)
2018-09-20 10:37:01 +09:00
Gilles Gouaillardet
03d994c9cf configury: do not define "dummy" empty targets any more.
We previously needed to have empty targets because AM couldn't handle
having an AM_CONDITIONAL was targets in the "if" statement but not in the
"else".  :-(

That now appears as an old automake bug that has been fixed,
so cleanup some Makefile.am

Thanks Jeff for the pointer.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@6e04b2a66a)
2018-09-20 10:36:41 +09:00
Gilles Gouaillardet
98156b7ace use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d2393251f7)
2018-09-20 10:36:01 +09:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9)
2018-09-19 10:47:27 +03:00
George Bosilca
8d892f9917 Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the numberr of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a5fbfa476a)
2018-09-18 12:05:51 -07:00
Howard Pritchard
3a584fee53
Merge pull request #5723 from ggouaillardet/topic/v4.0.x/libnbc_error_path
coll/libnbc: fix various error paths
2018-09-18 09:29:45 -06:00
KAWASHIMA Takahiro
e83e118ae7 mpiext/pcollreq: fix more typos
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 4a0a2598f6)
2018-09-18 15:43:06 +09:00
Gilles Gouaillardet
ece18aed45 coll/libnbc: fix various error paths
The parameter passed to NBC_Return_handle() was incorrectly casted
and not dereferenced.

Thanks Yossi for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8b51862fb2)
2018-09-18 15:29:33 +09:00
Gilles Gouaillardet
73f531a8f2 mpiext/pcollreq: fix misc typos
Thanks Jeff for the report

Fixes open-mpi/ompi#5712

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8dc6985a5a)
2018-09-18 12:47:04 +09:00
Geoff Paulsen
de0c595ca5
Merge pull request #5650 from matcabral/remove_psm2_shadow_env_40x
v4.0.x: MTL PSM2: Remove shadow variables from v4.0.x
2018-09-17 14:40:59 -05:00
Geoff Paulsen
17aab5ea5b
Merge pull request #5659 from ggouaillardet/topic/v4.0.x/misc_finalize_leaks
Plug misc leaks on MPI_Finalize()
2018-09-10 14:06:31 -05:00
KAWASHIMA Takahiro
6858028596 mpiext/pcollreq: Fix zero-count reduction
We need to return a persistent request.
`ompi_request_empty` is not a persistent request.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit 69901a5156)
2018-09-10 13:11:59 +09:00
Gilles Gouaillardet
ff8600f2e4 ompi/hook: plug a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b79b37465c)
2018-09-10 09:21:49 +09:00
Gilles Gouaillardet
4bd5c538a2 pml/ob1: plug a memory leak in mca_pml_ob1_component_fini()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fed33c1530)
2018-09-10 09:21:12 +09:00
Gilles Gouaillardet
c767c63a3b ompi/info: plug memory leaks in ompi_mpiinfo_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d0d399c9a9)
2018-09-10 09:18:15 +09:00
Gilles Gouaillardet
080e20fa02 mtl/psm2: fix a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@316e4e38f4)
2018-09-10 09:17:54 +09:00
matcabral
8fa172e60b MTL PSM2: Remove shadow variables from v4.0.x
As agreed on #4574, where removed in past release branches
to avoid perfomance impacts in the default values for
some paramters.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-09-05 18:44:40 -04:00
Howard Pritchard
7e10bc0833
Merge pull request #5607 from edgargabriel/pr/sharedfp-naming-conflict-v4.0
sharedfp/sm and lockedfile: fix naming bug
2018-09-02 16:03:14 -04:00
Geoff Paulsen
3282c61048
Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0
PML/UCX: blocked calls optimizations - v4.0
2018-08-31 14:11:11 -05:00
Geoff Paulsen
334748753c
Merge pull request #5626 from hoopoepg/topic/opal-mem-hooks-syno-v4.0
MCA/COMMON/UCX: added synonim to opal_mem_hook variable - v4.0
2018-08-31 14:09:14 -05:00
Geoff Paulsen
51e685ff40
Merge pull request #5622 from aravindksg/ofi_race_fix_40x
MTL OFI: Fix race condition due to global progress entries array
2018-08-31 14:07:42 -05:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added synonim to common ucx variables to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
2018-08-29 15:17:00 +03:00
Sergey Oblomov
9215eb9a3b PML/UCX: blocked calls optimizations
- refactoring of opal/UCX progress calls
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit b0f87f2235)
2018-08-29 14:38:22 +03:00
Aravind Gopalakrishnan
37d1a202be MTL OFI: Fix race condition due to global progress entries array
Since progress entries array is globally allocated, it is susceptible
to race conditions when using multi-threaded applications. Allocating it
on the stack resolves any potential races as it is thread local by default.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit ed2343034d)
2018-08-28 14:23:56 -07:00
Edgar Gabriel
2e3cf6fb12 io/base: fixes to file_delete selection logic
file_delete triggers underneath the hood the full component selection
logic, since we do not have a file handle, just a file name.

As part of the selection logic, we have to however initiate the
framework-open of the fs component in case of ompio, since ompio
will call the delete function of the selected fs componentn, which
is based on the file system where the file is located.

This was not handled correctly so far. The problem however only
shows up if the first I/O operatin to be executed is a file_delete,
other wise the file_open will lead to the correct opening and initialization
of the fs framework. This commit ensures that we do the right thing
even if file_delete is the first file I/O operation in the application.

Fixes issue #5611

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-28 08:18:59 -05:00
Edgar Gabriel
a489a6fc9d sharedfp/sm and lockedfile: fix naming bug
If an application opens a file for reading from multiple processes
using MPI_COMM_SELF (or another communicator that has distinct
process groups but the same comm-id, as can happen as the result
of comm_split), the naming chosen for the lockedfile or the mmapped
file used by the sharedfp/sm component would collide. This patch
ensures that the filename is different by integrating the process id
of rank 0 for each sub-communicator.

This fixes one aspect of the problem reported in github issue 5593

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-27 14:11:03 -05:00
Howard Pritchard
37440aca90
Merge pull request #5497 from markalle/apply_romio314_patch_to_v40x
v4.0.x: apply romio314 patch to romio321
2018-08-25 11:12:08 -04:00
Howard Pritchard
b926c35df0
Merge pull request #5562 from edgargabriel/pr/file_open_sharedfp_ordering_v4.0x
common/ompio: fix an ordering problem during file_open
2018-08-21 22:17:45 -04:00
Howard Pritchard
4c8852c2c8
Merge pull request #5555 from karasevb/v4.0.x_pmix_fence_status
v4.0.x/pmix: added check for pmix fence status
2018-08-21 09:28:17 -06:00
Edgar Gabriel
2da601a350 common/ompio: fix an ordering problem during file_open
the sharedfp component has to be selected and opened before
we set the default file view during file_open. Otherwise
there is a sperious error message from the sharefp_file_seek
operation that is called during the file_set_view.

Fixes Issue #5560

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-20 10:23:32 -05:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Jeff Squyres
7f443a159a fortran/use TKR: remove excess declaration for PMPI_Type_extent
This declaration was accidentally left behind in 89da9651bb.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 8a0b5454ae)
2018-08-16 13:13:14 -07:00
Howard Pritchard
cdc315c1ac
Merge pull request #5523 from tkordenbrock/topic/v4.0.x/fix.PtlMEUnlink.in.use
v4.0.x: coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
2018-08-13 14:19:10 -06:00
Howard Pritchard
7b6a2da71a
Merge pull request #5504 from rhc54/cmr40/ofi
MTL OFI: send/isend split into blocking/non-blocking paths
2018-08-13 14:18:05 -06:00
Todd Kordenbrock
36369f9133 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f3f2a826b4)
2018-08-07 11:23:51 -05:00
Howard Pritchard
9a6f6e61f0
Merge pull request #5499 from nrspruit/ns_cancel_fix_4.0
MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
2018-08-07 09:16:56 -06:00
Howard Pritchard
2386994c9d
Merge pull request #5495 from hoopoepg/topic/ucx-init-c99-v4.0
PML/SPML/UCX: init global objects using C99 style - v4.0
2018-08-04 16:03:56 -06:00
Spruit, Neil R
1fbbae1907 MTL OFI: send/isend split into blocking/non-blocking paths
-Updated blocking send to directly call functionality and
set completion events expected to 0 initally. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.

-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.

-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 7dc8c8ba3f)
2018-08-01 06:45:48 -07:00
Ralph Castain
7830d9971e
Merge pull request #5467 from rhc54/cmr40/ofi
MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
2018-07-31 13:08:03 -07:00
Mark Allen
e2b6e9ee09 apply romio314 patch to romio321
When romio314 was first pulled in an extra patch was applied to it, see commit
92f6c7c1e2. Most of that patch is already present
in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't.

If that macro doesn't set err_ then some paths end up with a variable being used
uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c
to see what happens with error_code. It's an uninitialized stack variable that goes
through three MPIO_CHECK_* macros none of which set it. The macros consistently set
error_code to a failure if they see something wrong, but they don't consistently
set it to success when things are fine.

And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value
of error_code that was never set.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit f413ef6b14)
2018-07-30 17:23:59 -04:00
Spruit, Neil R
9cc6bc1ea6 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.

- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 767135c580)
2018-07-30 07:17:40 -07:00
Sergey Oblomov
b64502977a PML/SPML/UCX: init global objects using C99 style
- to avoid value mix used C99 style of object initializations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
2018-07-28 16:47:43 +03:00
Mikhail Kurnosov
c540dfb18c coll-base-allgather: fix MPI_IN_PLACE processing
The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 540c2d1)
2018-07-25 08:11:28 +07:00
Spruit, Neil R
ac8d2e01f9 MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
- Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN
  from the provider and correctly attempt to progress the OFI Completion
  queue by calling ompi_mtl_ofi_progress.

- If events were pending that blocked OFI operations from being enqueued
  they will be completed and the OFI operation will be retried once
  ompi_mtl_ofi_progress has successfully completed.

- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of
  requiring the existance of a "ret" variable to pass back the return
  value from completing the OFI operation.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit d4f408a7f8)
2018-07-23 11:14:42 -07:00
Sergey Oblomov
af0e7b190e PML/UCX: fixed ucp request free on persistent request completion
- in sine cases persistent request was deleted during completion
  callback, this cause double free of linked UCX request (assert
  in debug build or hang in release build)
- UCX request is freed prior completion callback

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6fe0a73861)
2018-07-20 22:20:14 +03:00
Sergey Oblomov
74d6ad09bc OSC/UCX: fixed hang on OSC init
- there worked progress was missed on startup which caused hang
  on one of ranks

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a081fba046)
2018-07-19 15:23:01 +03:00
Edgar Gabriel
b6b9552ca9
Merge pull request #5444 from gbossu/fix-file-delete
io/ompio: Call component-specific file_delete function instead of POSIX unlink
2018-07-18 08:45:57 -05:00
Gilles Gouaillardet
fed1e7766e
Merge pull request #5430 from ggouaillardet/pr/pcollreq-fort
mpiext/pcollreq: add Fortran bindings
2018-07-18 09:52:59 +09:00
Joshua Ladd
3add13c72e
Merge pull request #5441 from hoopoepg/topic/ucx-memhooks-to-common-module
MCA/COMMON/UCX: shift opal memhooks into common UCX
2018-07-17 15:52:44 -04:00
Matias Cabral
be3cb01cb4
Merge pull request #5397 from nrspruit/ns_ofi_mtl_ssend
MTL OFI: Redesign sync send with reduced tag bits and quick ack
2018-07-17 10:14:33 -07:00
Gaëtan Bossu
8522ba112c MCA/IO/OMPIO: fix MPI_File_delete implementation.
OMPIO now uses the correct delete function depending on the fs

mca_common_ompio_file_delete now works this way instead
of calling POSIX unlink:
 - create a minimal file handle with the given file name
 - select the best fs component using this file handle
 - call the component-specific file delete function

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:17:13 +02:00
Gaëtan Bossu
ac6f75e3d1 MCA/FS: check communicator validity in query functions
It is needed because the fs components might be queried due to a MPI_File_delete call.
And in this case, we don't have a communicator value.

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:16:21 +02:00
Josh Hursey
9aa5168795
Merge pull request #5353 from ggouaillardet/topic/romio321_grequests
io/romio321: make grequest extensions internal
2018-07-17 10:53:53 -05:00
Gilles Gouaillardet
1a41482720 coll/libnbc: do not recursively call opal_progress()
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs open-mpi/ompi#3901

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 09:45:08 -06:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Gilles Gouaillardet
47351b7fac mpiext/pcollreq: Add Fortran use-mpi-f08 bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:41 +09:00
Kurita, Takehiro
73e038ec18 mpiext/pcollreq: Add Fortran use-mpi bindings
Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-07-17 16:29:41 +09:00
Gilles Gouaillardet
9e0115c980 mpiext/pcollreq: Add Fortran mpif-h bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
Gilles Gouaillardet
44110a575d mpiext/pcollreq: do include PMPIX_* subroutines to C bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
KAWASHIMA Takahiro
5ddf0f6418 mpi/fortran: Fix IN_PLACE detection of ISCATTER(V)
Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in 506d0e96f4
but noblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-17 14:15:21 +09:00
Mikhail Kurnosov
ba83cc91eb coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather
Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-16 08:56:09 -06:00
Gilles Gouaillardet
61b3308871 mpiext/pcollreq: check subroutine parameters and add profiling symbols
- check subroutine parameters
 - implement PMPIX_* subroutines

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Gilles Gouaillardet
dec1663364 spc: add missing subroutines
add counters for :
 - MPI_Exscan
 - MPI_Iexscan
 - MPI_Igatherv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Howard Pritchard
9a5fd48388
Merge pull request #5079 from jsquyres/pr/fortran-is-the-devil
status_set_cancelled: fix F08 binding
2018-07-13 15:36:02 -05:00
Joshua Ladd
b12868239c
Merge pull request #4765 from xinzhao3/topic/osc-ucx-mem-hook
OMPI/OSC/UCX: move memory hooks init in osc to win creation.
2018-07-13 09:36:20 -04:00
Xin Zhao
74ef51af1b OMPI/OSC/UCX: move memory hooks init in osc to win creation.
Move memory hooks init (for request based operation) in osc ucx to window
creation time, to avoid performance issue in MPI initialization.

Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-12 15:03:02 -07:00
Nathan Hjelm
304a6a52d4 osc/rdma: use local base for local process when possible
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
KAWASHIMA Takahiro
c87a3df0c9
Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings
coll/libnbc: Suppress compiler warnings
2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro
37a05e74aa coll/libnbc: Suppress compiler warnings
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro
0021616984 pml/ob1: Fix data corruption of MPI_BSEND
Data transferred by `MPI_BSEND` may corrupt if all of the following
conditions are met.

- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
  for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
  returns.

The problem is in the way of pending a send request in ob1 PML.
The `mca_pml_ob1_send_request_start_copy` function retruns
`OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. Next time
the `mca_pml_ob1_send_request_start_copy` function tries sending,
the user buffer may have been overwritten by the MPI program.

Call hierarchy of `MPI_BSEND`:

```
  MPI_Bsend
    mca_pml_ob1_send
      if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
        mca_pml_ob1_isend
          MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
            mca_pml_ob1_send_request_start_seq
              mca_pml_ob1_send_request_start_btl
                if (size <= eager_limit)
                  if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
                    mca_pml_ob1_send_request_start_copy
                      mca_bml_base_alloc
                        btl_alloc
              if (OMPI_ERR_OUT_OF_RESOURCE == rc)
                add_request_to_send_pending
        ompi_request_free
```

To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.

This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:30:58 +09:00
Howard Pritchard
34bc77747c
Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place
coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
2018-07-11 18:34:35 -05:00
Nathan Hjelm
35a75a6bf5 osc/sm: avoid filename collision when multiple windows share same CID
This commit fixes an issue identified by MTT where we can have two
different sets of processes on the same node creating a shared memory
window with communicators sharing the same CID. To avoid this issue
the temporary filename now includes the creating processes vpid.

References #5363

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-11 14:32:27 -06:00
Nathan Hjelm
037656bc1d osc/rdma: fix bug introduced in b90c838
This commit fixes an bug that was introduced back in 2016 which
impacts request-based RMA in some cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 18:17:55 -06:00
Gilles Gouaillardet
76292951e5 coll/libnbc: fix integer overflow
Use internal pack/unpack subroutines that operate on MPI_Aint instead of int
and hence solve some integer overflows.

Thanks Clyde Stanfield for reporting this issue.

Refs open-mpi/ompi#5383

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-09 10:08:33 -06:00
Mikhail Kurnosov
22fa5a8a67 coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree
Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-09 10:04:41 -06:00
Spruit, Neil R
9a17864278 MTL OFI: Redesign sync send with reduced tag bits and quick ack
-Updated the design for sync send MPI calls to use 2 protocol bits for
denoting "sync_send" or "sync_send_ack".

-"Sync_send" is added to the send tag only and is masked out in receives
such that it can be read by the original Recv posted in the send/recv
operation.

-"Sync_send_ack" is sent from the recv callback to the send side. This 0
byte send does not generate a completion entry and instead sends the
message and immediately completes the opal completion in the recv.

-Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2
more tag bits per format type due to the reduced protocal bits required
by OMPI.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-09 06:50:21 -07:00
Yossi Itigin
e77e31b50b
Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging
MCA/COMMON/UCX: unified logging across all UCX modules
2018-07-08 12:45:26 +03:00
Mikhail Kurnosov
b9e14cd7d0 coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-07 20:59:39 +07:00
Sergey Oblomov
240670152e MCA/COMMON/UCX: code beautify - alignment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 19:40:58 +03:00
Sergey Oblomov
eb7010933d OSC/UCX: suppressed compilation warnings
- suppressed sing/unsign-compare warnings

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 10:58:09 +03:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
c2bd6af9f2 MCA/COMMON/UCX: minor unification of del_proces calls
- some common functionality of del_procs calls is moved into
  mca_common module
- blocking ucp_put call is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Yossi Itigin
09c10d5e09
Merge pull request #5345 from hoopoepg/topic/pml-ucx-suppress-compiler-warning
PML/UCX: suppressed compilation warning
2018-07-02 13:41:12 +03:00
Edgar Gabriel
d191ed6b4f fs/base: move redundant code to fs/base
moving some code from fs/ufs into fs/base. The benefit of this approach is
that fs components that are fundamentally based on posix I/O (and only differ
in some non-posix functionality such as setting stripe size, or which hints
are being supported) can avoid having to replicate the same code over and
over again. First beneficiary is the lustre fs component, but more
are to follow soon.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-07-01 10:20:32 -05:00
Xin Zhao
c1ac0c00c5
Merge pull request #5185 from jjolly/fix-memcpy-size-mismatch
- Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c
2018-06-29 19:37:53 -07:00
Jeff Squyres
f4320193e3 mpi.h.in: remove some deprecation/removed warnings
Intentionally do not mark some MPI-1 function pointer typedefs as
`__mpi_interface_removed__` because we have to use them in prototyping
some MPI-1 functions when `--enable-mpi1-compatibility` is used.

Fixes #5357.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-29 07:43:51 -07:00
Gilles Gouaillardet
7363906e4e io/romio321: make grequest extensions internal
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-29 16:41:27 +09:00
Jeff Squyres
c1ccbece2f
Merge pull request #5347 from jsquyres/pr/fix-f90-removed-interfaces
F90 removed interfaces: add missing "end interface"
2018-06-27 13:54:02 -04:00
Jeff Squyres
768b800533 F90 removed interfaces: add missing "end interface"
Thanks to @fsciortino for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-27 13:02:16 -04:00
Sergey Oblomov
074f30ba27 PML/UCX: suppressed compilation warning
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-27 12:05:07 +03:00
Yossi Itigin
aca61a6bfb
Merge pull request #5238 from hoopoepg/topic/fixed-coverity-issues-ucx-pml
UCX/PML: fixed few coverity issues
2018-06-27 11:14:06 +03:00
Nathan Hjelm
4c230683e7 osc/sm: fix a typo
This commit fixes a typo where a bcast is used instead of the intended
collective (barrier).

References #5262

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-26 12:53:12 -06:00
Sergey Oblomov
502d04bf12 UCX/PML/SPML: fixed few coverity issues
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for woker flush failure

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Yossi Itigin
ee873f4f79
Merge pull request #5322 from hoopoepg/topic/mca-ucx-common
MCA/UCX: added common module
2018-06-26 13:54:12 +03:00
Gilles Gouaillardet
e609cf7bc3
Merge pull request #5337 from ggouaillardet/topic/generalized_requests
ompi/requests: implement generalized request extensions
2018-06-26 13:01:04 +09:00
KAWASHIMA Takahiro
a8da78eeaa
Merge pull request #4618 from ggouaillardet/topic/pcoll
Add the persistent collectives feature
2018-06-26 12:36:34 +09:00
Gilles Gouaillardet
5c394377d0 io/romio312: use Grequest extensions provided by Open MPI
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:18 +09:00
Gilles Gouaillardet
f72922b8b1 io/romio321: do not use removed MPI1 primitives
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:18 +09:00
Gilles Gouaillardet
383f23bf35 ompi/request: implement MPI Generalized request extensions
so latest ROM-IO can be used with Open MPI.

Note this first and naive implementation does not use the wait_fn callback.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:18 +09:00
Gilles Gouaillardet
1e5404873f io/romio321: update .gitignore
and remove two files that should have never been commited

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:17 +09:00
Joshua Ladd
256ad707f1
Merge pull request #5293 from yosefe/topic/osc-ucx-on-demand-progress
osc_ucx: register progress on-demand
2018-06-25 15:09:11 -04:00
Joshua Ladd
98afc838aa
Merge pull request #5294 from yosefe/topic/coll-hcoll-progress-fn
coll_hcoll: register progress callback directly without a proxy
2018-06-25 15:07:26 -04:00
Nathan Hjelm
e4989714c2 osc/rdma: fix data race on teardown
The osc/rdma module did not wait for all pending atomics to complete
before tearing down. This could lead to weird issues as the target
location may no longer be registered or allocated.

This commit also fixes an offset calculation issue in
ompi_osc_get_data_blocking ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-25 11:47:34 -06:00
Nathan Hjelm
c9e58cedc1 mpi.h: fix warning with gcc
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-25 11:45:36 -06:00
Ralph Castain
3b2390e5d5 Silence coverity warnings, remove/ignore build product
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-25 08:01:28 -07:00
Sergey Oblomov
bf7fd480e9 MCA/COMMON/UCX: added non-blocking implementations of atomics
- added implementation of swap/cswap/fadd operations
- blocking add64 is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 12:25:31 +03:00
Sergey Oblomov
63e7ba6843 MCA/COMMON/UCX: added parameter for UCX/opal progress
- added parameter to set UCX/opal progresses
- minor refactoring of request wait routines

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 11:00:12 +03:00
Yossi Itigin
e3ee11608b coll_hcoll: register progress callback directly without a proxy
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-24 18:06:07 +03:00
Edgar Gabriel
edfdcb6e82
Merge pull request #5324 from edgargabriel/pr/minor-fixes
Pr/minor fixes
2018-06-22 17:20:02 -05:00
Howard Pritchard
8babaad35c
Merge pull request #4520 from ggouaillardet/refresh/romio321
io/romio321: refresh ROMIO based on latest stable MPICH 3.2.1
2018-06-22 16:58:46 -05:00
Edgar Gabriel
cf5cdad40f fcoll: make vulcan the default component
make vulcan the default component except for Lustre file systems.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-22 14:12:02 -05:00
Edgar Gabriel
fd8c5fba4e common/ompio: fix the fview based grouping options
a bug sneaked into constructing the list of aggregators
processes when using the fileview based grouping options

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-22 14:01:31 -05:00
Sergey Oblomov
d57ae62dee MCA/UCX: added common module
- implemented non-blocking routines for flush operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-22 16:41:09 +03:00
Gilles Gouaillardet
edd02b7144 pml/ucx: silence a warning
declare 'fenced' volatile in order to silence CID 1437465

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-22 13:11:42 +09:00
Edgar Gabriel
743e0dff5a common/ompio: fix zero size fview issue
handle the situation where the user requests a non-zero amount
of data but has a zero-size fileview. My instrinct would have been
to return an error code, but according to the test that I used
it should be MPI_SUCCESS and zero bytes. It is definitely better
than segfaulting :-)

THis makes another test from the IBM testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 17:02:13 -05:00
Edgar Gabriel
7643ccfbcf sharedfp/sm and sharedfp/lockedfile: fix seek offset calculation
the seek offset calculation did not treat the offset as a multiple
of the etype provided. Fixing this makes some more ibm tests pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 14:26:36 -05:00
Mikhail Kurnosov
c500739293 coll/base: Add MPI_Bcast based on a scatter followed by an allgather
Implements MPI_Bcast using a binomial tree scatter followed by
an recursive doubling allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-21 11:47:07 -06:00
Edgar Gabriel
fb16d40775
Merge pull request #5196 from edgargabriel/topic/cuda
io/ompio: introduce initial support for cuda buffers in ompio
2018-06-21 10:14:43 -05:00
Edgar Gabriel
7808379a47 common/ompio: incorporate George's comments
incorporate a couple of comments by George as part of the
review on github.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 09:29:49 -05:00