1
1
Граф коммитов

10503 Коммитов

Автор SHA1 Сообщение Дата
Joseph Schuchart
a193ae26bf Fix regression introduced earlier by re-adding a barrier after shared memory has been registered
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2018-10-24 15:54:19 -04:00
Aurelien Bouteiller
96c91e94eb
Manage errors in communicator creations (cid)
In order for this to work, error management needs to also be added to
NBC, from separate PR

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

The error field of requests needs to be rearmed at start, not at create

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-10-23 23:43:33 -04:00
Yossi Itigin
4c442f2601
Merge pull request #5934 from hoopoepg/topic/suppressed-cov-warn-added-log-msg
COMMON/UCX: suppressed coverity warnings
2018-10-22 11:00:47 +03:00
Mikhail Kurnosov
8b511c7889 coll/libnbc/ireduce: silence Coverity warning CID 1440360
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-22 11:20:28 +07:00
Sergey Oblomov
1099d5f023 COMMON/UCX: added error code to log output
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-21 11:37:25 +03:00
Nathan Hjelm
a66373454e
Merge pull request #5943 from bosilca/fix/libnbc_warnings
Remove few warnings in libnbc identified by clang-1000.11.45.2
2018-10-20 21:24:30 -06:00
Bernhard M. Wiedemann
bc23993dea Allow to override build user and host
using the standard $USER and $HOSTNAME environment variables
to make reproducible builds possible.
See https://reproducible-builds.org/ for why this is good.

This helps improve issue #3759

Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
2018-10-20 09:27:00 -04:00
bosilca
c3abedbd2c
Merge pull request #5759 from bosilca/fix/monitoring
Fix/monitoring
2018-10-19 07:18:41 -07:00
Nathan Hjelm
dbae9c0958 romio/romio321: silence some compiler warnings
Some compilers complain when comparing signed and unsigned. romio321
was doing just this. The check is meant to check whether a size (which
is an ADIO_Offset-- a signed number) will work with memcpy which takes
a size_t. To silence the warning I added a new type (ADIO_Size) which
is an unsigned type and cast the ADIO_Offset to this new type.

Fixes #5951

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-10-18 13:36:51 -06:00
George Bosilca
dc972f0b92
Fix the PML monitoring.
The monitoring PML hides it's existence from the OMPI infrastructure by
removing itself from the list of PML loaded components, remaining hidden
until MPI_Finalize.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-18 00:29:23 -04:00
George Bosilca
668aa15dda
Early selection of the best PML.
With this patch the best PML is selected earlier, before finalizing
the others PML. This provides a simpler mechanism to intercept and
highjack the PML (as done in the monitoring PML)

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-18 00:29:23 -04:00
Mikhail Kurnosov
73e048b62a coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce
An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce.

This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by an allgather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-18 08:50:16 +07:00
Ralph Castain
1bd772e8eb Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-10-17 15:11:38 -07:00
George Bosilca
66182a294d
Remove few warnings in libnbc identified by clang-1000.11.45.2
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-10-17 18:04:39 -04:00
Howard Pritchard
a435bfe1cf
Merge pull request #5933 from hppritcha/topic/remove_bfo_pml
remove the bfo pml
2018-10-17 09:39:58 -06:00
Nathan Hjelm
43547ade4c
Merge pull request #5663 from mkurnosov/coll-ireduce-rabenseifner
coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
2018-10-17 09:02:06 -06:00
Nathan Hjelm
979a199b4f
Merge pull request #5896 from mkurnosov/coll-iallgather-recursivedoubling
coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
2018-10-17 09:01:14 -06:00
Sergey Oblomov
df765595e3 COMMON/UCX: suppressed coverity warnings
- suppressed coverity warnings - added log messages on failed calls

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-17 16:11:03 +03:00
Howard Pritchard
7d6774acf8 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-10-17 06:50:11 -06:00
Nathan Hjelm
1ff3cfedb6
Merge pull request #5921 from devreal/ompi-rdma-preinit
RDMA OSC: initialize segment memory before registering the segment
2018-10-16 15:10:02 -06:00
Joseph Schuchart
d9dcdfdfba RDMA OSC: initialize segment memory before registering the segment
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2018-10-16 16:12:14 -04:00
Edgar Gabriel
069084e6ad
Merge pull request #5907 from edgargabriel/topic/testmpio-fixes
Topic/testmpio fixes
2018-10-16 13:03:22 -07:00
Edgar Gabriel
ba95588332 io/ompio: add verification for data representations.
check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-16 12:45:33 -05:00
Jeff Squyres
54ca3310ea ompi: cleanup various string operations
Several fixes to string handling:

1. strncpy() -> opal_string_copy() (because opal_string_copy()
   guarantees to NULL-terminate, and strncpy() does not)
2. Simplify a few places, such as:
   * Since opal_string_copy() guarantees to NULL terminate, eliminate
     some memsets(), etc.
   * Use opal_asprintf() to eliminate multi-step string creation

There's more work that could be done; e.g., this commit doesn't
attempt to clean up any strcpy() usage.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-10-14 16:10:20 -07:00
Yossi Itigin
a5b1c9a91d
Merge pull request #5898 from yosefe/topic/pml-ucx-init-err-code
pml_ucx: fix return code from mca_pml_ucx_init() error flow
2018-10-14 11:34:00 +03:00
Gilles Gouaillardet
0a09b0419e
Merge pull request #5812 from ggouaillardet/topic/mpi_sizeof_misc_additions
fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
2018-10-12 14:08:27 +09:00
Edgar Gabriel
849d0452a0 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 17:39:05 -05:00
Edgar Gabriel
bf058ca6b0 common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 14:43:32 -05:00
Edgar Gabriel
05d25383c2 common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from  a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 14:41:58 -05:00
Edgar Gabriel
c0d7b578be io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position where to
seek to, in case SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-11 14:09:03 -05:00
Yossi Itigin
b71e85b8d5 pml_ucx: fix return code from mca_pml_ucx_init() error flow
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-11 18:48:54 +03:00
Jeff Squyres
f4b3ccabf7 mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-10-11 10:58:06 -04:00
Mikhail Kurnosov
a7386c1e09 coll/libnbc: add recursive doubling algorithm for MPI_Iallgather
Implements recursive doubling algorithm for MPI_Iallgather.
The algorithm can be used only for power-of-two number of processes.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-11 21:43:13 +07:00
Yossi Itigin
b8e1af6fcb osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:16 +03:00
Yossi Itigin
bcc48515e4 Revert "osc_ucx: fix hang/timeout in component finalize"
This reverts commit 438d13b4ca1e7333b789ca3fb536fda17b0feb38.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:47:13 +03:00
Yossi Itigin
27d8c8e83c
Merge pull request #5878 from yosefe/topic/pml-ucx-fix-datatype-leak
pml_ucx: add ompi datatype attribute to release ucp_datatype
2018-10-10 20:18:39 +03:00
Yossi Itigin
a012ee91d8
Merge pull request #5886 from yosefe/topic/osc-ucx-fix-finalize-hang
osc_ucx: fix hang/timeout in component finalize
2018-10-10 16:29:29 +03:00
Yossi Itigin
9a365555b0
Merge pull request #5879 from hoopoepg/topic/fixed-zero-size-window
OSC/UCX: fixed zero-size window processing
2018-10-10 16:28:55 +03:00
Yossi Itigin
40ac9e4771 pml_ucx: fix return code from mca_pml_ucx_init()
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:41:05 +03:00
Yossi Itigin
dc6809495d osc_ucx: fix hang/timeout in component finalize
Add barrier to make sure all endpoints are destroyed before destroying
the worker.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 14:38:06 +03:00
Sergey Oblomov
ae6f81983f OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-10-10 13:08:01 +03:00
Nathan Hjelm
32682aa2c0
Merge pull request #5772 from mkurnosov/coll-ibcast-knomial
coll/libnbc: add knomial tree algorithm for MPI_Ibcast
2018-10-09 16:26:13 -06:00
Jeff Squyres
bb13941b69
Merge pull request #5811 from ggouaillardet/topic/mpi_f08_c_types
fortran/use-mpi-f08: add MPI C types
2018-10-09 13:17:30 -04:00
Yossi Itigin
4763822a64 pml_ucx: add ompi datatype attribute to release ucp_datatype
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-09 17:34:34 +03:00
Mikhail Kurnosov
b0429d25df coll/libnbc: add knomial tree algorithm for MPI_Ibcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-09 20:43:04 +07:00
Mikhail Kurnosov
7bd63e79c8 coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce
An implementation of R. Rabenseifner's algorithm for MPI_Ireduce.
This algorithm is a combination of a reduce-scatter implemented with recursive vector halving
and recursive distance doubling, followed either by a gather.

Limitations:
-- count >= 2^{\floor{\log_2 p}}
-- commutative operations only
-- intra-communicators only

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-09 20:27:09 +07:00
KAWASHIMA Takahiro
b491b454dc java: Fix javadoc build failure with OpenJDK 11
OpenJDK 11 changed the default javadoc output HTML version to HTML 5
from HTML 4.01. It causes an error on building Open MPI configured
with `--enable-mpi-java` (default: disable). This fix is compatible
with older OpenJDK.

I don't know whether this problem exists with other vender's JDKs.
But this fix should be compatible with other JDKs because the new
syntax is used in other places in the same file.

Thanks to Siegmar Gross for the bug report.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-10-09 17:49:30 +09:00
Brian Barrett
e9e4d2a4bc Handle asprintf errors with opal_asprintf wrapper
The Open MPI code base assumed that asprintf always behaved like
the FreeBSD variant, where ptr is set to NULL on error.  However,
the C standard (and Linux) only guarantee that the return code will
be -1 on error and leave ptr undefined.  Rather than fix all the
usage in the code, we use opal_asprintf() wrapper instead, which
guarantees the BSD-like behavior of ptr always being set to NULL.
In addition to being correct, this will fix many, many warnings
in the Open MPI code base.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-08 16:43:53 -07:00
Mikhail Kurnosov
9557fa087f Resolve merge conflicts
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-10-05 21:40:27 +07:00
KAWASHIMA Takahiro
5f1c940c8b
Merge pull request #5840 from kawashima-fj/pr/pcollreq-f08-signatures
mpiext/pcollreq: Correct f08 routine signatures
2018-10-05 08:59:03 +09:00
KAWASHIMA Takahiro
43d85dbc81 mpiext/pcollreq: Add Fortran bindings in man
Fortran bindings were added to persistent collectives in 9e0115c980
but man was not updated.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-10-04 21:05:38 +09:00
KAWASHIMA Takahiro
994b345253 man: Correct markup of MPI_Neighbor_allgather
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-10-04 21:02:35 +09:00
KAWASHIMA Takahiro
be91a26fd8 mpiext/pcollreq: Add missing f08 asynchronous
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-10-04 20:36:30 +09:00
KAWASHIMA Takahiro
357531847e mpiext/pcollreq: Correct f08 routine signatures
Changes of nonblocking collectives in e98d794e8b and f750c6932c
are applied to persistent collectives.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-10-04 19:51:40 +09:00
Nathan Hjelm
88a560fa3c
Merge pull request #5744 from mkurnosov/coll-iscan-recursivedoubling
coll/libnbc: add recursive doubling algorithm for MPI_Iscan
2018-10-03 09:02:05 -06:00
Gilles Gouaillardet
69f1a19c5d fortran/use-mpi-f08: add MPI C types
Refers open-mpi/ompi#5801

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-03 16:09:00 +09:00
KAWASHIMA Takahiro
eb65e1c6fb
Merge pull request #5799 from kawashima-fj/pr/correct-f08-signatures
fortran/use-mpi-f08: Correct f08 routine signatures
2018-10-03 10:37:21 +09:00
Brian Barrett
b2ee56aa81 fortran: Fix ident warning
On OS X, where #pragma ident and #ident aren't supported, the
use of a static const star that was never used was generating
a warning (and, it should be noted, was useless, because the
compiler would optimize it away).  Fix up the ident declaration
so that it is only created once in libmpi_mpifh.la.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-02 13:35:15 -04:00
Brian Barrett
2e24e6ec08 coll libnbc: Remove dead code
Remove dead code that was causing warnings about unused static
functions.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-10-02 13:35:15 -04:00
Gilles Gouaillardet
e4001040b4 fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-10-01 13:37:05 +09:00
KAWASHIMA Takahiro
cf6d28cb66 fortran/use-mpi-f08: Correct f08 routine signatures
Following the commit f750c6932c, I compared
`ompi/mpi/fortran/use-mpi-f08/*.F90` and
`ompi/mpi/fortran/use-mpi-f08/profile/p*.F90`, and
`ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and
`ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`.

There are many differences. Some are bugs of `MPI_*`, some are
bugs of `PMPI_*`. I'm not sure how these bugs affect applications.

To make it easy to compare these files future, I also removed
editorial differences.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-09-29 01:39:01 +09:00
Jeff Squyres
7223334d4d mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
When --enable-mpi1-compatibility was specified, the ompi_mpi_ub/lb
symbols were #if'ed out of mpi.h.  But the #defines for MPI_UB/LB
still remained.  This commit also #if's out the MPI_UB/LB macros when
--enable-mpi1-compatibility is specified.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-28 09:10:03 -07:00
Jeff Squyres
11ab621555 mpi.h: file errhandeler typedef: use new form of name
The old/deprecated form of the file errhandler typedef used "fn" as a
suffix.  The new form uses the name "function".

The MPI API typedef name has already been updated to use "function";
this commit updates the internal Open MPI typedef to use the name
"function" to match the MPI API name and avoid confusion.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-28 07:49:28 -07:00
Brian Barrett
c5eaa38491 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of thse blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, the can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-09-27 11:02:18 -07:00
Jeff Squyres
6bb356ab87 Squash a bunch of harmless compiler warnings.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-26 12:15:21 -07:00
Gilles Gouaillardet
f750c6932c fortran/use-mpi-f08: Corrections to PMPI signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-25 11:17:01 +09:00
Philipp Otte
e98d794e8b fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected
the C-bindings in Fortran accordingly

Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com>
2018-09-25 11:16:52 +09:00
Mikhail Kurnosov
dfe203e167 coll/libnbc: add recursive doubling algorithm for MPI_Iexscan
Implements recursive doubling algorithm for MPI_Iexscan.
The algorithm preserves order of operations so it can be used both
by commutative and non-commutative operations.

The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic
algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-09-23 19:54:27 +07:00
Mikhail Kurnosov
3d43ff0f32 coll/libnbc: add recursive doubling algorithm for MPI_Iscan
Implements recursive doubling algorithm for MPI_Iscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations.

The MCA parameter coll_libnbc_iscan_algorithm was added for dynamic algorithm selection.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-09-22 21:09:12 +07:00
bosilca
3f598e9e83
Merge pull request #5450 from mkurnosov/coll-base-allgather-fix-in-place
coll-base-allgather: fix MPI_IN_PLACE processing
2018-09-21 14:51:45 -04:00
bosilca
17f1684438
Merge pull request #5491 from mkurnosov/coll-base-allgatherv-fix-mpi-in-place
coll/base/allgatherv: fix MPI_IN_PLACE processing
2018-09-21 14:48:16 -04:00
bosilca
441727fcb0
Merge pull request #5680 from ggouaillardet/topic/nbc_unpack
coll/libnbc: fix NBC_Unpack()
2018-09-19 10:17:21 -04:00
bosilca
2ae3cfd9bc
Merge pull request #5699 from ICLDisco/export/coll_errors
Error cases in base collectives
2018-09-19 09:47:24 -04:00
Jeff Squyres
3dae8703a5
Merge pull request #5451 from ggouaillardet/topic/use_mpi_f08_bindings
fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings
2018-09-19 07:53:04 -04:00
Kurita, Takehiro
fb8311d331 java: Fix typos of javadoc
Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-09-19 14:45:17 +09:00
Gilles Gouaillardet
c4ce01d104 fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-19 14:13:00 +09:00
Gilles Gouaillardet
c6070fd2e0 fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling
- do not generate bindings for pompi_FOO_f symbols
   (they are simply not used anywhere)

 - move ompi_FOO_f bindings out of mpi_f08.mod into
   ompi_mpifh_bindings.mod that is only used at build time

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-19 14:12:55 +09:00
Gilles Gouaillardet
6e04b2a66a configury: do not define "dummy" empty targets any more.
We previously needed to have empty targets because AM couldn't handle
having an AM_CONDITIONAL was targets in the "if" statement but not in the
"else".  :-(

That now appears as an old automake bug that has been fixed,
so cleanup some Makefile.am

Thanks Jeff for the pointer.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-19 14:12:50 +09:00
Gilles Gouaillardet
d2393251f7 use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-19 14:12:46 +09:00
Jeff Squyres
09d6740a72
Merge pull request #4897 from bosilca/topic/waitsome
Be conservative with the array_of_indices
2018-09-18 12:34:22 -04:00
KAWASHIMA Takahiro
e27f519a5e
Merge pull request #5722 from kawashima-fj/pr/pcoll-typo
mpiext/pcollreq: fix more typos
2018-09-18 15:41:20 +09:00
KAWASHIMA Takahiro
4a0a2598f6 mpiext/pcollreq: fix more typos
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-09-18 14:45:33 +09:00
Gilles Gouaillardet
8b51862fb2 coll/libnbc: fix various error paths
The parameter passed to NBC_Return_handle() was incorrectly casted
and not dereferenced.

Thanks Yossi for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-18 13:17:43 +09:00
Gilles Gouaillardet
8dc6985a5a mpiext/pcollreq: fix misc typos
Thanks Jeff for the report

Fixes open-mpi/ompi#5712

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-18 10:02:04 +09:00
Jeff Squyres
5be0ba0247 common/monitoring: fix include files
Move includes to top of file.  Set some #defines so that
monitoring_prof.c compiles without warning (as identified by gcc 8 on
MacOS).  Also ensure to include the internal Open MPI "mpi.h" file
(not some random system <mpi.h> file).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-15 06:04:13 -07:00
Jeff Squyres
06c1bf73da libnbc: remove some stale/dead code
Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it
turns out that we never call this function, anyway.  So just remove
it.

Fixes #5670.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-15 06:04:13 -07:00
Jeff Squyres
8f2620d3af misc: compiler warning fixes
A variety of small compiler warning fixes.  The 2 PMIx fixes are
already committed upstream.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-15 06:04:13 -07:00
Nathan Hjelm
1071d72130
Merge pull request #5445 from hjelmn/asm_type
Update opal to use C11 atomics if available
2018-09-14 12:32:56 -06:00
Aurelien Bouteiller
466217fadd
Always return a valid error code from collective operations
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2018-09-14 13:46:35 -04:00
Nathan Hjelm
000f9eed4d opal: add types for atomic variables
This commit updates the entire codebase to use specific opal types for
all atomic variables. This is a change from the prior atomic support
which required the use of the volatile keyword. This is the first step
towards implementing support for C11 atomics as that interface
requires the use of types declared with the _Atomic keyword.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 10:48:55 -06:00
Ralph Castain
466cad6cb2 Update master to PMIx v4
Retain ext3x for PMIx 3  compatibility
Get the blasted permissions correct on config files

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-13 08:24:17 -07:00
Gilles Gouaillardet
ff48e92864 coll/libnbc: fix NBC_Unpack()
always initialize 'size'.

Only the a2a_sched_diss() alltoall algorithm is impacted,
and this algo is currently unused, so there is no need
to backport nor update the NEWS file for now.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-09-13 10:55:29 +09:00
KAWASHIMA Takahiro
69901a5156 mpiext/pcollreq: Fix zero-count reduction
We need to return a persistent request.
`ompi_request_empty` is not a persistent request.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-09-10 11:19:07 +09:00
Gilles Gouaillardet
42b0e3bd61
Merge pull request #5494 from markalle/apply_romio314_patch_to_master
apply romio314 patch to romio321
2018-09-05 10:27:19 +09:00
Nathan Hjelm
1c89631db5
Merge pull request #5630 from hjelmn/osc_portals4_c99
osc/portal: use c99 subobject naming to initialize module
2018-08-30 09:20:55 -06:00
Gilles Gouaillardet
b79b37465c ompi/hook: plug a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-08-30 10:07:18 +09:00
Gilles Gouaillardet
316e4e38f4 mtl/psm2: fix a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-08-30 10:07:17 +09:00
Gilles Gouaillardet
fed33c1530 pml/ob1: plug a memory leak in mca_pml_ob1_component_fini()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-08-30 10:07:17 +09:00
Gilles Gouaillardet
d0d399c9a9 ompi/info: plug memory leaks in ompi_mpiinfo_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-08-30 10:07:17 +09:00
Nathan Hjelm
7fdf887937 osc/portal: use c99 subobject naming to initialize module
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-29 10:34:24 -06:00
Yossi Itigin
68206a5635
Merge pull request #5569 from hoopoepg/topic/optimize-blocked-calls
PML/UCX: blocked calls optimizations
2018-08-29 14:19:09 +03:00
Yossi Itigin
4bb6845888
Merge pull request #5570 from hoopoepg/topic/opal-mem-hooks-syno
MCA/COMMON/UCX: added synonym to opal_mem_hook variable
2018-08-29 14:16:33 +03:00
Edgar Gabriel
2303f0f17c io/base: fixes to file_delete selection logic
file_delete triggers underneath the hood the full component selection
logic, since we do not have a file handle, just a file name.

As part of the selection logic, we have to however initiate the
framework-open of the fs component in case of ompio, since ompio
will call the delete function of the selected fs componentn, which
is based on the file system where the file is located.

This was not handled correctly so far. The problem however only
shows up if the first I/O operatin to be executed is a file_delete,
other wise the file_open will lead to the correct opening and initialization
of the fs framework. This commit ensures that we do the right thing
even if file_delete is the first file I/O operation in the application.

Fixes issue #5611

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-27 16:01:48 -05:00
Edgar Gabriel
9b65ec9445 sharedfp/sm and lockedfile: fix naming bug
If an application opens a file for reading from multiple processes
using MPI_COMM_SELF (or another communicator that has distinct
process groups but the same comm-id, as can happen as the result
of comm_split), the naming chosen for the lockedfile or the mmapped
file used by the sharedfp/sm component would collide. This patch
ensures that the filename is different by integrating the process id
of rank 0 for each sub-communicator.

This fixes one aspect of the problem reported in github issue 5593

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-27 12:40:49 -05:00
Nathan Hjelm
2221720fc7
Merge pull request #5591 from hjelmn/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-08-27 10:05:50 -06:00
Ralph Castain
b68ac81efd
Merge pull request #5590 from aravindksg/aravindksg/thread_settings
MTL OFI: Ask for FI_THREAD_DOMAIN support as needed
2018-08-27 06:33:57 -07:00
Sergey Oblomov
c201c0abb3 PML/UCX: blocked calls optimizations: removed reset progress count
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:39 +03:00
Sergey Oblomov
2cd9e04166 PML/UCX: optimization of mprobe call - renamed vars
- renamed of internal variable names
- used unsigned datatypes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:39 +03:00
Sergey Oblomov
38e908f83e PML/UCX: optimization of mprobe call
- refactoring of opal/UCX progress calls

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:38 +03:00
Sergey Oblomov
b0f87f2235 PML/UCX: blocked calls optimizations
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-27 09:50:38 +03:00
Sergey Oblomov
b72dd83f05 MCA/COMMON/UCX: added synonims for common ucx variables
- added synonims for atomic/osc modules

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-26 18:25:21 +03:00
Jeff Squyres
fe0852bcb4 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-24 07:39:14 -07:00
Nathan Hjelm
feb0e90301
Merge pull request #5589 from hjelmn/threads_cleanup
config: remove OPAL_ENABLE_MULTI_THREADS config macro
2018-08-23 15:43:13 -06:00
Nathan Hjelm
d0cd80e902 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-23 15:40:21 -06:00
Aravind Gopalakrishnan
5cbcae79d8 MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE
When an application is not using multiple threads to call into MPI, we can
safely ask for FI_THREAD_DOMAIN setting from the provider as it should
translate to the least amount of locking in provider.

Conversely, for applications using THREAD_MULTIPLE, explicitly ask for
FI_THREAD_SAFE to prevent race conditions.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-08-23 14:18:32 -07:00
Nathan Hjelm
1c84f48640 config: remove OPAL_ENABLE_MULTI_THREADS config macro
We long ago hard-coded this value to 1. This commit cleans it out
entirely.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-23 13:47:02 -06:00
Ralph Castain
f7655280cb
Merge pull request #5503 from aravindksg/aravindksg/fix_ofi_race
MTL OFI: Fix race condition due to global progress entries array
2018-08-22 14:31:38 -07:00
Nathan Hjelm
29320872b3 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-21 15:54:53 -06:00
Nathan Hjelm
438c40de03 osc/pt2pt: use c99 for module initialization
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-21 11:23:33 -06:00
Sergey Oblomov
e00f7a68ba MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added synonim to opal_mem_hook variable to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-08-21 15:05:12 +03:00
Edgar Gabriel
e6a344ba63
Merge pull request #5561 from edgargabriel/pr/file_open_sharedfp_ordering
common/ompio: fix an ordering problem during file_open
2018-08-20 10:18:14 -05:00
Edgar Gabriel
eaabfdd028
Merge pull request #5539 from DDNStorage/ime-support
ompio: support for DDN's Infinite Memory Engine
2018-08-20 09:52:22 -05:00
Edgar Gabriel
2742273ee3 common/ompio: fix an ordering problem during file_open
the sharedfp component has to be selected and opened before
we set the default file view during file_open. Otherwise
there is a sperious error message from the sharefp_file_seek
operation that is called during the file_set_view.

Fixes Issue #5560

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-20 09:28:29 -05:00
Jeff Squyres
8a0b5454ae fortran/use TKR: remove excess declaration for PMPI_Type_extent
This declaration was accidentally left behind in 89da9651bb.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-16 10:31:41 -07:00
Gaëtan Bossu
ccc96efc2e DDN's Infinite Memory Engine support for OMPIO
Changes made:
 - Create a new fs component for IME
 - Create a new fbtl component for IME
 - Modify the close function of OMPIO to finalize IME if necessary

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>
2018-08-16 11:45:47 +02:00
Aravind Gopalakrishnan
ed2343034d MTL OFI: Fix race condition due to global progress entries array
Since progress entries array is globally allocated, it is susceptible
to race conditions when using multi-threaded applications. Allocating it
on the stack resolves any potential races as it is thread local by default.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-08-09 10:52:28 -07:00
Jeff Squyres
89773c41a2 Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 08:50:55 -07:00
Todd Kordenbrock
e9f378e851
Merge pull request #5500 from tkordenbrock/topic/master/fix.PtlMEUnlink.in.use
coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
2018-08-07 11:21:00 -05:00
Nathan Hjelm
c294bbc352
Merge pull request #5508 from hjelmn/fuzzy_match
Bring fuzzy matching support into master
2018-08-06 13:52:04 -06:00
Nathan Hjelm
eeae3f9b93
Merge pull request #5517 from bosilca/topic/treematch_warnings
Remove few warnings identified by @rhc in #5514.
2018-08-06 13:25:07 -06:00
Matthew Dosanjh
c8d13486cc Fixed promotion bug
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-06 12:56:36 -06:00
Boris Karasev
57683366ca pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-08-06 15:01:57 +06:00
George Bosilca
6d11a45f44
Remove few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 16:21:06 -04:00
George Bosilca
a5fbfa476a
Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the numberr of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 14:58:13 -04:00
Nathan Hjelm
dd74c6252f pml/ob1: custom matching cleanup and configury
This commit updates the new custom matching code in pml/ob1 so it can
not be enabled with a configure option. This commit also renames the
fuzzy-matching headers to avoid potential name conflicts and removes
the use of C reserved identifiers.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-02 13:06:19 -06:00
Matthew Dosanjh
572694b621 Adding custom match source.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-02 12:23:08 -06:00
Ralph Castain
1aef0a64aa
Merge pull request #5477 from nrspruit/ns_mtl_send_isend
MTL OFI: send/isend split into blocking/non-blocking paths
2018-07-31 13:08:37 -07:00
Ralph Castain
8744320a18
Merge pull request #5476 from nrspruit/ns_cancel_fix
MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
2018-07-31 13:07:41 -07:00
Todd Kordenbrock
f3f2a826b4 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
2018-07-31 10:20:55 -05:00
Mark Allen
f413ef6b14 apply romio314 patch to romio321
When romio314 was first pulled in an extra patch was applied to it, see commit
92f6c7c1e2. Most of that patch is already present
in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't.

If that macro doesn't set err_ then some paths end up with a variable being used
uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c
to see what happens with error_code. It's an uninitialized stack variable that goes
through three MPIO_CHECK_* macros none of which set it. The macros consistently set
error_code to a failure if they see something wrong, but they don't consistently
set it to success when things are fine.

And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value
of error_code that was never set.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-07-30 17:14:56 -04:00
Sergey Oblomov
d204b8a678 PML/SPML/UCX/COMPONENT: applied C99 initialization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-28 09:44:03 +03:00
Mikhail Kurnosov
b45e190e66 coll/base/allgatherv: fix MPI_IN_PLACE processing
The call of MPI_Allgatherv with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-27 09:34:17 +07:00
Sergey Oblomov
2806504290 PML/SPML/UCX: init global objects using C99 style
- to avoid value mix used C99 style of object initializations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-25 14:52:45 +03:00
Spruit, Neil R
7dc8c8ba3f MTL OFI: send/isend split into blocking/non-blocking paths
-Updated blocking send to directly call functionality and
set completion events expected to 0 initally. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.

-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.

-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-24 07:54:24 -07:00
Spruit, Neil R
767135c580 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.

- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-24 03:12:44 -07:00
Matias Cabral
d996f529c0 MTL OFI: Add support for mem_tag_format
OFI providers may reserve some of the upper bits of the tag for
internal usage and expose it using mem_tag_format. Check for that
and adjust communicator bits as needed.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2018-07-23 11:39:40 -07:00
Matias Cabral
30fb635836
Merge pull request #5446 from nrspruit/ns_mtl_ofi_overflow
MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
2018-07-20 14:53:53 -07:00
Sergey Oblomov
6fe0a73861 PML/UCX: fixed ucp request free on persistent request completion
- in sine cases persistent request was deleted during completion
  callback, this cause double free of linked UCX request (assert
  in debug build or hang in release build)
- UCX request is freed prior completion calback

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-20 19:32:20 +03:00
Yossi Itigin
bdb6ece3dd
Merge pull request #5452 from hoopoepg/topic/osc-ucx-fox-hang
OSC/UCX: fixed hang on OSC init
2018-07-19 13:57:51 +03:00
Sergey Oblomov
fa33e322e7 OSC/UCX: code deduplication
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-19 12:39:15 +03:00
Sergey Oblomov
6f0a7a2005 OSC/UCX: opal progress register/unregister optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-19 12:07:26 +03:00
Yossi Itigin
29812494f2
Merge pull request #5402 from hoopoepg/topic/common-del-procs
MCA/COMMON/UCX: del_procs calls are unified to common module
2018-07-19 11:19:45 +03:00
Sergey Oblomov
55b934bacf OSC/UCX: enable progress when at least one window is allocated
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-18 17:52:30 +03:00
Sergey Oblomov
a081fba046 OSC/UCX: fixed hang on OSC init
- there worked progress was missed on startup which caused hang
  on one of ranks

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-18 17:01:53 +03:00
Edgar Gabriel
b6b9552ca9
Merge pull request #5444 from gbossu/fix-file-delete
io/ompio: Call component-specific file_delete function instead of POSIX unlink
2018-07-18 08:45:57 -05:00
Sergey Oblomov
920cc2e0d9 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-18 07:37:25 +03:00
Mikhail Kurnosov
540c2d1617 coll-base-allgather: fix MPI_IN_PLACE processing
The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-18 10:27:00 +07:00
Gilles Gouaillardet
fed1e7766e
Merge pull request #5430 from ggouaillardet/pr/pcollreq-fort
mpiext/pcollreq: add Fortran bindings
2018-07-18 09:52:59 +09:00
Joshua Ladd
3add13c72e
Merge pull request #5441 from hoopoepg/topic/ucx-memhooks-to-common-module
MCA/COMMON/UCX: shift opal memhooks into common UCX
2018-07-17 15:52:44 -04:00
Matias Cabral
be3cb01cb4
Merge pull request #5397 from nrspruit/ns_ofi_mtl_ssend
MTL OFI: Redesign sync send with reduced tag bits and quick ack
2018-07-17 10:14:33 -07:00
Gaëtan Bossu
8522ba112c MCA/IO/OMPIO: fix MPI_File_delete implementation.
OMPIO now uses the correct delete function depending on the fs

mca_common_ompio_file_delete now works this way instead
of calling POSIX unlink:
 - create a minimal file handle with the given file name
 - select the best fs component using this file handle
 - call the component-specific file delete function

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:17:13 +02:00
Gaëtan Bossu
ac6f75e3d1 MCA/FS: check communicator validity in query functions
It is needed because the fs components might be queried due to a MPI_File_delete call.
And in this case, we don't have a communicator value.

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:16:21 +02:00
Josh Hursey
9aa5168795
Merge pull request #5353 from ggouaillardet/topic/romio321_grequests
io/romio321: make grequest extensions internal
2018-07-17 10:53:53 -05:00
Gilles Gouaillardet
1a41482720 coll/libnbc: do not recursively call opal_progress()
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs open-mpi/ompi#3901

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 09:45:08 -06:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Spruit, Neil R
d4f408a7f8 MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
- Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN
  from the provider and correctly attempt to progress the OFI Completion
  queue by calling ompi_mtl_ofi_progress.

- If events were pending that blocked OFI operations from being enqueued
  they will be completed and the OFI operation will be retried once
  ompi_mtl_ofi_progress has successfully completed.

- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of
  requiring the existance of a "ret" variable to pass back the return
  value from completing the OFI operation.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-17 03:00:38 -07:00
Gilles Gouaillardet
47351b7fac mpiext/pcollreq: Add Fortran use-mpi-f08 bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:41 +09:00
Kurita, Takehiro
73e038ec18 mpiext/pcollreq: Add Fortran use-mpi bindings
Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-07-17 16:29:41 +09:00
Gilles Gouaillardet
9e0115c980 mpiext/pcollreq: Add Fortran mpif-h bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
Gilles Gouaillardet
44110a575d mpiext/pcollreq: do include PMPIX_* subroutines to C bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
KAWASHIMA Takahiro
5ddf0f6418 mpi/fortran: Fix IN_PLACE detection of ISCATTER(V)
Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in 506d0e96f4
but noblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-17 14:15:21 +09:00
Mikhail Kurnosov
ba83cc91eb coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather
Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-16 08:56:09 -06:00
Gilles Gouaillardet
61b3308871 mpiext/pcollreq: check subroutine parameters and add profiling symbols
- check subroutine parameters
 - implement PMPIX_* subroutines

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Gilles Gouaillardet
dec1663364 spc: add missing subroutines
add counters for :
 - MPI_Exscan
 - MPI_Iexscan
 - MPI_Igatherv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Howard Pritchard
9a5fd48388
Merge pull request #5079 from jsquyres/pr/fortran-is-the-devil
status_set_cancelled: fix F08 binding
2018-07-13 15:36:02 -05:00
Joshua Ladd
b12868239c
Merge pull request #4765 from xinzhao3/topic/osc-ucx-mem-hook
OMPI/OSC/UCX: move memory hooks init in osc to win creation.
2018-07-13 09:36:20 -04:00
Xin Zhao
74ef51af1b OMPI/OSC/UCX: move memory hooks init in osc to win creation.
Move memory hooks init (for request based operation) in osc ucx to window
creation time, to avoid performance issue in MPI initialization.

Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-12 15:03:02 -07:00
Nathan Hjelm
304a6a52d4 osc/rdma: use local base for local process when possible
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
KAWASHIMA Takahiro
c87a3df0c9
Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings
coll/libnbc: Suppress compiler warnings
2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro
37a05e74aa coll/libnbc: Suppress compiler warnings
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro
0021616984 pml/ob1: Fix data corruption of MPI_BSEND
Data transferred by `MPI_BSEND` may corrupt if all of the following
conditions are met.

- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
  for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
  returns.

The problem is in the way of pending a send request in ob1 PML.
The `mca_pml_ob1_send_request_start_copy` function retruns
`OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. Next time
the `mca_pml_ob1_send_request_start_copy` function tries sending,
the user buffer may have been overwritten by the MPI program.

Call hierarchy of `MPI_BSEND`:

```
  MPI_Bsend
    mca_pml_ob1_send
      if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
        mca_pml_ob1_isend
          MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
            mca_pml_ob1_send_request_start_seq
              mca_pml_ob1_send_request_start_btl
                if (size <= eager_limit)
                  if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
                    mca_pml_ob1_send_request_start_copy
                      mca_bml_base_alloc
                        btl_alloc
              if (OMPI_ERR_OUT_OF_RESOURCE == rc)
                add_request_to_send_pending
        ompi_request_free
```

To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.

This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:30:58 +09:00
Howard Pritchard
34bc77747c
Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place
coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
2018-07-11 18:34:35 -05:00
Nathan Hjelm
35a75a6bf5 osc/sm: avoid filename collision when multiple windows share same CID
This commit fixes an issue identified by MTT where we can have two
different sets of processes on the same node creating a shared memory
window with communicators sharing the same CID. To avoid this issue
the temporary filename now includes the creating processes vpid.

References #5363

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-11 14:32:27 -06:00
Nathan Hjelm
037656bc1d osc/rdma: fix bug introduced in b90c838
This commit fixes an bug that was introduced back in 2016 which
impacts request-based RMA in some cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 18:17:55 -06:00
Gilles Gouaillardet
76292951e5 coll/libnbc: fix integer overflow
Use internal pack/unpack subroutines that operate on MPI_Aint instead of int
and hence solve some integer overflows.

Thanks Clyde Stanfield for reporting this issue.

Refs open-mpi/ompi#5383

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-09 10:08:33 -06:00
Mikhail Kurnosov
22fa5a8a67 coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree
Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-09 10:04:41 -06:00
Spruit, Neil R
9a17864278 MTL OFI: Redesign sync send with reduced tag bits and quick ack
-Updated the design for sync send MPI calls to use 2 protocol bits for
denoting "sync_send" or "sync_send_ack".

-"Sync_send" is added to the send tag only and is masked out in receives
such that it can be read by the original Recv posted in the send/recv
operation.

-"Sync_send_ack" is sent from the recv callback to the send side. This 0
byte send does not generate a completion entry and instead sends the
message and immediately completes the opal completion in the recv.

-Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2
more tag bits per format type due to the reduced protocal bits required
by OMPI.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-09 06:50:21 -07:00
Yossi Itigin
e77e31b50b
Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging
MCA/COMMON/UCX: unified logging across all UCX modules
2018-07-08 12:45:26 +03:00
Mikhail Kurnosov
b9e14cd7d0 coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-07 20:59:39 +07:00
Sergey Oblomov
240670152e MCA/COMMON/UCX: code beautify - alignment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 19:40:58 +03:00
Sergey Oblomov
eb7010933d OSC/UCX: suppressed compilation warnings
- suppressed sing/unsign-compare warnings

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 10:58:09 +03:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
c2bd6af9f2 MCA/COMMON/UCX: minor unification of del_proces calls
- some common functionality of del_procs calls is moved into
  mca_common module
- blocking ucp_put call is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Yossi Itigin
09c10d5e09
Merge pull request #5345 from hoopoepg/topic/pml-ucx-suppress-compiler-warning
PML/UCX: suppressed compilation warning
2018-07-02 13:41:12 +03:00
Edgar Gabriel
d191ed6b4f fs/base: move redundant code to fs/base
moving some code from fs/ufs into fs/base. The benefit of this approach is
that fs components that are fundamentally based on posix I/O (and only differ
in some non-posix functionality such as setting stripe size, or which hints
are being supported) can avoid having to replicate the same code over and
over again. First beneficiary is the lustre fs component, but more
are to follow soon.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-07-01 10:20:32 -05:00
Xin Zhao
c1ac0c00c5
Merge pull request #5185 from jjolly/fix-memcpy-size-mismatch
- Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c
2018-06-29 19:37:53 -07:00
Jeff Squyres
f4320193e3 mpi.h.in: remove some deprecation/removed warnings
Intentionally do not mark some MPI-1 function pointer typedefs as
`__mpi_interface_removed__` because we have to use them in prototyping
some MPI-1 functions when `--enable-mpi1-compatibility` is used.

Fixes #5357.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-29 07:43:51 -07:00
Gilles Gouaillardet
7363906e4e io/romio321: make grequest extensions internal
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-29 16:41:27 +09:00
Jeff Squyres
c1ccbece2f
Merge pull request #5347 from jsquyres/pr/fix-f90-removed-interfaces
F90 removed interfaces: add missing "end interface"
2018-06-27 13:54:02 -04:00
Jeff Squyres
768b800533 F90 removed interfaces: add missing "end interface"
Thanks to @fsciortino for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-27 13:02:16 -04:00
Sergey Oblomov
074f30ba27 PML/UCX: suppressed compilation warning
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-27 12:05:07 +03:00
Yossi Itigin
aca61a6bfb
Merge pull request #5238 from hoopoepg/topic/fixed-coverity-issues-ucx-pml
UCX/PML: fixed few coverity issues
2018-06-27 11:14:06 +03:00
Nathan Hjelm
4c230683e7 osc/sm: fix a typo
This commit fixes a typo where a bcast is used instead of the intended
collective (barrier).

References #5262

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-26 12:53:12 -06:00
Sergey Oblomov
502d04bf12 UCX/PML/SPML: fixed few coverity issues
- fixed incorrect pointer manipulation/free
- cleaned dead code
- minor optimization on process delete routine
- fixed error handling - free pointers
- added debug output for woker flush failure

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-26 18:52:39 +03:00
Yossi Itigin
ee873f4f79
Merge pull request #5322 from hoopoepg/topic/mca-ucx-common
MCA/UCX: added common module
2018-06-26 13:54:12 +03:00
Gilles Gouaillardet
e609cf7bc3
Merge pull request #5337 from ggouaillardet/topic/generalized_requests
ompi/requests: implement generalized request extensions
2018-06-26 13:01:04 +09:00
KAWASHIMA Takahiro
a8da78eeaa
Merge pull request #4618 from ggouaillardet/topic/pcoll
Add the persistent collectives feature
2018-06-26 12:36:34 +09:00
Gilles Gouaillardet
5c394377d0 io/romio312: use Grequest extensions provided by Open MPI
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:18 +09:00
Gilles Gouaillardet
f72922b8b1 io/romio321: do not use removed MPI1 primitives
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:18 +09:00
Gilles Gouaillardet
383f23bf35 ompi/request: implement MPI Generalized request extensions
so latest ROM-IO can be used with Open MPI.

Note this first and naive implementation does not use the wait_fn callback.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:18 +09:00
Gilles Gouaillardet
1e5404873f io/romio321: update .gitignore
and remove two files that should have never been commited

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-26 10:52:17 +09:00
Joshua Ladd
256ad707f1
Merge pull request #5293 from yosefe/topic/osc-ucx-on-demand-progress
osc_ucx: register progress on-demand
2018-06-25 15:09:11 -04:00
Joshua Ladd
98afc838aa
Merge pull request #5294 from yosefe/topic/coll-hcoll-progress-fn
coll_hcoll: register progress callback directly without a proxy
2018-06-25 15:07:26 -04:00
Nathan Hjelm
e4989714c2 osc/rdma: fix data race on teardown
The osc/rdma module did not wait for all pending atomics to complete
before tearing down. This could lead to weird issues as the target
location may no longer be registered or allocated.

This commit also fixes an offset calculation issue in
ompi_osc_get_data_blocking ().

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-25 11:47:34 -06:00
Nathan Hjelm
c9e58cedc1 mpi.h: fix warning with gcc
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-06-25 11:45:36 -06:00
Ralph Castain
3b2390e5d5 Silence coverity warnings, remove/ignore build product
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-06-25 08:01:28 -07:00
Sergey Oblomov
bf7fd480e9 MCA/COMMON/UCX: added non-blocking implementations of atomics
- added implementation of swap/cswap/fadd operations
- blocking add64 is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 12:25:31 +03:00
Sergey Oblomov
63e7ba6843 MCA/COMMON/UCX: added parameter for UCX/opal progress
- added parameter to set UCX/opal progresses
- minor refactoring of request wait routines

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-25 11:00:12 +03:00
Yossi Itigin
e3ee11608b coll_hcoll: register progress callback directly without a proxy
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-24 18:06:07 +03:00
Edgar Gabriel
edfdcb6e82
Merge pull request #5324 from edgargabriel/pr/minor-fixes
Pr/minor fixes
2018-06-22 17:20:02 -05:00
Howard Pritchard
8babaad35c
Merge pull request #4520 from ggouaillardet/refresh/romio321
io/romio321: refresh ROMIO based on latest stable MPICH 3.2.1
2018-06-22 16:58:46 -05:00
Edgar Gabriel
cf5cdad40f fcoll: make vulcan the default component
make vulcan the default component except for Lustre file systems.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-22 14:12:02 -05:00
Edgar Gabriel
fd8c5fba4e common/ompio: fix the fview based grouping options
a bug sneaked into constructing the list of aggregators
processes when using the fileview based grouping options

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-22 14:01:31 -05:00
Sergey Oblomov
d57ae62dee MCA/UCX: added common module
- implemented non-blocking routines for flush operations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-22 16:41:09 +03:00
Gilles Gouaillardet
edd02b7144 pml/ucx: silence a warning
declare 'fenced' volatile in order to silence CID 1437465

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-22 13:11:42 +09:00
Edgar Gabriel
743e0dff5a common/ompio: fix zero size fview issue
handle the situation where the user requests a non-zero amount
of data but has a zero-size fileview. My instrinct would have been
to return an error code, but according to the test that I used
it should be MPI_SUCCESS and zero bytes. It is definitely better
than segfaulting :-)

THis makes another test from the IBM testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 17:02:13 -05:00
Edgar Gabriel
7643ccfbcf sharedfp/sm and sharedfp/lockedfile: fix seek offset calculation
the seek offset calculation did not treat the offset as a multiple
of the etype provided. Fixing this makes some more ibm tests pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 14:26:36 -05:00
Mikhail Kurnosov
c500739293 coll/base: Add MPI_Bcast based on a scatter followed by an allgather
Implements MPI_Bcast using a binomial tree scatter followed by
an recursive doubling allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-21 11:47:07 -06:00
Edgar Gabriel
fb16d40775
Merge pull request #5196 from edgargabriel/topic/cuda
io/ompio: introduce initial support for cuda buffers in ompio
2018-06-21 10:14:43 -05:00
Edgar Gabriel
7808379a47 common/ompio: incorporate George's comments
incorporate a couple of comments by George as part of the
review on github.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 09:29:49 -05:00
Edgar Gabriel
3c10ed4ed1 common/ompio: use allocator to manage temporary buffers
use an allocator to manage temporary buffers when copying
unmanaged data from GPU buffer to host. This is necessary,
since the buffers have to be pinned for better performance,
which is an expensive operation.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 09:25:50 -05:00
Edgar Gabriel
ac79e576ef fcoll/base: do not use the two_phase compoment with CUDA support
the two_phase compoment does not work with some collective I/O
operations on CUDA buffers due to the data sieving (i.e.
both read and write operations) executed on some buffers, which are
not anticipated in the GPU buffer management of the code.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 09:25:50 -05:00
Edgar Gabriel
6a532101aa io/ompio and common/ompio: add initial support for cuda buffers in ompio
this commit adds the initial support for cuda buffers in ompio, for blocking
and non-blocking individual read and write operations.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-21 09:25:50 -05:00
Yossi Itigin
db26c08336
Merge pull request #5307 from hoopoepg/topic/async-progress-on-mpi-fin
PML/UCX: fixed hang on MPI_Finalize
2018-06-21 13:44:14 +03:00
Sergey Oblomov
5f03628560 PML/UCX: removed uneeded flush
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-21 12:40:46 +03:00
Sergey Oblomov
2745da7dcc PML/UCX: use non-blocking fence instead of async progress
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-21 09:46:03 +03:00
Edgar Gabriel
7bbeaf30ff
Merge pull request #5306 from edgargabriel/pr/minor-improvements
Pr/minor improvements
2018-06-20 08:43:41 -05:00
Sergey Oblomov
10f2d831ec PML/UCX: fixed hang on MPI_Finalize
- added async UCX progress thread to allow
  pending requests to complete

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-20 16:12:05 +03:00
Edgar Gabriel
0757cb11a8 fcoll/all components: minor updates
two minor updates:
 - in all components: use the fh->f_bytes_per_agg value
   (which might have been set by an info object) instead
   of re-reading the mca parameter
 - vulcan and dynamic_gen2: replace one allgather operation
   by an allreduce, since it is used to determine the sum
   of an array.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-20 07:47:29 -05:00
Gilles Gouaillardet
9d7f0e1c95 Replace MPI_Type_extent with MPI_Type_get_extent in ROMIO.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>

(back-ported from commit open-mpi/ompi@34ec0bd8ab)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:17 +09:00
Gilles Gouaillardet
11428e400a Replace MPI_Address with MPI_Get_address in ROMIO.
Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>

(back-ported from commit open-mpi/ompi@756cc67221)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:16 +09:00
Gilles Gouaillardet
ad8c49053d io/romio321: fix two more MPI-3 compliance issues
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(back-ported from commit open-mpi/ompi@ae17908f35)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:16 +09:00
Gilles Gouaillardet
e5460dcb4a io/romio: do not use removed functions
This commit attempts to update the romio io component to not use
functions removed in MPI-3.0 (2012). This is a first cut and will
probably need to be reviewed for correctness.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(back-ported from commit open-mpi/ompi@84765001aa)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:16 +09:00
Gilles Gouaillardet
c29301da95 io/romio321: fix minmax datatypes
romio assumes that all predefined datatypes are contiguous. Because of
the (terribly named) composed datatypes MPI_SHORT_INT, MPI_DOUBLE_INT,
MPI_LONG_INT, etc this is an incorrect assumption. The simplest way to
fix this is to override the MPI_Type_get_envelope and
MPI_Type_get_contents calls with calls that will work on these
datatypes. Note that not all calls to these MPI functions are
replaced, only the ones used when flattening a non-contiguous
datatype.

References #5009

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>

(back-ported from commit open-mpi/ompi@4d876ec6fe)

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:16 +09:00
Gilles Gouaillardet
4355a67740 ROMIO 3.2.1 refresh: add refresh notes
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:15 +09:00
Gilles Gouaillardet
bf23e843df ROMIO 3.2.1 refresh: remove old romio
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:15 +09:00
Gilles Gouaillardet
2f0db1945c ROMIO 3.2.1 refresh: patch mpich romio for OMPI
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:14 +09:00
Gilles Gouaillardet
2f391a99a7 ROMIO 3.2.1 refresh: import romio from mpich 3.2.1 tarball
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:14 +09:00
Gilles Gouaillardet
4272b57089 ROMIO 3.2.1 refresh: prepare new romio directory ompi/mca/io/romio321
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-20 14:28:13 +09:00
Edgar Gabriel
df4431bd48 io/ompio: add support for some info objects
add support for the info objects cb_buffer_size and collective_buffering.
Also, introduce a new mca parameter that allows to give feedback
on whether an info object is recognized (and honored).

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-19 19:34:36 -05:00
Mikhail Kurnosov
66bc86a25b Change the tree_next to a flexible array member
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-19 13:01:26 -06:00
Mikhail Kurnosov
6547b58316 coll/base: add knomial tree algorithm for MPI_Bcast
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-06-19 13:01:26 -06:00
Edgar Gabriel
c3ac06dc1b sharedfp/sm and lockedfile: fix coverty warnings
this commit fixes the coverty warnings CID 1437402 and
CID 1437401

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-19 10:04:51 -05:00
Yossi Itigin
c2fbf3a3e8 osc_ucx: register progress on-demand
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-19 12:47:08 +03:00
Edgar Gabriel
9986a15b57 sharedfp/individual: only complain about fseek if sharedfp operations are really in use
this component can only be used in very specific scenarios. However, since some file systems do not support file locking and processes might be distributed over multiple nodes (hence the sm sharedfp component is also inelligible), the component might be selected in some scenarios, even if an application does not intend to use shared file pointers.

Since the fseek_shared function is involved as part of the File_set_view operation, only complain about the inability to perform the seek_shared operation if actual shared file pointer operations are being used. This avoid spurious error values being returned.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-18 18:25:29 -05:00
Edgar Gabriel
bb1522472f
Merge pull request #5286 from edgargabriel/topic/sharedfp-revamp
sharedfp/all components: revamp internal operations
2018-06-18 16:09:54 -05:00
Edgar Gabriel
bc0f60dfd9 sharedfp/all components: revamp internal operations
this commit revamps the internal operations of the sharedfp components.
Specifically, it is focused around removing the second file_open
operation for shared file pointers. This makes the code more efficient.
Because of that, there is no necessity anymore for the sharedfp_lazy_open
mca parameter.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-18 14:34:05 -05:00
Yossi Itigin
733cac864a
Merge pull request #5282 from yosefe/topic/pml-ucx-opal-mem-hooks
pml_ucx: add option to use opal memhooks instead of ucx internal hooks
2018-06-18 19:07:01 +03:00
Gilles Gouaillardet
3f874c9857 spc: remove ompi_spc_get_count() prototype from ompi_spc.h
This function is only used in ompi_spc.c and is hence declared as static.
Remove its prototype from the header file in order to silence compiler warnings who will typically consider ompi_spc_get_count() as a declared but not defined function.

Fixes open-mpi/ompi#5279
Fixes open-mpi/ompi#5273

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-18 16:07:11 +09:00
Yossi Itigin
564f80d362 pml_ucx: add option to use opal memhooks instead of ucx internal hooks
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-06-17 15:30:44 +03:00
Gilles Gouaillardet
2caf1bf0e5
Merge pull request #5263 from ggouaillardet/topic/ompio_abstraction
ompio: fix abstraction
2018-06-16 23:29:29 +09:00
Matias A Cabral
e6674556aa MTL OFI: add support for FI_REMOTE_CQ_DATA.
Extend number of supported ranks with providers that support
FI_REMOTE_CQ_DATA. Add README file to OFI MTL
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-06-14 17:17:38 -07:00
Edgar Gabriel
d5bdcf8595 fs/pvfs2: fix compilation problem
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-14 09:30:45 -05:00
Howard Pritchard
7dcab6e4a4
Merge pull request #5269 from hppritcha/topic/squash_gcc7.3.0_warnings
topo/treematch - quash compiler warning
2018-06-13 21:13:04 -05:00
Gilles Gouaillardet
cd45c7abb6 ompio: misc renames
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-14 09:41:10 +09:00
Gilles Gouaillardet
36b35ae0db ompio: fix abstraction
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-14 09:41:10 +09:00
Howard Pritchard
64de269cc3 topo/treematch - quash compiler warning
quash a compiler warning showing up with gcc 7.3

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-06-13 16:34:17 -05:00
Thananon Patinyasakdikul
390d72addd
Merge pull request #4885 from davideberius/spc_pr
Initial Software-based Performance Counters PR
2018-06-12 14:04:49 -07:00
David Eberius
d377a6b6f4 Added Software-based Performance Counters driver code along with several counters.
This code is the implementation of Software-base Performance Counters as described in the paper 'Using Software-Base Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf).  More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI.

All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined.  The internal timer units have been changed to cycles to avoid division operations which was a large source of overhead as discussed in the paper.  Added a --with-spc configure option to enable SPCs in the Open MPI build.  This defines SOFTWARE_EVENTS_ENABLE.  Added an MCA parameter, mpi_spc_enable, for turning on specific counters.  Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize.  Added an SPC test and example.

Signed-off-by: David Eberius <deberius@vols.utk.edu>
2018-06-11 22:48:16 -04:00
KAWASHIMA Takahiro
a38e9e064f coll: Update COLL module interface version to 2.3.0
Members for persistent operations are added to the module structure
in a prior commit.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e12a5056f1 coll/libnbc: Rename internal functions
The `nbc_i*` functions don't start communication, but create a request.
`nbc_*_init` are appropriate names for them.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
5c21903477 coll/libnbc: Add assertion for NBC_A2A_DISS
Persistent operation for `NBC_A2A_DISS` is not supported currently.
Though the algorithm is not selected at all currently, I put an
assertion not to select it by mistake.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
0b8b0f8393 coll/libnbc: Implement MPI_STARTALL
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
ed0144bad4 coll/libnbc: Adapt local copy for persistent request
`NBC_Copy` shoud not be called in `MPI_*_INIT`.
`NBC_Sched_copy` should be called instead.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
5c5de3a4fb coll/libnbc: Fix handling of completed request
Because a persistent reuqest does not free its `schedule` object
when the communication completes, the `NBC_Progress` function cannot
determine the completion using `schedule`.

Without this change, a hang occurs when the `NBC_Progress` function
is called recursively through the `NBC_Start_round` function.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
8e5690bf5c coll/libnbc: Correct persistent request handling
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e72f510daf ompi/request: Add ompi_request_persistent_noop_create
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
Gilles Gouaillardet
9a63dacf1c mpi: check MPI_Start[all] is invoked on persistent requests
and errors otherwise.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
545e9af896 mpiext/pcollreq: Add MPIX_Bcast_init etc.
Until the MPI Forum decides to add the persistent collective
communication request feature to the MPI Standard, these functions
are supported through MPI extensions with the `MPIX_` prefix.

Only C bindings are supported currently.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro
e69e99575e coll: Enable func check in mca_coll_base_comm_select
Now libnbc COLL supports persistent collectives and all `*_init`
functions of the COLL interface are available. So let's enable the
check of availability of those functions on a communicator creation.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
a9609b6bf8 coll/libnbc: add persistent collectives implementation
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
KAWASHIMA Takahiro
a9fdea51aa coll: Add persistent collective communication request feature
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
c753e9baff coll/libnbc: code refactoring
prepare the upcoming persistent collectives by pre-factoring some code

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

fixup 808c3c62cd9475edd91ecde9d2d53b12e28b2c04
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
fe0bb6c310 coll/libnbc: misc revamp - merge NBC_Init_handle() into NBC_Schedule_request() - set schedule in NBC_Schedule_request instead of NBC_Start() - update NBC_Start() prototype
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
Gilles Gouaillardet
360a76f440 coll/libnbc: revamp ibcast and use NBC_Schedule_request()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-06-11 09:53:37 +09:00
Jeff Squyres
84701cd2b0
Merge pull request #5204 from markalle/info_snprintf
fix info-subscribe to use snprintf() and warn on long key
2018-06-08 15:22:55 -04:00
Edgar Gabriel
2d8a769bfd fcoll/static: remove component
now that we have a shiny new fcoll component, no need
to keep the static component around. No use for it anymore.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-08 07:39:46 -05:00
Edgar Gabriel
b27a40cdf9
Merge pull request #5246 from edgargabriel/topic/ibm-testsuite-fixes
Topic/ibm testsuite fixes
2018-06-08 06:06:49 -05:00
Yossi Itigin
fd12540751
Merge pull request #5227 from hoopoepg/topic/pml-ucx-hang-on-finalize
PML/UCX: fixed hand on MPI_Finalize
2018-06-08 13:19:49 +03:00
KAWASHIMA Takahiro
317e53f83f
Merge pull request #5243 from t-kurita/pr/mpiext-mpi-f08-logical
Fortran: Enable using `LOGICAL` parameter in MPI extensions.
2018-06-08 13:11:13 +09:00
Edgar Gabriel
a1484ec69a io/ompio: check error conditions before executing file_sync
check for pending I/O operations and invalid modes
and return proper error codes before executing MPI_File_sync

makes the e_sync_1 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 19:30:27 -05:00
Edgar Gabriel
5f1e88d265 mpi/c: check for valid datatype in file_get_type_extend
the interface if file_get_type_extent did not check
whether the input datatype is valid or not.

Makes the e_get_type_extend_2 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 19:30:27 -05:00
Edgar Gabriel
14bd114973 common/ompio: return error code from file_delete operation in file_close
in case the user opened a file using the DELETE_ON_CLOSE flag,
return the error code generated in the delete operation.

Note, that this is however just a partial fix to the e_close_1 test
from the ibm testsuite, since the object destructor that triggers
the file_close function does not have a mechanism right now to recognize
and return an error code.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2018-06-07 19:30:14 -05:00
Edgar Gabriel
f7cae7731c io/ompio: return error code for invalid offset
in file_get_byte_offset, return an error code if the offset
leads to an invalid position in file.

Makes the e_get_byte_offset_1 test from the ibm testsuite pass.

Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>
2018-06-07 18:46:17 -05:00
Edgar Gabriel
deaeaa60de fcoll/vulcan: minor bugfix
when creating the groups_per_proc arrays

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 17:52:32 -05:00
Edgar Gabriel
8feb497dbe io/ompio: cleanup the aggregator selection logic
and some internal structure elements/components. Along the way,
add support for the cb_nodes Info object.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:47:10 -05:00
Edgar Gabriel
529d882ff0 io/ompio and common/ompio: relocate ompio_request code to common
since the request code is now being accessed also from the vulcan fcoll
component, the request code was relocated into the common/ompio
directory to avoid ld load problems.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-06-07 16:13:12 -05:00
raafatfeki
5ecb4a56e3 fcoll/vulcan: Support of asynchronous write in collective writeAll
We introduced a new mca_vulcan parameter that specify the I/O synchronization
type (Async/sync I/O) applied within the collective write operation.
The user can explicitly choose to use async or sync write operation or make
the choice automatically made.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00
raafatfeki
4f7172ddf6 fcoll/vulcan: Support of larger offsets
For very large offsets, the data chunk size to be written by each aggregator
exceeds the capacity of an integer variable. Besides, some variables were
not large enough to hold intermediate values.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2018-06-07 16:13:12 -05:00