Commit graph

10221 commits

Author SHA1 Message Date
Jeff Squyres
600967d2ed mpi.h.in: remove C99-style comments
While we require C99 to build Open MPI, we do not require C99 to build
user MPI applications.  As such, we shouldn't have C99-style comments
(i.e., "//"-style) in mpi.h.in.

Thanks to @AdamSimpson for reporting the issue.

This commit simply converts a //-style comment to a /**/-style
comment.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f4b3ccabf7)
2018-10-11 11:54:30 -04:00
Yossi Itigin
eabc94cab0 osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

(picked from master b8e1af6)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 23:02:19 +03:00
Yossi Itigin
4a97d6b9fa pml_ucx: fix return code from mca_pml_ucx_init()
(picked from master 40ac9e4)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:49 +03:00
Yossi Itigin
1bffd196ef pml_ucx: add ompi datatype attribute to release ucp_datatype
(picked from master 4763822)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:26 +03:00
Sergey Oblomov
274cbc3c03 OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ae6f81983f)
2018-10-10 16:49:02 +03:00
Howard Pritchard
d18ea98263
Merge pull request #5843 from kawashima-fj/pr/v4.0.x/correct-f08-signatures
v4.0.x: fortran/use-mpi-f08: Correct f08 routine signatures
2018-10-09 10:22:07 -05:00
KAWASHIMA Takahiro
dd1b3eac1e java: Fix javadoc build failure with OpenJDK 11
OpenJDK 11 changed the default javadoc output HTML version to HTML 5
from HTML 4.01. It causes an error on building Open MPI configured
with `--enable-mpi-java` (default: disable). This fix is compatible
with older OpenJDK.

I don't know whether this problem exists with other vendors' JDKs.
But this fix should be compatible with other JDKs because the new
syntax is used in other places in the same file.

Thanks to Siegmar Gross for the bug report.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit b491b454dc)
2018-10-09 21:48:10 +09:00
Geoff Paulsen
499ddedd7c
Merge pull request #5844 from kawashima-fj/pr/v4.0.x/pcollreq-f08-signatures
v4.0.x: mpiext/pcollreq: Correct f08 routine signatures
2018-10-05 13:42:35 -05:00
KAWASHIMA Takahiro
4dd21111f0 mpiext/pcollreq: Add Fortran bindings in man
Fortran bindings were added to persistent collectives in 9e0115c980
but man was not updated.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 43d85dbc81)
2018-10-05 09:43:39 +09:00
KAWASHIMA Takahiro
092cf1937d man: Correct markup of MPI_Neighbor_allgather
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 994b345253)
2018-10-05 09:43:39 +09:00
KAWASHIMA Takahiro
080c52f906 mpiext/pcollreq: Add missing f08 asynchronous
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit be91a26fd8)
2018-10-05 09:33:17 +09:00
KAWASHIMA Takahiro
fcc698f27f mpiext/pcollreq: Correct f08 routine signatures
Changes of nonblocking collectives in e98d794e8b and f750c6932c
are applied to persistent collectives.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 357531847e)
2018-10-05 09:33:16 +09:00
KAWASHIMA Takahiro
b9316d3136 fortran/use-mpi-f08: Correct f08 routine signatures
Following the commit f750c6932c, I compared
`ompi/mpi/fortran/use-mpi-f08/*.F90` and
`ompi/mpi/fortran/use-mpi-f08/profile/p*.F90`, and
`ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and
`ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`.

There are many differences. Some are bugs of `MPI_*`, some are
bugs of `PMPI_*`. I'm not sure how these bugs affect applications.

To make it easy to compare these files in the future, I also removed
editorial differences.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit cf6d28cb66)
2018-10-05 09:04:17 +09:00
Geoff Paulsen
c0796664b1
Merge pull request #5780 from jsquyres/pr/v4.0.x/moar-fortran-fixes
v4.0.x: Fortran 08 bindings fixes
2018-10-04 16:08:30 -05:00
Geoff Paulsen
5cae0ec25b
Merge pull request #5794 from bwbarrett/v4.0.x-ofi-mtl-selection
mtl ofi: Change from opt-in to opt-out provider selection
2018-10-03 08:31:07 -05:00
Jeff Squyres
46dd266e45 mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
When --enable-mpi1-compatibility was not specified, the ompi_mpi_ub/lb
symbols were #if'ed out of mpi.h.  But the #defines for MPI_UB/LB
still remained.  This commit also #if's out the MPI_UB/LB macros when
--enable-mpi1-compatibility is not specified.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7223334d4d)
2018-09-28 10:01:48 -07:00
Brian Barrett
10d0a430c4 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of these blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, they can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit c5eaa38491)
2018-09-27 18:41:47 +00:00
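The opt-out rule described in 10d0a430c4 can be sketched in C roughly as follows. This is a minimal illustration, not the OFI MTL code: the function names, the list representation, and the contents of `default_exclude` are assumptions (the commit message only names sockets and shm as examples of generic providers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical default exclude list; the real list lives in the OFI MTL. */
static const char *const default_exclude[] = { "sockets", "shm", NULL };

/* Return true if name appears in a NULL-terminated list of strings. */
static bool in_list(const char *const *list, const char *name)
{
    for (size_t i = 0; list != NULL && list[i] != NULL; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* include: the user's include list, or NULL if none was given.
 * exclude: the user's exclude list, or NULL to use the default one. */
static bool provider_allowed(const char *name,
                             const char *const *include,
                             const char *const *exclude)
{
    if (include != NULL) {
        /* An explicit include list overrides the exclude list. */
        return in_list(include, name);
    }
    if (exclude == NULL) {
        exclude = default_exclude;
    }
    /* Opt-out: anything not excluded is eligible. */
    return !in_list(exclude, name);
}
```

With these assumptions, an unknown new provider is accepted by default, while an excluded one is accepted only if it is named in the include list or the exclude list is replaced by an empty one.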
Gilles Gouaillardet
ce5959ba6c fortran/use-mpi-f08: Corrections to PMPI signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit f750c6932c)
2018-09-26 12:34:46 -07:00
Philipp Otte
e98eae3da6 fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected
the C-bindings in Fortran accordingly

Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com>
(cherry picked from commit e98d794e8b)
2018-09-26 12:34:46 -07:00
Geoff Paulsen
9d9ae9286c
Merge pull request #5753 from gpaulsen/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-09-23 09:01:19 -05:00
Jeff Squyres
c83b30755a Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

(cherry picked from commit 89773c41)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-22 15:11:06 -05:00
Geoff Paulsen
3d4164e1e1
Merge pull request #5752 from gpaulsen/misc-warnings-fixes
Miscellaneous compiler warning stomps.
2018-09-22 15:01:53 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4)
2018-09-21 14:35:51 -05:00
Geoff Paulsen
4688da0631
Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0
MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0
2018-09-20 18:12:25 -05:00
Geoff Paulsen
1a65b0ab66
Merge pull request #5741 from ggouaillardet/topic/v4.0.x/use_mpi_f08_bindings
v4.0.x: fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings
2018-09-20 18:10:11 -05:00
Gilles Gouaillardet
d0a0fe818f fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c4ce01d104)
2018-09-20 10:37:33 +09:00
Gilles Gouaillardet
afb66d222b fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling
- do not generate bindings for pompi_FOO_f symbols
   (they are simply not used anywhere)

 - move ompi_FOO_f bindings out of mpi_f08.mod into
   ompi_mpifh_bindings.mod that is only used at build time

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c6070fd2e0)
2018-09-20 10:37:01 +09:00
Gilles Gouaillardet
03d994c9cf configury: do not define "dummy" empty targets any more.
We previously needed to have empty targets because AM couldn't handle
having an AM_CONDITIONAL with targets in the "if" statement but not in the
"else".  :-(

That now appears as an old automake bug that has been fixed,
so cleanup some Makefile.am

Thanks Jeff for the pointer.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@6e04b2a66a)
2018-09-20 10:36:41 +09:00
Gilles Gouaillardet
98156b7ace use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d2393251f7)
2018-09-20 10:36:01 +09:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9)
2018-09-19 10:47:27 +03:00
George Bosilca
8d892f9917 Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the number of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a5fbfa476a)
2018-09-18 12:05:51 -07:00
Howard Pritchard
3a584fee53
Merge pull request #5723 from ggouaillardet/topic/v4.0.x/libnbc_error_path
coll/libnbc: fix various error paths
2018-09-18 09:29:45 -06:00
KAWASHIMA Takahiro
e83e118ae7 mpiext/pcollreq: fix more typos
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 4a0a2598f6)
2018-09-18 15:43:06 +09:00
Gilles Gouaillardet
ece18aed45 coll/libnbc: fix various error paths
The parameter passed to NBC_Return_handle() was incorrectly cast
and not dereferenced.

Thanks Yossi for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8b51862fb2)
2018-09-18 15:29:33 +09:00
Gilles Gouaillardet
73f531a8f2 mpiext/pcollreq: fix misc typos
Thanks Jeff for the report

Fixes open-mpi/ompi#5712

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8dc6985a5a)
2018-09-18 12:47:04 +09:00
Geoff Paulsen
de0c595ca5
Merge pull request #5650 from matcabral/remove_psm2_shadow_env_40x
v4.0.x: MTL PSM2: Remove shadow variables from v4.0.x
2018-09-17 14:40:59 -05:00
Geoff Paulsen
17aab5ea5b
Merge pull request #5659 from ggouaillardet/topic/v4.0.x/misc_finalize_leaks
Plug misc leaks on MPI_Finalize()
2018-09-10 14:06:31 -05:00
KAWASHIMA Takahiro
6858028596 mpiext/pcollreq: Fix zero-count reduction
We need to return a persistent request.
`ompi_request_empty` is not a persistent request.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit 69901a5156)
2018-09-10 13:11:59 +09:00
Gilles Gouaillardet
ff8600f2e4 ompi/hook: plug a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b79b37465c)
2018-09-10 09:21:49 +09:00
Gilles Gouaillardet
4bd5c538a2 pml/ob1: plug a memory leak in mca_pml_ob1_component_fini()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fed33c1530)
2018-09-10 09:21:12 +09:00
Gilles Gouaillardet
c767c63a3b ompi/info: plug memory leaks in ompi_mpiinfo_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d0d399c9a9)
2018-09-10 09:18:15 +09:00
Gilles Gouaillardet
080e20fa02 mtl/psm2: fix a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@316e4e38f4)
2018-09-10 09:17:54 +09:00
matcabral
8fa172e60b MTL PSM2: Remove shadow variables from v4.0.x
As agreed in #4574, these were removed in past release branches
to avoid performance impacts in the default values for
some parameters.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-09-05 18:44:40 -04:00
Howard Pritchard
7e10bc0833
Merge pull request #5607 from edgargabriel/pr/sharedfp-naming-conflict-v4.0
sharedfp/sm and lockedfile: fix naming bug
2018-09-02 16:03:14 -04:00
Geoff Paulsen
3282c61048
Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0
PML/UCX: blocked calls optimizations - v4.0
2018-08-31 14:11:11 -05:00
Geoff Paulsen
334748753c
Merge pull request #5626 from hoopoepg/topic/opal-mem-hooks-syno-v4.0
MCA/COMMON/UCX: added synonym to opal_mem_hook variable - v4.0
2018-08-31 14:09:14 -05:00
Geoff Paulsen
51e685ff40
Merge pull request #5622 from aravindksg/ofi_race_fix_40x
MTL OFI: Fix race condition due to global progress entries array
2018-08-31 14:07:42 -05:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonym to opal_mem_hook variable
- added synonym to common ucx variables to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
2018-08-29 15:17:00 +03:00
Sergey Oblomov
9215eb9a3b PML/UCX: blocked calls optimizations
- refactoring of opal/UCX progress calls
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit b0f87f2235)
2018-08-29 14:38:22 +03:00
Aravind Gopalakrishnan
37d1a202be MTL OFI: Fix race condition due to global progress entries array
Since progress entries array is globally allocated, it is susceptible
to race conditions when using multi-threaded applications. Allocating it
on the stack resolves any potential races as it is thread local by default.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit ed2343034d)
2018-08-28 14:23:56 -07:00
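The change described in 37d1a202be amounts to moving a scratch array from file scope into the function that uses it. A minimal sketch, assuming a hypothetical completion source (`read_completions`), entry type, and batch size that are not taken from the OFI MTL:

```c
#include <assert.h>
#include <stddef.h>

#define PROGRESS_BATCH 8            /* hypothetical batch size */

struct cq_entry { int tag; };       /* hypothetical completion entry */

/* Hypothetical completion source: fills at most max entries with the
 * tags *next_tag..last_tag and returns how many it wrote. */
static size_t read_completions(struct cq_entry *out, size_t max,
                               int *next_tag, int last_tag)
{
    size_t n = 0;
    while (n < max && *next_tag <= last_tag) {
        out[n++].tag = (*next_tag)++;
    }
    return n;
}

/* Before the fix the array would have been declared at file scope and
 * shared by every thread that called this function. Declaring it here
 * gives each call, and therefore each thread, its own copy. */
static int progress_once(int *next_tag, int last_tag)
{
    struct cq_entry entries[PROGRESS_BATCH];   /* lives on the caller's stack */
    size_t n = read_completions(entries, PROGRESS_BATCH, next_tag, last_tag);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += entries[i].tag;
    }
    return sum;
}
```

The sketch is single-threaded; it only shows the shape of the storage change, not the race itself.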
Edgar Gabriel
2e3cf6fb12 io/base: fixes to file_delete selection logic
file_delete triggers underneath the hood the full component selection
logic, since we do not have a file handle, just a file name.

As part of the selection logic, we have to however initiate the
framework-open of the fs component in case of ompio, since ompio
will call the delete function of the selected fs component, which
is based on the file system where the file is located.

This was not handled correctly so far. The problem however only
shows up if the first I/O operation to be executed is a file_delete,
otherwise the file_open will lead to the correct opening and initialization
of the fs framework. This commit ensures that we do the right thing
even if file_delete is the first file I/O operation in the application.

Fixes issue #5611

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-28 08:18:59 -05:00
Edgar Gabriel
a489a6fc9d sharedfp/sm and lockedfile: fix naming bug
If an application opens a file for reading from multiple processes
using MPI_COMM_SELF (or another communicator that has distinct
process groups but the same comm-id, as can happen as the result
of comm_split), the naming chosen for the lockedfile or the mmapped
file used by the sharedfp/sm component would collide. This patch
ensures that the filename is different by integrating the process id
of rank 0 for each sub-communicator.

This fixes one aspect of the problem reported in github issue 5593

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-27 14:11:03 -05:00
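The naming fix in a489a6fc9d can be illustrated with a small C sketch. The name layout and the function `sharedfp_name` are hypothetical; the only idea taken from the commit message is that the process id of rank 0 of each sub-communicator becomes part of the file name, so two communicators with the same comm-id no longer map to the same file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a helper-file name from the user's file name, the communicator
 * id, and the pid of rank 0 in that communicator (hypothetical layout).
 * Returns 0 on success and -1 if the name does not fit in out. */
static int sharedfp_name(char *out, size_t cap, const char *file,
                         unsigned comm_id, long rank0_pid)
{
    int n = snprintf(out, cap, "%s_cid-%u-%ld.sm", file, comm_id, rank0_pid);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Two processes that each open the same file on MPI_COMM_SELF share a comm-id but have different pids, so under this scheme they get distinct names.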
Howard Pritchard
37440aca90
Merge pull request #5497 from markalle/apply_romio314_patch_to_v40x
v4.0.x: apply romio314 patch to romio321
2018-08-25 11:12:08 -04:00
Howard Pritchard
b926c35df0
Merge pull request #5562 from edgargabriel/pr/file_open_sharedfp_ordering_v4.0x
common/ompio: fix an ordering problem during file_open
2018-08-21 22:17:45 -04:00
Howard Pritchard
4c8852c2c8
Merge pull request #5555 from karasevb/v4.0.x_pmix_fence_status
v4.0.x/pmix: added check for pmix fence status
2018-08-21 09:28:17 -06:00
Edgar Gabriel
2da601a350 common/ompio: fix an ordering problem during file_open
The sharedfp component has to be selected and opened before
we set the default file view during file_open. Otherwise
there is a spurious error message from the sharefp_file_seek
operation that is called during the file_set_view.

Fixes Issue #5560

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-20 10:23:32 -05:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Jeff Squyres
7f443a159a fortran/use TKR: remove excess declaration for PMPI_Type_extent
This declaration was accidentally left behind in 89da9651bb.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 8a0b5454ae)
2018-08-16 13:13:14 -07:00
Howard Pritchard
cdc315c1ac
Merge pull request #5523 from tkordenbrock/topic/v4.0.x/fix.PtlMEUnlink.in.use
v4.0.x: coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
2018-08-13 14:19:10 -06:00
Howard Pritchard
7b6a2da71a
Merge pull request #5504 from rhc54/cmr40/ofi
MTL OFI: send/isend split into blocking/non-blocking paths
2018-08-13 14:18:05 -06:00
Todd Kordenbrock
36369f9133 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f3f2a826b4)
2018-08-07 11:23:51 -05:00
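The retry loop described in 36369f9133 has roughly the following shape. This sketch does not call the Portals4 API: `mock_me_unlink` and the `MOCK_*` codes are hypothetical stand-ins for PtlMEUnlink(), PTL_OK, and PTL_IN_USE, so that the loop can be exercised without a NIC.

```c
#include <assert.h>

/* Hypothetical status codes standing in for PTL_OK and PTL_IN_USE. */
enum { MOCK_OK = 0, MOCK_IN_USE = 1 };

/* Hypothetical match entry: busy_polls counts how many more unlink
 * attempts will find the simulated NIC still using the entry. */
struct mock_me { int busy_polls; };

/* Stand-in for PtlMEUnlink(). */
static int mock_me_unlink(struct mock_me *me)
{
    if (me->busy_polls > 0) {
        me->busy_polls--;
        return MOCK_IN_USE;
    }
    return MOCK_OK;
}

/* Retry while the entry is reported as in use. Any other status ends
 * the loop and is returned to the caller for checking. */
static int unlink_with_retry(struct mock_me *me)
{
    int ret;
    do {
        ret = mock_me_unlink(me);
    } while (ret == MOCK_IN_USE);
    return ret;
}
```

A real implementation might also bound the number of retries; the commit message does not say whether it does.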
Howard Pritchard
9a6f6e61f0
Merge pull request #5499 from nrspruit/ns_cancel_fix_4.0
MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
2018-08-07 09:16:56 -06:00
Howard Pritchard
2386994c9d
Merge pull request #5495 from hoopoepg/topic/ucx-init-c99-v4.0
PML/SPML/UCX: init global objects using C99 style - v4.0
2018-08-04 16:03:56 -06:00
Spruit, Neil R
1fbbae1907 MTL OFI: send/isend split into blocking/non-blocking paths
-Updated blocking send to directly call functionality and
set completion events expected to 0 initially. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.

-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.

-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 7dc8c8ba3f)
2018-08-01 06:45:48 -07:00
Ralph Castain
7830d9971e
Merge pull request #5467 from rhc54/cmr40/ofi
MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
2018-07-31 13:08:03 -07:00
Mark Allen
e2b6e9ee09 apply romio314 patch to romio321
When romio314 was first pulled in an extra patch was applied to it, see commit
92f6c7c1e2. Most of that patch is already present
in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't.

If that macro doesn't set err_ then some paths end up with a variable being used
uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c
to see what happens with error_code. It's an uninitialized stack variable that goes
through three MPIO_CHECK_* macros none of which set it. The macros consistently set
error_code to a failure if they see something wrong, but they don't consistently
set it to success when things are fine.

And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value
of error_code that was never set.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit f413ef6b14)
2018-07-30 17:23:59 -04:00
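The problem described in e2b6e9ee09 is a check macro that writes its error argument only on failure. A minimal C sketch of the buggy and fixed shapes follows; the macro names and error values are hypothetical and are not the ROMIO definitions.

```c
#include <assert.h>

#define SKETCH_SUCCESS  0
#define SKETCH_ERR_TYPE 1

/* Buggy shape: err_ is written only when the check fails, so a caller
 * that passes an uninitialized variable may later read garbage. */
#define CHECK_COMMITTED_BUGGY(committed_, err_) \
    do { if (!(committed_)) { (err_) = SKETCH_ERR_TYPE; } } while (0)

/* Fixed shape: err_ is written on both the success and failure paths. */
#define CHECK_COMMITTED_FIXED(committed_, err_) \
    do { (err_) = (committed_) ? SKETCH_SUCCESS : SKETCH_ERR_TYPE; } while (0)

static int read_checked(int committed)
{
    int error_code;   /* deliberately left uninitialized, as in the report */
    CHECK_COMMITTED_FIXED(committed, error_code);
    return error_code;   /* defined on every path with the fixed macro */
}
```

Swapping in `CHECK_COMMITTED_BUGGY` would make `read_checked(1)` return an indeterminate value, which is the behaviour the commit message traces through read.c.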
Spruit, Neil R
9cc6bc1ea6 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.

- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 767135c580)
2018-07-30 07:17:40 -07:00
Sergey Oblomov
b64502977a PML/SPML/UCX: init global objects using C99 style
- use C99-style object initialization to avoid mixing up values

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
2018-07-28 16:47:43 +03:00
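"C99 style" in b64502977a presumably refers to designated initializers. A short sketch with a hypothetical struct (the field names are illustrative and not taken from the UCX components):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical module descriptor. */
struct sketch_module {
    int priority;
    size_t num_procs;
    const char *name;
    int verbose;
};

/* Positional form: values must follow the declaration order exactly,
 * so reordering or inserting a field silently shifts every value. */
static struct sketch_module positional = { 19, 0, "sketch", 0 };

/* Designated form: each value names its field, and any field that is
 * not mentioned is zero-initialized. */
static struct sketch_module designated = {
    .name = "sketch",
    .priority = 19,
};
```

Both objects end up with the same contents here; the designated form stays correct if the struct layout changes later.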
Mikhail Kurnosov
c540dfb18c coll-base-allgather: fix MPI_IN_PLACE processing
Calling MPI_Allgather with the sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL, respectively, produces a segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 540c2d1)
2018-07-25 08:11:28 +07:00
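The rule behind c540dfb18c is that the send type must not be touched when the send buffer is the in-place marker. A minimal sketch without MPI, in which `SKETCH_IN_PLACE`, `struct sketch_type`, and `local_copy_bytes` are hypothetical stand-ins rather than Open MPI definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype descriptor. */
struct sketch_type { size_t extent; };

/* Hypothetical in-place marker: a unique address, never dereferenced. */
static char sketch_in_place_marker;
#define SKETCH_IN_PLACE ((const void *)&sketch_in_place_marker)

/* Number of bytes this rank must copy from sendbuf into its own slot
 * of the receive buffer before the exchange starts. */
static size_t local_copy_bytes(const void *sendbuf, int sendcount,
                               const struct sketch_type *sendtype)
{
    if (sendbuf == SKETCH_IN_PLACE) {
        /* The data is already in the receive buffer; sendcount and
         * sendtype are ignored and sendtype may be NULL. */
        return 0;
    }
    return (size_t)sendcount * sendtype->extent;
}
```

Checking the marker before dereferencing `sendtype` is what avoids the crash in this sketch.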
Spruit, Neil R
ac8d2e01f9 MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
- Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN
  from the provider and correctly attempt to progress the OFI Completion
  queue by calling ompi_mtl_ofi_progress.

- If events were pending that blocked OFI operations from being enqueued
  they will be completed and the OFI operation will be retried once
  ompi_mtl_ofi_progress has successfully completed.

- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of
  requiring the existence of a "ret" variable to pass back the return
  value from completing the OFI operation.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit d4f408a7f8)
2018-07-23 11:14:42 -07:00
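The macro behaviour described in ac8d2e01f9 can be sketched as below. This is not the real MTL_OFI_RETRY_UNTIL_DONE: `SKETCH_EAGAIN`, `sketch_send`, and `sketch_progress` are hypothetical stand-ins for FI_EAGAIN, an OFI operation, and ompi_mtl_ofi_progress, and the queue is simulated with a counter.

```c
#include <assert.h>

#define SKETCH_EAGAIN 11            /* stand-in for FI_EAGAIN */

/* Simulated state: pending events that block new operations. */
static int sketch_pending_events = 2;
static int sketch_progress_calls = 0;

/* Stand-in for ompi_mtl_ofi_progress(): completes one pending event. */
static void sketch_progress(void)
{
    sketch_progress_calls++;
    if (sketch_pending_events > 0) {
        sketch_pending_events--;
    }
}

/* Stand-in for an OFI operation that fails while the queue is full. */
static int sketch_send(void)
{
    return sketch_pending_events > 0 ? -SKETCH_EAGAIN : 0;
}

/* Retry FUNC until it stops returning -SKETCH_EAGAIN, making progress
 * between attempts. The result is stored in the caller-supplied RETURN
 * variable instead of an implicit variable named "ret". */
#define SKETCH_RETRY_UNTIL_DONE(FUNC, RETURN)        \
    do {                                             \
        do {                                         \
            (RETURN) = (FUNC);                       \
            if ((RETURN) == -SKETCH_EAGAIN) {        \
                sketch_progress();                   \
            }                                        \
        } while ((RETURN) == -SKETCH_EAGAIN);        \
    } while (0)
```

A caller would write `SKETCH_RETRY_UNTIL_DONE(sketch_send(), rc);` with any local variable `rc`.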
Sergey Oblomov
af0e7b190e PML/UCX: fixed ucp request free on persistent request completion
- in some cases the persistent request was deleted during the completion
  callback, which causes a double free of the linked UCX request (assert
  in debug build or hang in release build)
- the UCX request is now freed prior to the completion callback

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6fe0a73861)
2018-07-20 22:20:14 +03:00
Sergey Oblomov
74d6ad09bc OSC/UCX: fixed hang on OSC init
- worker progress was missing on startup, which caused a hang
  on one of the ranks

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a081fba046)
2018-07-19 15:23:01 +03:00
Edgar Gabriel
b6b9552ca9
Merge pull request #5444 from gbossu/fix-file-delete
io/ompio: Call component-specific file_delete function instead of POSIX unlink
2018-07-18 08:45:57 -05:00
Gilles Gouaillardet
fed1e7766e
Merge pull request #5430 from ggouaillardet/pr/pcollreq-fort
mpiext/pcollreq: add Fortran bindings
2018-07-18 09:52:59 +09:00
Joshua Ladd
3add13c72e
Merge pull request #5441 from hoopoepg/topic/ucx-memhooks-to-common-module
MCA/COMMON/UCX: shift opal memhooks into common UCX
2018-07-17 15:52:44 -04:00
Matias Cabral
be3cb01cb4
Merge pull request #5397 from nrspruit/ns_ofi_mtl_ssend
MTL OFI: Redesign sync send with reduced tag bits and quick ack
2018-07-17 10:14:33 -07:00
Gaëtan Bossu
8522ba112c MCA/IO/OMPIO: fix MPI_File_delete implementation.
OMPIO now uses the correct delete function depending on the fs

mca_common_ompio_file_delete now works this way instead
of calling POSIX unlink:
 - create a minimal file handle with the given file name
 - select the best fs component using this file handle
 - call the component-specific file delete function

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:17:13 +02:00
Gaëtan Bossu
ac6f75e3d1 MCA/FS: check communicator validity in query functions
It is needed because the fs components might be queried due to a MPI_File_delete call.
And in this case, we don't have a communicator value.

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:16:21 +02:00
Josh Hursey
9aa5168795
Merge pull request #5353 from ggouaillardet/topic/romio321_grequests
io/romio321: make grequest extensions internal
2018-07-17 10:53:53 -05:00
Gilles Gouaillardet
1a41482720 coll/libnbc: do not recursively call opal_progress()
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs open-mpi/ompi#3901

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 09:45:08 -06:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Gilles Gouaillardet
47351b7fac mpiext/pcollreq: Add Fortran use-mpi-f08 bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:41 +09:00
Kurita, Takehiro
73e038ec18 mpiext/pcollreq: Add Fortran use-mpi bindings
Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-07-17 16:29:41 +09:00
Gilles Gouaillardet
9e0115c980 mpiext/pcollreq: Add Fortran mpif-h bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
Gilles Gouaillardet
44110a575d mpiext/pcollreq: do include PMPIX_* subroutines to C bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
KAWASHIMA Takahiro
5ddf0f6418 mpi/fortran: Fix IN_PLACE detection of ISCATTER(V)
Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in 506d0e96f4
but nonblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-17 14:15:21 +09:00
Mikhail Kurnosov
ba83cc91eb coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather
Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-16 08:56:09 -06:00
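The algorithm named in ba83cc91eb can be simulated in a single process to show how the two phases fit together. The sketch below is not the coll/base implementation: every "rank" is a row of a local array, each message is a `memcpy`, and the names and the block-size choice (count divided by the number of ranks, rounded up) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_MAX_RANKS 16
#define SK_MAX_BYTES 64

/* Copy block blk (numbered by relative rank) from rank `from` to rank `to`. */
static void sk_copy_block(char bufs[][SK_MAX_BYTES], int from, int to,
                          int blk, size_t blk_size, size_t count)
{
    size_t off = (size_t)blk * blk_size;
    if (off >= count) {
        return;                       /* trailing block is empty */
    }
    size_t len = (count - off < blk_size) ? count - off : blk_size;
    memcpy(bufs[to] + off, bufs[from] + off, len);
}

/* Simulated broadcast of count bytes from root to nranks ranks. */
static void sk_bcast(char bufs[][SK_MAX_BYTES], size_t count,
                     int nranks, int root)
{
    size_t blk_size = (count + (size_t)nranks - 1) / (size_t)nranks;
    int top = 1;
    while (top * 2 < nranks) {
        top *= 2;                     /* largest power of two below nranks */
    }

    /* Phase 1, binomial tree scatter: at distance mask, relative rank r
     * hands blocks [r+mask, r+2*mask) to relative rank r+mask. */
    for (int mask = top; mask >= 1 && nranks > 1; mask /= 2) {
        for (int r = 0; r + mask < nranks; r += 2 * mask) {
            int end = (r + 2 * mask < nranks) ? r + 2 * mask : nranks;
            for (int b = r + mask; b < end; b++) {
                sk_copy_block(bufs, (r + root) % nranks,
                              (r + mask + root) % nranks, b, blk_size, count);
            }
        }
    }

    /* Phase 2, ring allgather: in step s, relative rank r forwards
     * block (r - s) to its right neighbour. */
    for (int s = 0; s < nranks - 1; s++) {
        for (int r = 0; r < nranks; r++) {
            int blk = (r - s + nranks) % nranks;
            sk_copy_block(bufs, (r + root) % nranks,
                          (r + 1 + root) % nranks, blk, blk_size, count);
        }
    }
}
```

After phase 1 each relative rank holds at least its own block; phase 2 then needs nranks - 1 steps to circulate every block around the ring.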
Gilles Gouaillardet
61b3308871 mpiext/pcollreq: check subroutine parameters and add profiling symbols
- check subroutine parameters
 - implement PMPIX_* subroutines

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Gilles Gouaillardet
dec1663364 spc: add missing subroutines
add counters for :
 - MPI_Exscan
 - MPI_Iexscan
 - MPI_Igatherv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Howard Pritchard
9a5fd48388
Merge pull request #5079 from jsquyres/pr/fortran-is-the-devil
status_set_cancelled: fix F08 binding
2018-07-13 15:36:02 -05:00
Joshua Ladd
b12868239c
Merge pull request #4765 from xinzhao3/topic/osc-ucx-mem-hook
OMPI/OSC/UCX: move memory hooks init in osc to win creation.
2018-07-13 09:36:20 -04:00
Xin Zhao
74ef51af1b OMPI/OSC/UCX: move memory hooks init in osc to win creation.
Move memory hooks init (for request based operation) in osc ucx to window
creation time, to avoid performance issue in MPI initialization.

Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-12 15:03:02 -07:00
Nathan Hjelm
304a6a52d4 osc/rdma: use local base for local process when possible
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
KAWASHIMA Takahiro
c87a3df0c9
Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings
coll/libnbc: Suppress compiler warnings
2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro
37a05e74aa coll/libnbc: Suppress compiler warnings
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro
0021616984 pml/ob1: Fix data corruption of MPI_BSEND
Data transferred by `MPI_BSEND` may be corrupted if all of the following
conditions are met.

- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
  for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
  returns.

The problem is in the way the ob1 PML pends a send request.
The `mca_pml_ob1_send_request_start_copy` function returns
`OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. Next time
the `mca_pml_ob1_send_request_start_copy` function tries sending,
the user buffer may have been overwritten by the MPI program.

Call hierarchy of `MPI_BSEND`:

```
  MPI_Bsend
    mca_pml_ob1_send
      if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
        mca_pml_ob1_isend
          MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
            mca_pml_ob1_send_request_start_seq
              mca_pml_ob1_send_request_start_btl
                if (size <= eager_limit)
                  if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
                    mca_pml_ob1_send_request_start_copy
                      mca_bml_base_alloc
                        btl_alloc
              if (OMPI_ERR_OUT_OF_RESOURCE == rc)
                add_request_to_send_pending
        ompi_request_free
```

To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.

This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:30:58 +09:00
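The fix described in 0021616984 is to stage the payload before the buffered send returns. A minimal sketch of that idea, assuming a hypothetical `struct pending_send` whose `staged` array stands in for space taken from the buffer attached with `MPI_BUFFER_ATTACH`; none of these names come from the ob1 PML:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF 64

/* Hypothetical pending-send record. */
struct pending_send {
    char staged[SKETCH_BUF];   /* stands in for attached-buffer space */
    size_t len;
    const char *src;           /* what a later retry will read from */
};

/* Called when the immediate send could not get a descriptor. The
 * payload is copied before returning to the caller, so the retry no
 * longer depends on the contents of user_buf. */
static int pend_buffered_send(struct pending_send *p,
                              const char *user_buf, size_t len)
{
    if (len > sizeof p->staged) {
        return -1;             /* does not fit in the staging space */
    }
    memcpy(p->staged, user_buf, len);
    p->len = len;
    p->src = p->staged;        /* retry reads the copy, not user_buf */
    return 0;
}
```

In the buggy shape, `src` would point at `user_buf`, and an application that overwrote its buffer after the call returned would change the data that is eventually sent.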
Howard Pritchard
34bc77747c
Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place
coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
2018-07-11 18:34:35 -05:00