1
1
Граф коммитов

10205 Коммитов

Автор SHA1 Сообщение Дата
Brian Barrett
10d0a430c4 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of thse blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, the can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit c5eaa38491)
2018-09-27 18:41:47 +00:00
Gilles Gouaillardet
ce5959ba6c fortran/use-mpi-f08: Corrections to PMPI signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit f750c6932c)
2018-09-26 12:34:46 -07:00
Philipp Otte
e98eae3da6 fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives
Corrected the signatures of the collectives used by the Fortran 2008
interface to state correct intent for inout arguments and use the
ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected
the C-bindings in Fortran accordingly

Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com>
(cherry picked from commit e98d794e8b)
2018-09-26 12:34:46 -07:00
Geoff Paulsen
9d9ae9286c
Merge pull request #5753 from gpaulsen/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-09-23 09:01:19 -05:00
Jeff Squyres
c83b30755a Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

(cherry picked from commit 89773c41)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-22 15:11:06 -05:00
Geoff Paulsen
3d4164e1e1
Merge pull request #5752 from gpaulsen/misc-warnings-fixes
Miscellaneous compiler warning stomps.
2018-09-22 15:01:53 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4)
2018-09-21 14:35:51 -05:00
Geoff Paulsen
4688da0631
Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0
MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0
2018-09-20 18:12:25 -05:00
Geoff Paulsen
1a65b0ab66
Merge pull request #5741 from ggouaillardet/topic/v4.0.x/use_mpi_f08_bindings
v4.0.x: fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings
2018-09-20 18:10:11 -05:00
Gilles Gouaillardet
d0a0fe818f fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c4ce01d104)
2018-09-20 10:37:33 +09:00
Gilles Gouaillardet
afb66d222b fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling
- do not generate bindings for pompi_FOO_f symbols
   (they are simply not used anywhere)

 - move ompi_FOO_f bindings out of mpi_f08.mod into
   ompi_mpifh_bindings.mod that is only used at build time

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c6070fd2e0)
2018-09-20 10:37:01 +09:00
Gilles Gouaillardet
03d994c9cf configury: do not define "dummy" empty targets any more.
We previously needed to have empty targets because AM couldn't handle
having an AM_CONDITIONAL was targets in the "if" statement but not in the
"else".  :-(

That now appears as an old automake bug that has been fixed,
so cleanup some Makefile.am

Thanks Jeff for the pointer.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@6e04b2a66a)
2018-09-20 10:36:41 +09:00
Gilles Gouaillardet
98156b7ace use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d2393251f7)
2018-09-20 10:36:01 +09:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9)
2018-09-19 10:47:27 +03:00
George Bosilca
8d892f9917 Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the numberr of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a5fbfa476a)
2018-09-18 12:05:51 -07:00
Howard Pritchard
3a584fee53
Merge pull request #5723 from ggouaillardet/topic/v4.0.x/libnbc_error_path
coll/libnbc: fix various error paths
2018-09-18 09:29:45 -06:00
KAWASHIMA Takahiro
e83e118ae7 mpiext/pcollreq: fix more typos
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 4a0a2598f6)
2018-09-18 15:43:06 +09:00
Gilles Gouaillardet
ece18aed45 coll/libnbc: fix various error paths
The parameter passed to NBC_Return_handle() was incorrectly casted
and not dereferenced.

Thanks Yossi for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8b51862fb2)
2018-09-18 15:29:33 +09:00
Gilles Gouaillardet
73f531a8f2 mpiext/pcollreq: fix misc typos
Thanks Jeff for the report

Fixes open-mpi/ompi#5712

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8dc6985a5a)
2018-09-18 12:47:04 +09:00
Geoff Paulsen
de0c595ca5
Merge pull request #5650 from matcabral/remove_psm2_shadow_env_40x
v4.0.x: MTL PSM2: Remove shadow variables from v4.0.x
2018-09-17 14:40:59 -05:00
Geoff Paulsen
17aab5ea5b
Merge pull request #5659 from ggouaillardet/topic/v4.0.x/misc_finalize_leaks
Plug misc leaks on MPI_Finalize()
2018-09-10 14:06:31 -05:00
KAWASHIMA Takahiro
6858028596 mpiext/pcollreq: Fix zero-count reduction
We need to return a persistent request.
`ompi_request_empty` is not a persistent request.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

(cherry picked from commit 69901a5156)
2018-09-10 13:11:59 +09:00
Gilles Gouaillardet
ff8600f2e4 ompi/hook: plug a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b79b37465c)
2018-09-10 09:21:49 +09:00
Gilles Gouaillardet
4bd5c538a2 pml/ob1: plug a memory leak in mca_pml_ob1_component_fini()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fed33c1530)
2018-09-10 09:21:12 +09:00
Gilles Gouaillardet
c767c63a3b ompi/info: plug memory leaks in ompi_mpiinfo_finalize()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d0d399c9a9)
2018-09-10 09:18:15 +09:00
Gilles Gouaillardet
080e20fa02 mtl/psm2: fix a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@316e4e38f4)
2018-09-10 09:17:54 +09:00
matcabral
8fa172e60b MTL PSM2: Remove shadow variables from v4.0.x
As agreed on #4574, where removed in past release branches
to avoid perfomance impacts in the default values for
some paramters.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-09-05 18:44:40 -04:00
Howard Pritchard
7e10bc0833
Merge pull request #5607 from edgargabriel/pr/sharedfp-naming-conflict-v4.0
sharedfp/sm and lockedfile: fix naming bug
2018-09-02 16:03:14 -04:00
Geoff Paulsen
3282c61048
Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0
PML/UCX: blocked calls optimizations - v4.0
2018-08-31 14:11:11 -05:00
Geoff Paulsen
334748753c
Merge pull request #5626 from hoopoepg/topic/opal-mem-hooks-syno-v4.0
MCA/COMMON/UCX: added synonim to opal_mem_hook variable - v4.0
2018-08-31 14:09:14 -05:00
Geoff Paulsen
51e685ff40
Merge pull request #5622 from aravindksg/ofi_race_fix_40x
MTL OFI: Fix race condition due to global progress entries array
2018-08-31 14:07:42 -05:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added synonim to common ucx variables to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
2018-08-29 15:17:00 +03:00
Sergey Oblomov
9215eb9a3b PML/UCX: blocked calls optimizations
- refactoring of opal/UCX progress calls
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit b0f87f2235)
2018-08-29 14:38:22 +03:00
Aravind Gopalakrishnan
37d1a202be MTL OFI: Fix race condition due to global progress entries array
Since progress entries array is globally allocated, it is susceptible
to race conditions when using multi-threaded applications. Allocating it
on the stack resolves any potential races as it is thread local by default.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit ed2343034d)
2018-08-28 14:23:56 -07:00
Edgar Gabriel
2e3cf6fb12 io/base: fixes to file_delete selection logic
file_delete triggers underneath the hood the full component selection
logic, since we do not have a file handle, just a file name.

As part of the selection logic, we have to however initiate the
framework-open of the fs component in case of ompio, since ompio
will call the delete function of the selected fs componentn, which
is based on the file system where the file is located.

This was not handled correctly so far. The problem however only
shows up if the first I/O operatin to be executed is a file_delete,
other wise the file_open will lead to the correct opening and initialization
of the fs framework. This commit ensures that we do the right thing
even if file_delete is the first file I/O operation in the application.

Fixes issue #5611

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-28 08:18:59 -05:00
Edgar Gabriel
a489a6fc9d sharedfp/sm and lockedfile: fix naming bug
If an application opens a file for reading from multiple processes
using MPI_COMM_SELF (or another communicator that has distinct
process groups but the same comm-id, as can happen as the result
of comm_split), the naming chosen for the lockedfile or the mmapped
file used by the sharedfp/sm component would collide. This patch
ensures that the filename is different by integrating the process id
of rank 0 for each sub-communicator.

This fixes one aspect of the problem reported in github issue 5593

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-27 14:11:03 -05:00
Howard Pritchard
37440aca90
Merge pull request #5497 from markalle/apply_romio314_patch_to_v40x
v4.0.x: apply romio314 patch to romio321
2018-08-25 11:12:08 -04:00
Howard Pritchard
b926c35df0
Merge pull request #5562 from edgargabriel/pr/file_open_sharedfp_ordering_v4.0x
common/ompio: fix an ordering problem during file_open
2018-08-21 22:17:45 -04:00
Howard Pritchard
4c8852c2c8
Merge pull request #5555 from karasevb/v4.0.x_pmix_fence_status
v4.0.x/pmix: added check for pmix fence status
2018-08-21 09:28:17 -06:00
Edgar Gabriel
2da601a350 common/ompio: fix an ordering problem during file_open
the sharedfp component has to be selected and opened before
we set the default file view during file_open. Otherwise
there is a sperious error message from the sharefp_file_seek
operation that is called during the file_set_view.

Fixes Issue #5560

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-08-20 10:23:32 -05:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Jeff Squyres
7f443a159a fortran/use TKR: remove excess declaration for PMPI_Type_extent
This declaration was accidentally left behind in 89da9651bb.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 8a0b5454ae)
2018-08-16 13:13:14 -07:00
Howard Pritchard
cdc315c1ac
Merge pull request #5523 from tkordenbrock/topic/v4.0.x/fix.PtlMEUnlink.in.use
v4.0.x: coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
2018-08-13 14:19:10 -06:00
Howard Pritchard
7b6a2da71a
Merge pull request #5504 from rhc54/cmr40/ofi
MTL OFI: send/isend split into blocking/non-blocking paths
2018-08-13 14:18:05 -06:00
Todd Kordenbrock
36369f9133 coll-portals4: retry PtlMEUnlink() if PTL_IN_USE
In the cleanup phase, it is possible for PtlMEUnlink() to return
PTL_IN_USE if the NIC is not done with the ME.  This should not
be considered an error.  This commit adds a retry loop around
PtlMEUnlink().

In some cases, the return value of PtlMEUnlink() and PtlCTFree()
was not checked at all.  Check them with the same retry loop as
above.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f3f2a826b4)
2018-08-07 11:23:51 -05:00
Howard Pritchard
9a6f6e61f0
Merge pull request #5499 from nrspruit/ns_cancel_fix_4.0
MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
2018-08-07 09:16:56 -06:00
Howard Pritchard
2386994c9d
Merge pull request #5495 from hoopoepg/topic/ucx-init-c99-v4.0
PML/SPML/UCX: init global objects using C99 style - v4.0
2018-08-04 16:03:56 -06:00
Spruit, Neil R
1fbbae1907 MTL OFI: send/isend split into blocking/non-blocking paths
-Updated blocking send to directly call functionality and
set completion events expected to 0 initally. This allows for optimization for
providers that support fi_tinject up to larger sizes. This also reduces
latency on running the OFI mtl with smaller sizes without requiring
calls to progress given fi_tinject is required to complete the messaging
before returning and will not create any events in the Completion Queue.

-Updated non-blocking send to directly call fi_tsend and avoid calling
fi_tinject as the functionality should not wait on completions. This
resolves a bug where applications calling MPI_Isend can overrun the
TX buffer with small (inject) messages causing a deadlock. In addition
this improves performance in message rates by preventing
waiting on any size message to complete in non-blocking send messages.

-Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv
which is common between isend and send code paths.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 7dc8c8ba3f)
2018-08-01 06:45:48 -07:00
Ralph Castain
7830d9971e
Merge pull request #5467 from rhc54/cmr40/ofi
MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
2018-07-31 13:08:03 -07:00
Mark Allen
e2b6e9ee09 apply romio314 patch to romio321
When romio314 was first pulled in an extra patch was applied to it, see commit
92f6c7c1e2. Most of that patch is already present
in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't.

If that macro doesn't set err_ then some paths end up with a variable being used
uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c
to see what happens with error_code. It's an uninitialized stack variable that goes
through three MPIO_CHECK_* macros none of which set it. The macros consistently set
error_code to a failure if they see something wrong, but they don't consistently
set it to success when things are fine.

And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value
of error_code that was never set.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit f413ef6b14)
2018-07-30 17:23:59 -04:00
Spruit, Neil R
9cc6bc1ea6 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If a message for a recv that is being cancelled gets completed after
the call to fi_cancel, then the OFI mtl will enter a deadlock state
waiting for ofi_req->super.ompi_req->req_status._cancelled which will
never happen since the recv was successfully finished.

- To resolve this issue, the OFI mtl now checks ofi_req->req_started
to see if the request has been started within the loop waiting for the
event to be cancelled. If the request is being completed, then the loop
is broken and fi_cancel exits setting
ofi_req->super.ompi_req->req_status._cancelled = false;

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 767135c580)
2018-07-30 07:17:40 -07:00
Sergey Oblomov
b64502977a PML/SPML/UCX: init global objects using C99 style
- to avoid value mix used C99 style of object initializations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2806504290)
2018-07-28 16:47:43 +03:00
Mikhail Kurnosov
c540dfb18c coll-base-allgather: fix MPI_IN_PLACE processing
The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
(cherry picked from commit 540c2d1)
2018-07-25 08:11:28 +07:00
Spruit, Neil R
ac8d2e01f9 MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow
- Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN
  from the provider and correctly attempt to progress the OFI Completion
  queue by calling ompi_mtl_ofi_progress.

- If events were pending that blocked OFI operations from being enqueued
  they will be completed and the OFI operation will be retried once
  ompi_mtl_ofi_progress has successfully completed.

- Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of
  requiring the existance of a "ret" variable to pass back the return
  value from completing the OFI operation.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit d4f408a7f8)
2018-07-23 11:14:42 -07:00
Sergey Oblomov
af0e7b190e PML/UCX: fixed ucp request free on persistent request completion
- in sine cases persistent request was deleted during completion
  callback, this cause double free of linked UCX request (assert
  in debug build or hang in release build)
- UCX request is freed prior completion callback

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 6fe0a73861)
2018-07-20 22:20:14 +03:00
Sergey Oblomov
74d6ad09bc OSC/UCX: fixed hang on OSC init
- there worked progress was missed on startup which caused hang
  on one of ranks

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a081fba046)
2018-07-19 15:23:01 +03:00
Edgar Gabriel
b6b9552ca9
Merge pull request #5444 from gbossu/fix-file-delete
io/ompio: Call component-specific file_delete function instead of POSIX unlink
2018-07-18 08:45:57 -05:00
Gilles Gouaillardet
fed1e7766e
Merge pull request #5430 from ggouaillardet/pr/pcollreq-fort
mpiext/pcollreq: add Fortran bindings
2018-07-18 09:52:59 +09:00
Joshua Ladd
3add13c72e
Merge pull request #5441 from hoopoepg/topic/ucx-memhooks-to-common-module
MCA/COMMON/UCX: shift opal memhooks into common UCX
2018-07-17 15:52:44 -04:00
Matias Cabral
be3cb01cb4
Merge pull request #5397 from nrspruit/ns_ofi_mtl_ssend
MTL OFI: Redesign sync send with reduced tag bits and quick ack
2018-07-17 10:14:33 -07:00
Gaëtan Bossu
8522ba112c MCA/IO/OMPIO: fix MPI_File_delete implementation.
OMPIO now uses the correct delete function depending on the fs

mca_common_ompio_file_delete now works this way instead
of calling POSIX unlink:
 - create a minimal file handle with the given file name
 - select the best fs component using this file handle
 - call the component-specific file delete function

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:17:13 +02:00
Gaëtan Bossu
ac6f75e3d1 MCA/FS: check communicator validity in query functions
It is needed because the fs components might be queried due to a MPI_File_delete call.
And in this case, we don't have a communicator value.

Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>
2018-07-17 18:16:21 +02:00
Josh Hursey
9aa5168795
Merge pull request #5353 from ggouaillardet/topic/romio321_grequests
io/romio321: make grequest extensions internal
2018-07-17 10:53:53 -05:00
Gilles Gouaillardet
1a41482720 coll/libnbc: do not recursively call opal_progress()
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.

the same method is used in ompi_comm_request_progress()

Refs open-mpi/ompi#3901

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 09:45:08 -06:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Gilles Gouaillardet
47351b7fac mpiext/pcollreq: Add Fortran use-mpi-f08 bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:41 +09:00
Kurita, Takehiro
73e038ec18 mpiext/pcollreq: Add Fortran use-mpi bindings
Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>
2018-07-17 16:29:41 +09:00
Gilles Gouaillardet
9e0115c980 mpiext/pcollreq: Add Fortran mpif-h bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
Gilles Gouaillardet
44110a575d mpiext/pcollreq: do include PMPIX_* subroutines to C bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-17 16:29:33 +09:00
KAWASHIMA Takahiro
5ddf0f6418 mpi/fortran: Fix IN_PLACE detection of ISCATTER(V)
Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in 506d0e96f4
but noblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-17 14:15:21 +09:00
Mikhail Kurnosov
ba83cc91eb coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather
Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-16 08:56:09 -06:00
Gilles Gouaillardet
61b3308871 mpiext/pcollreq: check subroutine parameters and add profiling symbols
- check subroutine parameters
 - implement PMPIX_* subroutines

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Gilles Gouaillardet
dec1663364 spc: add missing subroutines
add counters for :
 - MPI_Exscan
 - MPI_Iexscan
 - MPI_Igatherv

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-14 14:14:37 +09:00
Howard Pritchard
9a5fd48388
Merge pull request #5079 from jsquyres/pr/fortran-is-the-devil
status_set_cancelled: fix F08 binding
2018-07-13 15:36:02 -05:00
Joshua Ladd
b12868239c
Merge pull request #4765 from xinzhao3/topic/osc-ucx-mem-hook
OMPI/OSC/UCX: move memory hooks init in osc to win creation.
2018-07-13 09:36:20 -04:00
Xin Zhao
74ef51af1b OMPI/OSC/UCX: move memory hooks init in osc to win creation.
Move memory hooks init (for request based operation) in osc ucx to window
creation time, to avoid performance issue in MPI initialization.

Signed-off-by: Xin Zhao <xinz@mellanox.com>
2018-07-12 15:03:02 -07:00
Nathan Hjelm
304a6a52d4 osc/rdma: use local base for local process when possible
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
KAWASHIMA Takahiro
c87a3df0c9
Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings
coll/libnbc: Suppress compiler warnings
2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro
37a05e74aa coll/libnbc: Suppress compiler warnings
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro
0021616984 pml/ob1: Fix data corruption of MPI_BSEND
Data transferred by `MPI_BSEND` may corrupt if all of the following
conditions are met.

- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
  for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
  returns.

The problem is in the way of pending a send request in ob1 PML.
The `mca_pml_ob1_send_request_start_copy` function retruns
`OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. Next time
the `mca_pml_ob1_send_request_start_copy` function tries sending,
the user buffer may have been overwritten by the MPI program.

Call hierarchy of `MPI_BSEND`:

```
  MPI_Bsend
    mca_pml_ob1_send
      if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
        mca_pml_ob1_isend
          MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
            mca_pml_ob1_send_request_start_seq
              mca_pml_ob1_send_request_start_btl
                if (size <= eager_limit)
                  if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
                    mca_pml_ob1_send_request_start_copy
                      mca_bml_base_alloc
                        btl_alloc
              if (OMPI_ERR_OUT_OF_RESOURCE == rc)
                add_request_to_send_pending
        ompi_request_free
```

To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.

This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:30:58 +09:00
Howard Pritchard
34bc77747c
Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place
coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
2018-07-11 18:34:35 -05:00
Nathan Hjelm
35a75a6bf5 osc/sm: avoid filename collision when multiple windows share same CID
This commit fixes an issue identified by MTT where we can have two
different sets of processes on the same node creating a shared memory
window with communicators sharing the same CID. To avoid this issue
the temporary filename now includes the creating processes vpid.

References #5363

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-11 14:32:27 -06:00
Nathan Hjelm
037656bc1d osc/rdma: fix bug introduced in b90c838
This commit fixes an bug that was introduced back in 2016 which
impacts request-based RMA in some cases.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 18:17:55 -06:00
Gilles Gouaillardet
76292951e5 coll/libnbc: fix integer overflow
Use internal pack/unpack subroutines that operate on MPI_Aint instead of int
and hence solve some integer overflows.

Thanks Clyde Stanfield for reporting this issue.

Refs open-mpi/ompi#5383

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2018-07-09 10:08:33 -06:00
Mikhail Kurnosov
22fa5a8a67 coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree
Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-09 10:04:41 -06:00
Spruit, Neil R
9a17864278 MTL OFI: Redesign sync send with reduced tag bits and quick ack
-Updated the design for sync send MPI calls to use 2 protocol bits for
denoting "sync_send" or "sync_send_ack".

-"Sync_send" is added to the send tag only and is masked out in receives
such that it can be read by the original Recv posted in the send/recv
operation.

-"Sync_send_ack" is sent from the recv callback to the send side. This 0
byte send does not generate a completion entry and instead sends the
message and immediately completes the opal completion in the recv.

-Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2
more tag bits per format type due to the reduced protocal bits required
by OMPI.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
2018-07-09 06:50:21 -07:00
Yossi Itigin
e77e31b50b
Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging
MCA/COMMON/UCX: unified logging across all UCX modules
2018-07-08 12:45:26 +03:00
Mikhail Kurnosov
b9e14cd7d0 coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing
The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process.

The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case.

Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
2018-07-07 20:59:39 +07:00
Sergey Oblomov
240670152e MCA/COMMON/UCX: code beautify - alignment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 19:40:58 +03:00
Sergey Oblomov
eb7010933d OSC/UCX: suppressed compilation warnings
- suppressed sing/unsign-compare warnings

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-06 10:58:09 +03:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
c2bd6af9f2 MCA/COMMON/UCX: minor unification of del_proces calls
- some common functionality of del_procs calls is moved into
  mca_common module
- blocking ucp_put call is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Yossi Itigin
09c10d5e09
Merge pull request #5345 from hoopoepg/topic/pml-ucx-suppress-compiler-warning
PML/UCX: suppressed compilation warning
2018-07-02 13:41:12 +03:00
Edgar Gabriel
d191ed6b4f fs/base: move redundant code to fs/base
moving some code from fs/ufs into fs/base. The benefit of this approach is
that fs components that are fundamentally based on posix I/O (and only differ
in some non-posix functionality such as setting stripe size, or which hints
are being supported) can avoid having to replicate the same code over and
over again. First beneficiary is the lustre fs component, but more
are to follow soon.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-07-01 10:20:32 -05:00
Xin Zhao
c1ac0c00c5
Merge pull request #5185 from jjolly/fix-memcpy-size-mismatch
- Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c
2018-06-29 19:37:53 -07:00
Jeff Squyres
f4320193e3 mpi.h.in: remove some deprecation/removed warnings
Intentionally do not mark some MPI-1 function pointer typedefs as
`__mpi_interface_removed__` because we have to use them in prototyping
some MPI-1 functions when `--enable-mpi1-compatibility` is used.

Fixes #5357.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-29 07:43:51 -07:00