openmpi

Author	SHA1	Message	Date
KAWASHIMA Takahiro	080c52f906	mpiext/pcollreq: Add missing f08 `asynchronous` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com> (cherry picked from commit `be91a26fd8`)	2018-10-05 09:33:17 +09:00
KAWASHIMA Takahiro	fcc698f27f	mpiext/pcollreq: Correct f08 routine signatures Changes of nonblocking collectives in `e98d794e8b` and `f750c6932c` are applied to persistent collectives. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com> (cherry picked from commit `357531847e`)	2018-10-05 09:33:16 +09:00
KAWASHIMA Takahiro	b9316d3136	fortran/use-mpi-f08: Correct f08 routine signatures Following the commit `f750c6932c`, I compared `ompi/mpi/fortran/use-mpi-f08/.F90` and `ompi/mpi/fortran/use-mpi-f08/profile/p.F90`, and `ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and `ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`. There are many differences. Some are bugs of `MPI_`, some are bugs of `PMPI_`. I'm not sure how these bugs affect applications. To make it easy to compare these files in the future, I also removed editorial differences. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com> (cherry picked from commit `cf6d28cb66`)	2018-10-05 09:04:17 +09:00
Geoff Paulsen	c0796664b1	Merge pull request #5780 from jsquyres/pr/v4.0.x/moar-fortran-fixes v4.0.x: Fortran 08 bindings fixes	2018-10-04 16:08:30 -05:00
Geoff Paulsen	5cae0ec25b	Merge pull request #5794 from bwbarrett/v4.0.x-ofi-mtl-selection mtl ofi: Change from opt-in to opt-out provider selection	2018-10-03 08:31:07 -05:00
Jeff Squyres	46dd266e45	mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat When --enable-mpi1-compatibility was not specified, the ompi_mpi_ub/lb symbols were #if'ed out of mpi.h. But the #defines for MPI_UB/LB still remained. This commit also #if's out the MPI_UB/LB macros when --enable-mpi1-compatibility is not specified. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit `7223334d4d`)	2018-09-28 10:01:48 -07:00
Brian Barrett	10d0a430c4	mtl ofi: Change from opt-in to opt-out provider selection Change default provider selection logic for the OFI MTL. The old logic was whitelist-only, so any new HPC NIC provider would have to ask users to do extra work or wait for an OMPI release to be whitelisted. The reason for the logic was to avoid selecting a "generic" provider like sockets or shm that would frequently have worse performance than the optimized BTL options Open MPI supports. With the change, we blacklist the (small, relatively static) list of providers that duplicate internal capabilities. Users can use one of these blacklisted providers in two ways: first, they can explicitly request the provider in the include list (which will override the default exclude list) and second, they can set a new empty exclude list. Since most HPC networks require special libraries and therefore an explicit build of libfabric, it is highly unlikely that this change will cause users to use libfabric when they didn't want to do so. It does, however, solve the whitelisting problem. Signed-off-by: Brian Barrett <bbarrett@amazon.com> (cherry picked from commit `c5eaa38491`)	2018-09-27 18:41:47 +00:00
Gilles Gouaillardet	ce5959ba6c	fortran/use-mpi-f08: Corrections to PMPI signatures of collectives Corrected the signatures of the collectives used by the Fortran 2008 interface to state correct intent for inout arguments and use the ASYNCHRONOUS attribute in non-blocking collective calls. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit `f750c6932c`)	2018-09-26 12:34:46 -07:00
Philipp Otte	e98eae3da6	fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives Corrected the signatures of the collectives used by the Fortran 2008 interface to state correct intent for inout arguments and use the ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected the C-bindings in Fortran accordingly Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com> (cherry picked from commit `e98d794e8b`)	2018-09-26 12:34:46 -07:00
Geoff Paulsen	9d9ae9286c	Merge pull request #5753 from gpaulsen/man-page-script-abstraction-break Fix script abstraction break: mv make_manpage.pl to config	2018-09-23 09:01:19 -05:00
Jeff Squyres	c83b30755a	Fix script abstraction break: mv make_manpage.pl to config Having the "make_manpage.pl" script in the ompi/ tree broke "./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi builds would break). (cherry picked from commit `89773c41`) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-22 15:11:06 -05:00
Geoff Paulsen	3d4164e1e1	Merge pull request #5752 from gpaulsen/misc-warnings-fixes Miscellaneous compiler warning stomps.	2018-09-22 15:01:53 -05:00
Geoff Paulsen	bc798b6135	Merge pull request #5755 from gpaulsen/osc_rdma_cleanup osc/rdma: clean out stale aggregation code	2018-09-22 15:00:21 -05:00
Nathan Hjelm	72fc8acb50	osc/rdma: quiet warning gcc complains about ret possibly being used uninitialized. That will never happen but we should still quiet the warning. This commit sets ret to a valid value. Fixes #5513 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-21 14:44:56 -05:00
Nathan Hjelm	56e31f8206	osc/rdma: clean out stale aggregation code The aggregation code in osc/rdma is currently broken and will likely not be reused. This commit cleans it out. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-21 14:42:45 -05:00
Jeff Squyres	2e37f97a38	Miscellaneous compiler warning stomps. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit `fe0852bcb4`)	2018-09-21 14:35:51 -05:00
Geoff Paulsen	4688da0631	Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0 MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0	2018-09-20 18:12:25 -05:00
Geoff Paulsen	1a65b0ab66	Merge pull request #5741 from ggouaillardet/topic/v4.0.x/use_mpi_f08_bindings v4.0.x: fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings	2018-09-20 18:10:11 -05:00
Gilles Gouaillardet	d0a0fe818f	fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@c4ce01d104)	2018-09-20 10:37:33 +09:00
Gilles Gouaillardet	afb66d222b	fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling - do not generate bindings for pompi_FOO_f symbols (they are simply not used anywhere) - move ompi_FOO_f bindings out of mpi_f08.mod into ompi_mpifh_bindings.mod that is only used at build time Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@c6070fd2e0)	2018-09-20 10:37:01 +09:00
Gilles Gouaillardet	03d994c9cf	configury: do not define "dummy" empty targets any more. We previously needed to have empty targets because AM couldn't handle having an AM_CONDITIONAL with targets in the "if" statement but not in the "else". :-( That now appears to be an old automake bug that has been fixed, so clean up some Makefile.am files. Thanks Jeff for the pointer. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@6e04b2a66a)	2018-09-20 10:36:41 +09:00
Gilles Gouaillardet	98156b7ace	use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@d2393251f7)	2018-09-20 10:36:01 +09:00
Sergey Oblomov	3cace87749	MCA/COMMON/UCX: del_procs calls are unified to common module Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `920cc2e0d9`)	2018-09-19 10:47:27 +03:00
George Bosilca	8d892f9917	Be conservative with the array_of_indices We were assuming that the array_of_indices has the same size as the number of requests (incount), instead of the number of actually active requests. While the patch is trivial, the question of the size of the array_of_indices should be clarified in the MPI Forum. Signed-off-by: George Bosilca <bosilca@icl.utk.edu> (cherry picked from commit `a5fbfa476a`)	2018-09-18 12:05:51 -07:00
Howard Pritchard	3a584fee53	Merge pull request #5723 from ggouaillardet/topic/v4.0.x/libnbc_error_path coll/libnbc: fix various error paths	2018-09-18 09:29:45 -06:00
KAWASHIMA Takahiro	e83e118ae7	mpiext/pcollreq: fix more typos Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com> (cherry picked from commit `4a0a2598f6`)	2018-09-18 15:43:06 +09:00
Gilles Gouaillardet	ece18aed45	coll/libnbc: fix various error paths The parameter passed to NBC_Return_handle() was incorrectly cast and not dereferenced. Thanks Yossi for the bug report. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@8b51862fb2)	2018-09-18 15:29:33 +09:00
Gilles Gouaillardet	73f531a8f2	mpiext/pcollreq: fix misc typos Thanks Jeff for the report Fixes open-mpi/ompi#5712 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@8dc6985a5a)	2018-09-18 12:47:04 +09:00
Geoff Paulsen	de0c595ca5	Merge pull request #5650 from matcabral/remove_psm2_shadow_env_40x v4.0.x: MTL PSM2: Remove shadow variables from v4.0.x	2018-09-17 14:40:59 -05:00
Geoff Paulsen	17aab5ea5b	Merge pull request #5659 from ggouaillardet/topic/v4.0.x/misc_finalize_leaks Plug misc leaks on MPI_Finalize()	2018-09-10 14:06:31 -05:00
KAWASHIMA Takahiro	6858028596	mpiext/pcollreq: Fix zero-count reduction We need to return a persistent request. `ompi_request_empty` is not a persistent request. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com> (cherry picked from commit `69901a5156`)	2018-09-10 13:11:59 +09:00
Gilles Gouaillardet	ff8600f2e4	ompi/hook: plug a misc memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@b79b37465c)	2018-09-10 09:21:49 +09:00
Gilles Gouaillardet	4bd5c538a2	pml/ob1: plug a memory leak in mca_pml_ob1_component_fini() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (back-ported from commit open-mpi/ompi@fed33c1530)	2018-09-10 09:21:12 +09:00
Gilles Gouaillardet	c767c63a3b	ompi/info: plug memory leaks in ompi_mpiinfo_finalize() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@d0d399c9a9)	2018-09-10 09:18:15 +09:00
Gilles Gouaillardet	080e20fa02	mtl/psm2: fix a misc memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> (cherry picked from commit open-mpi/ompi@316e4e38f4)	2018-09-10 09:17:54 +09:00
matcabral	8fa172e60b	MTL PSM2: Remove shadow variables from v4.0.x As agreed on #4574, these were removed in past release branches to avoid performance impacts in the default values for some parameters. Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-09-05 18:44:40 -04:00
Howard Pritchard	7e10bc0833	Merge pull request #5607 from edgargabriel/pr/sharedfp-naming-conflict-v4.0 sharedfp/sm and lockedfile: fix naming bug	2018-09-02 16:03:14 -04:00
Geoff Paulsen	3282c61048	Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0 PML/UCX: blocked calls optimizations - v4.0	2018-08-31 14:11:11 -05:00
Geoff Paulsen	334748753c	Merge pull request #5626 from hoopoepg/topic/opal-mem-hooks-syno-v4.0 MCA/COMMON/UCX: added synonym to opal_mem_hook variable - v4.0	2018-08-31 14:09:14 -05:00
Geoff Paulsen	51e685ff40	Merge pull request #5622 from aravindksg/ofi_race_fix_40x MTL OFI: Fix race condition due to global progress entries array	2018-08-31 14:07:42 -05:00
Sergey Oblomov	028bcb8a73	MCA/COMMON/UCX: added synonym to opal_mem_hook variable - added synonym to common ucx variables to allow printing it in opal_info -a Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `e00f7a68ba`)	2018-08-29 15:17:00 +03:00
Sergey Oblomov	9215eb9a3b	PML/UCX: blocked calls optimizations - refactoring of opal/UCX progress calls - added UCX progress priority Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `b0f87f2235`)	2018-08-29 14:38:22 +03:00
Aravind Gopalakrishnan	37d1a202be	MTL OFI: Fix race condition due to global progress entries array Since progress entries array is globally allocated, it is susceptible to race conditions when using multi-threaded applications. Allocating it on the stack resolves any potential races as it is thread local by default. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com> (cherry picked from commit `ed2343034d`)	2018-08-28 14:23:56 -07:00
Edgar Gabriel	2e3cf6fb12	io/base: fixes to file_delete selection logic file_delete triggers under the hood the full component selection logic, since we do not have a file handle, just a file name. As part of the selection logic, we have to however initiate the framework-open of the fs component in case of ompio, since ompio will call the delete function of the selected fs component, which is based on the file system where the file is located. This was not handled correctly so far. The problem however only shows up if the first I/O operation to be executed is a file_delete, otherwise the file_open will lead to the correct opening and initialization of the fs framework. This commit ensures that we do the right thing even if file_delete is the first file I/O operation in the application. Fixes issue #5611 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-28 08:18:59 -05:00
Edgar Gabriel	a489a6fc9d	sharedfp/sm and lockedfile: fix naming bug If an application opens a file for reading from multiple processes using MPI_COMM_SELF (or another communicator that has distinct process groups but the same comm-id, as can happen as the result of comm_split), the naming chosen for the lockedfile or the mmapped file used by the sharedfp/sm component would collide. This patch ensures that the filename is different by integrating the process id of rank 0 for each sub-communicator. This fixes one aspect of the problem reported in github issue 5593 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-27 14:11:03 -05:00
Howard Pritchard	37440aca90	Merge pull request #5497 from markalle/apply_romio314_patch_to_v40x v4.0.x: apply romio314 patch to romio321	2018-08-25 11:12:08 -04:00
Howard Pritchard	b926c35df0	Merge pull request #5562 from edgargabriel/pr/file_open_sharedfp_ordering_v4.0x common/ompio: fix an ordering problem during file_open	2018-08-21 22:17:45 -04:00
Howard Pritchard	4c8852c2c8	Merge pull request #5555 from karasevb/v4.0.x_pmix_fence_status v4.0.x/pmix: added check for pmix fence status	2018-08-21 09:28:17 -06:00
Edgar Gabriel	2da601a350	common/ompio: fix an ordering problem during file_open the sharedfp component has to be selected and opened before we set the default file view during file_open. Otherwise there is a spurious error message from the sharefp_file_seek operation that is called during the file_set_view. Fixes Issue #5560 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-20 10:23:32 -05:00
Boris Karasev	8873d901e8	pmix: added check for pmix fence status Signed-off-by: Boris Karasev <karasev.b@gmail.com> (cherry picked from commit `57683366ca`) Conflicts: opal/mca/common/ucx/common_ucx.c opal/mca/common/ucx/common_ucx.h Modified: ompi/mca/pml/ucx/pml_ucx.c oshmem/mca/spml/ucx/spml_ucx.c	2018-08-17 21:33:50 +06:00
Jeff Squyres	7f443a159a	fortran/use TKR: remove excess declaration for PMPI_Type_extent This declaration was accidentally left behind in `89da9651bb`. Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit `8a0b5454ae`)	2018-08-16 13:13:14 -07:00
Howard Pritchard	cdc315c1ac	Merge pull request #5523 from tkordenbrock/topic/v4.0.x/fix.PtlMEUnlink.in.use v4.0.x: coll-portals4: retry PtlMEUnlink() if PTL_IN_USE	2018-08-13 14:19:10 -06:00
Howard Pritchard	7b6a2da71a	Merge pull request #5504 from rhc54/cmr40/ofi MTL OFI: send/isend split into blocking/non-blocking paths	2018-08-13 14:18:05 -06:00
Todd Kordenbrock	36369f9133	coll-portals4: retry PtlMEUnlink() if PTL_IN_USE In the cleanup phase, it is possible for PtlMEUnlink() to return PTL_IN_USE if the NIC is not done with the ME. This should not be considered an error. This commit adds a retry loop around PtlMEUnlink(). In some cases, the return value of PtlMEUnlink() and PtlCTFree() was not checked at all. Check them with the same retry loop as above. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com> (cherry picked from commit `f3f2a826b4`)	2018-08-07 11:23:51 -05:00
Howard Pritchard	9a6f6e61f0	Merge pull request #5499 from nrspruit/ns_cancel_fix_4.0 MTL OFI: Fix Deadlock in fi_cancel given completion during cancel	2018-08-07 09:16:56 -06:00
Howard Pritchard	2386994c9d	Merge pull request #5495 from hoopoepg/topic/ucx-init-c99-v4.0 PML/SPML/UCX: init global objects using C99 style - v4.0	2018-08-04 16:03:56 -06:00
Spruit, Neil R	1fbbae1907	MTL OFI: send/isend split into blocking/non-blocking paths -Updated blocking send to directly call functionality and set completion events expected to 0 initially. This allows for optimization for providers that support fi_tinject up to larger sizes. This also reduces latency on running the OFI mtl with smaller sizes without requiring calls to progress given fi_tinject is required to complete the messaging before returning and will not create any events in the Completion Queue. -Updated non-blocking send to directly call fi_tsend and avoid calling fi_tinject as the functionality should not wait on completions. This resolves a bug where applications calling MPI_Isend can overrun the TX buffer with small (inject) messages causing a deadlock. In addition this improves performance in message rates by preventing waiting on any size message to complete in non-blocking send messages. -Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv which is common between isend and send code paths. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com> (cherry picked from commit `7dc8c8ba3f`)	2018-08-01 06:45:48 -07:00
Ralph Castain	7830d9971e	Merge pull request #5467 from rhc54/cmr40/ofi MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow	2018-07-31 13:08:03 -07:00
Mark Allen	e2b6e9ee09	apply romio314 patch to romio321 When romio314 was first pulled in an extra patch was applied to it, see commit `92f6c7c1e2`. Most of that patch is already present in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't. If that macro doesn't set err_ then some paths end up with a variable being used uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c to see what happens with error_code. It's an uninitialized stack variable that goes through three MPIO_CHECK_* macros none of which set it. The macros consistently set error_code to a failure if they see something wrong, but they don't consistently set it to success when things are fine. And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value of error_code that was never set. Signed-off-by: Mark Allen <markalle@us.ibm.com> (cherry picked from commit `f413ef6b14`)	2018-07-30 17:23:59 -04:00
Spruit, Neil R	9cc6bc1ea6	MTL OFI: Fix Deadlock in fi_cancel given completion during cancel - If a message for a recv that is being cancelled gets completed after the call to fi_cancel, then the OFI mtl will enter a deadlock state waiting for ofi_req->super.ompi_req->req_status._cancelled which will never happen since the recv was successfully finished. - To resolve this issue, the OFI mtl now checks ofi_req->req_started to see if the request has been started within the loop waiting for the event to be cancelled. If the request is being completed, then the loop is broken and fi_cancel exits setting ofi_req->super.ompi_req->req_status._cancelled = false; Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com> (cherry picked from commit `767135c580`)	2018-07-30 07:17:40 -07:00
Sergey Oblomov	b64502977a	PML/SPML/UCX: init global objects using C99 style - to avoid value mix used C99 style of object initializations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `2806504290`)	2018-07-28 16:47:43 +03:00
Mikhail Kurnosov	c540dfb18c	coll-base-allgather: fix MPI_IN_PLACE processing Calling MPI_Allgather with the sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL, respectively, produces a segmentation fault. The problem is that sendtype is used even when the sendbuf value is MPI_IN_PLACE. But according to the standard, the sendtype and sendcount parameters should be ignored in this case. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com> (cherry picked from commit `540c2d1`)	2018-07-25 08:11:28 +07:00
Spruit, Neil R	ac8d2e01f9	MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow - Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN from the provider and correctly attempt to progress the OFI Completion queue by calling ompi_mtl_ofi_progress. - If events were pending that blocked OFI operations from being enqueued they will be completed and the OFI operation will be retried once ompi_mtl_ofi_progress has successfully completed. - Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of requiring the existence of a "ret" variable to pass back the return value from completing the OFI operation. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com> (cherry picked from commit `d4f408a7f8`)	2018-07-23 11:14:42 -07:00
Sergey Oblomov	af0e7b190e	PML/UCX: fixed ucp request free on persistent request completion - in some cases the persistent request was deleted during the completion callback, which caused a double free of the linked UCX request (assert in debug build or hang in release build) - the UCX request is now freed prior to the completion callback Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `6fe0a73861`)	2018-07-20 22:20:14 +03:00
Sergey Oblomov	74d6ad09bc	OSC/UCX: fixed hang on OSC init - worker progress was missing on startup, which caused a hang on one of the ranks Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com> (cherry picked from commit `a081fba046`)	2018-07-19 15:23:01 +03:00
Edgar Gabriel	b6b9552ca9	Merge pull request #5444 from gbossu/fix-file-delete io/ompio: Call component-specific file_delete function instead of POSIX unlink	2018-07-18 08:45:57 -05:00
Gilles Gouaillardet	fed1e7766e	Merge pull request #5430 from ggouaillardet/pr/pcollreq-fort mpiext/pcollreq: add Fortran bindings	2018-07-18 09:52:59 +09:00
Joshua Ladd	3add13c72e	Merge pull request #5441 from hoopoepg/topic/ucx-memhooks-to-common-module MCA/COMMON/UCX: shift opal memhooks into common UCX	2018-07-17 15:52:44 -04:00
Matias Cabral	be3cb01cb4	Merge pull request #5397 from nrspruit/ns_ofi_mtl_ssend MTL OFI: Redesign sync send with reduced tag bits and quick ack	2018-07-17 10:14:33 -07:00
Gaëtan Bossu	8522ba112c	MCA/IO/OMPIO: fix MPI_File_delete implementation. OMPIO now uses the correct delete function depending on the fs mca_common_ompio_file_delete now works this way instead of calling POSIX unlink: - create a minimal file handle with the given file name - select the best fs component using this file handle - call the component-specific file delete function Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>	2018-07-17 18:17:13 +02:00
Gaëtan Bossu	ac6f75e3d1	MCA/FS: check communicator validity in query functions It is needed because the fs components might be queried due to a MPI_File_delete call. And in this case, we don't have a communicator value. Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>	2018-07-17 18:16:21 +02:00
Josh Hursey	9aa5168795	Merge pull request #5353 from ggouaillardet/topic/romio321_grequests io/romio321: make grequest extensions internal	2018-07-17 10:53:53 -05:00
Gilles Gouaillardet	1a41482720	coll/libnbc: do not recursively call opal_progress() instead of invoking ompi_request_test_all(), that will end up calling opal_progress() recursively, manually check the status of the requests. the same method is used in ompi_comm_request_progress() Refs open-mpi/ompi#3901 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 09:45:08 -06:00
Sergey Oblomov	1c7ae22dfb	MCA/COMMON/UCX: shift opal memhooks into common UCX Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-17 13:46:38 +03:00
Gilles Gouaillardet	47351b7fac	mpiext/pcollreq: Add Fortran use-mpi-f08 bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 16:29:41 +09:00
Kurita, Takehiro	73e038ec18	mpiext/pcollreq: Add Fortran use-mpi bindings Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>	2018-07-17 16:29:41 +09:00
Gilles Gouaillardet	9e0115c980	mpiext/pcollreq: Add Fortran mpif-h bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 16:29:33 +09:00
Gilles Gouaillardet	44110a575d	mpiext/pcollreq: do include PMPIX_* subroutines to C bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 16:29:33 +09:00
KAWASHIMA Takahiro	5ddf0f6418	mpi/fortran: Fix IN_PLACE detection of ISCATTER(V) Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in `506d0e96f4` but nonblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-17 14:15:21 +09:00
Mikhail Kurnosov	ba83cc91eb	coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-16 08:56:09 -06:00
Gilles Gouaillardet	61b3308871	mpiext/pcollreq: check subroutine parameters and add profiling symbols - check subroutine parameters - implement PMPIX_* subroutines Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-14 14:14:37 +09:00
Gilles Gouaillardet	dec1663364	spc: add missing subroutines add counters for : - MPI_Exscan - MPI_Iexscan - MPI_Igatherv Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-14 14:14:37 +09:00
Howard Pritchard	9a5fd48388	Merge pull request #5079 from jsquyres/pr/fortran-is-the-devil status_set_cancelled: fix F08 binding	2018-07-13 15:36:02 -05:00
Joshua Ladd	b12868239c	Merge pull request #4765 from xinzhao3/topic/osc-ucx-mem-hook OMPI/OSC/UCX: move memory hooks init in osc to win creation.	2018-07-13 09:36:20 -04:00
Xin Zhao	74ef51af1b	OMPI/OSC/UCX: move memory hooks init in osc to win creation. Move memory hooks init (for request based operation) in osc ucx to window creation time, to avoid performance issue in MPI initialization. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2018-07-12 15:03:02 -07:00
Nathan Hjelm	304a6a52d4	osc/rdma: use local base for local process when possible This commit fixes a crash that occurs when using btl/vader as an RDMA btl. This btl supports using CPU atomics and does not support using the btl for self communication so we must use the local memory optimizations in osc/rdma. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-12 15:50:50 -06:00
KAWASHIMA Takahiro	c87a3df0c9	Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings coll/libnbc: Suppress compiler warnings	2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro	37a05e74aa	coll/libnbc: Suppress compiler warnings Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro	0021616984	pml/ob1: Fix data corruption of MPI_BSEND Data transferred by `MPI_BSEND` may be corrupted if all of the following conditions are met. - The message size is less than the eager limit. - The `btl_alloc` function in the BTL interface returns `NULL` for some reason. - The MPI program overwrites the send buffer after `MPI_BSEND` returns. The problem is in the way a send request is made pending in the ob1 PML. The `mca_pml_ob1_send_request_start_copy` function returns `OMPI_ERR_OUT_OF_RESOURCE` if the `mca_bml_base_alloc` function returns `des = NULL`. In this case, the send request is added to the `send_pending` list and `MPI_BSEND` returns immediately. Next time the `mca_pml_ob1_send_request_start_copy` function tries sending, the user buffer may have been overwritten by the MPI program. Call hierarchy of `MPI_BSEND`: ``` MPI_Bsend mca_pml_ob1_send if (MCA_PML_BASE_SEND_BUFFERED == sendmode) mca_pml_ob1_isend MCA_PML_OB1_SEND_REQUEST_START_W_SEQ mca_pml_ob1_send_request_start_seq mca_pml_ob1_send_request_start_btl if (size <= eager_limit) if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED) mca_pml_ob1_send_request_start_copy mca_bml_base_alloc btl_alloc if (OMPI_ERR_OUT_OF_RESOURCE == rc) add_request_to_send_pending ompi_request_free ``` To solve this problem, we should save the data to the buffer attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`. This problem was introduced by ob1 optimization (commits `2b57f422` and `a06e491c`) in v1.8 series. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:30:58 +09:00
Howard Pritchard	34bc77747c	Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing	2018-07-11 18:34:35 -05:00
Nathan Hjelm	35a75a6bf5	osc/sm: avoid filename collision when multiple windows share same CID This commit fixes an issue identified by MTT where we can have two different sets of processes on the same node creating a shared memory window with communicators sharing the same CID. To avoid this issue the temporary filename now includes the creating process's vpid. References #5363 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-11 14:32:27 -06:00
Nathan Hjelm	037656bc1d	osc/rdma: fix bug introduced in `b90c838` This commit fixes a bug that was introduced back in 2016 which impacts request-based RMA in some cases. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-10 18:17:55 -06:00
Gilles Gouaillardet	76292951e5	coll/libnbc: fix integer overflow Use internal pack/unpack subroutines that operate on MPI_Aint instead of int and hence solve some integer overflows. Thanks Clyde Stanfield for reporting this issue. Refs open-mpi/ompi#5383 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-09 10:08:33 -06:00
Mikhail Kurnosov	22fa5a8a67	coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-09 10:04:41 -06:00
Spruit, Neil R	9a17864278	MTL OFI: Redesign sync send with reduced tag bits and quick ack -Updated the design for sync send MPI calls to use 2 protocol bits for denoting "sync_send" or "sync_send_ack". -"Sync_send" is added to the send tag only and is masked out in receives such that it can be read by the original Recv posted in the send/recv operation. -"Sync_send_ack" is sent from the recv callback to the send side. This 0 byte send does not generate a completion entry and instead sends the message and immediately completes the opal completion in the recv. -Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2 more tag bits per format type due to the reduced protocol bits required by OMPI. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-09 06:50:21 -07:00
Yossi Itigin	e77e31b50b	Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging MCA/COMMON/UCX: unified logging across all UCX modules	2018-07-08 12:45:26 +03:00
Mikhail Kurnosov	b9e14cd7d0	coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing Calling MPI_Gather with the sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL, respectively, produces a segmentation fault in the root process. The problem is that sendtype is used even when the sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), the sendtype and sendcount parameters should be ignored in this case. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-07 20:59:39 +07:00
Sergey Oblomov	240670152e	MCA/COMMON/UCX: code beautify - alignment Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 19:40:58 +03:00
Sergey Oblomov	eb7010933d	OSC/UCX: suppressed compilation warnings - suppressed signed/unsigned-compare warnings Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 10:58:09 +03:00
Sergey Oblomov	bef47b792c	MCA/COMMON/UCX: unified logging across all UCX modules - added common logging infrastructure for all UCX modules - all UCX modules are switched to new infra Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-05 16:25:39 +03:00
Sergey Oblomov	8080283b3d	MCA/COMMON/UCX: changed return type for wait_request - for now wait_request returns OMPI status - updated callers Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-04 23:29:38 +03:00
Sergey Oblomov	c2bd6af9f2	MCA/COMMON/UCX: minor unification of del_procs calls - some common functionality of del_procs calls is moved into mca_common module - blocking ucp_put call is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-02 15:10:53 +03:00
Yossi Itigin	09c10d5e09	Merge pull request #5345 from hoopoepg/topic/pml-ucx-suppress-compiler-warning PML/UCX: suppressed compilation warning	2018-07-02 13:41:12 +03:00
Edgar Gabriel	d191ed6b4f	fs/base: move redundant code to fs/base moving some code from fs/ufs into fs/base. The benefit of this approach is that fs components that are fundamentally based on posix I/O (and only differ in some non-posix functionality such as setting stripe size, or which hints are being supported) can avoid having to replicate the same code over and over again. First beneficiary is the lustre fs component, but more are to follow soon. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-07-01 10:20:32 -05:00
Xin Zhao	c1ac0c00c5	Merge pull request #5185 from jjolly/fix-memcpy-size-mismatch - Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c	2018-06-29 19:37:53 -07:00
Jeff Squyres	f4320193e3	mpi.h.in: remove some deprecation/removed warnings Intentionally do not mark some MPI-1 function pointer typedefs as `__mpi_interface_removed__` because we have to use them in prototyping some MPI-1 functions when `--enable-mpi1-compatibility` is used. Fixes #5357. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-29 07:43:51 -07:00
Gilles Gouaillardet	7363906e4e	io/romio321: make grequest extensions internal Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-29 16:41:27 +09:00
Jeff Squyres	c1ccbece2f	Merge pull request #5347 from jsquyres/pr/fix-f90-removed-interfaces F90 removed interfaces: add missing "end interface"	2018-06-27 13:54:02 -04:00
Jeff Squyres	768b800533	F90 removed interfaces: add missing "end interface" Thanks to @fsciortino for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-27 13:02:16 -04:00
Sergey Oblomov	074f30ba27	PML/UCX: suppressed compilation warning Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-27 12:05:07 +03:00
Yossi Itigin	aca61a6bfb	Merge pull request #5238 from hoopoepg/topic/fixed-coverity-issues-ucx-pml UCX/PML: fixed few coverity issues	2018-06-27 11:14:06 +03:00
Nathan Hjelm	4c230683e7	osc/sm: fix a typo This commit fixes a typo where a bcast is used instead of the intended collective (barrier). References #5262 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-26 12:53:12 -06:00
Sergey Oblomov	502d04bf12	UCX/PML/SPML: fixed few coverity issues - fixed incorrect pointer manipulation/free - cleaned dead code - minor optimization on process delete routine - fixed error handling - free pointers - added debug output for worker flush failure Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-26 18:52:39 +03:00
Yossi Itigin	ee873f4f79	Merge pull request #5322 from hoopoepg/topic/mca-ucx-common MCA/UCX: added common module	2018-06-26 13:54:12 +03:00
Gilles Gouaillardet	e609cf7bc3	Merge pull request #5337 from ggouaillardet/topic/generalized_requests ompi/requests: implement generalized request extensions	2018-06-26 13:01:04 +09:00
KAWASHIMA Takahiro	a8da78eeaa	Merge pull request #4618 from ggouaillardet/topic/pcoll Add the persistent collectives feature	2018-06-26 12:36:34 +09:00
Gilles Gouaillardet	5c394377d0	io/romio312: use Grequest extensions provided by Open MPI Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	f72922b8b1	io/romio321: do not use removed MPI1 primitives Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	383f23bf35	ompi/request: implement MPI Generalized request extensions so latest ROM-IO can be used with Open MPI. Note this first and naive implementation does not use the wait_fn callback. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:18 +09:00
Gilles Gouaillardet	1e5404873f	io/romio321: update .gitignore and remove two files that should have never been committed Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-26 10:52:17 +09:00
Joshua Ladd	256ad707f1	Merge pull request #5293 from yosefe/topic/osc-ucx-on-demand-progress osc_ucx: register progress on-demand	2018-06-25 15:09:11 -04:00
Joshua Ladd	98afc838aa	Merge pull request #5294 from yosefe/topic/coll-hcoll-progress-fn coll_hcoll: register progress callback directly without a proxy	2018-06-25 15:07:26 -04:00
Nathan Hjelm	e4989714c2	osc/rdma: fix data race on teardown The osc/rdma module did not wait for all pending atomics to complete before tearing down. This could lead to weird issues as the target location may no longer be registered or allocated. This commit also fixes an offset calculation issue in ompi_osc_get_data_blocking (). Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 11:47:34 -06:00
Nathan Hjelm	c9e58cedc1	mpi.h: fix warning with gcc Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-25 11:45:36 -06:00
Ralph Castain	3b2390e5d5	Silence coverity warnings, remove/ignore build product Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-25 08:01:28 -07:00
Sergey Oblomov	bf7fd480e9	MCA/COMMON/UCX: added non-blocking implementations of atomics - added implementation of swap/cswap/fadd operations - blocking add64 is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-25 12:25:31 +03:00
Sergey Oblomov	63e7ba6843	MCA/COMMON/UCX: added parameter for UCX/opal progress - added parameter to set UCX/opal progresses - minor refactoring of request wait routines Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-25 11:00:12 +03:00
Yossi Itigin	e3ee11608b	coll_hcoll: register progress callback directly without a proxy Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-24 18:06:07 +03:00
Edgar Gabriel	edfdcb6e82	Merge pull request #5324 from edgargabriel/pr/minor-fixes Pr/minor fixes	2018-06-22 17:20:02 -05:00
Howard Pritchard	8babaad35c	Merge pull request #4520 from ggouaillardet/refresh/romio321 io/romio321: refresh ROMIO based on latest stable MPICH 3.2.1	2018-06-22 16:58:46 -05:00
Edgar Gabriel	cf5cdad40f	fcoll: make vulcan the default component make vulcan the default component except for Lustre file systems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:12:02 -05:00
Edgar Gabriel	fd8c5fba4e	common/ompio: fix the fview based grouping options a bug sneaked into constructing the list of aggregators processes when using the fileview based grouping options Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-22 14:01:31 -05:00
Sergey Oblomov	d57ae62dee	MCA/UCX: added common module - implemented non-blocking routines for flush operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-22 16:41:09 +03:00
Gilles Gouaillardet	edd02b7144	pml/ucx: silence a warning declare 'fenced' volatile in order to silence CID 1437465 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-22 13:11:42 +09:00
Edgar Gabriel	743e0dff5a	common/ompio: fix zero size fview issue handle the situation where the user requests a non-zero amount of data but has a zero-size fileview. My instinct would have been to return an error code, but according to the test that I used it should be MPI_SUCCESS and zero bytes. It is definitely better than segfaulting :-) This makes another test from the IBM testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 17:02:13 -05:00
Edgar Gabriel	7643ccfbcf	sharedfp/sm and sharedfp/lockedfile: fix seek offset calculation the seek offset calculation did not treat the offset as a multiple of the etype provided. Fixing this makes some more ibm tests pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 14:26:36 -05:00
Mikhail Kurnosov	c500739293	coll/base: Add MPI_Bcast based on a scatter followed by an allgather Implements MPI_Bcast using a binomial tree scatter followed by a recursive doubling allgather. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-21 11:47:07 -06:00
Edgar Gabriel	fb16d40775	Merge pull request #5196 from edgargabriel/topic/cuda io/ompio: introduce initial support for cuda buffers in ompio	2018-06-21 10:14:43 -05:00
Edgar Gabriel	7808379a47	common/ompio: incorporate George's comments incorporate a couple of comments by George as part of the review on github. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:29:49 -05:00
Edgar Gabriel	3c10ed4ed1	common/ompio: use allocator to manage temporary buffers use an allocator to manage temporary buffers when copying unmanaged data from GPU buffer to host. This is necessary, since the buffers have to be pinned for better performance, which is an expensive operation. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	ac79e576ef	fcoll/base: do not use the two_phase component with CUDA support the two_phase component does not work with some collective I/O operations on CUDA buffers due to the data sieving (i.e. both read and write operations) executed on some buffers, which are not anticipated in the GPU buffer management of the code. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Edgar Gabriel	6a532101aa	io/ompio and common/ompio: add initial support for cuda buffers in ompio this commit adds the initial support for cuda buffers in ompio, for blocking and non-blocking individual read and write operations. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-21 09:25:50 -05:00
Yossi Itigin	db26c08336	Merge pull request #5307 from hoopoepg/topic/async-progress-on-mpi-fin PML/UCX: fixed hang on MPI_Finalize	2018-06-21 13:44:14 +03:00
Sergey Oblomov	5f03628560	PML/UCX: removed unneeded flush Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-21 12:40:46 +03:00
Sergey Oblomov	2745da7dcc	PML/UCX: use non-blocking fence instead of async progress Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-21 09:46:03 +03:00
Edgar Gabriel	7bbeaf30ff	Merge pull request #5306 from edgargabriel/pr/minor-improvements Pr/minor improvements	2018-06-20 08:43:41 -05:00
Sergey Oblomov	10f2d831ec	PML/UCX: fixed hang on MPI_Finalize - added async UCX progress thread to allow pending requests to complete Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-20 16:12:05 +03:00
Edgar Gabriel	0757cb11a8	fcoll/all components: minor updates two minor updates: - in all components: use the fh->f_bytes_per_agg value (which might have been set by an info object) instead of re-reading the mca parameter - vulcan and dynamic_gen2: replace one allgather operation by an allreduce, since it is used to determine the sum of an array. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-20 07:47:29 -05:00
Gilles Gouaillardet	9d7f0e1c95	Replace MPI_Type_extent with MPI_Type_get_extent in ROMIO. Signed-off-by: Ben Menadue <ben.menadue@nci.org.au> (back-ported from commit open-mpi/ompi@34ec0bd8ab) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:17 +09:00
Gilles Gouaillardet	11428e400a	Replace MPI_Address with MPI_Get_address in ROMIO. Signed-off-by: Ben Menadue <ben.menadue@nci.org.au> (back-ported from commit open-mpi/ompi@756cc67221) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:16 +09:00
Gilles Gouaillardet	ad8c49053d	io/romio321: fix two more MPI-3 compliance issues Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (back-ported from commit open-mpi/ompi@ae17908f35) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:16 +09:00
Gilles Gouaillardet	e5460dcb4a	io/romio: do not use removed functions This commit attempts to update the romio io component to not use functions removed in MPI-3.0 (2012). This is a first cut and will probably need to be reviewed for correctness. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (back-ported from commit open-mpi/ompi@84765001aa) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:16 +09:00
Gilles Gouaillardet	c29301da95	io/romio321: fix minmax datatypes romio assumes that all predefined datatypes are contiguous. Because of the (terribly named) composed datatypes MPI_SHORT_INT, MPI_DOUBLE_INT, MPI_LONG_INT, etc this is an incorrect assumption. The simplest way to fix this is to override the MPI_Type_get_envelope and MPI_Type_get_contents calls with calls that will work on these datatypes. Note that not all calls to these MPI functions are replaced, only the ones used when flattening a non-contiguous datatype. References #5009 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov> (back-ported from commit open-mpi/ompi@4d876ec6fe) Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:16 +09:00
Gilles Gouaillardet	4355a67740	ROMIO 3.2.1 refresh: add refresh notes Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:15 +09:00
Gilles Gouaillardet	bf23e843df	ROMIO 3.2.1 refresh: remove old romio Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:15 +09:00
Gilles Gouaillardet	2f0db1945c	ROMIO 3.2.1 refresh: patch mpich romio for OMPI Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:14 +09:00
Gilles Gouaillardet	2f391a99a7	ROMIO 3.2.1 refresh: import romio from mpich 3.2.1 tarball Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:14 +09:00
Gilles Gouaillardet	4272b57089	ROMIO 3.2.1 refresh: prepare new romio directory ompi/mca/io/romio321 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-20 14:28:13 +09:00
Edgar Gabriel	df4431bd48	io/ompio: add support for some info objects add support for the info objects cb_buffer_size and collective_buffering. Also, introduce a new mca parameter that allows to give feedback on whether an info object is recognized (and honored). Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-19 19:34:36 -05:00
Mikhail Kurnosov	66bc86a25b	Change the tree_next to a flexible array member Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-19 13:01:26 -06:00
Mikhail Kurnosov	6547b58316	coll/base: add knomial tree algorithm for MPI_Bcast Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-19 13:01:26 -06:00
Edgar Gabriel	c3ac06dc1b	sharedfp/sm and lockedfile: fix Coverity warnings this commit fixes the Coverity warnings CID 1437402 and CID 1437401 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-19 10:04:51 -05:00
Yossi Itigin	c2fbf3a3e8	osc_ucx: register progress on-demand Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-19 12:47:08 +03:00
Edgar Gabriel	9986a15b57	sharedfp/individual: only complain about fseek if sharedfp operations are really in use this component can only be used in very specific scenarios. However, since some file systems do not support file locking and processes might be distributed over multiple nodes (hence the sm sharedfp component is also ineligible), the component might be selected in some scenarios, even if an application does not intend to use shared file pointers. Since the fseek_shared function is involved as part of the File_set_view operation, only complain about the inability to perform the seek_shared operation if actual shared file pointer operations are being used. This avoids spurious error values being returned. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-18 18:25:29 -05:00
Edgar Gabriel	bb1522472f	Merge pull request #5286 from edgargabriel/topic/sharedfp-revamp sharedfp/all components: revamp internal operations	2018-06-18 16:09:54 -05:00
Edgar Gabriel	bc0f60dfd9	sharedfp/all components: revamp internal operations this commit revamps the internal operations of the sharedfp components. Specifically, it is focused around removing the second file_open operation for shared file pointers. This makes the code more efficient. Because of that, there is no necessity anymore for the sharedfp_lazy_open mca parameter. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-18 14:34:05 -05:00
Yossi Itigin	733cac864a	Merge pull request #5282 from yosefe/topic/pml-ucx-opal-mem-hooks pml_ucx: add option to use opal memhooks instead of ucx internal hooks	2018-06-18 19:07:01 +03:00
Gilles Gouaillardet	3f874c9857	spc: remove ompi_spc_get_count() prototype from ompi_spc.h This function is only used in ompi_spc.c and is hence declared as static. Remove its prototype from the header file in order to silence compiler warnings who will typically consider ompi_spc_get_count() as a declared but not defined function. Fixes open-mpi/ompi#5279 Fixes open-mpi/ompi#5273 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-18 16:07:11 +09:00
Yossi Itigin	564f80d362	pml_ucx: add option to use opal memhooks instead of ucx internal hooks Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-06-17 15:30:44 +03:00
Gilles Gouaillardet	2caf1bf0e5	Merge pull request #5263 from ggouaillardet/topic/ompio_abstraction ompio: fix abstraction	2018-06-16 23:29:29 +09:00
Matias A Cabral	e6674556aa	MTL OFI: add support for FI_REMOTE_CQ_DATA. Extend number of supported ranks with providers that support FI_REMOTE_CQ_DATA. Add README file to OFI MTL Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-06-14 17:17:38 -07:00
Edgar Gabriel	d5bdcf8595	fs/pvfs2: fix compilation problem Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-14 09:30:45 -05:00
Howard Pritchard	7dcab6e4a4	Merge pull request #5269 from hppritcha/topic/squash_gcc7.3.0_warnings topo/treematch - quash compiler warning	2018-06-13 21:13:04 -05:00
Gilles Gouaillardet	cd45c7abb6	ompio: misc renames Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Gilles Gouaillardet	36b35ae0db	ompio: fix abstraction Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-14 09:41:10 +09:00
Howard Pritchard	64de269cc3	topo/treematch - quash compiler warning quash a compiler warning showing up with gcc 7.3 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2018-06-13 16:34:17 -05:00
Thananon Patinyasakdikul	390d72addd	Merge pull request #4885 from davideberius/spc_pr Initial Software-based Performance Counters PR	2018-06-12 14:04:49 -07:00
David Eberius	d377a6b6f4	Added Software-based Performance Counters driver code along with several counters. This code is the implementation of Software-based Performance Counters as described in the paper 'Using Software-Based Performance Counters to Expose Low-Level Open MPI Performance Information' in EuroMPI/USA '17 (http://icl.cs.utk.edu/news_pub/submissions/software-performance-counters.pdf). More practical usage information can be found here: https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI. All software events functions are put in macros that become no-ops when SOFTWARE_EVENTS_ENABLE is not defined. The internal timer units have been changed to cycles to avoid division operations which was a large source of overhead as discussed in the paper. Added a --with-spc configure option to enable SPCs in the Open MPI build. This defines SOFTWARE_EVENTS_ENABLE. Added an MCA parameter, mpi_spc_enable, for turning on specific counters. Added an MCA parameter, mpi_spc_dump_enabled, for turning on and off dumping SPC counters in MPI_Finalize. Added an SPC test and example. Signed-off-by: David Eberius <deberius@vols.utk.edu>	2018-06-11 22:48:16 -04:00
KAWASHIMA Takahiro	a38e9e064f	coll: Update COLL module interface version to 2.3.0 Members for persistent operations are added to the module structure in a prior commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	e12a5056f1	coll/libnbc: Rename internal functions The `nbc_i` functions don't start communication, but create a request. `nbc__init` are appropriate names for them. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	5c21903477	coll/libnbc: Add assertion for `NBC_A2A_DISS` Persistent operation for `NBC_A2A_DISS` is not supported currently. Though the algorithm is not selected at all currently, I put an assertion not to select it by mistake. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	0b8b0f8393	coll/libnbc: Implement `MPI_STARTALL` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	ed0144bad4	coll/libnbc: Adapt local copy for persistent request `NBC_Copy` should not be called in `MPI_*_INIT`. `NBC_Sched_copy` should be called instead. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	5c5de3a4fb	coll/libnbc: Fix handling of completed request Because a persistent request does not free its `schedule` object when the communication completes, the `NBC_Progress` function cannot determine the completion using `schedule`. Without this change, a hang occurs when the `NBC_Progress` function is called recursively through the `NBC_Start_round` function. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	8e5690bf5c	coll/libnbc: Correct persistent request handling Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	e72f510daf	ompi/request: Add `ompi_request_persistent_noop_create` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
Gilles Gouaillardet	9a63dacf1c	mpi: check MPI_Start[all] is invoked on persistent requests and errors otherwise. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	545e9af896	mpiext/pcollreq: Add `MPIX_Bcast_init` etc. Until the MPI Forum decides to add the persistent collective communication request feature to the MPI Standard, these functions are supported through MPI extensions with the `MPIX_` prefix. Only C bindings are supported currently. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 17:22:16 +09:00
KAWASHIMA Takahiro	e69e99575e	coll: Enable func check in `mca_coll_base_comm_select` Now libnbc COLL supports persistent collectives and all `*_init` functions of the COLL interface are available. So let's enable the check of availability of those functions on a communicator creation. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	a9609b6bf8	coll/libnbc: add persistent collectives implementation Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 09:53:37 +09:00
KAWASHIMA Takahiro	a9fdea51aa	coll: Add persistent collective communication request feature Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	c753e9baff	coll/libnbc: code refactoring prepare the upcoming persistent collectives by pre-factoring some code Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp> fixup 808c3c62cd9475edd91ecde9d2d53b12e28b2c04	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	fe0bb6c310	coll/libnbc: misc revamp - merge NBC_Init_handle() into NBC_Schedule_request() - set schedule in NBC_Schedule_request instead of NBC_Start() - update NBC_Start() prototype Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 09:53:37 +09:00
Gilles Gouaillardet	360a76f440	coll/libnbc: revamp ibcast and use NBC_Schedule_request() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-11 09:53:37 +09:00
Jeff Squyres	84701cd2b0	Merge pull request #5204 from markalle/info_snprintf fix info-subscribe to use snprintf() and warn on long key	2018-06-08 15:22:55 -04:00
Edgar Gabriel	2d8a769bfd	fcoll/static: remove component now that we have a shiny new fcoll component, no need to keep the static component around. No use for it anymore. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-08 07:39:46 -05:00
Edgar Gabriel	b27a40cdf9	Merge pull request #5246 from edgargabriel/topic/ibm-testsuite-fixes Topic/ibm testsuite fixes	2018-06-08 06:06:49 -05:00
Yossi Itigin	fd12540751	Merge pull request #5227 from hoopoepg/topic/pml-ucx-hang-on-finalize PML/UCX: fixed hang on MPI_Finalize	2018-06-08 13:19:49 +03:00
KAWASHIMA Takahiro	317e53f83f	Merge pull request #5243 from t-kurita/pr/mpiext-mpi-f08-logical Fortran: Enable using `LOGICAL` parameter in MPI extensions.	2018-06-08 13:11:13 +09:00
Edgar Gabriel	a1484ec69a	io/ompio: check error conditions before executing file_sync check for pending I/O operations and invalid modes and return proper error codes before executing MPI_File_sync makes the e_sync_1 test from the ibm testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 19:30:27 -05:00
Edgar Gabriel	5f1e88d265	mpi/c: check for valid datatype in file_get_type_extent the interface of file_get_type_extent did not check whether the input datatype is valid or not. Makes the e_get_type_extend_2 test from the ibm testsuite pass. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 19:30:27 -05:00
Edgar Gabriel	14bd114973	common/ompio: return error code from file_delete operation in file_close in case the user opened a file using the DELETE_ON_CLOSE flag, return the error code generated in the delete operation. Note, that this is however just a partial fix to the e_close_1 test from the ibm testsuite, since the object destructor that triggers the file_close function does not have a mechanism right now to recognize and return an error code. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2018-06-07 19:30:14 -05:00
Edgar Gabriel	f7cae7731c	io/ompio: return error code for invalid offset in file_get_byte_offset, return an error code if the offset leads to an invalid position in file. Makes the e_get_byte_offset_1 test from the ibm testsuite pass. Signed-off-by: Edgar Gabriel <gabriel@cs.uh.edu>	2018-06-07 18:46:17 -05:00
Edgar Gabriel	deaeaa60de	fcoll/vulcan: minor bugfix when creating the groups_per_proc arrays Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 17:52:32 -05:00
Edgar Gabriel	8feb497dbe	io/ompio: cleanup the aggregator selection logic and some internal structure elements/components. Along the way, add support for the cb_nodes Info object. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:47:10 -05:00
Edgar Gabriel	529d882ff0	io/ompio and common/ompio: relocate ompio_request code to common since the request code is now being accessed also from the vulcan fcoll component, the request code was relocated into the common/ompio directory to avoid ld load problems. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
raafatfeki	5ecb4a56e3	fcoll/vulcan: Support of asynchronous write in collective writeAll We introduced a new mca_vulcan parameter that specifies the I/O synchronization type (async/sync I/O) applied within the collective write operation. The user can explicitly choose to use the async or sync write operation, or let the choice be made automatically. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	4f7172ddf6	fcoll/vulcan: Support of larger offsets For very large offsets, the data chunk size to be written by each aggregator exceeds the capacity of an integer variable. Besides, some variables were not large enough to hold intermediate values. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	4670fe50d7	fcoll/vulcan: Remove unnecessary calls to write Identify the index of each aggregator process in order to restrict the call to write_init function by the specific aggregator. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
raafatfeki	bc6431bee9	fcoll/vulcan: use hindexed constructor on the sender side Instead of using a temporary buffer and copy data into the temp buffer before sending, use a derived datatype to describe the data that needs to be sent during a cycle in the collective I/O operation. Signed-off-by: raafatfeki <fekiraafat@gmail.com>	2018-06-07 16:13:12 -05:00
Edgar Gabriel	1c2c110824	fcoll/vulcan: add new fcoll component import of the new vulcan component. It is an enhanced version of the two_phase component, which uses however the ompio internal codes/loops to assemble the data arrays. It is therefore more inline with the dynamic and dynamic_gen2 component, and will be easier to maintain. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-07 16:13:12 -05:00
Kurita, Takehiro	f9ae932bfd	Fortran: Enable using `LOGICAL` parameter in MPI extensions. If a subroutine of the Fortran `use-mpi-f08` binding in an MPI extension have a `LOGICAL` parameter and no `TYPE(MPI_Status)` parameter, it needs to use the `mpi_ext` module and call its corresponding subroutine in the `mpif-h` directory, as explained in `ompi/mpi/fortran/use-mpi-f08/mpi-f-interfaces-bind.h`. However, as shown in the figure below, the required directories are dependent on each other, and "Can't open module file" error occurs at build time. ompi/mpiext/{extension name}/use-mpi-f08 A \| \| \| \| V ompi/mpi/fortran/use-mpi-f08 <--- ompi/mpi/fortran/mpiext (mpi_ext.mod) In order to solve this problem, change the configuration and the build order. - divide Fortran extension directory (`ompi/mpi/fortran/mpiext`) into the directories for `use-mpi` and for `use-mpi-08` - `ompi/mpi/fortran/mpiext-use-mpi` : for `use-mpi` (mpi_ext.mod) - `ompi/mpi/fortran/mpiext-use-mpi-f08` : for `use-mpi-08` (mpi_f08_ext.mod) - change to the following build order about Fortran `use-mpi` and `use-mpi-f08` bindings in `ompi` 1. mpi_ext bindings of MPI extensions (`mpiext/{extension name}/use-mpi` directory) 2. Fortran use-mpi (`mpi/fortran/use-mpi-[ignore-]tkr` directory) 3. Fortran extension for use-mpi (`mpi/fortran/mpiext-use-mpi` directory) 4. Fortran use-mpi-f08 modules only (`mpi/fortran/use-mpi-f08/mod` directory) 5. mpi_f08_ext bindings of MPI extensions (`mpiext/{extension name}/use-mpi-f08` directory) 6. Fortran use-mpi-f08 (`mpi/fortran/use-mpi-f08` directory) 7. Fortran extension for use-mpi-f08 (`mpi/fortran/mpiext-use-mpi-f08` directory) Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>	2018-06-07 15:02:17 +09:00
Nathan Hjelm	63ded4d083	Merge pull request #5224 from benmenadue/master io/romio314: Replace deprecated MPI-1 functions	2018-06-06 15:41:53 -06:00
Ralph Castain	5853ebee1a	Merge pull request #5240 from rhc54/topic/foo Correct typo in name comparison flags	2018-06-06 13:25:20 -07:00
Ralph Castain	86d699d42e	Correct typo in name comparison flags Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-06 12:18:52 -07:00
bosilca	fa1386768f	Merge pull request #5234 from jsquyres/pr/oshmem-init-race ompi_mpi_init: fix race condition	2018-06-06 12:14:00 -04:00
Jeff Squyres	9b9cb5fef0	to be squashed: move wait-for-init loop to ompi_mpi_init() Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-06 05:35:19 -07:00
Ralph Castain	840fb42f93	PMIx rte component does support dynamics Minor cleanups Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-06-05 21:55:19 -07:00
Jeff Squyres	67ba8da76f	ompi_mpi_init: fix race condition There was a race condition in `35438ae9b5`: if multiple threads invoked ompi_mpi_init() simultaneously (which could happen from both MPI and OSHMEM), the code did not catch this condition -- Bad Things would happen. Now use an atomic cmp/set to ensure that only one thread is able to advance ompi_mpi_init from NOT_INITIALIZED to INIT_STARTED. Additionally, change the prototype of ompi_mpi_init() so that oshmem_init() can safely invoke ompi_mpi_init() multiple times (as long as MPI_FINALIZE has not started) without displaying an error. If multiple threads invoke oshmem_init() simultaneously, one of them will actually do the initialization, and the rest will loop waiting for it to complete. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-05 18:09:13 -07:00
Nathan Hjelm	64a5baaa28	Merge pull request #5193 from hjelmn/osc_sm_location Use /dev/shm for shared memory files in osc components	2018-06-05 09:42:14 -06:00
Sergey Oblomov	0a8261f3b0	PML/UCX: fixed hang on MPI_Finalize fixes issue https://github.com/openucx/ucx/issues/2656 added flush for worker object to complete all pending operations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-05 17:22:03 +03:00
Mikhail Kurnosov	3adf96fdb8	coll/base: add butterfly algorithm for MPI_Reduce_scatter Implements butterfly algorithm for MPI_Reduce_scatter. The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-06-05 15:53:13 +07:00
Ben Menadue	34ec0bd8ab	Replace MPI_Type_extent with MPI_Type_get_extent in ROMIO. Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>	2018-06-05 15:27:58 +10:00
Ben Menadue	756cc67221	Replace MPI_Address with MPI_Get_address in ROMIO. Signed-off-by: Ben Menadue <ben.menadue@nci.org.au>	2018-06-05 15:27:25 +10:00
Gilles Gouaillardet	05b3546151	java: do not use MPI1 deprecated subroutines Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-04 10:49:52 +09:00
Ralph Castain	3020b699f3	Merge pull request #5213 from rhc54/topic/rte Enable the PMIx ompi/rte component	2018-06-03 10:23:40 -07:00
Ralph Castain	55ac526a67	Enable the PMIx ompi/rte component Get the OMPI rte/pmix component working. This was tested using PRRTE as the RM, configuring OMPI using: * autogen --no-orte * with external libevent, external hwloc, and external PMIx master * configuring PMIx master with the same libevent and hwloc * execute the application using PRRTE's "prun" launcher, which has the same cmd line as ORTE's mpirun Note that PMIx master appears to have a bug in the event notification system that caches job termination events. Thus, the first execution runs fine, but subsequent executions cause an "abort" when the OMPI default error handler is invoked upon notification of the prior job's termination. Will work that separately. Signed-off-by: Ralph Castain <rhc@open-mpi.org> (cherry picked from commit 134cca9ac0de092d767999357573a31703f72292)	2018-06-03 07:25:12 -07:00
Mark Allen	93fefc4d70	fix info-subscribe to use snprintf() and warn on long key This checkin mainly concerns our internal info keys that are registering for callbacks via opal_infosubscribe_subscribe(). Those keys need to have an extra __IN_<key>/val stored to preserve their pre-callback value. So that means our internal keys are limited to 5 chars shorter than the usual key length limit. The code previously would have been silently inactive if a large key happened to come in, now it warns and also uses snprintf() to avoid compiler warnings. I'm also making the top-level MPI_Info_set warn if the user uses our reserved "__IN_" prefix. I had wanted the feature to be more invisible than that, but it would require a more sophisticated approach to change that. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2018-06-01 18:31:32 -04:00
Jeff Squyres	38ed70de6f	ompi_mpi_finalize: remove some dead code Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-01 13:37:20 -07:00
Jeff Squyres	35438ae9b5	mpi/finalized: revamp INITIALIZED/FINALIZED Per MPI-3.1:8.7.1 p361:11-13, it's valid for MPI_FINALIZED to be invoked during an attribute destruction callback (e.g., during the destruction of keyvals on MPI_COMM_SELF during the very beginning of MPI_FINALIZE). In such cases, MPI_FINALIZED must return "false". Prior to this commit, we hung in FINALIZED if it were invoked during a COMM_SELF attribute destruction callback in FINALIZE. See https://github.com/open-mpi/ompi/issues/5084. This commit converts the MPI_INITIALIZED / MPI_FINALIZED infrastructure to use a single enum (ompi_mpi_state, set atomically) to represent the state of MPI: - not initialized - init started - init completed - finalize started - finalize past COMM_SELF destruction - finalize completed The "finalize past COMM_SELF destruction" state is what allows us to return "false" from MPI_FINALIZED before COMM_SELF has been fully destroyed / all attribute callbacks have been invoked. Since this state is checked at nearly every MPI API call (to see if we're outside of the INIT/FINALIZE epoch), care was taken to use atomics to set the ompi_mpi_state value in ompi_mpi_init() and ompi_mpi_finalize(), but performance-critical code paths can simply read the variable without needing to use a slow call to an opal_atomic_*() function. Thanks to @AndrewGaspar for reporting the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-01 13:36:29 -07:00
Edgar Gabriel	52bd606294	fcoll/dynamic_gen2: make sure that intermediate variables can hold the offset. For very large offsets, some variables used in the fcoll/dynamic_gen2 code base were under certain circumstances not large enough to hold intermediate values. This issue was detected more often in the vulcan component but could happen in the dynamic_gen2 component as well. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-06-01 06:53:38 -05:00
Gilles Gouaillardet	9f7586465d	fortran/mpif-h: fix MPI1 compatibility Makefile appends MPI1 compatible source files instead of redefining all the source files fix a typo from open-mpi/ompi@89da9651bb Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-01 09:52:22 +09:00
Nathan Hjelm	b323655809	mpi: make C++ bindings compile when MPI-1 compat is disabled Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-05-31 09:44:19 -06:00
Nathan Hjelm	89da9651bb	ompi: disable functions removed from MPI-3.0 by default This commit adds a new configure option: --enable-mpi1-compat. Without this option we will no longer provide APIs, typedefs, and defines that were removed from the standard in MPI-3.0. This option will exist for one major release (Open MPI v4.x.x) and then the option and associated code will be removed in Open MPI v5.x.x. Open MPI has already internally prepared for this change. Please prepare your codes accordingly. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-05-31 09:44:19 -06:00
KAWASHIMA Takahiro	04b509b2d2	Merge pull request #5207 from t-kurita/pr/java-doc-descriptions java: Improve descriptions of `javadoc`	2018-05-31 16:12:21 +09:00
Kurita, Takehiro	11ae771b82	java: Improve descriptions of `javadoc` - Improve descriptions - Fix some typos - Remove MPI-1 functions and replace them with MPI-2 functions Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>	2018-05-31 15:02:35 +09:00
Jeff Squyres	25f2d02c61	fcoll/dynamic_gen2: minor compiler warning stomp Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-30 10:08:19 -07:00
Jeff Squyres	2dce549df2	ompi/debuggers: stomp a compiler warning in dlopen_test.c Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-30 10:08:14 -07:00
Nathan Hjelm	e9de42544e	osc/sm: add support for controlling location of backing store This commit adds a new MCA variable to set the location of the backing store: osc_sm_backing_directory. The default on Linux has been changed to use /dev/shm to improve performance in cases where /tmp is not a tmpfs. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-05-29 21:44:01 -06:00
Nathan Hjelm	d0d59b1d7d	osc/rdma: add support for controlling location of backing store This commit adds a new MCA variable to set the location of the backing store: osc_rdma_backing_directory. The default on Linux has been changed to use /dev/shm to improve performance in cases where /tmp is not a tmpfs. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-05-29 21:43:33 -06:00
Howard Pritchard	5b7c866f59	osc/pt2pt: disable when THREAD_MULTIPLE. Per discussion at https://github.com/open-mpi/ompi/issues/2614#issuecomment-392815654, do not allow for selection of the OSC PT2PT when creating an MPI RMA window when THREAD_MULTIPLE is active. Print a helpful message and return a not-supported error. Signed-off-by: Howard Pritchard <howardp@lanl.gov> Signed-off-by: Jeff Squyres <jsquyres@cisco.com> (cherry picked from commit d0ffd660841623c02d1dfa3151e7f7afd3327698) Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-29 08:59:53 -07:00
Mikhail Kurnosov	28d5837dd9	coll: reduce_scatter_block: add butterfly algorithm Implements butterfly algorithm for MPI_Reduce_scatter_block. The algorithm can be used both by commutative and non-commutative operations, for power-of-two and non-power-of-two number of processes. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-05-27 14:17:41 +07:00
Edgar Gabriel	6b03cee7f1	io/ompio: erroneous condition in selecting aggregator selection logic fix the logic in the decision which aggregator selection algorithm to use. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-05-24 15:52:19 -05:00
John L. Jolly	36b9e15fb7	- Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c In file included from /usr/include/string.h:494:0, from ../../../../ompi/info/info.h:29, from ../../../../ompi/mca/osc/base/base.h:24, from osc_ucx_comm.c:13: In function 'memcpy', inlined from 'get_dynamic_win_info' at osc_ucx_comm.c:359:5, inlined from 'ompi_osc_ucx_put' at osc_ucx_comm.c:401:18: /usr/include/bits/string_fortified.h:34:10: warning: '__builtin___memcpy_chk' writing 8 bytes into a region of size 4 overflows the destination [-Wstringop-overflow=] return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is caused by a type size mismatch in a call to memcpy This fix corrects the type definition of the win_count variable. Signed-off-by: John Jolly <jjolly@suse.com>	2018-05-22 10:11:57 -06:00
Brian Barrett	09e4c40ce9	mtl: remove MXM MTL Remove the MXM MTL, which has been deprecated in preference for the Yalla PML. This was discussed at the last developers meeting and somehow I ended up with the action item to do the removal. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-05-21 14:18:30 -07:00
Sergey Oblomov	5ec26914a6	PML/UCX: do not set offset on ordered data recv Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-21 19:40:07 +03:00
Sergey Oblomov	19607daa32	PML/UCX: create convertor clone instead of stack reset Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-17 16:39:13 +03:00
Sergey Oblomov	7c5de01c57	PML/UCX: reset converter stack on unordered messages Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-05-17 13:11:02 +03:00
Jeff Squyres	9f21ea437c	java: clean up MPI Java configury The Java configury is split into two parts: 1. Determine if we want MPI Java bindings. 2. Find the Java compiler (and related). This commit does a few things: - Move the "Find the Java compiler" step from OPAL to OMPI (because there is no Java in OPAL, and there doesn't appear to be any imminent danger that there will be). - As a direct consequence, remove the --enable-java CLI option (--enable-mpi-java still remains). Enabling the MPI Java bindings and enabling Java are now considered the same thing (since there is no Java elsewhere in the code base, the difference was meaningless). - Only invoke the "Find the Java compiler" step if we actually want the MPI Java bindings. - A few miscellaneous Java-related cleanups in configury (e.g., change testing "$foo" == "1" to $foo -eq 1, etc.). This commit is mostly s/opal/ompi/gi in many places in configury and shifting code around. But it looks bigger than it actually is because of two reasons: 1. Some files were renamed: * ompi_setup_java.m4 -> ompi_setup_mpi_java.m4 (setup MPI Java bindings) * opal_setup_java.m4 -> ompi_setup_java.m4 (setup Java compiler) 2. Indenting level changed in (the new) ompi_setup_java.m4. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-05-15 15:15:22 -07:00
George Bosilca	7191ea120c	Fix merge conflict related to function renaming. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-05-15 11:34:20 -04:00
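
Commits `35438ae9b5` and `67ba8da76f` in the log above describe a single atomically set state enum and an atomic compare-and-set that lets only one thread advance initialization from NOT_INITIALIZED to INIT_STARTED. The following is a minimal sketch of that idea in C11, not the Open MPI code: the names `mpi_state_t`, `mpi_state`, `try_begin_init`, `finish_init`, and `is_finalized` are hypothetical, C11 `<stdatomic.h>` stands in for the `opal_atomic_*()` functions, and the threshold used in `is_finalized` is an assumption read off the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state names modeled on the list in commit 35438ae9b5;
 * the real identifiers in Open MPI may differ. */
typedef enum {
    STATE_NOT_INITIALIZED = 0,
    STATE_INIT_STARTED,
    STATE_INIT_COMPLETED,
    STATE_FINALIZE_STARTED,
    STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT,
    STATE_FINALIZE_COMPLETED
} mpi_state_t;

/* Single shared state variable, written only through atomic operations. */
static _Atomic int mpi_state = STATE_NOT_INITIALIZED;

/* Compare-and-set: succeeds for exactly one caller, the one that observes
 * NOT_INITIALIZED and swaps in INIT_STARTED. Every other caller gets false. */
static bool try_begin_init(void)
{
    int expected = STATE_NOT_INITIALIZED;
    return atomic_compare_exchange_strong(&mpi_state, &expected,
                                          STATE_INIT_STARTED);
}

/* Called by the winning thread once initialization work is done. */
static void finish_init(void)
{
    assert(atomic_load(&mpi_state) == STATE_INIT_STARTED);
    atomic_store(&mpi_state, STATE_INIT_COMPLETED);
}

/* Assumed MPI_FINALIZED-style query: reports false while attribute
 * callbacks on COMM_SELF may still be running, true afterwards. */
static bool is_finalized(void)
{
    return atomic_load(&mpi_state) >= STATE_FINALIZE_PAST_COMM_SELF_DESTRUCT;
}
```

In this sketch a thread that loses the compare-and-set would wait until the state reaches `STATE_INIT_COMPLETED`, which corresponds to the "loop waiting for it to complete" behavior the commit message mentions for `oshmem_init()`.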

10361 commits