openmpi

Author	SHA1	Message	Date
Sergey Oblomov	d8e3562bae	PML/SPML/UCX: added evaluation of mmap events - there was a set of UCX-related issues reported which were caused by mmap API hook conflicts. We added diagnostics for such problems to simplify the bug-resolving pipeline. Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2019-03-12 21:14:27 +02:00
Geoff Paulsen	a14bb4bc89	Merge pull request #6471 from hppritcha/topic/issue_6470 ompi_info: report whether MPI1 compat is enabled	2019-03-11 21:11:55 -05:00
Howard Pritchard	61ccc65302	ompi_info: report MPI1 compat is disabled MPI1 compat disabled beyond v4.0.x Related to #6470 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2019-03-11 13:50:29 -06:00
Gilles Gouaillardet	26c1b833c7	man: remove man pages of removed MPI1 subroutines Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-03-05 15:01:07 +09:00
Gilles Gouaillardet	cc97c0f611	schizo/ompi: correctly handle the yield_when_idle option. schizo/ompi now sets the new OMPI_MCA_mpi_oversubscribe environment variable according to the node oversubscription state. This MCA parameter is used to set the default value of the mpi_yield_when_idle parameter. This two-step tango is needed so the mpi_yield_when_idle setting is always honored when set in a config file. Refs. open-mpi/ompi#6433 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-28 09:53:29 +09:00
Nathan Hjelm	73085e9ce3	Merge pull request #6413 from nuriallv/issue_osc_rdma osc/rdma: fix when determining the node with the rank_array info for a peer	2019-02-27 16:30:06 -07:00
Geoffrey Paulsen	a6d6be2853	mpi.h.in: delete removed MPI1 functions/datatypes (API change!) This commit DELETES the removed MPI1 functions and datatypes from both the mpi.h header and from the library (they were deleted from the MPI standard in MPI-3.0). WARNING: This changes the MPI API in a non-backwards compatible way. This also removes the configure option that was added in Open MPI v4.0.x, requiring users to change their apps if they are using any of these almost 20 year old APIs. This commit removes the following MPI1 removed functions and datatypes: MPI_Address MPI_Errhandler_create MPI_Errhandler_get MPI_Errhandler_set MPI_Type_extent MPI_Type_hindexed MPI_Type_hvector MPI_Type_struct MPI_Type_UB MPI_Type_LB Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>	2019-02-27 08:24:11 -08:00
Geoffrey Paulsen	3136a1706c	mpi.h.in: Revamp MPI-1 removed function warnings Refs https://github.com/open-mpi/ompi/issues/6278. This commit is intended to be cherry-picked to v4.0.x and the following commit will amend this functionality for master's removal. Changes the prototypes for MPI removed functions in the following ways: There are 4 cases: 1) User wants MPI-1 compatibility (--enable-mpi1-compatibility) MPI_Address (and friends) are declared in mpi.h with deprecation notice 2) User does not want MPI-1 compatibility, and has a C11-capable compiler Declare an MPI_Address (etc.) macro in mpi.h, which will cause a compile-time error using _Static_assert C11 feature 3) User does not want MPI-1 compatibility, and does not have a C11-capable compiler, but the compiler supports error function attributes. Declare an MPI_Address (etc.) macro in mpi.h, which will cause a compile-time error using error function attribute. 4) User does not want MPI-1 compatibility, and does not have a C11-capable compiler, or a compiler that supports error function attributes. Do not declare MPI_Address (etc.) in mpi.h at all. Unless the user is compiling with something like -Werror, this will allow the user's code to compile. We are choosing this because it seems like a losing battle to make some kind of compile time error that is friendly to the user (and doesn't make it look like mpi.h itself is broken). On v4.0.x, this will allow the user code to both compile (albeit with a warning) and link (because the MPI_Address will be in the MPI library because we are preserving ABI back to 3.0.x). On master/v5.0.x, this will allow the user code to compile, but it will fail to link (because the MPI_Address symbol will not be in the MPI library). Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>	2019-02-27 08:24:11 -08:00
bosilca	8400502d8a	Merge pull request #6353 from bosilca/topic/fix_monitoring_pvar Fix the PVAR allocation usage.	2019-02-25 16:03:56 -05:00
Howard Pritchard	9b3a9c2579	Merge pull request #6417 from abouteiller/bugfix/cart_create_cid Cart/Graph create would not run the next_cid algorithm	2019-02-22 13:05:59 -07:00
Howard Pritchard	d6cdbdfd39	Merge pull request #6412 from hppritcha/topic/fix_pgi_usempif08 fortran:fix for PGI linking	2019-02-21 20:31:14 -07:00
Aurelien Bouteiller	fb17115ba9	Cart/Graph create would not run the next_cid algorithm and create disjoint communicator with inconsistent cid. Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2019-02-21 11:40:22 -05:00
Howard Pritchard	266bc3aced	fortran:use mpif08 fix for PGI linking commit `c6070fd2e` broke building fortran bindings with PGI compilers. Turns out PGI compilers need to link in the *.o from a module file whether or not there are module subroutines defined in the module file. Related to #6411 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2019-02-20 12:33:25 -07:00
Nuria Losada	3cae149262	osc/rdma: fix when determining the node with the rank_array info for a peer Signed-off-by: Nuria Losada <nlosada@icl.utk.edu>	2019-02-20 13:12:00 -05:00
Artem Polyakov	13a8e42108	Merge pull request #6163 from artpol84/osc/mt_submission Refactoring of osc/ucx component for MT	2019-02-20 09:41:27 -08:00
Gilles Gouaillardet	ad114be28c	configury: automatically select rte/pmix runtime if ORTE project is not built Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 13:55:55 +09:00
Gilles Gouaillardet	69d136ae5e	ompi/pmix: fix misc OPAL function calls Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 13:55:55 +09:00
Gilles Gouaillardet	18f679efac	Merge pull request #6401 from ggouaillardet/topic/osc_rdma_self osc/rdma: correctly handle communications to self	2019-02-20 11:43:22 +09:00
KAWASHIMA Takahiro	7095ad10a5	man: fix more typos in MPI_Win_attach man page Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-20 11:22:38 +09:00
Gilles Gouaillardet	7c0596819b	man: fix typos in MPI_Win_{attach,detach} man pages no code change [skip ci] bot:notest Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 11:09:45 +09:00
Gilles Gouaillardet	fe05fcc11a	osc/rdma: correctly handle communications to self mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when the window is dynamically created and use_cpu_atomics is set in order to correctly handle communications to self. Thanks Bart Janssens for reporting this issue. Refs. open-mpi/ompi#6394 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 09:52:17 +09:00
Artem Polyakov	19e2ae2efb	opal/common/ucx: Switch to opal/tsd Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Artem Polyakov	7984d7d997	opal/common/ucx: Remove unused debugging macro Will be reintroduced later if needed and after adaptation to the OMPI infrastructure. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Artem Polyakov	43f16d8796	opal/common/ucx: Remove common_ucx_int.h Place the content of common_ucx_int.h back to the common_ucx.h and include common_ucx_wpool.h explicitly. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Xin Zhao	c6de09940f	ompi/osc/ucx: Switch osc/ucx code to use Worker Pool. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Yossi Itigin	91d05f91e2	Merge pull request #6384 from brminich/topic/ucx_worker_net_address PML/UCX: Use net worker address for remote peers	2019-02-17 12:21:00 +02:00
Matias A Cabral	25bdd118ac	MTL_OFI: Changed Recv cancel to be non-blocking Updated the OFI MTL's Recv cancel to be a non-blocking call to match the MPI spec. Given fi_cancel succeeded, it is expected that the user will wait on the request to read whether the cancel has completed. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2019-02-14 17:07:20 -05:00
Mikhail Brinskii	751d88192d	PML/UCX: Use net worker address for remote peers For remote node peers pack smaller worker address, which contains network device addresses only. This would reduce amount of OOB traffic during startup. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-02-14 18:06:36 +02:00
Brian Barrett	7a593cea4a	Merge pull request #6361 from aravindksg/fix_tg_segfault mtl/ofi: Fix segfault when not using Thread-Grouping feature	2019-02-12 12:04:26 -08:00
Ralph Castain	125d236173	Move from the use of regex to compression We've been fighting the battle of trying to create a regex generator and parser that can handle arbitrary hostname schemes - without long-term success. The worst of it is that there is no way of checking to see if the computed regex is correct short of parsing it and doing a character-by-character comparison with the original string. Ugh...there has to be a better solution. One option is to investigate using 3rd-party regex libraries as those are coming from communities whose sole focus is resolving that problem. However, someone would need to spend the time to investigate it, and we'd have to find a license-friendly implementation. Another option is to quit beating our heads against the wall and just compress the information. It won't be as much of a reduction, but we also won't keep hitting scenarios where things break. In this case, it seems that "perfection" is definitely the enemy of "good enough". This PR implements the compression option while retaining the possibility of people adding regex-generating components. The compression code used in ORTE is consolidated into the opal/compress framework. That framework currently held bzip and gzip components for use in compressing checkpoint files - since we no longer support C/R, I have .opal_ignore'd those components. However, I have left the original framework APIs alone in case someone ever decides to redo C/R. The APIs of interest here are added to the framework - specifically, the "compress_block" and "decompress_block" functions. I then moved the ORTE zlib compression code into a new component in this framework. Unfortunately, the framework currently is a single-select one - i.e., only one active component at a time. Since I .opal_ignore'd the other two and made the priority of zlib high, this isn't a problem. However, if someone wants to re-enable bzip/gzip or add another component, they might need to transition opal/compress to a multi-select framework. Included changes: * Consolidate the compression code into the opal/compress framework * Move the ORTE zlib compression code into a new opal/compress/zlib component * Ignore the bzip and gzip components in opal/compress framework * Add a "compress_base_limit" MCA param to set the threshold above which we compress data - defaults to 4096 bytes * Delete stale brucks and rcd components from orte/grpcomm framework * Delete the orte/regx framework * Update the launch system to use opal/compress instead of string regex * Provide a default module if no zlib is available * Fix some misc multi-node issues * Properly generate the nidmap in response to a "connection warmup" message so the remote daemon knows the children it needs to launch. * Remove stale references to orte_node_regex * opal_byte_object_t's are not OPAL objects - properly release allocated memory. * Set the topology * Currently only handling homogeneous case * Update the compress framework files to conform * Consolidate open/close into one "frame" file. Ensure we open/close the framework Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:11:14 -08:00
KAWASHIMA Takahiro	8bbd201029	Merge pull request #6205 from kawashima-fj/pr/fp16 Add FP16 datatypes	2019-02-08 14:52:13 +09:00
Artem Polyakov	35090b69f1	osc/base: Add debug output stating a selected component Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-07 15:54:20 -08:00
Aravind Gopalakrishnan	6edcc479c4	mtl/ofi: Fix segfault when not using Thread-Grouping feature For the non thread-grouping paths, only the first (0th) OFI context should be used for communication. Otherwise this would access a nonexistent array item and cause a segfault. While at it, clarify some content regarding SEPs in README (Credit to Matias Cabral for README edits). Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-02-07 11:52:53 -08:00
Jeff Squyres	f5e1a672cc	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - On subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}. - Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over cruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	aba2571881	mtl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Gilles Gouaillardet	945f830f7a	mtl/ofi: fix configury when VPATH is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-07 06:29:58 -08:00
Matias Cabral	0601b3e982	Merge pull request #6325 from aravindksg/fix_help_reference mtl/ofi: Fix reference to help text object	2019-02-05 07:22:51 -08:00
George Bosilca	e42b573cd3	Fix the PVAR allocation usage. According to the MPI standard the obj_handle is a pointer to an MPI object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example 14.6 highlights this usage. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-02-02 19:03:43 -05:00
KAWASHIMA Takahiro	f8a441957a	mpiext/shortfloat: Add `MPIX_C_FLOAT16` datatype `MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT` if the C compiler supports `_Float16`, which is defined in ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015). This name and meaning are the same as those in MPICH. This may be a transitional datatype until the MPI Forum decides a proper name for the type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 14:55:52 +09:00
KAWASHIMA Takahiro	c44599ec13	mpiext/shortfloat: Add `shortfloat` MPI extension This extension provides additional MPI datatypes `MPIX_SHORT_FLOAT`, `MPIX_C_SHORT_FLOAT_COMPLEX`, and `MPIX_CXX_SHORT_FLOAT_COMPLEX` for `short float` (C/C++), `short float _Complex` (C), and `std::complex<short float>` (C++), respectively, or their alternate types like `_Float16`. See `ompi/mpiext/shortfloat/README.txt` for details. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:14 +09:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
Sergey Lebedev	829846dbcc	fp16 hcoll bindings Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	2ad1c09848	opal/datatype: Add `opal_short_float_t` The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14 (C WG), is not supported by most compilers yet. But some compilers (including gcc 7 for AArch64 and clang 6) support `_Float16`, which is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945) as an extension for C. If it is detected in `configure`, it is used as an alternate type of `short float` in Open MPI internal code. This commit adds a `configure` option `--enable-alt-short-float=TYPE`. It can be used to specify a type other than `short float` and `_Float16` as the alternate type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Jeff Squyres	4c64322db4	Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly mpi.h.in: use C++ static_cast<> where appropriate	2019-01-31 07:14:34 -05:00
Jeff Squyres	30afdcead9	mpi.h.in: use C++ static_cast<> where appropriate When compiling mpi.h with a modern C++ compiler and a high degree of pickiness (e.g., -Wold-style-cast), casting using (void*) in the OMPI_PREDEFINED_GLOBAL and MPI_STATUS_IGNORE macros will emit warnings. So if we're compiling with a C++ compiler, use C++'s static_cast<> instead of (void*). Thanks to @shadow-fax for identifying the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-01-31 03:22:26 -08:00
Thananon Patinyasakdikul	782ec851ea	Merge pull request #6319 from thananon/pr/allow_overtake pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.	2019-01-30 15:32:04 -05:00
Jeff Squyres	2203f8d900	Merge pull request #6185 from ggouaillardet/topic/hwloc_macros hwloc: remove public hwloc macros from opal_config.h	2019-01-30 07:32:22 -05:00
Gilles Gouaillardet	0aeb27f776	topo/treematch: silence a hwloc related warning treematch/km_partitioning.c #include "config.h", but there is no such file when the embedded treematch is used. In order to prevent the embedded treematch from incorrectly using the config.h from the embedded hwloc, generate a dummy config.h. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-30 14:51:38 +09:00
Aravind Gopalakrishnan	9cabcfdbba	mtl/ofi: Fix reference to help text object When we exceed the threshold number of contexts created, print appropriate help text Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-29 15:10:06 -08:00
Thananon Patinyasakdikul	0263456cf4	pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE. We missed an assert to check if ALLOW_OVERTAKE is set or not before validating the sequence number and this will cause deadlock. Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>	2019-01-29 14:55:06 -05:00
Nathan Hjelm	f9338dac93	Merge pull request #6312 from ggouaillardet/topic/op ompi/op: fix support of non predefined datatypes with predefined oper…	2019-01-29 10:55:00 -07:00
Brian Barrett	23da9fac23	Merge pull request #6294 from bwbarrett/mtl-ofi-no-device-warning mtl/ofi: Print descriptive error message on modex failure	2019-01-29 08:32:49 -08:00
Brian Barrett	1bb7a73a9c	Merge pull request #6302 from bwbarrett/feature/ofi-av-count mtl/ofi: Provide av count hint during initialization	2019-01-29 08:32:24 -08:00
Edgar Gabriel	7023357843	Merge pull request #6286 from edgargabriel/pr/floating-point-division-problem common/ompio: fix a floating point division problem	2019-01-29 10:07:09 -06:00
Gilles Gouaillardet	bc1cab5498	ompi/op: fix support of non predefined datatypes with predefined operators ACCUMULATE, unlike REDUCE, can be used with derived datatypes and predefined operations, with some restrictions outlined in MPI-3:11.3.4. The derived datatype must be composed entirely from one predefined datatype (so you can do all the construction you want, but at the bottom, you can only use one datatype, say, MPI_INT). Refs. open-mpi/ompi#6275 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-29 09:33:39 +09:00
Gilles Gouaillardet	45fb69b2b9	ompi/datatype: fix how we compute the space needed for the args Refs. open-mpi/ompi#6275 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-28 15:26:11 +09:00
Brian Barrett	44be7f139a	mtl/ofi: Provide av count hint during initialization Provide the av_attr.count hint (number of addresses that will be inserted into the address vector through the life of the process) at initialization of the address vector. It's ok to be a bit wrong, but some endpoints (RxR) can benefit by not going through the slow growth realloc churn. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-24 15:47:24 -08:00
Edgar Gabriel	c0f8ce0fff	common/ompio: fix a floating point division problem This commit fixes a problem reported on the mailing list with individual writes larger than 512 MB. The culprit is a floating point division of two large, close values. Changing the datatypes from float to double (which is what is being used in the fcoll components) fixes the problem. See issue #6285 and https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118 Thanks to Axel Huebl and René Widera for reporting the issue. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2019-01-21 17:59:12 -06:00
Brian Barrett	fe25097194	mtl/ofi: Print descriptive error message on modex failure With MTLs, there's no "other transport" when the remote side does not have an active NIC, so we should print a useful error message when the modex failed (indicating lack of a NIC on the remote side). Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2019-01-21 23:50:31 +00:00
KAWASHIMA Takahiro	352b667323	Merge pull request #6210 from kawashima-fj/pr/mpiext-use-mod Use mpi_f08 module in mpi_f08_ext module	2019-01-21 11:56:41 +09:00
René Widera	a91fab80a1	common/ompio: possible rounding issue Similar to #6286: converting the number of bytes into a single precision floating point value in order to round up the result of a division is a potential risk due to rounding errors. - remove floating point operations for `round up` - remove floating point conversion for round down (native behavior of integer division) Signed-off-by: René Widera <r.widera@hzdr.de>	2019-01-18 14:05:23 +01:00
Yossi Itigin	387b2ff56f	Merge pull request #6260 from hoopoepg/topic/removed-fca COLL: removed FCA component	2019-01-17 00:05:07 +08:00
KAWASHIMA Takahiro	b380dd58b5	config/ompi_ext: use mpi module in mpi_ext module If MPI extensions are enabled, all `ompi/mpiext/pcollreq/use-mpi/mpiext__usempi.h` are included in `ompi/mpi/fortran/mpiext-use-mpi/mpi-ext-module.F90` and all `ompi/mpiext/pcollreq/use-mpi/mpiext__usempif08.h` are included in `ompi/mpi/fortran/mpiext-use-mpi-f08/mpi-f08-ext-module.F90` using `#include` directives. In `mpiext__usempi.h` and `mpiext__usempif08.h`, some MPI extensions may want to use constants or handles defined in the `mpi` module and the `mpi_f08` module. For example, if you want to define a new datatype in `mpi_f08_ext`, you'll need the definition of `type(mpi_datatype)`. However, putting a `use mpi_f08` line in their `mpiext_*_usempif08.h` may cause a compilation error if more than one MPI extension is enabled because the `use` statement must be put prior to any variable declarations. To resolve this problem, this commit puts `use mpi` and `use mpi_f08` as first lines of `mpi-ext-module.F90` and `mpi-f08-ext-module.F90` respectively. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-01-16 11:55:55 +09:00
KAWASHIMA Takahiro	2220623f34	config/ompi_ext: Don't include mpiext__mpifh.h in mpi_f08_ext Including `mpiext__mpifh.h` in the source file of the `mpi_f08_ext` module is not always appropriate. For example, if you want to define a new datatype in an MPI extension, the `include 'mpif-ext.h'` binding defines the datatype as `integer` but the `use mpi_f08_ext` binding defines it as `type(mpi_datatype)`. They conflict. This commit allows each MPI extension to declare whether it wants to include its `mpiext_*_mpifh.h` in `mpi_f08` and `mpi_f08_ext` respectively. The default (no declaration) is 'want'. See `ompi/mpiext/example/configure.m4` for an example. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-01-16 11:55:55 +09:00
Aravind Gopalakrishnan	37f9aff2a0	mtl/ofi: Add MCA variables to enable SEP and to request number of OFI contexts Moving to a model where we have users actively _enable_ SEP feature for use rather than opening SEP by default if provider supports it. This allows us to not regress (either functionally or for performance reasons) any apps that were working correctly on regular endpoints. Also, providing MCA to specify number of OFI contexts to create and default this value to 1 (Given btl/ofi also creates one by default, this reduces the incidence of a scenario where we allocate all available contexts by default and if btl/ofi asks for one more, then provider breaks as it doesn't support it). While at it, spruce up README on SEP content. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-01-14 09:58:36 -08:00
Ralph Castain	d1fd1f4cce	Merge pull request #6151 from nrspruit/ns_ompi_mtl_ofi_specializations MTL_OFI: Generation of specialized functions at build time	2019-01-14 09:31:54 -08:00
Sergey Oblomov	0759bb8561	COLL: removed FCA component - removed FCA collectives from coll/scoll Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2019-01-09 16:51:40 +02:00
Risto Toijala	f14a0f4fc9	mpi/fortran: Fix valgrind warnings for type create Valgrind warns that newtype is uninitialized when calling from Fortran as e.g. use mpi integer :: t, err call MPI_Type_create_f90_integer(5, t, err) Since newtype is intent(out), this should not happen. There is no reason to convert the type using PMPI_Type_f2c, only to overwrite it immediately afterwards. The other type_create_ functions did not convert newtype. The valgrind warnings: ==28441== Conditional jump or move depends on uninitialised value(s) ==28441== at 0x581B555: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0) ==28441== by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0) ==28441== by 0x400BA1: MAIN__ (in [...]) ==28441== by 0x400C46: main (in [...]) ==28441== ==28441== Conditional jump or move depends on uninitialised value(s) ==28441== at 0x581B563: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0) ==28441== by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0) ==28441== by 0x400BA1: MAIN__ (in [..]) ==28441== by 0x400C46: main (in [...]) ==28441== ==28441== Use of uninitialised value of size 8 ==28441== at 0x581B577: PMPI_Type_f2c (in [...]/lib/libmpi.so.0.0.0) ==28441== by 0x4E87AB7: MPI_TYPE_CREATE_F90_INTEGER (in [...]/lib/libmpi_mpifh.so.0.0.0) ==28441== by 0x400BA1: MAIN__ (in [...]) ==28441== by 0x400C46: main (in [...]) ==28441== Signed-off-by: Risto Toijala <risto.toijala@gmail.com>	2019-01-08 22:00:00 +02:00
Aurelien Bouteiller	e54496bf2a	Merge pull request #6087 from ICLDisco/export/errors_cid Manage errors in communicator creations (cid)	2018-12-31 15:01:55 -05:00
Jeff Squyres	17be4c6d1f	Merge pull request #6229 from jsquyres/pr/fix-enable-grequest-extension-in-a-tarball romio321: ensure to distribute ompi_grequestx.h	2018-12-28 16:15:23 -05:00
Jeff Squyres	62321be186	romio321: ensure to distribute ompi_grequestx.h Refs https://github.com/open-mpi/ompi/issues/6227. Thanks to @georgemarselis for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-27 15:39:47 -08:00
bosilca	96f88052e9	Merge pull request #5948 from mkurnosov/coll-ireduce-silence-coverity coll/libnbc/ireduce: silence Coverity warning CID 1440360	2018-12-24 12:59:16 -05:00
bosilca	593db292da	Merge pull request #5644 from mkurnosov/coll-iallreduce-rabenseifner coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce	2018-12-24 12:58:21 -05:00
Jeff Squyres	efcaef74d8	MPI_Type_set_name: fix string length at target opal_string_copy() takes care of all the string computations. Specifically: when we converted to opal_string_copy(), we accidentally left the source length as the argument, not the target length, which resulted in one less character being copied than intended (as was showing up in MTT C++ testing results). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-23 13:00:01 -08:00
Aurelien Bouteiller	bd0d2b832e	Merge pull request #6086 from ICLDisco/export/errors_nbc Manage errors in NBC collective ops	2018-12-21 02:34:00 -05:00
Jeff Squyres	1be5358834	Merge pull request #6212 from jsquyres/pr/fix-treematch-common-symbol treematch: fix global common symbol	2018-12-20 15:20:41 -05:00
Jeff Squyres	e9a6246b90	treematch: fix global common symbol Despite its name, this symbol doesn't need to be global. So just make it static. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-20 11:06:14 -08:00
Jeff Squyres	81bfb5f5e5	Remove some IMPI attributes that were never implemented. This is a holdover from LAM/MPI that was never implemented here in Open MPI (and never will be). Might as well remove this dead code. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-12-20 10:12:32 -08:00
Nathan Hjelm	4944508603	Merge pull request #6136 from hjelmn/opal_cleanup opal: clean up init/finalize	2018-12-18 15:23:32 -07:00
Nathan Hjelm	a39cb747dd	ompi/datatype: don't call opal_datatype_finalize directly Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-12-18 14:37:04 -07:00
Nathan Hjelm	06baa518f7	rte/pmix: fill in opal_process_info when using prrte/pmix This commit fixes a bug when launching with prun where the process info structures used by the btls are not populated. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-12-13 16:04:31 -07:00
bosilca	804a517929	Merge pull request #6146 from bosilca/topic/treematch_update Update to the latest TreeMatch (v1.3).	2018-12-13 13:26:40 -05:00
Spruit, Neil R	bef5f50a42	MTL_OFI: Generation of specialized functions at build time -> Added new targets in Makefile.am to call a new build script generate-opt-funcs.pl to generate specialized functions for each .pm file. -> Added new perl module .pm files for send,isend,irecv,iprobe,improbe which are loaded by generate-opt-funcs.pl to create new source files that correspond to the name of the .pm file to be used as part of MTL OFI. -> Added mtl_ofi_opt.pm.template and updated README with details on the specialization features and how to add additional specialization support. -> Added new opt_common/mtl_ofi_opt_common.pm containing common functions for generating the specialized functions used by all other *.pm modules. -> Added new mtl_ofi.h which includes the definitions for the function symbol table for storing the specialized functions along with the definitions for the initialization functions for the corresponding function pointers. -> Based off the OFI provider capabilities the specialized function pointers are assigned at mtl_ofi_component_init to the corresponding MTL OFI function. -> mca_mtl_ofi_module_t has been updated with the symbol table struct which is assigned at component init. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-12-13 00:35:19 -08:00
Aravind Gopalakrishnan	e5e19dfcf7	Fix for SEP when num local procs is greater than available contexts For cases when the number of local processes is greater than the number of available contexts, the SEP initialization phase would calculate the number of contexts to provision for each rank to be 0 and would eventually crash. Fix the issue here by using regular endpoints in the event the number of local processes is more than available contexts. This fixes issue #6182. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-12 16:49:04 -08:00
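The failure mode described in the commit above can be sketched in a few lines. This is only an illustration, assuming the per-rank context count is an even integer split of the available contexts; the function names are hypothetical and are not taken from the OFI MTL sources.

```python
def contexts_per_rank(available_contexts, num_local_procs):
    # Assumption: contexts are split evenly with integer division, so the
    # result is 0 as soon as there are more local processes than contexts.
    return available_contexts // num_local_procs


def choose_endpoint_mode(available_contexts, num_local_procs):
    # Mirrors the described fix: fall back to regular endpoints when the
    # local process count exceeds the number of available contexts.
    if num_local_procs > available_contexts:
        return ("regular", 0)
    return ("scalable", contexts_per_rank(available_contexts, num_local_procs))


print(choose_endpoint_mode(8, 4))
print(choose_endpoint_mode(8, 16))
```

With 8 contexts and 16 local processes the even split would be 0 contexts per rank, which is the case the fallback is meant to catch.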
KAWASHIMA Takahiro	adc05f705e	Merge pull request #6174 from kawashima-fj/pr/f08-missing-handles fortran/use-mpi-f08: Add C++ datatypes and MPI_NO_OP	2018-12-12 14:13:36 +09:00
Brian Barrett	6e15128d96	mtl/ofi: Fix crash if no providers found Commit `109d0569ff` introduced a crash when an error occurred before ofi_ctxt was allocated, including when no providers passed the selection logic. Properly check that the pointer is not NULL in the error cleanup code before dereferencing the pointer. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-12-11 15:46:18 -08:00
Jeff Squyres	6f7fbd1676	Merge pull request #6158 from ggouaillardet/topic/mpiext-path-updates mpiext: updates for header file locations	2018-12-11 13:01:46 -05:00
KAWASHIMA Takahiro	63ecf01610	fortran/use-mpi-f08: Add C++ datatypes and MPI_NO_OP Though the MPI standard does not have `MPI_CXX_COMPLEX`, `mpi.h`, `mpif.h`, and `mpi.mod` have it. So I added it for consistency. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-12-11 13:08:29 +09:00
KAWASHIMA Takahiro	e0c5bad195	fortran/use-mpi-f08: Remove unnecessary `;` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-12-11 09:06:21 +09:00
Matias Cabral	cdb952f66d	Merge pull request #6170 from matcabral/remove_psm2_lower_p MTL/PSM2: add missing default priority	2018-12-07 16:11:45 -08:00
Matias A Cabral	c76c6d8b28	MTL/PSM2: add missing default priority Missing default priority after PR #6153 Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-12-07 14:46:34 -08:00
Matias Cabral	0b821f2184	Merge pull request #6153 from matcabral/remove_psm2_lower_p MTL/PSM2: Do not lower the priority when all processes are local.	2018-12-07 10:19:53 -08:00
KAWASHIMA Takahiro	4be5a6cdc8	Merge pull request #6159 from kawashima-fj/pr/fix-type-create-f90 mpi/c: Fix MPI_TYPE_CREATE_F90_{REAL,COMPLEX}	2018-12-08 01:41:20 +09:00
KAWASHIMA Takahiro	6fb01f64fe	mpi/c: Fix MPI_TYPE_CREATE_F90_{REAL,COMPLEX} This commit fixes edge cases of `r = 38` and `r = 308`. As defined in the MPI standard, `TYPE_CREATE_F90_REAL` and `TYPE_CREATE_F90_COMPLEX` must be consistent with the Fortran `SELECTED_REAL_KIND` function. The `SELECTED_REAL_KIND` function is defined based on the `RANGE` function. The `RANGE` function returns `INT(MIN(LOG10(HUGE(X)), -LOG10(TINY(X))))` for a real value `X`. The old code considers only `INT(LOG10(HUGE(X)))` using `_MAX_10_EXP`. This commit adds `INT(-LOG10(TINY(X)))` part using `_MIN_10_EXP`. This bug affected the following `p`-`r` combinations.

| p | r | expected (REAL) | returned (REAL) | expected (COMPLEX) | returned (COMPLEX) |
| :------------ | --: | :-------- | :-------- | :------- | :-------- |
| MPI_UNDEFINED | 38 | REAL8 | REAL4 | COMPLEX16 | COMPLEX8 |
| 0 <= p <= 6 | 38 | REAL8 | REAL4 | COMPLEX16 | COMPLEX8 |
| MPI_UNDEFINED | 308 | REAL16 | REAL8 | COMPLEX32 | COMPLEX16 |
| 0 <= p <= 15 | 308 | REAL16 | REAL8 | COMPLEX32 | COMPLEX16 |

MPICH returns the same result as Open MPI with this fix. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-12-06 16:48:23 +09:00
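The `RANGE` rule quoted in the commit above can be checked with a small sketch. The exponent limits below are the usual IEEE 754 values that C exposes through the `_MAX_10_EXP` and `_MIN_10_EXP` macros; treating the 16-byte real as IEEE binary128 is an assumption, and the function names are hypothetical rather than Open MPI's. Precision `p` is ignored here, so the sketch only covers the range side of the selection.

```python
# (max_10_exp, min_10_exp) per type, as in C's <float.h> for IEEE 754 formats.
# Assumption: the 16-byte real is IEEE binary128; real platforms may differ.
LIMITS = {
    "REAL4": (38, -37),
    "REAL8": (308, -307),
    "REAL16": (4932, -4931),
}


def decimal_range(max_10_exp, min_10_exp):
    # Fortran RANGE(X) = INT(MIN(LOG10(HUGE(X)), -LOG10(TINY(X))))
    return min(max_10_exp, -min_10_exp)


def old_range(max_10_exp, min_10_exp):
    # Behaviour before the fix as described: only the HUGE side is considered.
    return max_10_exp


def smallest_real_for_range(r, range_fn=decimal_range):
    # Pick the smallest type whose decimal exponent range covers r.
    for name in ("REAL4", "REAL8", "REAL16"):
        if range_fn(*LIMITS[name]) >= r:
            return name
    return None


for r in (38, 308):
    print(r, smallest_real_for_range(r, old_range), smallest_real_for_range(r))
```

Under these assumptions the single-precision range is 37, not 38, which is why `r = 38` has to select the 8-byte type.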
Gilles Gouaillardet	975e3cd0c9	mpiext: updates for header file locations Per discussion on https://github.com/open-mpi/ompi/pull/6030 and https://github.com/open-mpi/ompi/pull/6145, move around where MPI extension header files are installed (specifically: the installation tree path does not need to match the source tree path). For reference, header files were installed like this : - <prefix>/include/openmpi/ompi/mpiext/pcollreq/mpif-h/mpiext_pcollreq_mpifh.h - <prefix>/include/openmpi/ompi/mpiext/pcollreq/c/mpiext_pcollreq_c.h and they are now installed like this : - <prefix>/include/openmpi/mpiext/mpiext_pcollreq_mpifh.h - <prefix>/include/openmpi/mpiext/mpiext_pcollreq_c.h Signed-off-by: Jeff Squyres <jsquyres@cisco.com> Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-12-06 15:40:02 +09:00
Gilles Gouaillardet	4918fc4455	Revert "fortran/mpif-h: keep include path for extension short" This reverts commit open-mpi/ompi@848a868f7b. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-12-06 15:39:59 +09:00
Gilles Gouaillardet	ccbdc8fd58	Revert "c: keep include path for extension short" This reverts commit open-mpi/ompi@27c25fa721. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-12-06 15:39:54 +09:00
Gilles Gouaillardet	a152aa215e	cleanup: remove the unused (and unexpanded) {ORTE,OMPI}_WANT_REPO_REV macro Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-12-06 13:13:13 +09:00
George Bosilca	74f2365d6e	Remove most (all) warnings from the new TreeMatch. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-12-05 15:38:39 -05:00
Guillaume Mercier	27aa34e53f	New version based on TM 1.3 Optimize_topology is commented for now until bug resolved in TM Signed-off-by: Guillaume Mercier <guillaume.mercier@bordeaux-inp.fr>	2018-12-05 15:38:39 -05:00
Matias A Cabral	fc8582c560	MTL/PSM2: Do not lower the priority when all processes are local. The intention of lowering the priority when all processes are local was to favor Vader BTL. However, in builds including the OFI MTL it gets selected instead. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-12-04 15:31:09 -08:00
Matias Cabral	abd34620f4	Merge pull request #5972 from aravindksg/ofi_sep_master MTL/OFI: Add OFI Scalable Endpoint support	2018-12-04 13:07:44 -08:00
Aravind Gopalakrishnan	109d0569ff	MTL/OFI: Add OFI Scalable Endpoint support OFI MTL supports OFI Scalable Endpoints feature as means to improve multi-threaded application throughput and message rate. Currently the feature is designed to utilize multiple TX/RX contexts exposed by the OFI provider in conjunction with a multi-communicator MPI application model. For more information, refer to README under mtl/ofi. Reviewed-by: Matias Cabral <matias.a.cabral@intel.com> Reviewed-by: Neil Spruit <neil.r.spruit@intel.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-12-03 09:56:52 -08:00
George Bosilca	c6f73e8883	First step of the integration with the new TreeMatch. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-12-02 20:05:03 -05:00
Yossi Itigin	83cca9d52a	ucx: add owner.txt for components Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-12-01 17:14:03 +02:00
matcabral	6a15712df5	MTL/OFI: revert PR 6082 Revert to avoid issues with dynamic processes. Signed-off-by: matcabral <matias.a.cabral@intel.com>	2018-11-30 13:44:39 -08:00
Matias Cabral	ef5db1b752	Merge pull request #6082 from matcabral/lower_mtl_ofi_p MTL/OFI: Lower priority when all procs are local	2018-11-30 12:05:40 -08:00
Bert Wesarg	18525ce39b	Fix use of bitwise operation in CPP condition Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>	2018-11-30 12:44:42 +01:00
Nathan Hjelm	5ebcbe444e	Merge pull request #6083 from devreal/rdma-plug-memleak Plug two memory leaks in rdma osc	2018-11-27 09:56:19 -07:00
Nathan Hjelm	27084c60c9	Merge pull request #6123 from hoopoepg/topic/osc-ucx-max-level-60 OSC/UCX: set max level value to 60	2018-11-27 09:51:09 -07:00
Sergey Oblomov	2d230b3aac	OSC/UCX: set max level value to 60 Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-11-27 14:20:28 +02:00
KAWASHIMA Takahiro	291d7654c5	Merge pull request #6030 from ggouaillardet/topic/mpiext_short_path mpiext: keep include path for extension short	2018-11-27 20:59:01 +09:00
Gilles Gouaillardet	5a968306d6	mpi/c: add back (some more) deprecated subroutines - MPI_NULL_DELETE_FN - MPI_NULL_COPY_FN - MPI_DUP_FN Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-11-27 13:56:03 +09:00
Gilles Gouaillardet	27c25fa721	c: keep include path for extension short move openmpi/ompi/mpiext/FOO/c/mpiext_FOO_c.h to openmpi/ompi/mpiext/FOO_c.h in order to use consistent paths with mpif.h extensions Refs. open-mpi/ompi#6019 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-11-27 11:21:05 +09:00
Gilles Gouaillardet	848a868f7b	fortran/mpif-h: keep include path for extension short in order to cope with the 72 characters per line limit, move openmpi/ompi/mpiext/FOO/mpif-h/mpiext_FOO_mpifh.h to openmpi/ompi/mpiext/FOO_mpifh.h Refs. open-mpi/ompi#6019 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-11-27 09:39:09 +09:00
Jeff Squyres	8459d29738	Merge pull request #5979 from mkurnosov/coll-libnbc-cleanup coll/libnbc: remove debug output	2018-11-26 18:10:10 -05:00
Jeff Squyres	dbe064af97	Merge pull request #5653 from bmwiedemann/userhost Allow to override build user and host	2018-11-26 17:48:37 -05:00
Bert Wesarg	b3f3281290	Re-add removed deprecate-only MPI-2.0 symbols See #6114 Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>	2018-11-26 14:00:05 +01:00
Yossi Itigin	e98ce2b36b	Merge pull request #6108 from yosefe/topic/pml-ucx-init-req_mpi_object pml_ucx: initialize req_mpi_object.comm for error handler	2018-11-26 11:54:10 +02:00
KAWASHIMA Takahiro	5f0fcf0f45	README & man: Update pcollreq documentation The persistent collectives feature was approved at the Sept. 2018 MPI Forum meeting, and the 2018 Draft Specification of the MPI standard was published during SC18. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-11-26 17:27:43 +09:00
Yossi Itigin	f36eeef4c5	pml_ucx: initialize req_mpi_object.comm for error handler without this fix, an error handler invoked on pml_ucx request would segfault while trying to dereference requests[i]->req_mpi_object.comm Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-11-25 19:37:54 +02:00
Yossi Itigin	ed967d867b	Merge pull request #6073 from hoopoepg/topic/set-osc-ucx-level-200 OSC: set UCX module used by default	2018-11-22 10:53:37 +02:00
KAWASHIMA Takahiro	303d7842d9	Merge pull request #6074 from kawashima-fj/pr/remove-c99-type-check Remove `#if HAVE_[TYPE]` for types available in C99	2018-11-20 11:42:13 +09:00
Aurelien Bouteiller	20447be744	Someone left a debug printf in NBC Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-11-16 10:37:04 -05:00
Aurelien Bouteiller	65660e5999	Manage errors in NBC collective ops Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> Correctly bubble up errors in NBC collective operations Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> The error field of requests needs to be rearmed at start, not at create Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-11-15 16:43:56 -05:00
Joseph Schuchart	91885f5876	Plug two memory leaks in rdma osc Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2018-11-14 14:31:54 -05:00
matcabral	5f58453e63	MTL/OFI: Lower priority when all procs are local So far Vader is faster than OFI MTL for doing shared memory. Therefore, let it run by default when all procs are local. Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com> Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com> Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>	2018-11-14 11:01:33 -08:00
Sergey Oblomov	e91f214982	OSC/UCX: added UCX version evaluation - added UCX version evaluation to set OSC UCX priority Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-11-14 10:03:13 +02:00
KAWASHIMA Takahiro	cacd6f389c	datatype: Remove `#if HAVE_[TYPE]` for C99 types Now Open MPI requires a C99 compiler. Checking availability of the following types is no longer needed. - `long long` (`signed` and `unsigned`) - `long double` - `float _Complex` - `double _Complex` - `long double _Complex` Furthermore, the `#if HAVE_[TYPE]` style checking is not correct. Availability of C types is checked by `AC_CHECK_TYPES` in `configure.ac`. `AC_CHECK_TYPES` defines macro `HAVE_[TYPE]` as `1` in `opal_config.h` if the `[TYPE]` is available. But it does not define `HAVE_[TYPE]` (instead of defining as `0`) if it is not available. So even if we need `HAVE_[TYPE]` checking, it should be `#if defined(HAVE_[TYPE])`. I didn't remove `AC_CHECK_TYPES` for these types in `configure.ac` since someone may use `HAVE_[TYPE]` macros somewhere. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-11-14 09:32:52 +09:00
Sergey Oblomov	36934a8bb2	OSC: set UCX module used by default - OSC/UCX module set priority to 200 to be used by default Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-11-12 15:08:22 +02:00
Gilles Gouaillardet	b3ce25af95	mpiext/cuda: fix mpiext_cuda_c.h install path This fixes a regression introduced in commit open-mpi/ompi@f8318f0a8f. Fixes open-mpi/ompi#6069 Thanks Kawashima-san for the heads up ! Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-11-12 00:58:19 -06:00
Matias Cabral	30b6435897	Merge pull request #6015 from aravindksg/proc-threshold-fix MTL/OFI: Check threshold number of peers allowed per rank	2018-11-08 15:47:45 -08:00
Gilles Gouaillardet	f8318f0a8f	mpiext/cuda: do not include automatically generated file into dist tarball ompi/mpiext/cuda/c/mpiext_cuda_c.h is automatically generated from ompi/mpiext/cuda/c/mpiext_cuda_c.h.in at configure time. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-11-06 13:57:31 +09:00
Jeff Squyres	65eb118e08	MPI_Type_get_envelope: remove MPI-1 deleted names Several names are now no longer returned by MPI_Type_get_envelope. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-11-03 16:20:45 -04:00
Aravind Gopalakrishnan	5cf43de445	MTL/OFI: Check threshold number of peers allowed per rank When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when this limit is crossed. Check the max allowed number of ranks during add_procs() and return if there is danger of exceeding this threshold. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-11-01 14:03:00 -07:00
Geoffrey Paulsen	b03a39d359	mpi.h: restore some MPI-deprecated items to default builds Commit `89da9651b` inadvertently #if'ed out both deprecated and removed items from mpi.h. The intent was only to #if out items that have been removed from the MPI specification and leave all items that are merely deprecated. This commit also re-orders the deleted typedef+functions to be in the same order as they are listed in MPI-3.1 chapter 17, just to make verifying/checking the code easier. Note that --enable-mpi1-compatibility can still be used to restore prototypes for the items that have been removed from the MPI specification (e.g., MPI_Address()). Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com> Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-11-01 13:36:48 -07:00
Matias Cabral	2da31706bf	Merge pull request #5970 from aravindksg/coll-tuned-fix coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms	2018-10-31 11:20:07 -07:00
Ralph Castain	05ac8fa71c	Remove stale defunct tools Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-10-30 08:48:16 -07:00
Mikhail Kurnosov	64abd0f405	coll/libnbc: remove debug output 1. Remove debug output in iallgather (I have forgotten to remove it). 2. Remove an incorrect comment in description of ibcast Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-26 15:52:02 +07:00
Aravind Gopalakrishnan	88d781056f	coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms. But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths as well before calling ompi_datatype_type_size() as otherwise we segfault. MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and Allgatherv operations. So, extending the check to these algorithms as well. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-10-24 15:31:33 -07:00
Joseph Schuchart	a193ae26bf	Fix regression introduced earlier by re-adding a barrier after shared memory has been registered Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2018-10-24 15:54:19 -04:00
Aurelien Bouteiller	96c91e94eb	Manage errors in communicator creations (cid) In order for this to work, error management needs to also be added to NBC, from separate PR Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu> The error field of requests needs to be rearmed at start, not at create Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-10-23 23:43:33 -04:00
Yossi Itigin	4c442f2601	Merge pull request #5934 from hoopoepg/topic/suppressed-cov-warn-added-log-msg COMMON/UCX: suppressed coverity warnings	2018-10-22 11:00:47 +03:00
Mikhail Kurnosov	8b511c7889	coll/libnbc/ireduce: silence Coverity warning CID 1440360 Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-22 11:20:28 +07:00
Sergey Oblomov	1099d5f023	COMMON/UCX: added error code to log output Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-10-21 11:37:25 +03:00
Nathan Hjelm	a66373454e	Merge pull request #5943 from bosilca/fix/libnbc_warnings Remove few warnings in libnbc identified by clang-1000.11.45.2	2018-10-20 21:24:30 -06:00
Bernhard M. Wiedemann	bc23993dea	Allow to override build user and host using the standard $USER and $HOSTNAME environment variables to make reproducible builds possible. See https://reproducible-builds.org/ for why this is good. This helps improve issue #3759 Signed-off-by: Bernhard M. Wiedemann <bwiedemann@suse.de>	2018-10-20 09:27:00 -04:00
bosilca	c3abedbd2c	Merge pull request #5759 from bosilca/fix/monitoring Fix/monitoring	2018-10-19 07:18:41 -07:00
Nathan Hjelm	dbae9c0958	romio/romio321: silence some compiler warnings Some compilers complain when comparing signed and unsigned. romio321 was doing just this. The check is meant to check whether a size (which is an ADIO_Offset-- a signed number) will work with memcpy which takes a size_t. To silence the warning I added a new type (ADIO_Size) which is an unsigned type and cast the ADIO_Offset to this new type. Fixes #5951 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-10-18 13:36:51 -06:00
George Bosilca	dc972f0b92	Fix the PML monitoring. The monitoring PML hides its existence from the OMPI infrastructure by removing itself from the list of PML loaded components, remaining hidden until MPI_Finalize. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-10-18 00:29:23 -04:00
George Bosilca	668aa15dda	Early selection of the best PML. With this patch the best PML is selected earlier, before finalizing the other PMLs. This provides a simpler mechanism to intercept and hijack the PML (as done in the monitoring PML) Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-10-18 00:29:23 -04:00
Mikhail Kurnosov	73e048b62a	coll/libnbc: add Rabenseifner's algorithm for MPI_Iallreduce An implementation of R. Rabenseifner's algorithm for MPI_Iallreduce. This algorithm is a combination of a reduce-scatter implemented with recursive vector halving and recursive distance doubling, followed by an allgather. Limitations: -- count >= 2^{\floor{\log_2 p}} -- commutative operations only -- intra-communicators only Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-18 08:50:16 +07:00
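The three limitations listed in the commit above amount to a simple eligibility check. The sketch below is an illustration only; the function names and the boolean parameters are hypothetical and do not come from the libnbc sources.

```python
def largest_power_of_two_le(p):
    # 2^{floor(log2 p)}: the largest power of two not exceeding p (p >= 1).
    if p < 1:
        raise ValueError("communicator size must be at least 1")
    return 1 << (p.bit_length() - 1)


def rabenseifner_applicable(count, comm_size, commutative, intra_comm=True):
    # All three stated limitations must hold at once.
    return (
        commutative
        and intra_comm
        and count >= largest_power_of_two_le(comm_size)
    )


print(largest_power_of_two_le(6))
print(rabenseifner_applicable(count=3, comm_size=6, commutative=True))
```

For 6 processes the threshold is 4 elements, so a 3-element reduction would have to use a different algorithm.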
Ralph Castain	1bd772e8eb	Remove the stale orte-dvm code Users should migrate to https://github.com/pmix/prrte Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-10-17 15:11:38 -07:00
George Bosilca	66182a294d	Remove few warnings in libnbc identified by clang-1000.11.45.2 Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-10-17 18:04:39 -04:00
Howard Pritchard	a435bfe1cf	Merge pull request #5933 from hppritcha/topic/remove_bfo_pml remove the bfo pml	2018-10-17 09:39:58 -06:00
Nathan Hjelm	43547ade4c	Merge pull request #5663 from mkurnosov/coll-ireduce-rabenseifner coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce	2018-10-17 09:02:06 -06:00
Nathan Hjelm	979a199b4f	Merge pull request #5896 from mkurnosov/coll-iallgather-recursivedoubling coll/libnbc: add recursive doubling algorithm for MPI_Iallgather	2018-10-17 09:01:14 -06:00
Sergey Oblomov	df765595e3	COMMON/UCX: suppressed coverity warnings - suppressed coverity warnings - added log messages on failed calls Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-10-17 16:11:03 +03:00
Howard Pritchard	7d6774acf8	remove the bfo pml Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2018-10-17 06:50:11 -06:00
Nathan Hjelm	1ff3cfedb6	Merge pull request #5921 from devreal/ompi-rdma-preinit RDMA OSC: initialize segment memory before registering the segment	2018-10-16 15:10:02 -06:00
Joseph Schuchart	d9dcdfdfba	RDMA OSC: initialize segment memory before registering the segment Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>	2018-10-16 16:12:14 -04:00
Edgar Gabriel	069084e6ad	Merge pull request #5907 from edgargabriel/topic/testmpio-fixes Topic/testmpio fixes	2018-10-16 13:03:22 -07:00
Edgar Gabriel	ba95588332	io/ompio: add verification for data representations. check for providing a data representation that is actually supported by ompio. Add also one check for a non-NULL pointer in mpi/c/file_set_view for the data representation. Also fixes parts of issue #5643 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-16 12:45:33 -05:00
Jeff Squyres	54ca3310ea	ompi: cleanup various string operations Several fixes to string handling: 1. strncpy() -> opal_string_copy() (because opal_string_copy() guarantees to NULL-terminate, and strncpy() does not) 2. Simplify a few places, such as: * Since opal_string_copy() guarantees to NULL terminate, eliminate some memsets(), etc. * Use opal_asprintf() to eliminate multi-step string creation There's more work that could be done; e.g., this commit doesn't attempt to clean up any strcpy() usage. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-10-14 16:10:20 -07:00
Yossi Itigin	a5b1c9a91d	Merge pull request #5898 from yosefe/topic/pml-ucx-init-err-code pml_ucx: fix return code from mca_pml_ucx_init() error flow	2018-10-14 11:34:00 +03:00
Gilles Gouaillardet	0a09b0419e	Merge pull request #5812 from ggouaillardet/topic/mpi_sizeof_misc_additions fortran: add CHARACTER and LOGICAL support to MPI_Sizeof()	2018-10-12 14:08:27 +09:00
Edgar Gabriel	849d0452a0	io/ompio: execute barrier before sync this ensures that all processes are done modifying a file before syncing. Fixes an error in the testmpio testsuite. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 17:39:05 -05:00
Edgar Gabriel	bf058ca6b0	common/ompio: check datatypes when setting file view return MPI_ERR_ARG if the size of the fileview is not a multiple of the size of the etype provided. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:43:32 -05:00
Edgar Gabriel	05d25383c2	common/ompio: return correct error code for improper access return MPI_ERR_ACCESS if the user tries to read from a file that was opened using MPI_MODE_WRONLY return MPI_ERR_READ_ONLY if the user tries to write a file that was opened using MPI_MODE_RDONLY Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:41:58 -05:00
Edgar Gabriel	c0d7b578be	io/ompio: fix seek position calculation for SEEK_CUR This commit fixes the calculation of the position where to seek to, in case SEEK_CUR is used. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-10-11 14:09:03 -05:00
Yossi Itigin	b71e85b8d5	pml_ucx: fix return code from mca_pml_ucx_init() error flow Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-11 18:48:54 +03:00
Jeff Squyres	f4b3ccabf7	mpi.h.in: remove C99-style comments While we require C99 to build Open MPI, we do not require C99 to build user MPI applications. As such, we shouldn't have C99-style comments (i.e., "//"-style) in mpi.h.in. Thanks to @AdamSimpson for reporting the issue. This commit simply converts a //-style comment to a /**/-style comment. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-10-11 10:58:06 -04:00
Mikhail Kurnosov	a7386c1e09	coll/libnbc: add recursive doubling algorithm for MPI_Iallgather Implements recursive doubling algorithm for MPI_Iallgather. The algorithm can be used only for power-of-two number of processes. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-11 21:43:13 +07:00
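A sequential simulation can illustrate why recursive doubling is tied to a power-of-two process count: at each step a rank pairs with the rank whose number differs in exactly one bit. This is a sketch of the general technique, not the libnbc code, and the function names are hypothetical.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def recursive_doubling_allgather(blocks):
    # blocks[rank] is the data contributed by that rank.
    p = len(blocks)
    if not is_power_of_two(p):
        raise ValueError("recursive doubling needs a power-of-two process count")
    # held[rank] maps source rank -> block currently known to `rank`.
    held = [{rank: blocks[rank]} for rank in range(p)]
    distance = 1
    while distance < p:
        # Snapshot first so every exchange in this step uses pre-step data.
        snapshot = [dict(h) for h in held]
        for rank in range(p):
            partner = rank ^ distance  # flip one bit of the rank number
            held[rank].update(snapshot[partner])
        distance *= 2  # the amount of data exchanged doubles each step
    return [[h[src] for src in range(p)] for h in held]


print(recursive_doubling_allgather(["a", "b", "c", "d"]))
```

With 4 ranks the gather completes in 2 steps; with 3 ranks the XOR partner of rank 2 at distance 1 would be the non-existent rank 3, which is why the sketch rejects that case.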
Yossi Itigin	b8e1af6fcb	osc_ucx: add worker flush before osc module free Make sure all pending communications are done on all ranks before closing the window. This way it will be safe to close the endpoints when closing the component. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:47:16 +03:00
Yossi Itigin	bcc48515e4	Revert "osc_ucx: fix hang/timeout in component finalize" This reverts commit 438d13b4ca1e7333b789ca3fb536fda17b0feb38. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 20:47:13 +03:00
Yossi Itigin	27d8c8e83c	Merge pull request #5878 from yosefe/topic/pml-ucx-fix-datatype-leak pml_ucx: add ompi datatype attribute to release ucp_datatype	2018-10-10 20:18:39 +03:00
Yossi Itigin	a012ee91d8	Merge pull request #5886 from yosefe/topic/osc-ucx-fix-finalize-hang osc_ucx: fix hang/timeout in component finalize	2018-10-10 16:29:29 +03:00
Yossi Itigin	9a365555b0	Merge pull request #5879 from hoopoepg/topic/fixed-zero-size-window OSC/UCX: fixed zero-size window processing	2018-10-10 16:28:55 +03:00
Yossi Itigin	40ac9e4771	pml_ucx: fix return code from mca_pml_ucx_init() Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 14:41:05 +03:00
Yossi Itigin	dc6809495d	osc_ucx: fix hang/timeout in component finalize Add barrier to make sure all endpoints are destroyed before destroying the worker. Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-10 14:38:06 +03:00
Sergey Oblomov	ae6f81983f	OSC/UCX: fixed zero-size window processing - added processing of zero-size MPI window Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-10-10 13:08:01 +03:00
Nathan Hjelm	32682aa2c0	Merge pull request #5772 from mkurnosov/coll-ibcast-knomial coll/libnbc: add knomial tree algorithm for MPI_Ibcast	2018-10-09 16:26:13 -06:00
Jeff Squyres	bb13941b69	Merge pull request #5811 from ggouaillardet/topic/mpi_f08_c_types fortran/use-mpi-f08: add MPI C types	2018-10-09 13:17:30 -04:00
Yossi Itigin	4763822a64	pml_ucx: add ompi datatype attribute to release ucp_datatype Signed-off-by: Yossi Itigin <yosefe@mellanox.com>	2018-10-09 17:34:34 +03:00
Mikhail Kurnosov	b0429d25df	coll/libnbc: add knomial tree algorithm for MPI_Ibcast Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-09 20:43:04 +07:00
Mikhail Kurnosov	7bd63e79c8	coll/libnbc: add Rabenseifner's algorithm for MPI_Ireduce An implementation of R. Rabenseifner's algorithm for MPI_Ireduce. This algorithm is a combination of a reduce-scatter implemented with recursive vector halving and recursive distance doubling, followed by a gather. Limitations: -- count >= 2^{\floor{\log_2 p}} -- commutative operations only -- intra-communicators only Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-09 20:27:09 +07:00
KAWASHIMA Takahiro	b491b454dc	java: Fix javadoc build failure with OpenJDK 11 OpenJDK 11 changed the default javadoc output HTML version to HTML 5 from HTML 4.01. This causes an error when building Open MPI configured with `--enable-mpi-java` (disabled by default). This fix is compatible with older OpenJDK. I don't know whether this problem exists with other vendors' JDKs. But this fix should be compatible with other JDKs because the new syntax is used in other places in the same file. Thanks to Siegmar Gross for the bug report. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-10-09 17:49:30 +09:00
Brian Barrett	e9e4d2a4bc	Handle asprintf errors with opal_asprintf wrapper The Open MPI code base assumed that asprintf always behaved like the FreeBSD variant, where ptr is set to NULL on error. However, the C standard (and Linux) only guarantee that the return code will be -1 on error and leave ptr undefined. Rather than fix all the usage in the code, we use opal_asprintf() wrapper instead, which guarantees the BSD-like behavior of ptr always being set to NULL. In addition to being correct, this will fix many, many warnings in the Open MPI code base. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-08 16:43:53 -07:00
Mikhail Kurnosov	9557fa087f	Resolve merge conflicts Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-10-05 21:40:27 +07:00
KAWASHIMA Takahiro	5f1c940c8b	Merge pull request #5840 from kawashima-fj/pr/pcollreq-f08-signatures mpiext/pcollreq: Correct f08 routine signatures	2018-10-05 08:59:03 +09:00
KAWASHIMA Takahiro	43d85dbc81	mpiext/pcollreq: Add Fortran bindings in man Fortran bindings were added to persistent collectives in `9e0115c980` but man was not updated. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-10-04 21:05:38 +09:00
KAWASHIMA Takahiro	994b345253	man: Correct markup of `MPI_Neighbor_allgather` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-10-04 21:02:35 +09:00
KAWASHIMA Takahiro	be91a26fd8	mpiext/pcollreq: Add missing f08 `asynchronous` Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-10-04 20:36:30 +09:00
KAWASHIMA Takahiro	357531847e	mpiext/pcollreq: Correct f08 routine signatures Changes of nonblocking collectives in `e98d794e8b` and `f750c6932c` are applied to persistent collectives. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-10-04 19:51:40 +09:00
Nathan Hjelm	88a560fa3c	Merge pull request #5744 from mkurnosov/coll-iscan-recursivedoubling coll/libnbc: add recursive doubling algorithm for MPI_Iscan	2018-10-03 09:02:05 -06:00
Gilles Gouaillardet	69f1a19c5d	fortran/use-mpi-f08: add MPI C types Refers open-mpi/ompi#5801 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-10-03 16:09:00 +09:00
KAWASHIMA Takahiro	eb65e1c6fb	Merge pull request #5799 from kawashima-fj/pr/correct-f08-signatures fortran/use-mpi-f08: Correct f08 routine signatures	2018-10-03 10:37:21 +09:00
Brian Barrett	b2ee56aa81	fortran: Fix ident warning On OS X, where #pragma ident and #ident aren't supported, the use of a static const star that was never used was generating a warning (and, it should be noted, was useless, because the compiler would optimize it away). Fix up the ident declaration so that it is only created once in libmpi_mpifh.la. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-02 13:35:15 -04:00
Brian Barrett	2e24e6ec08	coll libnbc: Remove dead code Remove dead code that was causing warnings about unused static functions. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-10-02 13:35:15 -04:00
Gilles Gouaillardet	e4001040b4	fortran: add CHARACTER and LOGICAL support to MPI_Sizeof() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-10-01 13:37:05 +09:00
KAWASHIMA Takahiro	cf6d28cb66	fortran/use-mpi-f08: Correct f08 routine signatures Following the commit `f750c6932c`, I compared `ompi/mpi/fortran/use-mpi-f08/.F90` and `ompi/mpi/fortran/use-mpi-f08/profile/p.F90`, and `ompi/mpi/fortran/use-mpi-f08/mod/mpi-f08-interfaces.F90` and `ompi/mpi/fortran/use-mpi-f08/mod/pmpi-f08-interfaces.F90`. There are many differences. Some are bugs of `MPI_`, some are bugs of `PMPI_`. I'm not sure how these bugs affect applications. To make it easy to compare these files future, I also removed editorial differences. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-09-29 01:39:01 +09:00
Jeff Squyres	7223334d4d	mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat When --enable-mpi1-compatibility was specified, the ompi_mpi_ub/lb symbols were #if'ed out of mpi.h. But the #defines for MPI_UB/LB still remained. This commit also #if's out the MPI_UB/LB macros when --enable-mpi1-compatibility is specified. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-28 09:10:03 -07:00
Jeff Squyres	11ab621555	mpi.h: file errhandler typedef: use new form of name The old/deprecated form of the file errhandler typedef used "fn" as a suffix. The new form uses the name "function". The MPI API typedef name has already been updated to use "function"; this commit updates the internal Open MPI typedef to use the name "function" to match the MPI API name and avoid confusion. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-28 07:49:28 -07:00
Brian Barrett	c5eaa38491	mtl ofi: Change from opt-in to opt-out provider selection Change default provider selection logic for the OFI MTL. The old logic was whitelist-only, so any new HPC NIC provider would have to ask users to do extra work or wait for an OMPI release to be whitelisted. The reason for the logic was to avoid selecting a "generic" provider like sockets or shm that would frequently have worse performance than the optimized BTL options Open MPI supports. With the change, we blacklist the (small, relatively static) list of providers that duplicate internal capabilities. Users can use one of these blacklisted providers in two ways: first, they can explicitly request the provider in the include list (which will override the default exclude list) and second, they can set a new empty exclude list. Since most HPC networks require special libraries and therefore an explicit build of libfabric, it is highly unlikely that this change will cause users to use libfabric when they didn't want to do so. It does, however, solve the whitelisting problem. Signed-off-by: Brian Barrett <bbarrett@amazon.com>	2018-09-27 11:02:18 -07:00
Jeff Squyres	6bb356ab87	Squash a bunch of harmless compiler warnings. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-26 12:15:21 -07:00
Gilles Gouaillardet	f750c6932c	fortran/use-mpi-f08: Corrections to PMPI signatures of collectives Corrected the signatures of the collectives used by the Fortran 2008 interface to state correct intent for inout arguments and use the ASYNCHRONOUS attribute in non-blocking collective calls. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-25 11:17:01 +09:00
Philipp Otte	e98d794e8b	fortran/use-mpi-f08: Corrections to Fortran08 signatures of collectives Corrected the signatures of the collectives used by the Fortran 2008 interface to state correct intent for inout arguments and use the ASYNCHRONOUS attribute in non-blocking collective calls. Also corrected the C-bindings in Fortran accordingly Signed-off-by: Philipp Otte <philipp.j.otte@googlemail.com>	2018-09-25 11:16:52 +09:00
Mikhail Kurnosov	dfe203e167	coll/libnbc: add recursive doubling algorithm for MPI_Iexscan Implements recursive doubling algorithm for MPI_Iexscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations. The MCA parameter 'coll_libnbc_iexscan_algorithm' was added for dynamic algorithm selection. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-09-23 19:54:27 +07:00
Mikhail Kurnosov	3d43ff0f32	coll/libnbc: add recursive doubling algorithm for MPI_Iscan Implements recursive doubling algorithm for MPI_Iscan. The algorithm preserves order of operations so it can be used both by commutative and non-commutative operations. The MCA parameter coll_libnbc_iscan_algorithm was added for dynamic algorithm selection. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-09-22 21:09:12 +07:00
bosilca	3f598e9e83	Merge pull request #5450 from mkurnosov/coll-base-allgather-fix-in-place coll-base-allgather: fix MPI_IN_PLACE processing	2018-09-21 14:51:45 -04:00
bosilca	17f1684438	Merge pull request #5491 from mkurnosov/coll-base-allgatherv-fix-mpi-in-place coll/base/allgatherv: fix MPI_IN_PLACE processing	2018-09-21 14:48:16 -04:00
bosilca	441727fcb0	Merge pull request #5680 from ggouaillardet/topic/nbc_unpack coll/libnbc: fix NBC_Unpack()	2018-09-19 10:17:21 -04:00
bosilca	2ae3cfd9bc	Merge pull request #5699 from ICLDisco/export/coll_errors Error cases in base collectives	2018-09-19 09:47:24 -04:00
Jeff Squyres	3dae8703a5	Merge pull request #5451 from ggouaillardet/topic/use_mpi_f08_bindings fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings	2018-09-19 07:53:04 -04:00
Kurita, Takehiro	fb8311d331	java: Fix typos of `javadoc` Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>	2018-09-19 14:45:17 +09:00
Gilles Gouaillardet	c4ce01d104	fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-19 14:13:00 +09:00
Gilles Gouaillardet	c6070fd2e0	fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling - do not generate bindings for pompi_FOO_f symbols (they are simply not used anywhere) - move ompi_FOO_f bindings out of mpi_f08.mod into ompi_mpifh_bindings.mod that is only used at build time Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-19 14:12:55 +09:00
Gilles Gouaillardet	6e04b2a66a	configury: do not define "dummy" empty targets any more. We previously needed to have empty targets because AM couldn't handle having an AM_CONDITIONAL with targets in the "if" statement but not in the "else". :-( That now appears to be an old automake bug that has been fixed, so clean up some Makefile.am files. Thanks Jeff for the pointer. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-19 14:12:50 +09:00
Gilles Gouaillardet	d2393251f7	use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-19 14:12:46 +09:00
Jeff Squyres	09d6740a72	Merge pull request #4897 from bosilca/topic/waitsome Be conservative with the array_of_indices	2018-09-18 12:34:22 -04:00
KAWASHIMA Takahiro	e27f519a5e	Merge pull request #5722 from kawashima-fj/pr/pcoll-typo mpiext/pcollreq: fix more typos	2018-09-18 15:41:20 +09:00
KAWASHIMA Takahiro	4a0a2598f6	mpiext/pcollreq: fix more typos Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-09-18 14:45:33 +09:00
Gilles Gouaillardet	8b51862fb2	coll/libnbc: fix various error paths The parameter passed to NBC_Return_handle() was incorrectly casted and not dereferenced. Thanks Yossi for the bug report. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-18 13:17:43 +09:00
Gilles Gouaillardet	8dc6985a5a	mpiext/pcollreq: fix misc typos Thanks Jeff for the report Fixes open-mpi/ompi#5712 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-18 10:02:04 +09:00
Jeff Squyres	5be0ba0247	common/monitoring: fix include files Move includes to top of file. Set some #defines so that monitoring_prof.c compiles without warning (as identified by gcc 8 on MacOS). Also ensure to include the internal Open MPI "mpi.h" file (not some random system <mpi.h> file). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-15 06:04:13 -07:00
Jeff Squyres	06c1bf73da	libnbc: remove some stale/dead code Gcc 8 identified hb_tree_csearch() as an infinite recursion, and it turns out that we never call this function, anyway. So just remove it. Fixes #5670. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-15 06:04:13 -07:00
Jeff Squyres	8f2620d3af	misc: compiler warning fixes A variety of small compiler warning fixes. The 2 PMIx fixes are already committed upstream. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-09-15 06:04:13 -07:00
Nathan Hjelm	1071d72130	Merge pull request #5445 from hjelmn/asm_type Update opal to use C11 atomics if available	2018-09-14 12:32:56 -06:00
Aurelien Bouteiller	466217fadd	Always return a valid error code from collective operations Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2018-09-14 13:46:35 -04:00
Nathan Hjelm	000f9eed4d	opal: add types for atomic variables This commit updates the entire codebase to use specific opal types for all atomic variables. This is a change from the prior atomic support which required the use of the volatile keyword. This is the first step towards implementing support for C11 atomics as that interface requires the use of types declared with the _Atomic keyword. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-09-14 10:48:55 -06:00
Ralph Castain	466cad6cb2	Update master to PMIx v4 Retain ext3x for PMIx 3 compatibility Get the blasted permissions correct on config files Signed-off-by: Ralph Castain <rhc@open-mpi.org>	2018-09-13 08:24:17 -07:00
Gilles Gouaillardet	ff48e92864	coll/libnbc: fix NBC_Unpack() always initialize 'size'. Only the a2a_sched_diss() alltoall algorithm is impacted, and this algo is currently unused, so there is no need to backport nor update the NEWS file for now. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-09-13 10:55:29 +09:00
KAWASHIMA Takahiro	69901a5156	mpiext/pcollreq: Fix zero-count reduction We need to return a persistent request. `ompi_request_empty` is not a persistent request. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-09-10 11:19:07 +09:00
Gilles Gouaillardet	42b0e3bd61	Merge pull request #5494 from markalle/apply_romio314_patch_to_master apply romio314 patch to romio321	2018-09-05 10:27:19 +09:00
Nathan Hjelm	1c89631db5	Merge pull request #5630 from hjelmn/osc_portals4_c99 osc/portal: use c99 subobject naming to initialize module	2018-08-30 09:20:55 -06:00
Gilles Gouaillardet	b79b37465c	ompi/hook: plug a misc memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-08-30 10:07:18 +09:00
Gilles Gouaillardet	316e4e38f4	mtl/psm2: fix a misc memory leak Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-08-30 10:07:17 +09:00
Gilles Gouaillardet	fed33c1530	pml/ob1: plug a memory leak in mca_pml_ob1_component_fini() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-08-30 10:07:17 +09:00
Gilles Gouaillardet	d0d399c9a9	ompi/info: plug memory leaks in ompi_mpiinfo_finalize() Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-08-30 10:07:17 +09:00
Nathan Hjelm	7fdf887937	osc/portal: use c99 subobject naming to initialize module Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-29 10:34:24 -06:00
Yossi Itigin	68206a5635	Merge pull request #5569 from hoopoepg/topic/optimize-blocked-calls PML/UCX: blocked calls optimizations	2018-08-29 14:19:09 +03:00
Yossi Itigin	4bb6845888	Merge pull request #5570 from hoopoepg/topic/opal-mem-hooks-syno MCA/COMMON/UCX: added synonym to opal_mem_hook variable	2018-08-29 14:16:33 +03:00
Edgar Gabriel	2303f0f17c	io/base: fixes to file_delete selection logic file_delete triggers the full component selection logic under the hood, since we do not have a file handle, just a file name. As part of the selection logic, however, we have to initiate the framework-open of the fs component in the case of ompio, since ompio will call the delete function of the selected fs component, which is based on the file system where the file is located. This was not handled correctly so far. The problem only shows up if the first I/O operation to be executed is a file_delete; otherwise the file_open will lead to the correct opening and initialization of the fs framework. This commit ensures that we do the right thing even if file_delete is the first file I/O operation in the application. Fixes issue #5611 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-27 16:01:48 -05:00
Edgar Gabriel	9b65ec9445	sharedfp/sm and lockedfile: fix naming bug If an application opens a file for reading from multiple processes using MPI_COMM_SELF (or another communicator that has distinct process groups but the same comm-id, as can happen as the result of comm_split), the naming chosen for the lockedfile or the mmapped file used by the sharedfp/sm component would collide. This patch ensures that the filename is different by integrating the process id of rank 0 for each sub-communicator. This fixes one aspect of the problem reported in github issue 5593 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-27 12:40:49 -05:00
Nathan Hjelm	2221720fc7	Merge pull request #5591 from hjelmn/osc_rdma_cleanup osc/rdma: clean out stale aggregation code	2018-08-27 10:05:50 -06:00
Ralph Castain	b68ac81efd	Merge pull request #5590 from aravindksg/aravindksg/thread_settings MTL OFI: Ask for FI_THREAD_DOMAIN support as needed	2018-08-27 06:33:57 -07:00
Sergey Oblomov	c201c0abb3	PML/UCX: blocked calls optimizations: removed reset progress count Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-08-27 09:50:39 +03:00
Sergey Oblomov	2cd9e04166	PML/UCX: optimization of mprobe call - renamed vars - renamed of internal variable names - used unsigned datatypes Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-08-27 09:50:39 +03:00
Sergey Oblomov	38e908f83e	PML/UCX: optimization of mprobe call - refactoring of opal/UCX progress calls Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-08-27 09:50:38 +03:00
Sergey Oblomov	b0f87f2235	PML/UCX: blocked calls optimizations - added UCX progress priority Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-08-27 09:50:38 +03:00
Sergey Oblomov	b72dd83f05	MCA/COMMON/UCX: added synonyms for common ucx variables - added synonyms for atomic/osc modules Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-08-26 18:25:21 +03:00
Jeff Squyres	fe0852bcb4	Miscellaneous compiler warning stomps. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-08-24 07:39:14 -07:00
Nathan Hjelm	feb0e90301	Merge pull request #5589 from hjelmn/threads_cleanup config: remove OPAL_ENABLE_MULTI_THREADS config macro	2018-08-23 15:43:13 -06:00
Nathan Hjelm	d0cd80e902	osc/rdma: clean out stale aggregation code The aggregation code in osc/rdma is currently broken and will likely not be reused. This commit cleans it out. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-23 15:40:21 -06:00
Aravind Gopalakrishnan	5cbcae79d8	MTL OFI: Ask for FI_THREAD_DOMAIN support when not using MPI_THREAD_MULTIPLE When an application is not using multiple threads to call into MPI, we can safely ask for FI_THREAD_DOMAIN setting from the provider as it should translate to the least amount of locking in provider. Conversely, for applications using THREAD_MULTIPLE, explicitly ask for FI_THREAD_SAFE to prevent race conditions. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-08-23 14:18:32 -07:00
Nathan Hjelm	1c84f48640	config: remove OPAL_ENABLE_MULTI_THREADS config macro We long ago hard-coded this value to 1. This commit cleans it out entirely. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-23 13:47:02 -06:00
Ralph Castain	f7655280cb	Merge pull request #5503 from aravindksg/aravindksg/fix_ofi_race MTL OFI: Fix race condition due to global progress entries array	2018-08-22 14:31:38 -07:00
Nathan Hjelm	29320872b3	osc/rdma: quiet warning gcc complains about ret possibly being used uninitialized. That will never happen but we should still quiet the warning. This commit sets ret to a valid value. Fixes #5513 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-21 15:54:53 -06:00
Nathan Hjelm	438c40de03	osc/pt2pt: use c99 for module initialization Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-21 11:23:33 -06:00
Sergey Oblomov	e00f7a68ba	MCA/COMMON/UCX: added synonym to opal_mem_hook variable - added synonym to opal_mem_hook variable to allow printing it in opal_info -a Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-08-21 15:05:12 +03:00
Edgar Gabriel	e6a344ba63	Merge pull request #5561 from edgargabriel/pr/file_open_sharedfp_ordering common/ompio: fix an ordering problem during file_open	2018-08-20 10:18:14 -05:00
Edgar Gabriel	eaabfdd028	Merge pull request #5539 from DDNStorage/ime-support ompio: support for DDN's Infinite Memory Engine	2018-08-20 09:52:22 -05:00
Edgar Gabriel	2742273ee3	common/ompio: fix an ordering problem during file_open The sharedfp component has to be selected and opened before we set the default file view during file_open. Otherwise there is a spurious error message from the sharefp_file_seek operation that is called during the file_set_view. Fixes Issue #5560 Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-08-20 09:28:29 -05:00
Jeff Squyres	8a0b5454ae	fortran/use TKR: remove excess declaration for PMPI_Type_extent This declaration was accidentally left behind in `89da9651bb`. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-08-16 10:31:41 -07:00
Gaëtan Bossu	ccc96efc2e	DDN's Infinite Memory Engine support for OMPIO Changes made: - Create a new fs component for IME - Create a new fbtl component for IME - Modify the close function of OMPIO to finalize IME if necessary Signed-off-by: Gaëtan Bossu <gbossu@ddn.com> Signed-off-by: Sylvain Didelot <sdidelot@ddn.com>	2018-08-16 11:45:47 +02:00
Aravind Gopalakrishnan	ed2343034d	MTL OFI: Fix race condition due to global progress entries array Since progress entries array is globally allocated, it is susceptible to race conditions when using multi-threaded applications. Allocating it on the stack resolves any potential races as it is thread local by default. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-08-09 10:52:28 -07:00
Jeff Squyres	89773c41a2	Fix script abstraction break: mv make_manpage.pl to config Having the "make_manpage.pl" script in the ompi/ tree broke "./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi builds would break). Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-08-08 08:50:55 -07:00
Todd Kordenbrock	e9f378e851	Merge pull request #5500 from tkordenbrock/topic/master/fix.PtlMEUnlink.in.use coll-portals4: retry PtlMEUnlink() if PTL_IN_USE	2018-08-07 11:21:00 -05:00
Nathan Hjelm	c294bbc352	Merge pull request #5508 from hjelmn/fuzzy_match Bring fuzzy matching support into master	2018-08-06 13:52:04 -06:00
Nathan Hjelm	eeae3f9b93	Merge pull request #5517 from bosilca/topic/treematch_warnings Remove few warnings identified by @rhc in #5514.	2018-08-06 13:25:07 -06:00
Matthew Dosanjh	c8d13486cc	Fixed promotion bug Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-06 12:56:36 -06:00
Boris Karasev	57683366ca	pmix: added check for pmix fence status Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2018-08-06 15:01:57 +06:00
George Bosilca	6d11a45f44	Remove few warnings identified by @rhc in #5514 . Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-08-03 16:21:06 -04:00
George Bosilca	a5fbfa476a	Be conservative with the array_of_indices We were assuming that the array_of_indices has the same size as the number of requests (incount), instead of the number of actually active requests. While the patch is trivial, the question of the size of the array_of_indices should be clarified in the MPI Forum. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2018-08-03 14:58:13 -04:00
Nathan Hjelm	dd74c6252f	pml/ob1: custom matching cleanup and configury This commit updates the new custom matching code in pml/ob1 so it can now be enabled with a configure option. This commit also renames the fuzzy-matching headers to avoid potential name conflicts and removes the use of C reserved identifiers. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-02 13:06:19 -06:00
Matthew Dosanjh	572694b621	Adding custom match source. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-08-02 12:23:08 -06:00
Ralph Castain	1aef0a64aa	Merge pull request #5477 from nrspruit/ns_mtl_send_isend MTL OFI: send/isend split into blocking/non-blocking paths	2018-07-31 13:08:37 -07:00
Ralph Castain	8744320a18	Merge pull request #5476 from nrspruit/ns_cancel_fix MTL OFI: Fix Deadlock in fi_cancel given completion during cancel	2018-07-31 13:07:41 -07:00
Todd Kordenbrock	f3f2a826b4	coll-portals4: retry PtlMEUnlink() if PTL_IN_USE In the cleanup phase, it is possible for PtlMEUnlink() to return PTL_IN_USE if the NIC is not done with the ME. This should not be considered an error. This commit adds a retry loop around PtlMEUnlink(). In some cases, the return value of PtlMEUnlink() and PtlCTFree() was not checked at all. Check them with the same retry loop as above. Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>	2018-07-31 10:20:55 -05:00
Mark Allen	f413ef6b14	apply romio314 patch to romio321 When romio314 was first pulled in an extra patch was applied to it, see commit `92f6c7c1e2`. Most of that patch is already present in vanilla romio321, but the fix for MPIO_DATATYPE_ISCOMMITTED() isn't. If that macro doesn't set err_ then some paths end up with a variable being used uninitialized. In particular you can trace through romio321/romio/mpi-io/read.c to see what happens with error_code. It's an uninitialized stack variable that goes through three MPIO_CHECK_* macros none of which set it. The macros consistently set error_code to a failure if they see something wrong, but they don't consistently set it to success when things are fine. And then in the last macro MPIO_CHECK_DATATYPE it tries to look at the value of error_code that was never set. Signed-off-by: Mark Allen <markalle@us.ibm.com>	2018-07-30 17:14:56 -04:00
Sergey Oblomov	d204b8a678	PML/SPML/UCX/COMPONENT: applied C99 initialization Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-28 09:44:03 +03:00
Mikhail Kurnosov	b45e190e66	coll/base/allgatherv: fix MPI_IN_PLACE processing The call of MPI_Allgatherv with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault. The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-27 09:34:17 +07:00
Sergey Oblomov	2806504290	PML/SPML/UCX: init global objects using C99 style - to avoid value mix used C99 style of object initializations Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-25 14:52:45 +03:00
Spruit, Neil R	7dc8c8ba3f	MTL OFI: send/isend split into blocking/non-blocking paths -Updated blocking send to directly call functionality and set completion events expected to 0 initially. This allows for optimization for providers that support fi_tinject up to larger sizes. This also reduces latency when running the OFI mtl with smaller sizes without requiring calls to progress, given that fi_tinject is required to complete the messaging before returning and will not create any events in the Completion Queue. -Updated non-blocking send to directly call fi_tsend and avoid calling fi_tinject, as the functionality should not wait on completions. This resolves a bug where applications calling MPI_Isend can overrun the TX buffer with small (inject) messages, causing a deadlock. In addition, this improves message-rate performance by not waiting for a message of any size to complete in non-blocking sends. -Created common ompi_mtl_ofi_ssend_recv function to post the ssend recv which is common between isend and send code paths. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-24 07:54:24 -07:00
Spruit, Neil R	767135c580	MTL OFI: Fix Deadlock in fi_cancel given completion during cancel - If a message for a recv that is being cancelled gets completed after the call to fi_cancel, then the OFI mtl will enter a deadlock state waiting for ofi_req->super.ompi_req->req_status._cancelled which will never happen since the recv was successfully finished. - To resolve this issue, the OFI mtl now checks ofi_req->req_started to see if the request has been started within the loop waiting for the event to be cancelled. If the request is being completed, then the loop is broken and fi_cancel exits setting ofi_req->super.ompi_req->req_status._cancelled = false; Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-24 03:12:44 -07:00
Matias Cabral	d996f529c0	MTL OFI: Add support for mem_tag_format OFI providers may reserve some of the upper bits of the tag for internal usage and expose it using mem_tag_format. Check for that and adjust communicator bits as needed. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2018-07-23 11:39:40 -07:00
Matias Cabral	30fb635836	Merge pull request #5446 from nrspruit/ns_mtl_ofi_overflow MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow	2018-07-20 14:53:53 -07:00
Sergey Oblomov	6fe0a73861	PML/UCX: fixed ucp request free on persistent request completion - in some cases the persistent request was deleted during the completion callback, which caused a double free of the linked UCX request (assert in debug build or hang in release build) - the UCX request is now freed prior to the completion callback Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-20 19:32:20 +03:00
Yossi Itigin	bdb6ece3dd	Merge pull request #5452 from hoopoepg/topic/osc-ucx-fox-hang OSC/UCX: fixed hang on OSC init	2018-07-19 13:57:51 +03:00
Sergey Oblomov	fa33e322e7	OSC/UCX: code deduplication Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-19 12:39:15 +03:00
Sergey Oblomov	6f0a7a2005	OSC/UCX: opal progress register/unregister optimization Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-19 12:07:26 +03:00
Yossi Itigin	29812494f2	Merge pull request #5402 from hoopoepg/topic/common-del-procs MCA/COMMON/UCX: del_procs calls are unified to common module	2018-07-19 11:19:45 +03:00
Sergey Oblomov	55b934bacf	OSC/UCX: enable progress when at least one window is allocated Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-18 17:52:30 +03:00
Sergey Oblomov	a081fba046	OSC/UCX: fixed hang on OSC init - worker progress was missed on startup, which caused a hang on one of the ranks Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-18 17:01:53 +03:00
Edgar Gabriel	b6b9552ca9	Merge pull request #5444 from gbossu/fix-file-delete io/ompio: Call component-specific file_delete function instead of POSIX unlink	2018-07-18 08:45:57 -05:00
Sergey Oblomov	920cc2e0d9	MCA/COMMON/UCX: del_procs calls are unified to common module Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-18 07:37:25 +03:00
Mikhail Kurnosov	540c2d1617	coll-base-allgather: fix MPI_IN_PLACE processing The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault. The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-18 10:27:00 +07:00
Gilles Gouaillardet	fed1e7766e	Merge pull request #5430 from ggouaillardet/pr/pcollreq-fort mpiext/pcollreq: add Fortran bindings	2018-07-18 09:52:59 +09:00
Joshua Ladd	3add13c72e	Merge pull request #5441 from hoopoepg/topic/ucx-memhooks-to-common-module MCA/COMMON/UCX: shift opal memhooks into common UCX	2018-07-17 15:52:44 -04:00
Matias Cabral	be3cb01cb4	Merge pull request #5397 from nrspruit/ns_ofi_mtl_ssend MTL OFI: Redesign sync send with reduced tag bits and quick ack	2018-07-17 10:14:33 -07:00
Gaëtan Bossu	8522ba112c	MCA/IO/OMPIO: fix MPI_File_delete implementation. OMPIO now uses the correct delete function depending on the fs mca_common_ompio_file_delete now works this way instead of calling POSIX unlink: - create a minimal file handle with the given file name - select the best fs component using this file handle - call the component-specific file delete function Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>	2018-07-17 18:17:13 +02:00
Gaëtan Bossu	ac6f75e3d1	MCA/FS: check communicator validity in query functions It is needed because the fs components might be queried due to a MPI_File_delete call. And in this case, we don't have a communicator value. Signed-off-by: Gaëtan Bossu <gbossu@ddn.com>	2018-07-17 18:16:21 +02:00
Josh Hursey	9aa5168795	Merge pull request #5353 from ggouaillardet/topic/romio321_grequests io/romio321: make grequest extensions internal	2018-07-17 10:53:53 -05:00
Gilles Gouaillardet	1a41482720	coll/libnbc: do not recursively call opal_progress() instead of invoking ompi_request_test_all(), that will end up calling opal_progress() recursively, manually check the status of the requests. the same method is used in ompi_comm_request_progress() Refs open-mpi/ompi#3901 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 09:45:08 -06:00
Sergey Oblomov	1c7ae22dfb	MCA/COMMON/UCX: shift opal memhooks into common UCX Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-17 13:46:38 +03:00
Spruit, Neil R	d4f408a7f8	MTL OFI: MTL_OFI_RETRY_UNTIL_DONE support for Resource overflow - Added support in MTL_OFI_RETRY_UNTIL_DONE to handle -FI_EAGAIN from the provider and correctly attempt to progress the OFI Completion queue by calling ompi_mtl_ofi_progress. - If events were pending that blocked OFI operations from being enqueued they will be completed and the OFI operation will be retried once ompi_mtl_ofi_progress has successfully completed. - Updated MTL_OFI_RETRY_UNTIL_DONE to take a RETURN variable instead of requiring the existence of a "ret" variable to pass back the return value from completing the OFI operation. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-17 03:00:38 -07:00
Gilles Gouaillardet	47351b7fac	mpiext/pcollreq: Add Fortran use-mpi-f08 bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 16:29:41 +09:00
Kurita, Takehiro	73e038ec18	mpiext/pcollreq: Add Fortran use-mpi bindings Signed-off-by: Kurita, Takehiro <fj6370fp@aa.jp.fujitsu.com>	2018-07-17 16:29:41 +09:00
Gilles Gouaillardet	9e0115c980	mpiext/pcollreq: Add Fortran mpif-h bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 16:29:33 +09:00
Gilles Gouaillardet	44110a575d	mpiext/pcollreq: do include PMPIX_* subroutines to C bindings Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-17 16:29:33 +09:00
KAWASHIMA Takahiro	5ddf0f6418	mpi/fortran: Fix IN_PLACE detection of ISCATTER(V) Blocking `MPI_SCATTER` and `MPI_SCATTERV` were fixed in `506d0e96f4` but nonblocking `MPI_ISCATTER` and `MPI_ISCATTERV` were not fixed yet. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-17 14:15:21 +09:00
Mikhail Kurnosov	ba83cc91eb	coll/base: add MPI_Bcast based on a binomial tree scatter followed by a ring allgather Implements MPI_Bcast using a binomial tree scatter followed by a ring allgather. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-16 08:56:09 -06:00
Gilles Gouaillardet	61b3308871	mpiext/pcollreq: check subroutine parameters and add profiling symbols - check subroutine parameters - implement PMPIX_* subroutines Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-14 14:14:37 +09:00
Gilles Gouaillardet	dec1663364	spc: add missing subroutines add counters for : - MPI_Exscan - MPI_Iexscan - MPI_Igatherv Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-14 14:14:37 +09:00
Howard Pritchard	9a5fd48388	Merge pull request #5079 from jsquyres/pr/fortran-is-the-devil status_set_cancelled: fix F08 binding	2018-07-13 15:36:02 -05:00
Joshua Ladd	b12868239c	Merge pull request #4765 from xinzhao3/topic/osc-ucx-mem-hook OMPI/OSC/UCX: move memory hooks init in osc to win creation.	2018-07-13 09:36:20 -04:00
Xin Zhao	74ef51af1b	OMPI/OSC/UCX: move memory hooks init in osc to win creation. Move memory hooks init (for request based operation) in osc ucx to window creation time, to avoid performance issue in MPI initialization. Signed-off-by: Xin Zhao <xinz@mellanox.com>	2018-07-12 15:03:02 -07:00
Nathan Hjelm	304a6a52d4	osc/rdma: use local base for local process when possible This commit fixes a crash that occurs when using btl/vader as an RDMA btl. This btl supports using CPU atomics and does not support using the btl for self communication so we must use the local memory optimizations in osc/rdma. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-12 15:50:50 -06:00
KAWASHIMA Takahiro	c87a3df0c9	Merge pull request #5416 from kawashima-fj/pr/coll-libnbc-suppress-warnings coll/libnbc: Suppress compiler warnings	2018-07-12 15:45:59 +09:00
KAWASHIMA Takahiro	37a05e74aa	coll/libnbc: Suppress compiler warnings Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:42:39 +09:00
KAWASHIMA Takahiro	0021616984	pml/ob1: Fix data corruption of MPI_BSEND Data transferred by `MPI_BSEND` may be corrupted if all of the following conditions are met. - The message size is less than the eager limit. - The `btl_alloc` function in the BTL interface returns `NULL` for some reason. - The MPI program overwrites the send buffer after `MPI_BSEND` returns. The problem is in the way of pending a send request in ob1 PML. The `mca_pml_ob1_send_request_start_copy` function returns `OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns `des = NULL`. In this case, the send request is added to the `send_pending` list and `MPI_BSEND` returns immediately. Next time the `mca_pml_ob1_send_request_start_copy` function tries sending, the user buffer may have been overwritten by the MPI program. Call hierarchy of `MPI_BSEND`: ``` MPI_Bsend mca_pml_ob1_send if (MCA_PML_BASE_SEND_BUFFERED == sendmode) mca_pml_ob1_isend MCA_PML_OB1_SEND_REQUEST_START_W_SEQ mca_pml_ob1_send_request_start_seq mca_pml_ob1_send_request_start_btl if (size <= eager_limit) if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED) mca_pml_ob1_send_request_start_copy mca_bml_base_alloc btl_alloc if (OMPI_ERR_OUT_OF_RESOURCE == rc) add_request_to_send_pending ompi_request_free ``` To solve this problem, we should save the data to the buffer attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`. This problem was introduced by ob1 optimization (commits `2b57f422` and `a06e491c`) in v1.8 series. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2018-07-12 14:30:58 +09:00
Howard Pritchard	34bc77747c	Merge pull request #5388 from mkurnosov/base-gather-bmtree-fix-mpi-in-place coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing	2018-07-11 18:34:35 -05:00
Nathan Hjelm	35a75a6bf5	osc/sm: avoid filename collision when multiple windows share same CID This commit fixes an issue identified by MTT where we can have two different sets of processes on the same node creating a shared memory window with communicators sharing the same CID. To avoid this issue the temporary filename now includes the creating process's vpid. References #5363 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-11 14:32:27 -06:00
Nathan Hjelm	037656bc1d	osc/rdma: fix bug introduced in `b90c838` This commit fixes a bug that was introduced back in 2016 which impacts request-based RMA in some cases. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-07-10 18:17:55 -06:00
Gilles Gouaillardet	76292951e5	coll/libnbc: fix integer overflow Use internal pack/unpack subroutines that operate on MPI_Aint instead of int and hence solve some integer overflows. Thanks Clyde Stanfield for reporting this issue. Refs open-mpi/ompi#5383 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-07-09 10:08:33 -06:00
Mikhail Kurnosov	22fa5a8a67	coll/base/scatter: replaces right skewed binomial tree (in order) with left skewed binomial tree The current implementation of `coll/base/MPI_Scatter` is based on an in-order binomial tree. This tree is right skewed and provides good performance for an MPI_Gather operation. For an MPI_Scatter operation, however, a left skewed binomial tree is effective. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-09 10:04:41 -06:00
Spruit, Neil R	9a17864278	MTL OFI: Redesign sync send with reduced tag bits and quick ack -Updated the design for sync send MPI calls to use 2 protocol bits for denoting "sync_send" or "sync_send_ack". -"Sync_send" is added to the send tag only and is masked out in receives such that it can be read by the original Recv posted in the send/recv operation. -"Sync_send_ack" is sent from the recv callback to the send side. This 0 byte send does not generate a completion entry and instead sends the message and immediately completes the opal completion in the recv. -Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2 more tag bits per format type due to the reduced protocol bits required by OMPI. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2018-07-09 06:50:21 -07:00
Yossi Itigin	e77e31b50b	Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging MCA/COMMON/UCX: unified logging across all UCX modules	2018-07-08 12:45:26 +03:00
Mikhail Kurnosov	b9e14cd7d0	coll/base/gather_intra_binomial: fix MPI_IN_PLACE processing A call to MPI_Gather with the sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL respectively produces a segmentation fault in the root process. The problem is that sendtype is used even when the sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), the sendtype and sendcount parameters should be ignored in this case. Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>	2018-07-07 20:59:39 +07:00
Sergey Oblomov	240670152e	MCA/COMMON/UCX: code beautify - alignment Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 19:40:58 +03:00
Sergey Oblomov	eb7010933d	OSC/UCX: suppressed compilation warnings - suppressed signed/unsigned comparison warnings Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-06 10:58:09 +03:00
Sergey Oblomov	bef47b792c	MCA/COMMON/UCX: unified logging across all UCX modules - added common logging infrastructure for all UCX modules - all UCX modules are switched to new infra Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-05 16:25:39 +03:00
Sergey Oblomov	8080283b3d	MCA/COMMON/UCX: changed return type for wait_request - wait_request now returns an OMPI status - updated callers Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-04 23:29:38 +03:00
Sergey Oblomov	c2bd6af9f2	MCA/COMMON/UCX: minor unification of del_procs calls - some common functionality of del_procs calls is moved into mca_common module - blocking ucp_put call is replaced by non-blocking routine Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-07-02 15:10:53 +03:00
Yossi Itigin	09c10d5e09	Merge pull request #5345 from hoopoepg/topic/pml-ucx-suppress-compiler-warning PML/UCX: suppressed compilation warning	2018-07-02 13:41:12 +03:00
Edgar Gabriel	d191ed6b4f	fs/base: move redundant code to fs/base moving some code from fs/ufs into fs/base. The benefit of this approach is that fs components that are fundamentally based on posix I/O (and only differ in some non-posix functionality such as setting stripe size, or which hints are being supported) can avoid having to replicate the same code over and over again. First beneficiary is the lustre fs component, but more are to follow soon. Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>	2018-07-01 10:20:32 -05:00
Xin Zhao	c1ac0c00c5	Merge pull request #5185 from jjolly/fix-memcpy-size-mismatch - Build warning: stringop-overflow in get_dynamic_win_info() at osc_ucx_comm.c	2018-06-29 19:37:53 -07:00
Jeff Squyres	f4320193e3	mpi.h.in: remove some deprecation/removed warnings Intentionally do not mark some MPI-1 function pointer typedefs as `__mpi_interface_removed__` because we have to use them in prototyping some MPI-1 functions when `--enable-mpi1-compatibility` is used. Fixes #5357. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-29 07:43:51 -07:00
Gilles Gouaillardet	7363906e4e	io/romio321: make grequest extensions internal Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2018-06-29 16:41:27 +09:00
Jeff Squyres	c1ccbece2f	Merge pull request #5347 from jsquyres/pr/fix-f90-removed-interfaces F90 removed interfaces: add missing "end interface"	2018-06-27 13:54:02 -04:00
Jeff Squyres	768b800533	F90 removed interfaces: add missing "end interface" Thanks to @fsciortino for reporting. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2018-06-27 13:02:16 -04:00
Sergey Oblomov	074f30ba27	PML/UCX: suppressed compilation warning Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-27 12:05:07 +03:00
Yossi Itigin	aca61a6bfb	Merge pull request #5238 from hoopoepg/topic/fixed-coverity-issues-ucx-pml UCX/PML: fixed few coverity issues	2018-06-27 11:14:06 +03:00
Nathan Hjelm	4c230683e7	osc/sm: fix a typo This commit fixes a typo where a bcast is used instead of the intended collective (barrier). References #5262 Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2018-06-26 12:53:12 -06:00
Sergey Oblomov	502d04bf12	UCX/PML/SPML: fixed few coverity issues - fixed incorrect pointer manipulation/free - cleaned dead code - minor optimization on process delete routine - fixed error handling - free pointers - added debug output for worker flush failure Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2018-06-26 18:52:39 +03:00
Yossi Itigin	ee873f4f79	Merge pull request #5322 from hoopoepg/topic/mca-ucx-common MCA/UCX: added common module	2018-06-26 13:54:12 +03:00
Gilles Gouaillardet	e609cf7bc3	Merge pull request #5337 from ggouaillardet/topic/generalized_requests ompi/requests: implement generalized request extensions	2018-06-26 13:01:04 +09:00
KAWASHIMA Takahiro	a8da78eeaa	Merge pull request #4618 from ggouaillardet/topic/pcoll Add the persistent collectives feature	2018-06-26 12:36:34 +09:00

10695 commits