openmpi

Author	SHA1	Message	Date
Yossi Itigin	9b91cf09cc	Merge pull request #6481 from hoopoepg/topic/check-ucx-params PML/SPML/UCX: added evaluation of mmap events	2019-03-14 11:53:42 +02:00
Austen Lauria	b61e6242d3	Fix integer overflows with indexed datatype creation. The types of count, disp, and extent passed into ompi_datatype_add() should be size_t, ptrdiff_t and ptrdiff_t, respectively. This prevents integer overflows and errors in computing the size of large indexed datatypes. Signed-off-by: Austen Lauria <awlauria@us.ibm.com>	2019-03-13 09:39:57 -04:00
Sergey Oblomov	d8e3562bae	PML/SPML/UCX: added evaluation of mmap events - there was a set of UCX related issues reported which were caused by mmap API hook conflicts. We added diagnostics for such problems to simplify the bug-resolving pipeline Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>	2019-03-12 21:14:27 +02:00
Geoff Paulsen	a14bb4bc89	Merge pull request #6471 from hppritcha/topic/issue_6470 ompi_info: report whether MPI1 compat is enabled	2019-03-11 21:11:55 -05:00
Howard Pritchard	61ccc65302	ompi_info: report MPI1 compat is disabled MPI1 compat disabled beyond v4.0.x Related to #6470 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2019-03-11 13:50:29 -06:00
Gilles Gouaillardet	26c1b833c7	man: remove man pages of removed MPI1 subroutines Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-03-05 15:01:07 +09:00
Nathan Hjelm	73085e9ce3	Merge pull request #6413 from nuriallv/issue_osc_rdma osc/rdma: fix when determining the node with the rank_array info for a peer	2019-02-27 16:30:06 -07:00
Geoffrey Paulsen	a6d6be2853	mpi.h.in: delete removed MPI1 functions/datatypes (API change!) This commit DELETES the removed MPI1 functions and datatypes from both the mpi.h header and from the library (they were deleted from the MPI standard in MPI-3.0). WARNING: This changes the MPI API in a non-backwards compatible way. This also removes the configure option that was added in Open MPI v4.0.x, requiring users to change their apps if they are using any of these almost 20 year old APIs. This commit removes the following MPI1 removed functions and datatypes: MPI_Address MPI_Errhandler_create MPI_Errhandler_get MPI_Errhandler_set MPI_Type_extent MPI_Type_hindexed MPI_Type_hvector MPI_Type_struct MPI_Type_UB MPI_Type_LB Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>	2019-02-27 08:24:11 -08:00
Geoffrey Paulsen	3136a1706c	mpi.h.in: Revamp MPI-1 removed function warnings Refs https://github.com/open-mpi/ompi/issues/6278. This commit is intended to be cherry-picked to v4.0.x and the following commit will amend this functionality for master's removal. Changes the prototypes for MPI removed functions in the following ways: There are 4 cases: 1) User wants MPI-1 compatibility (--enable-mpi1-compatibility) MPI_Address (and friends) are declared in mpi.h with deprecation notice 2) User does not want MPI-1 compatibility, and has a C11-capable compiler Declare an MPI_Address (etc.) macro in mpi.h, which will cause a compile-time error using _Static_assert C11 feature 3) User does not want MPI-1 compatibility, and does not have a C11-capable compiler, but the compiler supports error function attributes. Declare an MPI_Address (etc.) macro in mpi.h, which will cause a compile-time error using error function attribute. 4) User does not want MPI-1 compatibility, and does not have a C11-capable compiler, or a compiler that supports error function attributes. Do not declare MPI_Address (etc.) in mpi.h at all. Unless the user is compiling with something like -Werror, this will allow the user's code to compile. We are choosing this because it seems like a losing battle to make some kind of compile time error that is friendly to the user (and doesn't make it look like mpi.h itself is broken). On v4.0.x, this will allow the user code to both compile (albeit with a warning) and link (because the MPI_Address will be in the MPI library because we are preserving ABI back to 3.0.x). On master/v5.0.x, this will allow the user code to compile, but it will fail to link (because the MPI_Address symbol will not be in the MPI library). Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>	2019-02-27 08:24:11 -08:00
bosilca	8400502d8a	Merge pull request #6353 from bosilca/topic/fix_monitoring_pvar Fix the PVAR allocation usage.	2019-02-25 16:03:56 -05:00
Howard Pritchard	9b3a9c2579	Merge pull request #6417 from abouteiller/bugfix/cart_create_cid Cart/Graph create would not run the next_cid algorithm	2019-02-22 13:05:59 -07:00
Howard Pritchard	d6cdbdfd39	Merge pull request #6412 from hppritcha/topic/fix_pgi_usempif08 fortran:fix for PGI linking	2019-02-21 20:31:14 -07:00
Aurelien Bouteiller	fb17115ba9	Cart/Graph create would not run the next_cid algorithm and create disjoint communicator with inconsistent cid. Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>	2019-02-21 11:40:22 -05:00
Howard Pritchard	266bc3aced	fortran:use mpif08 fix for PGI linking commit c6070fd2e broke building fortran bindings with PGI compilers. Turns out PGI compilers need to link in the *.o from a module file whether or not there are module subroutines defined or not in the module file. Related to #6411 Signed-off-by: Howard Pritchard <howardp@lanl.gov>	2019-02-20 12:33:25 -07:00
Nuria Losada	3cae149262	osc/rdma: fix when determining the node with the rank_array info for a peer Signed-off-by: Nuria Losada <nlosada@icl.utk.edu>	2019-02-20 13:12:00 -05:00
Artem Polyakov	13a8e42108	Merge pull request #6163 from artpol84/osc/mt_submission Refactoring of osc/ucx component for MT	2019-02-20 09:41:27 -08:00
Gilles Gouaillardet	ad114be28c	configury: automatically select rte/pmix runtime if ORTE project is not built Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 13:55:55 +09:00
Gilles Gouaillardet	69d136ae5e	ompi/pmix: fix misc OPAL function calls Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 13:55:55 +09:00
Gilles Gouaillardet	18f679efac	Merge pull request #6401 from ggouaillardet/topic/osc_rdma_self osc/rdma: correctly handle communications to self	2019-02-20 11:43:22 +09:00
KAWASHIMA Takahiro	7095ad10a5	man: fix more typos in MPI_Win_attach man page Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-20 11:22:38 +09:00
Gilles Gouaillardet	7c0596819b	man: fix typos in MPI_Win_{attach,detach} man pages no code change [skip ci] bot:notest Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 11:09:45 +09:00
Gilles Gouaillardet	fe05fcc11a	osc/rdma: correctly handle communications to self mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when the window is dynamically created and use_cpu_atomics is set in order to correctly handle communications to self. Thanks Bart Janssens for reporting this issue. Refs. open-mpi/ompi#6394 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-20 09:52:17 +09:00
Artem Polyakov	19e2ae2efb	opal/common/ucx: Switch to opal/tsd Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Artem Polyakov	7984d7d997	opal/common/ucx: Remove unused debugging macro Will be reintroduced later if needed and after adaptation to the OMPI infrastructure. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Artem Polyakov	43f16d8796	opal/common/ucx: Remove common_ucx_int.h Place the content of common_ucx_int.h back to the common_ucx.h and include common_ucx_wpool.h explicitly. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Xin Zhao	c6de09940f	ompi/osc/ucx: Switch osc/ucx code to use Worker Pool. Signed-off-by: Artem Polyakov <artpol84@gmail.com>	2019-02-19 14:22:07 -08:00
Yossi Itigin	91d05f91e2	Merge pull request #6384 from brminich/topic/ucx_worker_net_address PML/UCX: Use net worker address for remote peers	2019-02-17 12:21:00 +02:00
Matias A Cabral	25bdd118ac	MTL_OFI: Changed Recv cancel to be non-blocking Updated the OFI MTL's Recv cancel to be a non-blocking call to match the MPI spec. Given fi_cancel succeeded, then it is expected that the user will wait on the request to read the result of whether the cancel has completed. Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>	2019-02-14 17:07:20 -05:00
Mikhail Brinskii	751d88192d	PML/UCX: Use net worker address for remote peers For remote node peers pack smaller worker address, which contains network device addresses only. This would reduce amount of OOB traffic during startup. Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>	2019-02-14 18:06:36 +02:00
Brian Barrett	7a593cea4a	Merge pull request #6361 from aravindksg/fix_tg_segfault mtl/ofi: Fix segfault when not using Thread-Grouping feature	2019-02-12 12:04:26 -08:00
Ralph Castain	125d236173	Move from the use of regex to compression We've been fighting the battle of trying to create a regex generator and parser that can handle arbitrary hostname schemes - without long-term success. The worst of it is that there is no way of checking to see if the computed regex is correct short of parsing it and doing a character-by-character comparison with the original string. Ugh...there has to be a better solution. One option is to investigate using 3rd-party regex libraries as those are coming from communities whose sole focus is resolving that problem. However, someone would need to spend the time to investigate it, and we'd have to find a license-friendly implementation. Another option is to quit beating our heads against the wall and just compress the information. It won't be as much of a reduction, but we also won't keep hitting scenarios where things break. In this case, it seems that "perfection" is definitely the enemy of "good enough". This PR implements the compression option while retaining the possibility of people adding regex-generating components. The compression code used in ORTE is consolidated into the opal/compress framework. That framework currently held bzip and gzip components for use in compressing checkpoint files - since we no longer support C/R, I have .opal_ignore'd those components. However, I have left the original framework APIs alone in case someone ever decides to redo C/R. The APIs of interest here are added to the framework - specifically, the "compress_block" and "decompress_block" functions. I then moved the ORTE zlib compression code into a new component in this framework. Unfortunately, the framework currently is a single-select one - i.e., only one active component at a time. Since I .opal_ignore'd the other two and made the priority of zlib high, this isn't a problem. However, if someone wants to re-enable bzip/gzip or add another component, they might need to transition opal/compress to a multi-select framework. Included changes: * Consolidate the compression code into the opal/compress framework * Move the ORTE zlib compression code into a new opal/compress/zlib component * Ignore the bzip and gzip components in opal/compress framework * Add a "compress_base_limit" MCA param to set the threshold above which we compress data - defaults to 4096 bytes * Delete stale brucks and rcd components from orte/grpcomm framework * Delete the orte/regx framework * Update the launch system to use opal/compress instead of string regex * Provide a default module if no zlib is available * Fix some misc multi-node issues * Properly generate the nidmap in response to a "connection warmup" message so the remote daemon knows the children it needs to launch. * Remove stale references to orte_node_regex * opal_byte_object_t's are not OPAL objects - properly release allocated memory. * Set the topology * Currently only handling homogeneous case * Update the compress framework files to conform * Consolidate open/close into one "frame" file. Ensure we open/close the framework Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:11:14 -08:00
KAWASHIMA Takahiro	8bbd201029	Merge pull request #6205 from kawashima-fj/pr/fp16 Add FP16 datatypes	2019-02-08 14:52:13 +09:00
Aravind Gopalakrishnan	6edcc479c4	mtl/ofi: Fix segfault when not using Thread-Grouping feature For the non thread-grouping paths, only the first (0th) OFI context should be used for communication. Otherwise this would access a nonexistent array item and cause segfault. While at it, clarify some content regarding SEPs in README (Credit to Matias Cabral for README edits). Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-02-07 11:52:53 -08:00
Jeff Squyres	f5e1a672cc	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - Subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy\|CPPFLAGS\|LDFLAGS\|LIBS}. - Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over kruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	aba2571881	mtl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Gilles Gouaillardet	945f830f7a	mtl/ofi: fix configury when VPATH is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-07 06:29:58 -08:00
Matias Cabral	0601b3e982	Merge pull request #6325 from aravindksg/fix_help_reference mtl/ofi: Fix reference to help text object	2019-02-05 07:22:51 -08:00
George Bosilca	e42b573cd3	Fix the PVAR allocation usage. According to the MPI standard the obj_handle is a pointer to an MPI object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example 14.6 highlights this usage. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-02-02 19:03:43 -05:00
KAWASHIMA Takahiro	f8a441957a	mpiext/shortfloat: Add `MPIX_C_FLOAT16` datatype `MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT` if the C compiler supports `_Float16`, which is defined in ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015). This name and meaning are same as that of MPICH. This may be a transitional datatype until the MPI Forum decides a proper name for the type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 14:55:52 +09:00
KAWASHIMA Takahiro	c44599ec13	mpiext/shortfloat: Add `shortfloat` MPI extension This extension provides additional MPI datatypes `MPIX_SHORT_FLOAT`, `MPIX_C_SHORT_FLOAT_COMPLEX`, and `MPIX_CXX_SHORT_FLOAT_COMPLEX` for `short float` (C/C++), `short float _Complex` (C), and `std::complex<short float>` (C++), respectively, or their alternate types like `_Float16`. See `ompi/mpiext/shortfloat/README.txt` for details. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:14 +09:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
Sergey Lebedev	829846dbcc	fp16 hcoll bindings Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	2ad1c09848	opal/datatype: Add `opal_short_float_t` The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14 (C WG), is not supported by most compilers yet. But some compilers (including gcc 7 for AArch64 and clang 6) support `_Float16`, which is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945) as an extension for C. If it is detected in `configure`, it is used as an alternate type of `short float` in Open MPI internal code. This commit adds a `configure` option `--enable-alt-short-float=TYPE`. It can be used to specify a type other than `short float` and `_Float16` as the alternate type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Jeff Squyres	4c64322db4	Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly mpi.h.in: use C++ static_cast<> where appropriate	2019-01-31 07:14:34 -05:00
Jeff Squyres	30afdcead9	mpi.h.in: use C++ static_cast<> where appropriate When compiling mpi.h with a modern C++ compiler and a high degree of pickyness (e.g., -Wold-style-cast), casting using (void) in the OMPI_PREDEFINED_GLOBAL and MPI_STATUS_IGNORE macros will emit warnings. So if we're compiling with a C++ compiler, use C++'s static_cast<> instead of (void*). Thanks to @shadow-fax for identifying the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-01-31 03:22:26 -08:00
Thananon Patinyasakdikul	782ec851ea	Merge pull request #6319 from thananon/pr/allow_overtake pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.	2019-01-30 15:32:04 -05:00
Jeff Squyres	2203f8d900	Merge pull request #6185 from ggouaillardet/topic/hwloc_macros hwloc: remove public hwloc macros from opal_config.h	2019-01-30 07:32:22 -05:00
Gilles Gouaillardet	0aeb27f776	topo/treematch: silence a hwloc related warning treematch/km_partitioning.c #include "config.h", but there is no such file when the embedded treematch is used. In order to prevent the embedded treematch from incorrectly using the config.h from the embedded hwloc, generate a dummy config.h. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-30 14:51:38 +09:00
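
Commit b61e6242d3 above changes the types passed into ompi_datatype_add() to size_t and ptrdiff_t to avoid integer overflows when sizing large indexed datatypes. The following is a minimal sketch of that general failure mode, not Open MPI code: `span_32bit` and `span_wide` are hypothetical helpers invented for illustration, and the explicit range check in `span_wide` is an assumption about one reasonable way to guard the multiplication.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an Open MPI function): byte span of `count`
 * elements of `extent` bytes each, truncated to 32 bits the way a
 * too-narrow integer type would end up storing it. */
static uint32_t span_32bit(uint32_t count, uint32_t extent)
{
    /* Multiply in 64 bits, then truncate; the truncation is well defined. */
    return (uint32_t)((uint64_t)count * (uint64_t)extent);
}

/* Hypothetical helper: the same computation with size_t / ptrdiff_t.
 * Returns 0 and stores the span in *out, or returns -1 if the magnitude
 * of the product would exceed PTRDIFF_MAX. */
static int span_wide(size_t count, ptrdiff_t extent, ptrdiff_t *out)
{
    /* Magnitude of extent as an unsigned value (extent may be negative). */
    size_t mag = extent < 0 ? (size_t)0 - (size_t)extent : (size_t)extent;

    if (mag == 0 || count == 0) {
        *out = 0;
        return 0;
    }
    if (count > (size_t)PTRDIFF_MAX / mag) {
        return -1; /* product would not fit in ptrdiff_t */
    }
    *out = (ptrdiff_t)count * extent;
    return 0;
}
```

For example, 5,000,000 elements with an extent of 1,000 bytes span 5,000,000,000 bytes; `span_32bit` silently reduces that to 705,032,704, while `span_wide` either returns the full value (where ptrdiff_t is 64 bits wide) or reports the overflow.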

10394 commits