openmpi

Author	SHA1	Message	Date
Ralph Castain	125d236173	Move from the use of regex to compression We've been fighting the battle of trying to create a regex generator and parser that can handle arbitrary hostname schemes - without long-term success. The worst of it is that there is no way of checking to see if the computed regex is correct short of parsing it and doing a character-by-character comparison with the original string. Ugh...there has to be a better solution. One option is to investigate using 3rd-party regex libraries as those are coming from communities whose sole focus is resolving that problem. However, someone would need to spend the time to investigate it, and we'd have to find a license-friendly implementation. Another option is to quit beating our heads against the wall and just compress the information. It won't be as much of a reduction, but we also won't keep hitting scenarios where things break. In this case, it seems that "perfection" is definitely the enemy of "good enough". This PR implements the compression option while retaining the possibility of people adding regex-generating components. The compression code used in ORTE is consolidated into the opal/compress framework. That framework currently held bzip and gzip components for use in compressing checkpoint files - since we no longer support C/R, I have .opal_ignore'd those components. However, I have left the original framework APIs alone in case someone ever decides to redo C/R. The APIs of interest here are added to the framework - specifically, the "compress_block" and "decompress_block" functions. I then moved the ORTE zlib compression code into a new component in this framework. Unfortunately, the framework currently is a single-select one - i.e., only one active component at a time. Since I .opal_ignore'd the other two and made the priority of zlib high, this isn't a problem. However, if someone wants to re-enable bzip/gzip or add another component, they might need to transition opal/compress to a multi-select framework. Included changes: * Consolidate the compression code into the opal/compress framework * Move the ORTE zlib compression code into a new opal/compress/zlib component * Ignore the bzip and gzip components in opal/compress framework * Add a "compress_base_limit" MCA param to set the threshold above which we compress data - defaults to 4096 bytes * Delete stale brucks and rcd components from orte/grpcomm framework * Delete the orte/regx framework * Update the launch system to use opal/compress instead of string regex * Provide a default module if no zlib is available * Fix some misc multi-node issues * Properly generate the nidmap in response to a "connection warmup" message so the remote daemon knows the children it needs to launch. * Remove stale references to orte_node_regex * opal_byte_object_t's are not OPAL objects - properly release allocated memory. * Set the topology * Currently only handling homogeneous case * Update the compress framework files to conform * Consolidate open/close into one "frame" file. Ensure we open/close the framework Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-08 11:11:14 -08:00
Ralph Castain	fcbc7ea298	Merge pull request #6306 from karasevb/regx_host_ordering_fix regex: fixed host ordering for different prefixes	2019-02-08 11:09:55 -08:00
KAWASHIMA Takahiro	8bbd201029	Merge pull request #6205 from kawashima-fj/pr/fp16 Add FP16 datatypes	2019-02-08 14:52:13 +09:00
Jeff Squyres	8451cd70ac	Merge pull request #6363 from jsquyres/pr/fix-ofi-configury Consolidated: fix OFI configury / linking issues	2019-02-07 10:21:04 -05:00
Jeff Squyres	dd20174532	Remove opal/mca/common/ofi. It never lived up to its purpose (and has caused amorphous indirect errors such as https://github.com/open-mpi/ompi/issues/2519), so delete it. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	f5e1a672cc	ofi: revamp OPAL_CHECK_OFI configury Update the OPAL_CHECK_OFI configury macro: - Make it safe to call the macro multiple times: - The checks only execute the first time it is invoked - Subsequent invocations, it just emits a friendly "checking..." message so that configure output is sensible/logical - With the goal of ultimately removing opal/mca/common/ofi, rename the output variables from OPAL_CHECK_OFI to be opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}. - Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions. - Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that causes the macro to be invoked at a fairly random time, which makes configure stdout confusing / hard to grok. - Remove a little left-over cruft in OPAL_CHECK_OFI, too (which resulted in an indenting change, making the change to opal_check_ofi.m4 look larger than it really is). Thanks Alastair McKinstry for the report and initial fix. Thanks Rashika Kheria for the reminder. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	b556cabfe9	btl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Jeff Squyres	aba2571881	mtl/ofi/Makefile.am: down with tabs! Replace all tabs with spaces. No code or logic changes. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 06:29:58 -08:00
Gilles Gouaillardet	945f830f7a	mtl/ofi: fix configury when VPATH is used Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-07 06:29:58 -08:00
Jeff Squyres	f53a4f2d5b	Merge pull request #6270 from jsquyres/pr/remove-openib-and-affiliated-stuff So long, openib, and thanks for all the fish.	2019-02-07 09:29:31 -05:00
Jeff Squyres	99553eb1b9	platform: Remove "with_verbs" from all the platform files. Since --with-verbs has been removed, then remove it from all the platform files, too. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 05:36:06 -08:00
Jeff Squyres	48a33ee6db	README: Remove all references to --with-verbs[*] Now that all use of libibverbs is gone from Open MPI, and all verbs-based configury is also removed, update README to remove all references to --with-verbs. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 05:36:06 -08:00
Jeff Squyres	59c8ab6da4	m4: remove all configury related to libibverbs Now that all components that use libibverbs are gone, remove OPAL_CHECK_VERBS and the confusingly-named OPAL_CHECK_OPENFABRICS (which really just checked for verbs things -- not all the possible OpenFabrics APIs/libraries). The only code left in Open MPI that calls verbs is hwloc -- and that's just the APIs that takes an IBV device and returns topological information about it. Since nothing in the Open MPI code base uses the "ibv_*" API any more, we have no need for this hwloc functionality so we'll even remove the --with-verbs configure options. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 05:36:06 -08:00
Jeff Squyres	3f4af8e51c	opal/common: remove stale common components The verbs and verbs_usnic components are now no longer necessary / no longer used anywhere in the code base. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 05:36:06 -08:00
Jeff Squyres	3e82449dbe	sshmem/verbs: So long / farewell / it's time to say goodnight So long sshmem/verbs! After many years of (mostly) faithful service, it is time to remove the sshmem verbs component. It has been fully replaced by other components, such as the UCX PML and OFI MTL. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 05:34:19 -08:00
Jeff Squyres	8de786f5a4	btl/openib: So long / farewell / it's time to say goodnight So long BTL openib! After many years of (mostly) faithful service, it is time to remove the openib BTL. It has been fully replaced by other components, such as the UCX PML and OFI MTL. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-02-07 05:34:19 -08:00
Ralph Castain	ead2efb136	Merge pull request #6365 from rhc54/topic/pcfg Update PMIx configure logic and gitignore	2019-02-06 19:23:57 -08:00
Ralph Castain	43244cf66e	Ignore generated file Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-06 17:17:34 -08:00
Ralph Castain	677ce0a69f	Update PMIx configure logic in the embedded component PMIx is removing the --enable-embedded-libevent and --enable-embedded-hwloc flags as they are confusing users. Instead, we will use the --enable-embedded-mode to handle both of these options. Update the embedded configury to handle it. Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-06 17:15:44 -08:00
Matias Cabral	5aef3148d3	Merge pull request #6351 from aravindksg/fix_btl_ofi_valgrind btl/ofi: Fix valgrind complaints on uninitialized pointer use	2019-02-05 16:36:45 -08:00
Matias Cabral	0601b3e982	Merge pull request #6325 from aravindksg/fix_help_reference mtl/ofi: Fix reference to help text object	2019-02-05 07:22:51 -08:00
Ralph Castain	e2c7224281	Merge pull request #6356 from rhc54/topic/pmixup Update to latest PRI master	2019-02-04 13:04:22 -08:00
Ralph Castain	baef25338a	Update to latest PRI master Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-02-04 10:10:58 -08:00
Aravind Gopalakrishnan	786e686d43	btl/ofi: Fix valgrind complaints on uninitialized pointer use It doesn't seem like the BTL was using an uninitialized pointer. But simply setting the rcache pointer to NULL after destroying it makes the valgrind errors go away. Fixes Issue #6345 Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>	2019-02-01 14:03:23 -08:00
KAWASHIMA Takahiro	ef4c47db1f	configure: disable `short float` with Intel compiler `short float` support of the Intel C++ Compiler (group of C and C++ compilers), at least versions 18.0 and 19.0, is half-baked. It can compile declarations of `short float` variables and expressions of `sizeof(short float)` but cannot compile operations of `short float` variables. In this situation, `AC_CHECK_TYPES(short float)` defines `HAVE_SHORT_FLOAT` as 1 and compilation errors occur in `ompi/mca/op/base/op_base_functions.c`. To avoid this error tentatively, we disable `short float` support when using the Intel C++ Compiler. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 15:02:13 +09:00
KAWASHIMA Takahiro	9b54967276	README: Add description of shortfloat MPI extension Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 15:02:13 +09:00
KAWASHIMA Takahiro	f8a441957a	mpiext/shortfloat: Add `MPIX_C_FLOAT16` datatype `MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT` if the C compiler supports `_Float16`, which is defined in ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015). This name and meaning are the same as those of MPICH. This may be a transitional datatype until the MPI Forum decides a proper name for the type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 14:55:52 +09:00
bosilca	89fa06135e	Merge pull request #6348 from ggouaillardet/topic/opal_datatype_clone opal/datatype: reset ptypes in opal_datatype_clone()	2019-02-01 00:36:33 -05:00
KAWASHIMA Takahiro	c44599ec13	mpiext/shortfloat: Add `shortfloat` MPI extension This extension provides additional MPI datatypes `MPIX_SHORT_FLOAT`, `MPIX_C_SHORT_FLOAT_COMPLEX`, and `MPIX_CXX_SHORT_FLOAT_COMPLEX` for `short float` (C/C++), `short float _Complex` (C), and `std::complex<short float>` (C++), respectively, or their alternate types like `_Float16`. See `ompi/mpiext/shortfloat/README.txt` for details. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:14 +09:00
KAWASHIMA Takahiro	4d7bde27fb	ompi/datatype: Use `short float` for `MPI_REAL2` ... and add `MPI_COMPLEX4`. This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OMPI internal code. On the other hand, `ompi_datatype_t::id` values of existing datatypes are not changed and 73 is newly assigned to `MPI_COMPLEX4` to retain ABI compatibility. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro	4375c11a58	ompi/datatype: Add `ompi_mpi_short_float` ... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`. These are Open MPI internal variables intended to be defined as `MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and `MPI_CXX_SHORT_FLOAT_COMPLEX` in the future. `OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to support `MPI_COMPLEX4` in the next commit. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:43:13 +09:00
Sergey Lebedev	829846dbcc	fp16 hcoll bindings Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	2ad1c09848	opal/datatype: Add `opal_short_float_t` The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14 (C WG), is not supported by most compilers yet. But some compilers (including gcc 7 for AArch64 and clang 6) support `_Float16`, which is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945) as an extension for C. If it is detected in `configure`, it is used as an alternate type of `short float` in Open MPI internal code. This commit adds a `configure` option `--enable-alt-short-float=TYPE`. It can be used to specify a type other than `short float` and `_Float16` as the alternate type. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro	f6b39452f6	opal/datatype: Support `short float` The type `short float` is proposed for the C language in ISO/IEC JTC 1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a. half-precision floating point or FP16. By this commit, `short float` and `short float _Complex` are detected in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT` and its complex number version are not added yet. This commit changes values of existing `OPAL_DATATYPE_*` macros. This change does not affect ABI compatibility of `libmpi.so` and the like because these values are only used in OPAL and OMPI internal code. Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>	2019-02-01 12:40:14 +09:00
Gilles Gouaillardet	b395342c9f	opal/datatype: reset ptypes in opal_datatype_clone() Reset ptypes when cloning a datatype in order to prevent a double free() in the opal_datatype_t destructor. This fixes a bug introduced in open-mpi/ompi@7c938f070f Fixes open-mpi/ompi#6346 Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-02-01 11:20:13 +09:00
bosilca	2cf6944e70	Merge pull request #6326 from bosilca/fix/convertor_raw Provide a better fix for #6285.	2019-01-31 18:20:46 -05:00
George Bosilca	5a82c4fd07	Provide a better fix for #6285. The issue was a little complicated due to the internal stack used in the convertor. The main issue was that in the case where we run out of iov space to save the raw description of the data while handling a repetition (loop), instead of saving the current position and bailing out directly we continued reading the next predefined type element. It worked in most cases, except the one identified by the HDF5 test. However, the biggest issue here was the drop in performance for all ensuing calls to the convertor pack/unpack, as instead of handling contiguous loops as a whole (and minimizing the number of memory copies) we copied data description by data description. Signed-off-by: George Bosilca <bosilca@icl.utk.edu>	2019-01-31 10:01:48 -05:00
Jeff Squyres	4c64322db4	Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly mpi.h.in: use C++ static_cast<> where appropriate	2019-01-31 07:14:34 -05:00
Jeff Squyres	30afdcead9	mpi.h.in: use C++ static_cast<> where appropriate When compiling mpi.h with a modern C++ compiler and a high degree of pickiness (e.g., -Wold-style-cast), casting using (void*) in the OMPI_PREDEFINED_GLOBAL and MPI_STATUS_IGNORE macros will emit warnings. So if we're compiling with a C++ compiler, use C++'s static_cast<> instead of (void*). Thanks to @shadow-fax for identifying the issue. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>	2019-01-31 03:22:26 -08:00
Ralph Castain	c03407320d	Merge pull request #6338 from rhc54/topic/rmlofi Remove stale rml/ofi component	2019-01-30 14:10:55 -08:00
Ralph Castain	8794077520	Remove stale rml/ofi component Signed-off-by: Ralph Castain <rhc@pmix.org>	2019-01-30 12:41:50 -08:00
Thananon Patinyasakdikul	782ec851ea	Merge pull request #6319 from thananon/pr/allow_overtake pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.	2019-01-30 15:32:04 -05:00
Thananon Patinyasakdikul	58244b36d1	Merge pull request #6320 from thananon/pr/wait_sync_fix opal/threads: reverted #6199	2019-01-30 15:31:27 -05:00
Nathan Hjelm	2c8f745d8d	Merge pull request #6337 from hjelmn/btl_vader_fix_a_stupid_error_in_the_fragment_sizes_used_by_the_free_lists_that_can_cause_weird_results btl/vader: fix fragment sizes used by free lists	2019-01-30 13:26:57 -07:00
Nathan Hjelm	b51c8f888c	btl/vader: fix fragment sizes used by free lists This commit fixes a bug introduced in f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how vader allocates fragment memory from the shared memory segment. Unfortunately, the values used for the fragment sizes did not include space for the fragment header. This can cause an overrun of data from one fragment to the header of the next fragment. Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>	2019-01-30 12:31:34 -07:00
Jeff Squyres	2203f8d900	Merge pull request #6185 from ggouaillardet/topic/hwloc_macros hwloc: remove public hwloc macros from opal_config.h	2019-01-30 07:32:22 -05:00
Boris Karasev	46e38b9193	regx: fixed the order of hosts for ranges with different prefixes Example: For the list of hosts `a01,b00,a00` a regex is generated: `a[2:1.0],b[2:0]`, where the `a`-prefixed hosts are moved to the beginning, which breaks the host ordering. This commit fixes the regex for that case to `a[2:1],b[2:0],a[2:0]` Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2019-01-30 15:06:30 +06:00
Gilles Gouaillardet	0aeb27f776	topo/treematch: silence a hwloc related warning treematch/km_partitioning.c #include "config.h", but there is no such file when the embedded treematch is used. In order to prevent the embedded treematch from incorrectly using the config.h from the embedded hwloc, generate a dummy config.h. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>	2019-01-30 14:51:38 +09:00
Boris Karasev	1967e41a71	regx/reverse: fixed adding an empty range for non-numerical hostnames Example: For the nodelist `jjss,jjss0000001,jjss0000003,jjss0000002` the regular expression was `jjss[0:0],jjss[7:1,3,2]`, which led to incorrectly unpacking the first host as `jjs0`. This commit stops an empty range from being added for non-numeric hostnames. Here is the fixed regex for this example: `jjss,jjss[7:1,3,2]` Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2019-01-30 09:41:00 +06:00
Boris Karasev	d1ad90f47e	regx/test: update regex test Signed-off-by: Boris Karasev <karasev.b@gmail.com>	2019-01-30 09:40:59 +06:00
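
The threshold-based compression described in commit 125d236173 can be illustrated with a minimal Python sketch. It is not the Open MPI implementation: the function names mirror the "compress_block" and "decompress_block" names mentioned in the commit message, but their Python signatures, the `COMPRESS_LIMIT` constant (standing in for the "compress_base_limit" parameter and its 4096-byte default), and the fallback to uncompressed data when compression does not help are all assumptions made for illustration.

```python
import zlib

# Assumed stand-in for the "compress_base_limit" threshold (default 4096 bytes).
COMPRESS_LIMIT = 4096


def compress_block(data: bytes, limit: int = COMPRESS_LIMIT):
    """Return (was_compressed, payload) for a block of bytes.

    Blocks at or below the limit are passed through untouched.
    """
    if len(data) <= limit:
        return False, data
    packed = zlib.compress(data)
    # Assumption: keep the original bytes if compression does not shrink them.
    if len(packed) >= len(data):
        return False, data
    return True, packed


def decompress_block(was_compressed: bool, payload: bytes) -> bytes:
    """Invert compress_block using the flag it returned."""
    return zlib.decompress(payload) if was_compressed else payload


# Hypothetical node list, long enough to exceed the threshold.
nodelist = ",".join(f"node{i:05d}" for i in range(2000)).encode()
flag, payload = compress_block(nodelist)
restored = decompress_block(flag, payload)
```

Unlike a generated regex, the round trip is easy to check: decompressing the payload either reproduces the original bytes or it does not, whatever the hostname scheme.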


29626 Commits