29671 Commits

Author SHA1 Message Date
Jeff Squyres
f5e1a672cc ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - On subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over cruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
b556cabfe9 btl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
aba2571881 mtl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Gilles Gouaillardet
945f830f7a mtl/ofi: fix configury when VPATH is used
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-07 06:29:58 -08:00
Jeff Squyres
f53a4f2d5b
Merge pull request #6270 from jsquyres/pr/remove-openib-and-affiliated-stuff
So long, openib, and thanks for all the fish.
2019-02-07 09:29:31 -05:00
Jeff Squyres
99553eb1b9 platform: Remove "with_verbs" from all the platform files.
Since --with-verbs has been removed, remove it from all the
platform files, too.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
48a33ee6db README: Remove all references to --with-verbs[*]
Now that all use of libibverbs is gone from Open MPI, and all
verbs-based configury is also removed, update README to remove all
references to --with-verbs.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
59c8ab6da4 m4: remove all configury related to libibverbs
Now that all components that use libibverbs are gone, remove
OPAL_CHECK_VERBS and the confusingly-named OPAL_CHECK_OPENFABRICS
(which really just checked for verbs things -- not all the possible
OpenFabrics APIs/libraries).

The only code left in Open MPI that calls verbs is hwloc -- and that's
just the API that takes an IBV device and returns topological
information about it.  Since nothing in the Open MPI code base uses
the "ibv_*" API any more, we have no need for this hwloc functionality
so we'll even remove the --with-verbs configure options.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
3f4af8e51c opal/common: remove stale common components
The verbs and verbs_usnic components are now no longer necessary / no
longer used anywhere in the code base.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:36:06 -08:00
Jeff Squyres
3e82449dbe sshmem/verbs: So long / farewell / it's time to say goodnight
So long sshmem/verbs!  After many years of (mostly) faithful service,
it is time to remove the sshmem verbs component.  It has been fully
replaced by other components, such as the UCX PML and OFI MTL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:34:19 -08:00
Jeff Squyres
8de786f5a4 btl/openib: So long / farewell / it's time to say goodnight
So long BTL openib!  After many years of (mostly) faithful service, it
is time to remove the openib BTL.  It has been fully replaced by other
components, such as the UCX PML and OFI MTL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 05:34:19 -08:00
Ralph Castain
ead2efb136
Merge pull request #6365 from rhc54/topic/pcfg
Update PMIx configure logic and gitignore
2019-02-06 19:23:57 -08:00
Ralph Castain
43244cf66e Ignore generated file
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-06 17:17:34 -08:00
Ralph Castain
677ce0a69f Update PMIx configure logic in the embedded component
PMIx is removing the --enable-embedded-libevent and
--enable-embedded-hwloc flags as they are confusing users. Instead, we
will use the --enable-embedded-mode to handle both of these options.
Update the embedded configury to handle it.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-06 17:15:44 -08:00
Matias Cabral
5aef3148d3
Merge pull request #6351 from aravindksg/fix_btl_ofi_valgrind
btl/ofi: Fix valgrind complaints on uninitialized pointer use
2019-02-05 16:36:45 -08:00
Matias Cabral
0601b3e982
Merge pull request #6325 from aravindksg/fix_help_reference
mtl/ofi: Fix reference to help text object
2019-02-05 07:22:51 -08:00
Ralph Castain
e2c7224281
Merge pull request #6356 from rhc54/topic/pmixup
Update to latest PRI master
2019-02-04 13:04:22 -08:00
Ralph Castain
baef25338a Update to latest PRI master
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-04 10:10:58 -08:00
Aravind Gopalakrishnan
786e686d43 btl/ofi: Fix valgrind complaints on uninitialized pointer use
It doesn't seem like the BTL was using an uninitialized pointer, but simply
setting the rcache pointer to NULL after destroying it makes the valgrind
errors go away.

Fixes Issue #6345

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-02-01 14:03:23 -08:00
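The fix described in this commit follows a common C idiom: clear a pointer right after releasing what it points to. Below is a minimal sketch of that idiom. The `demo_*` types and functions are hypothetical stand-ins, not the actual btl/ofi code, and `free()` stands in for whatever destroy call the component really uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the actual btl/ofi types. */
typedef struct demo_rcache {
    int unused;
} demo_rcache_t;

typedef struct demo_module {
    demo_rcache_t *rcache;
} demo_module_t;

/* Release the cache and clear the pointer, so that later teardown
 * paths (or a second call) see NULL instead of a stale address. */
void demo_module_release_rcache(demo_module_t *module)
{
    if (NULL != module->rcache) {
        free(module->rcache);
        module->rcache = NULL;
    }
}
```

With the pointer reset, calling `demo_module_release_rcache()` a second time is a harmless no-op.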
KAWASHIMA Takahiro
ef4c47db1f configure: disable short float with Intel compiler
`short float` support of the Intel C++ Compiler (group of C and C++
compilers), at least versions 18.0 and 19.0, is half-baked. It can
compile declarations of `short float` variables and expressions of
`sizeof(short float)` but cannot compile operations of `short float`
variables. In this situation, `AC_CHECK_TYPES(short float)` defines
`HAVE_SHORT_FLOAT` as 1 and compilation errors occur in
`ompi/mca/op/base/op_base_functions.c`. As a temporary workaround,
we disable `short float` support when using the Intel
C++ Compiler.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 15:02:13 +09:00
KAWASHIMA Takahiro
9b54967276 README: Add description of shortfloat MPI extension
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 15:02:13 +09:00
KAWASHIMA Takahiro
f8a441957a mpiext/shortfloat: Add MPIX_C_FLOAT16 datatype
`MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT`
if the C compiler supports `_Float16`, which is defined in
ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015).
This name and meaning are the same as those in MPICH. This may be
a transitional datatype until the MPI Forum decides a proper
name for the type.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 14:55:52 +09:00
bosilca
89fa06135e
Merge pull request #6348 from ggouaillardet/topic/opal_datatype_clone
opal/datatype: reset ptypes in opal_datatype_clone()
2019-02-01 00:36:33 -05:00
KAWASHIMA Takahiro
c44599ec13 mpiext/shortfloat: Add shortfloat MPI extension
This extension provides additional MPI datatypes `MPIX_SHORT_FLOAT`,
`MPIX_C_SHORT_FLOAT_COMPLEX`, and `MPIX_CXX_SHORT_FLOAT_COMPLEX`
for `short float` (C/C++), `short float _Complex` (C), and
`std::complex<short float>` (C++), respectively, or their alternate
types like `_Float16`.

See `ompi/mpiext/shortfloat/README.txt` for details.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 13:01:14 +09:00
KAWASHIMA Takahiro
4d7bde27fb ompi/datatype: Use short float for MPI_REAL2
... and add `MPI_COMPLEX4`.

This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OMPI internal code.

On the other hand, `ompi_datatype_t::id` values of existing datatypes
are not changed, and 73 is newly assigned to `MPI_COMPLEX4` to
retain ABI compatibility.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro
4375c11a58 ompi/datatype: Add ompi_mpi_short_float
... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`.

These are Open MPI internal variables intended to be defined as
`MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and
`MPI_CXX_SHORT_FLOAT_COMPLEX` in the future.

`OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to
support `MPI_COMPLEX4` in the next commit.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:43:13 +09:00
Sergey Lebedev
829846dbcc fp16 hcoll bindings
Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>
2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro
2ad1c09848 opal/datatype: Add opal_short_float_t
The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14
(C WG), is not supported by most compilers yet. But some compilers
(including gcc 7 for AArch64 and clang 6) support `_Float16`, which
is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945)
as an extension for C. If it is detected in `configure`, it is used
as an alternate type of `short float` in Open MPI internal code.

This commit adds a `configure` option `--enable-alt-short-float=TYPE`.
It can be used to specify a type other than `short float` and `_Float16`
as the alternate type.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro
f6b39452f6 opal/datatype: Support short float
The type `short float` is proposed for the C language in ISO/IEC JTC
1/SC 22 WG 14 (C WG), mainly for IEEE 754-2008 binary16, a.k.a.
half-precision floating point or FP16.

By this commit, `short float` and `short float _Complex` are detected
in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT`
and its complex number version are not added yet.

This commit changes values of existing `OPAL_DATATYPE_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OPAL and OMPI internal code.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:40:14 +09:00
Gilles Gouaillardet
b395342c9f opal/datatype: reset ptypes in opal_datatype_clone()
Reset ptypes when cloning a datatype in order to prevent
a double free() in the opal_datatype_t destructor.

This fixes a bug introduced in open-mpi/ompi@7c938f070f

Fixes open-mpi/ompi#6346

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-01 11:20:13 +09:00
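The double free described in this commit is the usual hazard of a shallow copy that duplicates an owning pointer. The sketch below illustrates the idea with a simplified, hypothetical `demo_datatype_t`. It is not the real `opal_datatype_t`, and the field layout is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified datatype; not the real opal_datatype_t. */
typedef struct demo_datatype {
    size_t size;
    size_t *ptypes;   /* array owned by this object, freed in the destructor */
} demo_datatype_t;

/* Copy the plain fields, but do not share the owned array. */
void demo_datatype_clone(const demo_datatype_t *src, demo_datatype_t *dst)
{
    *dst = *src;          /* shallow copy, including the ptypes pointer */
    dst->ptypes = NULL;   /* reset: the clone must not free src's array */
}

void demo_datatype_destruct(demo_datatype_t *dt)
{
    free(dt->ptypes);     /* free(NULL) is a no-op */
    dt->ptypes = NULL;
}
```

Without the reset in `demo_datatype_clone()`, destroying both the original and the clone would call `free()` twice on the same array.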
bosilca
2cf6944e70
Merge pull request #6326 from bosilca/fix/convertor_raw
Provide a better fix for #6285.
2019-01-31 18:20:46 -05:00
George Bosilca
5a82c4fd07
Provide a better fix for #6285.
The issue was a little complicated due to the internal stack used in the
convertor. The main issue was that when we ran out of iov
space to save the raw description of the data while handling a
repetition (loop), instead of saving the current position and bailing out
directly we kept reading the next predefined type element. It worked in
most cases, except the one identified by the HDF5 test. However, the
biggest issue here was the drop in performance for all ensuing calls to
the convertor pack/unpack, as instead of handling contiguous loops as a
whole (and minimizing the number of memory copies) we copied data
description by data description.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-01-31 10:01:48 -05:00
Jeff Squyres
4c64322db4
Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly
mpi.h.in: use C++ static_cast<> where appropriate
2019-01-31 07:14:34 -05:00
Jeff Squyres
30afdcead9 mpi.h.in: use C++ static_cast<> where appropriate
When compiling mpi.h with a modern C++ compiler and a high degree of
pickiness (e.g., -Wold-style-cast), casting using (void*) in the
OMPI_PREDEFINED_GLOBAL and MPI_STATUS*_IGNORE macros will emit
warnings.  So if we're compiling with a C++ compiler, use C++'s
static_cast<> instead of (void*).

Thanks to @shadow-fax for identifying the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-31 03:22:26 -08:00
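A header shared by C and C++ can select the cast style with the preprocessor, which is the general approach this commit describes. The sketch below shows the pattern. `DEMO_STATIC_CAST`, `demo_status_t`, and `DEMO_STATUS_IGNORE` are hypothetical names chosen for illustration, not the macros actually defined in mpi.h.in.

```c
#include <assert.h>
#include <stddef.h>

/* DEMO_STATIC_CAST is a hypothetical name used only for this sketch. */
#if defined(__cplusplus)
#define DEMO_STATIC_CAST(type, value) (static_cast<type>(value))
#else
#define DEMO_STATIC_CAST(type, value) ((type) (value))
#endif

/* Hypothetical status type and "ignore" sentinel built with the macro. */
typedef struct demo_status {
    int source;
    int tag;
} demo_status_t;

#define DEMO_STATUS_IGNORE DEMO_STATIC_CAST(demo_status_t *, 0)

/* Returns 1 when the caller passed the sentinel instead of a real status. */
int demo_status_is_ignored(const demo_status_t *status)
{
    return status == DEMO_STATUS_IGNORE;
}
```

A C compiler sees the classic cast, while a C++ compiler sees `static_cast<>`, so `-Wold-style-cast` has nothing to report in user code that expands the macro.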
Ralph Castain
c03407320d
Merge pull request #6338 from rhc54/topic/rmlofi
Remove stale rml/ofi component
2019-01-30 14:10:55 -08:00
Ralph Castain
8794077520 Remove stale rml/ofi component
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-01-30 12:41:50 -08:00
Thananon Patinyasakdikul
782ec851ea
Merge pull request #6319 from thananon/pr/allow_overtake
pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
2019-01-30 15:32:04 -05:00
Thananon Patinyasakdikul
58244b36d1
Merge pull request #6320 from thananon/pr/wait_sync_fix
opal/threads: reverted #6199
2019-01-30 15:31:27 -05:00
Nathan Hjelm
2c8f745d8d
Merge pull request #6337 from hjelmn/btl_vader_fix_a_stupid_error_in_the_fragment_sizes_used_by_the_free_lists_that_can_cause_weird_results
btl/vader: fix fragment sizes used by free lists
2019-01-30 13:26:57 -07:00
Nathan Hjelm
b51c8f888c btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-01-30 12:31:34 -07:00
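The overrun described in this commit comes from sizing each free-list element by the payload alone. The following sketch shows the arithmetic under simplified assumptions: the `demo_frag_hdr_t` layout is hypothetical, elements are assumed to be packed back to back, and alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment header; the real vader header differs. */
typedef struct demo_frag_hdr {
    size_t len;
    int tag;
} demo_frag_hdr_t;

/* Bytes each free-list element must reserve for a given payload size:
 * the header comes first, then the payload. */
size_t demo_frag_alloc_size(size_t payload_size)
{
    return sizeof(demo_frag_hdr_t) + payload_size;
}

/* Offset of element `index` when elements are packed back to back. */
size_t demo_frag_offset(size_t index, size_t element_size)
{
    return index * element_size;
}
```

If the element size is only the payload size, the last `sizeof(demo_frag_hdr_t)` bytes of one fragment's payload land on the header of the next fragment.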
Jeff Squyres
2203f8d900
Merge pull request #6185 from ggouaillardet/topic/hwloc_macros
hwloc: remove public hwloc macros from opal_config.h
2019-01-30 07:32:22 -05:00
Boris Karasev
46e38b9193 regx: fixed the order of hosts for ranges with different prefixes
Example:
For the list of hosts `a01,b00,a00`, the generated regex was
`a[2:1.0],b[2:0]`: the `a`-prefixed hosts were moved to the beginning,
which breaks the host ordering.
This commit fixes the regex for that case to `a[2:1],b[2:0],a[2:0]`.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2019-01-30 15:06:30 +06:00
Gilles Gouaillardet
0aeb27f776 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-30 14:51:38 +09:00
Boris Karasev
1967e41a71 regx/reverse: fixed adding an empty range for no numerical hostnames
Example:
For the nodelist `jjss,jjss0000001,jjss0000003,jjss0000002`, the regular
expression was `jjss[0:0],jjss[7:1,3,2]`, which led to incorrectly unpacking
the first host as `jjs0`. This commit stops adding an empty range for
non-numeric hostnames. Here is the fixed regex for this example:
`jjss,jjss[7:1,3,2]`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2019-01-30 09:41:00 +06:00
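The two regx commits above (46e38b9193 and 1967e41a71) can be illustrated with a small order-preserving compressor. This is a simplified sketch, not the regx component's code. It merges only adjacent hosts that share a prefix and digit width, lists values individually instead of building `lo-hi` ranges, and emits hostnames without a numeric suffix as-is. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Index where the trailing digits of a hostname start. */
size_t demo_prefix_len(const char *host)
{
    size_t n = strlen(host);
    while (n > 0 && isdigit((unsigned char) host[n - 1])) {
        n--;
    }
    return n;
}

/* Append formatted text to out; returns -1 if it would not fit. */
static int demo_append(char *out, size_t outlen, size_t *used,
                       const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out + *used, outlen - *used, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t) n >= outlen - *used) {
        return -1;
    }
    *used += (size_t) n;
    return 0;
}

/* Write "prefix[width:v1,v2,...]" groups, in input order, into out.
 * Returns 0 on success, -1 if out is too small. */
int demo_compress(const char **hosts, size_t nhosts, char *out, size_t outlen)
{
    size_t used = 0, i = 0;

    assert(outlen > 0);
    out[0] = '\0';
    while (i < nhosts) {
        size_t plen = demo_prefix_len(hosts[i]);
        size_t width = strlen(hosts[i]) - plen;
        size_t j = i;

        if (demo_append(out, outlen, &used, "%s%.*s",
                        (i > 0) ? "," : "", (int) plen, hosts[i]) < 0) {
            return -1;
        }
        if (0 == width) {   /* no numeric suffix: emit the bare name */
            i++;
            continue;
        }
        if (demo_append(out, outlen, &used, "[%zu:", width) < 0) {
            return -1;
        }
        /* Consume the run of adjacent hosts with the same prefix and width. */
        while (j < nhosts && demo_prefix_len(hosts[j]) == plen &&
               strlen(hosts[j]) - plen == width &&
               0 == strncmp(hosts[j], hosts[i], plen)) {
            if (demo_append(out, outlen, &used, "%s%lu", (j > i) ? "," : "",
                            strtoul(hosts[j] + plen, NULL, 10)) < 0) {
                return -1;
            }
            j++;
        }
        if (demo_append(out, outlen, &used, "]") < 0) {
            return -1;
        }
        i = j;
    }
    return 0;
}
```

Because a group is closed as soon as the prefix changes, `a01,b00,a00` keeps its order, and a bare `jjss` never gets an empty range attached.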
Boris Karasev
d1ad90f47e regx/test: update regex test
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2019-01-30 09:40:59 +06:00
Aravind Gopalakrishnan
9cabcfdbba mtl/ofi: Fix reference to help text object
When we exceed the threshold number of contexts created, print the appropriate help
text.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-01-29 15:10:06 -08:00
bosilca
29915fc943
Merge pull request #6292 from ggouaillardet/topic/opal_datatype_destruct
opal/datatype: plug a memory leak in opal_datatype_t destructor
2019-01-29 17:33:18 -05:00
Thananon Patinyasakdikul
0263456cf4 pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
We missed an assert to check whether ALLOW_OVERTAKE is set before
validating the sequence number, and this causes a deadlock.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2019-01-29 14:55:06 -05:00
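The commit message is terse, so the following is only one plausible reading of the idea: when a communicator allows overtaking, matching should not wait for a specific sequence number. The sketch uses hypothetical `demo_*` names and a made-up structure. It does not reproduce the pml/ob1 matching code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-communicator state; not the ob1 data structures. */
typedef struct demo_comm {
    bool allow_overtake;     /* assumed to mirror the ALLOW_OVERTAKE flag */
    uint16_t expected_seq;   /* next in-order sequence number */
} demo_comm_t;

/* Return true if a fragment with this sequence number may be matched now. */
bool demo_can_match(const demo_comm_t *comm, uint16_t frag_seq)
{
    if (comm->allow_overtake) {
        return true;   /* ordering is relaxed: never wait on a sequence number */
    }
    return frag_seq == comm->expected_seq;
}
```

In this sketch, a receiver that skipped the flag check would keep waiting for `expected_seq` even though the sender is free to deliver fragments out of order, which is the kind of stall the commit title refers to.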
Thananon Patinyasakdikul
56d3e0a43d opal/threads: reverted #6199
This commit reverts PR #6199, as it introduced a deadlock in some cases.
It also removes the assert, as the condition is obsolete.

Signed-off-by: Thananon Patinyasakdikul <tpatinya@utk.edu>
2019-01-29 13:34:44 -05:00
Nathan Hjelm
ea40d48899
Merge pull request #6295 from ggouaillardet/topic/opal_convertor_raw
opal/datatype: fix opal_convertor_raw()
2019-01-29 10:57:29 -07:00