1
1

10394 Коммитов

Автор SHA1 Сообщение Дата
Yossi Itigin
9b91cf09cc
Merge pull request #6481 from hoopoepg/topic/check-ucx-params
PML/SPML/UCX: added evaluation of mmap events
2019-03-14 11:53:42 +02:00
Austen Lauria
b61e6242d3 Fix integer overflows with indexed datatype creation.
The types of count, disp, and extent passed into
ompi_datatype_add() should be size_t, ptrdiff_t and ptrdiff_t,
respectively. This prevents integer overflows and errors in
computing the size of large indexed datatypes.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-03-13 09:39:57 -04:00
Sergey Oblomov
d8e3562bae PML/SPML/UCX: added evaluation of mmap events
- there was a set of UCX related issues reported which caused
  by mmap API hooks conflicts. We added diagnostic of such
  problems to simplify bug-resolving pipeline

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-03-12 21:14:27 +02:00
Geoff Paulsen
a14bb4bc89
Merge pull request #6471 from hppritcha/topic/issue_6470
ompi_info: report whether MPI1 compat is enabled
2019-03-11 21:11:55 -05:00
Howard Pritchard
61ccc65302 ompi_info: report MPI1 compat is disabled
MPI1 compat disabled beyond v4.0.x

Related to #6470

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-03-11 13:50:29 -06:00
Gilles Gouaillardet
26c1b833c7 man: remove man pages of removed MPI1 subroutines
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-05 15:01:07 +09:00
Nathan Hjelm
73085e9ce3
Merge pull request #6413 from nuriallv/issue_osc_rdma
osc/rdma: fix when determining the node with the rank_array info for a peer
2019-02-27 16:30:06 -07:00
Geoffrey Paulsen
a6d6be2853 mpi.h.in: delete removed MPI1 functions/datatypes (API change!)
This commit DELETES the removed MPI1 functions and datatypes from
both the mpi.h header and from the library (they were deleted from the
MPI standard in MPI-3.0).

WARNING: This changes the MPI API in a non-backwards compatible way.
         This also removes the configure option that was added in Open
         MPI v4.0.x, requiring users to change their apps if they are
         using any of these almost 20 year old APIs.

This commit removes the following MPI1 removed functions and datatypes:

         MPI_Address
         MPI_Errhandler_create
         MPI_Errhandler_get
         MPI_Errhandler_set
         MPI_Type_extent
         MPI_Type_hindexed
         MPI_Type_hvector
         MPI_Type_struct
         MPI_Type_UB
         MPI_Type_LB

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-02-27 08:24:11 -08:00
Geoffrey Paulsen
3136a1706c mpi.h.in: Revamp MPI-1 removed function warnings
Refs https://github.com/open-mpi/ompi/issues/6278.

This commit is intended to be cherry-picked to v4.0.x and
the following commit will ammend to this functionality for
master's removal.

Changes the prototypes for MPI removed functions in the
following ways:

There are 4 cases:

 1) User wants MPI-1 compatibility (--enable-mpi1-compatibility)

    MPI_Address (and friends) are declared in mpi.h with
    deprecation notice

 2) User does not want MPI-1 compatibility, and has a C11-capable
    compiler

    Declare an MPI_Address (etc.) macro in mpi.h, which will
    cause a compile-time error using _Static_assert C11 feature

 3) User does not want MPI-1 compatibility, and does not have a
    C11-capable compiler, but the compiler supports error function
    attributes.

    Declare an MPI_Address (etc.) macro in mpi.h, which will
    cause a compile-time error using error function attribute.

 4) User does not want MPI-1 compatibility, and does not have a
    C11-capable compiler, or a compiler that supports error
    function attributes.

    Do not declare MPI_Address (etc.) in mpi.h at all.
    Unless the user is compiling with something like -Werror,
    this will allow the user's code to compile. We are
    choosing this because it seems like a losing battle to
    make some kind of compile time error that is friendly to
    the user (and doesn't make it look like mpi.h itself is broken).

    On v4.0.x, this will allow the user code to both compile
    (albeit with a warning) and link (because the MPI_Address
    will be in the MPI library because we are preserving ABI
    back to 3.0.x).

    On master/v5.0.x, this will allow the user code to compile,
    but it will fail to link (because the MPI_Address symbol will
    not be in the MPI library).

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-02-27 08:24:11 -08:00
bosilca
8400502d8a
Merge pull request #6353 from bosilca/topic/fix_monitoring_pvar
Fix the PVAR allocation usage.
2019-02-25 16:03:56 -05:00
Howard Pritchard
9b3a9c2579
Merge pull request #6417 from abouteiller/bugfix/cart_create_cid
Cart/Graph create would not run the next_cid  algorithm
2019-02-22 13:05:59 -07:00
Howard Pritchard
d6cdbdfd39
Merge pull request #6412 from hppritcha/topic/fix_pgi_usempif08
fortran:fix for PGI linking
2019-02-21 20:31:14 -07:00
Aurelien Bouteiller
fb17115ba9
Cart/Graph create would not run the next_cid algorithm and create
disjoint communicator with inconsistent cid.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2019-02-21 11:40:22 -05:00
Howard Pritchard
266bc3aced fortran:use mpif08 fix for PGI linking
commit c6070fd2e broke building fortran bindings
with PGI compilers.  Turns out PGI compilers need
to link in the *.o from a module file whether or
not there are module subroutines defined or not in
the module file.

Related to #6411

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-20 12:33:25 -07:00
Nuria Losada
3cae149262 osc/rdma: fix when determining the node with the rank_array info for a peer
Signed-off-by: Nuria Losada <nlosada@icl.utk.edu>
2019-02-20 13:12:00 -05:00
Artem Polyakov
13a8e42108
Merge pull request #6163 from artpol84/osc/mt_submission
Refactoring of osc/ucx component for MT
2019-02-20 09:41:27 -08:00
Gilles Gouaillardet
ad114be28c configury: automatically select rte/pmix runtime if ORTE project is not built
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Gilles Gouaillardet
69d136ae5e ompi/pmix: fix misc OPAL function calls
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 13:55:55 +09:00
Gilles Gouaillardet
18f679efac
Merge pull request #6401 from ggouaillardet/topic/osc_rdma_self
osc/rdma: correctly handle communications to self
2019-02-20 11:43:22 +09:00
KAWASHIMA Takahiro
7095ad10a5 man: fix more typos in MPI_Win_attach man page
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-20 11:22:38 +09:00
Gilles Gouaillardet
7c0596819b man: fix typos in MPI_Win_{attach,detach} man pages
no code change

[skip ci]
bot:notest

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 11:09:45 +09:00
Gilles Gouaillardet
fe05fcc11a osc/rdma: correctly handle communications to self
mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when
the window is dynamically created and use_cpu_atomics is set
in order to correctly handle communications to self.

Thanks Bart Janssens for reporting this issue.

Refs. open-mpi/ompi#6394

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-20 09:52:17 +09:00
Artem Polyakov
19e2ae2efb opal/common/ucx: Switch to opal/tsd
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
7984d7d997 opal/common/ucx: Remove unused debugging macro
Will be reintroduced later if needed and after adaptation to the OMPI
infrastructure.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Artem Polyakov
43f16d8796 opal/common/ucx: Remove common_ucx_int.h
Place the content of common_ucx_int.h back to the common_ucx.h and
include common_ucx_wpool.h explicitly.

Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Xin Zhao
c6de09940f ompi/osc/ucx: Switch osc/ucx code to use Worker Pool.
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
2019-02-19 14:22:07 -08:00
Yossi Itigin
91d05f91e2
Merge pull request #6384 from brminich/topic/ucx_worker_net_address
PML/UCX: Use net worker address for remote peers
2019-02-17 12:21:00 +02:00
Matias A Cabral
25bdd118ac MTL_OFI: Changed Recv cancel to be non-blocking
Updated the OFI MTL's Recv cancel to be a non-blocking call to match
the MPI spec. Given fi_cancel succeeded, then it is expected that the
user will wait on the request to read the result of if the cancel has
completed.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com
2019-02-14 17:07:20 -05:00
Mikhail Brinskii
751d88192d PML/UCX: Use net worker address for remote peers
For remote node peers pack smaller worker address, which contains
network device addresses only. This would reduce amount of OOB traffic
during startup.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-02-14 18:06:36 +02:00
Brian Barrett
7a593cea4a
Merge pull request #6361 from aravindksg/fix_tg_segfault
mtl/ofi: Fix segfault when not using Thread-Grouping feature
2019-02-12 12:04:26 -08:00
Ralph Castain
125d236173 Move from the use of regex to compression
We've been fighting the battle of trying to create a regex generator and
parser that can handle arbitrary hostname schemes - without long-term
success. The worst of it is that there is no way of checking to see if
the computed regex is correct short of parsing it and doing a
character-by-character comparison with the original string. Ugh...there
has to be a better solution.

One option is to investigate using 3rd-party regex libraries as
those are coming from communities whose sole focus is resolving that
problem. However, someone would need to spend the time to investigate
it, and we'd have to find a license-friendly implementation.

Another option is to quit beating our heads against the wall and just
compress the information. It won't be as much of a reduction, but we
also won't keep hitting scenarios where things break. In this case, it
seems that "perfection" is definitely the enemy of "good enough".

This PR implements the compression option while retaining the
possibility of people adding regex-generating components. The
compression code used in ORTE is consolidated into the opal/compress
framework. That framework currently held bzip and gzip components for
use in compressing checkpoint files - since we no longer support C/R, I
have .opal_ignore'd those components.

However, I have left the original framework APIs alone in case someone
ever decides to redo C/R. The APIs of interest here are added to the
framework - specifically, the "compress_block" and "decompress_block"
functions. I then moved the ORTE zlib compression code into a new
component in this framework.

Unfortunately, the framework currently is a single-select one - i.e.,
only one active component at a time. Since I .opal_ignore'd the other
two and made the priority of zlib high, this isn't a problem. However,
if someone wants to re-enable bzip/gzip or add another component, they
might need to transition opal/compress to a multi-select framework.

Included changes:

* Consolidate the compression code into the opal/compress framework

* Move the ORTE zlib compression code into a new opal/compress/zlib
  component

* Ignore the bzip and gzip components in opal/compress framework

* Add a "compress_base_limit" MCA param to set the threshold above which
  we compress data - defaults to 4096 bytes

* Delete stale brucks and rcd components from orte/grpcomm framework

* Delete the orte/regx framework

* Update the launch system to use opal/compress instead of string regex

* Provide a default module if no zlib is available

* Fix some misc multi-node issues

* Properly generate the nidmap in response to a "connection warmup"
  message so the remote daemon knows the children it needs to launch.

* Remove stale references to orte_node_regex

* opal_byte_object_t's are not OPAL objects - properly release allocated
  memory.

* Set the topology

* Currently only handling homogeneous case

* Update the compress framework files to conform

* Consolidate open/close into one "frame" file. Ensure we open/close the
  framework

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-02-08 11:11:14 -08:00
KAWASHIMA Takahiro
8bbd201029
Merge pull request #6205 from kawashima-fj/pr/fp16
Add FP16 datatypes
2019-02-08 14:52:13 +09:00
Aravind Gopalakrishnan
6edcc479c4 mtl/ofi: Fix segfault when not using Thread-Grouping feature
For the non thread-grouping paths, only the first (0th) OFI context
should be used for communication. Otherwise this would access a non existant
array item and cause segfault.

While at it, clarifiy some content regarding SEPs in README (Credit to Matias Cabral
for README edits).

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
2019-02-07 11:52:53 -08:00
Jeff Squyres
f5e1a672cc ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - Subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/ofi, btl/usnic, and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Jeff Squyres
aba2571881 mtl/ofi/Makefile.am: down with tabs!
Replace all tabs with spaces.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-02-07 06:29:58 -08:00
Gilles Gouaillardet
945f830f7a mtl/ofi: fix configury when VPATH is used
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-07 06:29:58 -08:00
Matias Cabral
0601b3e982
Merge pull request #6325 from aravindksg/fix_help_reference
mtl/ofi: Fix reference to help text object
2019-02-05 07:22:51 -08:00
George Bosilca
e42b573cd3
Fix the PVAR allocation usage.
According to the MPI standard the obj_handle is a pointer to an MPI
object, and therefore cannot be MPI_COMM_WORLD. The MPI standard example
14.6 highlight this usage.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-02-02 19:03:43 -05:00
KAWASHIMA Takahiro
f8a441957a mpiext/shortfloat: Add MPIX_C_FLOAT16 datatype
`MPIX_C_FLOAT16` is defined as a synonym for `MPIX_SHORT_FLOAT`
if the C compiler supports `_Float16`, which is defined in
ISO/IEC JTC 1/SC 22/WG 14 N1945 (ISO/IEC TS 18661-3:2015).
This name and meaning are same as that of MPICH. This may be
a transitional datatype until the MPI Forum decides a proper
name for the type.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 14:55:52 +09:00
KAWASHIMA Takahiro
c44599ec13 mpiext/shortfloat: Add shortfloat MPI extension
This extension provides additional MPI datatypes `MPIX_SHORT_FLOAT`,
`MPIX_C_SHORT_FLOAT_COMPLEX`, and `MPIX_CXX_SHORT_FLOAT_COMPLEX`
for `short float` (C/C++), `short float _Complex` (C), and
`std::complex<short float>` (C++), respectively, or their alternate
types like `_Float16`.

See `ompi/mpiext/shortfloat/README.txt` for details.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 13:01:14 +09:00
KAWASHIMA Takahiro
4d7bde27fb ompi/datatype: Use short float for MPI_REAL2
... and add `MPI_COMPLEX4`.

This commit changes values of existing `OMPI_DATATYPE_MPI_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OMPI internal code.

On the other hand, `ompi_datatype_t::id` values of existing datatypes
are not changed and 73 is newly assigned to for `MPI_COMPLEX4` to
retain ABI compatibility.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 13:01:10 +09:00
KAWASHIMA Takahiro
4375c11a58 ompi/datatype: Add ompi_mpi_short_float
... and `ompi_mpi_c_short_float_complex` and `ompi_mpi_cxx_sfltcplex`.

These are Open MPI internal variables intended to be defined as
`MPI_SHORT_FLOAT`, `MPI_C_SHORT_FLOAT_COMPLEX`, and
`MPI_CXX_SHORT_FLOAT_COMPLEX` in the future.

`OMPI_DATATYPE_MPI_C_SHORT_FLOAT_COMPLEX` is also required to
support `MPI_COMPLEX4` in the next commit.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:43:13 +09:00
Sergey Lebedev
829846dbcc fp16 hcoll bindings
Signed-off-by: Sergey Lebedev <sergeyle@mellanox.com>
2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro
2ad1c09848 opal/datatype: Add opal_short_float_t
The type `short float`, which is proposed in ISO/IEC JTC 1/SC 22 WG 14
(C WG), is not supported by most compilers yet. But some compilers
(including gcc 7 for AArch64 and clang 6) support `_Float16`, which
is defined in ISO/IEC TS 18661-3:2015 (ISO/IEC JTC 1/SC 22/WG 14 N1945)
as an extensions for C. If it is detected in `configure`, it is used
as an alternate type of `short float` in Open MPI internal code.

This commit adds a `configure` option `--enable-alt-short-float=TYPE`.
It can be used to specify a type other than `short float` and `_Float16`
as the alternate type.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:40:14 +09:00
KAWASHIMA Takahiro
f6b39452f6 opal/datatype: Support short float
The type `short float` is proposed for the C language in ISO/IEC JTC
1/SC 22 WG 14 (C WG) for mainly IEEE 754-2008 binary16, a.k.a.
half-precision floating point or FP16.

By this commit, `short float` and `short float _Complex` are detected
in `configure` and used in Open MPI internal code. `MPI_SHORT_FLOAT`
and its complex number version are not added yet.

This commit changes values of existing `OPAL_DATATYPE_*` macros.
This change does not affect ABI compatibility of `libmpi.so` and the
like because these values are only used in OPAL and OMPI internal code.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2019-02-01 12:40:14 +09:00
Jeff Squyres
4c64322db4
Merge pull request #6334 from jsquyres/pr/make-mpi-h-a-little-more-c++-friendly
mpi.h.in: use C++ static_cast<> where appropriate
2019-01-31 07:14:34 -05:00
Jeff Squyres
30afdcead9 mpi.h.in: use C++ static_cast<> where appropriate
When compiling mpi.h with a modern C++ compiler and a high degree of
pickyness (e.g., -Wold-style-cast), casting using (void*) in the
OMPI_PREDEFINED_GLOBAL and MPI_STATUS*_IGNORE macros will emit
warnings.  So if we're compiling with a C++ compiler, use C++'s
static_cast<> instead of (void*).

Thanks to @shadow-fax for identifying the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-01-31 03:22:26 -08:00
Thananon Patinyasakdikul
782ec851ea
Merge pull request #6319 from thananon/pr/allow_overtake
pml/ob1: fix deadlock with communicator flag ALLOW_OVERTAKE.
2019-01-30 15:32:04 -05:00
Jeff Squyres
2203f8d900
Merge pull request #6185 from ggouaillardet/topic/hwloc_macros
hwloc: remove public hwloc macros from opal_config.h
2019-01-30 07:32:22 -05:00
Gilles Gouaillardet
0aeb27f776 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-01-30 14:51:38 +09:00