Commit Graph

29310 Commits

Author SHA1 Message Date
Howard Pritchard
f38eebbbfb LICENSE: for v4.0.1
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-22 12:02:18 -07:00
Howard Pritchard
7aeb65579b
Merge pull request #6395 from brminich/topic/ucx_net_waddr_4.0.x
PML/UCX: Use net worker address for remote peers - v4.0.x
2019-02-21 20:29:47 -07:00
Geoff Paulsen
e82523fade
Merge pull request #6410 from hppritcha/topic/news_for_v4.0.1
update NEWS with a new fix
2019-02-21 15:48:50 -06:00
Howard Pritchard
7bb728b77a
Merge pull request #6399 from hppritcha/topic/excise_ofi_rml
rml/ofi: remove
2019-02-21 08:44:07 -07:00
Mikhail Brinskii
1c514948f6 PML/UCX: Use net worker address for remote peers
For remote node peers pack smaller worker address, which contains
network device addresses only. This would reduce amount of OOB traffic
during startup.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 751d88192d)
2019-02-21 16:58:20 +02:00
Howard Pritchard
35e3c071dc update NEWS with a new fix
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-20 09:55:07 -07:00
Howard Pritchard
83cb9ca51e
Merge pull request #6404 from ggouaillardet/topic/v4.0.x/osc_rdma_self
osc/rdma: correctly handle communications to self
2019-02-20 09:53:50 -07:00
Howard Pritchard
f433b6491a
Merge pull request #6405 from ggouaillardet/topic/v4.0.x/man_win_attach_detach
man: fix typos in MPI_Win_{attach,detach}
2019-02-20 09:51:26 -07:00
KAWASHIMA Takahiro
7b71369632 man: fix more typos in MPI_Win_attach man page
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>

[skip ci]
bot:notest

(cherry picked from commit open-mpi/ompi@7095ad10a5)
2019-02-20 13:26:48 +09:00
Gilles Gouaillardet
3ab227df30 man: fix typos in MPI_Win_{attach,detach} man pages
no code change

[skip ci]
bot:notest

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@7c0596819b)
2019-02-20 13:25:12 +09:00
Gilles Gouaillardet
749f51845b osc/rdma: correctly handle communications to self
mark the "self" peer OMPI_OSC_RDMA_PEER_LOCAL_BASE when
the window is dynamically created and use_cpu_atomics is set
in order to correctly handle communications to self.

Thanks Bart Janssens for reporting this issue.

Refs. open-mpi/ompi#6394

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fe05fcc11a)
2019-02-20 13:06:05 +09:00
Howard Pritchard
55915c3885 rml/ofi: remove
per discussion at the 2/19/19 devel-core meeting,
remove rml/ofi from 4.0.x

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-19 10:27:47 -07:00
Geoff Paulsen
4fd2c605d9
Merge pull request #6391 from hppritcha/topic/news_for_v4.0.1
NEWS: update for 4.0.1 release
2019-02-15 16:17:25 -06:00
Howard Pritchard
4b2c62d6fd NEWS: update for 4.0.1 release
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-15 13:43:05 -07:00
Geoff Paulsen
c593b2004e
Merge pull request #6380 from hppritcha/ggouaillardet-topic/oob_tcp_cross_version_compatibility
v4.0.x: oob/tcp: add cross version compatibility support
2019-02-15 13:39:12 -06:00
Howard Pritchard
40db950c7d
Merge pull request #6340 from jsquyres/pr/v4.0.x/make-mpi.h-a-little-friendlier-to-c++
v4.0.x: mpi.h.in: use C++ static_cast<> where appropriate
2019-02-14 17:06:47 -07:00
Howard Pritchard
d2745ad0ad
Merge pull request #6327 from ggouaillardet/topic/v4.0.x/op
ompi/op: fix support of non predefined datatypes with predefined oper…
2019-02-14 17:05:32 -07:00
Howard Pritchard
d82be47013
Merge pull request #6273 from ggouaillardet/topic/v4.0.x/configury_clang5
v4.0.x: configury: enhance C11 detection
2019-02-14 17:04:53 -07:00
Howard Pritchard
de1dd1c2b0 oob/tcp: hardwire oob_tcp version string to 4.0.0
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-02-13 12:54:03 -07:00
Gilles Gouaillardet
dd750795ee oob/tcp: add cross version compatibility support
Since we intend to provide cross version compatibility
between versions with the same major and minor, use
MAJOR.MINOR.0 instead of orte_version_string
(e.g. MAJOR.MINOR.RELEASEGREEK).

Open MPI 4.0.0 has already been released, so in order to make
it compatible with future 4.0.x releases, we have to use 4.0.0
as the version string; that is why we use MAJOR.MINOR.0 instead
of MAJOR.MINOR.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-02-13 10:21:32 -07:00
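
The version-string choice described in dd750795ee can be illustrated with a small sketch. It is written in Python for brevity (Open MPI itself is C). The helper names `compat_version` and `can_connect`, and the assumption that peers are accepted when their reduced strings are equal, are illustrative only and are not taken from the Open MPI sources.

```python
import re

def compat_version(full_version: str) -> str:
    """Reduce a full version such as '4.0.1rc2' to 'MAJOR.MINOR.0'.

    Hypothetical helper: it only mirrors the idea from the commit message.
    """
    match = re.match(r"(\d+)\.(\d+)", full_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {full_version!r}")
    major, minor = match.groups()
    return f"{major}.{minor}.0"

def can_connect(local: str, remote: str) -> bool:
    """Assumed rule: peers are compatible when their reduced strings match."""
    return compat_version(local) == compat_version(remote)
```

Under this sketch, a 4.0.0 peer and a 4.0.1 peer both reduce to `4.0.0`, which is why the `.0` suffix is kept instead of plain `MAJOR.MINOR`.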
Howard Pritchard
0b915b7e56
Merge pull request #6333 from jsquyres/pr/v4.0.x/hwloc-macro-conflict-fixes
v4.0.x: Various minor hwloc cleanups
2019-02-12 09:13:19 -07:00
Geoff Paulsen
e9cef8c80f
Merge pull request #6375 from karasevb/4.0.x_regx_host_ordering_fix
v4.0.x/regex: fixed host ordering for different prefixes
2019-02-11 14:23:24 -06:00
Howard Pritchard
6513b855cf
Merge pull request #6249 from hjelmn/v4.0.x_fix_issue_6201_in_the_v4.0.x_branch
v4.0.x: btl/vader: don't try to set reachability in add_procs if not requested
2019-02-11 13:16:15 -07:00
Howard Pritchard
5dd63405ce
Merge pull request #6368 from jsquyres/pr/v4.0.x/fix-ofi-configury
v4.0.x: fix OFI configury
2019-02-11 13:15:52 -07:00
Howard Pritchard
9e306cee49
Merge pull request #6336 from jsquyres/pr/v4.0.x/fix-datatype-destructor-leak
v4.0.x: opal/datatype: plug a memory leak in opal_datatype_t destructor
2019-02-11 13:14:06 -07:00
Boris Karasev
87c90866cb regx: fixed the order of hosts for ranges with different prefixes
Example:
For the list of hosts `a01,b00,a00` the generated regex was
`a[2:1.0],b[2:0]`: the `a`-prefixed hosts were moved to the beginning,
which breaks the host ordering.
This commit fixes the regex for that case to `a[2:1],b[2:0],a[2:0]`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 46e38b9193)
2019-02-11 12:06:49 +02:00
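
The order-preserving behaviour described in 87c90866cb can be sketched as follows. This is a simplified Python illustration, not the regx component's code: the function names are hypothetical, numbers are listed one by one with no range compression, and the `prefix[width:numbers]` layout is inferred from the examples in the commit message.

```python
import re
from itertools import groupby

def split_host(host):
    """Split 'a01' into ('a', '01'); hosts with no trailing digits give (host, '')."""
    match = re.fullmatch(r"(.*?)(\d*)", host)
    return match.group(1), match.group(2)

def compress_hosts(hosts):
    """Encode hosts as 'prefix[width:n1,n2,...]' terms, preserving list order.

    Only adjacent hosts sharing a prefix and digit width are merged, so a
    prefix that reappears later in the list gets its own term.
    """
    terms = []
    keyed = [split_host(h) for h in hosts]
    for (prefix, width), group in groupby(keyed, key=lambda pd: (pd[0], len(pd[1]))):
        if width == 0:
            # Non-numeric names are emitted verbatim, with no empty range.
            terms.extend(prefix for _ in group)
        else:
            numbers = ",".join(str(int(digits)) for _, digits in group)
            terms.append(f"{prefix}[{width}:{numbers}]")
    return ",".join(terms)
```

Grouping only neighbouring entries is the design choice that keeps `a00` after `b00` instead of folding it into the first `a` term.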
Boris Karasev
62044da5d9 regx/reverse: fixed adding an empty range for no numerical hostnames
Example:
For the nodelist `jjss,jjss0000001,jjss0000003,jjss0000002` the regular
expression was `jjss[0:0],jjss[7:1,3,2]`, which led to the first host
being incorrectly unpacked as `jjs0`. This commit stops adding an empty
range for non-numeric hostnames. Here is the fixed regex for this example:
`jjss,jjss[7:1,3,2]`

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 1967e41a71)
2019-02-11 12:06:34 +02:00
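
The unpacking side of the fix in 62044da5d9 can be sketched the same way. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, only comma-separated numbers are handled (no `start-end` ranges), and a term without brackets is treated as a plain hostname.

```python
def expand_term(term):
    """Expand 'prefix[width:n1,n2,...]' into zero-padded hostnames.

    A term without brackets is a plain hostname and is returned unchanged.
    """
    if "[" not in term:
        return [term]
    prefix, rest = term.split("[", 1)
    width, numbers = rest.rstrip("]").split(":", 1)
    return [f"{prefix}{int(n):0{int(width)}d}" for n in numbers.split(",")]

def expand_regex(regex):
    """Split on commas outside brackets, then expand each term in order."""
    terms, depth, current = [], 0, ""
    for ch in regex:
        if ch == "," and depth == 0:
            terms.append(current)
            current = ""
            continue
        depth += ch == "["
        depth -= ch == "]"
        current += ch
    terms.append(current)
    return [host for term in terms for host in expand_term(term)]
```

With the fixed expression `jjss,jjss[7:1,3,2]`, the first term has no range at all, so it round-trips as the bare name `jjss`.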
Boris Karasev
c154631879 regx/test: update regex test
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit d1ad90f47e)
2019-02-11 12:05:50 +02:00
Howard Pritchard
8552d0e608
Merge pull request #6330 from ggouaillardet/topic/v4.0.x/ompi_datatype_set_args
ompi/datatype: fix how we compute the space needed for the args
2019-02-08 14:44:08 -07:00
Howard Pritchard
d84322076f
Merge pull request #6307 from uberlinuxguy/v4.0.x-fix-for-6303
Adding changes for issue #6303 for branch v4.0.x.
2019-02-08 14:41:11 -07:00
Howard Pritchard
85ed3f47fa
Merge pull request #6347 from ggouaillardet/topic/v4.0.x/opal_convertor_raw
opal/datatype: fix opal_convertor_raw()
2019-02-08 14:39:39 -07:00
Howard Pritchard
7513705600
Merge pull request #6335 from edgargabriel/pr/v4.0.x-floating-point-division-problem
Pr/v4.0.x floating point division problem
2019-02-07 08:44:20 -07:00
Jeff Squyres
7fd62cf745 Remove opal/mca/common/ofi.
It never lived up to its purpose (and has caused amorphous indirect
errors such as https://github.com/open-mpi/ompi/issues/2519), so
delete it.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit dd20174532)
2019-02-07 06:39:22 -08:00
Jeff Squyres
9ad871fc38 ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - On subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/usnic and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over cruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Updated from master cherry pick: the OFI BTL does not exist on the
v4.0.x branch.  Therefore, did not include the OFI BTL changes on
master in this cherry pick.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672cc)
2019-02-07 06:36:35 -08:00
Gilles Gouaillardet
0ae48475a1 opal/datatype: reset ptypes in opal_datatype_clone()
Reset ptypes when cloning a datatype in order to prevent
a double free() in the opal_datatype_t destructor.

This fixes a bug introduced in open-mpi/ompi@7c938f070f

Fixes open-mpi/ompi#6346

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b395342c9f)
2019-02-01 14:39:49 +09:00
Howard Pritchard
4dfb9384cb
Merge pull request #6321 from hppritcha/topic/fix_6236_for_v4.x
Topic/fix 6236 for v4.x
2019-01-31 19:50:05 -06:00
George Bosilca
8acdc53892 Provide a better fix for #6285.
The issue was a little complicated due to the internal stack used in the
convertor. The main issue was that in the case where we run out of iov
space to save the raw description of the data while handling a
repetition (loop), instead of saving the current position and bailing out
directly we kept reading the next predefined type element. It worked in
most cases, except the one identified by the HDF5 test. However, the
biggest issue here was the drop in performance for all ensuing calls to
the convertor pack/unpack, as instead of handling contiguous loops as a
whole (and minimizing the number of memory copies) we copied data
description by data description.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(back-ported from commit open-mpi/ompi@5a82c4fd07)
2019-02-01 09:28:52 +09:00
Gilles Gouaillardet
f7327735a0 opal/datatype: fix opal_convertor_raw
correctly handle the case in which iovec is full and the
last accessed element of the datatype is the beginning of a loop

Refs. open-mpi/ompi#6285

Thanks Axel Huebl for reporting this

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@0832ab5acc)
2019-02-01 09:26:30 +09:00
Jeff Squyres
c39426ec91 mpi.h.in: use C++ static_cast<> where appropriate
When compiling mpi.h with a modern C++ compiler and a high degree of
pickiness (e.g., -Wold-style-cast), casting using (void*) in the
OMPI_PREDEFINED_GLOBAL and MPI_STATUS*_IGNORE macros will emit
warnings.  So if we're compiling with a C++ compiler, use C++'s
static_cast<> instead of (void*).

Thanks to @shadow-fax for identifying the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 30afdcead9)
2019-01-31 04:16:07 -08:00
Gilles Gouaillardet
90a9c12fdb opal/datatype: plug a memory leak in opal_datatype_t destructor
correctly free ptypes if the datatype is not pre-defined.

Thanks Axel Huebl for reporting this.

Refs. open-mpi/ompi#6291

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 7c938f070f)
2019-01-30 10:41:14 -08:00
René Widera
e30e5b95c6 common/ompio: possible rounding issue
Similar to #6286: converting a number of bytes to a single-precision floating point value in order to round up the result of a division is risky because of rounding errors.

- remove floating point operations for `round up`
- remove floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <r.widera@hzdr.de>
(cherry picked from commit a91fab80a1)
2019-01-30 12:31:39 -06:00
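
The integer-only rounding described in e30e5b95c6 can be sketched as follows. This is a Python illustration of the general technique, not the ompio code: the function and parameter names are hypothetical, and the sketch assumes a non-negative byte count and a positive divisor.

```python
def round_up_div(num_bytes: int, chunk: int) -> int:
    """Ceiling of num_bytes / chunk using integer arithmetic only."""
    return (num_bytes + chunk - 1) // chunk

def round_down_div(num_bytes: int, chunk: int) -> int:
    """Floor of num_bytes / chunk; for non-negative operands this matches
    what C's integer '/' yields without any floating point conversion."""
    return num_bytes // chunk
```

In C the same expression needs an integer type wide enough that `num_bytes + chunk - 1` cannot overflow; Python integers do not have that limit.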
Edgar Gabriel
d1e8779fe3 common/ompio: fix a floating point division problem
This commit fixes a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks to Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit c0f8ce0fff)
2019-01-30 12:31:16 -06:00
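
Why a single-precision division of two large, close values can go wrong, as d1e8779fe3 describes, can be shown with a small sketch. It is Python, emulating C `float` by round-tripping through `struct`. The helper names and the sample values (one byte above 512 MiB divided by 512 MiB) are assumptions chosen for illustration and are not claimed to be the values from the original report.

```python
import math
import struct

def as_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def ceil_ratio_single(a: int, b: int) -> int:
    """Round up a / b with every intermediate value held in single precision."""
    quotient = as_float32(as_float32(float(a)) / as_float32(float(b)))
    return math.ceil(quotient)

def ceil_ratio_double(a: int, b: int) -> int:
    """Round up a / b in double precision (Python's native float)."""
    return math.ceil(a / b)

a = 2**29 + 1  # illustrative: one byte more than 512 MiB
b = 2**29      # illustrative: exactly 512 MiB
```

Single precision has a 24-bit significand, so `a` is rounded to `2**29` before the division and the extra byte is lost; in double precision the quotient stays slightly above 1 and rounds up as expected.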
Gilles Gouaillardet
a247292275 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0aeb27f776)
2019-01-30 07:33:33 -05:00
Gilles Gouaillardet
c85fd35f27 opal: remove unnecessary #include file
opal_config_bottom.h can only be #include'd in opal_config.h,
so there is no need to #include "opal_config.h" inside.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit c8790d29de)
2019-01-30 07:33:32 -05:00
Gilles Gouaillardet
f79f14ad93 hwloc/base: fix some off-by-one errors
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 73d104f695)
2019-01-30 07:33:32 -05:00
Jeff Squyres
788c92b1ce hwloc/external.h: fix a clash with external HWLOC_VERSION[*]
Some macros defined by the embedded hwloc end up in opal_config.h
because hwloc configury m4 files are slurped into Open MPI.  These
macros are not required here, and they might conflict with an external
hwloc install, so simply #undef them in hwloc/external/external.h
after including <opal_config.h> but before including the external
<hwloc.h>.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f22b7d4f46)
2019-01-30 07:33:32 -05:00
Gilles Gouaillardet
fd157a960a ompi/datatype: fix how we compute the space needed for the args
Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@45fb69b2b9)
2019-01-30 11:01:11 +09:00
Gilles Gouaillardet
f76c81a758 ompi/op: fix support of non predefined datatypes with predefined operators
ACCUMULATE, unlike REDUCE, can be used with derived
datatypes and predefined operations, with some
restrictions outlined in MPI-3:11.3.4.  The derived
datatype must be composed entirely from one predefined
datatype (so you can do all the construction you want,
but at the bottom, you can only use one datatype, say,
MPI_INT).

Refs. open-mpi/ompi#6275

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@bc1cab5498)
2019-01-30 10:29:39 +09:00
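
The restriction quoted in f76c81a758 (a derived datatype must bottom out in a single predefined datatype) can be sketched as a check over a nested description. This is a toy Python model, not Open MPI's datatype engine: representing a datatype as either a predefined type name or a `(constructor, children)` pair, and the function names, are assumptions made for illustration.

```python
def leaf_types(datatype):
    """Yield the predefined type names at the bottom of a nested description."""
    if isinstance(datatype, str):
        yield datatype
    else:
        _constructor, children = datatype
        for child in children:
            yield from leaf_types(child)

def single_base_type(datatype):
    """Return the one predefined type a datatype is built from, else None."""
    leaves = set(leaf_types(datatype))
    return leaves.pop() if len(leaves) == 1 else None

# Hypothetical descriptions: any nesting is fine as long as the leaves agree.
vector_of_ints = ("vector", [("contiguous", ["MPI_INT", "MPI_INT"])])
mixed_struct = ("struct", ["MPI_INT", "MPI_DOUBLE"])
```

In this model `vector_of_ints` would qualify for a predefined operation because every leaf is `MPI_INT`, while `mixed_struct` would not.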
Ralph Castain
dae71d3a75 Correct parsing of ppr directives
Needed to apply commit from PR #5778 to get this commit
from PR #6238 to apply cleanly.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit b19e5edf76)
2019-01-29 11:34:44 -07:00
Ralph Castain
18afb8e8a6 Update mapping system
Correctly transfer job-level mapping directives for dynamically spawned
jobs to the mapping system.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 45f23ca5c9)
2019-01-29 10:04:30 -07:00