29701 Commits

Author SHA1 Message Date
Nathan Hjelm
a64a7c8a0a btl/vader: fix issues with xpmem registration invalidation
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration cannot be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References #6524
References #7030
Closes #6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805be1145ace46b570c5c518555b38e58cee)
2020-01-19 13:42:00 -08:00
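The claim-before-free idea in a64a7c8a0a can be sketched as follows. This is a loose Python analogy, not the btl/vader code: `Registration`, `RegistrationCache`, and `invalidate` are hypothetical names, and a single lock stands in for the guarantee that a registration cannot be removed while another thread is traversing the VMA tree.

```python
import threading


class Registration:
    def __init__(self, key):
        self.key = key
        self.invalid = False  # plays the role of the INVALID flag


class RegistrationCache:
    def __init__(self):
        # Assumption: one lock models "no removal during traversal".
        self._lock = threading.Lock()
        self._regs = {}
        self.freed = []  # records every release so a double free is visible

    def add(self, key):
        with self._lock:
            self._regs[key] = Registration(key)

    def invalidate(self, key):
        """Return True only for the single caller that wins the right to free."""
        with self._lock:
            reg = self._regs.get(key)
            if reg is None or reg.invalid:
                return False  # already claimed by another thread, or gone
            reg.invalid = True  # claim the registration during the lookup
        # Only the claiming thread reaches this point for a given registration.
        with self._lock:
            if self._regs.get(key) is reg:
                del self._regs[key]
            self.freed.append(key)
        return True
```

With several threads calling `invalidate()` on the same key, exactly one call returns `True` and the key appears once in `freed`; the others see the flag and back off.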
Nathan Hjelm
76002ada84 opal: make interval tree resilient to similar intervals
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.

If the high, low, and data of a new insertion exactly match an
existing interval then an assertion is raised.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 1145abc0b790f82ea25e24a3becad91ff502769c)
2020-01-19 13:40:57 -08:00
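The tie-breaking order described in 76002ada84 amounts to a three-level comparison. A minimal Python sketch, assuming a sorted list as a stand-in for the tree; `Interval`, `compare`, and `insert_sorted` are hypothetical names and not the opal interval tree API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: int
    high: int
    data: int  # stand-in for the pointer stored with the interval


def compare(a: Interval, b: Interval) -> int:
    """Order by low, then high, then data; 0 only for exact duplicates."""
    ka = (a.low, a.high, a.data)
    kb = (b.low, b.high, b.data)
    return (ka > kb) - (ka < kb)


def insert_sorted(items: list, new: Interval) -> None:
    """Insert into a sorted list (a simplified stand-in for the tree)."""
    pos = len(items)
    for i, existing in enumerate(items):
        order = compare(new, existing)
        # An exact match on low, high, and data is treated as a caller bug.
        assert order != 0, "duplicate interval"
        if order < 0:
            pos = i
            break
    items.insert(pos, new)
```

Because two entries with the same low and high still differ by their data, a delete can locate one specific registration instead of stopping at the first interval with a matching low value.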
Geoff Paulsen
629d0efa15
Merge pull request #7314 from hppritcha/topic/NEWS_v403
NEWS: update for 4.0.3
2020-01-17 12:41:05 -06:00
Geoff Paulsen
b21c475df6
Merge pull request #7313 from hppritcha/topic/version_for_4.0.3
VERSION - update for v4.0.3
2020-01-17 12:40:52 -06:00
Howard Pritchard
9d32fedd55 NEWS: update for 4.0.3
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 11:29:08 -07:00
Howard Pritchard
baf1b06c9e VERSION - update for v4.0.3
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 10:21:40 -07:00
Howard Pritchard
1cdcce7f89
Merge pull request #7296 from michaellass/v4.0.x-fix-dims_create
dims_create: fix calculation of factors for odd squares (v4.0.x)
2020-01-14 09:10:40 -07:00
Geoff Paulsen
3da939b124
Merge pull request #7248 from wckzhang/v4.0.x
MTL/OFI: Check threshold number of peers allowed per rank
2020-01-13 14:19:51 -06:00
Geoff Paulsen
6985a5560f
Merge pull request #7291 from gpaulsen/topic/v4.0.x/from_pr7190_7192
Topic/v4.0.x/from pr7190 7192
2020-01-13 14:03:16 -06:00
Michael Lass
ff85c82151 dims_create: fix calculation of factors for odd squares
Until now sqrt(n) was missed as a factor for odd square numbers n. This
led to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.

Refs: #7186

Signed-off-by: Michael Lass <bevan@bi-co.net>
(cherry picked from commit 67490118adb8372d2aefe1d2d923432e51e100cd)
2020-01-10 16:07:40 +01:00
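One way a factor search can miss sqrt(n), as described in ff85c82151, is a strict loop bound. The following Python sketch is an illustration only and is not the MPI_Dims_create implementation; `prime_factors` is a hypothetical helper:

```python
def prime_factors(n: int) -> list:
    """Prime factorization by trial division.

    The odd-divisor loop uses an inclusive bound (d * d <= n), so sqrt(n)
    itself is tested. With a strict bound (d * d < n), 3 would never be
    tried for n = 9 and 9 would be reported as if it were prime.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)  # whatever remains is a single prime
    return factors
```

For 9 this yields the factors 3 and 3, which lets a grid be balanced as 3 x 3 rather than 9 x 1.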
Howard Pritchard
8df9f53bdc
Merge pull request #7288 from gpaulsen/topic/v4.0.x/from_pr7191
v4.0.x: Add the missing code to check a return code
2020-01-09 18:01:01 -07:00
Geoff Paulsen
e8834629af
Merge pull request #7294 from gpaulsen/topic/v4.0.x/from_pr7183
lustre: squash some compiler warnings
2020-01-09 08:41:02 -06:00
Howard Pritchard
90cc1f1cf0
Merge pull request #7287 from hppritcha/topic/support_for_cray_fortran_v4.0.x
cray ftn: modify fortran module loc checker
2020-01-08 19:51:13 -07:00
Geoff Paulsen
c18d74bd70
Merge pull request #7282 from jsquyres/pr/v4.0.x/large-count-ddt-fixes
v4.0.x: Large count DDT fixes
2020-01-08 15:01:23 -06:00
Howard Pritchard
e06595d7f6 lustre: squash some compiler warnings
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre.  Squash these warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit e66a7cef11c51f64b7766080e1cef34b1395c4da)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 16:00:01 -05:00
Geoffroy Vallee
836ce83c9a Fix typo in comment: contiaing -> containing
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit 127573cf44973c5c670a55e80ab3603e22af70ac)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:49:15 -05:00
Geoffroy Vallee
d59faea868 Fix a typo in comments: insertted -> inserted
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit 98de17c6da85eb42963928ecbbcc9ca88fcc0598)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:48:40 -05:00
Geoffroy Vallee
a479beeeae Add the missing code to check a return code
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit de6f130b4a5b001ef85317fe3ebc2c8f8c8077e9)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:27:38 -05:00
Howard Pritchard
dc8246b8ae
Merge pull request #7279 from jsquyres/pr/v4.0.x/rpm-specfile-fix
v4.0.x: openmpi.spec: update modulefile_path behavior
2020-01-08 12:27:37 -07:00
Howard Pritchard
9582b76168 cray ftn: modify fortran module loc checker
to support the Cray Fortran compiler.  The Cray Fortran compiler does not
include all symbol info in the module file, so the checker has to link with
the *.o created as part of module file compilation.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 441bad9a758d79f1eb0fb85fbe6ee10a6f5e57b7)
2020-01-08 13:20:02 -06:00
George Bosilca
9330dc2a42 Swap the 2 fields to maintain the size of the struct.
Thanks @devreal for catching this.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 3de636dc6fda43fb31bcd69b6a91b7f8de1a985c)
2020-01-07 15:13:14 -08:00
George Bosilca
a1b4e697f5 Prevent overflow when dealing with datatype count.
This patch fixes #7147 by preventing overflow when multiplying
the count and the blocklen. The count reflects MPI count and is
therefore bound to the size of an int (it is a uint32_t) while the
blocklen can be merged together to represent the largest contiguous
memory layout and it is therefore promoted to a size_t.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 59fb02618e75022e3cd137199e3d390757e3c7e0)
2020-01-07 15:13:14 -08:00
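The overflow that a1b4e697f5 guards against can be illustrated by comparing a product truncated to 32 bits with one kept at full width. Python integers do not overflow, so the 32-bit case is emulated with a mask; `extent_32bit` and `extent_wide` are hypothetical names, and the wide case assumes a 64-bit size_t:

```python
UINT32_MASK = 0xFFFFFFFF


def extent_32bit(count: int, blocklen: int) -> int:
    """Product truncated to 32 bits, as if computed in a uint32_t."""
    return (count * blocklen) & UINT32_MASK


def extent_wide(count: int, blocklen: int) -> int:
    """Product at full width, as if blocklen had been promoted to size_t."""
    return count * blocklen
```

With count = 65536 and blocklen = 65536, the truncated product wraps to 0 while the wide product is 4294967296, which is why the width of the promoted operand matters.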
Jeff Squyres
a183bb019d openmpi.spec: update modulefile_path behavior
Allow the user to override the modulefile_path (root directory to
install the Open MPI modulefile), even if install_in_opt==1.  For
example:

rpmbuild \
    --rebuild \
    --define 'install_in_opt 1' \
    --define 'modulefile_path /path/to/my/modulefiles/openmpi/%{version}' \
    openmpi-4.0.2-1.src.rpm

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 352e575e187bfc0a129c51924c244384f3303a65)
2020-01-07 14:54:09 -08:00
Geoff Paulsen
93c879962e
Merge pull request #7168 from wbailey2/pr/fix-yield_when_idle
v4.0.x: schizo/ompi: correctly handle the yield_when_idle option
2020-01-03 14:06:36 -06:00
Geoff Paulsen
e561f2aa69
Merge pull request #7269 from hppritcha/topic/pr7238_to_v4.0.x
mtl/ofi: ignore case when comparing provider names
2020-01-03 13:36:57 -06:00
Robert Wespetal
47c435e531 mtl/ofi: ignore case when comparing provider names
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
(cherry picked from commit 9b72e9465da3f2891ac13ed0443db44136506a1a)
2020-01-03 08:52:24 -08:00
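The effect of the change in 47c435e531 can be shown with a small case-insensitive membership test. This is a Python sketch with a hypothetical helper name, not the mtl/ofi code; in C, a function such as POSIX strcasecmp() would play a similar role.

```python
def provider_in_list(name: str, names: list) -> bool:
    """True if `name` matches any list entry, ignoring case."""
    wanted = name.casefold()
    return any(wanted == entry.casefold() for entry in names)
```

An exact comparison would treat "UDP" and "udp" as different names, so an exclude list written in lowercase would fail to filter an uppercase provider name; the case-insensitive check matches both spellings.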
Howard Pritchard
6a739f8357
Merge pull request #7243 from jsquyres/pr/v4.0.x/neighbor-alltoall-fix
v4.0.x: neighbor alltoall fix
2019-12-23 13:40:28 -07:00
Howard Pritchard
67235b7906
Merge pull request #7250 from janjust/v4.0.x-oshmem_atomic_set_fix
V4.0.x oshmem atomic set fix
2019-12-20 06:44:02 -07:00
Howard Pritchard
4c1cd3faf7
Merge pull request #7254 from jsquyres/pr/v4.0.x/hwloc-readme-clarifications
v4.0.x: hwloc: clarify --with-hwloc behavior
2019-12-20 06:41:11 -07:00
Aravind Gopalakrishnan
1bee429a8d MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cf43de44538b818b014bdad0490439e2d212395)
2019-12-19 22:36:43 +00:00
Jeff Squyres
814f3e9caa hwloc: clarify --with-hwloc behavior
Clarify in README what --with-hwloc does in its different use cases.

Also, ensure that the behavior when specifying `--with-hwloc` is the
same as if that option is not specified at all.  This is what we did
in Open MPI <= v3.x; looks like we inadvertently caused `--with-hwloc`
to be synonymous with `--with-hwloc=external` in v4.0.0.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 18c3e1af5ef281e8b502a7ad778256889cc707d1)
2019-12-19 12:30:40 -08:00
Tomislav Janjusic
5489bc081f oshmem/extended: Fix shmem_atomic_set for float and double.
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 2d8f9b1d09d0dd8dee9e81f0ea4eaac6f979621c)
2019-12-19 21:20:36 +02:00
Tomislav Janjusic
ae30df4bae oshmem/ucx: fixed a build issue
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit cb5ff55b27528817a2fbe6dbf452535a6219f57f)
2019-12-19 21:20:05 +02:00
Howard Pritchard
f6914ee35c
Merge pull request #7229 from tkordenbrock/topic/v4.0.x/portals4.fix.flowcontrol.bugs
v4.0.x: portals4: fix flow control bugs
2019-12-18 08:35:46 -07:00
George Bosilca
be58cf7982 Fix the communication ordering for all cartesian neighbor collectives.
This work is rooted in the [MPI Forum issue
153](https://github.com/mpi-forum/mpi-issues/issues/153).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 86acdee4606c1ac3b38070d1b7973a00a991f1d6)
2019-12-17 14:25:22 -08:00
Nathan Hjelm
21221eb70a coll/basic: fix neighbor alltoall message ordering
This commit updates the coll/basic component to correctly order sends
and receives for cartesian communicators with cyclic boundaries. This
addresses an issue identified by mpi-forum/mpi-issues#153. This issue
occurs when the size in any dimension is 1. This gives the same
neighbor in the positive and negative directions. The old code was
sending and receiving in the same order so the -1 buffer contained
the +1 result and vice versa. The problem is addressed by using
unique tags for each send. This should cover both the case where
overtaking is allowed and is not allowed. The former case will be
possible if an MPI_Cart_create_with_info() call is added to the
standard.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 196a91e604885d7aae9ac9dfbd9b2e846b3015b7)
2019-12-17 14:25:22 -08:00
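The ordering problem in 21221eb70a can be reproduced with a toy model of in-order message matching. The sketch below simulates a single rank in a periodic one-dimensional grid of size 1, so the rank is its own -1 and +1 neighbor. It assumes that data received from the -1 side is what that neighbor sent toward +1; the tag values and the `exchange` function are hypothetical and do not reflect the tags used by coll/basic.

```python
from collections import defaultdict, deque


def exchange(unique_tags: bool) -> dict:
    """Return what lands in the 'from -1' and 'from +1' receive buffers."""
    # tag -> FIFO of messages; models matching in posting order per tag.
    mailbox = defaultdict(deque)

    # Tag by direction of travel; without unique tags both share tag 0.
    tag_toward_minus = 0
    tag_toward_plus = 1 if unique_tags else 0

    # Sends are posted in neighbor order: -1 first, then +1.
    mailbox[tag_toward_minus].append("sent toward -1")
    mailbox[tag_toward_plus].append("sent toward +1")

    # Receives are posted in the same order: the -1 buffer first.
    # Data arriving from the -1 side travelled toward +1, and vice versa.
    from_minus = mailbox[tag_toward_plus].popleft()
    from_plus = mailbox[tag_toward_minus].popleft()
    return {"from -1": from_minus, "from +1": from_plus}
```

With a shared tag, the first receive matches the first send and the two buffers end up swapped; with one tag per direction, each receive can only match the message travelling the right way.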
Howard Pritchard
3f752f1d4f
Merge pull request #7237 from mcoil1/pr/v4.0.x/wbailey2-fixes
v4.0.x/two fixes
2019-12-17 09:22:07 -07:00
Howard Pritchard
c0b27e4f74
Merge pull request #7236 from mcoil1/pr/v4.0.x/several-fixes
v4.0.x/several fixes
2019-12-17 09:19:59 -07:00
Howard Pritchard
af04a9d469
Merge pull request #7215 from janjust/v4.0.x-oshmem-context-fixes
v4.0.x, oshmem:ucx, fix race condition and add context recycling
2019-12-16 13:03:58 -07:00
William Bailey
c01a71fbe9 romio: fix uninitialized variable
Squash compiler warning.

ROMIO is third-party software but has an annoying compiler warning;
this is the minimum distance fix.

Signed-off-by: William Bailey <wbailey2@nd.edu>
(cherry picked from commit 30bda56bcef6f56823ac07f0418fd33e1eff837f)
2019-12-14 17:18:57 -05:00
William Bailey
bc018dec4a Changed the final URL to https://github.com/westes/flex
Signed-off-by: William Bailey <wbailey2@nd.edu>
(cherry picked from commit caf1d9292c51af8b8ccdd4da1238e774c7f04545)
2019-12-14 17:17:26 -05:00
Maxwell Coil
07a54b7025 memory/patcher: fix compiler warning
syscall() returns a long, but we are invoking shmat(), which returns
a void*.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 52a9cce6f3dcd87e9ae66177398b60b9317e9339)
2019-12-14 12:26:47 -05:00
Maxwell Coil
6fdd902d3f romio: Update ADIOI_R_Exchange_data function
Squash compiler warning due to whitespace/brace problems.

The code block from lines 829-839 was improperly indented, which led to
both the code being confusing and a compiler warning. Comparing this code to
the current version in the MPICH repo made it clear that the code was simply
improperly indented. Fixing the indentation both makes the code readable and
squashes the compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 8c237e268472a763ad4aa55e24d25ee7ca64b888)
2019-12-14 12:25:51 -05:00
Maxwell Coil
84a67bd6cf libnbc: fixed uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 52241dbbcdcbf3605c8098d0cfbcf3c5a75a1c9c)
2019-12-14 12:25:18 -05:00
Maxwell Coil
879a25c239 ompi/dpm/dpm.c: Fix uninitialized variable
Squash compiler warning.

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 3ced33c2ebf403206c242c89bb70b6b9a0fc8968)
2019-12-14 12:24:57 -05:00
Maxwell Coil
3964144ca5 Fix misleading error message with missing #! interpreter
This change fixes the misleading error message. I added a conditional to
determine whether the error is due to a missing file or a bad interpreter.
If it is the latter, a new, more precise error message will be displayed.

Fixes #4528

Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 9b73f6ac83b1efd6db60d71a6e0a4c83b52af0aa)
2019-12-14 12:22:55 -05:00
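The distinction drawn in 3964144ca5 (missing file versus bad interpreter) can be sketched as a check on the `#!` line. This Python version is an illustration under stated assumptions, not the Open MPI launcher code; `diagnose_exec_failure` and its return strings are hypothetical.

```python
import os


def diagnose_exec_failure(path: str) -> str:
    """Classify why launching `path` could fail with 'no such file'."""
    if not os.path.exists(path):
        return "missing file"
    with open(path, "rb") as f:
        first_line = f.readline()
    if first_line.startswith(b"#!"):
        parts = first_line[2:].split()
        # The file exists, but the interpreter named on the #! line does not.
        if parts and not os.path.exists(os.fsdecode(parts[0])):
            return "bad interpreter"
    return "other"
```

A script that exists but names a nonexistent interpreter on its first line is reported as "bad interpreter", which allows a more precise message than a generic "file not found".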
Howard Pritchard
59d8d62555
Merge pull request #7116 from ggouaillardet/topic/v4.0.x/f08_bind_c_constants_revamp
v4.0.x: fortran/use-mpi-f08: revamp mpi_f08 constants
2019-12-13 08:07:42 -07:00
Todd Kordenbrock
2c082b6c7c btl-portals4: fix a flow control configure bug
This commit fixes a configure bug that caused flow control to be
disabled regardless of the configure options used.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f7e74b6a3d19cd6e3edc3364783311e595f81b3d)
2019-12-12 08:43:20 -06:00
Todd Kordenbrock
1f5a79bbd4 mtl-portals4: don't finalize flow control if Portals4 was not initialized
This commit fixes a segfault in mtl-portals4 finalize().  The segfault
occurs if finalize() is called without any calls to add_procs().  This
commit resolves the segfault by skipping the flow control fini() call if
Portals4 was not initialized.

Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit e7b867c044f8b776b75f3c6d917745c06237743e)
2019-12-12 08:43:20 -06:00
Howard Pritchard
684c180cce
Merge pull request #7224 from jsquyres/pr/v4.0.x/mpool-basic-base-ptr-fix
v4.0.x: mpool/base: fix basic mpool_base() function
2019-12-12 05:38:49 -07:00