Commit graph

29867 commits

Author SHA1 Message Date
Geoffrey Paulsen
44c1b6fb98 Updating VERSION to v4.0.3rc3
We tried doing an RC2 build without updating the greek,
and found where that failed in build automation.
Revving again for rc3, as we've already applied the rc2 tag.

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-28 10:42:04 -05:00
Geoff Paulsen
b9d54dadb6
Merge pull request #7341 from hppritcha/topic/news_for_rc4.0.3rc2
NEWS: updates for 4.0.3rc2
2020-01-27 14:52:58 -06:00
Howard Pritchard
7147a8c3bb NEWS: updates for 4.0.3rc2
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-27 13:50:35 -07:00
Howard Pritchard
0ea96ec4db
Merge pull request #7340 from jjhursey/v4-no-ssh-core
plm/rsh: Fix segv on missing agent.
2020-01-27 13:08:40 -07:00
Joshua Hursey
05d003b109
plm/rsh: Fix segv on missing agent.
* Additionally, fixes the `NULL` option to `OMPI_MCA_plm_rsh_agent`,
   which would also lead to a segv. Now it operates as intended by
   disqualifying the `rsh` component and falling back onto the `isolated`
   component.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
(cherry picked from commit 62d0058738)
2020-01-27 10:34:28 -06:00
Howard Pritchard
5f40b47088
Merge pull request #7338 from hppritcha/topic/fix_6539_v4.0.x
Topic/fix 6539 v4.0.x
2020-01-26 12:39:38 -07:00
Howard Pritchard
297505592a Fix a problem with fortran configure test.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-24 15:42:00 -06:00
Geoff Paulsen
2549ba2e47
Merge pull request #7329 from janjust/v4.0.x-oshmem-perf-multi-worker
V4.0.x: oshmem/ucx: improves spml ucx performance for multi-threaded applications.
2020-01-24 13:41:13 -06:00
Howard Pritchard
d12e0fdf32 make mpifort obey disable-wrapper-runpath
related to #6539

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 37b3e2f3fa)
2020-01-24 10:47:22 -06:00
Sergey Oblomov
91ab0e2191 SPML/UCX: fixed coverity issues
- fixed sizeof(char***) by variable datatype
- fixed resource leak in proc_add

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 8543860689)
2020-01-24 17:29:53 +02:00
Tomislav Janjusic
0daf3df384 oshmem/ucx: improves spml ucx performance for multi-threaded
applications.

Improves multi-threaded performance by adding the option to create
multiple ucx workers in threaded applications.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 3d6bf9fd8e)
2020-01-24 17:29:53 +02:00
Howard Pritchard
0b2b9d7660
Merge pull request #7325 from hppritcha/topic/pr_7304_to_v4.0.x
btl/vader: modify how the max attachment address is determined
2020-01-24 08:00:36 -07:00
Howard Pritchard
686f2debda
Merge pull request #7327 from janjust/v4.0.x-oshmem-perf-progress
v4.0.x: oshmem/ucx: Fix progress in iput/iget: periodically poke progress to prevent hardware stalls when using DCT transport.
2020-01-24 07:58:54 -07:00
Howard Pritchard
0f54228535
Merge pull request #7321 from hppritcha/topic/pr_2551_to_v4.0.x
Topic/pr 7283 to v4.0.x
2020-01-23 16:36:36 -07:00
Tomislav Janjusic
9e755d3803 oshmem/ucx: Improves performance for non-blocking put/get operations.
Improves the performance when excess non-blocking operations are posted
by periodically calling progress on ucx workers.

Co-authored with:
Artem Y. Polyakov <artemp@mellanox.com>,
Manjunath Gorentla Venkata <manjunath@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 1b58e3d073)
2020-01-22 21:45:32 +02:00
Nathan Hjelm
66684bbda3 btl/vader: modify how the max attachment address is determined
This PR removes the constant defining the max attachment address and
replaces it with the largest address that shows up in /proc/self/maps.
This should address issues found on AARCH64 where the max address
may differ based on the configuration.

Since the calculated max address may differ between processes the
max address is sent as part of the modex and stored in the endpoint
data.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 728d51f9f3)
2020-01-19 15:05:41 -08:00
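
The idea in the commit above (take the largest address that appears in /proc/self/maps instead of a compile-time constant) can be illustrated with a minimal Python sketch. This is not the Open MPI C code; the function name `max_mapped_address` and the sample text are hypothetical, and the sketch assumes the usual Linux layout where each line starts with a hexadecimal `start-end` range.

```python
def max_mapped_address(maps_text: str) -> int:
    """Return the largest end address found in /proc/<pid>/maps-style text.

    Hypothetical helper for illustration; returns 0 if no range is found.
    """
    highest = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is expected to look like "start-end" in hex.
        start, sep, end = fields[0].partition("-")
        if not sep:
            continue
        highest = max(highest, int(end, 16))
    return highest


# Made-up sample resembling /proc/self/maps output (assumption, not real data).
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
7f1c2a000000-7f1c2a021000 rw-p 00000000 00:00 0
7ffd5e3f1000-7ffd5e412000 rw-p 00000000 00:00 0 [stack]
"""

print(hex(max_mapped_address(SAMPLE_MAPS)))
```

On a Linux system the same function could be fed the contents of `/proc/self/maps`. Because the result may differ between processes, the commit sends it to peers rather than assuming a shared value.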
Nathan Hjelm
a64a7c8a0a btl/vader: fix issues with xpmem registration invalidation
This commit fixes an issue discovered in the XPMEM registration cache. It
was possible for a registration to be invalidated by multiple threads
leading to a double-free situation or re-use of an invalidated registration.

This commit fixes the issue by setting the INVALID flag on a registration
when it will be deleted. The flag is set while iterating over the tree
to take advantage of the fact that a registration can not be removed
from the VMA tree by a thread while another thread is traversing the VMA
tree.

References #6524
References #7030
Closes #6534

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit f86f805be1)
2020-01-19 13:42:00 -08:00
Nathan Hjelm
76002ada84 opal: make interval tree resilient to similar intervals
There are cases where the same interval may be in the tree multiple
times. This generally isn't a problem when searching the tree but
may cause issues when attempting to delete a particular registration
from the tree. The issue is fixed by breaking a low value tie by
checking the high value then the interval data.

If the high, low, and data of a new insertion exactly matches an
existing interval then an assertion is raised.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 1145abc0b7)
2020-01-19 13:40:57 -08:00
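
The tie-breaking rule described in the commit above (compare the low value, then the high value, then the interval data, and treat an exact match as an error) can be sketched as follows. This is an illustration in Python, not the opal interval tree; `Interval`, `compare`, and `insert_sorted` are hypothetical names, an integer stands in for the interval data, and a sorted list stands in for the tree.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: int
    high: int
    data: int  # stand-in for the registration attached to the interval (assumption)


def compare(a: Interval, b: Interval) -> int:
    """Order by low, then high, then data; return -1, 0, or 1."""
    for x, y in ((a.low, b.low), (a.high, b.high), (a.data, b.data)):
        if x != y:
            return -1 if x < y else 1
    return 0


def insert_sorted(items: list, new: Interval) -> None:
    """Insert into a sorted list, rejecting an exact duplicate."""
    for existing in items:
        if compare(existing, new) == 0:
            raise AssertionError("exact duplicate interval")
    items.append(new)
    # Tuple ordering of Interval matches compare(): low, high, data.
    items.sort()
```

With a total order like this, two intervals covering the same range remain distinguishable, so a delete can locate the one specific entry it was asked to remove.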
Geoff Paulsen
629d0efa15
Merge pull request #7314 from hppritcha/topic/NEWS_v403
NEWS: update for 4.0.3
2020-01-17 12:41:05 -06:00
Geoff Paulsen
b21c475df6
Merge pull request #7313 from hppritcha/topic/version_for_4.0.3
VERSION - update for v4.0.3
2020-01-17 12:40:52 -06:00
Howard Pritchard
9d32fedd55 NEWS: update for 4.0.3
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 11:29:08 -07:00
Howard Pritchard
baf1b06c9e VERSION - update for v4.0.3
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-17 10:21:40 -07:00
Howard Pritchard
1cdcce7f89
Merge pull request #7296 from michaellass/v4.0.x-fix-dims_create
dims_create: fix calculation of factors for odd squares (v4.0.x)
2020-01-14 09:10:40 -07:00
Geoff Paulsen
3da939b124
Merge pull request #7248 from wckzhang/v4.0.x
MTL/OFI: Check threshold number of peers allowed per rank
2020-01-13 14:19:51 -06:00
Geoff Paulsen
6985a5560f
Merge pull request #7291 from gpaulsen/topic/v4.0.x/from_pr7190_7192
Topic/v4.0.x/from pr7190 7192
2020-01-13 14:03:16 -06:00
Michael Lass
ff85c82151 dims_create: fix calculation of factors for odd squares
Until now sqrt(n) was missed as a factor for odd square numbers n. This
led to suboptimal results of MPI_Dims_create for input numbers like 9,
25, 49, ... Fix the results by including sqrt(n) in the search for
factors.

Refs: #7186

Signed-off-by: Michael Lass <bevan@bi-co.net>
(cherry picked from commit 67490118ad)
2020-01-10 16:07:40 +01:00
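
To see why an odd square such as 9 can lose its sqrt(n) factor, consider a trial-division sketch in which the loop bound is either inclusive or exclusive of sqrt(n). This is a Python illustration under that assumption, not the actual MPI_Dims_create code; `prime_factors` and its `inclusive` switch are hypothetical.

```python
def prime_factors(n: int, inclusive: bool = True) -> list:
    """Factor n by trial division.

    inclusive=True tests candidates d with d*d <= n (sqrt(n) included);
    inclusive=False uses d*d < n and therefore skips d == sqrt(n).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    # Handle the factor 2 separately, then test odd candidates only.
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while (d * d <= n) if inclusive else (d * d < n):
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2
    if n > 1:
        factors.append(n)  # whatever remains is treated as a single factor
    return factors


print(prime_factors(9))                   # sqrt(9) = 3 is tested
print(prime_factors(9, inclusive=False))  # 3 is never tested
```

With the exclusive bound, 9 is reported as a single factor, which would correspond to a 9 x 1 grid rather than the more balanced 3 x 3.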
Howard Pritchard
8df9f53bdc
Merge pull request #7288 from gpaulsen/topic/v4.0.x/from_pr7191
v4.0.x: Add the missing code to check a return code
2020-01-09 18:01:01 -07:00
Geoff Paulsen
e8834629af
Merge pull request #7294 from gpaulsen/topic/v4.0.x/from_pr7183
lustre: squash some compiler warnings
2020-01-09 08:41:02 -06:00
Howard Pritchard
90cc1f1cf0
Merge pull request #7287 from hppritcha/topic/support_for_cray_fortran_v4.0.x
cray ftn: modify fortran module loc checker
2020-01-08 19:51:13 -07:00
Geoff Paulsen
c18d74bd70
Merge pull request #7282 from jsquyres/pr/v4.0.x/large-count-ddt-fixes
v4.0.x: Large count DDT fixes
2020-01-08 15:01:23 -06:00
Howard Pritchard
e06595d7f6 lustre: squash some compiler warnings
Compiling OMPI on cray systems using latest Cray compilers (clang based)
yielded some compiler warnings from ompio/lustre.  Squash these warnings.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit e66a7cef11)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 16:00:01 -05:00
Geoffroy Vallee
836ce83c9a Fix typo in comment: contiaing -> containing
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit 127573cf44)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:49:15 -05:00
Geoffroy Vallee
d59faea868 Fix a typo in comments: insertted -> inserted
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit 98de17c6da)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:48:40 -05:00
Geoffroy Vallee
a479beeeae Add the missing code to check a return code
Signed-off-by: Geoffroy Vallee <geoffroy.vallee@gmail.com>
(cherry picked from commit de6f130b4a)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2020-01-08 15:27:38 -05:00
Howard Pritchard
dc8246b8ae
Merge pull request #7279 from jsquyres/pr/v4.0.x/rpm-specfile-fix
v4.0.x: openmpi.spec: update modulefile_path behavior
2020-01-08 12:27:37 -07:00
Howard Pritchard
9582b76168 cray ftn: modify fortran module loc checker
to support the Cray Fortran compiler.  The Cray Fortran compiler does not
contain all symbol info in the module file, so the check has to link with the *.o
created as part of module file compilation.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 441bad9a75)
2020-01-08 13:20:02 -06:00
George Bosilca
9330dc2a42 Swap the 2 fields to maintain the size of the struct.
Thanks @devreal for catching this.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 3de636dc6f)
2020-01-07 15:13:14 -08:00
George Bosilca
a1b4e697f5 Prevent overflow when dealing with datatype count.
This patch fixes #7147 by preventing overflow when multiplying
the count and the blocklen. The count reflects MPI count and is
therefore bound to the size of an int (it is a uint32_t) while the
blocklen can be merged together to represent the largest contiguous
memory layout and it is therefore promoted to a size_t.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 59fb02618e)
2020-01-07 15:13:14 -08:00
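
The overflow described above can be illustrated numerically. Python integers do not overflow, so this sketch emulates fixed-width arithmetic with bit masks; the function names are hypothetical, and a 64-bit size_t is an assumption about the platform rather than something the commit states.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF  # assumes size_t is 64 bits wide


def product_32bit(count: int, blocklen: int) -> int:
    """Product truncated to 32 bits, as if computed in a uint32_t."""
    return (count * blocklen) & UINT32_MASK


def product_widened(count: int, blocklen: int) -> int:
    """Product kept in 64 bits, as if blocklen had been promoted to size_t."""
    return (count * blocklen) & UINT64_MASK


count, blocklen = 70000, 70000
print(product_32bit(count, blocklen))    # wraps around 2**32
print(product_widened(count, blocklen))  # full value is preserved
```

Each operand here fits comfortably in 32 bits, yet the product does not, which is why widening one operand before the multiplication matters.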
Jeff Squyres
a183bb019d openmpi.spec: update modulefile_path behavior
Allow the user to override the modulefile_path (root directory to
install the Open MPI modulefile), even if install_in_opt==1.  For
example:

rpmbuild \
    --rebuild \
    --define 'install_in_opt 1' \
    --define 'modulefile_path /path/to/my/modulefiles/openmpi/%{version}' \
    openmpi-4.0.2-1.src.rpm

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 352e575e18)
2020-01-07 14:54:09 -08:00
Geoff Paulsen
93c879962e
Merge pull request #7168 from wbailey2/pr/fix-yield_when_idle
v4.0.x: schizo/ompi: correctly handle the yield_when_idle option
2020-01-03 14:06:36 -06:00
Geoff Paulsen
e561f2aa69
Merge pull request #7269 from hppritcha/topic/pr7238_to_v4.0.x
mtl/ofi: ignore case when comparing provider names
2020-01-03 13:36:57 -06:00
Robert Wespetal
47c435e531 mtl/ofi: ignore case when comparing provider names
Change the provider include and exclude list name comparison check to
ignore case. The UDP provider's name is uppercase and was being selected
despite being in the exclude list.

Signed-off-by: Robert Wespetal <wesper@amazon.com>
(cherry picked from commit 9b72e9465d)
2020-01-03 08:52:24 -08:00
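
The effect of the case-insensitive comparison can be shown with a small Python sketch. It is not the mtl/ofi code: `provider_in_list` is a hypothetical helper, and the list contents are made-up examples apart from the uppercase UDP name mentioned in the commit.

```python
def provider_in_list(provider: str, names: list) -> bool:
    """Return True if provider matches any listed name, ignoring case."""
    wanted = provider.casefold()
    return any(wanted == name.casefold() for name in names)


exclude = ["udp", "sockets"]  # example exclude list (assumption)

print("UDP" in exclude)                   # case-sensitive membership test
print(provider_in_list("UDP", exclude))   # case-insensitive comparison
```

A case-sensitive test treats "UDP" and "udp" as different names, which matches the symptom in the commit: the provider was selected even though the exclude list named it.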
Howard Pritchard
6a739f8357
Merge pull request #7243 from jsquyres/pr/v4.0.x/neighbor-alltoall-fix
v4.0.x: neighbor alltoall fix
2019-12-23 13:40:28 -07:00
Howard Pritchard
67235b7906
Merge pull request #7250 from janjust/v4.0.x-oshmem_atomic_set_fix
V4.0.x oshmem atomic set fix
2019-12-20 06:44:02 -07:00
Howard Pritchard
4c1cd3faf7
Merge pull request #7254 from jsquyres/pr/v4.0.x/hwloc-readme-clarifications
v4.0.x: hwloc: clarify --with-hwloc behavior
2019-12-20 06:41:11 -07:00
Aravind Gopalakrishnan
1bee429a8d MTL/OFI: Check threshold number of peers allowed per rank
When the provider does not support FI_REMOTE_CQ_DATA, the OFI tag does not have
sizeof(int) bits for the rank. Therefore, unexpected behavior will occur when
this limit is crossed.

Check the max allowed number of ranks during add_procs() and return if there is
danger of exceeding this threshold.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 5cf43de445)
2019-12-19 22:36:43 +00:00
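
The threshold check can be sketched as a simple bound on how many ranks fit in the bits reserved for the rank. This Python illustration is hypothetical throughout: `rank_bits`, `max_ranks`, and `can_add_procs` are invented names, and the bit widths used are arbitrary, since the commit does not state the actual tag layout.

```python
def max_ranks(rank_bits: int) -> int:
    """Number of distinct ranks representable in rank_bits bits."""
    return 1 << rank_bits


def can_add_procs(current: int, new: int, rank_bits: int) -> bool:
    """Return False if adding `new` peers would exceed the representable ranks."""
    return current + new <= max_ranks(rank_bits)


# With 4 bits (an arbitrary illustrative width), at most 16 ranks fit.
print(can_add_procs(10, 5, rank_bits=4))
print(can_add_procs(10, 7, rank_bits=4))
```

Failing early in this way replaces the unexpected behavior that would otherwise appear once a rank no longer fits in the tag.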
Jeff Squyres
814f3e9caa hwloc: clarify --with-hwloc behavior
Clarify in README what --with-hwloc does in its different use cases.

Also, ensure that the behavior when specifying `--with-hwloc` is the
same as if that option is not specified at all.  This is what we did
in Open MPI <= v3.x; looks like we inadvertently caused `--with-hwloc`
to be synonymous with `--with-hwloc=external` in v4.0.0.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 18c3e1af5e)
2019-12-19 12:30:40 -08:00
Tomislav Janjusic
5489bc081f oshmem/extended: Fix shmem_atomic_set for float and double.
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 2d8f9b1d09)
2019-12-19 21:20:36 +02:00
Tomislav Janjusic
ae30df4bae oshmem/ucx: fixed a build issue
Co-authored with: Artem Polyakov <artemp@mellanox.com>

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit cb5ff55b27)
2019-12-19 21:20:05 +02:00
Howard Pritchard
f6914ee35c
Merge pull request #7229 from tkordenbrock/topic/v4.0.x/portals4.fix.flowcontrol.bugs
v4.0.x: portals4: fix flow control bugs
2019-12-18 08:35:46 -07:00