30126 Commits

Author SHA1 Message Date
Raghu Raja
9ac5471035
Merge pull request #8300 from jsquyres/pr/v4.1.0-final-final-final
v4.1.0: README and VERSION final updates
2020-12-18 11:11:33 -08:00
Jeff Squyres
adb29bb328 v4.1.0: README, VERSION, and LICENSE final updates
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-12-18 09:03:29 -08:00
Raghu Raja
7de39931a3
Merge pull request #8297 from edgargabriel/pr/v4.1.x-ompio-sync
ompio: resync v4.1 branch to master
2020-12-18 08:40:08 -08:00
Jeff Squyres
b6c0baccdd NEWS: OMPIO is now the default everywhere
Huzzah!

Signed-off-by: Jeff Squyres <jeff@squyres.com>
2020-12-18 10:19:19 -05:00
Edgar Gabriel
ff130e7a9c ompio: resync v4.1 branch to master
This commit syncs the ompio-related directories in v4.1.x to master. Bringing over the Lustre performance fixes and the support for the external32 data representation one PR at a time proved too overwhelming.

A few minor modifications had to be made for the sync:
 - v4.1.x does not have opal/mutex.h
 - v4.1.x does not have opal_atomic_int32_t datatype
 - the io module structure has two fewer function pointers (related to info_set/get) compared to the version on master.

Tested so far with the ompio test suite as well as the hdf5-1.10.5 test suite (testphdf5, t_shapesame, t_bigio) on an XFS file system.
More tests on Lustre and BeeGFS to follow.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2020-12-17 17:44:26 -06:00
Raghu Raja
7fd4f3261d
Merge pull request #8289 from rhc54/cmr41/slurm
v4.1.x: Update Slurm launch support
2020-12-15 12:11:04 -08:00
Ralph Castain
7b138ec6d9
Update Slurm launch support
Assign all CPUs on the node to the daemon

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 7bac7eed6ef423e47fe980b4c32eae36b8e1d4cb)
2020-12-15 08:26:33 -08:00
Raghu Raja
c65e9cbfb7
Merge pull request #8277 from rajachan/4.1.0rc5-version
VERSION: 4.1.0rc5
2020-12-08 17:14:17 -08:00
Raghu Raja
fbc711e687 VERSION: 4.1.0rc5
Updating VERSION and NEWS for the 4.1.0rc5 release.

Signed-off-by: Raghu Raja <craghun@amazon.com>
2020-12-08 14:56:25 -08:00
Jeff Squyres
cf1705298c
Merge pull request #8273 from rhc54/cmr41/pmix322
v4.1.x: Update PMIx to v3.2.2
2020-12-08 10:03:36 -05:00
Jeff Squyres
91b81d9966
Merge pull request #8275 from wzamazon/v4.1.x_pmix_callback_wmb
[v4.1.x] ompi : add memory barrier in PMIx registration callback
2020-12-08 10:02:15 -05:00
Wei Zhang
6760d531d5 [v4.1.x] ompi : add memory barrier in PMIx registration callback
PMIx registration callback functions are used when registering a PMIx
event handler.

This patch adjusts two such callback functions:

    model_registration_callback()
         in ompi/interlib/interlib.c and

    ompi_errhandler_registration_callback()
         in ompi/errhandler/errhandler.c

Both of them employ the following code structure:

static void xxx_callback(int status,
                         size_t errhandler_ref,
                         void *cbdata)
{
    myreg_t *trk = (myreg_t*)cbdata;

    trk->status = status;
    interlibhandler_id = errhandler_ref;
    trk->active = false;
}

The workflow is:

1. The caller calls opal_pmix.register_evhandler() with the
   callback function as an argument.
2. The caller calls OMPI_LAZY_WAIT_FOR_COMPLETION(trk.active)
   to wait for trk->active to become false.
3. PMIx does the registration on another thread, then calls the
   registration callback function, which sets trk->active
   to false.
4. The caller checks trk->status to determine whether registration
   succeeded.

The expected behavior of the registration callback functions therefore
is that trk->status be updated first, then trk->active be set to false.

However, on ARM based systems, the expected behavior is not guaranteed
because ARM uses a relaxed memory model.

To address this issue, this patch adds a call to opal_atomic_wmb()
(write memory barrier) after trk->status is set, to achieve the
expected behavior.
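
The ordering requirement above can be sketched in portable C11. This is
only an illustration, not the code from the patch: reg_tracker_t,
registration_callback() and wait_for_registration() are hypothetical
names, and atomic_thread_fence() stands in for opal_atomic_wmb(). The
acquire fence on the waiting side belongs to this sketch and is not
something the patch describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tracker object passed as cbdata. */
typedef struct {
    int status;          /* plain data, written before the flag is cleared */
    atomic_bool active;  /* true while the registration is still pending */
} reg_tracker_t;

/* Writer side: runs on the thread that completes the registration. */
void registration_callback(int status, void *cbdata)
{
    reg_tracker_t *trk = (reg_tracker_t *)cbdata;
    assert(trk != NULL);

    trk->status = status;
    /* Release fence: plays the role of opal_atomic_wmb() in the patch,
     * so the status write cannot become visible after the flag write. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&trk->active, false, memory_order_relaxed);
}

/* Reader side: spin until the flag clears, then read the status. */
int wait_for_registration(reg_tracker_t *trk)
{
    while (atomic_load_explicit(&trk->active, memory_order_relaxed)) {
        /* busy-wait; a real caller would make progress or yield here */
    }
    /* Acquire fence pairs with the release fence in the callback. */
    atomic_thread_fence(memory_order_acquire);
    return trk->status;
}
```

Without the release fence, a weakly ordered CPU may make the flag write
visible before the status write, so the waiter could read a stale status.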

Signed-off-by: Wei Zhang <wzam@amazon.com>
2020-12-08 01:58:05 +00:00
Ralph Castain
da9ebdac42
Update PMIx to v3.2.2
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-12-07 13:25:39 -08:00
Jeff Squyres
d472f5a40f
Merge pull request #8265 from cpshereda/v4.1.x
v4.1.x: Fixed uninitialized memory access bug in base64 encoding
2020-12-07 15:35:39 -05:00
Jeff Squyres
9970e00188
Merge pull request #8254 from gleon99/v4.1.x
Replace usage of the deprecated NB API of UCX with NBX
2020-12-07 15:33:05 -05:00
Charles Shereda
97b28732e4 Fixed uninitialized memory access bug in base64 encoding.
Signed-off-by: Charles Shereda <cpshereda@lanl.gov>
2020-12-03 10:31:29 -07:00
Jeff Squyres
ab3fc05101
Merge pull request #8256 from rhc54/cmr41/fix
v4.1.x: Fix the verbose output in ess base
2020-11-26 09:26:14 -05:00
Ralph Castain
09e9fe0178
Fix the verbose output in ess base
Only get the locality string and output binding message when requested

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-11-25 14:24:47 -08:00
Raghu Raja
6e2c8cfcba
Merge pull request #8255 from jsquyres/pr/v4.1.x/fix-missed-warnings
v4.1.x: Fix missed compiler warnings
2020-11-25 13:47:39 -08:00
Brian Barrett
8324b4e969 opal: Disable memory patcher component on MacOS
Open MPI doesn't support any transports on MacOS which require
memory manager hooks.  The memory patcher component uses the
syscall interface, which has been deprecated in recent versions
of MacOS.  Since we don't need it and it emits warnings about
deprecation, disable the memory patcher component on MacOS.

Fixes #5671

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 19e16d5fd0e3bc148b47d957b9b84a425c87777c)
2020-11-25 15:18:01 -05:00
Brian Barrett
f566613c5d opal: Remove outdated MacOS workaround
Remove the pack/unpack pragma around net/if.h on MacOS, which
was added to fix a bug in MacOS X 10.4.x on 64-bit platforms.
The bug was fixed in Mac OS X 10.5.0 and, sometime in the last
11 years, compilers started emitting warnings about the fact
that the Apple header stomped over the pragma pack settings
from the workaround.  We already don't support versions of MacOS
earlier than 10.5, so there's no point in keeping the workaround.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit a25df3f29e213c5ef094d66082b0e07e9d5a0759)
2020-11-25 15:09:46 -05:00
Jeff Squyres
529accb619 coll/base: fix compiler warnings
Add some "const"s that needed to be applied here on the v4.1.x branch,
effectively by cherry-picking part of b65ec273074 from master.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-11-25 15:09:46 -05:00
Leonid Genkin
0a819bff1a Replace usage of the deprecated NB API of UCX with NBX
Signed-off-by: Leonid Genkin <lgenkin@nvidia.com>
(cherry picked from commit 7f9a305a64f97e9611bc11c4c9db0161b9a02938)
2020-11-25 16:42:05 +02:00
Jeff Squyres
0ae14c06ee
Merge pull request #8251 from devreal/fix-han-commselect-new-v4.1.x
v4.1.x: coll/han: fix coll preference selection in mca_coll_han_comm_create_new
2020-11-24 09:22:26 -05:00
Joseph Schuchart
576db786af coll/han: fix coll preference selection in mca_coll_han_comm_create_new
Exclude HAN, don't include it.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit 33105b031bbc821a6c5d816c4801d62072347f9c)
2020-11-24 10:11:17 +01:00
Raghu Raja
25161a0f08
Merge pull request #8247 from jsquyres/pr/v4.1.x/4.1.0rc4-ftw
VERSION: 4.1.0rc4
2020-11-23 21:03:05 -08:00
Raghu Raja
38011d3402
Merge pull request #8204 from jsquyres/pr/v4.1.x/fix-warnings
v4.1.x: fix many warnings
2020-11-23 17:28:44 -08:00
Jeff Squyres
772df607cc VERSION: 4.1.0rc4
Release the hounds!

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-11-23 13:40:32 -08:00
Jeff Squyres
f9e2bf7c6b Fix many compiler warnings
Fixes #8195. This PR doesn't fix all the warnings from #8195, but
fixes many of them (e.g., I didn't get the "string might be truncated"
warnings on my Mac).

This is an adaptation of 14aa5fae3c42f14a1c6a259dede93d5ca7ecb82c from
master; it drops some things that aren't relevant here on the v4.1.x
branch and adds a few more warnings fixes that are relevant here on
v4.1.x that aren't relevant on master.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry-picked from 14aa5fae3c42f14a1c6a259dede93d5ca7ecb82c)
2020-11-23 12:43:33 -08:00
Raghu Raja
d09771c5ba
Merge pull request #8241 from ggouaillardet/topic/v4.1.x/libtool_bigsur
v4.1.x: autogen.pl: patch libtool.m4 for OSX Big Sur
2020-11-23 12:08:36 -08:00
Raghu Raja
390045e5b2
Merge pull request #8240 from ggouaillardet/topic/v4.1.x/reproducibility_fixes
v4.1.x: configury reproducibility fixes
2020-11-23 12:07:18 -08:00
Jeff Squyres
2c91509bcb
Merge pull request #8238 from devreal/osc-page-align-v4.1.x
OSC RDMA: put memory for each process into separate pages [4.1.x]
2020-11-23 15:06:38 -05:00
Gilles Gouaillardet
534aeac1f9 autogen.pl: patch libtool.m4 for OSX Big Sur
Thanks FX Coudert for reporting this issue and pointing
to a solution.

Refs. open-mpi/ompi#8218

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(back-ported from commit open-mpi/ompi@3f45ceda1b)
2020-11-23 09:43:55 +09:00
Gilles Gouaillardet
c28e16633a configury: fix typos
This is a one-off commit for the release branches that fixes
some typos introduced when backporting
open-mpi/ompi@35e7d86eb1

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-11-22 21:07:42 +09:00
Gilles Gouaillardet
33aa6394d9 configury: fix OPAL_GET_VERSION
- fix path to getdate.sh
- do not prepend "date" to the revision
- support git worktree

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 930d3c469551eaa4d30b6105226018e0392152d7)
2020-11-22 21:07:42 +09:00
Joseph Schuchart
2e1e9dc9dd OSC RDMA: only touch pages before memory registration, don't fill them
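
One way to read "touch, don't fill" is to write a single byte per page
rather than clearing the whole buffer. The sketch below assumes that
reading; touch_pages() is a hypothetical helper, not the function used in
osc/rdma, and it assumes a freshly allocated buffer whose contents do not
matter yet.

```c
#include <assert.h>
#include <stddef.h>

/* Write one byte in every page of [base, base + len) so the OS backs the
 * range with physical pages, without filling the whole buffer.
 * Returns the number of pages touched. */
size_t touch_pages(unsigned char *base, size_t len, size_t page_size)
{
    assert(page_size > 0);

    /* volatile keeps the compiler from dropping the stores */
    volatile unsigned char *p = base;
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page_size) {
        p[off] = 0;
        touched++;
    }
    return touched;
}
```

Compared with a memset() over the full range, this does one store per page
instead of one per byte.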
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 52b52b8ebbe82636b65f0adc6a0a40c165eda306)
2020-11-20 17:28:38 +01:00
Joseph Schuchart
de354eae9d OSC RDMA: put memory of each process into separate pages
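
A rough sketch of the idea, assuming each process's segment is rounded up
to a whole number of pages so that no two processes share a page. The
helpers round_up_to_page() and layout_per_process() are hypothetical and
are not taken from the osc/rdma source.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of page_size. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return (size + page_size - 1) / page_size * page_size;
}

/* Compute a page-aligned offset for each process's segment.
 * offsets[i] receives the start of process i's memory; the return value
 * is the total size of the padded region. */
size_t layout_per_process(const size_t sizes[], size_t offsets[],
                          size_t nprocs, size_t page_size)
{
    size_t total = 0;

    for (size_t i = 0; i < nprocs; i++) {
        offsets[i] = total;
        total += round_up_to_page(sizes[i], page_size);
    }
    return total;
}
```

For example, with 4096-byte pages, segments of 100, 4096 and 5000 bytes
would start at offsets 0, 4096 and 8192.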
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit d11ccbada945ad88916052b80cb5b5fcdf742e4a)
2020-11-20 17:27:17 +01:00
Jeff Squyres
4c0c0e9bcb
Merge pull request #8237 from devreal/fix-coll-base-preference-v4.1.x
Fix preference treatment in coll/base [v4.1.x]
2020-11-20 11:16:30 -05:00
Joseph Schuchart
2acf40cc5b coll/han: reduce default segment size for reduce/allreduce to 64k
This has been shown to be more effective in achieving overlap
of inter- and intra-node communication and reduces the initial
delay before hitting the network.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit 1cdc85564ed6c771f301c63d6bc6d8c1c8cf4a4c)
2020-11-20 09:18:51 +01:00
Joseph Schuchart
9a202ea81a coll/han: remove references to experimental solo and shared collective components
Also make coll/tuned the default for shared memory communication
as coll/sm has shown performance issues that need investigation.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit 971d58c52454a6edecdbb1a44ebd037a86e69a69)
2020-11-20 09:18:50 +01:00
Joseph Schuchart
bcf70a2840 coll/[sm|han|adapt]: don't disqualify on priority 0
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit 09c2f4af9437accd747e823c591927481c2103ad)
2020-11-20 09:18:49 +01:00
Joseph Schuchart
9f228c9dab coll/base: Fix collective module selection preference treatment
The selectable list is sorted with lowest to highest priority so the
user-defined preferences should be appended to the list.
The preference treatment should also maintain the order provided by the user
(first item has highest priority) so switch the loop order.
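
The ordering described above can be sketched on a plain array of names.
This is an illustration under assumptions, not the coll/base code:
apply_preferences() and move_to_tail() are hypothetical, and the list is
an array of strings where the last entry has the highest priority.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Move the entry at index idx to the end of the list, shifting the
 * entries after it one slot to the left. */
void move_to_tail(const char *list[], size_t n, size_t idx)
{
    assert(idx < n);
    const char *item = list[idx];
    memmove(&list[idx], &list[idx + 1], (n - idx - 1) * sizeof(list[0]));
    list[n - 1] = item;
}

/* selectable[] is sorted from lowest to highest priority.
 * prefs[] is the user's list, first item = highest priority.
 * Walking prefs[] backwards and appending each match leaves prefs[0]
 * at the very end, i.e. with the highest priority. */
void apply_preferences(const char *selectable[], size_t n,
                       const char *prefs[], size_t nprefs)
{
    for (size_t p = nprefs; p-- > 0; ) {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(selectable[i], prefs[p]) == 0) {
                move_to_tail(selectable, n, i);
                break;
            }
        }
    }
}
```

Walking the user's list front to back instead would leave the last
preference at the tail, which inverts the order the user asked for.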

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit dd54af94508dc9ccee3e589276a9ede62fc8e409)
2020-11-20 09:18:48 +01:00
Jeff Squyres
35f8fbc39d
Merge pull request #8235 from rhc54/cmr41/px
Remove PMIx man page setup
2020-11-19 21:00:58 -05:00
Ralph Castain
0a0a15ab60
Remove PMIx man page setup
There are no manpages in v3.2.
Port of https://github.com/openpmix/openpmix/pull/1930

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 7b11693783429c43cb30475e4b54e691bf79529c)
2020-11-19 16:44:22 -08:00
Raghu Raja
ab530bf4d1
Merge pull request #8228 from jsquyres/pr/pak-lui-v4.1.x-fixup
v4.1.x: oshmem/tools/oshmem_info: fix an issue with fortran keyword when comp…
2020-11-17 15:18:36 -08:00
Pak Lui
870c2d7738 oshmem/tools/oshmem_info: fix an issue with fortran keyword when compiling param.c
Signed-off-by: Pak Lui <pak.lui@amd.com>
(cherry picked from commit 3cdead0d0cd2ec1ac7d87e0bf4bb0f949e6ef132)
2020-11-17 15:17:28 -05:00
Raghu Raja
3d422d1afa
Merge pull request #8224 from devreal/fix-tuned-allgatherv-v4.1.x
COLL TUNED: Use per-rank data size instead of total size for decision [4.1.x]
2020-11-17 09:50:15 -08:00
Joseph Schuchart
b299b491d3 COLL TUNED: Use per-rank data size instead of total size for decision
The total size depends on the number of ranks, so the usual ranges don't work.
Thus, use the average across all ranks to make a decision.
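
A small sketch of the averaging step, under assumptions: the function
names and the 1024-byte threshold below are hypothetical and are not the
values used by coll/tuned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-off between "small" and "large" messages, in bytes. */
#define SMALL_MESSAGE_LIMIT 1024

/* Average number of bytes contributed per rank, given per-rank element
 * counts (as in an allgatherv-style call) and the datatype size. */
size_t average_bytes_per_rank(const int recvcounts[], int comm_size,
                              size_t type_size)
{
    assert(comm_size > 0);
    size_t total = 0;

    for (int i = 0; i < comm_size; i++) {
        total += (size_t)recvcounts[i] * type_size;
    }
    return total / (size_t)comm_size;
}

/* Decide on the per-rank average, so the answer does not change just
 * because more ranks joined the communicator. */
int is_small_message(const int recvcounts[], int comm_size, size_t type_size)
{
    return average_bytes_per_rank(recvcounts, comm_size, type_size)
           < SMALL_MESSAGE_LIMIT;
}
```

With three ranks sending 100, 200 and 300 four-byte elements, the total is
2400 bytes but the per-rank average is 800 bytes, so the message still
counts as small under this threshold.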

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
(cherry picked from commit f670364d764bf7409e03860bf539a0a2884ffab3)
2020-11-17 17:05:42 +01:00
Jeff Squyres
c614c54818
Merge pull request #8216 from rhc54/cmr41/rmps
v4.1.x: Correctly skip the "mpirun" node when launching orted on it
2020-11-16 15:26:34 -05:00
Jeff Squyres
ac2f54f224
Merge pull request #8190 from hoopoepg/topic/pml-ucx-recv-improved-errhandling-v4.1
PML/UCX: improved error processing in MPI_Recv - v4.1
2020-11-16 15:25:34 -05:00