1
1
Граф коммитов

30340 Коммитов

Автор SHA1 Сообщение Дата
Austen Lauria
14785deb3c Fix pgcc18 support.
- pgcc18 defines __GNUC__ similar to Intel compilers. So we must
  check for pgi higher up, or else configury will mistake
  it for gcc.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-02-11 15:55:42 -05:00
Nathan Hjelm
5025628651
Merge pull request #7383 from hjelmn/fix_bug_7303_the_rcache_deadlock_v3
rcache/grdma: fix potential deadlock
2020-02-11 08:36:02 -08:00
Nathan Hjelm
14b6f4931f rcache/grdma: fix potential deadlock
This commit fixes a potential deadlock that can occur between the
memory hooks and region registation. This deadlock occurs because
of a hold and wait error between two mutexes. The first mutex is
the VMA lock used to protect internal rcache/grdma structures and
the reader/writer lock in the interval tree.

In the case of the memory hooks a reader lock is obtained on the
interval tree then the VMA lock is obtained to remove the
registration from the LRU. In the case of LRU evictions the VMA
lock is obtained then the writer lock on the interval tree is
obtained. This leads to the deadlock.

To fix the issue the code that evicts from the LRU has been
updated to only invalidate the registration while the VMA lock
is held then remove the registration from the VMA after the
lock is released. This should completely eliminate the above
deadlock.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-10 17:15:36 -07:00
Ralph Castain
c6831c59c8
Merge pull request #7379 from rhc54/topic/up
Add "stop-on-exec" support and update hwloc integration
2020-02-10 13:36:55 -08:00
Ralph Castain
42fe66ff0b
Add "stop-on-exec" support and update hwloc integration
Update PRRTE submodule pointer to track changes in master that impact
OMPI behavior plus provide a new capability

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-10 11:36:18 -08:00
Ralph Castain
12dd62b59a
Merge pull request #7375 from rhc54/topic/up
Update PMIx and PRRTE pointers
2020-02-09 10:17:29 -08:00
Ralph Castain
2f7338f682
Update PMIx and PRRTE pointers
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-09 07:43:15 -08:00
Yossi Itigin
f2c5b2d778 Merge pull request #7373 from hoopoepg/topic/oshmem-inc-max-segments
OSHMEM/SEGMENTS: increase number of max segments
2020-02-09 11:27:21 +02:00
Ralph Castain
3224329976
Merge pull request #7202 from rhc54/topic/pmixstat
Remove ORTE project
2020-02-08 12:25:01 -08:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Setup opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
Sergey Oblomov
f742f289ea OSHMEM/SEGMENTS: increase number of max segments
- increase number of max segments to allow application be launched
  on some Ubuntu configurations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-02-07 15:20:58 +02:00
Jeff Squyres
dc0d6a5e1b
Merge pull request #7366 from bgoglin/master
Fix hwloc <v2.0 compilation issue
2020-02-05 13:33:01 -05:00
Brice Goglin
329d4451a6 opal/hwloc: remove some unused variables when building with hwloc < 1.7
Refs #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2020-02-04 22:56:46 +01:00
Brice Goglin
907ad854b4 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 1.11
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2020-02-04 22:56:46 +01:00
Austen Lauria
395e2c9d8f
Merge pull request #7364 from awlauria/fix_compile
Protect use of _Static_assert().
2020-02-04 15:24:07 -05:00
Austen Lauria
824dbcbcf3 Protect use of _Static_assert().
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-02-04 13:46:58 -05:00
Jeff Squyres
827e9f66af
Merge pull request #7265 from jsquyres/pr/fortran-you-win-again
Fortran fixes
2020-02-04 07:03:02 -05:00
Yossi Itigin
1047e28a28
Merge pull request #7353 from dmitrygx/topic/spml/ucx
SPML/UCX: Fix compilation warnings with GCC
2020-02-04 10:38:37 +02:00
Ralph Castain
fd2de25609
Merge pull request #7358 from rhc54/topic/opt
Always consider retrieval of HOSTNAME to be optional
2020-02-03 18:59:36 -08:00
Ralph Castain
2ee1aaf08d
Always consider retrieval of HOSTNAME to be optional
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-03 16:03:05 -08:00
Jeff Squyres
ab398f4b9a fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-02-03 14:45:32 -08:00
Jeff Squyres
f4a47a5a8e fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-03 14:45:25 -08:00
Howard Pritchard
d2b68e6ecd
Merge pull request #7201 from bgoglin/master
hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 2.0
2020-02-03 11:23:42 -07:00
Howard Pritchard
9916b9124e
Merge pull request #7345 from hppritcha/topic/follow_up_pr7268
PR 7268 follow-up
2020-02-03 10:42:47 -07:00
Dmitry Gladkov
b6a658b24a SPML/UCX: Fix compilation warnings with GCC
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
2020-02-03 05:11:49 -08:00
Jeff Squyres
dfbaf23241
Merge pull request #7210 from MaskRay/mpi_fortran_argv_null_
Fix Fortran st_size fields of mpi_fortran_argv_null_, mpi_fortran_weights_empty_, mpi_fortran_unweighted_, mpi_fortran_errcodes_ignore_, and mpi_fortran_argvs_null_
2020-02-02 16:56:44 -05:00
Fangrui Song
5609268e90 Make C and Fortran types for MPI sentinels agree in size
Fix the C types for the following:

* MPI_UNWEIGHTED
* MPI_WEIGHTS_EMPTY
* MPI_ARGV_NULL
* MPI_ARGVS_NULL
* MPI_ERRCODES_IGNORE

There is lengthy discussion on
https://github.com/open-mpi/ompi/pull/7210 describing the issue; the
gist of it is that the C and Fortran types for several MPI global
sentenial values should agree (specifically: their sizes must(**)
agree).  We erroneously had several of these array-like sentinel
values be "array-like" values in C.  E.g., MPI_ERRCODES_IGNORE was an
(int *) in C while its corresponding Fortran type was "integer,
dimension(1)".  On a 64 bit platform, this resulted in C expecting the
symbol size to be sizeof(int*)==8 while Fortran expected the symbol
size to be sizeof(INTEGER, DIMENSION(1))==4.

That is incorrect -- the corresponding C type needed to be (int).
Then both C and Fortran expect the size of the symbol to be the same.

(**) NOTE: This code has been wrong for years.  This mismatch of types
typically worked because, due to Fortran's call-by-reference
semantics, Open MPI was comparing the *addresses* of these instances,
not their *types* (or sizes) -- so even if C expected the size of the
symbol to be X and Fortran expected the size of the symbol to be Y
(where X!=Y), all we really checked at run time was that the addresses
of the symbols were the same.  But it caused linker warning messages,
and even caused errors in some cases.

Specifically: due to a GNU ld bug
(https://sourceware.org/bugzilla/show_bug.cgi?id=25236), the 5 common
symbols are incorrectly versioned VER_NDX_LOCAL because their
definitions in Fortran sources have smaller st_size than those in
libmpi.so.

This makes the Fortran library not linkable with lld in distributions
that ship openmpi built with -Wl,--version-script
(https://bugs.llvm.org/show_bug.cgi?id=43748):

  % mpifort -fuse-ld=lld /dev/null
  ld.lld: error: corrupt input file: version definition index 0 for symbol
  mpi_fortran_argv_null_ is out of bounds
  >>> defined in /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_usempif08.so
  ...

If we fix the C and Fortran symbols to actually be the same size, the
problem goes away and the GNU ld bug does not come into play.

This commit also fixes a minor issue that MPI_UNWEIGHTED and
MPI_WEIGHTS_EMPTY were not declared as Fortran arrays (not fully fixed
by commit 107c0073dd).

Fixes open-mpi/ompi#7209

Signed-off-by: Fangrui Song <i@maskray.me>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-02 04:20:39 -08:00
Jeff Squyres
48c04cd637
Merge pull request #7350 from artemry-mlnx/artemry/enable_mlnx_ci_for_release_branches
Enabled Mellanox CI for release branches (changes for master branch).
2020-01-31 15:58:52 -05:00
Artem Ryabov
67b70ae9bd Enabled Mellanox CI for release branches.
Signed-off-by: Artem Ryabov <artemry@mellanox.com>
2020-01-31 21:57:03 +03:00
Howard Pritchard
5354b9b41c PR 7268 follow-up
Forgot to include a fix for the fortran test used to check if
new dtags is supported.

Related to #7268

This patch is already included on v4.0.x branch.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2020-01-28 12:34:49 -07:00
Austen Lauria
1275766037
Merge pull request #7207 from devreal/remove_shmem_seg_hdr
Remove unused opal_shmem_seg_hdr_t to retain alignment
2020-01-28 13:57:55 -05:00
Aurelien Bouteiller
9f4365fef6
Merge pull request #6783 from abouteiller/export/macos-epipe
Prevent EPIPE on OSX.
2020-01-28 11:18:46 -05:00
Austen Lauria
3c0fa537c2
Merge pull request #7113 from devreal/osc_rdma_cas
RDMA osc: perform CAS in shared memory if possible
2020-01-28 10:59:17 -05:00
Brian Barrett
d768d82231
Merge pull request #7167 from wckzhang/reachable_netlinks
Reachable documentation change
2020-01-27 15:39:14 -08:00
Brian Barrett
fc8c7a5869
Merge pull request #7134 from wckzhang/btl_tcp_interface_match
btl tcp: Use reachability and graph solving for global interface matching
2020-01-27 15:38:49 -08:00
Austen Lauria
10f6a77640
Merge pull request #7315 from abouteiller/export/tcp_errors_v2
Handle error cases in TCP BTL (v2)
2020-01-27 17:03:07 -05:00
Aurelien Bouteiller
76021e35ee
Adding a description of the FIN message for future reference.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Aurelien Bouteiller
395fcc4253
Disable inband PML error reporting during MPI Finalize as it interferes with the Finalize process.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Aurelien Bouteiller
93846fd0ee
Remove the pending event when socket is TCP_FAILED
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Aurelien Bouteiller
6b3be224d4
Adding a FIN message to differentiate normal TCP closing from failures
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:32:34 -05:00
Aurelien Bouteiller
b7be64482a
Revert "Revert "Handle error cases in TCP BTL""
This reverts commit 5162011428.

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-01-27 13:31:06 -05:00
Austen Lauria
969eb0286c
Merge pull request #6970 from ggouaillardet/topic/cuda_no_orte
coll/cuda: remove unnecessary references to ORTE
2020-01-27 13:12:32 -05:00
Austen Lauria
4f6978466d
Merge pull request #7284 from bosilca/fix/monitoring_registration
Minor cleanup in the monitoring PML.
2020-01-27 13:01:30 -05:00
Josh Hursey
01a671349b
Merge pull request #7337 from jjhursey/no-ssh-core
plm/rsh: Fix segv on missing agent.
2020-01-27 10:32:41 -06:00
Joshua Hursey
62d0058738 plm/rsh: Fix segv on missing agent.
* Additionally, fixes the `NULL` option to `OMPI_MCA_plm_rsh_agent`
   would would also lead to a segv. Now it operates as intended by
   disqualifying the `rsh` component and falling back onto the `isolated`
   component.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-01-24 14:02:53 -05:00
Howard Pritchard
e9a54e8e0e
Merge pull request #7268 from hppritcha/topic/fix_6539_master
make mpifort obey disable-wrapper-runpath
2020-01-23 16:37:37 -07:00
Artem Polyakov
548ed56bef
Merge pull request #7332 from hoopoepg/topic/fixed-coverity-issues
SPML/UCX: fixed coverity issues
2020-01-23 11:54:00 -08:00
Jeff Squyres
f16d2903aa
Merge pull request #7328 from DDNStorage/ime_fs_missing_header
fs/ime: fix compilation errors due to missing header inclusion
2020-01-23 14:13:59 -05:00
Sergey Oblomov
8543860689 SPML/UCX: fixed coverity issues
- fixed sizeof(char***) by variable datatype
- fixed resorce leak in proc_add

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-01-23 14:02:08 +02:00
Artem Polyakov
e04575e8b6
Merge pull request #7274 from janjust/master-oshmem-perf-multi-worker
oshmem/ucx: multi-threaded spml ucx performance improvements
2020-01-22 14:24:54 -08:00