
30625 Commits

Author SHA1 Message Date
Ralph Castain
82c71fae78
Update PRRTE and PMIx pointers
- Fix VPATH installs
- Protect against NULL home directories

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-17 21:45:22 -08:00
Nathan Hjelm
73469cb847
Merge pull request #7416 from hjelmn/lets_crash_on_debug_builds_if_the_user_home_directory_is_not_available
opal/mca: check if the user home directory is NULL
2020-02-17 17:50:07 -07:00
Nathan Hjelm
8197efa021 opal/mca: check if the user home directory is NULL
This commit fixes an issue in the MCA base variable system. The
code was retrieving the user home directory (from HOME) and
attempting to use it to build a search path for config files
without checking whether it was NULL. If user-level configuration
directories have been enabled and the home directory is NULL, the
appropriate thing to do is to print an error message and return.
This commit makes that change. It does not ensure that HOME is set
correctly.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-17 14:51:46 -08:00
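Below is a minimal C sketch of the check described above. It uses a hypothetical helper, `user_config_dir()`, and assumes `$HOME/.openmpi` as the per-user config location. It illustrates the idea and is not the OPAL implementation. A caller would pass `getenv("HOME")`, which returns NULL when HOME is unset.

```c
#include <stdio.h>

/* Hypothetical return codes for this sketch (not OPAL's error codes). */
#define SKETCH_SUCCESS       (0)
#define SKETCH_ERR_NO_HOME   (-1)
#define SKETCH_ERR_TOO_LONG  (-2)

/*
 * Build the (assumed) per-user config directory "<home>/.openmpi" into buf.
 * A NULL home directory is reported and rejected instead of being
 * dereferenced. Like the commit, this does not validate HOME itself.
 */
static int user_config_dir(const char *home, char *buf, size_t len)
{
    if (NULL == home) {
        fprintf(stderr, "error: user home directory is not available\n");
        return SKETCH_ERR_NO_HOME;
    }
    int n = snprintf(buf, len, "%s/.openmpi", home);
    if (n < 0 || (size_t) n >= len) {
        return SKETCH_ERR_TOO_LONG;  /* output was truncated */
    }
    return SKETCH_SUCCESS;
}
```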
Nathan Hjelm
eeb3d7f845
Merge pull request #7387 from hjelmn/osc_rdma_allow_overlapping_registration_regions_and_return_the_correct_error_code_when_regions_overlap
osc/rdma: modify attach to check for region overlap
2020-02-17 14:26:53 -07:00
Jeff Squyres
3b7fd5ad0f
Merge pull request #7404 from jsquyres/pr/m4-holy-hell
opal_setup_cli.m4: do not escape $
2020-02-17 13:05:55 -08:00
Nathan Hjelm
54c8233f4f osc/rdma: bump the default max dynamic attachments to 64
This commit increases the osc_rdma_max_attach variable from 32
to 64. The new default is kept low due to the small number
of registration resources on some systems (Cray Aries). A
larger max attachment value can be set by the user on other
systems.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-16 17:09:20 -08:00
Nathan Hjelm
6649aef8bd osc/rdma: modify attach to check for region overlap
This commit addresses two issues in osc/rdma:

 1) It is erroneous to attach regions that overlap. This was being
    allowed but the standard does not allow overlapping attachments.

 2) Overlapping registration regions (4k alignment of attachments)
    appear to be allowed. Add attachment bases to the bookkeeping
    structure so we can keep better track of what can be detached.

It is possible that the standard did not intend to allow #2. If that
is the case then #2 should fail in the same way as #1. There should
be no technical reason to disallow #2 at this time.

References #7384

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-16 17:09:06 -08:00
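The C sketch below illustrates the two cases. It uses hypothetical types and names (`attachment_t`, `attach_is_erroneous()`, `registration_region()`) and assumes a 4 KiB page size. It is not the osc/rdma code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed registration granularity (the commit mentions 4k alignment). */
#define SKETCH_PAGE_SIZE ((uintptr_t) 4096)

/* Hypothetical attachment record: the user-supplied base and length. */
typedef struct {
    uintptr_t base;
    size_t    len;
} attachment_t;

/* Do the half-open ranges [a, a+alen) and [b, b+blen) intersect? */
static bool ranges_overlap(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Case 1: attaching a region that overlaps an existing attachment is erroneous. */
static bool attach_is_erroneous(const attachment_t *list, size_t n,
                                uintptr_t base, size_t len)
{
    for (size_t i = 0; i < n; ++i) {
        if (ranges_overlap(list[i].base, list[i].len, base, len)) {
            return true;
        }
    }
    return false;
}

/* Case 2: page-aligned registration regions may overlap even when the
 * attachments themselves do not; this is allowed. */
static void registration_region(uintptr_t base, size_t len,
                                uintptr_t *reg_base, size_t *reg_len)
{
    uintptr_t start = base & ~(SKETCH_PAGE_SIZE - 1);
    uintptr_t end = (base + len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
    *reg_base = start;
    *reg_len = (size_t) (end - start);
}
```

Consider attachments [0x1000, 0x1100) and [0x1100, 0x1200). They are disjoint, so neither attach is erroneous, yet both register the page [0x1000, 0x2000). Recording each attachment's own base, as the commit does, lets a detach identify the right attachment even when registrations are shared.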
Ralph Castain
731b7e89e3
Merge pull request #7408 from rhc54/topic/myup
Update PRRTE and PMIx
2020-02-16 07:32:57 -08:00
Ralph Castain
274fba3126
Update PRRTE and PMIx
Correct platform file support
Fix configure cli capture to silence warning

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-15 19:38:34 -08:00
Ralph Castain
796886206b
Merge pull request #7406 from rhc54/topic/p2
Enable build against PMIx v2.2 without internal PRRTE
2020-02-15 18:58:25 -08:00
Ralph Castain
edaf9160ae
Enable build against PMIx v2.2 without internal PRRTE
If you ran autogen.pl --without-prrte, we would not configure or build
PRRTE support. However, configuring with --disable-internal-rte did not
work because the option was being ignored. This led to some false errors
when compiling against an earlier PMIx v2.2 release.

In addition, there were a couple of places that needed protection
against PMIx v2.2.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-15 16:28:16 -08:00
Jeff Squyres
04c50c668e opal_setup_cli.m4: do not escape $
We do not want to escape $, because the resulting quoted string ends
up in C code, and "\$" is not a standard C escape sequence (some
compilers warn about it).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-15 11:50:16 -08:00
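The rule can be sketched in C with a hypothetical helper, `c_string_body()`, that emits a string as the body of a C string literal. This is an illustration, not the m4 macro in `opal_setup_cli.m4`. Only the backslash and the double quote are escaped, and `$` is copied as-is.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy src into dst as the body of a C string
 * literal. Only '\\' and '"' need escaping; '$' is copied verbatim
 * because "\$" is not a standard C escape sequence.
 * Returns the number of characters written, or -1 if dst is too small.
 */
static long c_string_body(const char *src, char *dst, size_t dstlen)
{
    size_t out = 0;
    if (dstlen == 0) {
        return -1;  /* no room even for the terminating NUL */
    }
    for (const char *p = src; *p != '\0'; ++p) {
        int needs_escape = (*p == '\\' || *p == '"');
        size_t need = needs_escape ? 2 : 1;
        if (out + need + 1 > dstlen) {  /* +1 keeps room for the NUL */
            return -1;
        }
        if (needs_escape) {
            dst[out++] = '\\';
        }
        dst[out++] = *p;
    }
    dst[out] = '\0';
    return (long) out;
}
```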
Ralph Castain
8eebeee9bf
Merge pull request #7403 from rhc54/topic/plt
Provide hooks for PRRTE and PMIx platform files
2020-02-15 10:59:19 -08:00
Ralph Castain
344346f27e
Provide hooks for PRRTE and PMIx platform files
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-15 08:40:56 -08:00
Ralph Castain
4bdb5a8103
Merge pull request #7402 from rhc54/topic/up3
Resolve the PMIx v3 incompatibility
2020-02-15 06:05:08 -08:00
Ralph Castain
133e8eba22
Resolve the PMIx v3 incompatibility
Fix a couple of spots in OMPI to resolve warnings. The one in comm_cid
in particular may be responsible for some/all of the comm_spawn issues
as it was passing an incorrect pointer to a macro, thus causing memory
corruption.

Update PRRTE and PMIx to deal with v3/v4 differences.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-14 21:01:10 -08:00
Jeff Squyres
77df94d688
Merge pull request #7398 from yanagibashi/pr/fix-info-key-object
osc/sm: fix typo and minor correction
2020-02-14 18:29:05 -05:00
Tsubasa Yanagibashi
a07a83d189 osc/sm: fix typo and minor correction
- fix a typo `alloc_shared_contig` to `alloc_shared_noncontig`
- correct the value of `blocking_fence`

Signed-off-by: Tsubasa Yanagibashi <fj2505dt@aa.jp.fujitsu.com>
2020-02-14 17:52:56 +09:00
Ralph Castain
cb0bc201f3
Merge pull request #7397 from rhc54/topic/up
Update PRRTE
2020-02-13 19:16:09 -08:00
Ralph Castain
3aeccee026
Update PRRTE
- fix prefix handling
- fix ALPS compile issues

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-13 17:06:41 -08:00
Ralph Castain
79e54e940d
Merge pull request #7395 from rhc54/topic/fixfast
Fix potential hangs for fast-fail jobs
2020-02-13 12:19:23 -08:00
Ralph Castain
e0141f10e6
Fix potential hangs for fast-fail jobs
When a job fails very quickly, it is possible that the spawning tool
won't get notified that the spawn completed before being told that the
job terminated. This can cause the tool to "hang" in PMIx_Spawn. Ensure
that PRRTE handles this case by guaranteeing we notify the spawner.

Track both PRRTE and PMIx masters as both have changed, though only the
PRRTE one is involved in this particular fix.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-13 10:59:18 -08:00
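Below is a minimal C sketch of the ordering guarantee. It uses hypothetical names (`job_state_t`, `notify_spawn_complete()`, `notify_job_terminated()`) and a stand-in `send_event()` instead of real PMIx notifications. The termination path first sends the spawn-complete notification if it has not been sent yet. That way a tool waiting for the spawn to complete is always released.

```c
#include <stdbool.h>

/* Hypothetical per-job state kept by the launcher for one spawn request. */
typedef struct {
    bool spawn_notified;   /* spawner told that the spawn completed */
    bool term_notified;    /* spawner told that the job terminated */
    int  events[2];        /* order in which notifications were sent */
    int  nevents;
} job_state_t;

enum { EV_SPAWN_COMPLETE = 1, EV_JOB_TERMINATED = 2 };

/* Stand-in for the real notification path: just record the event. */
static void send_event(job_state_t *job, int ev)
{
    if (job->nevents < 2) {
        job->events[job->nevents++] = ev;
    }
}

static void notify_spawn_complete(job_state_t *job)
{
    if (!job->spawn_notified) {
        job->spawn_notified = true;
        send_event(job, EV_SPAWN_COMPLETE);
    }
}

/* Called when the job ends, possibly before the launch was reported. */
static void notify_job_terminated(job_state_t *job)
{
    /* Release the spawner first so it cannot miss spawn completion. */
    notify_spawn_complete(job);
    if (!job->term_notified) {
        job->term_notified = true;
        send_event(job, EV_JOB_TERMINATED);
    }
}
```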
Ralph Castain
5cc5036fd7
Merge pull request #7386 from rhc54/topic/up
Update PRRTE
2020-02-12 13:26:44 -08:00
Ralph Castain
1a5647ddbe
Update PRRTE
- Ensure we accurately handle node name aliases
- Apply the local launch environ to apps prior to spawn
- Add error check if PMIx_Spawn fails
- Fix compiler warning
- Fix PGI vendor check
- Prevent mpirun from hanging if prte segfaults
- Fix absolute/relative path names to ensure "prte" and
  "prted" are taken from same distribution as "mpirun"

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-12 12:05:41 -08:00
Josh Hursey
ad83a1609d
Merge pull request #7385 from awlauria/pgcc18_support
Fix pgcc18 support.
2020-02-12 09:04:07 -06:00
Austen Lauria
14785deb3c Fix pgcc18 support.
- pgcc18 defines __GNUC__, as Intel compilers do, so we must
  check for PGI earlier, or else configury will mistake
  it for GCC.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-02-11 15:55:42 -05:00
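The same ordering rule can be sketched with the C preprocessor. The sketch assumes the predefined macros `__PGI`, `__INTEL_COMPILER`, `__clang__` and `__GNUC__`. Open MPI's configure performs its vendor check differently, so treat this only as an illustration. Compilers that also define `__GNUC__` must be tested before the GCC branch.

```c
/*
 * Identify the compiler family. PGI, Intel and Clang compilers can all
 * define __GNUC__, so they are checked before the GCC branch.
 */
static const char *compiler_family(void)
{
#if defined(__PGI)
    return "pgi";
#elif defined(__INTEL_COMPILER)
    return "intel";
#elif defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}
```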
Nathan Hjelm
5025628651
Merge pull request #7383 from hjelmn/fix_bug_7303_the_rcache_deadlock_v3
rcache/grdma: fix potential deadlock
2020-02-11 08:36:02 -08:00
Nathan Hjelm
14b6f4931f rcache/grdma: fix potential deadlock
This commit fixes a potential deadlock that can occur between the
memory hooks and region registration. The deadlock is a hold-and-wait
error between two locks: the VMA lock used to protect internal
rcache/grdma structures and the reader/writer lock in the interval
tree.

In the case of the memory hooks a reader lock is obtained on the
interval tree then the VMA lock is obtained to remove the
registration from the LRU. In the case of LRU evictions the VMA
lock is obtained then the writer lock on the interval tree is
obtained. This leads to the deadlock.

To fix the issue the code that evicts from the LRU has been
updated to only invalidate the registration while the VMA lock
is held then remove the registration from the VMA after the
lock is released. This should completely eliminate the above
deadlock.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-02-10 17:15:36 -07:00
Ralph Castain
c6831c59c8
Merge pull request #7379 from rhc54/topic/up
Add "stop-on-exec" support and update hwloc integration
2020-02-10 13:36:55 -08:00
Ralph Castain
42fe66ff0b
Add "stop-on-exec" support and update hwloc integration
Update PRRTE submodule pointer to track changes in master that impact
OMPI behavior and to provide a new capability.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-10 11:36:18 -08:00
Ralph Castain
12dd62b59a
Merge pull request #7375 from rhc54/topic/up
Update PMIx and PRRTE pointers
2020-02-09 10:17:29 -08:00
Ralph Castain
2f7338f682
Update PMIx and PRRTE pointers
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-09 07:43:15 -08:00
Yossi Itigin
f2c5b2d778 Merge pull request #7373 from hoopoepg/topic/oshmem-inc-max-segments
OSHMEM/SEGMENTS: increase number of max segments
2020-02-09 11:27:21 +02:00
Ralph Castain
3224329976
Merge pull request #7202 from rhc54/topic/pmixstat
Remove ORTE project
2020-02-08 12:25:01 -08:00
Gilles Gouaillardet
174e967dbc
Remove ORTE project
Will be replaced by PRRTE. Ensure that OMPI and OPAL layers build
without reference to ORTE. Set up opal/pmix framework to be static.
Remove support for all PMI-1 and PMI-2 libraries. Add support for
"external" pmix component as well as internal v4 one.

remove orte: misc fixes

 - UCX fixes
 - VPATH issue
 - oshmem fixes
 - remove useless definition
 - Add PRRTE submodule
 - Get autogen.pl to traverse PRRTE submodule
 - Remove stale orcm reference
 - Configure embedded PRRTE
 - Correctly pass the prefix to PRRTE
 - Correctly set the OMPI_WANT_PRRTE am_conditional
 - Move prrte configuration to the end of OMPI's configure.ac
 - Make mpirun a symlink to prun, when available
 - Fix makedist with --no-orte/--no-prrte option
 - Add a `--no-prrte` option which is the same as the legacy
   `--no-orte` option.
 - Remove embedded PMIx tarball. Replace it with new submodule
   pointing to OpenPMIx master repo's master branch
 - Some cleanup in PRRTE integration and add config summary entry
 - Correctly set the hostname
 - Fix locality
 - Fix singleton operations
 - Fix support for "tune" and "am" options

Signed-off-by: Ralph Castain <rhc@pmix.org>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2020-02-07 18:20:06 -08:00
Sergey Oblomov
f742f289ea OSHMEM/SEGMENTS: increase number of max segments
- increase number of max segments to allow applications to be launched
  on some Ubuntu configurations

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-02-07 15:20:58 +02:00
Jeff Squyres
dc0d6a5e1b
Merge pull request #7366 from bgoglin/master
Fix hwloc <v2.0 compilation issue
2020-02-05 13:33:01 -05:00
Brice Goglin
329d4451a6 opal/hwloc: remove some unused variables when building with hwloc < 1.7
Refs #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2020-02-04 22:56:46 +01:00
Brice Goglin
907ad854b4 hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 1.11
Build was broken by mistake in commit d40662edc41a5a4d09ae690b640cfdeeb24e15a1

Fixes #7362

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
2020-02-04 22:56:46 +01:00
Austen Lauria
395e2c9d8f
Merge pull request #7364 from awlauria/fix_compile
Protect use of _Static_assert().
2020-02-04 15:24:07 -05:00
Austen Lauria
824dbcbcf3 Protect use of _Static_assert().
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-02-04 13:46:58 -05:00
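One common way to protect `_Static_assert()` is to use it only when the compiler reports C11 support. Otherwise the code falls back to a negative-array-size typedef. The macro name `SKETCH_STATIC_ASSERT` is hypothetical. The commit may guard the keyword differently (for example with a configure test).

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 or later: the keyword is available. */
#define SKETCH_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
#else
/* Pre-C11 fallback: a negative array size makes compilation fail. */
#define SKETCH_CONCAT_(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_(a, b)
#define SKETCH_STATIC_ASSERT(cond, msg) \
    typedef char SKETCH_CONCAT(sketch_static_assert_, __LINE__)[(cond) ? 1 : -1]
#endif

/* A compile-time check that builds under either branch. */
SKETCH_STATIC_ASSERT(sizeof(int) * CHAR_BIT >= 16, "int must have at least 16 bits");

static int sketch_int_bits(void)
{
    return (int) (sizeof(int) * CHAR_BIT);
}
```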
Jeff Squyres
827e9f66af
Merge pull request #7265 from jsquyres/pr/fortran-you-win-again
Fortran fixes
2020-02-04 07:03:02 -05:00
Yossi Itigin
1047e28a28
Merge pull request #7353 from dmitrygx/topic/spml/ucx
SPML/UCX: Fix compilation warnings with GCC
2020-02-04 10:38:37 +02:00
Ralph Castain
fd2de25609
Merge pull request #7358 from rhc54/topic/opt
Always consider retrieval of HOSTNAME to be optional
2020-02-03 18:59:36 -08:00
Ralph Castain
2ee1aaf08d
Always consider retrieval of HOSTNAME to be optional
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-02-03 16:03:05 -08:00
Jeff Squyres
ab398f4b9a fortran: ensure not to use [AM_]CPPFLAGS
Automake's Fortran compilation rules inexplicably use CPPFLAGS and
AM_CPPFLAGS.  Unfortunately, this can cause problems in some cases
(e.g., picking up already-installed mpi.mod in a system-default
include search path).

So in relevant module-using Fortran compilation Makefile.am's, zero
out CPPFLAGS and AM_CPPFLAGS.

This has a side-effect of requiring that we compile the one .c file in
the F08 library in a new, separate subdirectory (with its own
Makefile.am that does _not_ have CPPFLAGS/AM_CPPFLAGS zeroed out).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2020-02-03 14:45:32 -08:00
Jeff Squyres
f4a47a5a8e fortran: remove useless CPPFLAGS assignment
These -D's are for C compilation, not Fortran compilation.  Remove
this useless statement.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-02-03 14:45:25 -08:00
Howard Pritchard
d2b68e6ecd
Merge pull request #7201 from bgoglin/master
hwloc/base: fix opal proc locality wrt to NUMA nodes on hwloc 2.0
2020-02-03 11:23:42 -07:00
Howard Pritchard
9916b9124e
Merge pull request #7345 from hppritcha/topic/follow_up_pr7268
PR 7268 follow-up
2020-02-03 10:42:47 -07:00
Dmitry Gladkov
b6a658b24a SPML/UCX: Fix compilation warnings with GCC
Signed-off-by: Dmitry Gladkov <dmitrygla@mellanox.com>
2020-02-03 05:11:49 -08:00