Commit Graph

5253 Commits

Author SHA1 Message Date
Geoff Paulsen
954692f06e
Merge pull request #5614 from karasevb/v4.0.x_fix_hwloc_numa_obj
v4.0.x: Fixed the NUMA obj detection for hwloc ver >= 2.0.0
2018-09-07 14:49:26 -05:00
Nathan Hjelm
2fb1a5e1b2 btl/uct: add missing opal_mem_hooks_unregister_release call
This commit fixes a bug that occurs when using the UCT btl with the UCX
memory hooks disabled. We were missing a call to
opal_mem_hooks_unregister_release to remove the btl memory hook
callback.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 36c206d2d6)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-05 14:02:58 -06:00
Howard Pritchard
b6dafb6b90
Merge pull request #5636 from jsquyres/pr/v4.0.x/verbs-usnic-configury-moar-strictness
v4.0.x: make common/verbs-usnic actually check if it can compile
2018-09-04 09:13:12 -06:00
Geoff Paulsen
3282c61048
Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0
PML/UCX: blocked calls optimizations - v4.0
2018-08-31 14:11:11 -05:00
Geoff Paulsen
334748753c
Merge pull request #5626 from hoopoepg/topic/opal-mem-hooks-syno-v4.0
MCA/COMMON/UCX: added synonim to opal_mem_hook variable - v4.0
2018-08-31 14:09:14 -05:00
Geoff Paulsen
b2daa0001f
Merge pull request #5565 from rhc54/cmr40/pmix301
Update to PMIx 3.0.1
2018-08-31 13:58:41 -05:00
Jeff Squyres
3e842348d1 common/verbs-usnic: check that it will actually compile
If someone specifies --with-verbs-usnic, actually do a configury check
to ensure that it will compile (vs. assuming that it will compile if
someone asks for it).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 05e5f61fe1)
2018-08-30 14:56:37 -07:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added a synonym for the common ucx variables so that
  they are printed by opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
2018-08-29 15:17:00 +03:00
Sergey Oblomov
9215eb9a3b PML/UCX: blocked calls optimizations
- refactoring of opal/UCX progress calls
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit b0f87f2235)
2018-08-29 14:38:22 +03:00
Boris Karasev
31ca3842da Fixed copyrights of prev commit.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit beb0697f24)
2018-08-28 12:29:16 +03:00
Boris Karasev
d995fb1b3f Fixed the NUMA obj detection for hwloc ver >= 2.0.0
Since version 2.0.0, hwloc has a new organization of NUMA nodes in the
topology tree. This commit adds detection of the local NUMA object for
hwloc >= 2.0.0, which fixes the process binding policy for the rmaps
mindist component.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit e5291ccc34)
2018-08-28 12:29:08 +03:00
Zoltán Mizsei
b2628129fd fcntl include bugfix
Signed-off-by: Zoltán Mizsei <zmizsei@extrowerk.com>

(cherry picked from commit open-mpi/ompi@ac3f8a16ed)
2018-08-27 09:48:54 +09:00
Jeff Squyres
4fd51a1563
Merge pull request #5592 from hjelmn/v4.0.x_sc_emu
btl/vader: clean up debuging and squash warning
2018-08-23 17:15:09 -07:00
Nathan Hjelm
eba44d3709 btl/vader: clean up debuging and squash warning
References #5512

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit c74cf666a9)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-23 15:53:43 -06:00
Howard Pritchard
c2cc336135
Merge pull request #5486 from rhc54/cmr40/maps
Cleanup pmix selection and map-by modifiers
2018-08-22 04:40:00 -04:00
Ralph Castain
e27e945d9a Complete job control integration
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 16:08:54 -07:00
Ralph Castain
3eef3d1d8f Update to PMIx 3.0.1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 14:00:41 -07:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Geoff Paulsen
8483eb4bf7
Merge pull request #5471 from hjelmn/v4.0.x_uct_btl_fix
btl/uct: fix compile warnings/errors
2018-08-15 16:30:40 -05:00
Geoff Paulsen
98bd571cc8
Merge pull request #5472 from ggouaillardet/topic/v4.0.x/prefer-externals
v4.0.x: Prefer external hwloc and libevent
2018-08-15 16:27:33 -05:00
Nathan Hjelm
b4f80e4e36 btl/vader: move memory barrier to where it belongs
The write memory barrier was intended to precede setting a fast-box
header but instead follows it. This commit moves the memory barrier to
the intended location.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit dca3516765)
2018-08-14 09:19:48 -07:00
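
The ordering described in commit b4f80e4e36 (payload first, then the write barrier, then the header) can be illustrated with a minimal C11 sketch. This is not the vader code: `fastbox_slot_t`, `fastbox_send`, and `fastbox_recv` are hypothetical names, and the slot layout is an assumption made only for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fast-box slot; NOT the actual vader layout. */
typedef struct {
    _Atomic uint32_t header;      /* 0 = empty, otherwise payload size */
    unsigned char payload[64];
} fastbox_slot_t;

/* Writer: fill the payload, issue the write barrier, then set the header.
 * Placing the barrier after the header store would let a reader see the
 * header before the payload bytes are visible. */
static int fastbox_send(fastbox_slot_t *slot, const void *data, uint32_t size)
{
    if (size == 0 || size > sizeof(slot->payload)) {
        return -1;                                   /* does not fit */
    }
    if (atomic_load_explicit(&slot->header, memory_order_acquire) != 0) {
        return -1;                                   /* slot still in use */
    }
    memcpy(slot->payload, data, size);
    atomic_thread_fence(memory_order_release);       /* barrier precedes header */
    atomic_store_explicit(&slot->header, size, memory_order_relaxed);
    return 0;
}

/* Reader: check the header, issue the matching read barrier, then copy. */
static int fastbox_recv(fastbox_slot_t *slot, void *out, uint32_t cap)
{
    uint32_t size = atomic_load_explicit(&slot->header, memory_order_relaxed);
    if (size == 0 || size > cap) {
        return -1;                                   /* empty or too large */
    }
    atomic_thread_fence(memory_order_acquire);
    memcpy(out, slot->payload, size);
    atomic_store_explicit(&slot->header, 0, memory_order_release);
    return (int) size;
}
```

The release fence pairs with the acquire fence in the reader, so a reader that observes a non-zero header is guaranteed to observe the payload written before it.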
Jeff Squyres
72e5766a56 hwloc201/configure.m4: make it safe when used with hwloc:external
The Autoconf AC_CONFIG_* macros can only be instantiated exactly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.

For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.

That's pretty straightforward.

What's not straightforward is two corner cases:

1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
   more than once.  If you do, autoreconf will fail (even before you
   can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
   configure, the file ABC is not registered properly, and ABC will
   not be generated at the end of configure.

This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to setup its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly.  Hence, we *have* to invoke HWLOC_SETUP_CORE.

However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects.  It would be nice to be able to do something like this:

```
    if hwloc:external is going to be used:
        Invoke minimal HWLOC_SETUP_CORE (with no side effects)
    else
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not).  Kaboom.

Similarly, we can't do this:

```
    if hwloc:external is not going to be used:
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been setup.
Kaboom.

But remember that the hwloc framework is a bit special: there will
only ever be 2 components: external and internal.  External is
guaranteed to be configured first because of its priority.  So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.

Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.

This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 01e4570af7)
2018-08-11 12:00:35 -07:00
Jeff Squyres
714f203985 libevent2022/configure.m4: always invoke sub-configure
In order to make "make distclean" (and friends) work, we need to
*always* invoke the embedded configure script -- even if we know that
we're not going to use this component.

But in cases where we know we're not going to use this component, we
also need to avoid the side effects of the code path that is used when
we *do* want to use this component.  So split the two possibilities
into two different macros:

1. MCA_opal_event_libevent2022_FAKE_CONFIG: which does almost nothing
   except invoke the underlying "configure" script.
2. MCA_opal_event_libevent2022_REAL_CONFIG: which does all the real
   work (including invoking the underlying "configure" script).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 69aa46e167)
2018-08-11 12:00:35 -07:00
Jeff Squyres
5c5246f655 libevent2022/configure.m4: trivial cleanup
Put argument to AM_CONDITIONAL inside [].  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 80df3f040b)
2018-08-11 12:00:35 -07:00
Jeff Squyres
63d68ded48 libevent2022/configure.m4: minor comment cleanup
Change # -> dnl.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 17aa64e438)
2018-08-11 12:00:35 -07:00
Jeff Squyres
6cb3d61dd1 libevent2022: only configure if event:external fails
We know that event:external will be configured first (because of its
priority).  Take advantage of that here in libevent2022 by having it
refuse to configure / politely fail if event:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., event:external succeeded, so this component
will be skipped, or event:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b063cb6b0f)
2018-08-09 06:47:08 -07:00
Jeff Squyres
eca16720de hwloc201: only configure if hwloc:external fails
We know that hwloc:external will be configured first (because of its
priority).  Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 4e5f432786)
2018-08-09 06:47:08 -07:00
Ralph Castain
0cdf49ed8a Fix typo
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit f7a537cf04)
2018-07-25 21:37:03 -07:00
Ralph Castain
511319c316 Fix the multiple pe/proc option
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.

Thanks to @tonyreina for the error report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit bcdb1f45ac)
2018-07-25 19:55:28 -07:00
Ralph Castain
04054d63eb Cleanup pmix selection check
Allow for versions > 3

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 55cefedf9b)
2018-07-25 19:55:18 -07:00
Ralph Castain
2c2f9b8169 Leave opal_event_external_support exposed as global var
Signed-off-by: Ralph Castain <rhc@open-mpi.org>

(cherry picked from commit open-mpi/ompi@5cab823979)
2018-07-24 09:53:00 +09:00
Jeff Squyres
1ea021933f event/external: prefer external event component
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@a70ecf5267)
2018-07-24 09:52:58 +09:00
Jeff Squyres
6f5a453492 event: trivial comment change
Switch from #-style to dnl-style.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@83e4a45a9f)
2018-07-24 09:52:56 +09:00
Gilles Gouaillardet
aa7a4d0f6f hwloc: prefer external hwloc component
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@ce2c9fffd4)
2018-07-24 09:52:53 +09:00
Nathan Hjelm
b6bd3d33f1 btl/uct: fix compile warnings/errors
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 47ed8e8830)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-23 14:05:17 -06:00
Ralph Castain
508c3f391f Default to internal PMIx if newer than external
Per https://github.com/open-mpi/ompi/issues/5031, if the user didn't specify a particular PMIx installation, then default back to the internal version if it is newer than the discovered external one. PMIx doesn't yet provide a full signature so we have to just get as close as possible for now.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1e6aaf7f22)
2018-07-19 11:59:17 -07:00
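
The selection rule stated in commit 508c3f391f can be sketched as follows. This is only an illustration of the decision, not the configury itself: `version_t`, `version_cmp`, and `use_internal_pmix` are hypothetical names, and the sketch assumes a full major/minor/release triple is known for both installations, which the commit message says PMIx does not yet provide.

```c
#include <stdbool.h>

/* Hypothetical version triple; assumes all three fields are known. */
typedef struct {
    int major;
    int minor;
    int release;
} version_t;

/* Returns <0, 0, or >0 when a is older than, equal to, or newer than b. */
static int version_cmp(version_t a, version_t b)
{
    if (a.major != b.major) {
        return a.major - b.major;
    }
    if (a.minor != b.minor) {
        return a.minor - b.minor;
    }
    return a.release - b.release;
}

/* Fall back to the internal copy only when the user did not request a
 * particular installation and the internal copy is strictly newer than
 * the discovered external one. */
static bool use_internal_pmix(bool user_specified, version_t internal,
                              version_t external)
{
    if (user_specified) {
        return false;            /* honor the explicit request */
    }
    return version_cmp(internal, external) > 0;
}
```

With an internal 3.0.1 and a discovered external 2.1.4, `use_internal_pmix(false, ...)` returns true; if the user named an installation explicitly, it returns false regardless of the versions.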
Howard Pritchard
4447738098
Merge pull request #5414 from hppritcha/topic/iwarp_only_by_default
btl/openib: only look for iwarp/roce by default
2018-07-17 20:08:00 -06:00
Howard Pritchard
bc8134dae1
Merge pull request #5448 from thananon/ofi_context
btl/ofi: Added FI_CONTEXT as requirement.
2018-07-17 19:51:13 -06:00
Howard Pritchard
6818272392 btl/openib: only look for iwarp/roce by default
Due to decreasing support by vendors/other orgs for the OpenIB BTL,
only look for iWarp/RoCE devices by default.  Allow IB HCAs
with ports configured for ethernet.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-07-17 19:11:37 -06:00
Thananon Patinyasakdikul
033c364ee0 btl/ofi: Added FI_CONTEXT as requirement.
The OFI BTL uses context for completion but never asks for it in
fi_getinfo(3). This commit makes sure that we always ask for FI_CONTEXT
to eliminate any potential error.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-07-17 12:18:43 -07:00
Sergey Oblomov
a4b8253fa2 MCA/COMMON/UCX: fixed initialization of malloc hooks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 20:09:50 +03:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Ralph Castain
4a596d35f7 Remove the PMIx ext4x component
Update configury to redirect anything at or above v3 to the ext3x component

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-13 19:51:50 -07:00
Nathan Hjelm
9d3a79925b btl/vader: fix bugs in rma emulation
This commit fixes two bugs in the RMA/atomic emulation code:

 1) Fix a fragment leak when using AMO emulation.

 2) Always initialize the single-copy emulation code. This is required
 to use the AMO support.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
Ralph Castain
fdca304268 Default to external PMIx installation
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-10 16:12:52 -07:00
Nathan Hjelm
8b090103e2 opal/fifo: fix 128-bit atomic fifo on Power9
This commit updates the atomic fifo code to fix a consistency issue
observed on Power9 systems when builtin atomics are used. The cause
was two things: 1) a missing write memory barrier in fifo push, and 2)
a read ordering issue when reading the fifo head non-atomically. This
commit fixes both issues and appears to correct the inconsistency.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 15:37:11 -06:00
KAWASHIMA Takahiro
3e179ba95f hwloc/external: Suppress missing-include-dirs warning
If OMPI is configured with `--with-hwloc=external` or `--with-hwloc=DIR`
and gfortran is used, I see a lot of warnings when compiling files
under the `ompi/mpi/fortran` directory.

```
f951: Warning: Nonexistent include directory
'BUILD_DIR/opal/mca/hwloc/external/hwloc/include' [-Wmissing-include-dirs]
```

There is no such `include` directory in the source tree or in the
`configure`-created tree. I think these lines in the `configure.m4` file
were mistakenly copied at some point from the embedded `hwlocXXX` component.

The `-Wmissing-include-dirs` option is enabled in gfortran by default
but it is not enabled by default (or even with `-Wall`) in gcc and g++.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-09 10:55:33 +09:00
Yossi Itigin
e77e31b50b
Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging
MCA/COMMON/UCX: unified logging across all UCX modules
2018-07-08 12:45:26 +03:00
Ralph Castain
17c4cf0db8 Install PMIx v3.0.0 release
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-06 06:38:02 -07:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00