1
1
Граф коммитов

5247 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
0a9c358ef8
Merge pull request #5577 from hjelmn/btl_vader_sc_emu_warnings
btl/vader: clean up debuging and squash warning
2018-08-23 09:41:13 -06:00
Nathan Hjelm
c74cf666a9 btl/vader: clean up debuging and squash warning
References #5512

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-22 10:57:32 -06:00
Nathan Hjelm
a99dba6cb7
Merge pull request #5553 from markalle/info_snprintf2
snprintf() length fix for info
2018-08-22 10:27:14 -06:00
Ralph Castain
5cfa2a7fca Complete integration of job_control
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 16:10:50 -07:00
Ralph Castain
9948084130 Update to PMIx v3.0.1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 15:05:24 -07:00
Mark Allen
ee6eb9b12b snprintf() length fix for info
The important part of this fix is a couple places 5 was hard-coded that needed to be
strlen(OPAL_INFO_SAVE_PREFIX).

But also this contains a fix for a gcc 7.3.0 compiler warning about snprintf(). There
was an "if" statement making sure all the arguments had appropriate strlen(), but gcc
still complained about the following snprintf() because the size of the struct element
is iterator->ie_key[OPAL_MAX_INFO_KEY + 1].

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-08-16 15:32:04 -04:00
Nathan Hjelm
dca3516765 btl/vader: move memory barrier to where it belongs
The write memory barrier was intended to precede setting a fast-box
header but instead follows it. This commit moves the memory barrier to
the intended location.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-13 10:14:34 -06:00
Nathan Hjelm
e0f73866ef
Merge pull request #5525 from hjelmn/event_threading
opal/progress: protect against multiple threads in event base
2018-08-13 09:21:44 -06:00
Jeff Squyres
01e4570af7 hwloc201/configure.m4: make it safe when used with hwloc:external
The Autoconf AC_CONFIG_* macros can only be instantiated exacly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.

For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.

That's pretty straightforward.

What's not straightforward is two corner cases:

1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
   more than once.  If you do, autoreconf will fail (even before you
   can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
   configure, the file ABC is not registered properly, and ABC will
   not be generated at the end of configure.

This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to setup its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly.  Hence, we *have* to invoke HWLOC_SETUP_CORE.

However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects.  It would be nice to do able to do something like this:

```
    if hwloc:extern is going to be used:
        Invoke minimal HWLOC_SETUP_CORE (with no side effects)
    else
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not).  Kaboom.

Similarly, we can't do this:

```
    if hwloc:extern is not going to be used:
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been setup.
Kaboom.

But remember that the hwloc framework is a bit special: there will
only ever be 2 comoponents: external and internal.  External is
guaranteed to be configured first because of its priority.  So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.

Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.

This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-11 11:05:23 -07:00
Jeff Squyres
69aa46e167 libevent2022/configure.m4: always invoke sub-configure
In order to make "make distclean" (and friends) work, we need to
*always* invoke the embedded configure script -- even if we know that
we're not going to use this component.

But in cases where we know we're not going to use this component, we
also need to avoid the side effects of the code path that is used when
we *do* want to use this component.  So split the two possibilities
into two different macros:

1. MCA_opal_event_libevent2022_FAKE_CONFIG: which does almost nothing
   except invoke the underlying "configure" script.
2. MCA_opal_event_libevent2022_REAL_CONFIG: which does all the real
   work (including invoking the underlying "configure" script).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-11 06:03:46 -07:00
Jeff Squyres
80df3f040b libevent2022/configure.m4: trivial cleanup
Put argument to AM_CONDITIONAL inside [].  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-10 10:30:45 -07:00
Jeff Squyres
17aa64e438 libevent2022/configure.m4: minor comment cleanup
Change # -> dnl.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-10 10:30:04 -07:00
Jeff Squyres
b063cb6b0f libevent2022: only configure if event:external fails
We know that event:external will be configured first (because of its
priority).  Take advantage of that here in libevent2022 by having it
refuse to configure / politely fail if event:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., event:external succeeded, so this component
will be skipped, or event:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Jeff Squyres
4e5f432786 hwloc201: only configure if hwloc:external fails
We know that hwloc:external will be configured first (because of its
priority).  Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-08-08 10:22:38 -07:00
Nathan Hjelm
bdbb853461 opal/progress: protect against multiple threads in event base
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent's to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-07 16:13:43 -06:00
Nathan Hjelm
eeae3f9b93
Merge pull request #5517 from bosilca/topic/treematch_warnings
Remove few warnings identified by @rhc in #5514.
2018-08-06 13:25:07 -06:00
Boris Karasev
57683366ca pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
2018-08-06 15:01:57 +06:00
George Bosilca
6d11a45f44
Remove few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2018-08-03 16:21:06 -04:00
Ralph Castain
f7a537cf04 Fix typo
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 20:05:33 -07:00
Ralph Castain
bcdb1f45ac Fix the multiple pe/proc option
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.

Thanks to @tonyreina for the error report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 18:47:39 -07:00
Ralph Castain
55cefedf9b Cleanup pmix selection check
Allow for versions > 3

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-25 16:11:32 -07:00
Nathan Hjelm
47ed8e8830 btl/uct: fix compile warnings/errors
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-23 14:04:38 -06:00
Ralph Castain
5cab823979 Leave opal_event_external_support exposed as global var
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-23 09:18:48 -07:00
Jeff Squyres
a70ecf5267 event/external: prefer external event component
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
Jeff Squyres
83e4a45a9f event: trivial comment change
Switch from #-style to dnl-style.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
Gilles Gouaillardet
ce2c9fffd4 hwloc: prefer external hwloc component
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-07-23 09:20:27 +09:00
Ralph Castain
8a6dc0b211
Merge pull request #5456 from rhc54/topic/px
Default to internal PMIx if newer than external
2018-07-19 11:53:47 -07:00
Ralph Castain
1e6aaf7f22 Default to internal PMIx if newer than external
Per https://github.com/open-mpi/ompi/issues/5031, if the user didn't specify a particular PMIx installation, then default back to the internal version if it is newer than the discovered external one. PMIx doesn't yet provide a full signature so we have to just get as close as possible for now.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-19 11:02:03 -07:00
Yossi Itigin
29812494f2
Merge pull request #5402 from hoopoepg/topic/common-del-procs
MCA/COMMON/UCX: del_procs calls are unified to common module
2018-07-19 11:19:45 +03:00
Sergey Oblomov
920cc2e0d9 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-18 07:37:25 +03:00
Howard Pritchard
4447738098
Merge pull request #5414 from hppritcha/topic/iwarp_only_by_default
btl/openib: only look for iwarp/roce by default
2018-07-17 20:08:00 -06:00
Howard Pritchard
bc8134dae1
Merge pull request #5448 from thananon/ofi_context
btl/ofi: Added FI_CONTEXT as requirement.
2018-07-17 19:51:13 -06:00
Howard Pritchard
6818272392 btl/openib: only look for iwarp/roce by default
Due to decreasing support by vendors/other orgs for the OpenIB BTL,
only look for iWarp/RoCE devices by default.  Allow IB HCAs
with ports configured for ethernet.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-07-17 19:11:37 -06:00
Thananon Patinyasakdikul
033c364ee0 btl/ofi: Added FI_CONTEXT as requirement.
OFI BTL uses context for completion but never ask for it in
fi_getinfo(3). This commit makes sure that we always ask for FI_CONTEXT
to eliminate any potential error.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-07-17 12:18:43 -07:00
Sergey Oblomov
a4b8253fa2 MCA/COMMON/UCX: fixed initialization of malloc hooks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 20:09:50 +03:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Ralph Castain
4a596d35f7 Remove the PMIx ext4x component
Update configury to redirect anything at or above v3 to the ext3x component

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-13 19:51:50 -07:00
Nathan Hjelm
9d3a79925b btl/vader: fix bugs in rma emulation
This commit fixes two bugs in the RMA/atomic emulation code:

 1) Fix a fragment leak when using AMO emulation.

 2) Always initialize the single-copy emulation code. This is required
 to use the AMO support.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
Ralph Castain
fdca304268 Default to external PMIx installation
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-10 16:12:52 -07:00
Nathan Hjelm
8b090103e2 opal/fifo: fix 128-bit atomic fifo on Power9
This commit updates the atomic fifo code to fix a consistency issue
observed on Power9 systems when builtin atomics are used. The cause
was two things: 1) a missing write memory barrier in fifo push, and 2)
a read ordering issue when reading the fifo head non-atomically. This
commit fixes both issues and appears to correct then inconsistency.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 15:37:11 -06:00
KAWASHIMA Takahiro
3e179ba95f hwloc/external: Suppress missing-include-dirs warning
If OMPI is configured with `--with-hwloc=external` or `--with-hwloc=DIR`
and gfortran is used, I see a lot of warnings when compiling files
under the `ompi/mpi/fortran` directory.

```
f951: Warning: Nonexistent include directory
'BUILD_DIR/opal/mca/hwloc/external/hwloc/include' [-Wmissing-include-dirs]
```

There is no such `include` directory in the source tree and `configure`-
created tree. I think these lines in the `configure.m4` file are wrongly
copied from that for the embedded `hwlocXXX` component in the past.

The `-Wmissing-include-dirs` option is enabled in gfortran by default
but it is not enabled by default (or even with `-Wall`) in gcc and g++.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-09 10:55:33 +09:00
Yossi Itigin
e77e31b50b
Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging
MCA/COMMON/UCX: unified logging across all UCX modules
2018-07-08 12:45:26 +03:00
Ralph Castain
17c4cf0db8 Install PMIx v3.0.0 release
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-06 06:38:02 -07:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
f574c14e3a ATOMICS/UCX: redefine atomic module API
- now it accepts integer values directily instead of
  pointers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:41:45 +03:00
Yossi Itigin
4962651567
Merge pull request #5366 from hoopoepg/topic/mca-common-ucx-unify-2
MCA/COMMON/UCX: minor unification of del_proces calls
2018-07-04 14:38:37 +03:00
Nathan Hjelm
bd5cd62df9 btl/ugni: fix up some warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-03 16:30:44 -06:00
Nathan Hjelm
d8916a4672 btl/ugni: fix race condition in completing frags
The descriptor flags field in a fragment were being ready after the
fragment may have been freed. This commit reads the flags before
calling the user callback.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-03 10:48:54 -06:00
Nathan Hjelm
87d41da62b btl/vader: add support for atomics and emulated rdma
This commit adds support for atomic operations as well as rdma for
systems without rdma support. This support is implemented using an
internal send tag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-02 13:57:11 -06:00