1
1
Граф коммитов

5337 Коммитов

Автор SHA1 Сообщение Дата
Nathan Hjelm
fa768d748f btl/vader: work around Oracle compiler bug
This commit works around an Oracle C compiler bug in 5.15 (not sure
when it was introduced). The bug is triggered when we chain
assignments of atomic variables. Ex:

_Atomic intptr x, y;
intptr_t z = 0;

x = y = z;

Will produce a compiler error of the form:

operand cannot have void type: op "="
assignment type mismatch:
	long "=" void

To work around the issue we are removing the chain assignment and
setting the head and tail on different lines.

Fixes #5814

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit dfa8d3a81a)
2018-10-03 11:36:17 -04:00
Nathan Hjelm
df6dd69db8 btl/vader: ensure the fast box tag is always read first
On some platfoms reading a 64-bit value is non-atomic and it is
possible that the two 32-bit values are read in the wrong order. To
ensure the tag is always read first this commit reads the tag before
reading the full 64-bit value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 66a7dc4c72)
2018-10-03 11:36:17 -04:00
Geoff Paulsen
593d652077
Merge pull request #5823 from jsquyres/pr/v4.0.x/fix-tcp-btl-show-help-ip-address
v4.0.x: btl/tcp: output the IP address correctly
2018-10-03 08:29:37 -05:00
George Bosilca
b63bee5da4 Small pedantic fixes.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a3a492b42c)
2018-10-02 14:37:31 -04:00
George Bosilca
d450e460d6 Provide the correct socklen to bind.
Get Brian's patch from #5825 and his log message:
Fix a failure in binding the initiating side of a connection
on MacOS. MacOS doesn't like passing the size of the storage
structure (sockaddr_storage) instead of the expected size of
the structure (sockaddr_in or sockaddr_in6), which was causing
bind() failures. This patch simply changes the structure size
to the expected size.

Add a more clear error message in debug mode.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 9164e26e2f)
2018-10-02 14:37:31 -04:00
Jeff Squyres
81f2f19398 btl/tcp: output the IP address correctly
Per
https://github.com/open-mpi/ompi/issues/3035#issuecomment-426085673,
it looks like the IP address for a given interface is being stashed in
two places: on the endpoint and on the module.

1. On the endpoint, it is storing the moral equivalent of a
   (struct sockaddr_in.sin_addr).
2. On the module, it is storing a full (struct sockaddr_storage).

The call to opal_net_get_hostname() expects a full (struct sockaddr*)
-- not just the stripped-down (struct sockaddr_in.sin_addr).  Hence,
when the original code was passing in the endpoint's (struct
sockaddr_in.sin_addr) and opal_net_get_hostname() was treating it
like a (struct sockaddr), hilarity ensued (i.e., we got the wrong
output).

This commit eliminates the call to opal_net_get_hostname() and just
calls inet_ntop() directly to convert the (struct
sockaddr_in.sin_addr) to a string.

NOTE: Per the github comment cited above, there can be a disparity
between the IP address cached on the endpoint vs. the IP address
cached on the module.  This only happens with interfaces that have
more than one IP address.  This commit does not fix that issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5dae086f7e)
2018-10-02 10:49:51 -04:00
Sergey Oblomov
68d3baffd5 OPAL/COMMON/UCX: used __func__ macro instead of __FUNCTION__
- used __func__ macro instead of __FUNCTION__ to unify
  macro usage with other components

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 9a51e257d1)
2018-09-27 12:04:07 +03:00
Jeff Squyres
8bdf4553d9 mca_base_var: fix output bug about settable vars
Fix the test that determined whether we output "writeable" or
"read-only" for MCA vars (it was checking the wrong flag).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 176da51aec)
2018-09-24 14:14:51 -07:00
Geoff Paulsen
3d4164e1e1
Merge pull request #5752 from gpaulsen/misc-warnings-fixes
Miscellaneous compiler warning stomps.
2018-09-22 15:01:53 -05:00
Geoff Paulsen
556367af31
Merge pull request #5754 from gpaulsen/event_threading
opal/progress: protect against multiple threads in event base
2018-09-22 15:00:32 -05:00
Mark Allen
5ac3fac6c2 snprintf() length fix for info
The important part of this fix is a couple places 5 was hard-coded that needed to be
strlen(OPAL_INFO_SAVE_PREFIX).

But also this contains a fix for a gcc 7.3.0 compiler warning about snprintf(). There
was an "if" statement making sure all the arguments had appropriate strlen(), but gcc
still complained about the following snprintf() because the size of the struct element
is iterator->ie_key[OPAL_MAX_INFO_KEY + 1].

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-09-21 14:47:11 -05:00
Nathan Hjelm
cd88e307fd opal/progress: protect against multiple threads in event base
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent's to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:40:08 -05:00
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4)
2018-09-21 14:35:51 -05:00
Ralph Castain
192f0f6fff Remove the OFI/BTL component
Remove this component pending re-architecture of the overall OFI
components. We have had similar issues before when multiple components
use the same library - typical issues are race conditions, initialize
and finalize errors, etc. We are seeing similar problems here as we get
broader exposure to different library version and environment
combinations.

The correct fix in the past has been to centralize the library
interactions in a "common" component. We will pursue that here by moving
some additional functions (e.g., endpoint creation) into the existing
opal/mca/common/ofi component. We can't do that and thoroughly test it
in time for the v4.0.0 release, so we'll simply remove this component
from the release.

Once we have things correctly fixed, we'll submit a PR to restore the
component plus the related fixes to some future v4.x release.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-20 18:09:15 -07:00
Geoff Paulsen
4688da0631
Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0
MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0
2018-09-20 18:12:25 -05:00
Geoff Paulsen
8dbfb9b032
Merge pull request #5745 from bwbarrett/v4.0.x-cuda-async
openib: Disable CUDA async by default
2018-09-20 18:09:32 -05:00
Geoff Paulsen
4d9deb33ec
Merge pull request #5688 from rhc54/cmr40/pmix302
v4.0.x: Update to PMIx v3.0.2
2018-09-20 16:26:06 -05:00
Brian Barrett
a4fabb7e26 openib: Disable CUDA async by default
Disable async receive for CUDA under OpenIB.  While a performance
optimization, it also causes incorrect results for transfers
larger than the GPUDirect RDMA limit.  This change has been validated
and approved by Akshay.

References #3972

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 9344afd485)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-09-20 14:20:54 -07:00
Howard Pritchard
27732bdf33
Merge pull request #5738 from hppritcha/topic/remove_scif_support_4.0.x
SCIF: remove it
2018-09-19 12:43:26 -06:00
Howard Pritchard
730f98d8b0
Merge pull request #5665 from hjelmn/v4.0.x_cache_flush
v4.0.x: patcher/base: improve instruction cache flush for aarch64
2018-09-19 12:12:57 -06:00
Howard Pritchard
01d4d52588 SCIF: remove it
KNC is effectively dead.  Remove corresponding SCIF
support in Open MPI.

cherry pick of PR #5737

+

news update

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit b9ac3d8931)
2018-09-19 11:48:17 -06:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9)
2018-09-19 10:47:27 +03:00
Ralph Castain
131ea01320 Update to PMIx v3.0.2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-18 13:03:18 -07:00
Howard Pritchard
06c62c6b30
Merge pull request #5685 from selvintxavier/brcm_hcas_v4.0.x
v4.0.x: Add support for different Broadcom HCAs
2018-09-17 16:29:45 -06:00
Selvin Xavier
9114a9ac95 v4.0.x: Add support for different Broadcom HCAs
Adds device ids of different Broadcom adapters from
BCM57XXX and BCM58XXX family of HCAs.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
(cherry-picked from a53a6f7650)
2018-09-16 22:54:07 -07:00
Nathan Hjelm
83668f4b47 btl/vader: ensure that the send tag is always written last
To ensure fast box entries are complete when processed by the
receiving process the tag must be written last. This includes a zero
header for the next fast box entry (in some cases). This commit fixes
two instances where the tag was written too early. In one case, on
32-bit systems it is possible for the tag part of the header to be
written before the size. The second instance is an ordering issue. The
zero header was being written after the fastbox header.

Fixes #5375, #5638

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 850fbff441)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 12:34:26 -06:00
Nathan Hjelm
1cc295e4c4 patcher/base: improve instruction cache flush for aarch64
This commit updates the patcher component to either use the
__clear_cache intrinsic or the correct assembly to flush the
instruction cache.

Fixes #5631

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 1cdbceb095)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-10 15:44:50 -06:00
Geoff Paulsen
17aab5ea5b
Merge pull request #5659 from ggouaillardet/topic/v4.0.x/misc_finalize_leaks
Plug misc leaks on MPI_Finalize()
2018-09-10 14:06:31 -05:00
Howard Pritchard
38171be7ac
Merge pull request #5656 from jsquyres/pr/v4.0.x/dont-let-openib-yell
v4.0.x: btl/openib: don't complain about no NICs
2018-09-10 09:39:22 -06:00
Gilles Gouaillardet
d2646cd565 mpool/memkind: plug a memory leak in mca_mpool_memkind_close()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@ad2c207a7e)
2018-09-10 09:19:57 +09:00
Gilles Gouaillardet
9410de0d27 opal/util: plug a memory leak in the opal_infosubscriber_t destructor
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@7556dd0abb)
2018-09-10 09:19:34 +09:00
Gilles Gouaillardet
baf41aceed pmix/pmix3x: plug a memory leak in external_register()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@aeddd2f249)
2018-09-10 09:17:31 +09:00
Gilles Gouaillardet
8015e6f929 pmix/base: plug a memory leak in opal_pmix_base_select()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@6e47c5708e)
2018-09-10 09:16:56 +09:00
Geoff Paulsen
954692f06e
Merge pull request #5614 from karasevb/v4.0.x_fix_hwloc_numa_obj
v4.0.x: Fixed the NUMA obj detection for hwloc ver >= 2.0.0
2018-09-07 14:49:26 -05:00
Jeff Squyres
38c7364896 btl/openib: don't complain about no NICs
Since openib is on its long, slow way out the door, don't let it
complain about not being able to find any NICs at run time.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 098ec55e37)
2018-09-07 12:01:33 -07:00
Nathan Hjelm
2fb1a5e1b2 btl/uct: add missing opal_mem_hooks_unregister_release call
This commit fixes a bug when using the UCT btl with the UCX memory
hooks disabled. We were misssing a call to
opal_mem_hooks_unregister_release to remove the btl memory hook
callback.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 36c206d2d6)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-05 14:02:58 -06:00
Howard Pritchard
b6dafb6b90
Merge pull request #5636 from jsquyres/pr/v4.0.x/verbs-usnic-configury-moar-strictness
v4.0.x: make common/verbs-usnic actually check if it can compile
2018-09-04 09:13:12 -06:00
Geoff Paulsen
3282c61048
Merge pull request #5625 from hoopoepg/topic/optimize-blocked-calls-v4.0
PML/UCX: blocked calls optimizations - v4.0
2018-08-31 14:11:11 -05:00
Geoff Paulsen
334748753c
Merge pull request #5626 from hoopoepg/topic/opal-mem-hooks-syno-v4.0
MCA/COMMON/UCX: added synonim to opal_mem_hook variable - v4.0
2018-08-31 14:09:14 -05:00
Geoff Paulsen
b2daa0001f
Merge pull request #5565 from rhc54/cmr40/pmix301
Update to PMIx 3.0.1
2018-08-31 13:58:41 -05:00
Jeff Squyres
3e842348d1 common/verbs-usnic: check that it will actually compile
If someone specifies --with-verbs-usnic, actually do a configury check
to ensure that it will compile (vs. assuming that it will compile if
someone asks for it).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 05e5f61fe1)
2018-08-30 14:56:37 -07:00
Sergey Oblomov
028bcb8a73 MCA/COMMON/UCX: added synonim to opal_mem_hook variable
- added synonim to common ucx variables to allow
  to print it in opal_info -a

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e00f7a68ba)
2018-08-29 15:17:00 +03:00
Sergey Oblomov
9215eb9a3b PML/UCX: blocked calls optimizations
- refactoring of opal/UCX progress calls
- added UCX progress priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit b0f87f2235)
2018-08-29 14:38:22 +03:00
Boris Karasev
31ca3842da Fixed copyrights of prev commit.
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit beb0697f24)
2018-08-28 12:29:16 +03:00
Boris Karasev
d995fb1b3f Fixed the NUMA obj detection for hwloc ver >= 2.0.0
Since version hwloc 2.0.0 has a new organization of NUMA nodes on the
topology tree. This commit adds the detection of local NUMA object for
hwloc => 2.0.0, which fixes the procs bindings policy for rmaps mindist
component.

Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit e5291ccc34)
2018-08-28 12:29:08 +03:00
Zoltán Mizsei
b2628129fd fcntl include bugfix
Signed-off-by: Zoltán Mizsei <zmizsei@extrowerk.com>

(cherry picked from commit open-mpi/ompi@ac3f8a16ed)
2018-08-27 09:48:54 +09:00
Jeff Squyres
4fd51a1563
Merge pull request #5592 from hjelmn/v4.0.x_sc_emu
btl/vader: clean up debuging and squash warning
2018-08-23 17:15:09 -07:00
Nathan Hjelm
eba44d3709 btl/vader: clean up debuging and squash warning
References #5512

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit c74cf666a9)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-08-23 15:53:43 -06:00
Howard Pritchard
c2cc336135
Merge pull request #5486 from rhc54/cmr40/maps
Cleanup pmix selection and map-by modifiers
2018-08-22 04:40:00 -04:00
Ralph Castain
e27e945d9a Complete job control integration
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 16:08:54 -07:00
Ralph Castain
3eef3d1d8f Update to PMIx 3.0.1
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-08-20 14:00:41 -07:00
Boris Karasev
8873d901e8 pmix: added check for pmix fence status
Signed-off-by: Boris Karasev <karasev.b@gmail.com>
(cherry picked from commit 57683366ca)

Conflicts:
	opal/mca/common/ucx/common_ucx.c
	opal/mca/common/ucx/common_ucx.h

Modified:
	ompi/mca/pml/ucx/pml_ucx.c
	oshmem/mca/spml/ucx/spml_ucx.c
2018-08-17 21:33:50 +06:00
Geoff Paulsen
8483eb4bf7
Merge pull request #5471 from hjelmn/v4.0.x_uct_btl_fix
btl/uct: fix compile warnings/errors
2018-08-15 16:30:40 -05:00
Geoff Paulsen
98bd571cc8
Merge pull request #5472 from ggouaillardet/topic/v4.0.x/prefer-externals
v4.0.x: Prefer external hwloc and libevent
2018-08-15 16:27:33 -05:00
Nathan Hjelm
b4f80e4e36 btl/vader: move memory barrier to where it belongs
The write memory barrier was intended to precede setting a fast-box
header but instead follows it. This commit moves the memory barrier to
the intended location.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit dca3516765)
2018-08-14 09:19:48 -07:00
Jeff Squyres
72e5766a56 hwloc201/configure.m4: make it safe when used with hwloc:external
The Autoconf AC_CONFIG_* macros can only be instantiated exacly once
for any given file, *and* they must be in a code execution path at run
time for the target file to be generated at the end of configure.

For example, if you want to generate file ABC at the end of configure,
you must invoke the AC_CONFIG_FILES(ABC) macro in a code path that
will get executed when configure is run.

That's pretty straightforward.

What's not straightforward is two corner cases:

1. You cannot invoke the AC_CONFIG_FILES(ABC) macro for the same file
   more than once.  If you do, autoreconf will fail (even before you
   can run configure).
2. If AC_CONFIG_FILES(ABC) is not in a code path that is executed by
   configure, the file ABC is not registered properly, and ABC will
   not be generated at the end of configure.

This applies to hwloc because hwloc's HWLOC_SETUP_CORE macro calls
both AC_CONFIG_FILES and AC_CONFIG_HEADER to setup its Makefiles
(etc.) so that targets like "make distclean" and "make distcheck" will
work properly.  Hence, we *have* to invoke HWLOC_SETUP_CORE.

However, the MCA_opal_hwloc_hwloc201_CONFIG macro has a few side
effects.  It would be nice to do able to do something like this:

```
    if hwloc:extern is going to be used:
        Invoke minimal HWLOC_SETUP_CORE (with no side effects)
    else
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

But we can't, because autoreconf will detect that AC_CONFIG_FILES has
been invoked on the same files more than once (regardless of whether
those code paths will be executed at run time or not).  Kaboom.

Similarly, we can't do this:

```
    if hwloc:extern is not going to be used:
        Invoke full HWLOC_SETUP_CORE (with side effects)
    fi
```

Because then hwloc's AC_CONFIG_FILES won't be registered properly when
hwloc:external *is* used (i.e., when the HWLOC_SETUP_CORE macro is not
in a code path that is executed at run time), and targets like "make
distclean" will fail because hwloc's Makefiles won't have been setup.
Kaboom.

But remember that the hwloc framework is a bit special: there will
only ever be 2 comoponents: external and internal.  External is
guaranteed to be configured first because of its priority.  So the
internal component (i.e., this component) immediately knows if it is
going to be used or not based on whether the external component
configuration succeeded or failed.

Specifically: regardless of whether the internal component (i.e., this
component) is going to be used, we have to invoke HWLOC_SETUP_CORE.
But we can manage the side effects: allow the side effects when
this/internal component is going to be used, and avoid the side
effects when this/internal component is not going to be used.

This is a little less clean than I would have liked, but because of
Autoconf's oddity about its AC_CONFIG_* macros, this is the only
solution I could come up with.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 01e4570af7)
2018-08-11 12:00:35 -07:00
Jeff Squyres
714f203985 libevent2022/configure.m4: always invoke sub-configure
In order to make "make distclean" (and friends) work, we need to
*always* invoke the embedded configure script -- even if we know that
we're not going to use this component.

But in cases where we know we're not going to use this component, we
also need to avoid the side effects of the code path that is used when
we *do* want to use this component.  So split the two possibilities
into two different macros:

1. MCA_opal_event_libevent2022_FAKE_CONFIG: which does almost nothing
   except invoke the underlying "configure" script.
2. MCA_opal_event_libevent2022_REAL_CONFIG: which does all the real
   work (including invoking the underlying "configure" script).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 69aa46e167)
2018-08-11 12:00:35 -07:00
Jeff Squyres
5c5246f655 libevent2022/configure.m4: trivial cleanup
Put argument to AM_CONDITIONAL inside [].  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 80df3f040b)
2018-08-11 12:00:35 -07:00
Jeff Squyres
63d68ded48 libevent2022/configure.m4: minor comment cleanup
Change # -> dnl.  No code or logic changes.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 17aa64e438)
2018-08-11 12:00:35 -07:00
Jeff Squyres
6cb3d61dd1 libevent2022: only configure if event:external fails
We know that event:external will be configured first (because of its
priority).  Take advantage of that here in libevent2022 by having it
refuse to configure / politely fail if event:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., event:external succeeded, so this component
will be skipped, or event:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit b063cb6b0f)
2018-08-09 06:47:08 -07:00
Jeff Squyres
eca16720de hwloc201: only configure if hwloc:external fails
We know that hwloc:external will be configured first (because of its
priority).  Take advantage of that here in hwloc201 by having it
refuse to configure / politely fail if hwloc:external succeeded.

Also print out some additional lines in configure output indicating
what is going on (i.e., hwloc:external succeeded, so this component
will be skipped, or hwloc:external failed, so this component will be
used).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 4e5f432786)
2018-08-09 06:47:08 -07:00
Ralph Castain
0cdf49ed8a Fix typo
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit f7a537cf04)
2018-07-25 21:37:03 -07:00
Ralph Castain
511319c316 Fix the multiple pe/proc option
Things got a little out of whack and we weren't actually processing the map-by modifiers, plus an error crept into the display of the binding report. So clean those up.

Thanks to @tonyreina for the error report

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit bcdb1f45ac)
2018-07-25 19:55:28 -07:00
Ralph Castain
04054d63eb Cleanup pmix selection check
Allow for versions > 3

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 55cefedf9b)
2018-07-25 19:55:18 -07:00
Ralph Castain
2c2f9b8169 Leave opal_event_external_support exposed as global var
Signed-off-by: Ralph Castain <rhc@open-mpi.org>

(cherry picked from commit open-mpi/ompi@5cab823979)
2018-07-24 09:53:00 +09:00
Jeff Squyres
1ea021933f event/external: prefer external event component
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@a70ecf5267)
2018-07-24 09:52:58 +09:00
Jeff Squyres
6f5a453492 event: trivial comment change
Switch from #-style to dnl-style.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@83e4a45a9f)
2018-07-24 09:52:56 +09:00
Gilles Gouaillardet
aa7a4d0f6f hwloc: prefer external hwloc component
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>

(cherry picked from commit open-mpi/ompi@ce2c9fffd4)
2018-07-24 09:52:53 +09:00
Nathan Hjelm
b6bd3d33f1 btl/uct: fix compile warnings/errors
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 47ed8e8830)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-23 14:05:17 -06:00
Ralph Castain
508c3f391f Default to internal PMIx if newer than external
Per https://github.com/open-mpi/ompi/issues/5031, if the user didn't specify a particular PMIx installation, then default back to the internal version if it is newer than the discovered external one. PMIx doesn't yet provide a full signature so we have to just get as close as possible for now.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1e6aaf7f22)
2018-07-19 11:59:17 -07:00
Howard Pritchard
4447738098
Merge pull request #5414 from hppritcha/topic/iwarp_only_by_default
btl/openib: only look for iwarp/roce by default
2018-07-17 20:08:00 -06:00
Howard Pritchard
bc8134dae1
Merge pull request #5448 from thananon/ofi_context
btl/ofi: Added FI_CONTEXT as requirement.
2018-07-17 19:51:13 -06:00
Howard Pritchard
6818272392 btl/openib: only look for iwarp/roce by default
Due to decreasing support by vendors/other orgs for the OpenIB BTL,
only look for iWarp/RoCE devices by default.  Allow IB HCAs
with ports configured for ethernet.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-07-17 19:11:37 -06:00
Thananon Patinyasakdikul
033c364ee0 btl/ofi: Added FI_CONTEXT as requirement.
OFI BTL uses context for completion but never ask for it in
fi_getinfo(3). This commit makes sure that we always ask for FI_CONTEXT
to eliminate any potential error.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-07-17 12:18:43 -07:00
Sergey Oblomov
a4b8253fa2 MCA/COMMON/UCX: fixed initialization of malloc hooks
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 20:09:50 +03:00
Sergey Oblomov
1c7ae22dfb MCA/COMMON/UCX: shift opal memhooks into common UCX
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-17 13:46:38 +03:00
Ralph Castain
4a596d35f7 Remove the PMIx ext4x component
Update configury to redirect anything at or above v3 to the ext3x component

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-13 19:51:50 -07:00
Nathan Hjelm
9d3a79925b btl/vader: fix bugs in rma emulation
This commit fixes two bugs in the RMA/atomic emulation code:

 1) Fix a fragment leak when using AMO emulation.

 2) Always initialize the single-copy emulation code. This is required
 to use the AMO support.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-12 15:50:50 -06:00
Ralph Castain
fdca304268 Default to external PMIx installation
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-10 16:12:52 -07:00
Nathan Hjelm
8b090103e2 opal/fifo: fix 128-bit atomic fifo on Power9
This commit updates the atomic fifo code to fix a consistency issue
observed on Power9 systems when builtin atomics are used. The cause
was two things: 1) a missing write memory barrier in fifo push, and 2)
a read ordering issue when reading the fifo head non-atomically. This
commit fixes both issues and appears to correct then inconsistency.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-10 15:37:11 -06:00
KAWASHIMA Takahiro
3e179ba95f hwloc/external: Suppress missing-include-dirs warning
If OMPI is configured with `--with-hwloc=external` or `--with-hwloc=DIR`
and gfortran is used, I see a lot of warnings when compiling files
under the `ompi/mpi/fortran` directory.

```
f951: Warning: Nonexistent include directory
'BUILD_DIR/opal/mca/hwloc/external/hwloc/include' [-Wmissing-include-dirs]
```

There is no such `include` directory in the source tree and `configure`-
created tree. I think these lines in the `configure.m4` file are wrongly
copied from that for the embedded `hwlocXXX` component in the past.

The `-Wmissing-include-dirs` option is enabled in gfortran by default
but it is not enabled by default (or even with `-Wall`) in gcc and g++.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-09 10:55:33 +09:00
Yossi Itigin
e77e31b50b
Merge pull request #5378 from hoopoepg/topic/unify-ucx-logging
MCA/COMMON/UCX: unified logging across all UCX modules
2018-07-08 12:45:26 +03:00
Ralph Castain
17c4cf0db8 Install PMIx v3.0.0 release
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-07-06 06:38:02 -07:00
Sergey Oblomov
bef47b792c MCA/COMMON/UCX: unified logging across all UCX modules
- added common logging infrastructure for all
  UCX modules
- all UCX modules are switched to new infra

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-05 16:25:39 +03:00
Sergey Oblomov
8080283b3d MCA/COMMON/UCX: changed return type for wait_request
- for now wait_request returns OMPI status
- updated callers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 23:29:38 +03:00
Sergey Oblomov
f574c14e3a ATOMICS/UCX: redefine atomic module API
- now it accepts integer values directily instead of
  pointers

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-04 14:41:45 +03:00
Yossi Itigin
4962651567
Merge pull request #5366 from hoopoepg/topic/mca-common-ucx-unify-2
MCA/COMMON/UCX: minor unification of del_proces calls
2018-07-04 14:38:37 +03:00
Nathan Hjelm
bd5cd62df9 btl/ugni: fix up some warnings
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-03 16:30:44 -06:00
Nathan Hjelm
d8916a4672 btl/ugni: fix race condition in completing frags
The descriptor flags field in a fragment were being ready after the
fragment may have been freed. This commit reads the flags before
calling the user callback.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-03 10:48:54 -06:00
Nathan Hjelm
87d41da62b btl/vader: add support for atomics and emulated rdma
This commit adds support for atomic operations as well as rdma for
systems without rdma support. This support is implemented using an
internal send tag.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-07-02 13:57:11 -06:00
Nathan T. Weeks
08f9ae97ee btl/ugni: update BTL_VERBOSE argument list
Signed-off-by: Nathan T. Weeks <weeks@iastate.edu>
2018-07-02 09:23:30 -06:00
Sergey Oblomov
c2bd6af9f2 MCA/COMMON/UCX: minor unification of del_proces calls
- some common functionality of del_procs calls is moved into
  mca_common module
- blocking ucp_put call is replaced by non-blocking routine

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-07-02 15:10:53 +03:00
Jeff Squyres
7b0dd03e92 tcp/btl: fix a cast
The current cast is *functional*, but isn't really the way it should
be done.  This commit makes the cast the way it should be done.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-29 07:25:46 -07:00
Jeff Squyres
57bc657e7f btl/tcp: fix hash map usage
Fix two facepalms:

1. The "uint32" in the hash map functions refer to the *key* size, not
   the *value* size.  The values are always 64 bits.
2. Pass the straight value to the "set" functions -- not the pointer
   to the value.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-06-28 15:29:41 -07:00
Yossi Itigin
3a7271ef4e
Merge pull request #5344 from hoopoepg/topic/mca-common-ucx-fixed-build
MCA/COMMON/UCX: fixed build scripts
2018-06-28 15:14:04 +03:00
Sergey Oblomov
624d59604b MCA/COMMON/UCX: minor optimization of build scripts
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-28 12:58:07 +03:00
Thananon Patinyasakdikul
304cf97ab5
Merge pull request #5334 from thananon/ofi_progress_fix
btl/ofi: progress now happens after a threshold.
2018-06-27 12:51:33 -07:00
Sergey Oblomov
de8568c822 MCA/COMMON/UCX: enabled fallback into older UCX API
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-27 19:59:40 +03:00
Sergey Oblomov
1223b05811 MCA/COMMON/UCX: fixed build scripts
- updated evaluation of UCX lib - used call from UCX v1.3
- updated makefile compilation flags

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2018-06-27 11:10:25 +03:00
Thananon Patinyasakdikul
be76896f7c btl/ofi: progress now happens after a threshold.
This commit changed the way btl/ofi call progress. Before, we force
progression with every rdma/atomic call. This gives performance boost in
some case and slow down on others. Now we only force progression after
some number of rdma calls which result in better performance overall.

Also added new MCA parameter 'mca_btl_ofi_progress_threshold' to set
the threshold number. The new default is 64.

Also:
Added FI_DELIVERY_COMPLETE to tx_rtx flags to ensure that the completion
is generated after the message has been received on the remote side.

Signed-off-by: Thananon Patinyasakdikul <thananon.patinyasakdikul@intel.com>
2018-06-26 10:39:45 -07:00