
29074 Commits

Author SHA1 Message Date
Nathan Hjelm
df6dd69db8 btl/vader: ensure the fast box tag is always read first
On some platforms reading a 64-bit value is non-atomic and it is
possible that the two 32-bit values are read in the wrong order. To
ensure the tag is always read first this commit reads the tag before
reading the full 64-bit value.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 66a7dc4c72cb25df67e7f872bee7a20b5fa9c763)
2018-10-03 11:36:17 -04:00
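A minimal sketch of the read ordering described in the btl/vader commit above. It assumes a hypothetical fast-box header whose 64-bit word packs a 32-bit size and a 32-bit tag; the union layout, the names `fbox_hdr_t` and `fbox_read_header`, and the use of a C11 acquire fence in place of Open MPI's own barrier are illustrative, not taken from the vader sources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical fast-box header: one 64-bit word viewed as two 32-bit halves.
 * The layout is an assumption for illustration only. */
typedef union {
    uint64_t value;
    struct {
        uint32_t size;
        uint32_t tag;
    } data;
} fbox_hdr_t;

/* Read the tag with its own 32-bit load first, then the full 64-bit word.
 * A non-zero tag means the writer has published the header. */
static int fbox_read_header(volatile fbox_hdr_t *hdr, fbox_hdr_t *out)
{
    uint32_t tag = hdr->data.tag;   /* single 32-bit read of the tag */
    if (0 == tag) {
        return 0;                   /* nothing published yet */
    }
    /* Keep the 64-bit load from being reordered before the tag load. */
    atomic_thread_fence(memory_order_acquire);
    out->value = hdr->value;
    out->data.tag = tag;            /* keep the tag that was read first */
    return 1;
}
```

Reading the tag separately means a torn 64-bit read can no longer make the reader act on a stale tag half.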
Geoff Paulsen
5cae0ec25b
Merge pull request #5794 from bwbarrett/v4.0.x-ofi-mtl-selection
mtl ofi: Change from opt-in to opt-out provider selection
2018-10-03 08:31:07 -05:00
Geoff Paulsen
1844d87e4c
Merge pull request #5802 from jsquyres/pr/v4.0.x/misc-updates
mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
2018-10-03 08:30:46 -05:00
Geoff Paulsen
593d652077
Merge pull request #5823 from jsquyres/pr/v4.0.x/fix-tcp-btl-show-help-ip-address
v4.0.x: btl/tcp: output the IP address correctly
2018-10-03 08:29:37 -05:00
Geoff Paulsen
9d1a6db1a0
Merge pull request #5826 from jsquyres/pr/v4.0.x/tcp-btl-socklen-fix
v4.0.x: TCP BTL socklen fix
2018-10-02 16:14:13 -05:00
George Bosilca
b63bee5da4 Small pedantic fixes.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a3a492b42cd7b114b435fafa7cd46222dc565dd1)
2018-10-02 14:37:31 -04:00
George Bosilca
d450e460d6 Provide the correct socklen to bind.
Take Brian's patch from #5825 along with his log message:
Fix a failure in binding the initiating side of a connection
on MacOS. MacOS doesn't like passing the size of the storage
structure (sockaddr_storage) instead of the expected size of
the structure (sockaddr_in or sockaddr_in6), which was causing
bind() failures. This patch simply changes the structure size
to the expected size.

Add a clearer error message in debug mode.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 9164e26e2f323c43c9a671cb510bb4df03e45628)
2018-10-02 14:37:31 -04:00
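The socklen fix above boils down to passing bind() the size of the concrete address structure rather than `sizeof(struct sockaddr_storage)`. A minimal sketch, where the helper name `addr_len_for_family` is hypothetical and not part of the TCP BTL:

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return the length bind() expects for the family held in a
 * sockaddr_storage, instead of the size of the storage itself. */
static socklen_t addr_len_for_family(const struct sockaddr_storage *addr)
{
    switch (addr->ss_family) {
    case AF_INET:
        return (socklen_t) sizeof(struct sockaddr_in);
    case AF_INET6:
        return (socklen_t) sizeof(struct sockaddr_in6);
    default:
        return 0;   /* unsupported family */
    }
}
```

A caller would then use `bind(fd, (struct sockaddr *) &addr, addr_len_for_family(&addr))`; Linux tolerates the larger storage size, which is why the problem only surfaced on macOS.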
Jeff Squyres
81f2f19398 btl/tcp: output the IP address correctly
Per
https://github.com/open-mpi/ompi/issues/3035#issuecomment-426085673,
it looks like the IP address for a given interface is being stashed in
two places: on the endpoint and on the module.

1. On the endpoint, it is storing the moral equivalent of a
   (struct sockaddr_in.sin_addr).
2. On the module, it is storing a full (struct sockaddr_storage).

The call to opal_net_get_hostname() expects a full (struct sockaddr*)
-- not just the stripped-down (struct sockaddr_in.sin_addr).  Hence,
when the original code was passing in the endpoint's (struct
sockaddr_in.sin_addr) and opal_net_get_hostname() was treating it
like a (struct sockaddr), hilarity ensued (i.e., we got the wrong
output).

This commit eliminates the call to opal_net_get_hostname() and just
calls inet_ntop() directly to convert the (struct
sockaddr_in.sin_addr) to a string.

NOTE: Per the github comment cited above, there can be a disparity
between the IP address cached on the endpoint vs. the IP address
cached on the module.  This only happens with interfaces that have
more than one IP address.  This commit does not fix that issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 5dae086f7e4aee28fbb5a7282a2661286a5f68fe)
2018-10-02 10:49:51 -04:00
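The btl/tcp output fix above replaces the hostname lookup with a direct inet_ntop() call on the bare `struct in_addr`. A minimal sketch of that conversion; the wrapper `ipv4_to_string` is a hypothetical name used only for illustration:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Convert an IPv4 address stored as a bare struct in_addr (the form cached
 * on the endpoint) into dotted-quad text. Returns NULL on failure. */
static const char *ipv4_to_string(const struct in_addr *addr,
                                  char *buf, socklen_t buflen)
{
    assert(buflen >= INET_ADDRSTRLEN);
    /* inet_ntop() takes the family plus a pointer to the in_addr itself,
     * not a struct sockaddr, so no reinterpretation is needed. */
    return inet_ntop(AF_INET, addr, buf, buflen);
}
```

Because inet_ntop() is told the family explicitly, it never misreads the 4-byte address as the start of a larger `struct sockaddr`, which is what produced the wrong output before.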
Geoff Paulsen
1ae1f7d3c6
Merge pull request #5790 from yosefe/topic/shmem-lock-progress-v4.0.x
shmem/lock: progress communications while waiting for shmem_lock
2018-10-01 09:13:19 -05:00
Geoff Paulsen
e6b8738132
Merge pull request #5806 from gpaulsen/topic/v4.0.x/NEWS/mtl_ofi
Topic/v4.0.x/news/mtl ofi
2018-10-01 09:06:24 -05:00
Geoff Paulsen
a4666dc008
Merge pull request #5805 from gpaulsen/topic/v4.0.x/NEWS/vers
Topic/v4.0.x/news/vers
2018-10-01 09:05:40 -05:00
Geoffrey Paulsen
0f984be381 NEWS: PR5794 - change MTL OFI selection
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-09-28 16:54:04 -05:00
Geoffrey Paulsen
8138b5b04a NEWS: updated versions of included hwloc and PMIx
Updated versions of included hwloc and PMIx to match
corresponding VERSION files.

Updated the spelling of "Open SHMEM" to "OpenSHMEM".

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-09-28 16:47:26 -05:00
Geoff Paulsen
a7e275cf3e
Merge pull request #5771 from jsquyres/pr/v4.0.x/readme-update-configure-cli-with-options
v4.0.x: README: Add note about --with-foo and RPATH
2018-09-28 16:20:47 -05:00
Jeff Squyres
46dd266e45 mpi.h: remove MPI_UB/MPI_LB when not enabling MPI-1 compat
When --enable-mpi1-compatibility was not specified, the ompi_mpi_ub/lb
symbols were #if'ed out of mpi.h.  But the #defines for MPI_UB/LB
still remained.  This commit also #if's out the MPI_UB/LB macros when
--enable-mpi1-compatibility is not specified.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 7223334d4dc1225d49cd2c63714870c3a04ad953)
2018-09-28 10:01:48 -07:00
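The mpi.h fix above amounts to putting the handle declarations and their convenience macros under the same guard. A minimal sketch with hypothetical names throughout (`EXAMPLE_ENABLE_MPI1_COMPAT` stands in for whatever configure result mpi.h actually tests; the datatype and handle names are placeholders):

```c
#include <assert.h>

/* Hypothetical configure result: 0 models a build without
 * --enable-mpi1-compatibility. */
#define EXAMPLE_ENABLE_MPI1_COMPAT 0

struct example_datatype { int id; };

/* The removed MPI-1 handles and the macros that name them share one guard,
 * so neither leaks into a build where the symbols do not exist. */
#if EXAMPLE_ENABLE_MPI1_COMPAT
extern struct example_datatype example_mpi_ub, example_mpi_lb;
#define EXAMPLE_MPI_UB (&example_mpi_ub)
#define EXAMPLE_MPI_LB (&example_mpi_lb)
#endif

/* Reports whether user code would still see the MPI_UB-style macro. */
static int example_mpi_ub_visible(void)
{
#ifdef EXAMPLE_MPI_UB
    return 1;
#else
    return 0;
#endif
}
```

Leaving the macro defined while the symbol is hidden would turn a clean "undeclared identifier" compile error into a confusing reference to a missing `ompi_mpi_ub`-style symbol.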
Brian Barrett
10d0a430c4 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of these blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, they can set a new,
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit c5eaa38491c7197f7dbc74c299ade18e09bf5f64)
2018-09-27 18:41:47 +00:00
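The opt-out rules in the mtl ofi commit above can be modeled as a small predicate. This is a sketch of the described semantics only, not the MTL's code: the default exclude entries are illustrative, and `provider_selected`, `in_list`, and the NULL-means-default convention are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative default exclude list of generic providers that duplicate
 * transports Open MPI already has; not the actual shipped list. */
static const char *const default_exclude[] = { "shm", "sockets", "tcp", NULL };

static bool in_list(const char *name, const char *const *list)
{
    for (size_t i = 0; list != NULL && list[i] != NULL; ++i) {
        if (0 == strcmp(name, list[i])) {
            return true;
        }
    }
    return false;
}

/* Opt-out selection: a user include list, if given, decides alone and so
 * overrides the default exclude list. Otherwise every provider is allowed
 * unless excluded. exclude == NULL means "use the default list"; an empty
 * list ({ NULL }) excludes nothing. */
static bool provider_selected(const char *name,
                              const char *const *include,
                              const char *const *exclude)
{
    if (include != NULL) {
        return in_list(name, include);
    }
    if (exclude == NULL) {
        exclude = default_exclude;
    }
    return !in_list(name, exclude);
}
```

Under this model a new HPC provider is selected with no user action, which is the whitelisting problem the commit sets out to solve.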
Geoff Paulsen
0fc5034802
Merge pull request #5791 from hoopoepg/topic/update-function-macro-v4.0
OPAL/COMMON/UCX: used __func__ macro instead of __FUNCTION__ - v4.0
2018-09-27 07:55:16 -05:00
Sergey Oblomov
68d3baffd5 OPAL/COMMON/UCX: used __func__ macro instead of __FUNCTION__
- used __func__ macro instead of __FUNCTION__ to unify
  macro usage with other components

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 9a51e257d162e724845024e3505880256194ebe2)
2018-09-27 12:04:07 +03:00
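For context on the UCX change above: `__func__` is the predefined identifier standardized in C99, while `__FUNCTION__` is a compiler extension. A small sketch of a logging macro that picks up the caller's name; `EXAMPLE_LOG` and `do_work` are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expands inside the caller, so __func__ names the calling function. */
#define EXAMPLE_LOG(buf, len, msg) \
    snprintf((buf), (len), "%s: %s", __func__, (msg))

static int do_work(char *buf, size_t len)
{
    return EXAMPLE_LOG(buf, len, "starting");
}
```

Since `__func__` behaves like a `static const char[]` declared in each function, it works with any C99 compiler, not just the ones that happen to provide `__FUNCTION__`.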
Yossi Itigin
cda310733f shmem/lock: progress communications while waiting for shmem_lock
(cherry picked from commit 4101150)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-09-27 11:42:34 +03:00
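The shmem/lock change above makes the waiting side drive communication progress while it spins. A self-contained sketch of that pattern; the lock word, `try_acquire`, and `example_progress` are hypothetical, and the progress hook fakes a remote unlock after three calls so the loop can be exercised locally:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 = free, 1 = held (initially by another PE). */
static atomic_int lock_word = 1;
static int progress_calls = 0;

/* Stand-in for the runtime's progress engine. Here it releases the lock on
 * the third call, modeling an unlock that only arrives if communication is
 * progressed. */
static void example_progress(void)
{
    if (++progress_calls == 3) {
        atomic_store(&lock_word, 0);
    }
}

static bool try_acquire(atomic_int *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Drive progress on every failed attempt instead of spinning silently. */
static void example_lock(atomic_int *lock)
{
    while (!try_acquire(lock)) {
        example_progress();
    }
}
```

Without the progress call, a PE that holds the lock but needs this PE's help to complete its release could leave both sides waiting forever.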
Geoff Paulsen
2b471430af
Merge pull request #5787 from gpaulsen/v4.0.x
Updating VERSION to v4.0.0rc3
2018-09-26 20:06:44 -05:00
Geoffrey Paulsen
27f3262403 Updating VERSION to v4.0.0rc3
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-09-26 18:38:29 -05:00
Jeff Squyres
6b91855ecc README: additional clarification about --with-<foo>-libdir.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 36c9f92117053ccd343c30c4540971240717a233)
2018-09-25 08:56:51 -07:00
Howard Pritchard
ec2e6eb9b1
Merge pull request #5766 from jsquyres/pr/v4.0.x/fix-ompi-info-mca-var-settable-output
v4.0.x: mca_base_var: fix output bug about settable vars
2018-09-25 09:25:48 -06:00
Jeff Squyres
02c5838cdf README: Add note about --with-foo and RPATH
Specifically mention our intended behavior about /usr and /usr/lib
(and why we don't add /usr/lib[64] and /usr/local/lib[64] to RPATH).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 9367440e3210cf2bfae33d3c72411ab8b9fa6622)
2018-09-25 06:35:34 -07:00
Jeff Squyres
8bdf4553d9 mca_base_var: fix output bug about settable vars
Fix the test that determined whether we output "writeable" or
"read-only" for MCA vars (it was checking the wrong flag).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 176da51aec0955a51f21157b33b21f60b6f28092)
2018-09-24 14:14:51 -07:00
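The mca_base_var bug above came from testing the wrong flag bit. A sketch of the corrected check, where the flag names and values are hypothetical stand-ins rather than the real mca_base_var definitions:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits; illustrative names and values only. */
enum {
    EXAMPLE_VAR_FLAG_INTERNAL = 0x1,
    EXAMPLE_VAR_FLAG_SETTABLE = 0x2,
};

/* Derive the label from the bit that actually means "settable". */
static const char *settable_label(int flags)
{
    return (flags & EXAMPLE_VAR_FLAG_SETTABLE) ? "writeable" : "read-only";
}
```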
Geoff Paulsen
9d9ae9286c
Merge pull request #5753 from gpaulsen/man-page-script-abstraction-break
Fix script abstraction break: mv make_manpage.pl to config
2018-09-23 09:01:19 -05:00
Geoff Paulsen
d03ee166cd
Merge pull request #5761 from amaslenn/platform-mellanox-v4
platform/mellanox: cleanup autodetect config — v4.0.x
2018-09-23 09:00:36 -05:00
Geoff Paulsen
97d7b004bb
Merge pull request #5762 from amaslenn/platform-mellanox-conf-v4
platform/mellanox: update default configuration — v4.0.x
2018-09-23 09:00:04 -05:00
Andrey Maslennikov
6fb0185f49 platform/mellanox: update default configuration
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
(cherry picked from commit da18a2d24c8f6192f2ad4fd4781ce67a3fcc5901)
2018-09-23 10:13:50 +03:00
Andrey Maslennikov
b22b54bf92 platform/mellanox: cleanup autodetect config
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
(cherry picked from commit ced50a98ff3f9e5b7812503ee895a2b2db581983)
2018-09-23 10:12:24 +03:00
Jeff Squyres
c83b30755a Fix script abstraction break: mv make_manpage.pl to config
Having the "make_manpage.pl" script in the ompi/ tree broke
"./autogen.pl --no-ompi" (specifically: "make distcheck" of --no-ompi
builds would break).

(cherry picked from commit 89773c41)
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2018-09-22 15:11:06 -05:00
Geoff Paulsen
3d4164e1e1
Merge pull request #5752 from gpaulsen/misc-warnings-fixes
Miscellaneous compiler warning stomps.
2018-09-22 15:01:53 -05:00
Geoff Paulsen
556367af31
Merge pull request #5754 from gpaulsen/event_threading
opal/progress: protect against multiple threads in event base
2018-09-22 15:00:32 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Geoff Paulsen
4462396df3
Merge pull request #5756 from gpaulsen/osc_rdma_warning
osc/rdma: quiet warning
2018-09-22 15:00:11 -05:00
Geoff Paulsen
930db76492
Merge pull request #5757 from gpaulsen/info_snprintf2
snprintf() length fix for info
2018-09-22 14:59:49 -05:00
Mark Allen
5ac3fac6c2 snprintf() length fix for info
The important part of this fix is a couple of places where 5 was hard-coded but needed to be
strlen(OPAL_INFO_SAVE_PREFIX).

But also this contains a fix for a gcc 7.3.0 compiler warning about snprintf(). There
was an "if" statement making sure all the arguments had appropriate strlen(), but gcc
still complained about the following snprintf() because the size of the struct element
is iterator->ie_key[OPAL_MAX_INFO_KEY + 1].

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-09-21 14:47:11 -05:00
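The core of the info fix above is deriving the prefix length with strlen() instead of a literal 5, plus checking snprintf()'s return value for truncation. A sketch with hypothetical stand-ins for `OPAL_INFO_SAVE_PREFIX` and `OPAL_MAX_INFO_KEY` (the prefix is deliberately not 5 characters long, and the key limit is illustrative):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the real values may differ. */
#define EXAMPLE_SAVE_PREFIX "_SAVE_"
#define EXAMPLE_MAX_KEY 36

/* If key starts with the save prefix, return a pointer just past it,
 * otherwise NULL. The length comes from strlen(), not a hard-coded 5. */
static const char *strip_save_prefix(const char *key)
{
    size_t plen = strlen(EXAMPLE_SAVE_PREFIX);
    return (0 == strncmp(key, EXAMPLE_SAVE_PREFIX, plen)) ? key + plen : NULL;
}

/* Write prefix + key into out; returns -1 if the result would not fit. */
static int make_saved_key(char *out, size_t outlen, const char *key)
{
    int n = snprintf(out, outlen, "%s%s", EXAMPLE_SAVE_PREFIX, key);
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Tying the length to the macro means changing the prefix later cannot silently desynchronize the stripping code.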
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Nathan Hjelm
cd88e307fd opal/progress: protect against multiple threads in event base
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent builds to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:40:08 -05:00
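The opal/progress change above guards the event loop with an atomic swap so only one thread at a time enters it. A minimal sketch of that guard; `run_event_loop_once` is a hypothetical stand-in for a single non-blocking pass of libevent's loop, and the flag and counter names are illustrative:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Set while some thread is inside the event loop. */
static atomic_bool event_loop_busy = false;
static int loop_runs = 0;

/* Stand-in for one pass of the libevent loop, e.g.
 * event_base_loop(base, EVLOOP_ONCE | EVLOOP_NONBLOCK). */
static int run_event_loop_once(void)
{
    ++loop_runs;
    return 1;
}

/* Only the thread that flips the flag from false to true runs the loop;
 * concurrent callers skip it rather than re-entering the same base. */
static int progress_events(void)
{
    int events = 0;
    if (!atomic_exchange(&event_loop_busy, true)) {
        events = run_event_loop_once();
        atomic_store(&event_loop_busy, false);
    }
    return events;
}
```

Skipping instead of blocking keeps the progress path non-blocking: a thread that loses the race simply moves on to other progress work.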
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4d14a6aaf5a3e1021f60b5be32dd42d)
2018-09-21 14:35:51 -05:00
Geoff Paulsen
c21e1c1cc3
Merge pull request #5751 from hppritcha/topic/new_for_v4.0.0x_pr5692
NEWS: update for user reported issue
2018-09-21 10:26:52 -05:00
Howard Pritchard
9e1d18090c NEWS: update for user reported issue
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-09-21 08:45:35 -06:00
Howard Pritchard
d168cbbe19
Merge pull request #5750 from rhc54/cmr40/ofibtl
v4.0.0:Remove the OFI/BTL component
2018-09-21 05:12:10 -06:00
Geoff Paulsen
b8193cd37d
Merge pull request #5749 from gpaulsen/v4.0.0rc2_vers
Updating VERSION to rc2
2018-09-20 22:37:16 -05:00
Ralph Castain
192f0f6fff Remove the OFI/BTL component
Remove this component pending re-architecture of the overall OFI
components. We have had similar issues before when multiple components
use the same library - typical issues are race conditions, initialize
and finalize errors, etc. We are seeing similar problems here as we get
broader exposure to different library versions and environment
combinations.

The correct fix in the past has been to centralize the library
interactions in a "common" component. We will pursue that here by moving
some additional functions (e.g., endpoint creation) into the existing
opal/mca/common/ofi component. We can't do that and thoroughly test it
in time for the v4.0.0 release, so we'll simply remove this component
from the release.

Once we have things correctly fixed, we'll submit a PR to restore the
component plus the related fixes to some future v4.x release.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-20 18:09:15 -07:00
Geoff Paulsen
e3945a75c1
Merge pull request #5735 from amaslenn/rpmbuild-topdir-v4
rpmbuild: fix rpmtopdir redefinition — v4.0.x
2018-09-20 18:17:47 -05:00
Geoff Paulsen
4688da0631
Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0
MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0
2018-09-20 18:12:25 -05:00
Geoff Paulsen
1a65b0ab66
Merge pull request #5741 from ggouaillardet/topic/v4.0.x/use_mpi_f08_bindings
v4.0.x: fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings
2018-09-20 18:10:11 -05:00
Geoff Paulsen
8dbfb9b032
Merge pull request #5745 from bwbarrett/v4.0.x-cuda-async
openib: Disable CUDA async by default
2018-09-20 18:09:32 -05:00