Commit Graph

29142 Commits

Author SHA1 Message Date
Geoff Paulsen
556367af31
Merge pull request #5754 from gpaulsen/event_threading
opal/progress: protect against multiple threads in event base
2018-09-22 15:00:32 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Geoff Paulsen
4462396df3
Merge pull request #5756 from gpaulsen/osc_rdma_warning
osc/rdma: quiet warning
2018-09-22 15:00:11 -05:00
Geoff Paulsen
930db76492
Merge pull request #5757 from gpaulsen/info_snprintf2
snprintf() length fix for info
2018-09-22 14:59:49 -05:00
Mark Allen
5ac3fac6c2 snprintf() length fix for info
The important part of this fix is a couple places 5 was hard-coded that needed to be
strlen(OPAL_INFO_SAVE_PREFIX).

But also this contains a fix for a gcc 7.3.0 compiler warning about snprintf(). There
was an "if" statement making sure all the arguments had appropriate strlen(), but gcc
still complained about the following snprintf() because the size of the struct element
is iterator->ie_key[OPAL_MAX_INFO_KEY + 1].

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2018-09-21 14:47:11 -05:00
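
The fix described in the commit above (deriving the prefix length from the macro rather than hard-coding 5, and giving snprintf() the real destination size) can be illustrated with a minimal sketch. This is not the Open MPI code: `DEMO_SAVE_PREFIX`, `DEMO_MAX_INFO_KEY`, and `demo_make_saved_key()` are hypothetical names and values chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_INFO_KEY 36            /* hypothetical limit, not OPAL's value */
#define DEMO_SAVE_PREFIX  "_DEMO_IN_"   /* hypothetical prefix, not OPAL's value */

/* Build "<prefix><key>" into out, which must hold DEMO_MAX_INFO_KEY + 1 bytes.
 * Returns 0 on success, -1 if the result would not fit. */
static int demo_make_saved_key(char *out, const char *key)
{
    assert(out != NULL && key != NULL);

    /* Derive the prefix length from the macro instead of hard-coding it,
     * so the check stays correct if the prefix ever changes. */
    const size_t prefix_len = strlen(DEMO_SAVE_PREFIX);

    if (strlen(key) + prefix_len > DEMO_MAX_INFO_KEY) {
        return -1;
    }

    /* Pass the full destination size and check the return value, so that
     * truncation is handled explicitly rather than assumed impossible. */
    int n = snprintf(out, DEMO_MAX_INFO_KEY + 1, "%s%s", DEMO_SAVE_PREFIX, key);
    return (n < 0 || n > DEMO_MAX_INFO_KEY) ? -1 : 0;
}
```

Checking the snprintf() return value is one common way to address truncation warnings of this kind; whether it matches what the actual commit does is an assumption.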
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Nathan Hjelm
cd88e307fd opal/progress: protect against multiple threads in event base
libevent does not support multiple threads calling the event loop on
the same event base. This causes external libevent builds to print out
re-entrant warning messages. This commit fixes the issue by protecting
the call to the event loop with an atomic swap check.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:40:08 -05:00
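
The "atomic swap check" mentioned in the commit above can be sketched with C11 atomics. This is an illustration under assumptions, not the opal/progress implementation: `demo_loop_active`, `demo_run_event_loop()`, and `demo_progress_events()` are hypothetical names, and the stand-in loop only counts its invocations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int demo_loop_active = 0;   /* 1 while some thread is in the loop */
static int demo_loop_calls = 0;           /* number of loop entries, for illustration */

/* Stand-in for the real event-loop call on a single event base. */
static void demo_run_event_loop(void)
{
    /* The guard must be held whenever the loop runs. */
    assert(atomic_load(&demo_loop_active) == 1);
    ++demo_loop_calls;
}

/* Returns true if this thread ran the loop, false if another thread held it. */
static bool demo_progress_events(void)
{
    /* atomic_exchange returns the previous value, so only the thread that
     * observes 0 proceeds; every other caller skips the loop. */
    if (atomic_exchange(&demo_loop_active, 1) != 0) {
        return false;
    }
    demo_run_event_loop();
    atomic_store(&demo_loop_active, 0);
    return true;
}
```

A thread that loses the exchange returns immediately instead of blocking, which suits a progress function that is called repeatedly anyway.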
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4)
2018-09-21 14:35:51 -05:00
Geoff Paulsen
c21e1c1cc3
Merge pull request #5751 from hppritcha/topic/new_for_v4.0.0x_pr5692
NEWS: update for user reported issue
2018-09-21 10:26:52 -05:00
Howard Pritchard
9e1d18090c NEWS: update for user reported issue
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-09-21 08:45:35 -06:00
Howard Pritchard
d168cbbe19
Merge pull request #5750 from rhc54/cmr40/ofibtl
v4.0.0:Remove the OFI/BTL component
2018-09-21 05:12:10 -06:00
Geoff Paulsen
b8193cd37d
Merge pull request #5749 from gpaulsen/v4.0.0rc2_vers
Updating VERSION to rc2
2018-09-20 22:37:16 -05:00
Ralph Castain
192f0f6fff Remove the OFI/BTL component
Remove this component pending re-architecture of the overall OFI
components. We have had similar issues before when multiple components
use the same library - typical issues are race conditions, initialize
and finalize errors, etc. We are seeing similar problems here as we get
broader exposure to different library version and environment
combinations.

The correct fix in the past has been to centralize the library
interactions in a "common" component. We will pursue that here by moving
some additional functions (e.g., endpoint creation) into the existing
opal/mca/common/ofi component. We can't do that and thoroughly test it
in time for the v4.0.0 release, so we'll simply remove this component
from the release.

Once we have things correctly fixed, we'll submit a PR to restore the
component plus the related fixes to some future v4.x release.

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-20 18:09:15 -07:00
Geoff Paulsen
e3945a75c1
Merge pull request #5735 from amaslenn/rpmbuild-topdir-v4
rpmbuild: fix rpmtopdir redefinition — v4.0.x
2018-09-20 18:17:47 -05:00
Geoff Paulsen
4688da0631
Merge pull request #5736 from hoopoepg/topic/topic/common-del-procs-v4.0
MCA/COMMON/UCX: del_procs calls are unified to common module - v4.0
2018-09-20 18:12:25 -05:00
Geoff Paulsen
1a65b0ab66
Merge pull request #5741 from ggouaillardet/topic/v4.0.x/use_mpi_f08_bindings
v4.0.x: fortran/use-mpi-f08: clean [p]ompi_FOO_f bindings
2018-09-20 18:10:11 -05:00
Geoff Paulsen
8dbfb9b032
Merge pull request #5745 from bwbarrett/v4.0.x-cuda-async
openib: Disable CUDA async by default
2018-09-20 18:09:32 -05:00
Geoffrey Paulsen
85e410302d Updating VERSION to rc2
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2018-09-20 18:03:19 -05:00
Geoff Paulsen
4d9deb33ec
Merge pull request #5688 from rhc54/cmr40/pmix302
v4.0.x: Update to PMIx v3.0.2
2018-09-20 16:26:06 -05:00
Brian Barrett
a4fabb7e26 openib: Disable CUDA async by default
Disable async receive for CUDA under OpenIB.  While a performance
optimization, it also causes incorrect results for transfers
larger than the GPUDirect RDMA limit.  This change has been validated
and approved by Akshay.

References #3972

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit 9344afd485)
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
2018-09-20 14:20:54 -07:00
Geoff Paulsen
6ccd7878a8
Merge pull request #5727 from hppritcha/topic/news_4.0.0_update
NEWS:  add a blurb about cuda buffers OMPIO
2018-09-20 10:13:16 -05:00
Howard Pritchard
4a35a30a6e
Merge pull request #5692 from ggouaillardet/topic/v4.0.x/abort_after_finalize
v4.0.x: orte: send error messages to stderr.
2018-09-20 08:05:31 -06:00
Howard Pritchard
19e0289fff
Merge pull request #5729 from jsquyres/pr/v4.0.x/be-conservative-with-mpi-wait-indexes
v4.0.x: Be conservative with the array_of_indices
2018-09-20 08:01:55 -06:00
Gilles Gouaillardet
d0a0fe818f fortran/use-mpi-f08: use bindings from ompi_mpifh_bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c4ce01d104)
2018-09-20 10:37:33 +09:00
Gilles Gouaillardet
afb66d222b fortran/use-mpi-f08: fix [p]ompi_FOO_f symbols handling
- do not generate bindings for pompi_FOO_f symbols
   (they are simply not used anywhere)

 - move ompi_FOO_f bindings out of mpi_f08.mod into
   ompi_mpifh_bindings.mod that is only used at build time

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@c6070fd2e0)
2018-09-20 10:37:01 +09:00
Gilles Gouaillardet
03d994c9cf configury: do not define "dummy" empty targets any more.
We previously needed to have empty targets because AM couldn't handle
an AM_CONDITIONAL with targets in the "if" branch but not in the
"else".  :-(

That now appears to be an old automake bug that has been fixed,
so clean up some Makefile.am files.

Thanks Jeff for the pointer.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@6e04b2a66a)
2018-09-20 10:36:41 +09:00
Gilles Gouaillardet
98156b7ace use-mpi-f08: fix a typo in [P]MPI_Dist_graph_create_adjacent bindings
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@d2393251f7)
2018-09-20 10:36:01 +09:00
Howard Pritchard
27732bdf33
Merge pull request #5738 from hppritcha/topic/remove_scif_support_4.0.x
SCIF: remove it
2018-09-19 12:43:26 -06:00
Howard Pritchard
730f98d8b0
Merge pull request #5665 from hjelmn/v4.0.x_cache_flush
v4.0.x: patcher/base: improve instruction cache flush for aarch64
2018-09-19 12:12:57 -06:00
Howard Pritchard
01d4d52588 SCIF: remove it
KNC is effectively dead.  Remove corresponding SCIF
support in Open MPI.

cherry pick of PR #5737

+

news update

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit b9ac3d8931)
2018-09-19 11:48:17 -06:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9)
2018-09-19 10:47:27 +03:00
Andrey Maslennikov
547b7da664 rpmbuild: fix rpmtopdir redefinition
Erasing this variable by default makes outside definition useless.

Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
(cherry picked from commit c7d51a3a83)
2018-09-19 10:43:35 +03:00
Ralph Castain
131ea01320 Update to PMIx v3.0.2
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
2018-09-18 13:03:18 -07:00
George Bosilca
8d892f9917 Be conservative with the array_of_indices
We were assuming that the array_of_indices has the same size as the
number of requests (incount), instead of the number of actually
active requests. While the patch is trivial, the question of the
size of the array_of_indices should be clarified in the MPI Forum.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit a5fbfa476a)
2018-09-18 12:05:51 -07:00
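
The conservative assumption described in the commit above can be illustrated as follows: the output array is treated as having room only for the active requests, so it is written strictly in order and never used as scratch space of size incount. This is a sketch, not the Open MPI code; `demo_request_t` and `demo_collect_completed()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool active;     /* false plays the role of an inactive or null request */
    bool complete;   /* true once the operation has finished */
} demo_request_t;

/* Store the indices of completed requests in `indices` and return how many
 * were stored. Only the first `outcount` slots are ever written, and
 * `outcount` cannot exceed the number of active requests. */
static int demo_collect_completed(size_t incount, const demo_request_t *reqs,
                                  int *indices)
{
    assert(reqs != NULL && indices != NULL);

    int outcount = 0;
    for (size_t i = 0; i < incount; ++i) {
        if (reqs[i].active && reqs[i].complete) {
            indices[outcount++] = (int) i;
        }
    }
    return outcount;
}
```

With four requests of which two are active, a caller can pass a two-element `indices` array and the function stays within bounds.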
Howard Pritchard
3a584fee53
Merge pull request #5723 from ggouaillardet/topic/v4.0.x/libnbc_error_path
coll/libnbc: fix various error paths
2018-09-18 09:29:45 -06:00
Howard Pritchard
bbc448f9cf
Merge pull request #5720 from ggouaillardet/topic/v4.0.x/pcollreq_typos
mpiext/pcollreq: fix misc typos
2018-09-18 09:23:08 -06:00
Howard Pritchard
cf649bffbc NEWS: add a blurb about cuda buffers OMPIO
[skip ci]

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2018-09-18 09:18:57 -06:00
KAWASHIMA Takahiro
e83e118ae7 mpiext/pcollreq: fix more typos
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 4a0a2598f6)
2018-09-18 15:43:06 +09:00
Gilles Gouaillardet
ece18aed45 coll/libnbc: fix various error paths
The parameter passed to NBC_Return_handle() was incorrectly cast
and not dereferenced.

Thanks Yossi for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8b51862fb2)
2018-09-18 15:29:33 +09:00
Gilles Gouaillardet
73f531a8f2 mpiext/pcollreq: fix misc typos
Thanks Jeff for the report

Fixes open-mpi/ompi#5712

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8dc6985a5a)
2018-09-18 12:47:04 +09:00
Howard Pritchard
06c62c6b30
Merge pull request #5685 from selvintxavier/brcm_hcas_v4.0.x
v4.0.x: Add support for different Broadcom HCAs
2018-09-17 16:29:45 -06:00
Geoff Paulsen
de0c595ca5
Merge pull request #5650 from matcabral/remove_psm2_shadow_env_40x
v4.0.x: MTL PSM2: Remove shadow variables from v4.0.x
2018-09-17 14:40:59 -05:00
Geoff Paulsen
f762b14f0d
Merge pull request #5677 from hoopoepg/topic/missing-ucp-deps-v4.0
v4.0.x: UCX: added missing UCX libs to UCX detection
2018-09-17 14:37:47 -05:00
Howard Pritchard
82ce4eda77
Merge pull request #5704 from hjelmn/v4.0.x_btl_vader
v4.0.x: btl/vader: ensure that the send tag is always written last
2018-09-17 13:26:24 -06:00
Selvin Xavier
9114a9ac95 v4.0.x: Add support for different Broadcom HCAs
Adds device ids of different Broadcom adapters from
BCM57XXX and BCM58XXX family of HCAs.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
(cherry-picked from a53a6f7650)
2018-09-16 22:54:07 -07:00
Nathan Hjelm
83668f4b47 btl/vader: ensure that the send tag is always written last
To ensure fast box entries are complete when processed by the
receiving process the tag must be written last. This includes a zero
header for the next fast box entry (in some cases). This commit fixes
two instances where the tag was written too early. In one case, on
32-bit systems it is possible for the tag part of the header to be
written before the size. The second instance is an ordering issue. The
zero header was being written after the fastbox header.

Fixes #5375, #5638

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
(cherry picked from commit 850fbff441)
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-14 12:34:26 -06:00
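
The ordering rule from the commit above (the tag is the last field written, so a receiver that sees a non-zero tag also sees a complete entry) can be sketched with C11 release/acquire atomics. This covers a single slot only and leaves out the zero header for the next entry; `demo_fastbox_t` and both functions are hypothetical and do not reflect the btl/vader data layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAYLOAD_MAX 64

typedef struct {
    uint32_t size;
    unsigned char payload[DEMO_PAYLOAD_MAX];
    _Atomic uint16_t tag;   /* 0 means "slot empty" */
} demo_fastbox_t;

/* Sender: fill in payload and size first, then publish the tag last. */
static int demo_fastbox_send(demo_fastbox_t *box, uint16_t tag,
                             const void *data, uint32_t size)
{
    assert(tag != 0);
    if (size > DEMO_PAYLOAD_MAX) {
        return -1;
    }
    memcpy(box->payload, data, size);
    box->size = size;
    /* Release store: the writes above become visible before the tag does. */
    atomic_store_explicit(&box->tag, tag, memory_order_release);
    return 0;
}

/* Receiver: returns the tag if an entry was consumed, 0 if the slot is empty. */
static int demo_fastbox_poll(demo_fastbox_t *box, void *out, uint32_t *size)
{
    /* Acquire load pairs with the sender's release store. */
    uint16_t tag = atomic_load_explicit(&box->tag, memory_order_acquire);
    if (tag == 0) {
        return 0;
    }
    *size = box->size;
    memcpy(out, box->payload, *size);
    /* Mark the slot empty only after the entry has been copied out. */
    atomic_store_explicit(&box->tag, 0, memory_order_release);
    return tag;
}
```

Making the tag its own atomic field also sidesteps the 32-bit case the commit mentions, where a combined header could be written in two parts with the tag landing before the size.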
Gilles Gouaillardet
229ec82cf0 orte: send error messages to stderr.
When a job terminates normally but with a non zero exit code,
display the error message to stderr.

Thanks Emre Brookes for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@893270caee)
2018-09-13 10:39:57 +09:00
Sergey Oblomov
265ce340a1 UCX: added missing UCX libs to UCX detection
- added libs to non-default UCX location branch

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e735593bb1)
2018-09-12 19:50:34 +03:00
Sergey Oblomov
6f8df4e0fd UCX: added missing UCX libs to UCX detection
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit c982645a46)
2018-09-12 19:50:34 +03:00