1
1

6782 Коммитов

Автор SHA1 Сообщение Дата
Howard Pritchard
0b915b7e56
Merge pull request #6333 from jsquyres/pr/v4.0.x/hwloc-macro-conflict-fixes
v4.0.x: Various minor hwloc cleanups
2019-02-12 09:13:19 -07:00
Howard Pritchard
5dd63405ce
Merge pull request #6368 from jsquyres/pr/v4.0.x/fix-ofi-configury
v4.0.x: fix OFI configury
2019-02-11 13:15:52 -07:00
Jeff Squyres
9ad871fc38 ofi: revamp OPAL_CHECK_OFI configury
Update the OPAL_CHECK_OFI configury macro:

- Make it safe to call the macro multiple times:
  - The checks only execute the first time it is invoked
  - Subsequent invocations, it just emits a friendly "checking..."
    message so that configure output is sensible/logical
- With the goal of ultimately removing opal/mca/common/ofi, rename the
  output variables from OPAL_CHECK_OFI to be
  opal_ofi_{happy|CPPFLAGS|LDFLAGS|LIBS}.
- Update btl/usnic and mtl/ofi for these new conventions.
- Also, don't use AC_REQUIRE to invoke OPAL_CHECK_OFI because that
  causes the macro to be invoked at a fairly random time, which makes
  configure stdout confusing / hard to grok.
- Remove a little left-over kruft in OPAL_CHECK_OFI, too (which
  resulted in an indenting change, making the change to
  opal_check_ofi.m4 look larger than it really is).

Thanks Alastair McKinstry for the report and initial fix.
Thanks Rashika Kheria for the reminder.

Updated from master cherry pick: the OFI BTL does not exist on the
v4.0.x branch.  Therefore, did not include the OFI BTL changes on
master in this cherry pick.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit f5e1a672ccd5db127e85e1e8f6bcfeb8a8b04527)
2019-02-07 06:36:35 -08:00
René Widera
e30e5b95c6 common/ompio: possible rounding issue
Similar to #6286 rounding number of bytes into a single precision floating point value to round up the result of a division is a potential risk due to rounding errors.

- remove floating point operations for `round up`
- removes floating point conversion for round down (native behavior of integer division)

Signed-off-by: René Widera <r.widera@hzdr.de>
(cherry picked from commit a91fab80a1e55e1df15f649e18d247e5d4654eb9)
2019-01-30 12:31:39 -06:00
Edgar Gabriel
d1e8779fe3 common/ompio: fix a floating point division problem
This commit fixes  a problem reported on the mailing list with
individual writes larger than 512 MB.

The culprit is a floating point division of two large, close values.
Changing the datatypes from float to double (which is what is being
used in the fcoll components) fixes the problem.

See issue #6285 and

 https://forum.hdfgroup.org/t/cannot-write-more-than-512-mb-in-1d/5118

Thanks for Axel Huebl and René Widera for reporting the issue.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
(cherry picked from commit c0f8ce0fff4684b670135043dd150abc9d83d988)
2019-01-30 12:31:16 -06:00
Gilles Gouaillardet
a247292275 topo/treematch: silence a hwloc related warning
treematch/km_partitioning.c #include "config.h",
but there is no such file when the embedded treematch is used.

In order to prevent the embedded treematch from incorrectly using
the config.h from the embedded hwloc, generate a dummy config.h.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit 0aeb27f77650d3ee97e17e770c9e5aa487d5e1f5)
2019-01-30 07:33:33 -05:00
Jeff Squyres
1a1a932acc romio321: ensure to distribute ompi_grequestx.h
Refs https://github.com/open-mpi/ompi/issues/6227.  Thanks to
George Marselis for reporting.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 62321be186dd7d3efcedc2e801f226f6660ea0c4)
2018-12-28 13:18:10 -08:00
Matias A Cabral
b2327049c1 MTL/PSM2: add missing default priority
Missing default priority after PR #6153

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
(cherry picked from commit c76c6d8b2801ca43ba33168a0b92522786c7c5bb)
2018-12-07 16:22:59 -08:00
Matias A Cabral
80113a368f MTL/PSM2: Do not lower the priority when all processes are local.
The intention of lowering the priority when all processes are local
was to favor Vader BTL. However, in builds including the OFI MTL it
gets selected instead.

Reviewed-by: Spruit, Neil R <neil.r.spruit@intel.com>
Reviewed-by: Gopalakrishnan, Aravind <aravind.gopalakrishnan@intel.com>
Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
(cherry picked from commit fc8582c5606b7a3d1b711f8f7b6144808290a48f)
2018-12-07 11:11:43 -08:00
Geoff Paulsen
752bbd195f
Merge pull request #6102 from hoopoepg/topic/set-osc-ucx-level-200-v4.0
OSC: set UCX module used by default - v4.0
2018-12-04 10:26:37 -06:00
Geoff Paulsen
bd2990f502
Merge pull request #6131 from devreal/rdma-plug-memleak-v4.0.x
v4.0.x: Plug two memory leaks in rdma osc
2018-11-30 13:54:51 -06:00
Joseph Schuchart
c5346751e6 Plug two memory leaks in rdma osc
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 91885f5876129aa4fb43ed4b3404c9d1ca7e08b8)
2018-11-29 10:19:26 -05:00
Sergey Oblomov
6651672711 OSC/UCX: set max level value to 60
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 2d230b3aacce0185f0d46e69f608071b670eeb3c)
2018-11-27 20:35:30 +02:00
Yossi Itigin
a112d10c93 pml_ucx: initialize req_mpi_object.comm for error handler
without this fix, an error handler invoked on pml_ucx request would
segfault while trying to dereference requests[i]->req_mpi_object.comm

(picked from master f36eeef)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-11-26 11:57:34 +02:00
Sergey Oblomov
38a4953707 OSC/UCX: added UCX version evaluation
- added UCX version evaluation to set OSC UCX priority

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit e91f214982391b8e1b26be39147c357d32b8380e)
2018-11-22 11:31:53 +02:00
Sergey Oblomov
012e27af77 OSC: set UCX module used by default
- OSC/UCX module set priority to 200 to be used by default

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 36934a8bb2484c3d27d14683d65012ff422334f4)
2018-11-22 10:59:43 +02:00
Howard Pritchard
8adaeb1536
Merge pull request #6007 from aravindksg/coll-tuned-fix-40x
coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
2018-11-19 13:15:40 -07:00
Howard Pritchard
ec79631ba2
Merge pull request #5936 from edgargabriel/pr/testmpio-v4.0.x
Pr/testmpio v4.0.x
2018-11-19 13:11:50 -07:00
Howard Pritchard
e851879081
Merge pull request #5994 from rhc54/cmr40/cleanup
Remove stale defunct tools
2018-10-31 13:29:57 -06:00
Aravind Gopalakrishnan
5a74ddb34d coll/tuned: Fix MPI_IN_PLACE processing in tuned algorithms
PR #5450 addresses MPI_IN_PLACE processing for basic collective algorithms.
But in conjunction with that, we need to check for MPI_IN_PLACE in tuned paths
as well before calling ompi_datatype_type_size() as otherwise we segfault.

MPI spec also stipulates to ignore sendcount and sendtype for Alltoall and
Allgatherv operations. So, extending the check to these algorithms as well.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
(cherry picked from commit 88d781056f43934a93e16db556b340e72cdd3742)
2018-10-31 11:37:29 -07:00
Ralph Castain
ba6ad9fe42 Remove stale defunct tools
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 05ac8fa71c0833eeeaa878b72a31503d361e145e)
2018-10-30 08:51:25 -07:00
Sergey Oblomov
0846c9d112 COMMON/UCX: added error code to log output
Also fixes a PGI compilation error with --enable-debug.

Signed-off-by: Geoff Paulsen <gpaulsen@users.noreply.github.com>
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 1099d5f02327329e0c58d9403e3e0a7f1e1d1920)
2018-10-30 09:55:25 -05:00
Ralph Castain
712ddd326f Remove the stale orte-dvm code
Users should migrate to https://github.com/pmix/prrte

Signed-off-by: Ralph Castain <rhc@open-mpi.org>
(cherry picked from commit 1bd772e8ebf66f705537b9a6e1af2b6093ef8471)
2018-10-30 07:54:35 -07:00
Howard Pritchard
f9d2f3b912
Merge pull request #5941 from hppritcha/topic/remove_bfo_pml_v4.0.x
v4.0.x: remove the bfo pml
2018-10-22 09:50:05 -06:00
Howard Pritchard
a806d09450 remove the bfo pml
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 7d6774acf89558c05f415c96c00502429e26e502)
2018-10-17 14:00:11 -06:00
Edgar Gabriel
278ecf2205 io/ompio: add verification for data representations.
check for providing a data representation that is actually supported
by ompio.

Add also one check for a non-NULL pointer in mpi/c/file_set_view
for the data representation.

Also fixes parts of issue #5643

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:48 -05:00
Edgar Gabriel
a07c9e96b1 io/ompio: execute barrier before sync
this ensures that all processes are done modifying a file
before syncing. Fixes an error in the testmpio testsuite.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:35 -05:00
Edgar Gabriel
96c1a5b9dc common/ompio: check datatypes when setting file view
return MPI_ERR_ARG if the size of the fileview is not a
multiple of the size of the etype provided.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:19 -05:00
Edgar Gabriel
425a71799e common/ompio: return correct error code for improper access
return MPI_ERR_ACCESS if the user tries to read from  a file
that was opened using MPI_MODE_WRONLY

return MPI_ERR_READ_ONLY if the user tries to write a file
that was opened using MPI_MODE_RDONLY

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:22:04 -05:00
Edgar Gabriel
c65dda6f5f io/ompio: fix seek position calculation for SEEK_CUR
This commit fixes the calculation of the position where to
seek to, in case SEEK_CUR is used.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2018-10-17 11:21:47 -05:00
Howard Pritchard
7ceb508b93
Merge pull request #5889 from yosefe/topic/pml-ucx-fix-datatype-leak-v4.0.x
pml_ucx: add ompi datatype attribute to release ucp_datatype - v4.0.x
2018-10-16 16:29:01 -06:00
Howard Pritchard
e2cf1e3ec5
Merge pull request #5887 from yosefe/topic/osc-ucx-fix-finalize-hang-v4.0.x
osc_ucx: fix hang/timeout in component finalize - v4.0
2018-10-16 09:21:50 -06:00
Yossi Itigin
eabc94cab0 osc_ucx: add worker flush before osc module free
Make sure all pending communications are done on all ranks before
closing the window. This way it will be safe to close the endpoints when
closing the component.

(picked from master b8e1af6)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 23:02:19 +03:00
Yossi Itigin
4a97d6b9fa pml_ucx: fix return code from mca_pml_ucx_init()
(picked from master 40ac9e4)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:49 +03:00
Yossi Itigin
1bffd196ef pml_ucx: add ompi datatype attribute to release ucp_datatype
(picked from master 4763822)

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2018-10-10 20:23:26 +03:00
Sergey Oblomov
274cbc3c03 OSC/UCX: fixed zero-size window processing
- added processing of zero-size MPI window

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ae6f81983fe354de812ebe2532120fb20ae24d3b)
2018-10-10 16:49:02 +03:00
Brian Barrett
10d0a430c4 mtl ofi: Change from opt-in to opt-out provider selection
Change default provider selection logic for the OFI MTL.  The
old logic was whitelist-only, so any new HPC NIC provider would
have to ask users to do extra work or wait for an OMPI release
to be whitelisted.  The reason for the logic was to avoid
selecting a "generic" provider like sockets or shm that would
frequently have worse performance than the optimized BTL options
Open MPI supports.

With the change, we blacklist the (small, relatively static) list
of providers that duplicate internal capabilities.  Users can use
one of thse blacklisted providers in two ways: first, they can
explicitly request the provider in the include list (which will
override the default exclude list) and second, the can set a new
empty exclude list.

Since most HPC networks require special libraries and therefore
an explicit build of libfabric, it is highly unlikely that this
change will cause users to use libfabric when they didn't want to
do so.  It does, however, solve the whitelisting problem.

Signed-off-by: Brian Barrett <bbarrett@amazon.com>
(cherry picked from commit c5eaa38491c7197f7dbc74c299ade18e09bf5f64)
2018-09-27 18:41:47 +00:00
Geoff Paulsen
3d4164e1e1
Merge pull request #5752 from gpaulsen/misc-warnings-fixes
Miscellaneous compiler warning stomps.
2018-09-22 15:01:53 -05:00
Geoff Paulsen
bc798b6135
Merge pull request #5755 from gpaulsen/osc_rdma_cleanup
osc/rdma: clean out stale aggregation code
2018-09-22 15:00:21 -05:00
Nathan Hjelm
72fc8acb50 osc/rdma: quiet warning
gcc complains about ret possibly being used uninitialized. That will
never happen but we should still quiet the warning. This commit sets
ret to a valid value.

Fixes #5513

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:44:56 -05:00
Nathan Hjelm
56e31f8206 osc/rdma: clean out stale aggregation code
The aggregation code in osc/rdma is currently broken and will likely
not be reused. This commit cleans it out.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2018-09-21 14:42:45 -05:00
Jeff Squyres
2e37f97a38 Miscellaneous compiler warning stomps.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe0852bcb4d14a6aaf5a3e1021f60b5be32dd42d)
2018-09-21 14:35:51 -05:00
Sergey Oblomov
3cace87749 MCA/COMMON/UCX: del_procs calls are unified to common module
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 920cc2e0d9994dfd49062822c89cb502274eb464)
2018-09-19 10:47:27 +03:00
Gilles Gouaillardet
ece18aed45 coll/libnbc: fix various error paths
The parameter passed to NBC_Return_handle() was incorrectly casted
and not dereferenced.

Thanks Yossi for the bug report.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@8b51862fb2)
2018-09-18 15:29:33 +09:00
Geoff Paulsen
de0c595ca5
Merge pull request #5650 from matcabral/remove_psm2_shadow_env_40x
v4.0.x: MTL PSM2: Remove shadow variables from v4.0.x
2018-09-17 14:40:59 -05:00
Gilles Gouaillardet
ff8600f2e4 ompi/hook: plug a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@b79b37465c)
2018-09-10 09:21:49 +09:00
Gilles Gouaillardet
4bd5c538a2 pml/ob1: plug a memory leak in mca_pml_ob1_component_fini()
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(back-ported from commit open-mpi/ompi@fed33c1530)
2018-09-10 09:21:12 +09:00
Gilles Gouaillardet
080e20fa02 mtl/psm2: fix a misc memory leak
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@316e4e38f4)
2018-09-10 09:17:54 +09:00
matcabral
8fa172e60b MTL PSM2: Remove shadow variables from v4.0.x
As agreed on #4574, where removed in past release branches
to avoid perfomance impacts in the default values for
some paramters.

Signed-off-by: Matias Cabral <matias.a.cabral@intel.com>
2018-09-05 18:44:40 -04:00
Howard Pritchard
7e10bc0833
Merge pull request #5607 from edgargabriel/pr/sharedfp-naming-conflict-v4.0
sharedfp/sm and lockedfile: fix naming bug
2018-09-02 16:03:14 -04:00