
29595 Commits

Author SHA1 Message Date
Andrey Maslennikov
226dfc4ef0 platform/mellanox: disable missing libcuda warning
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
(cherry picked from commit 63ba7bec46e6a08f9948c82cba602d3b8d50fada)
2019-09-23 11:18:55 +03:00
Geoff Paulsen
2f101326fc
Merge pull request #6997 from jsquyres/pr/v4.0.x/vader-do-not-use-cma
v4.0.x: Do not use CMA in user namespaces
2019-09-22 07:39:28 -05:00
Adrian Reber
674655c641 Do not use CMA in user namespaces
Running processes via mpirun in Podman containers has shown
that the CMA btl_vader_single_copy_mechanism does not work when user
namespaces are involved.

Creating containers with Podman requires at least user namespaces to be
able to do unprivileged mounts in a container.

Even if the container runs with user namespace user ID mappings that
result in the same user ID on the inside and outside of all involved
containers, the kernel check that allows ptrace (and thus
process_vm_{read,write}v()) fails if the same IDs are not in the same
user namespace.

One workaround is to specify '--mca btl_vader_single_copy_mechanism none'
and this commit adds code to automatically skip CMA if user namespaces
are detected and fall back to MCA_BTL_VADER_EMUL.

Signed-off-by: Adrian Reber <areber@redhat.com>
(cherry picked from commit fc68d8a90fe86284e9dc730f878b55c0412f01d2)
2019-09-20 19:12:48 -07:00
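The commit text does not show the detection code. Below is a minimal sketch of one way it could work, assuming Linux, where `stat()` on `/proc/self/ns/user` yields an inode number identifying the caller's user namespace, and assuming each process learns its peer's namespace id through some out-of-band exchange (not shown). `get_user_ns_id`, `select_single_copy`, and the enum names are hypothetical, not Open MPI identifiers.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical names; not the identifiers used in Open MPI. */
typedef enum { SINGLE_COPY_CMA, SINGLE_COPY_EMUL } single_copy_mech_t;

/* On Linux, /proc/self/ns/user refers to the caller's user namespace and
 * its inode number identifies that namespace. Returns 0 if unavailable. */
static uint64_t get_user_ns_id(void)
{
    struct stat buf;
    if (stat("/proc/self/ns/user", &buf) != 0) {
        return 0;
    }
    return (uint64_t) buf.st_ino;
}

/* CMA (process_vm_readv/process_vm_writev) is kept only when both peers
 * report the same, known user namespace; otherwise fall back to the
 * emulated single-copy path. */
static single_copy_mech_t select_single_copy(uint64_t my_ns, uint64_t peer_ns)
{
    if (my_ns != 0 && my_ns == peer_ns) {
        return SINGLE_COPY_CMA;
    }
    return SINGLE_COPY_EMUL;
}
```

Comparing namespace ids per peer, rather than disabling CMA whenever any user namespace exists, is a design choice of this sketch: it keeps CMA for peers that share a namespace.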
Geoff Paulsen
71d97f0355
Merge pull request #6994 from gpaulsen/gpaulsen_v4.0.2rc3
Updating VERSION v4.0.2rc3
2019-09-20 13:52:04 -05:00
Howard Pritchard
83df06275d
Merge pull request #6996 from jsquyres/pr/v4.0.x/enable-timings-compile-fix
v4.0.x: ess/pmi: Fix `--enable-timing` compilation error
2019-09-20 12:42:52 -06:00
KAWASHIMA Takahiro
e5be033c14 ess/pmi: Fix --enable-timing compilation error
This commit fixes a compilation error when configured
with `--enable-timing`.

Procedures in the function `orte_ess_base_app_setup`
in `orte/mca/ess/base/ess_base_std_app.c` were moved
to `orte/mca/ess/pmi/ess_pmi_module.c`
and `orte/mca/ess/singleton/ess_singleton_module.c`
in the recent commit 57f6b94fa5.

In `ess_pmi_module.c`, the first argument of the
`OPAL_TIMING_ENV_NEXT` macro should have been adapted
to the destination function but was not.

In `ess_singleton_module.c`, `OPAL_TIMING_ENV_INIT`
was not used in the destination function originally.
So `OPAL_TIMING_ENV_NEXT` cannot be used in the function.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
(cherry picked from commit 8e7d874e14a5485dceff836419e36b6b24a66f48)
2019-09-19 18:14:06 -04:00
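Both errors follow if the "next" timing macro refers to local variables that the "init" macro declares in the same function. The sketch below assumes that pattern; `TIMING_ENV_INIT`, `TIMING_ENV_NEXT`, and `app_setup` are hypothetical stand-ins, not OPAL's actual macro definitions.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* INIT declares the handle's locals; NEXT only works where they exist. */
#define TIMING_ENV_INIT(name)                                         \
    struct timespec name##_prev;                                      \
    double name##_last = 0.0;                                         \
    clock_gettime(CLOCK_MONOTONIC, &name##_prev)

/* Records the time since the previous INIT/NEXT on the same handle. */
#define TIMING_ENV_NEXT(name, label)                                  \
    do {                                                              \
        struct timespec now_;                                         \
        clock_gettime(CLOCK_MONOTONIC, &now_);                        \
        name##_last = (double) (now_.tv_sec - name##_prev.tv_sec)     \
            + (double) (now_.tv_nsec - name##_prev.tv_nsec) / 1e9;    \
        fprintf(stderr, "%s: %s took %.6f s\n", #name, label,         \
                name##_last);                                         \
        name##_prev = now_;                                           \
    } while (0)

static double app_setup(void)
{
    TIMING_ENV_INIT(setup);
    /* ... setup steps would run here ... */
    TIMING_ENV_NEXT(setup, "setup");
    /* TIMING_ENV_NEXT(other, "x") would not compile here, because
     * other_prev and other_last were never declared by an INIT. */
    return setup_last;
}
```

A handle name copied from another function, or a NEXT with no matching INIT, fails in the same way.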
Howard Pritchard
265a47bdf8
Merge pull request #6990 from awlauria/fix_mpir_standard_v4.0.x
v4.0.x: Conform MPIR_Breakpoint to MPIR standard.
2019-09-19 15:41:45 -06:00
Geoffrey Paulsen
0bb0e59345 Updating VERSION to v4.0.2rc3.
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-09-19 14:22:57 -05:00
Austen Lauria
1430df3c0f Add 'orte_' prefix to noop_mpir_breakpoint_ptr.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 77144689f062f38d2edc9086e3fbb99c3d855f9a)
2019-09-19 08:47:17 -04:00
Austen Lauria
3eb7b27d3a Conform MPIR_Breakpoint to MPIR standard.
- Fix MPIR_Breakpoint standard violation by returning void
  instead of a void*.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
(cherry picked from commit 067adfa417f95396c713f6e6597619fac94f0048)
2019-09-18 09:53:15 -04:00
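The MPIR process-acquisition interface expects a function with exactly the signature `void MPIR_Breakpoint(void)`. The starter calls it after filling in the process table, and an attached debugger sets a breakpoint on that symbol. Below is a minimal sketch of a conforming definition. The `breakpoint_hits` counter is hypothetical and exists only to make the call observable.

```c
/* Hypothetical counter; a real MPIR_Breakpoint body does nothing useful
 * beyond being a stable place for the debugger to stop. */
static volatile int breakpoint_hits = 0;

/* Standard-conforming signature: no arguments, returns void (not void*). */
void MPIR_Breakpoint(void)
{
    breakpoint_hits++;
}
```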
Geoff Paulsen
90b55db052
Merge pull request #6986 from hppritcha/topic/pr6961_to_4.0.x
btl/vader: when using single-copy emulation fragment large rdma
2019-09-18 07:32:04 -05:00
Nathan Hjelm
5a945f668c btl/vader: when using single-copy emulation fragment large rdma
This commit changes how the single-copy emulation in the vader btl
operates. Before this change the BTL set its put and get limits
based on the max send size. After this change the limits are unset
and the put or get operation is fragmented internally.

References #6568

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit ae91b11de2314ab11a9842d9738cd14f8f1e393b)
2019-09-17 20:01:37 -06:00
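Internal fragmentation means one large put or get is split into pieces no larger than some limit, instead of the BTL advertising that limit to its callers. A minimal sketch, assuming a plain `memcpy` stands in for each fragment's transfer; `emul_get` and `EMUL_MAX_FRAG` are hypothetical names and values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment limit; the real BTL derives its own value. */
#define EMUL_MAX_FRAG 4096

/* Emulated single-copy "get": copy size bytes from remote to local in
 * fragments of at most max_frag bytes (max_frag must be > 0). Returns the
 * number of fragments issued. In a real BTL each fragment would travel as
 * its own message over the shared-memory send path. */
static size_t emul_get(void *local, const void *remote, size_t size,
                       size_t max_frag)
{
    size_t nfrags = 0;
    size_t offset = 0;
    while (offset < size) {
        size_t len = size - offset;
        if (len > max_frag) {
            len = max_frag;
        }
        memcpy((char *) local + offset, (const char *) remote + offset, len);
        offset += len;
        nfrags++;
    }
    return nfrags;
}
```

With a 4096-byte limit, a 10000-byte get becomes three fragments (4096 + 4096 + 1808 bytes).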
Geoff Paulsen
84e4af5175
Merge pull request #6969 from gpaulsen/topic/v4.0.x_VERSION_rc2
Reving VERSION to v4.0.2rc2
2019-09-10 09:58:52 -05:00
Geoffrey Paulsen
49a2558eff Reving VERSION to v4.0.2rc2
Reving VERSION to v4.0.2rc2

Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-09-09 14:48:52 -04:00
Geoff Paulsen
a482edc14e
Merge pull request #6944 from jjhursey/v4/fix-tree-launch
Fix tree spawn routed component issue
2019-09-09 13:10:36 -05:00
Geoff Paulsen
ce228d291f
Merge pull request #6952 from jsquyres/pr/v4.0.x/ddt-opt-and-fix
v4.0.x: Datatype optimization and fix
2019-09-09 13:09:49 -05:00
Geoff Paulsen
287ee150d1
Merge pull request #6967 from rhc54/cmr40x/oob
v4.0.x: Be a little less restrictive on interface requirements
2019-09-09 13:08:58 -05:00
Ralph Castain
95cc53e331
Be a little less restrictive on interface requirements
If both types of interfaces are enabled, don't error out if one of them
isn't able to open listener sockets. Only one interface family may be
available on some machines, but someone might want to build the code to
run more generally.

Refs https://github.com/pmix/prrte/pull/249

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 06d188ebf3646760f50d4513361b50642af9cec4)
2019-09-09 07:52:03 -07:00
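The relaxed rule can be read as "fail only if no enabled address family produced a listener". The sketch below assumes exactly two families (IPv4 and IPv6). `listener_status_t` and `listeners_usable` are hypothetical, not the OOB component's actual code.

```c
#include <stdbool.h>

/* Outcome of trying to open a listener for one address family. */
typedef enum { LISTENER_DISABLED, LISTENER_OK, LISTENER_FAILED } listener_status_t;

/* Previously (as described), any enabled family that failed was fatal.
 * Relaxed rule: succeed if at least one enabled family opened a listener. */
static bool listeners_usable(listener_status_t ipv4, listener_status_t ipv6)
{
    return ipv4 == LISTENER_OK || ipv6 == LISTENER_OK;
}
```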
George Bosilca
8f16780ee0 Add a test for datatypes composed of multiple predefined
elements that can be merged into a larger UINT1 type.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 82d632278ae5ab4337984d5ef4793f818c4dd437)
2019-09-03 15:09:34 -04:00
George Bosilca
e2b154327e Small optimization on the datatype commit.
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows contiguous elements to have their
blocklen increased instead of the count. The idea is to always maximize
the blocklen, a.k.a. the contiguous part of the datatype.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 41e6f55807b01ad5c04e8387a3699cf743931f6a)
2019-09-03 15:09:33 -04:00
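The idea of maximizing blocklen can be illustrated on a simplified description made of byte runs. A minimal sketch, assuming each element is just a displacement plus a contiguous length. `run_t` and `merge_runs` are hypothetical; Open MPI's real descriptor also tracks type, count, and extent.

```c
#include <stddef.h>

/* Hypothetical simplified element: blocklen contiguous bytes at disp. */
typedef struct {
    ptrdiff_t disp;
    size_t blocklen;
} run_t;

/* Merge runs in place when one ends exactly where the next begins,
 * growing blocklen rather than keeping separate elements.
 * Returns the new number of runs. */
static size_t merge_runs(run_t *runs, size_t n)
{
    if (n == 0) {
        return 0;
    }
    size_t out = 0;
    for (size_t i = 1; i < n; i++) {
        if (runs[out].disp + (ptrdiff_t) runs[out].blocklen == runs[i].disp) {
            runs[out].blocklen += runs[i].blocklen;
        } else {
            runs[++out] = runs[i];
        }
    }
    return out + 1;
}
```

For example, four 4-byte runs at displacements 0, 4, 8, and 16 collapse into a 12-byte run at 0 and a 4-byte run at 16.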
Howard Pritchard
6912e09d7a
Merge pull request #6942 from guserav/v4-fix-osc-sm-post-32-bit-atomics
v4.0.x: Fix osc sm posts when only 32-bit atomics are supported
2019-09-03 09:20:38 -06:00
guserav
9bf1873215 Fix osc sm posts when only 32-bit atomics are supported
Signed-off-by: guserav <erik.zeiske@hpe.com>
(cherry picked from commit 3c9f4e682369e6fd5860b46ba81d79f2d1599a35)
2019-08-31 12:31:19 -07:00
Geoff Paulsen
893ea3f91f
Merge pull request #6929 from rhc54/cmr40/pmix314
Remove unnecessary error log
2019-08-30 14:10:36 -05:00
Howard Pritchard
c6fe859c28
Merge pull request #6946 from hkuno/intercept_mmap_fix
v4.0.x: Fix mmap infinite recursion in memory patcher
2019-08-30 10:56:43 -06:00
Howard Pritchard
989461f305
Merge pull request #6915 from sam6258/smiller_regx_none
v4.0.x: regx/naive: add regx/naive component
2019-08-30 07:51:10 -06:00
Harumi Kuno
fbbacc1303 Fix mmap infinite recursion in memory patcher
This commit fixes issue #6853 by removing
MacOS/Darwin-specific logic from intercept_mmap.

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2019-08-29 18:03:16 -07:00
Joshua Hursey
4c1160e257 Fix tree spawn routed component issue
 * Fix #6618
   - See comments on Issue #6618 for finer details.
 * The `plm/rsh` component uses the highest-priority `routed` component
   to construct the launch tree. The remote orteds activate all
   available `routed` components when updating routes, so the parent
   vpid on a remote `orted` may not match the one expected in the tree
   launch. The remote orted then tries to contact its parent with the
   wrong contact information, and orted wireup fails.
 * This fix forces the orteds to use the same `routed` component as
   the HNP used when constructing the tree, if tree launch is enabled.

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-08-29 16:26:43 -04:00
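The fix amounts to "use the component the tree was built with, not whichever one wins locally". Below is a minimal sketch of that selection rule. `routed_component_t` and `select_routed` are hypothetical, and the priorities in the usage example are invented for illustration. This is not ORTE's component framework.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical component table entry; not ORTE's actual structures. */
typedef struct {
    const char *name;
    int priority;
} routed_component_t;

/* If `forced` names a component (e.g. the one the HNP used to build the
 * launch tree), select it; otherwise select the highest-priority one.
 * Returns NULL if the forced name is not found or the table is empty. */
static const routed_component_t *
select_routed(const routed_component_t *comps, size_t n, const char *forced)
{
    const routed_component_t *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (forced != NULL) {
            if (strcmp(comps[i].name, forced) == 0) {
                return &comps[i];
            }
        } else if (best == NULL || comps[i].priority > best->priority) {
            best = &comps[i];
        }
    }
    return best;
}
```

Usage with hypothetical priorities: given `{"radix", 70}, {"direct", 0}, {"binomial", 30}`, passing `NULL` selects `radix`, while passing `"binomial"` selects `binomial` regardless of priority.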
Geoff Paulsen
2d515f747f
Merge pull request #6934 from devreal/osc-ucx-excl-lock-v4.0.x
UCX osc: properly release exclusive lock to avoid lockup (v4.0.x)
2019-08-29 13:41:03 -05:00
Geoff Paulsen
78b8b0126c
Merge pull request #6938 from jsquyres/pr/v4.0.x/fix-ddt-variable-names-in-make-check
v4.0.x: Update OPAL DDT variable names
2019-08-28 13:57:57 -05:00
Joseph Schuchart
8d130e1964 UCX osc: properly release exclusive lock to avoid lockup
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit 08cb6389e034c1a70368671f745f20904c774a1e)
2019-08-27 23:12:56 +02:00
Howard Pritchard
061574f938
Merge pull request #6935 from vspetrov/v4.0.x_coll_hcoll_nbc_request_bugfix
V4.0.x Coll/hcoll: fixes hcoll non-blocking colls support
2019-08-27 14:40:36 -06:00
Jeff Squyres
8b3fd5682f Update OPAL DDT variable names
These variables were renamed in
904276bb44caec207638247f23139bc21bc6a09e; update them to use the new
names.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2ab8109be15a7739caa72ec8f863e8e01c2c9a0f)
2019-08-27 13:08:16 -07:00
Valentin Petrov
83a2518994 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 introduced
    a bug in the coll/hcoll component. The ompi_requests allocated by
    libhcoll would be treated as coll_base_nbc_request during the
    ompi_coll_base_retain_<> call. Afterwards this would lead to a
    segv in the request cleanup.

    Fix: since the libhcoll interface does not distinguish between
    blocking and non-blocking requests, always use coll_base_nbc_request
    and initialize it properly in
    coll/hcoll/get_coll_handle(). It still fits within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:23:52 +03:00
Ralph Castain
8efc6e1dc1
Remove unnecessary error log
Refs https://github.com/pmix/pmix/pull/1413

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-26 23:48:34 -07:00
Geoff Paulsen
83f6c57df6
Merge pull request #6926 from gpaulsen/v4.0.2_NEWS
Updating NEWS for v4.0.2
2019-08-26 15:46:15 -05:00
Geoff Paulsen
57448113a5
Merge pull request #6925 from gpaulsen/v4.0.x_VERSION_rc1
Updating VERSION for v4.0.2rc1
2019-08-26 15:20:57 -05:00
Geoffrey Paulsen
197607c896 Updating NEWS for v4.0.2
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-08-26 10:45:11 -05:00
Scott Miller
1b0cfdf264 v4.0.x: regx/naive: add regx/naive component
Signed-off-by: Scott Miller <scott.miller1@ibm.com>
2019-08-26 11:37:07 -04:00
Geoff Paulsen
be67734fdf
Merge pull request #6922 from hoopoepg/topic/fixed-hand-on-shmem-finalize-v4.0
SPML/UCX: fixed hang in SHMEM_FINALIZE - v4.0
2019-08-26 10:26:45 -05:00
Geoffrey Paulsen
b07d58a0fe Updating VERSION for v4.0.2rc1
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-08-24 14:15:55 -04:00
Howard Pritchard
e4adbeefe7
Merge pull request #6905 from edgargabriel/pr/file-seek-end-fix-v4.0.x
io_ompio_file_open: fix offset calculation with SEEK_END
2019-08-23 13:11:33 -06:00
Sergey Oblomov
1f9fce8955 SPML/UCX: fixed comment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 01dacaa6a42b35c1b7538d8ff0036bded913c847)
2019-08-22 11:42:03 +03:00
Sergey Oblomov
66e18563bf SPML/UCX: fixed hang in SHMEM_FINALIZE
- use MPI_Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 182023febb6f8f31ce34dc54c8aa409ad7e44fa2)
2019-08-22 11:41:52 +03:00
Geoff Paulsen
390e0bc5b2
Merge pull request #6863 from bosilca/topic/backport_6695
Refresh of the datatype engine from Topic/backport 6695
2019-08-21 10:49:37 -05:00
Howard Pritchard
d3587f5214
Merge pull request #6911 from jsquyres/pr/v4.0/mpirun-as-root-as-containers-env-var-fix
v4.0.x: mpirun as root as containers env var fix
2019-08-20 09:09:26 -06:00
Jeff Squyres
549abeaa87 orterun: remove duplicate code
https://github.com/open-mpi/ompi/pull/6895 fixed the code in orterun.c
to allow running as root if both OMPI_ALLOW_RUN_AS_ROOT and
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM env vars are set.  However, this
env-var-checking code already exists in
orte_submit.c:orte_submit_init() -- it looks like the
geteuid()/getenv()-checking code here in orterun is now duplicate
code.

So let's just get rid of the duplicate code.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 197beb30d555922b084ac3b89bb97321bf157e88)
2019-08-19 15:49:57 -04:00
Simon Byrne
f49c22af6d Run-as-root env vars in orterun.c
I found that I needed to apply the same change as #5597 to orterun.c for the environment variables to work correctly.

Signed-off-by: Simon Byrne <simonbyrne@gmail.com>
(cherry picked from commit 9c8671c48b946f4387cddb6a66aaab572fa983dd)
2019-08-19 15:34:20 -04:00
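The duplicated check boils down to: when the effective UID is 0, refuse to launch unless both environment variables are set. A minimal sketch, assuming both variables must equal `"1"` (the documented usage) and ignoring the `--allow-run-as-root` command-line option. `run_as_root_allowed` is a hypothetical name, not the function in `orte_submit.c`.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Returns true if launching is allowed for the given effective UID.
 * Assumption: both variables must be set to "1". */
static bool run_as_root_allowed(uid_t euid)
{
    if (euid != 0) {
        return true;  /* not root: nothing to check */
    }
    const char *allow = getenv("OMPI_ALLOW_RUN_AS_ROOT");
    const char *confirm = getenv("OMPI_ALLOW_RUN_AS_ROOT_CONFIRM");
    return allow != NULL && confirm != NULL &&
           strcmp(allow, "1") == 0 && strcmp(confirm, "1") == 0;
}
```

In a launcher the argument would come from `geteuid()` (declared in `<unistd.h>`). Taking the UID as a parameter keeps the check testable without running as root.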
Howard Pritchard
f96994b12f
Merge pull request #6865 from rhc54/cmr40/locality
Provide locality for all procs on node
2019-08-19 13:26:59 -06:00
Howard Pritchard
7b09c15b90
Merge pull request #6892 from janjust/v4.0.x-osc_fix
v4.0.x: osc/ucx: Fix possible win creation/destruction race condition
2019-08-19 13:26:32 -06:00
Howard Pritchard
fd13b27423
Merge pull request #6889 from ggouaillardet/topic/v4.0.x/nbc_fixes
coll/base: only retain datatypes/op if the request has not yet completed
2019-08-19 12:40:16 -06:00
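The log only gives this merge's title. One plausible reading, assumed here, is that the retained datatype or op is released when a pending request finishes. For a request that has already completed, that release never runs, so retaining would leak a reference. A minimal sketch of the guarded retain. `object_t`, `nbc_request_t`, and `retain_if_pending` are hypothetical, not OMPI's types.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reference-counted object and request; not OMPI's types. */
typedef struct {
    int refcount;
} object_t;

typedef struct {
    bool complete;
    object_t *retained_dtype;  /* assumed to be released on completion */
} nbc_request_t;

/* Retain only for a request still in flight; assuming an already-completed
 * request will never release the object, retaining it would leak. */
static void retain_if_pending(nbc_request_t *req, object_t *dtype)
{
    if (!req->complete) {
        dtype->refcount++;
        req->retained_dtype = dtype;
    }
}
```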