1
1
Граф коммитов

30108 Коммитов

Автор SHA1 Сообщение Дата
Adrian Reber
fc68d8a90f
Do not use CMA in user namespaces
Trying out to run processes via mpirun in Podman containers has shown
that the CMA btl_vader_single_copy_mechanism does not work when user
namespaces are involved.

Creating containers with Podman requires at least user namespaces to be
able to do unprivileged mounts in a container

Even if running the container with user namespace user ID mappings which
result in the same user ID on the inside and outside of all involved
containers, the check in the kernel to allow ptrace (and thus
process_vm_{read,write}v()), fails if the same IDs are not in the same
user namespace.

One workaround is to specify '--mca btl_vader_single_copy_mechanism none'
and this commit adds code to automatically skip CMA if user namespaces
are detected and fall back to MCA_BTL_VADER_EMUL.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-09-05 20:15:19 +02:00
Gilles Gouaillardet
50db085272
Merge pull request #6957 from rhc54/topic/buf
Ensure buffer_unload leaves the buffer in a clean state
2019-09-05 09:40:35 +09:00
Raafat Feki
7877743784
Merge pull request #6857 from raafatfeki/pr/ompio_coll_write_clean
Pr/ompio_fcoll_write_clean
2019-09-04 11:06:56 -05:00
Ralph Castain
373e816b37
Ensure buffer_unload leaves the buffer in a clean state
Silence a warning in orte/nidmap

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-04 08:32:27 -07:00
Ralph Castain
73c5fe8f05
Merge pull request #6953 from rhc54/topic/pmix4
Update to current PMIx 4.0.0 (master)
2019-09-03 19:47:58 -07:00
Ralph Castain
9a2902c047
Force use of pmix/preg/native component
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-03 17:54:44 -07:00
Ralph Castain
2ebc1fa1b7
Update to current PMIx 4.0.0 (master)
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-03 14:46:55 -07:00
Mark Allen
14e3d7b8b0 romio info: letting romio keep its internal setup
I'm restoring the info function pointers to the IO module
but allowing the function pointers to be NULL (eg in ompio).
And letting romio321 set its function pointers for those
routines.

This means the info system uses the new OMPI-level info
system for most things, but skips it and uses the pre-existing
romio info system just for the romio module.

It's possible to convert romio, but I went a ways down that
path and found it kind of convoluted.  Having pointers from
the lower level ADIO_File back to the higher level ompi_file_t
wasn't too bad, but I got stuck trying to figure out where/how
to register the infosubscribe_subscribe callbacks vs the way
initial k/v values are scattered around the romio code currently.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-09-03 14:08:19 -04:00
Jeff Squyres
f1a065f4dc
Merge pull request #6945 from bosilca/topic/ddt_optimize
Small optimization on the datatype commit.
2019-09-03 12:46:11 -04:00
George Bosilca
82d632278a
Add a test for datatypes composed by multiple predefined
elements that can be merged into a larger UINT1 type.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-30 19:56:48 -04:00
George Bosilca
41e6f55807
Small optimization on the datatype commit.
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows for contiguous elements to have thir
blocklen increasing instead of the count. The idea is to always maximize
the blocklen, aka. the contiguous part of the datatype.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-30 19:56:48 -04:00
Nathan Hjelm
c4d0752036
Merge pull request #6803 from guserav/fix-osc-sm-post-32-bit-atomics
Fix osc sm posts when only 32 bit atomics support
2019-08-27 18:23:48 -07:00
Jeff Squyres
b846028622
Merge pull request #6937 from jsquyres/pr/remove-non-existent-symbols-in-tests
Update OPAL DDT variable names
2019-08-27 15:58:04 -04:00
Jeff Squyres
2ab8109be1 Update OPAL DDT variable names
These variables were renamed in
904276bb44caec207638247f23139bc21bc6a09e; update them to use the new
names.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-27 12:00:20 -07:00
Joshua Ladd
8b6f2d95ba
Merge pull request #6930 from vspetrov/coll_hcoll_nbc_request_bugfix
Coll/hcoll: fixes hcoll non-blocking colls support
2019-08-27 12:12:01 -04:00
Valentin Petrov
a0d99ad190 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 Introduced
    a bug in coll/hcoll component. The ompi_requests allocated by
    libhcoll would be treated as coll_base_nbc_request during
    ompi_coll_base_retain_<> call. Afterwards this would lead to a
    segv in the request cleanup.

    Fix: since libhcoll interface does not distinguish between the
    blocling/non-blocking requests use coll_base_nbc_request all the
    time and initialize it properly in
    coll/hcoll/get_coll_handle(). It is still within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:22:58 +03:00
Howard Pritchard
71e1fad4a9
Merge pull request #6855 from hkuno/hkuno/mmap_loop
Fix mmap infinite recurse in memory patcher
2019-08-26 14:28:53 -06:00
raafatfeki
2c6a5eed29 fcoll/dynamic_gen2: Adjustment of displacement index in collective write
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-08-26 10:03:23 -05:00
raafatfeki
f45e9cfdbe fcoll/vulcan: Adjustment of displacement index in collective write
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-08-26 10:03:23 -05:00
Howard Pritchard
5a3646fd1d
Merge pull request #6884 from guserav/reintroduce-common-ofi
Reintroduce common ofi
2019-08-23 13:12:25 -06:00
Yossi Itigin
8933171412
Merge pull request #6918 from hoopoepg/topic/fixed-hand-on-shmem-finalize
SPML/UCX: fixed hang in SHMEM_FINALIZE
2019-08-22 11:37:45 +03:00
Sergey Oblomov
01dacaa6a4 SPML/UCX: fixed comment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-21 12:08:58 +03:00
Sergey Oblomov
182023febb SPML/UCX: fixed hang in SHMEM_FINALIZE
- used MPI _Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-21 12:04:46 +03:00
guserav
56c3d9a238 common/ofi: Set HPE as owner of component
Signed-off-by: guserav <erik.zeiske@web.de>
2019-08-20 10:13:02 -07:00
Ralph Castain
69bd9453e9
Merge pull request #6910 from jsquyres/pr/orterun-remove-duplicate-code
orterun: remove duplicate code
2019-08-19 13:58:34 -07:00
Jeff Squyres
197beb30d5 orterun: remove duplicate code
https://github.com/open-mpi/ompi/pull/6895 fixed the code in orterun.c
to allow running as root if both OMPI_ALLOW_RUN_AS_ROOT and
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM env vars are set.  However, this
env-var-checking code already exists in
orte_submit.c:orte_submit_init() -- it looks like the
geteuid()/getenv()-checking code here in orterun is now duplicate
code.

So let's just get rid of the duplicate code.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-19 15:36:59 -04:00
Ralph Castain
4e06be1588
Merge pull request #6895 from simonbyrne/sb/root
Run-as-root env vars in orterun.c
2019-08-19 12:28:50 -07:00
Gilles Gouaillardet
1e15c0b573
Merge pull request #6906 from ggouaillardet/topic/oshmem_wrappers
oshmem: fix make uninstall
2019-08-19 11:27:23 +09:00
Gilles Gouaillardet
3f081e0750 oshmem: fix make uninstall
This fixes a typo introduced in open-mpi/ompi@e0e924c4ed

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-19 10:21:53 +09:00
bosilca
190d5f1339
Merge pull request #6898 from bosilca/fix/convertor_raw
Fix/convertor raw
2019-08-16 10:27:05 -04:00
George Bosilca
2930bd9d21 Whitespace cleanup
No code or logic changes.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-14 11:06:47 -04:00
George Bosilca
904276bb44 Fix the variable names used for the datatype dump.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-14 10:59:50 -04:00
George Bosilca
daf4338c31 Fix the stack displacement.
Fixes the convertor iovec description on the MPI-IO reported by Edgar.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-14 01:16:30 -04:00
Artem Polyakov
d58c59eb71
Merge pull request #6893 from janjust/osc_error_path_fix
osc/ucx: Fix error path
2019-08-12 21:23:57 -07:00
Simon Byrne
9c8671c48b Run-as-root env vars in orterun.c
I found that I needed to apply the same change as #5597 to orterun.c for the environment variables to work correctly.

Signed-off-by: Simon Byrne <simonbyrne@gmail.com>
2019-08-12 18:52:52 -07:00
guserav
8a67a95c99 common/ofi: Fix open-mpi/ompi#2519
As discussed in open-mpi/ompi#2519 the common component does not depend
on libfabric yet. This commit introduces this dependency by just calling
fi_version().

Signed-off-by: guserav <erik.zeiske@hpe.com>
2019-08-12 16:17:59 -07:00
guserav
0e25c95eae common/ofi: Fix check for OFI in build files
The changes made in f5e1a672cc
have been done after the common/ofi component was removed and thus the
component doesn't reflect the changes made their.

Namely f5e1a672cc changed:
- How to call OPAL_CHECK_OFI (It sets opal_ofi_happy to yes now)
- Dropped the common part in the build flags for ofi

Signed-off-by: guserav <erik.zeiske@web.de>
2019-08-12 16:15:23 -07:00
Jeff Squyres
ae1f7e0c3b
Merge pull request #6879 from mwheinz/REF6877-master
PSM MTL is obsolete and should be removed
2019-08-12 15:08:25 -04:00
Tomislav Janjusic
d5f6b088ae osc/ucx: Fix error path
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
2019-08-12 21:54:01 +03:00
Gilles Gouaillardet
cf4910b300
Merge pull request #6880 from ggouaillardet/topic/nbc_fixes
coll/{base,libnbc}: fix datatypes/operator retention
2019-08-12 12:39:33 +09:00
guserav
4ad78aaa15 Revert "Remove opal/mca/common/ofi."
This reverts commit dd20174532.

Signed-off-by: guserav <erik.zeiske@web.de>
2019-08-09 10:33:21 -07:00
Gilles Gouaillardet
63d3ccde9d coll/base: only retain datatypes/op if the request has not yet completed
a non blocking collective might return ompi_request_null, so we should not
retain anything in that case.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-09 09:57:56 +09:00
Gilles Gouaillardet
0862c409f1 coll/base: cleanup ompi_coll_base_nbc_request_t elements
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned into a "clean" state.
So cleanup some data in the callback completion subroutines.

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-08 10:48:06 +09:00
Gilles Gouaillardet
f8eef0fde9 coll/libnbc: fixes ompi ompi_coll_libnbc_request_t parent
base ompi_coll_libnbc_request_t on top of ompi_coll_base_nbc_request_t
to correctly support the retention of datatypes/operators

This fixes a regression introduced in open-mpi/ompi@0fe756d416

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-08-08 10:47:48 +09:00
Michael Heinz
0348d14ff3 PSM MTL is obsolete and should be removed
The PSM MTL for Intel's TrueScale Infiniband HCAs is not being actively
maintained and should be removed from the master branch.

Fixes issue: #6877

Signed-off-by: Michael Heinz <michael.william.heinz@intel.com:
2019-08-07 11:43:03 -04:00
Ralph Castain
6159afc07a
Merge pull request #6872 from rhc54/topic/lsf2
Fix typos
2019-08-07 07:07:25 -07:00
Ralph Castain
bd5a1765ee
Fix typos
Provide a missing header and paren

Thanks to @zerothi for the assistance

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-07 05:47:12 -07:00
Yossi Itigin
ec9def1406
Merge pull request #6864 from hoopoepg/topic/ucx-ppn-hint
UCX: added PPN hint for UCX context
2019-08-07 13:45:38 +03:00
Ralph Castain
0b3fe2136e
Merge pull request #6869 from rhc54/topic/lsf
Allow individual jobs to set their map/rank/bind policies
2019-08-06 08:52:37 -07:00
Ralph Castain
ea0dfc3218
Allow individual jobs to set their map/rank/bind policies
Override the defaults when provided. Ignore LSF binding file if user
overrides by specifying a policy.

Fixes #6631

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-06 07:48:58 -07:00