1
1

30082 Коммитов

Автор SHA1 Сообщение Дата
Jeff Squyres
ee3564a2dc
Merge pull request #7004 from mwheinz/REFS6976-master
REF6976 Silent failure of OMPI over OFI with large messages sizes
2019-09-23 17:31:21 -04:00
Michael Heinz
3aca4af548 REF6976 Silent failure of OMPI over OFI with large messages sizes
INTERNAL: STL-59403

The OFI (libfabric) MTL does not respect the maximum message size
parameter that OFI provides in the fi_info data.

This patch adds this missing max_msg_size field to the mca_ofi_module_t
structure and adds a length check to the low-level send routines.

Change-Id: I05aa71d332f2df897133b30c28bf37d98f061996
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
Reviewed-by: Adam Goldman <adam.goldman@intel.com>
Reviewed-by: Brendan Cunningham <brendan.cunningham@intel.com>
2019-09-23 15:23:48 -04:00
Yossi Itigin
5ef9cad575
Merge pull request #7000 from amaslenn/mlnx-no-missing-libcuda-warn
platform/mellanox: disable missing libcuda warning
2019-09-23 11:06:36 +03:00
Andrey Maslennikov
63ba7bec46 platform/mellanox: disable missing libcuda warning
Signed-off-by: Andrey Maslennikov <andreyma@mellanox.com>
2019-09-22 16:02:57 +03:00
Yossi Itigin
c864e84cbf
Merge pull request #6979 from hoopoepg/topic/restored-ikrit-compilation
IKRIT: restored compilation
2019-09-21 14:38:36 +03:00
Jeff Squyres
8038fac8f9
Merge pull request #6844 from adrianreber/check_for_user_ns
Do not use CMA in user namespaces
2019-09-20 22:10:42 -04:00
Brian Barrett
a7da93f88c
Merge pull request #6973 from wckzhang/bp_get_vertex
opal/util: Add function to get vertex data from bipartite graph
2019-09-20 15:19:37 -07:00
Jeff Squyres
13e4040037
Merge pull request #6992 from awlauria/fix_prefix_mpir
Add 'orte_' prefix to noop_mpir_breakpoint_ptr.
2019-09-18 22:15:13 -04:00
Austen Lauria
77144689f0 Add 'orte_' prefix to noop_mpir_breakpoint_ptr.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-09-18 17:44:40 -04:00
Jeff Squyres
cc586d808a
Merge pull request #6991 from devreal/grequestx-progress
Ensure that grequestx continuously make progress
2019-09-18 13:46:46 -04:00
Joseph Schuchart
37e6bbb1e1 Ensure that grequestx continuously make progress
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-09-18 18:55:11 +02:00
Jeff Squyres
a0d1912af3
Merge pull request #6904 from awlauria/silence_mpir_warning_master
Conform MPIR_Breakpoint to MPIR standard.
2019-09-17 16:06:25 -04:00
Austen Lauria
067adfa417 Conform MPIR_Breakpoint to MPIR standard.
- Fix MPIR_Breakpoint standard violation by returning void
  instead of a void*.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2019-09-17 15:19:00 -04:00
Sergey Oblomov
991082abf2 IKRIT: restored compilation
- due to some refactoring and adding new functionality compilation
  of ikrit module was broken
- this commit restores compilation

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-09-16 19:00:34 +03:00
William Zhang
ca70b703a1 opal/util: Add function to get vertex data from bipartite graph
Currently, there is no function that allows the user to retrieve the
data they have stored in a vertex easily. Using the internal macros and
knowledge of the structures, the new function will return a pointer to
the user provided vertex data.

Signed-off-by: William Zhang <wilzhang@amazon.com>
2019-09-11 18:28:07 +00:00
Jeff Squyres
c672a51ccd
Merge pull request #6971 from hoopoepg/topic/fixed-typos-in-mpi-h-in
MPI.H: fixed few typos in comments
2019-09-10 07:34:19 -05:00
Sergey Oblomov
e0aee1ba5a MPI.H: fixed few typos in comments
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-09-10 13:34:38 +03:00
bosilca
2c4b54cc97
Merge pull request #6965 from bosilca/topic/contig_predef
Mark predefined empty datatype contiguous.
2019-09-09 21:39:17 -04:00
George Bosilca
3522916971
Mark predefined empty datatype contiguous.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-09-07 14:40:21 +10:00
bosilca
d7f6dd0f30
Merge pull request #6961 from hjelmn/fix_btl_vader_fragment_issue
btl/vader: when using single-copy emulation fragment large rdma
2019-09-06 22:24:12 -04:00
Ralph Castain
884d4e78cc
Merge pull request #6964 from rhc54/topic/oob
Be a little less restrictive on interface requirements
2019-09-06 09:06:43 -07:00
Ralph Castain
06d188ebf3
Be a little less restrictive on interface requirements
If both types of interfaces are enabled, don't error out if one of them
isn't able to open listener sockets. Only one interface family may be
available on some machines, but someone might want to build the code to
run more generally.

Refs https://github.com/pmix/prrte/pull/249

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-06 08:27:05 -07:00
Nathan Hjelm
ae91b11de2 btl/vader: when using single-copy emulation fragment large rdma
This commit changes how the single-copy emulation in the vader btl
operates. Before this change the BTL set its put and get limits
based on the max send size. After this change the limits are unset
and the put or get operation is fragmented internally.

References #6568

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2019-09-05 23:08:53 -07:00
Geoff Paulsen
5ff6cb6e6a
Merge pull request #6756 from markalle/romio_info
romio info: letting romio keep its internal setup
2019-09-05 15:43:07 -05:00
Adrian Reber
fc68d8a90f
Do not use CMA in user namespaces
Trying out to run processes via mpirun in Podman containers has shown
that the CMA btl_vader_single_copy_mechanism does not work when user
namespaces are involved.

Creating containers with Podman requires at least user namespaces to be
able to do unprivileged mounts in a container

Even if running the container with user namespace user ID mappings which
result in the same user ID on the inside and outside of all involved
containers, the check in the kernel to allow ptrace (and thus
process_vm_{read,write}v()), fails if the same IDs are not in the same
user namespace.

One workaround is to specify '--mca btl_vader_single_copy_mechanism none'
and this commit adds code to automatically skip CMA if user namespaces
are detected and fall back to MCA_BTL_VADER_EMUL.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-09-05 20:15:19 +02:00
Gilles Gouaillardet
50db085272
Merge pull request #6957 from rhc54/topic/buf
Ensure buffer_unload leaves the buffer in a clean state
2019-09-05 09:40:35 +09:00
Raafat Feki
7877743784
Merge pull request #6857 from raafatfeki/pr/ompio_coll_write_clean
Pr/ompio_fcoll_write_clean
2019-09-04 11:06:56 -05:00
Ralph Castain
373e816b37
Ensure buffer_unload leaves the buffer in a clean state
Silence a warning in orte/nidmap

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-04 08:32:27 -07:00
Ralph Castain
73c5fe8f05
Merge pull request #6953 from rhc54/topic/pmix4
Update to current PMIx 4.0.0 (master)
2019-09-03 19:47:58 -07:00
Ralph Castain
9a2902c047
Force use of pmix/preg/native component
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-03 17:54:44 -07:00
Ralph Castain
2ebc1fa1b7
Update to current PMIx 4.0.0 (master)
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-09-03 14:46:55 -07:00
Mark Allen
14e3d7b8b0 romio info: letting romio keep its internal setup
I'm restoring the info function pointers to the IO module
but allowing the function pointers to be NULL (eg in ompio).
And letting romio321 set its function pointers for those
routines.

This means the info system uses the new OMPI-level info
system for most things, but skips it and uses the pre-existing
romio info system just for the romio module.

It's possible to convert romio, but I went a ways down that
path and found it kind of convoluted.  Having pointers from
the lower level ADIO_File back to the higher level ompi_file_t
wasn't too bad, but I got stuck trying to figure out where/how
to register the infosubscribe_subscribe callbacks vs the way
initial k/v values are scattered around the romio code currently.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
2019-09-03 14:08:19 -04:00
Jeff Squyres
f1a065f4dc
Merge pull request #6945 from bosilca/topic/ddt_optimize
Small optimization on the datatype commit.
2019-09-03 12:46:11 -04:00
George Bosilca
82d632278a
Add a test for datatypes composed by multiple predefined
elements that can be merged into a larger UINT1 type.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-30 19:56:48 -04:00
George Bosilca
41e6f55807
Small optimization on the datatype commit.
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows for contiguous elements to have thir
blocklen increasing instead of the count. The idea is to always maximize
the blocklen, aka. the contiguous part of the datatype.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-08-30 19:56:48 -04:00
Nathan Hjelm
c4d0752036
Merge pull request #6803 from guserav/fix-osc-sm-post-32-bit-atomics
Fix osc sm posts when only 32 bit atomics support
2019-08-27 18:23:48 -07:00
Jeff Squyres
b846028622
Merge pull request #6937 from jsquyres/pr/remove-non-existent-symbols-in-tests
Update OPAL DDT variable names
2019-08-27 15:58:04 -04:00
Jeff Squyres
2ab8109be1 Update OPAL DDT variable names
These variables were renamed in
904276bb44caec207638247f23139bc21bc6a09e; update them to use the new
names.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-27 12:00:20 -07:00
Joshua Ladd
8b6f2d95ba
Merge pull request #6930 from vspetrov/coll_hcoll_nbc_request_bugfix
Coll/hcoll: fixes hcoll non-blocking colls support
2019-08-27 12:12:01 -04:00
Valentin Petrov
a0d99ad190 Coll/hcoll: fixes hcoll non-blocking colls support
open-mpi/ompi@0fe756d416 Introduced
    a bug in coll/hcoll component. The ompi_requests allocated by
    libhcoll would be treated as coll_base_nbc_request during
    ompi_coll_base_retain_<> call. Afterwards this would lead to a
    segv in the request cleanup.

    Fix: since libhcoll interface does not distinguish between the
    blocling/non-blocking requests use coll_base_nbc_request all the
    time and initialize it properly in
    coll/hcoll/get_coll_handle(). It is still within 2 cache lines.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-08-27 17:22:58 +03:00
Howard Pritchard
71e1fad4a9
Merge pull request #6855 from hkuno/hkuno/mmap_loop
Fix mmap infinite recurse in memory patcher
2019-08-26 14:28:53 -06:00
raafatfeki
2c6a5eed29 fcoll/dynamic_gen2: Adjustment of displacement index in collective write
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-08-26 10:03:23 -05:00
raafatfeki
f45e9cfdbe fcoll/vulcan: Adjustment of displacement index in collective write
Within the shuffle iteration, the aggregators have to set a displacement array needed to receive data from other processes. The array had 1 extra element. We adjust the displacement index to match the number of elements.

Signed-off-by: raafatfeki <fekiraafat@gmail.com>
2019-08-26 10:03:23 -05:00
Howard Pritchard
5a3646fd1d
Merge pull request #6884 from guserav/reintroduce-common-ofi
Reintroduce common ofi
2019-08-23 13:12:25 -06:00
Yossi Itigin
8933171412
Merge pull request #6918 from hoopoepg/topic/fixed-hand-on-shmem-finalize
SPML/UCX: fixed hang in SHMEM_FINALIZE
2019-08-22 11:37:45 +03:00
Sergey Oblomov
01dacaa6a4 SPML/UCX: fixed comment
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-21 12:08:58 +03:00
Sergey Oblomov
182023febb SPML/UCX: fixed hang in SHMEM_FINALIZE
- used MPI _Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-08-21 12:04:46 +03:00
guserav
56c3d9a238 common/ofi: Set HPE as owner of component
Signed-off-by: guserav <erik.zeiske@web.de>
2019-08-20 10:13:02 -07:00
Ralph Castain
69bd9453e9
Merge pull request #6910 from jsquyres/pr/orterun-remove-duplicate-code
orterun: remove duplicate code
2019-08-19 13:58:34 -07:00
Jeff Squyres
197beb30d5 orterun: remove duplicate code
https://github.com/open-mpi/ompi/pull/6895 fixed the code in orterun.c
to allow running as root if both OMPI_ALLOW_RUN_AS_ROOT and
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM env vars are set.  However, this
env-var-checking code already exists in
orte_submit.c:orte_submit_init() -- it looks like the
geteuid()/getenv()-checking code here in orterun is now duplicate
code.

So let's just get rid of the duplicate code.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-08-19 15:36:59 -04:00