1
1

29927 Коммитов

Автор SHA1 Сообщение Дата
Edgar Gabriel
88b3b24101
Merge pull request #6697 from edgargabriel/topic/external32-v2
Topic/external32 v2
2019-05-20 19:11:11 -05:00
Edgar Gabriel
27b2ec71a7 common/ompio: add support for read operations and collective I/O
external32 data representation is now support by ompio for everything
but non-blocking collective I/O operations. The support can further be improved
in a second step to limit the temporary buffer size (at least for blocking operations),
but it does work now for many scenarios.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 17:56:16 -05:00
Edgar Gabriel
ab56e6f0db common/ompio: make individual read operations work.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 17:22:33 -05:00
Edgar Gabriel
f6b3a0af52 common/ompio: individual write of external32 works
both blocking and non-blocking. collective write and read operations not yet.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 16:26:14 -05:00
Edgar Gabriel
d955753cb8 common/ompio: abstraction for different convertor types
introduce separate convertors for memory vs. file representation. Adjust the interfaces for decode_datatype to provide the convertor to be used for that.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 13:35:38 -05:00
Edgar Gabriel
35be18b266 common/ompio: rename ompio_cuda* to ompio_buffer*
the infrastructure put in place to manage cuda buffers is actually
a lot more generic than just for cuda buffers. Specifically, we ca
reuse much of the code to implement the external32 data representation.
This commit converts the code from common_ompio_cuda* to
common_ompio_buffer*. There are just very few places where we actually need to keep the OPAL_CUDA_SUPPORT ifdef in place.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 12:50:04 -05:00
Edgar Gabriel
a96efb7620 common/ompio: add comm_ompio_read_all/write_all functions
in preparation for adding support for the external32 data
representation.

Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
2019-05-20 12:49:36 -05:00
Yossi Itigin
d65fae11bd
Merge pull request #6693 from hoopoepg/topic/fixed-compilation-warnings
SPML/UCX: fixed few compilation warnings
2019-05-20 15:54:42 +03:00
Sergey Oblomov
421a7fd47d SPML/UCX: fixed few compilation warnings
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-20 14:40:24 +03:00
valentin petrov
5e0e1b63f3
Merge pull request #6690 from vspetrov/master
Coll/hcoll: don't init opal memhooks unless explicitely requested
2019-05-20 12:27:13 +03:00
Valentin Petrov
f19f6f432a Coll/hcoll: don't init opal memhooks unless explicitely requested by user
If user sets HCOLL_EXTERNAL_UCM_EVENTS=1 then we try init opal
    memory framework and register a mem release cb. Otherwise, rely on ucx.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-05-20 11:17:44 +03:00
Yossi Itigin
0c1da0fcab
Merge pull request #6687 from yosefe/topic/osc-ucx-fix-ud-self-deadlock
OSC/UCX: Fix deadlock with atomic lock
2019-05-20 09:50:20 +03:00
Yossi Itigin
9d1994b906 OSC/UCX: Fix deadlock with atomic lock
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-19 20:10:09 +03:00
Alex Anenkov
77d466edf3 coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce
Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>
2019-05-19 18:39:11 +07:00
Yossi Itigin
61adcd9fc2
Merge pull request #6680 from hoopoepg/topic/suppressed-pml-ucx-mt-warning
PML/UCX: disable PML UCX if MT is requested but not supported
2019-05-19 10:21:46 +03:00
Sergey Oblomov
a3578d9ece PML/UCX: disable PML UCX if MT is requested but not supported
- in case if multithreading requested but not supported
  disable PML UCX

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-17 11:25:23 +03:00
Gilles Gouaillardet
5cfa1cf666
Merge pull request #6676 from ggouaillardet/topic/oshmem_no_orte
oshmem: remove useless reference to orte header
2019-05-17 16:35:37 +09:00
Gilles Gouaillardet
5c14f8439a oshmem: remove useless reference to orte header
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-05-17 15:38:31 +09:00
Yossi Itigin
84ae05c7bc
Merge pull request #6675 from hoopoepg/topic/ucx-common-init-patcher-on-hooks-used-only
COMMON/UCX: init memhooks infra on external hooks only
2019-05-16 22:35:32 +03:00
Sergey Oblomov
ebc457baf5 COMMON/UCX: removed ucs stuff
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 20:56:30 +03:00
Sergey Oblomov
a0a9306066 COMMON/UCX: init memhooks infra on external hooks only
- initialize memory hooks infrastructure only in case
  if external memory hooks are requested

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 20:13:16 +03:00
Jeff Squyres
aaa0b57f50
Merge pull request #6654 from jdhayes/master
Validate slurm params function.
2019-05-16 12:12:21 -04:00
Sergey Oblomov
a51badd627 SHADOW ALLOCATOR: minor code optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 09:38:01 +03:00
Jordan Hayes
7dad74032e plm_slurm_module: adjust for new SLURM CLI options
SLURM 19 discontinued the use of --cpu_bind (and changed it to
--cpu-bind).  There's no easy way to test at run time which one is
accepted, so set the environment variable SLURM_CPU_BIND to "none",
which should do the same thing as the srun CLI parameter.

Signed-off-by: Jordan Hayes <jhayes@ucr.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-15 14:52:33 -07:00
Nathan Hjelm
3e1dd36241 btl/uct: check for support before disabling UCX memory hooks
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2019-05-15 13:49:10 -06:00
Sergey Oblomov
277c2a9e5c ALLOC_WITH_HINT: added implace realloc
- in some cases realloc operation may be completed without
  allocation of new buffer (and without additional data copy)
- added logic to reallocate buffer inplace if possible

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-15 19:37:38 +03:00
Yossi Itigin
7fe5c5431f
Merge pull request #6662 from yosefe/topic/oshmem-sshmem-mmap-sysv-hint-no-impl
OSHMEM/MMAP/SYSV: Return ERR_NOT_IMPLEMENTED if segment hint != 0
2019-05-15 17:09:49 +03:00
Jeff Squyres
4a420bb1e2
Merge pull request #6659 from jsquyres/pr/openmpi-specfile-minor-fix
openmpi.spec: make sure grep failure doesn't abort
2019-05-15 09:53:28 -04:00
Yossi Itigin
f7086746e9 OSHMEM/MMAP/SYSV: Return ERR_NOT_IMPLEMENTED if segment hint != 0
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-15 16:12:19 +03:00
Yossi Itigin
fe5ad67127
Merge pull request #6657 from brminich/topic/fix_cov_errors
SPML/UCX: Fix coverity error
2019-05-15 12:29:35 +03:00
Jeff Squyres
013f5b03f5 openmpi.spec: make sure grep failure doesn't abort
Thanks to Daniel Letai for bringing this to our attention.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-14 16:28:53 -07:00
Jeff Squyres
db0775974d
Merge pull request #6658 from jsquyres/pr/usnic-fix-coverity-cid-1445095
usnic: fix Coverity false positives
2019-05-14 18:22:29 -04:00
bosilca
6089608858
Merge pull request #6647 from bosilca/fix/length_0
Fix/length 0
2019-05-14 17:59:15 -04:00
Jeff Squyres
df5f7afb14 usnic: fix Coverity false positives
Add some Coverity inline notation to tell Coverity that these
functions never return.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-14 13:53:25 -07:00
Mikhail Brinskii
d81dc533f6 SPML/UCX: Fix coverity error
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-05-14 22:34:01 +03:00
Jeff Squyres
9442989e2c
Merge pull request #6382 from jsquyres/pr/ofi-mtl-gitignore
mtl/ofi: add a .gitignore
2019-05-13 12:00:41 -04:00
Yossi Itigin
4e356cd788
Merge pull request #6653 from hoopoepg/topic/suppressed-coverity-issue
OSHMEM/free: suppressed coverity issue
2019-05-13 15:06:30 +03:00
Sergey Oblomov
4df8c1b3e3 OSHMEM/free: suppressed coverity issue
- removed dead code

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-13 11:59:02 +03:00
Yossi Itigin
02bf863ac2
Merge pull request #6641 from hoopoepg/topic/alloc-with-hint-impl-master
OSHMEM: Add support for shmemx_malloc_with_hint()
2019-05-12 10:16:50 +03:00
Jeff Squyres
5ee3f54173
Merge pull request #5215 from jsquyres/pr/usnic-updates
usNIC updates
2019-05-11 09:19:25 -04:00
Jeff Squyres
566e6f1ca3 btl/usnic: remove legacy code
Remove compatibility code for multiple versions of BTL_IN_OPAL,
BTL_VERSION, and RCACHE_VERSION.  This stuff was really only necessary
when we were actively swapping code between multiple release branches
that had large variations in core OMPI infrastructure.  These large
variations have now been around for quite a while, so the need for
this "compat" layer is significantly reduced.  It hasn't been removed
simply because a few of the "compat" names a slightly more friendly
than the real names (e.g., the SEND/RECV/PUT names).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-11 05:19:36 -07:00
Jeff Squyres
8a2441603f btl/usnic: remove all calls to abort()
Inspired by https://github.com/open-mpi/ompi/pull/5205, finally remove
all calls to abort() from the usnic BTL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-11 05:17:29 -07:00
George Bosilca
42119254c7 Fix incorrect behavior with length == 0
Fixes #6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 19:53:34 -04:00
George Bosilca
d141bf7912 Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 18:03:57 -04:00
Yossi Itigin
94b5e91194 OSHMEM: Add support for shmemx_malloc_with_hint()
- added multiple segments processing
- added shmemx_malloc_with_hint call + set of hints

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-10 20:04:57 +03:00
Nathan Hjelm
b82a08254f btl/ugni: fix 32-bit compare-and-swap atomics
This commit fixes an error in the 32-bit compare-and-swap atomic support
for Aries networks. The code was incorrectly using the non-fetching
version of cswap which was causing the routing to return
OPAL_ERR_BAD_ARG.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
2019-05-10 09:59:54 -06:00
Joseph Schuchart
c67e229193 OSC rdma: make sure accumulating in shared memory is safe
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-05-10 17:27:10 +02:00
Nathan Hjelm
4345308dfd osc/rdma: fix CAS 32-bit network atomic compatibility check
When checking for btl compatibility with 32-bit CAS osc/rdma was
checking the incorrect flag field.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
2019-05-10 07:27:53 -06:00
KAWASHIMA Takahiro
dabad084b5
Merge pull request #6621 from bosilca/topic/persistent_req_leak
Fix the leak of fragments for persistent sends (issue #6565)
2019-05-03 15:21:42 +09:00
Jeff Squyres
2821f5ac06
Merge pull request #6629 from mwheinz/master
buildrpm.sh no longer respects the value of rpmtopdir
2019-05-02 13:02:40 -04:00