Commit graph

29969 commits

Author SHA1 Message Date
Sergey Oblomov
421a7fd47d SPML/UCX: fixed few compilation warnings
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-20 14:40:24 +03:00
valentin petrov
5e0e1b63f3
Merge pull request #6690 from vspetrov/master
Coll/hcoll: don't init opal memhooks unless explicitly requested
2019-05-20 12:27:13 +03:00
Valentin Petrov
f19f6f432a Coll/hcoll: don't init opal memhooks unless explicitly requested by user
If the user sets HCOLL_EXTERNAL_UCM_EVENTS=1, we try to init the opal
    memory framework and register a mem release cb. Otherwise, rely on ucx.

Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2019-05-20 11:17:44 +03:00
Yossi Itigin
0c1da0fcab
Merge pull request #6687 from yosefe/topic/osc-ucx-fix-ud-self-deadlock
OSC/UCX: Fix deadlock with atomic lock
2019-05-20 09:50:20 +03:00
Yossi Itigin
9d1994b906 OSC/UCX: Fix deadlock with atomic lock
Atomic lock must progress local worker while obtaining the remote lock,
otherwise an active message which actually releases the lock might not
be processed while polling on local memory location.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-19 20:10:09 +03:00
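The deadlock described in the commit above can be illustrated with a toy model. This is a minimal sketch, not the OSC/UCX code: `Worker`, `progress`, and `acquire` are hypothetical names, and the "active message" is just a queued Python callable that releases the lock. The polling loop is bounded so that the failing case terminates instead of spinning forever.

```python
from collections import deque


class Worker:
    """Toy stand-in for a communication worker (hypothetical, not UCX)."""

    def __init__(self):
        self.pending = deque()   # queued active messages (callables)
        self.lock_word = 1       # 1 = lock held elsewhere, 0 = free

    def progress(self):
        """Run one queued active message, if any."""
        if self.pending:
            self.pending.popleft()()


def acquire(worker, progress_while_waiting, max_polls=100):
    """Poll the lock word; return True once the lock has been taken."""
    for _ in range(max_polls):
        if worker.lock_word == 0:
            worker.lock_word = 1
            return True
        if progress_while_waiting:
            worker.progress()
    return False  # gave up; an unbounded loop would spin forever here


def make_worker_with_queued_release():
    w = Worker()
    # The message that releases the lock is queued but not yet processed.
    w.pending.append(lambda: setattr(w, "lock_word", 0))
    return w


stuck = acquire(make_worker_with_queued_release(), progress_while_waiting=False)
ok = acquire(make_worker_with_queued_release(), progress_while_waiting=True)
print(stuck, ok)  # False True
```

Polling the memory location alone never observes the release, because the release only happens when the queued message is processed; progressing the worker inside the loop breaks the cycle.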
Alex Anenkov
77d466edf3 coll/libnbc: add recursive doubling algorithm for MPI_Iallreduce
Signed-off-by: Alex Anenkov <anenkov.ru@gmail.com>
2019-05-19 18:39:11 +07:00
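The commit above gives no implementation details, so the following is only a sketch of the textbook recursive doubling pattern, simulated in a single process. It is not the libnbc schedule: the function name is hypothetical, the exchange is blocking, and the number of ranks is assumed to be a power of two.

```python
import operator


def recursive_doubling_allreduce(values, op=operator.add):
    """Simulate allreduce over len(values) ranks; return each rank's result.

    Assumes the number of ranks is a power of two.
    """
    nranks = len(values)
    if nranks == 0 or nranks & (nranks - 1):
        raise ValueError("this sketch needs a power-of-two number of ranks")
    current = list(values)
    distance = 1
    while distance < nranks:
        # In each round, rank r exchanges with rank r XOR distance and
        # both sides combine the two partial results in rank order.
        nxt = []
        for rank in range(nranks):
            partner = rank ^ distance
            lo, hi = min(rank, partner), max(rank, partner)
            nxt.append(op(current[lo], current[hi]))
        current = nxt
        distance *= 2
    return current


result = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
print(result)  # [36, 36, 36, 36, 36, 36, 36, 36]
```

With P ranks the loop runs log2(P) rounds, and after the last round every rank holds the full reduction.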
Yossi Itigin
61adcd9fc2
Merge pull request #6680 from hoopoepg/topic/suppressed-pml-ucx-mt-warning
PML/UCX: disable PML UCX if MT is requested but not supported
2019-05-19 10:21:46 +03:00
Sergey Oblomov
a3578d9ece PML/UCX: disable PML UCX if MT is requested but not supported
- if multithreading is requested but not supported,
  disable PML UCX

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-17 11:25:23 +03:00
Gilles Gouaillardet
5cfa1cf666
Merge pull request #6676 from ggouaillardet/topic/oshmem_no_orte
oshmem: remove useless reference to orte header
2019-05-17 16:35:37 +09:00
Gilles Gouaillardet
5c14f8439a oshmem: remove useless reference to orte header
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-05-17 15:38:31 +09:00
Yossi Itigin
84ae05c7bc
Merge pull request #6675 from hoopoepg/topic/ucx-common-init-patcher-on-hooks-used-only
COMMON/UCX: init memhooks infra on external hooks only
2019-05-16 22:35:32 +03:00
Sergey Oblomov
ebc457baf5 COMMON/UCX: removed ucs stuff
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 20:56:30 +03:00
Sergey Oblomov
a0a9306066 COMMON/UCX: init memhooks infra on external hooks only
- initialize memory hooks infrastructure only
  if external memory hooks are requested

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 20:13:16 +03:00
Jeff Squyres
aaa0b57f50
Merge pull request #6654 from jdhayes/master
Validate slurm params function.
2019-05-16 12:12:21 -04:00
Sergey Oblomov
a51badd627 SHADOW ALLOCATOR: minor code optimization
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-16 09:38:01 +03:00
Jordan Hayes
7dad74032e plm_slurm_module: adjust for new SLURM CLI options
SLURM 19 discontinued the use of --cpu_bind (and changed it to
--cpu-bind).  There's no easy way to test at run time which one is
accepted, so set the environment variable SLURM_CPU_BIND to "none",
which should do the same thing as the srun CLI parameter.

Signed-off-by: Jordan Hayes <jhayes@ucr.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-15 14:52:33 -07:00
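The approach described in the commit above, setting an environment variable instead of passing a CLI option whose spelling depends on the SLURM version, can be sketched as follows. This is an illustration only, not the plm_slurm_module code; the helper name `srun_environment` is hypothetical.

```python
import os


def srun_environment(base_env=None):
    """Return a copy of the environment with CPU binding disabled for srun."""
    env = dict(os.environ if base_env is None else base_env)
    # Works regardless of whether srun spells the option
    # --cpu_bind or --cpu-bind.
    env["SLURM_CPU_BIND"] = "none"
    return env


env = srun_environment({"PATH": "/usr/bin"})
print(env["SLURM_CPU_BIND"])  # none
```

The resulting dictionary could then be handed to a launcher, for example through the `env` argument of `subprocess.run`.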
Nathan Hjelm
3e1dd36241 btl/uct: check for support before disabling UCX memory hooks
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
2019-05-15 13:49:10 -06:00
Sergey Oblomov
277c2a9e5c ALLOC_WITH_HINT: added inplace realloc
- in some cases realloc operation may be completed without
  allocation of new buffer (and without additional data copy)
- added logic to reallocate buffer inplace if possible

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-15 19:37:38 +03:00
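The idea in the commit above, completing a realloc without a new buffer or a data copy when possible, can be shown with a toy bump allocator. This is a sketch under simplifying assumptions and not the OSHMEM allocator: `BumpArena` is a hypothetical class that tracks offsets only, and a block can grow in place only when it is the last block in the arena.

```python
class BumpArena:
    """Toy bump allocator (hypothetical; not the OSHMEM allocator)."""

    def __init__(self, size):
        self.size = size
        self.top = 0          # first unused offset
        self.blocks = {}      # offset -> length

    def alloc(self, length):
        if self.top + length > self.size:
            raise MemoryError("arena exhausted")
        offset = self.top
        self.blocks[offset] = length
        self.top += length
        return offset

    def realloc(self, offset, new_length):
        old_length = self.blocks[offset]
        is_last = offset + old_length == self.top
        if new_length <= old_length:
            # Shrinking never needs a new buffer.
            self.blocks[offset] = new_length
            if is_last:
                self.top = offset + new_length
            return offset
        if is_last and offset + new_length <= self.size:
            # Growing in place: no new buffer, no data copy.
            self.blocks[offset] = new_length
            self.top = offset + new_length
            return offset
        # Fallback: allocate a new block (a real allocator would copy data).
        new_offset = self.alloc(new_length)
        del self.blocks[offset]
        return new_offset


arena = BumpArena(64)
a = arena.alloc(8)
b = arena.alloc(8)
b_grown = arena.realloc(b, 16)   # b is the last block, so it grows in place
a_grown = arena.realloc(a, 16)   # a is not last, so it moves
print(b == b_grown, a == a_grown)  # True False
```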
Yossi Itigin
7fe5c5431f
Merge pull request #6662 from yosefe/topic/oshmem-sshmem-mmap-sysv-hint-no-impl
OSHMEM/MMAP/SYSV: Return ERR_NOT_IMPLEMENTED if segment hint != 0
2019-05-15 17:09:49 +03:00
Jeff Squyres
4a420bb1e2
Merge pull request #6659 from jsquyres/pr/openmpi-specfile-minor-fix
openmpi.spec: make sure grep failure doesn't abort
2019-05-15 09:53:28 -04:00
Yossi Itigin
f7086746e9 OSHMEM/MMAP/SYSV: Return ERR_NOT_IMPLEMENTED if segment hint != 0
Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
2019-05-15 16:12:19 +03:00
Yossi Itigin
fe5ad67127
Merge pull request #6657 from brminich/topic/fix_cov_errors
SPML/UCX: Fix coverity error
2019-05-15 12:29:35 +03:00
Jeff Squyres
013f5b03f5 openmpi.spec: make sure grep failure doesn't abort
Thanks to Daniel Letai for bringing this to our attention.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-14 16:28:53 -07:00
Jeff Squyres
db0775974d
Merge pull request #6658 from jsquyres/pr/usnic-fix-coverity-cid-1445095
usnic: fix Coverity false positives
2019-05-14 18:22:29 -04:00
bosilca
6089608858
Merge pull request #6647 from bosilca/fix/length_0
Fix/length 0
2019-05-14 17:59:15 -04:00
Jeff Squyres
df5f7afb14 usnic: fix Coverity false positives
Add some Coverity inline notation to tell Coverity that these
functions never return.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-14 13:53:25 -07:00
Mikhail Brinskii
d81dc533f6 SPML/UCX: Fix coverity error
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-05-14 22:34:01 +03:00
Jeff Squyres
9442989e2c
Merge pull request #6382 from jsquyres/pr/ofi-mtl-gitignore
mtl/ofi: add a .gitignore
2019-05-13 12:00:41 -04:00
Yossi Itigin
4e356cd788
Merge pull request #6653 from hoopoepg/topic/suppressed-coverity-issue
OSHMEM/free: suppressed coverity issue
2019-05-13 15:06:30 +03:00
Sergey Oblomov
4df8c1b3e3 OSHMEM/free: suppressed coverity issue
- removed dead code

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-13 11:59:02 +03:00
Yossi Itigin
02bf863ac2
Merge pull request #6641 from hoopoepg/topic/alloc-with-hint-impl-master
OSHMEM: Add support for shmemx_malloc_with_hint()
2019-05-12 10:16:50 +03:00
Jeff Squyres
5ee3f54173
Merge pull request #5215 from jsquyres/pr/usnic-updates
usNIC updates
2019-05-11 09:19:25 -04:00
Jeff Squyres
566e6f1ca3 btl/usnic: remove legacy code
Remove compatibility code for multiple versions of BTL_IN_OPAL,
BTL_VERSION, and RCACHE_VERSION.  This stuff was really only necessary
when we were actively swapping code between multiple release branches
that had large variations in core OMPI infrastructure.  These large
variations have now been around for quite a while, so the need for
this "compat" layer is significantly reduced.  It hasn't been removed
simply because a few of the "compat" names are slightly more friendly
than the real names (e.g., the SEND/RECV/PUT names).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-11 05:19:36 -07:00
Jeff Squyres
8a2441603f btl/usnic: remove all calls to abort()
Inspired by https://github.com/open-mpi/ompi/pull/5205, finally remove
all calls to abort() from the usnic BTL.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-11 05:17:29 -07:00
George Bosilca
42119254c7 Fix incorrect behavior with length == 0
Fixes #6575.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 19:53:34 -04:00
George Bosilca
d141bf7912 Update the datatype dump to match the actual types.
Update the comments to better reflect what is going on.
Minor indentations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-10 18:03:57 -04:00
Yossi Itigin
94b5e91194 OSHMEM: Add support for shmemx_malloc_with_hint()
- added multiple segments processing
- added shmemx_malloc_with_hint call + set of hints

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2019-05-10 20:04:57 +03:00
Nathan Hjelm
b82a08254f btl/ugni: fix 32-bit compare-and-swap atomics
This commit fixes an error in the 32-bit compare-and-swap atomic support
for Aries networks. The code was incorrectly using the non-fetching
version of cswap which was causing the routine to return
OPAL_ERR_BAD_ARG.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
2019-05-10 09:59:54 -06:00
Joseph Schuchart
c67e229193 OSC rdma: make sure accumulating in shared memory is safe
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
2019-05-10 17:27:10 +02:00
Nathan Hjelm
4345308dfd osc/rdma: fix CAS 32-bit network atomic compatibility check
When checking for btl compatibility with 32-bit CAS osc/rdma was
checking the incorrect flag field.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
2019-05-10 07:27:53 -06:00
KAWASHIMA Takahiro
dabad084b5
Merge pull request #6621 from bosilca/topic/persistent_req_leak
Fix the leak of fragments for persistent sends (issue #6565)
2019-05-03 15:21:42 +09:00
Jeff Squyres
2821f5ac06
Merge pull request #6629 from mwheinz/master
buildrpm.sh no longer respects the value of rpmtopdir
2019-05-02 13:02:40 -04:00
George Bosilca
a16cf0e4dd
Fix the leak of fragments for persistent sends.
The rdma_frag attached to the send request was not correctly released
upon request completion, leaking until MPI_Finalize. A quick solution
would have been to add RDMA_FRAG_RETURN at different locations on the
send request completion, but it would have unnecessarily made the
sendreq completion path more complex. Instead, I added the length to
the RDMA fragment so that it can be completed during the remote ack.

Be more explicit on the comment.

The rdma_frag can only be freed once when the peer forced a protocol
change (from RDMA GET to send/recv). Otherwise the fragment will be
returned once all data pertaining to it has been transferred.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2019-05-02 09:40:11 -04:00
Jeff Squyres
ac54d771ec mtl/ofi: add a .gitignore
Ignore generated files.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2019-05-01 14:00:00 -07:00
Michael Heinz
8562211623 Corrects some whitespace issues with buildrpm.sh
Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2019-05-01 15:22:03 -04:00
Michael Heinz
687a5603a1 buildrpm.sh no longer respects the value of rpmtopdir
In OMPI 2.1.2, buildrpm.sh could work with a value of rpmtopdir that was
set in the environment. In newer versions this is no longer true,
causing such values to be ignored. This patch adds a new argument to
buildrpm.sh, -R, which allows the user to specify where to build the
RPMs.

Signed-off-by: Michael Heinz <michael.william.heinz@intel.com>
2019-05-01 15:20:41 -04:00
Yossi Itigin
5d2200a7d6
Merge pull request #6605 from brminich/topic/shmem_all2all_put
SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
2019-05-01 12:00:21 +03:00
Mikhail Brinskii
d4843b1651 SPML/UCS: CR comments p2
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-30 16:49:11 +03:00
Mikhail Brinskii
c4c99457db SPML/UCX: CR comments p1
Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
2019-04-30 16:26:45 +03:00
bosilca
399b7133ab
Merge pull request #6556 from EmmanuelBRELLE/PR_fix_local_handle_in_PUT_message
pml/ob1: fixed local handle sent during PUT control message
2019-04-27 13:51:22 -04:00