
3640 Commits

Author SHA1 Message Date
Joseph Schuchart
a346756bf4 uGNI: Fix potential deadlock when processing outstanding transfers
Signed-off-by: Joseph Schuchart <schuchart@hlrs.de>
(cherry picked from commit c09ca039b4703e26d6d7a0494e042dd27827e091)
2019-11-19 22:39:21 +01:00
Akshay Venkatesh
db3e563749 OPAL/MCA/BTL/OPENIB: Detect ConnectX-6 HCAs
Signed-off-by: Akshay Venkatesh <akvenkatesh@nvidia.com>
2019-11-18 17:04:54 -08:00
Howard Pritchard
59b24ab4f7 btl/uct: add UCT API version check to configury
related to #7128

The UCX crew is no longer guaranteeing that the UCT API is going to be frozen,
so this is kind of a whack-a-mole problem trying to keep the BTL UCT working
with various changing UCT APIs.

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 9d345d9aa000233bec148540b071cecffc94438c)
2019-11-07 10:01:52 -07:00
Nathan Hjelm
55e01220cd btl/uct: fix compilation for UCX 1.7.0
Ref #7128

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit a3026c016a6a8be379f62585b6ddc070175c8106)
2019-11-07 10:00:33 -07:00
Nathan Hjelm
47ec3e4d2b btl/uct: add support for OpenUCX v1.8 API changes
OpenUCX broke the UCT API again in v1.8. This commit updates
btl/uct to fix compilation with current OpenUCX master
(future v1.8). Further changes will likely be needed for
the final release.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 526775dfd7ad75c308532784de4fb3ffed25458f)
2019-11-07 10:00:21 -07:00
Jeff Squyres
c6592822c0 btl/usnic: set retrans_timeout back down to 5ms
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 3080033a8c4db64199b03b6058e18488f619088c)
2019-10-15 07:54:32 -07:00
Jeff Squyres
1565239506 btl/usnic: set ack_iteration_delay default to 4
It was previously accidentally set to 0.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 132e4cab3bc71df0da87368a332d6af0090a6977)
2019-10-15 07:54:31 -07:00
Jeff Squyres
22bc268e6e btl/usnic: properly size freelist items
Move the prefix area from the head to the body in relevant size
computations.  This fixes a problem in high traffic situations where
usNIC may have sent from unregistered memory.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe7f772f21627b01838c007db7cedbbb0ce8b536)
2019-10-04 16:47:19 -07:00
Jeff Squyres
58155bc760 btl/usnic: cap the number of resends per progress iteration
New MCA param: btl_usnic_max_resends_per_iteration.  This is the max
number of resends we'll do in a single pass through usNIC component
progress.  This prevents progress from getting stuck in an endless
loop of retransmissions (i.e., if more retransmissions are triggered
during the sending of retransmissions).  Specifically: we need to
leave the resend loop to allow receives to happen (which may ACK
messages we have sent previously, and therefore cause pending resends
to be moot).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 27e3040dfeba00a9a2615a217c164899f0009e59)
2019-10-04 16:47:13 -07:00
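The resend cap described in 58155bc760 can be illustrated with a minimal sketch. This is not the usNIC code: `struct resend_queue` and `resend_some()` are hypothetical names, and the queue is reduced to a counter. The point is only that the loop exits after a bounded number of resends so the caller can go back to polling for receives.

```c
#include <stddef.h>

/* Hypothetical stand-in for the list of frames awaiting retransmission. */
struct resend_queue {
    size_t pending; /* number of frames still waiting to be resent */
};

/*
 * Resend at most max_resends frames in one pass, then return so that the
 * caller can check for incoming messages (which may ACK frames and make
 * the remaining resends unnecessary).  Returns the number of frames resent.
 */
static size_t resend_some(struct resend_queue *q, size_t max_resends)
{
    size_t sent = 0;

    while (q->pending > 0 && sent < max_resends) {
        /* A real implementation would retransmit one frame here. */
        q->pending--;
        sent++;
    }
    return sent;
}
```

With 10 pending frames and a cap of 4, one pass resends 4 frames and leaves 6 queued for later passes.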
Jeff Squyres
8f929c68f1 btl/usnic: increase default retrans_timeout
Significantly increase the default retrans timeout.  If the
retrans timeout is too soon, we can end up in a retransmission storm
where the logic will continually re-transmit the same frames during a
single run through the usNIC progress function (because the timer for
a single frame expires before we have run through re-transmitting all
the frames pending re-transmission).

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 3cc95d86b2123f38f392e56adca7ac8a1fef6454)
2019-10-04 16:47:11 -07:00
Jeff Squyres
b5cb03450c btl/usnic: clarifications and fixes regarding ACKs
New MCA parameter: btl_usnic_ack_iteration_delay.  Set this to the
number of times through the usNIC component progress function before
sending a standalone ACK (vs. piggy-backing the ACK on any other send
going to the target peer).

Use "ticks" language to clarify that we're really counting the number
of times through the usNIC component DATA_CHANNEL completion check (to
check for incoming messages) -- it has no relation to wall clock time
whatsoever.

Also slightly change the channel-checking scheme in usNIC component
progress: only check the PRIORITY channel once (vs. checking it once,
not finding anything, and then falling through to progress_2() where we
check PRIORITY again and then check the DATA channel).

As before, if our "progress" libevent fires, increment the tick
counter enough to guarantee that all endpoints that need an ACK will
get triggered to send standalone ACKs the next time through progress,
if necessary.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 968b1a51b59898877a8c7268d463d3d7d78d86a3)
2019-10-04 16:47:09 -07:00
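The tick-based ACK delay described in b5cb03450c could look roughly like the sketch below. The structure and function names are assumptions made for illustration, not the usNIC data structures. A "tick" here is one pass through the completion check, not a unit of wall clock time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-endpoint ACK bookkeeping. */
struct endpoint {
    bool     ack_needed;      /* we owe this peer an ACK */
    uint64_t ack_needed_tick; /* tick count when the ACK became necessary */
};

/*
 * Decide whether to send a standalone ACK.  Until ack_iteration_delay
 * ticks have elapsed, we keep waiting for an outgoing send on which the
 * ACK could be piggy-backed.
 */
static bool should_send_standalone_ack(const struct endpoint *ep,
                                       uint64_t now_ticks,
                                       uint64_t ack_iteration_delay)
{
    return ep->ack_needed &&
           (now_ticks - ep->ack_needed_tick) >= ack_iteration_delay;
}
```

Under this model, advancing the tick counter by at least `ack_iteration_delay` in one step makes every endpoint with a pending ACK eligible on the next pass, which matches the behavior the commit message describes for the "progress" libevent.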
Jeff Squyres
0839a9c313 btl/usnic: s/get_nsec/get_nticks/g
Rename "get_nsec()" to "get_ticks()" to more accurately reflect that
this function has no correlation to wall clock time at all.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit ce2910a28aea61043b81324c67999f3a47cfe7ac)
2019-10-04 16:47:08 -07:00
Adrian Reber
674655c641 Do not use CMA in user namespaces
Attempts to run processes via mpirun in Podman containers have shown
that the CMA btl_vader_single_copy_mechanism does not work when user
namespaces are involved.

Creating containers with Podman requires at least user namespaces to be
able to do unprivileged mounts in a container.

Even if running the container with user namespace user ID mappings which
result in the same user ID on the inside and outside of all involved
containers, the check in the kernel to allow ptrace (and thus
process_vm_{read,write}v()), fails if the same IDs are not in the same
user namespace.

One workaround is to specify '--mca btl_vader_single_copy_mechanism none'
and this commit adds code to automatically skip CMA if user namespaces
are detected and fall back to MCA_BTL_VADER_EMUL.

Signed-off-by: Adrian Reber <areber@redhat.com>
(cherry picked from commit fc68d8a90fe86284e9dc730f878b55c0412f01d2)
2019-09-20 19:12:48 -07:00
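One way to picture the check from 674655c641 is sketched below. It is an assumption for illustration, not the code the commit adds: each process reads its user namespace identifier from `/proc/self/ns/user` (Linux-specific), peers exchange the strings by some means not shown, and CMA is considered only when they match. The helper names `read_user_ns()` and `cma_allowed()` are hypothetical.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read this process's user namespace identifier (a string such as
 * "user:[4026531837]") into buf.  Returns false if it cannot be read,
 * for example on a system without /proc/self/ns/user.
 */
static bool read_user_ns(char *buf, size_t len)
{
    ssize_t n;

    if (len == 0) {
        return false;
    }
    n = readlink("/proc/self/ns/user", buf, len - 1);
    if (n < 0) {
        return false;
    }
    buf[n] = '\0'; /* readlink() does not NUL-terminate */
    return true;
}

/*
 * CMA relies on ptrace-style access checks, which fail across user
 * namespaces, so only allow it when both identifiers are known and equal.
 */
static bool cma_allowed(const char *local_ns, const char *peer_ns)
{
    return local_ns != NULL && peer_ns != NULL &&
           strcmp(local_ns, peer_ns) == 0;
}
```

When `cma_allowed()` returns false, a caller would select the emulated single-copy path instead, which is the fallback the commit message names.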
Nathan Hjelm
5a945f668c btl/vader: when using single-copy emulation fragment large rdma
This commit changes how the single-copy emulation in the vader btl
operates. Before this change the BTL set its put and get limits
based on the max send size. After this change the limits are unset
and the put or get operation is fragmented internally.

References #6568

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit ae91b11de2314ab11a9842d9738cd14f8f1e393b)
2019-09-17 20:01:37 -06:00
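The internal fragmentation described in 5a945f668c can be sketched as a simple chunking loop. This is an illustrative assumption, not the vader implementation: `fragmented_put()` is a hypothetical name and `memcpy()` stands in for the emulated single-copy transfer of one fragment.

```c
#include <stddef.h>
#include <string.h>

/*
 * Transfer len bytes from src to dst in pieces of at most max_frag bytes.
 * Returns the number of fragments used (0 if max_frag is 0 or len is 0).
 */
static size_t fragmented_put(void *dst, const void *src,
                             size_t len, size_t max_frag)
{
    size_t offset = 0;
    size_t nfrags = 0;

    if (max_frag == 0) {
        return 0;
    }
    while (offset < len) {
        size_t remaining = len - offset;
        size_t chunk = remaining < max_frag ? remaining : max_frag;

        /* Stand-in for sending one fragment through the emulation path. */
        memcpy((char *)dst + offset, (const char *)src + offset, chunk);
        offset += chunk;
        nfrags++;
    }
    return nfrags;
}
```

Because the splitting happens inside the operation, the caller no longer needs a put or get size limit: a 10-byte transfer with a 4-byte fragment size simply becomes three fragments.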
Geoff Paulsen
893ea3f91f
Merge pull request #6929 from rhc54/cmr40/pmix314
Remove unnecessary error log
2019-08-30 14:10:36 -05:00
Harumi Kuno
fbbacc1303 Fix mmap infinite recurse in memory patcher
This commit fixes issue #6853 by removing
MacOS/Darwin-specific logic from intercept_mmap.

Signed-off-by: Harumi Kuno <harumi.kuno@hpe.com>
2019-08-29 18:03:16 -07:00
Ralph Castain
8efc6e1dc1
Remove unnecessary error log
Refs https://github.com/pmix/pmix/pull/1413

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-26 23:48:34 -07:00
Sergey Oblomov
66e18563bf SPML/UCX: fixed hang in SHMEM_FINALIZE
- used MPI_Barrier to synchronize processes

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit 182023febb6f8f31ce34dc54c8aa409ad7e44fa2)
2019-08-22 11:41:52 +03:00
Ralph Castain
167ca31a31
Update PMIx to official v3.1.4 release
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-08-09 13:14:48 -07:00
Howard Pritchard
71f240f078 btl/openib: fix issue 6785
Commit d7053a3 broke things for the case when Open MPI 4.0.x is built
without UCX support.  Problem was it was trying to partially initialize
the btl to try and delay printing of a help message till wireup.  Well
this sort of doesn't work in all cases.  Rather than keep piling on
changes to support a help message for a BTL that we are deprecating, take
a keep it simple stupid approach.

So, revert most of d7053a3 and instead put the help message back in the
original location: during the scan of the ports of the available HCAs, which
checks whether the link layer for each port is configured for Ethernet or
InfiniBand. If Open MPI was built with UCX support, don't emit the help
message; if UCX was not linked in, emit the help message.

Verified on a system with ConnectX-5 HCAs configured with two ports configured
for Ethernet and two for InfiniBand.

relates to #6785

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
2019-07-12 08:21:21 -06:00
Ralph Castain
1d0e0557b9
v4.0.x: Update PMIx to official v3.1.3 release
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-07-02 08:56:49 -07:00
Ralph Castain
9d0adbc6bc
Update to track 32-bit support commit
Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-26 09:31:43 -07:00
Ralph Castain
b353639573
Update to PMIx v3.1.3rc4
Will provide PR to update VERSION to final release once passes MTT

Signed-off-by: Ralph Castain <rhc@pmix.org>
2019-06-25 13:45:19 -07:00
Ralph Castain
05fa5845bc
Fix finalize of flux component
Per patches from @SteVwonder and @garlick

Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d4070d5f58f0c65aef89eea5910b202b8402e48b)
2019-06-19 06:00:02 -07:00
Nathan Hjelm
b5428aaf71 btl/uct: add support for UCX 1.6.x
This commit updates the uct btl to support the v1.6.x release of
UCX. This release breaks API.

Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
(cherry picked from commit b78066720c3e3299bd76f2e22d2c0e415db572fc)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
2019-06-07 15:54:47 -05:00
Geoff Paulsen
18f10377eb
Merge pull request #6152 from ggouaillardet/topic/v4.0.x/ucx_warning
btl/openib: delay UCX warning to add_procs()
2019-06-03 15:09:43 -05:00
Howard Pritchard
6c74d4031b
Merge pull request #6720 from markalle/patcher_additions_v40x
shmat/shmdt additions for patcher
2019-06-03 12:51:05 -07:00
Mark Allen
5f79dfaa0a shmat/shmdt additions for patcher
This is mostly based off recent UCX additions to their patcher:
    https://github.com/openucx/ucx/pull/2703

They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)

Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.

And the other main thing is we didn't really have shmat/shmdt
active for some systems because we only had a path for
syscall(SYS_shmdt, ) but we needed to also have a path for
syscall(SYS_ipc, IPCOP_shmdt, ) and same for shmat.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit eb888118e83f56c131aff900b03eab34c92b7805)
2019-05-30 13:31:02 -04:00
Nathan Hjelm
11cb0f24a5 btl/uct: check for support before disabling UCX memory hooks
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
(cherry picked from commit 3e1dd362411f1da5564d3402f65e9b3b74f50759)
2019-05-20 16:42:38 -05:00
Sergey Oblomov
1944295da3 COMMON/UCX: removed ucs stuff
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit ebc457baf5ded5dd46cd73918a2f69555f408c54)
2019-05-17 09:58:20 +03:00
Sergey Oblomov
fa0a0b1597 COMMON/UCX: init memhooks infra on external hooks only
- initialize the memory hooks infrastructure only if
  external memory hooks are requested

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit a0a93060668cd11a783cc94c753efb3129df9dde)
2019-05-17 09:58:12 +03:00
George Bosilca
4946570b24 Remove few warnings identified by @rhc in #5514.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>

(cherry picked from commit open-mpi/ompi@6d11a45f44)
2019-05-11 16:38:31 +09:00
Gilles Gouaillardet
70a864fce3 btl/vader: fix finalize sequence
free the component mpool in mca_btl_vader_component_close()
and after freeing some objects that depend on it such as
mca_btl_vader_component.vader_frags_user

Thanks Christoph Niethammer for reporting this.

Refs. open-mpi/ompi#6524

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@77060cad07)
2019-05-11 13:04:23 +09:00
Mikhail Brinskii
e4ee56d1f3 SPML/UCX: Add shmemx_alltoall_global_nb routine to shmemx.h
The new routine transfers the data asynchronously from the source PE to all
PEs in the OpenSHMEM job. The routine returns immediately. The source and
target buffers are reusable only after the completion of the routine.
After the data is transferred to the target buffers, the counter object
is updated atomically. The counter object can be read either using atomic
operations such as shmem_atomic_fetch or can use point-to-point synchronization
routines such as shmem_wait_until and shmem_test.

Signed-off-by: Mikhail Brinskii <mikhailb@mellanox.com>
(cherry picked from commit 2ef5bd8b3671f1e10caf00d06d66d120eac9c5be)
2019-05-02 21:25:59 +03:00
Xin Zhao
69a80fce9f ompi/oshmem/spml/ucx: use lockfree array to optimize spml_ucx_progress/delete oshmem_barrier in shmem_ctx_destroy
ompi/oshmem/spml/ucx: optimize spml ucx progress

Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit 9c3d00b144641d2929f830279dcc9d163c38e9e1)
2019-03-21 23:59:58 +02:00
Xin Zhao
596997c194 ompi/oshmem/spml/ucx: defer clean up shmem_ctx to shmem_finalize
Signed-off-by: Tomislav Janjusic <tomislavj@mellanox.com>
(cherry picked from commit e1c1ab020227fc18d145379ab29ea86a3cdb66b1)
2019-03-21 23:58:23 +02:00
Joshua Hursey
45526fadee Do not force 'hash' gds on direct modex
* Forcing the 'hash' gds component should not be necessary any more.

Port of PR #6498 (component names changed so a cherry-pick would not work)

Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
2019-03-19 10:52:17 -05:00
Gilles Gouaillardet
8da4605589 btl/openib: immediately release the device when no port is allowed
Many thanks to Sergey Oblomov for reporting this issue
and the countless traces provided when troubleshooting it.

This is a one-off commit for the v4.0.x branch since btl/openib has been removed
from master.

Refs. open-mpi/ompi#6137

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-19 09:26:11 +09:00
Gilles Gouaillardet
c58c774981 btl/openib: have add_proc() return immediately when the port is disabled.
Fixes an issue introduced in open-mpi/ompi@0a2ce58040

This is a one-off commit for the v4.0.x branch since btl/openib has been removed from master.

Refs. open-mpi/ompi#6137

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
2019-03-19 09:24:25 +09:00
Gilles Gouaillardet
d7053a306a btl/openib: delay UCX warning to add_procs()
If UCX is available, then pml/ucx will be used instead of
pml/ob1 + btl/openib, so there is no need to warn about
btl/openib not supporting Infiniband.

Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>

(cherry picked from commit open-mpi/ompi@0a2ce58040)
2019-03-19 09:24:00 +09:00
Howard Pritchard
ceb93d7c03
Merge pull request #6491 from bosilca/v4.0.x
v4.0.x: Cherry-pick fixes for issue #6258 from master (vader fixes)
2019-03-15 17:08:52 -06:00
Howard Pritchard
27899b0e8f
Merge pull request #6486 from hoopoepg/topic/check-ucx-params-v4.0
PML/SPML/UCX: added evaluation of mmap events - v4.0
2019-03-14 17:02:46 -06:00
Nathan Hjelm
3df8ed9cc0
btl/vader: fix fragment sizes used by free lists
This commit fixes a bug introduced in
f62d26ddbc8cda4d985cceee531a2ec32406d1f6. That commit changed how
vader allocates fragment memory from the shared memory
segment. Unfortunately, the values used for the fragment sizes did not
include space for the fragment header. This can cause an overrun of
data from one fragment to the header of the next fragment.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:25:31 -04:00
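The sizing problem fixed in 3df8ed9cc0 comes down to simple arithmetic, sketched here under assumptions: `struct frag_hdr` is a hypothetical header layout, and fragments are assumed to be carved back to back out of one segment. If the per-fragment stride covers only the payload, a full payload in fragment i runs into the header of fragment i+1.

```c
#include <stddef.h>

/* Hypothetical fragment header stored in front of each payload. */
struct frag_hdr {
    size_t len;
    int    tag;
};

/* Bytes that must be reserved per fragment: header plus payload. */
static size_t frag_alloc_size(size_t payload_size)
{
    return sizeof(struct frag_hdr) + payload_size;
}

/* Byte offset of fragment number index within the shared segment. */
static size_t frag_offset(size_t index, size_t payload_size)
{
    return index * frag_alloc_size(payload_size);
}
```

With this stride, the distance between consecutive fragments is always at least the header size plus the payload size, so a full payload never reaches the next header.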
Nathan Hjelm
20017d345e
btl/vader: use basic mpool type to handle frag/fbox allocation
This commit updates btl/vader to use an mpool for handling all shared
memory allocations (frags, fboxes).

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:21:12 -04:00
Nathan Hjelm
bac6024b5a
mpool: add new base module type "basic"
This commit adds a new mpool base module type: basic. This module can
be used with an opal_free_list_t to allocate space from a
pre-allocated block (such as a shared memory region). The new module
only supports allocation and is not meant for more dynamic use cases
at this time.

Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
2019-03-14 17:20:30 -04:00
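An allocation-only pool over a pre-allocated block, as bac6024b5a describes, is commonly built as a bump allocator. The sketch below is a generic illustration under that assumption, not the mpool "basic" module itself. All names are hypothetical, and alignment is applied to the offset within the block, so the block's base address is assumed to be suitably aligned already.

```c
#include <stddef.h>

/* Hypothetical allocation-only pool over a caller-provided block. */
struct basic_pool {
    unsigned char *base; /* start of the pre-allocated block */
    size_t         size; /* total bytes in the block */
    size_t         used; /* bytes handed out so far, including padding */
};

static void basic_pool_init(struct basic_pool *p, void *block, size_t size)
{
    p->base = block;
    p->size = size;
    p->used = 0;
}

/*
 * Hand out size bytes at an offset aligned to align, which must be a
 * power of two.  Returns NULL when the block is exhausted.  There is no
 * free operation: space is only reclaimed by discarding the whole block.
 */
static void *basic_pool_alloc(struct basic_pool *p, size_t size, size_t align)
{
    size_t start = (p->used + align - 1) & ~(align - 1);

    if (start > p->size || size > p->size - start) {
        return NULL;
    }
    p->used = start + size;
    return p->base + start;
}
```

A free list can draw its elements from such a pool and recycle them itself, which is why the pool does not need to support freeing individual allocations.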
Mark Allen
fcf53becc3 opal_hwloc_base_cset2str() off-by-1 in its strncat()
I think the strncat() calls here need to be of the form
    strncat(str, new_str_to_add, len - strlen(str) - 1);
since in the OMPI calls len is being used as total number of bytes
in str.

strncat(dest,src,n) on the other hand is documented as writing up to
n chars from the incoming string plus 1 for the null, for n+1 total
bytes it can write.

Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit 30d60994d258f5f0b7c432efd284d1b6b8333faf)

Conflicts:
	opal/mca/hwloc/base/hwloc_base_util.c
2019-03-14 13:08:25 -04:00
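The off-by-one discussed in fcf53becc3 follows from how `strncat()` counts: its third argument is the maximum number of characters to append, and the terminating null byte is written in addition to those. A minimal sketch of a bounded append, where `append_bounded()` is a hypothetical helper and `len` is the total size of the destination buffer:

```c
#include <string.h>

/*
 * Append src to the string already in dest without writing past a buffer
 * of len total bytes.  The space left for new characters is the total
 * size, minus what is already used, minus one byte for the null that
 * strncat() always adds.
 */
static void append_bounded(char *dest, size_t len, const char *src)
{
    size_t used = strlen(dest);

    if (used + 1 >= len) {
        return; /* no room for even one more character */
    }
    strncat(dest, src, len - used - 1);
}
```

Passing the remaining space without the `- 1` would let `strncat()` write one byte past the end of the buffer when the source string is long enough to fill it.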
Sergey Oblomov
bed8141088 COMMON/UCX: rewording of hooks suggestion
- also updated output macro

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit c319cf9adefb69c78a73eb4a83a40dee5b697a53)
2019-03-14 16:48:36 +02:00
Sergey Oblomov
14c271f993 PML/SPML/UCX: added evaluation of mmap events
- a set of UCX-related issues was reported that were caused
  by mmap API hook conflicts. We added diagnostics for such
  problems to simplify debugging

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
(cherry picked from commit d8e3562bae700d84873c1d5ca9c45c846d7387ed)
2019-03-14 16:48:25 +02:00
Aurelien Bouteiller
cf34de33eb Avoid a double lock interlock when calling pmix_finalize
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2019-03-08 15:33:17 -05:00
Jeff Squyres
8c4c982271 btl/usnic: amend Makefile.am fix from b4097626ab
Use $(AM_CPPFLAGS) in $(usnic_btl_run_tests_CPPFLAGS) so that we don't
have to replicate hard-coded values.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 14563770a1d64c465ee1f205c9981de39970bb33)
2019-03-05 09:42:03 -08:00