Clarify in README what --with-hwloc does in its different use cases.
Also, ensure that the behavior when specifying `--with-hwloc` is the
same as if that option is not specified at all. This is what we did
in Open MPI <= v3.x; looks like we inadvertently caused `--with-hwloc`
to be synonymous with `--with-hwloc=external` in v4.0.0.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 18c3e1af5ef281e8b502a7ad778256889cc707d1)
syscall() returns a long, but we are invoking shmat(), which returns
a void*.
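A minimal sketch of the conversion, assuming a hypothetical helper name
(syscall_result_to_ptr is illustrative and not taken from the patch):
the long returned by syscall() goes through intptr_t so that the pointer
value, including shmat()'s (void *) -1 error return, is preserved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper; the name is illustrative, not from the patch.
 * syscall() returns a long, but shmat() semantics call for a void *.
 * Converting through intptr_t makes the integer-to-pointer step explicit. */
static void *syscall_result_to_ptr(long result)
{
    return (void *) (intptr_t) result;
}
```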
Signed-off-by: Maxwell Coil <mcoil@nd.edu>
(cherry picked from commit 52a9cce6f3dcd87e9ae66177398b60b9317e9339)
This commit fixes a configure bug that caused flow control to be
disabled regardless of the configure options used.
Signed-off-by: Todd Kordenbrock <thkgcode@gmail.com>
(cherry picked from commit f7e74b6a3d19cd6e3edc3364783311e595f81b3d)
The prior implementation was simply wrong.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 53ebea12aa98ba5440924a8ee914f6707f63608b)
related to #7128
The UCX crew is no longer guaranteeing that the UCT API is going to be frozen,
so this is kind of a whack-a-mole problem trying to keep the BTL UCT working
with various changing UCT APIs.
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
(cherry picked from commit 9d345d9aa000233bec148540b071cecffc94438c)
OpenUCX broke the UCT API again in v1.8. This commit updates
btl/uct to fix compilation with current OpenUCX master
(future v1.8). Further changes will likely be needed for
the final release.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit 526775dfd7ad75c308532784de4fb3ffed25458f)
It was previously accidentally set to 0.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 132e4cab3bc71df0da87368a332d6af0090a6977)
Move the prefix area from the head to the body in relevant size
computations. This fixes a problem in high traffic situations where
usNIC may have sent from unregistered memory.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit fe7f772f21627b01838c007db7cedbbb0ce8b536)
New MCA param: btl_usnic_max_resends_per_iteration. This is the max
number of resends we'll do in a single pass through usNIC component
progress. This prevents progress from getting stuck in an endless
loop of retransmissions (i.e., if more retransmissions are triggered
during the sending of retransmissions). Specifically: we need to
leave the resend loop to allow receives to happen (which may ACK
messages we have sent previously, and therefore cause pending resends
to be moot).
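The bounded resend pass can be sketched roughly as follows. This is only
an illustration under stated assumptions: struct resend_queue and
progress_resends() are hypothetical names, and the "resend" is reduced to
a counter update.

```c
#include <stddef.h>

/* Hypothetical stand-in for the module's retransmission state. */
struct resend_queue {
    size_t pending;   /* frames currently awaiting retransmission */
    size_t sent;      /* total frames retransmitted so far */
};

/* Retransmit at most max_resends frames, then return so the caller can
 * go on to poll for receives (which may carry ACKs that make some of
 * the remaining resends unnecessary).
 * Returns the number of frames retransmitted in this pass. */
static size_t progress_resends(struct resend_queue *q, size_t max_resends)
{
    size_t done = 0;

    while (q->pending > 0 && done < max_resends) {
        q->pending--;   /* stand-in for actually resending one frame */
        q->sent++;
        done++;
    }
    return done;
}
```

With 10 frames pending and a limit of 4, one pass resends 4 frames and
leaves 6 for later passes.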
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 27e3040dfeba00a9a2615a217c164899f0009e59)
Significantly increase the default retrans timeout. If the
retrans timeout is too short, we can end up in a retransmission storm
where the logic will continually re-transmit the same frames during a
single run through the usNIC progress function (because the timer for
a single frame expires before we have run through re-transmitting all
the frames pending re-transmission).
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 3cc95d86b2123f38f392e56adca7ac8a1fef6454)
New MCA parameter: btl_usnic_ack_iteration_delay. Set this to the
number of times through the usNIC component progress function before
sending a standalone ACK (vs. piggy-backing the ACK on any other send
going to the target peer).
Use "ticks" language to clarify that we're really counting the number
of times through the usNIC component DATA_CHANNEL completion check (to
check for incoming messages) -- it has no relation to wall clock time
whatsoever.
Also slightly change the channel-checking scheme in usNIC component
progress: only check the PRIORITY channel once (vs. checking it once,
not finding anything, and then falling through to progress_2() where we
check PRIORITY again and then check the DATA channel).
As before, if our "progress" libevent fires, increment the tick
counter enough to guarantee that all endpoints that need an ACK will
get triggered to send standalone ACKs the next time through progress,
if necessary.
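A rough sketch of the tick-based ACK delay, assuming hypothetical names
throughout (struct endpoint, schedule_ack(), piggyback_ack() and
progress_tick() are illustrative, not the usNIC BTL's actual symbols).
The counter advances once per pass through progress and has no relation
to wall clock time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical endpoint state; field names are illustrative. */
struct endpoint {
    bool     ack_needed;     /* an ACK is owed to this peer */
    uint64_t ack_due_tick;   /* tick at which a standalone ACK is due */
};

/* Counts passes through the completion check, not time. */
static uint64_t ticks = 0;

/* Record that an ACK is owed, due after `delay` more ticks. */
static void schedule_ack(struct endpoint *ep, uint64_t delay)
{
    if (!ep->ack_needed) {
        ep->ack_needed = true;
        ep->ack_due_tick = ticks + delay;
    }
}

/* A regular send to the peer carries the ACK, so none is owed after it. */
static void piggyback_ack(struct endpoint *ep)
{
    ep->ack_needed = false;
}

/* One pass through progress: advance the tick counter by `increment`
 * and report whether a standalone ACK has to be sent now. */
static bool progress_tick(struct endpoint *ep, uint64_t increment)
{
    ticks += increment;
    if (ep->ack_needed && ticks >= ep->ack_due_tick) {
        ep->ack_needed = false;
        return true;
    }
    return false;
}
```

In this sketch, the libevent case corresponds to calling progress_tick()
with an increment equal to the delay, which makes every owed ACK due.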
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 968b1a51b59898877a8c7268d463d3d7d78d86a3)
Rename "get_nsec()" to "get_ticks()" to more accurately reflect that
this function has no correlation to wall clock time at all.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit ce2910a28aea61043b81324c67999f3a47cfe7ac)
Trying to run processes via mpirun in Podman containers has shown
that the CMA btl_vader_single_copy_mechanism does not work when user
namespaces are involved.
Creating containers with Podman requires at least user namespaces to be
able to do unprivileged mounts in a container.
Even when the containers are run with user namespace user ID mappings
that result in the same user ID on the inside and outside of all
involved containers, the kernel check that allows ptrace (and thus
process_vm_{read,write}v()) fails if the same IDs are not in the same
user namespace.
One workaround is to specify '--mca btl_vader_single_copy_mechanism none'
and this commit adds code to automatically skip CMA if user namespaces
are detected and fall back to MCA_BTL_VADER_EMUL.
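One way such a check could look is sketched below. This is an assumption
for illustration, not necessarily how this commit detects user
namespaces: it uses the inode of /proc/self/ns/user as a namespace
identifier, and user_ns_id(), pick_mechanism() and the enum names are
hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Illustrative names; not the BTL's actual constants. */
enum copy_mechanism { MECH_CMA, MECH_EMUL };

/* Return an identifier for the caller's user namespace, or 0 if it
 * cannot be determined (e.g. no /proc or no user namespace support).
 * Assumption: the inode of /proc/self/ns/user identifies the namespace. */
static ino_t user_ns_id(void)
{
    struct stat st;

    if (stat("/proc/self/ns/user", &st) != 0) {
        return 0;
    }
    return st.st_ino;
}

/* CMA is only attempted when both peers report the same namespace id;
 * otherwise fall back to emulation. */
static enum copy_mechanism pick_mechanism(ino_t local_ns, ino_t peer_ns)
{
    return (local_ns == peer_ns) ? MECH_CMA : MECH_EMUL;
}
```

In this sketch two peers that both report 0 (unknown) are treated as
sharing a namespace, which keeps the old behavior on systems without
user namespaces.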
Signed-off-by: Adrian Reber <areber@redhat.com>
(cherry picked from commit fc68d8a90fe86284e9dc730f878b55c0412f01d2)
This commit changes how the single-copy emulation in the vader btl
operates. Before this change the BTL set its put and get limits
based on the max send size. After this change the limits are unset
and the put or get operation is fragmented internally.
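The internal fragmentation can be pictured with the sketch below. It is
a simplified illustration: emulated_put() and its frag_size parameter
are hypothetical, and a plain memcpy() stands in for sending one
fragment to the peer.

```c
#include <stddef.h>
#include <string.h>

/* Copy len bytes in fragments of at most frag_size bytes each.
 * Returns the number of fragments used (0 if frag_size is 0). */
static size_t emulated_put(void *dst, const void *src, size_t len,
                           size_t frag_size)
{
    size_t fragments = 0;
    size_t offset = 0;

    if (0 == frag_size) {
        return 0;
    }
    while (offset < len) {
        size_t chunk = len - offset;
        if (chunk > frag_size) {
            chunk = frag_size;
        }
        /* stand-in for transferring one fragment */
        memcpy((char *) dst + offset, (const char *) src + offset, chunk);
        offset += chunk;
        fragments++;
    }
    return fragments;
}
```

The caller no longer needs to know a put/get limit; a 10-byte transfer
with a 4-byte fragment size simply becomes three fragments.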
References #6568
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit ae91b11de2314ab11a9842d9738cd14f8f1e393b)
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows for contiguous elements to have their
blocklen increased instead of the count. The idea is to always maximize
the blocklen, i.e., the contiguous part of the datatype.
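As a sketch of the idea (struct elem_desc and maximize_blocklen() are
hypothetical and much simpler than the real datatype descriptors): when
the blocks of an element sit back to back in memory, the repetition
count can be folded into the blocklen.

```c
#include <stddef.h>

/* Hypothetical description: `count` blocks, each made of `blocklen`
 * elements of `elem_size` bytes, with `extent` bytes between the
 * starts of consecutive blocks. */
struct elem_desc {
    size_t    count;
    size_t    blocklen;
    size_t    elem_size;
    ptrdiff_t extent;
};

/* If the blocks are back to back (the extent equals the block size in
 * bytes), fold the count into the blocklen so that the element
 * describes a single, larger contiguous block. */
static void maximize_blocklen(struct elem_desc *d)
{
    if (d->count > 1 &&
        d->extent == (ptrdiff_t) (d->blocklen * d->elem_size)) {
        d->blocklen *= d->count;
        d->extent = (ptrdiff_t) (d->blocklen * d->elem_size);
        d->count = 1;
    }
}
```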
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 41e6f55807b01ad5c04e8387a3699cf743931f6a)
Start optimizing the code.
This commit divides the operations in 2 parts, the first, outside the
critical path, deals with partial blocks of predefined elements, and the
second, inside the critical path, only deals with full blocks of
elements. This reduces the number of expensive operations in the
critical path and results in a decent performance increase.
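The split can be illustrated with the simplified sketch below, which is
not the actual pack routine. pack_strided() is a hypothetical function
working on bytes; it assumes blocklen > 0 and skip < blocklen. Partial
blocks at the start and end are handled outside the loop, so the loop
itself only copies full blocks.

```c
#include <stddef.h>
#include <string.h>

/* Pack nbytes bytes from a strided source made of blocks of `blocklen`
 * bytes whose starts are `extent` bytes apart, beginning `skip` bytes
 * into the first block. Assumes blocklen > 0 and skip < blocklen. */
static void pack_strided(char *dst, const char *src, size_t blocklen,
                         size_t extent, size_t skip, size_t nbytes)
{
    /* Outside the critical path: finish a partially consumed block. */
    if (skip > 0 && nbytes > 0) {
        size_t head = blocklen - skip;
        if (head > nbytes) {
            head = nbytes;
        }
        memcpy(dst, src + skip, head);
        dst += head;
        nbytes -= head;
        src += extent;
    }

    /* Critical path: full blocks only, no per-block length checks. */
    for (size_t full = nbytes / blocklen; full > 0; full--) {
        memcpy(dst, src, blocklen);
        dst += blocklen;
        src += extent;
    }

    /* Outside the critical path: trailing partial block, if any. */
    nbytes %= blocklen;
    if (nbytes > 0) {
        memcpy(dst, src, nbytes);
    }
}
```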
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Amazing how bad instruction scheduling can have such a drastic impact
on code performance. With this change, we get a boost of at least
50% in the performance of data with a small blocklen and/or count.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Optimize contiguous loops by collapsing them into a single element.
During datatype optimization collapse similar elements into larger
blocks.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Upon detecting a datatype loop representation skip the entire loop
according to the remaining space.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
- optimize handling of contiguous with gaps datatypes.
- fix a performance issue for all datatypes with a count of 1.
- optimize the pack/unpack of contiguous with gaps datatypes.
- optimize the case of blocklen == 1.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Move toward a base type of vector (count, type, blocklen, extent, disp)
with disp and extent applying toward the count repetition and blocklen
being a contiguous region of memory of type type.
Implement 2 optimizations on this description used during type_commit:
- collapse: successive similar datatype descriptions are collapsed
together with an increased count.
- fusion: fuse successive datatype descriptions in order to minimize the
number of resulting memcpy during pack/unpack.
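The two optimizations can be sketched on a simplified descriptor. This
is an illustration only: struct vec_desc, try_collapse() and try_fuse()
are hypothetical, the type is reduced to its size in bytes, and the
conditions are stricter than what a real implementation may accept.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor mirroring (count, type, blocklen, extent,
 * disp); the type is represented only by its size in bytes. */
struct vec_desc {
    size_t    count;
    size_t    type_size;
    size_t    blocklen;
    ptrdiff_t extent;
    ptrdiff_t disp;
};

static size_t block_bytes(const struct vec_desc *d)
{
    return d->blocklen * d->type_size;
}

/* collapse: b has the same shape as a and starts exactly where a's
 * next repetition would be, so fold it into a with a larger count. */
static bool try_collapse(struct vec_desc *a, const struct vec_desc *b)
{
    if (a->type_size == b->type_size && a->blocklen == b->blocklen &&
        a->extent == b->extent &&
        b->disp == a->disp + (ptrdiff_t) a->count * a->extent) {
        a->count += b->count;
        return true;
    }
    return false;
}

/* fusion: two single blocks that are adjacent in memory become one
 * block of bytes, so pack/unpack needs one memcpy instead of two. */
static bool try_fuse(struct vec_desc *a, const struct vec_desc *b)
{
    if (1 == a->count && 1 == b->count &&
        b->disp == a->disp + (ptrdiff_t) block_bytes(a)) {
        size_t bytes = block_bytes(a) + block_bytes(b);
        a->type_size = 1;
        a->blocklen = bytes;
        a->extent = (ptrdiff_t) bytes;
        return true;
    }
    return false;
}
```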
Fixes at the OMPI datatype level including:
- Fix the create_hindexed and vector creation.
- Fix the handling of [get|set]_elements and _count.
- Correctly compute the displacement for block indexed types.
- Support the MPI_LB and MPI_UB deprecation, aka. OMPI_ENABLE_MPI1_COMPAT.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Commit d7053a3 broke things for the case when Open MPI 4.0.x is built
without UCX support. The problem was that it tried to partially
initialize the btl in order to delay printing a help message until
wireup, which does not work in all cases. Rather than keep piling on
changes to support a help message for a BTL that we are deprecating, take
a keep-it-simple approach.
So, revert most of d7053a3 and instead put the help message back in its
original location: during the scan of ports of the available HCAs, where
we check whether the link layer for each port is configured for Ethernet
or InfiniBand.
If Open MPI was built with UCX support, don't emit the help message; if
UCX was not linked in, emit the help message.
Verified on a system with ConnectX-5 HCAs with two ports configured
for Ethernet and two for InfiniBand.
relates to #6785
Signed-off-by: Howard Pritchard <howardp@lanl.gov>
Per patches from @SteVwonder and @garlick
Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d4070d5f58f0c65aef89eea5910b202b8402e48b)
This commit updates the uct btl to support the v1.6.x release of
UCX. This release breaks API.
Signed-off-by: Nathan Hjelm <hjelmn@cs.unm.edu>
(cherry picked from commit b78066720c3e3299bd76f2e22d2c0e415db572fc)
Signed-off-by: Geoffrey Paulsen <gpaulsen@us.ibm.com>
This is mostly based off recent UCX additions to their patcher:
https://github.com/openucx/ucx/pull/2703
They added triggers for
* mmap when (flags & MAP_FIXED) && (addr != NULL)
* shmat when (shmflg & SHM_REMAP) && (shmaddr != NULL)
Beyond that I noticed they already had a trigger for
* madvise when (advice == MADV_FREE)
that we didn't so I added that.
And the other main thing is we didn't really have shmat/shmdt
active for some systems because we only had a path for
syscall(SYS_shmdt, ) but we needed to also have a path for
syscall(SYS_ipc, IPCOP_shmdt, ) and same for shmat.
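The trigger conditions listed above can be written as small predicates.
This is a sketch, not the patcher code itself: the function names are
hypothetical, and the #ifdef guards are an assumption to keep it
compiling on systems that lack SHM_REMAP or MADV_FREE.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/shm.h>

/* mmap: trigger when a fixed mapping is requested at a non-NULL address. */
static bool mmap_needs_trigger(const void *addr, int flags)
{
    return (flags & MAP_FIXED) && (addr != NULL);
}

/* shmat: trigger when remapping is requested at a non-NULL address. */
static bool shmat_needs_trigger(const void *shmaddr, int shmflg)
{
#ifdef SHM_REMAP
    return (shmflg & SHM_REMAP) && (shmaddr != NULL);
#else
    (void) shmaddr;
    (void) shmflg;
    return false;   /* SHM_REMAP is not available on this system */
#endif
}

/* madvise: trigger when the advice is MADV_FREE. */
static bool madvise_needs_trigger(int advice)
{
#ifdef MADV_FREE
    return advice == MADV_FREE;
#else
    (void) advice;
    return false;   /* MADV_FREE is not available on this system */
#endif
}
```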
Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit eb888118e83f56c131aff900b03eab34c92b7805)