This commit changes how the single-copy emulation in the vader btl
operates. Before this change the BTL set its put and get limits
based on the max send size. After this change the limits are unset
and the put or get operation is fragmented internally.
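
The internal fragmentation described above can be sketched roughly as follows. This is a minimal illustration, not the vader code: `fragmented_put`, `emulated_fragment_copy`, and the `max_fragment` parameter are hypothetical names, and `memcpy()` stands in for whatever the BTL actually does per fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one emulated single-copy transfer; a real BTL
 * would hand the fragment to its send path instead of calling memcpy(). */
static void emulated_fragment_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Copy `size` bytes in pieces of at most `max_fragment` bytes, so the caller
 * never has to respect a put/get size limit. Returns the fragment count. */
static size_t fragmented_put(void *dst, const void *src, size_t size,
                             size_t max_fragment)
{
    assert(max_fragment > 0);

    size_t fragments = 0;
    size_t offset = 0;
    while (offset < size) {
        size_t len = size - offset;
        if (len > max_fragment) {
            len = max_fragment;   /* clamp to the per-fragment limit */
        }
        emulated_fragment_copy((char *) dst + offset,
                               (const char *) src + offset, len);
        offset += len;
        ++fragments;
    }
    return fragments;
}
```

With this shape the caller issues one logical put or get of any size, and the limit only affects how many fragments are used.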
References #6568
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
(cherry picked from commit ae91b11de2)
If both types of interfaces are enabled, don't error out if one of them
isn't able to open listener sockets. Only one interface family may be
available on some machines, but someone might want to build the code to
run more generally.
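
A minimal sketch of the intended behavior, assuming each interface family has its own "open listener" routine. All names here (`open_listener_fn`, `open_listeners`, and the two stubs) are hypothetical and only illustrate the "fail only if every enabled family fails" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns 0 on success, nonzero on failure. */
typedef int (*open_listener_fn)(void);

/* Example stubs standing in for per-family listener setup. */
static int listener_ok(void)   { return 0; }
static int listener_fail(void) { return -1; }

/* Try every enabled family (a NULL pointer means "not enabled").
 * Succeed if at least one family opened a listener. */
static int open_listeners(open_listener_fn open_v4, open_listener_fn open_v6)
{
    assert(open_v4 != NULL || open_v6 != NULL);

    bool any_ok = false;
    if (open_v4 != NULL && open_v4() == 0) {
        any_ok = true;
    }
    if (open_v6 != NULL && open_v6() == 0) {
        any_ok = true;
    }
    return any_ok ? 0 : -1;
}
```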
Refs https://github.com/pmix/prrte/pull/249
Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit 06d188ebf3)
This patch fixes the merge of contiguous elements into larger but more
compact datatypes, and allows contiguous elements to have their
blocklen increased instead of the count. The idea is to always maximize
the blocklen, i.e. the contiguous part of the datatype.
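
To illustrate the idea of growing the blocklen rather than the count, here is a small sketch. The `elem_desc_t` layout and both helper functions are assumptions made for this example and do not mirror the actual datatype engine structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element description: `count` blocks of `blocklen` basic
 * elements, consecutive blocks `extent` bytes apart, starting at `disp`. */
typedef struct {
    size_t    count;
    size_t    blocklen;
    ptrdiff_t extent;
    ptrdiff_t disp;
} elem_desc_t;

/* If the blocks touch each other (extent equals the block size in bytes),
 * fold the count into the blocklen so the element is one long block. */
static void maximize_blocklen(elem_desc_t *e, size_t elem_size)
{
    assert(e != NULL && elem_size > 0);
    if (e->count > 1 && e->extent == (ptrdiff_t) (e->blocklen * elem_size)) {
        e->blocklen *= e->count;
        e->count = 1;
        e->extent = (ptrdiff_t) (e->blocklen * elem_size);
    }
}

/* Merge `b` into `a` when both are single blocks and `b` starts exactly
 * where `a` ends. Returns true if the merge happened. */
static bool merge_contiguous(elem_desc_t *a, const elem_desc_t *b,
                             size_t elem_size)
{
    if (a->count != 1 || b->count != 1) {
        return false;
    }
    if (b->disp != a->disp + (ptrdiff_t) (a->blocklen * elem_size)) {
        return false;
    }
    a->blocklen += b->blocklen;
    a->extent = (ptrdiff_t) (a->blocklen * elem_size);
    return true;
}
```

For example, four touching blocks of two 8-byte elements become one block of eight elements, which can then absorb a three-element block that starts at byte 64.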
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 41e6f55807)
* Fix #6618
- See comments on Issue #6618 for finer details.
* The `plm/rsh` component uses the highest priority `routed` component
to construct the launch tree. The remote orteds activate all
available `routed` components when updating routes. As a result, the
parent vpid on the remote `orted` may not match the one that was
expected in the tree launch. The remote orted then tries to contact
its parent with the wrong contact information, and orted wireup
fails.
* This fix forces the orteds to use the same `routed` component as
the HNP used when constructing the tree, if tree launch is enabled.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
These variables were renamed in
904276bb44caec207638247f23139bc21bc6a09e; update them to use the new
names.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 2ab8109be1)
open-mpi/ompi@0fe756d416 introduced
a bug in the coll/hcoll component. The ompi_requests allocated by
libhcoll would be treated as coll_base_nbc_request during the
ompi_coll_base_retain_<> call. Afterwards this would lead to a
segv in the request cleanup.
Fix: since the libhcoll interface does not distinguish between
blocking and non-blocking requests, use coll_base_nbc_request all the
time and initialize it properly in
coll/hcoll/get_coll_handle(). It is still within 2 cache lines.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
https://github.com/open-mpi/ompi/pull/6895 fixed the code in orterun.c
to allow running as root if both OMPI_ALLOW_RUN_AS_ROOT and
OMPI_ALLOW_RUN_AS_ROOT_CONFIRM env vars are set. However, this
env-var-checking code already exists in
orte_submit.c:orte_submit_init() -- it looks like the
geteuid()/getenv()-checking code here in orterun is now duplicate
code.
So let's just get rid of the duplicate code.
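
For reference, the kind of check being de-duplicated looks roughly like the sketch below. It only tests that both variables are set, as described above; the real check in orte_submit_init() may also inspect their values, and the helper names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: true if the named environment variable exists. */
static bool env_is_set(const char *name)
{
    assert(name != NULL);
    return getenv(name) != NULL;
}

/* Running as root is allowed only when BOTH variables are present.
 * Assumption: presence alone is checked here, not the value. */
static bool run_as_root_allowed(void)
{
    return env_is_set("OMPI_ALLOW_RUN_AS_ROOT") &&
           env_is_set("OMPI_ALLOW_RUN_AS_ROOT_CONFIRM");
}
```

Keeping this logic in one place means the two launch paths cannot drift apart again.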
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit 197beb30d5)
I found that I needed to apply the same change as #5597 to orterun.c for the environment variables to work correctly.
Signed-off-by: Simon Byrne <simonbyrne@gmail.com>
(cherry picked from commit 9c8671c48b)
Update PMIx to latest master to get supporting updates. For
connect/accept (part of comm_spawn as well), lookup locality for all
participating procs on the node and compute the relative locality so it
can be used for MPI operations.
Signed-off-by: Ralph Castain <rhc@pmix.org>
(cherry picked from commit d202e10c14)
To avoid fully initializing the osc/ucx component for MPI applications
that do not use one-sided functionality, the initialization happens
at the first MPI window creation.
This commit ensures atomicity of global state modifications.
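
One common way to make such deferred initialization safe is to guard the global state with a lock, as in the sketch below. This is an illustration under assumptions, not the osc/ucx code: the mutex, the flag, `expensive_component_init()`, and `window_create()` are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static bool component_initialized = false;   /* global state, lock-protected */
static int init_calls = 0;                   /* counts full initializations */
static int next_window_id = 0;

/* Hypothetical stand-in for the costly one-time component setup. */
static int expensive_component_init(void)
{
    ++init_calls;
    return 0;
}

/* Create a window; the first caller also performs the full initialization.
 * All reads and writes of the global state happen under init_lock. */
static int window_create(int *window_id)
{
    assert(window_id != NULL);

    int rc = 0;
    pthread_mutex_lock(&init_lock);
    if (!component_initialized) {
        rc = expensive_component_init();
        if (rc == 0) {
            component_initialized = true;
        }
    }
    if (rc == 0) {
        *window_id = next_window_id++;
    }
    pthread_mutex_unlock(&init_lock);
    return rc;
}
```

Applications that never create a window never pay for the initialization, and concurrent first calls cannot both run it.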
ported from: 6678ac0f55
Signed-off-by: Artem Polyakov <artpol84@gmail.com>
Fix alignment, and fix the error path.
A non-blocking collective might return ompi_request_null, so we should not
retain anything in that case.
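
The guard can be pictured as follows. This is a simplified sketch with hypothetical types (`object_t`, `request_t`, `request_null_sentinel`); it only shows why retaining must be skipped when no real request exists to release the object later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted object (e.g. a datatype or an op). */
typedef struct {
    int refcount;
} object_t;

/* Hypothetical request that releases `retained` when it completes. */
typedef struct {
    object_t *retained;
} request_t;

/* Sentinel playing the role of the "null request" in this sketch. */
static request_t request_null_sentinel;

/* Retain `obj` for the lifetime of `req`, unless `req` is the null
 * sentinel: in that case no completion callback will ever release it. */
static void retain_for_request(request_t *req, object_t *obj)
{
    assert(req != NULL && obj != NULL);
    if (req == &request_null_sentinel) {
        return;
    }
    ++obj->refcount;
    req->retained = obj;
}
```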
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit open-mpi/ompi@63d3ccde9d)
Since ompi_coll_base_nbc_request_t is to be used in an
opal_free_list_t, it must be returned to the list in a "clean" state.
So clean up some data in the completion callback subroutines.
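
A rough sketch of the "return it clean" rule, using a hypothetical request layout (`nbc_request_t` and its fields are invented for illustration and are not the real ompi_coll_base_nbc_request_t):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request that remembers objects it retained. */
typedef struct {
    void *retained_op;
    void *retained_types[2];
} nbc_request_t;

/* Completion callback: drop every stale pointer so the next user who pulls
 * this item off the free list does not see leftovers from the last use. */
static void nbc_request_complete(nbc_request_t *req)
{
    assert(req != NULL);
    req->retained_op = NULL;
    req->retained_types[0] = NULL;
    req->retained_types[1] = NULL;
}
```

Without the reset, a recycled request could try to release objects that belonged to an earlier operation.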
This fixes a regression introduced in open-mpi/ompi@0fe756d416
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
(cherry picked from commit open-mpi/ompi@0862c409f1)