Still in the "needs to be done" category:
* mapping/ranking/binding options aren't correctly supported
* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Cisco wrote a bipartite graph solver to properly solve
interface pair selection for usNIC. Using the reachable
framework, the TCP BTL (and possibly the runtime network
code) can use the graph solver to make better pair
selections. Jeff was happy to have the code more broadly
used, but didn't have time to do the move, hence this
commit.
There are a couple of minor changes to the code compared
to the usNIC version. Obviously, the functions have
been renamed to match the naming convention of their new
home. Since it's easier to write unit tests for
util/ code, the unit tests have been made first-class
tests run at "make check" time. This last bit required
moving some of the definitions into a new header,
bipartite_graph_internal.h, so that they could be
included in both the library code and the test code.
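To illustrate the kind of problem such a solver addresses, here is a minimal
sketch of unweighted maximum bipartite matching (Kuhn's augmenting-path
algorithm) over a local/remote reachability matrix. It is not the Cisco
solver, whose algorithm this commit does not describe; `max_pairs`,
`try_assign`, and `MAX_IFS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IFS 8 /* hypothetical upper bound on interfaces per side */

/* Look for an augmenting path that lets local interface l get a partner. */
static bool try_assign(bool reach[][MAX_IFS], int l, int n_remote,
                       bool visited[], int match_remote[])
{
    for (int r = 0; r < n_remote; ++r) {
        if (!reach[l][r] || visited[r]) {
            continue;
        }
        visited[r] = true;
        /* Take r if it is free, or if its current partner can move elsewhere. */
        if (match_remote[r] < 0 ||
            try_assign(reach, match_remote[r], n_remote, visited, match_remote)) {
            match_remote[r] = l;
            return true;
        }
    }
    return false;
}

/* reach[l][r] is true when local interface l can reach remote interface r.
 * On return, match_remote[r] holds the local interface paired with r, or -1.
 * Returns the number of pairs in a maximum matching. */
static int max_pairs(bool reach[][MAX_IFS], int n_local, int n_remote,
                     int match_remote[])
{
    assert(n_local <= MAX_IFS && n_remote <= MAX_IFS);
    int count = 0;
    for (int r = 0; r < n_remote; ++r) {
        match_remote[r] = -1;
    }
    for (int l = 0; l < n_local; ++l) {
        bool visited[MAX_IFS] = { false };
        if (try_assign(reach, l, n_remote, visited, match_remote)) {
            ++count;
        }
    }
    return count;
}
```

If local 0 reaches remotes 0 and 1 but local 1 reaches only remote 0, a greedy
pass would pair local 0 with remote 0 and strand local 1; the augmenting path
moves local 0 to remote 1 so both locals get a partner.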
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This commit adds the code necessary to support forming connections across
subnets. The primary changes are to 1) add the gid to the modex, and 2)
use the gid to create the address handle.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
Unlike "orterun", "prun" is a PMIx-only program that discovers the DVM connection instead of requiring that we explicitly provide it. Only build "prun" if PMIx v2.x is available.
This gets the DVM working again, but it still shows problems for multiple executions. I'll detail those in a separate issue. Thus, the DVM should still be considered "broken".
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
to choose the first available non-socket provider.
modified: orte/mca/rml/ofi/rml_ofi_component.c
modified: orte/mca/rml/ofi/rml_ofi_send.c
Signed-off-by: Anandhi Jayakumar <anandhi.s.jayakumar@intel.com>
Remove two of the three instances of components requiring
64 bit atomics, even on 32 bit systems. The SM OSC component
also uses 64 bit atomics, but is a more complicated fix that
will follow this one. Currently, no one is testing on
platforms that don't provide 64 bit atomics (even in 32 bit
mode), but with the removal of the non-inline assembly for
IA32, the older compilers on Absoft's test systems no longer
offer a practical way to call cmpxchg8 in 32 bit mode.
At that point, these failures started popping up.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This commit fixes a compile issue on 32-bit systems that do not
support 64-bit atomic math. The active target path was using 64-bit
atomics exclusively to support PSCW. This commit updates the code to
use either 32-bit or 64-bit atomic math depending on what is available.
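Choosing the counter width at compile time can be sketched with C11
`<stdatomic.h>`: use a 64-bit-capable type when the platform reports it as
always lock-free, otherwise fall back to `int`. This is an illustration, not
Open MPI's own atomics layer; `pscw_counter_t`, `pscw_value_t`, and
`pscw_counter_add` are hypothetical names.

```c
#include <stdatomic.h>

/* ATOMIC_LLONG_LOCK_FREE == 2 means long long atomics are always lock-free;
 * otherwise use the 32-bit path. The type names are hypothetical. */
#if ATOMIC_LLONG_LOCK_FREE == 2
typedef atomic_llong pscw_counter_t;
typedef long long pscw_value_t;
#else
typedef atomic_int pscw_counter_t;
typedef int pscw_value_t;
#endif

/* Atomically add delta to *c and return the value it held before the add. */
static pscw_value_t pscw_counter_add(pscw_counter_t *c, pscw_value_t delta)
{
    return atomic_fetch_add(c, delta);
}
```

Callers only see `pscw_value_t`, so the same code compiles on both kinds of
platform; the trade-off is a smaller counter range on the 32-bit path.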
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* Even if we are only launching one app context, we might call spawn
later and the remote groups might want their global rank information.
Signed-off-by: Joshua Hursey <jhursey@us.ibm.com>
The recent changes to remove non-inline atomics have caused
a cascade of issues with cmpset_64 on IA32. cmpxchg8 requires
the use of a bunch of registers (2 for every operand, 3 operands),
and one of them is ebx, which the compiler uses as the GOT
pointer in shared library (PIC) code. Some compilers don't deal well with
ebx being clobbered (I'm looking at you, gcc 4.1). Rather than
continue trying to fight, remove cmpset_64 from the supported
atomic operations on IA32. Other 32 bit platforms (MIPS32,
SPARC32, ARM, etc.) already don't support a 64 bit compare-and-
swap, so while this might slightly reduce performance, it will
at least be correct.
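One generic way code can stay correct without a native 64-bit
compare-and-swap is to serialize the update behind a lock. A minimal POSIX
threads sketch; `cas64_locked` and the single global lock are hypothetical
and do not describe how Open MPI's callers handle the missing operation.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* One lock guards every emulated 64-bit compare-and-swap. */
static pthread_mutex_t cas64_lock = PTHREAD_MUTEX_INITIALIZER;

/* If *addr equals expected, store desired and return true; otherwise leave
 * *addr alone and return false. Only correct if every access to *addr
 * goes through this same lock. */
static bool cas64_locked(int64_t *addr, int64_t expected, int64_t desired)
{
    bool swapped = false;

    pthread_mutex_lock(&cas64_lock);
    if (*addr == expected) {
        *addr = desired;
        swapped = true;
    }
    pthread_mutex_unlock(&cas64_lock);
    return swapped;
}
```

The lock costs more than a single instruction, which matches the trade-off
above: somewhat slower, but correct.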
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
Both the C++ and Vampir notes appear in release branch notes
already, so remove from the "not on release branch" section.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
The monitoring code causes MPI_T based tools to segfault when
monitoring is disabled. This happens because the performance
variables remain registered after the common/monitoring
component is dlclosed due to a missing variable registration
flag. This commit adds the necessary flag to all the registered
performance variables.
The issue on GitHub is #4162. Close when applied to master.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This test used to have fixed-sized arrays for the mounts that it was
checking. However, we periodically run across machines with more
mounts than can fit into those fixed-size arrays. Rather than
periodically increasing the size of those arrays (after re-discovering
that the error is due to fixed-size arrays), just count how many
entries there are and make arrays that are big enough.
Additionally, add a check to ensure that we don't go over the max size
of the array when reading/filling them.
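The count-then-allocate approach, with the bound check on the fill pass, can
be sketched as follows. It reads generic newline-separated entries (as in
/proc/mounts on Linux); `count_entries` and `read_entries` are hypothetical
helpers, not the test's actual code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* First pass: count lines in fp, then rewind for the second pass. */
static size_t count_entries(FILE *fp)
{
    size_t n = 0;
    int ch, prev = '\n';

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == '\n') {
            ++n;
        }
        prev = ch;
    }
    if (prev != '\n') {
        ++n; /* last line had no trailing newline */
    }
    rewind(fp);
    return n;
}

/* Second pass: copy each line into an array sized by the first pass.
 * Sets *out_n to the number of entries stored; the caller frees them. */
static char **read_entries(FILE *fp, size_t *out_n)
{
    size_t max = count_entries(fp);
    char **entries = calloc(max ? max : 1, sizeof(*entries));
    char line[4096];
    size_t n = 0;

    if (entries == NULL) {
        return NULL;
    }
    /* n < max keeps us inside the array even if the file grew between
     * passes or a long line was split across two fgets() calls. */
    while (n < max && fgets(line, sizeof(line), fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        entries[n] = malloc(strlen(line) + 1);
        if (entries[n] == NULL) {
            break;
        }
        strcpy(entries[n], line);
        ++n;
    }
    *out_n = n;
    return entries;
}
```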
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Before this fix, mca_spml_ucx_component_open was using
oshmem_num_procs() to set the value of params.estimated_num_eps for UCX.
The oshmem_num_procs() function uses oshmem_group_all, which is only
initialized after the call to mca_spml_ucx_component_open and therefore
cannot be used there.
Signed-off-by: Alina Sklarevich <alinas@mellanox.com>
MPI_IN_PLACE is not a valid send buffer for neighborhood collectives, so do not
invoke memchecker in this case.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
PSM2 supports GPU buffers and CUDA managed memory: it can directly
recognize GPU buffers and handle copies between HFIs and GPUs.
Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases.
In this patch, we allow the PSM2 MTL to specify when
it does not require CUDA convertor support. This allows us to skip CUDA
convertor init phases and lets PSM2 handle the memory transfers.
This translates to improvements in latency.
The patch enables blocking collectives and workloads with both contiguous
and non-contiguous GPU memory.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Some OSes have hardcoded limits to prevent overflowing an int32_t.
We can either detect this at configure time (which might be a nicer but
incomplete solution), or always force the pipelined protocol over TCP.
As it only covers data larger than 1GB, no performance penalty is to be
expected.
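The idea of never asking one system call to move more than a bounded amount
can be sketched for a plain file descriptor as below, assuming a
caller-chosen `max_chunk` (for example 1 GiB, matching the threshold above).
`write_chunked` is hypothetical; the TCP BTL's pipelined protocol fragments
at the message level rather than through a helper like this.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes from buf to fd, passing at most max_chunk bytes to each
 * write() call and resuming after short writes or EINTR.
 * Returns 0 on success, -1 on error with errno set. */
static int write_chunked(int fd, const void *buf, size_t len, size_t max_chunk)
{
    const char *p = buf;

    if (max_chunk == 0) {
        errno = EINVAL;
        return -1;
    }
    while (len > 0) {
        size_t want = len < max_chunk ? len : max_chunk;
        ssize_t rc = write(fd, p, want);
        if (rc < 0) {
            if (errno == EINTR) {
                continue; /* interrupted before writing; retry */
            }
            return -1;
        }
        p += rc;
        len -= (size_t)rc;
    }
    return 0;
}
```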
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
They are supposed to be unsigned; casting them to a signed
value for all atomic operations is as error-prone as handling
them as signed entities.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
As writev and readv support a sum larger than a uint32_t,
this version will work. For the other OSes a different patch
is required. This patch is a slight modification of the one
proposed by @ggouaillardet.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This reverts commit b5ea5e0994.
This commit reverts a change that is hopefully not necessary. If this
is the case, this will fix #4146.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>