This commit looks large, but it's really mostly a cleanup step.
1. introduce proper error handling for the return values of fcntl and the fbtl_posix_lock function
2. rename a parameter to more accurately reflect what it does
3. introduce an MCA parameter in the fs/ufs component that lets the user
control the level of locking to enforce
4. move the initialization of the fs_block_size parameter from fs/ufs into the
common/ompio component. An fs component may override this
value, but none of the current fs components does so.
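As an illustration of item 1, here is a minimal sketch of checking the return value of fcntl when taking a byte-range lock. The helper name sketch_lock_range is hypothetical and is not the fbtl_posix_lock function; the retry on EINTR is an assumption about reasonable behavior, not something this commit states.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not the Open MPI function: take a write lock on
 * the byte range [offset, offset + len) and report failure to the caller
 * instead of silently ignoring it. Returns 0 on success, -1 on error. */
static int sketch_lock_range(int fd, off_t offset, off_t len)
{
    struct flock lock;
    int rc;

    assert(offset >= 0);   /* this sketch only handles absolute offsets */

    memset(&lock, 0, sizeof(lock));
    lock.l_type   = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start  = offset;
    lock.l_len    = len;

    /* Retry if a signal interrupts the blocking call (assumed policy). */
    do {
        rc = fcntl(fd, F_SETLKW, &lock);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1) {
        fprintf(stderr, "fcntl(F_SETLKW) failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

A caller would propagate the -1 upward rather than continue with an unlocked file.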
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If a named semaphore is used, it is necessary to close the semaphore to remove
all sm segments. sem_unlink only removes the name reference; the semaphore itself
is destroyed once all processes have closed it.
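The close/unlink distinction can be sketched with the plain POSIX calls. This is not the sharedfp/sm code; the helper name and the do_unlink flag are hypothetical, and the flag stands in for "only one process unlinks".

```c
#include <assert.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical helper: open (or create) a named semaphore, then release it.
 * Every process must call sem_close(); only one process needs sem_unlink().
 * Returns 0 on success, -1 on any failure. */
static int sketch_named_sem_cycle(const char *name, int do_unlink)
{
    int rc = 0;
    sem_t *sem;

    assert(name != NULL);

    sem = sem_open(name, O_CREAT, 0600, 1);
    if (sem == SEM_FAILED) {
        return -1;
    }

    /* ... sem_wait()/sem_post() would protect the shared file pointer ... */

    /* Drop this process's reference to the semaphore. */
    if (sem_close(sem) != 0) {
        rc = -1;
    }
    /* Remove the name; the semaphore goes away after the last close. */
    if (do_unlink && sem_unlink(name) != 0) {
        rc = -1;
    }
    return rc;
}
```

On some systems this needs to be linked with -pthread.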
Fixes issue: #4336
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
sharedfp/sm: unlink only needs to be called by one process
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If hcoll fails to create an MPI derived type, set zero_dte on this dtype.
This saves cycles on subsequent collective calls with the same derived
type, since we will not try to create the hcoll type again.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
The ompi_datatype_get_single_predefined_type_from_args() recurses down
into a constructed type to identify what base datatype it's built from
if it's built from a single type. But if the type has MPI_LB/MPI_UB,
for example
    lens[0] = 1;
    lens[1] = 1;
    disps[0] = 0;
    disps[1] = 0;
    types[0] = MPI_LB;
    types[1] = MPI_INT;
    MPI_Type_create_struct(2, lens, disps, types, &mydt);
then this function will see the base type MPI_LB as differing from MPI_INT
and will identify mydt as not being constructed from a single base type, so
the type will be rejected for calls like MPI_Accumulate.
I think those "meta data" types shouldn't result in rejection like that, and
the above mydt should still be identified as having a single base type
of MPI_INT.
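The intended rule can be sketched without MPI. The enum below is a hypothetical stand-in for the predefined datatypes, and the loop is flat, whereas the real function recurses through constructed types; it only shows the idea of skipping the LB/UB markers when looking for a single base type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI predefined types, not real MPI handles. */
typedef enum { T_NONE = 0, T_LB, T_UB, T_INT, T_DOUBLE } sketch_type_t;

/* Return the single base type the list is built from, ignoring the
 * LB/UB marker types, or T_NONE if the list is empty or mixes types. */
static sketch_type_t sketch_single_base_type(const sketch_type_t *types,
                                             size_t n)
{
    sketch_type_t found = T_NONE;
    size_t i;

    assert(types != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (types[i] == T_LB || types[i] == T_UB) {
            continue;               /* markers carry no data: skip them */
        }
        if (found == T_NONE) {
            found = types[i];       /* first real base type seen */
        } else if (found != types[i]) {
            return T_NONE;          /* two different base types */
        }
    }
    return found;
}
```

With this rule, a list of { T_LB, T_INT } resolves to T_INT, while { T_INT, T_DOUBLE } is still rejected.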
Addition: bosilca wanted another change discussed here
https://github.com/open-mpi/ompi/pull/3609
relating to the calculation for "count" after identifying the
predefined_type that was being used.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
romio314 is just a component that does not require Fortran bindings,
so simply disable Fortran support to prevent warnings about deprecated flags
Fixes open-mpi/ompi#4281
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
If Open MPI is configured with CUDA, the user should also be using a CUDA build of
PSM2 and therefore set the PSM2_CUDA environment variable to 1 when using
CUDA buffers for transfers. If we detect that this setting is missing, we force-set
it. If the user wants to use this build for regular (host buffer) transfers, we
allow setting PSM2_CUDA=0, but print a warning
message to the user that this is not a recommended usage scenario.
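The environment handling described above could look roughly like the following sketch. The helper name and the warning text are hypothetical; only the PSM2_CUDA variable and its values 1 and 0 come from this commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the PSM2 MTL code.
 * Returns 1 if PSM2_CUDA was missing and has been force-set to "1",
 * 0 if the user's own setting was left untouched, -1 if setenv failed. */
static int sketch_ensure_psm2_cuda(void)
{
    const char *val = getenv("PSM2_CUDA");

    if (val == NULL) {
        /* Setting is missing: force it on for the CUDA build. */
        if (setenv("PSM2_CUDA", "1", 0) != 0) {
            return -1;
        }
        assert(getenv("PSM2_CUDA") != NULL);
        return 1;
    }
    if (strcmp(val, "0") == 0) {
        /* Explicit opt-out is allowed, but the user is warned. */
        fprintf(stderr, "warning: PSM2_CUDA=0 with a CUDA build is not "
                        "a recommended usage scenario\n");
    }
    return 0;
}
```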
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Replaced matching array with k and bcast with scatter.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Guillaume Mercier <mercier@labri.fr>
As the reordering is an optional step, if any operation during the
reorder fails we can return a duplicate of the original communicator
associated with the topology information.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This allows mtl_ofi_provider_include to work with layered providers as well.
e.g. --mca mtl_ofi_provider_include "providerX;ofi_rxm"
Signed-off-by: yohann <yohann.burette@intel.com>
The new grouping option simple+ performs all calculations used
for the aggregator selection as if the default file view were used,
thus avoiding communication in file_set_view altogether. This mode
is useful for applications that do not set a file view, but use
explicit offset operations on the default file view.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
Still in the "needs to be done" category:
* mapping/ranking/binding options aren't correctly supported
* if the DVM encounters some errors (e.g., not enough resources for the job), the resulting error is globally set and impacts any subsequent job submission
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Remove two of the three instances of components requiring
64 bit atomics, even on 32 bit systems. The SM OSC component
also uses 64 bit atomics, but is a more complicated fix that
will follow this one. Currently, no one is testing on
platforms that don't provide 64 bit atomics (even in 32 bit
mode), but with the removal of the non-inline assembly for
IA32, the older compilers on Absoft's test systems now
result in no practical way to call cmpxchg8 in 32 bit mode.
At that point, these failures started popping up.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
This commit fixes a compile issue on 32-bit systems that do not
support 64-bit atomic math. The active target path was using 64-bit
atomics exclusively to support PSCW. This commit updates the code to
use either 32 or 64-bit atomic math depending on what is available.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The monitoring code causes MPI_T based tools to segfault when
monitoring is disabled. This happens because the performance
variables remain registered after the common/monitoring
component is dlclosed due to a missing variable registration
flag. This commit adds the necessary flag to all the registered
performance variables.
The issue on GitHub is #4162. Close when applied to master.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
MPI_IN_PLACE is not a valid send buffer for neighborhood collectives, so do not
invoke memchecker in this case.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
PSM2 enables support for GPU buffers and CUDA managed memory, and it can
directly recognize GPU buffers and handle copies between HFIs and GPUs.
Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases.
In this patch, we allow the PSM2 MTL to specify when
it does not require CUDA convertor support. This allows us to skip CUDA
convertor init phases and lets PSM2 handle the memory transfers.
This translates to improvements in latency.
The patch enables blocking collectives and workloads with GPU contiguous
and GPU non-contiguous memory.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
They are supposed to be unsigned; casting them to a signed
value for all atomic operations is as error-prone as handling
them as signed entities.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>