set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code.
Refs. open-mpi/ompi#4443
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
This is a bug fix based on a problem reported on the mailing list.
For very large read/write operations, ompio breaks the operation
down into multiple cycles. The problem was that
one of the variables required to maintain its values
across the different cycles did not do that, and because
of that the calculations of the memory offsets was wrong.
Fixes issue #4453
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the fs/lustre component has missed out on a number of updates to the fs/ufs component.
This commit tries to import all the changes performed on the fs/ufs component
w.r.t to the file_open operation, including updates on how the amode is set,
error is propegated and setting the fs_block_size value (which is required for
locking purposes).
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
set proper error codes in mca_fs_ufs_file_open by mapping the errno value to
the MPI error code. Fixes an issue reported on the mailing by Wei-keng Liao
Fixes Issue #4443
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This commit renames the atomic compare-and-swap functions to indicate
the return value. This is in preperation for adding support for a
compare-and-swap that returns the old value. At the same time the
return type has been changed to bool.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The problem is that the waiting thread is cycling using OMPI_LAZY_WAIT_FOR_COMPLETION so it can exercise opal_progress. This probably isn't as critical for the modex step, but definitely necessary for the barrier at the end of mpi_init. The problem this creates is that the lazy macro exits as soon as "active" becomes false, and then we destruct the lock.
However, wakeup_thread sets "active" to false - and then calls the condition broadcast to wakeup any waiting threads. So there is a race condition between that broadcast and the lock destruct.
Add OPAL_ACQUIRE_OBJECT and OPAL_POST_OBJECT memory barriers to help protect against thread race conditions on some platforms
Signed-off-by: Ralph Castain <rhc@open-mpi.org>
Currently, the progress function is incorrectly interpreting any error
value other than a positive value or -FI_EAVAIL to mean CQ is empty.
CQ is empty only if fi_cq_read() call returned -EAGAIN error
code. Fix that here.
While at it, fix help text output for calls made to OFI API.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
The messages should be printed only in the event of CUDA builds and in the
presence of supporting hardware and when PSM2 MTL has actually been selected
for use. To this end, move help text output to component init phase.
Also use opal_setenv/unsetenv() for safer setting, unsetting of the environment
variable and sanitize the help text message.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
This commit looks large, but its really mostly a cleanup step.
1. introduce proper error handling for the return values of fcntl and the fbtl_posix_lock function
2. rename a parameter to more accurately reflect what it does
3. introduce an mca parameter in the fs/ufs component that allows to control
what the level of locking the user would like to enforce
4. move the initialization of the fs_block_size parameter from fs/ufs into the
common/ompio component. An fs component might be allowed to overwrite this
value, but none of the actual fs components do that.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
in case a named semaphore is used, it is necessary to close the semaphore to remove
all sm segments. sem_unlink just removes the name references once all proceeses have closed
the sem.
Fixes issue: #4336
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
sharedfp/sm: unlink only needs to be called by one process
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
If hcoll fails to create mpi derived type let's set zero_dte on this dtype.
This will save cycles on subsequent collective calls with the same derived
type since we will not try to create hcoll type again.
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
The ompi_datatype_get_single_predefined_type_from_args() recurses down
into a constructed type to identify what base datatype it's built from
if it's built from a single type. But if the type has MPI_LB/MPI_UB,
for example
lens[0] = 1;
lens[1] = 1;
disps[0] = 0;
disps[1] = 0;
types[0] = MPI_LB;
types[1] = MPI_INT;
MPI_Type_create_struct(2, lens, disps, types, &mydt);
then this function will see the base type MPI_LB as differing from MPI_INT
and will identify mydt as not being constructed from a single base type, so
the type will be rejected for calls like MPI_Accumulate.
I think those "meta data" types shouldn't result in rejection like that, and
the above mydt should still be identified as having a single base type
of MPI_INT.
Addition: boslica wanted another change discussed here
https://github.com/open-mpi/ompi/pull/3609
relating to the calculation for "count" after identifying the
predefined_type that was being used.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
romio314 is a just a component that does not require Fortran bindings,
so simply disable Fortran support to prevent warnings about deprecated flags
Fixesopen-mpi/ompi#4281
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
If Open MPI is configured with CUDA, then user also should be using a CUDA build of
PSM2 and therefore be setting PSM2_CUDA environment variable to 1 while using
CUDA buffers for transfers. If we detect this setting to be missing, force set
it. If user wants to use this build for regular (Host buffer) transfers, we
allow the option of setting PSM2_CUDA=0, but print a warning
message to user that it is not a recommended usage scenario.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
Replaced matching array with k and bcast with scatter.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Guillaume Mercier <mercier@labri.fr>
As the reordering is an optional step, if any operation during the
reorder fails we can return the duplicata of the original communicator
associated with the topology information.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>