The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process.
The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
moving some code from fs/ufs into fs/base. The benefit of this approach is
that fs components that are fundamentally based on posix I/O (and only differ
in some non-posix functionality such as setting stripe size, or which hints
are being supported) can avoid having to replicate the same code over and
over again. First beneficiary is the lustre fs component, but more
are to follow soon.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This commit fixes a typo where a bcast is used instead of the intended
collective (barrier).
References #5262
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The osc/rdma module did not wait for all pending atomics to complete
before tearing down. This could lead to weird issues as the target
location may no longer be registered or allocated.
This commit also fixes an offset calculation issue in
ompi_osc_get_data_blocking ().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
a bug sneaked into constructing the list of aggregators
processes when using the fileview based grouping options
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
handle the situation where the user requests a non-zero amount
of data but has a zero-size fileview. My instrinct would have been
to return an error code, but according to the test that I used
it should be MPI_SUCCESS and zero bytes. It is definitely better
than segfaulting :-)
THis makes another test from the IBM testsuite pass.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the seek offset calculation did not treat the offset as a multiple
of the etype provided. Fixing this makes some more ibm tests pass.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
use an allocator to manage temporary buffers when copying
unmanaged data from GPU buffer to host. This is necessary,
since the buffers have to be pinned for better performance,
which is an expensive operation.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
the two_phase compoment does not work with some collective I/O
operations on CUDA buffers due to the data sieving (i.e.
both read and write operations) executed on some buffers, which are
not anticipated in the GPU buffer management of the code.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
this commit adds the initial support for cuda buffers in ompio, for blocking
and non-blocking individual read and write operations.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
two minor updates:
- in all components: use the fh->f_bytes_per_agg value
(which might have been set by an info object) instead
of re-reading the mca parameter
- vulcan and dynamic_gen2: replace one allgather operation
by an allreduce, since it is used to determine the sum
of an array.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>