The call of MPI_Allgather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault.
The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard, sendtype and sendcount parameters should be ignored in this case.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
instead of invoking ompi_request_test_all(), that will end up
calling opal_progress() recursively, manually check the status
of the requests.
the same method is used in ompi_comm_request_progress()
Refs open-mpi/ompi#3901
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Move memory hooks init (for request based operation) in osc ucx to window
creation time, to avoid performance issue in MPI initialization.
Signed-off-by: Xin Zhao <xinz@mellanox.com>
This commit fixes a crash that occurs when using btl/vader as an RDMA
btl. This btl supports using CPU atomics and does not support using
the btl for self communication so we must use the local memory
optimizations in osc/rdma.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Data transferred by `MPI_BSEND` may corrupt if all of the following
conditions are met.
- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
returns.
The problem is in the way of pending a send request in ob1 PML.
The `mca_pml_ob1_send_request_start_copy` function retruns
`OMPI_ERR_OUT_OF_RESOURCE` if `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately. Next time
the `mca_pml_ob1_send_request_start_copy` function tries sending,
the user buffer may have been overwritten by the MPI program.
Call hierarchy of `MPI_BSEND`:
```
MPI_Bsend
mca_pml_ob1_send
if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
mca_pml_ob1_isend
MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
mca_pml_ob1_send_request_start_seq
mca_pml_ob1_send_request_start_btl
if (size <= eager_limit)
if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
mca_pml_ob1_send_request_start_copy
mca_bml_base_alloc
btl_alloc
if (OMPI_ERR_OUT_OF_RESOURCE == rc)
add_request_to_send_pending
ompi_request_free
```
To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.
This problem was introduced by ob1 optimization (commits 2b57f422
and a06e491c) in v1.8 series.
Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
This commit fixes an issue identified by MTT where we can have two
different sets of processes on the same node creating a shared memory
window with communicators sharing the same CID. To avoid this issue
the temporary filename now includes the creating processes vpid.
References #5363
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Use internal pack/unpack subroutines that operate on MPI_Aint instead of int
and hence solve some integer overflows.
Thanks Clyde Stanfield for reporting this issue.
Refs open-mpi/ompi#5383
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Current implementation of `coll/base/MPI_Scatter` is based on in-order binomial tree. This tree is right skewed and it provides good performance for a MPI_Gather operation. But for a MPI_Scatter operation left skewed binomial tree is effective.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
-Updated the design for sync send MPI calls to use 2 protocol bits for
denoting "sync_send" or "sync_send_ack".
-"Sync_send" is added to the send tag only and is masked out in receives
such that it can be read by the original Recv posted in the send/recv
operation.
-"Sync_send_ack" is sent from the recv callback to the send side. This 0
byte send does not generate a completion entry and instead sends the
message and immediately completes the opal completion in the recv.
-Tag formats ofi_tag_1 and ofi_tag_2 have been updated to include 2
more tag bits per format type due to the reduced protocal bits required
by OMPI.
Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
The call of MPI_Gather with sendbuf and sendtype parameters equal to MPI_IN_PLACE and NULL correspondingly, produces the segmentation fault in the root process.
The problem is that sendtype is used even when sendbuf value is MPI_IN_PLACE. But according to the standard (page 150, line 37), sendtype and sendcount parameters should be ignored in this case.
Signed-off-by: Mikhail Kurnosov <mkurnosov@gmail.com>
- added common logging infrastructure for all
UCX modules
- all UCX modules are switched to new infra
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
- some common functionality of del_procs calls is moved into
mca_common module
- blocking ucp_put call is replaced by non-blocking routine
Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
moving some code from fs/ufs into fs/base. The benefit of this approach is
that fs components that are fundamentally based on posix I/O (and only differ
in some non-posix functionality such as setting stripe size, or which hints
are being supported) can avoid having to replicate the same code over and
over again. First beneficiary is the lustre fs component, but more
are to follow soon.
Signed-off-by: Edgar Gabriel <egabriel@central.uh.edu>
This commit fixes a typo where a bcast is used instead of the intended
collective (barrier).
References #5262
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
The osc/rdma module did not wait for all pending atomics to complete
before tearing down. This could lead to weird issues as the target
location may no longer be registered or allocated.
This commit also fixes an offset calculation issue in
ompi_osc_get_data_blocking ().
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>