0021616984
Data transferred by `MPI_BSEND` may be corrupted if all of the following conditions are met.

- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL` for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND` returns.

The problem lies in how the ob1 PML defers a pending send request. The `mca_pml_ob1_send_request_start_copy` function returns `OMPI_ERR_OUT_OF_RESOURCE` if the `mca_bml_base_alloc` function returns `des = NULL`. In this case, the send request is added to the `send_pending` list and `MPI_BSEND` returns immediately. By the next time the `mca_pml_ob1_send_request_start_copy` function tries to send, the user buffer may have been overwritten by the MPI program.

Call hierarchy of `MPI_BSEND`:

```
MPI_Bsend
  mca_pml_ob1_send
    if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
      mca_pml_ob1_isend
        MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
          mca_pml_ob1_send_request_start_seq
            mca_pml_ob1_send_request_start_btl
              if (size <= eager_limit)
                if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
                  mca_pml_ob1_send_request_start_copy
                    mca_bml_base_alloc
                      btl_alloc
            if (OMPI_ERR_OUT_OF_RESOURCE == rc)
              add_request_to_send_pending
      ompi_request_free
```

To solve this problem, the data should be saved to the buffer attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.

This problem was introduced by ob1 optimization (commits
||
---|---|---|
configure.m4 | ||
help-mpi-pml-ob1.txt | ||
Makefile.am | ||
owner.txt | ||
pml_ob1_comm.c | ||
pml_ob1_comm.h | ||
pml_ob1_component.c | ||
pml_ob1_component.h | ||
pml_ob1_cuda.c | ||
pml_ob1_hdr.h | ||
pml_ob1_iprobe.c | ||
pml_ob1_irecv.c | ||
pml_ob1_isend.c | ||
pml_ob1_progress.c | ||
pml_ob1_rdma.c | ||
pml_ob1_rdma.h | ||
pml_ob1_rdmafrag.c | ||
pml_ob1_rdmafrag.h | ||
pml_ob1_recvfrag.c | ||
pml_ob1_recvfrag.h | ||
pml_ob1_recvreq.c | ||
pml_ob1_recvreq.h | ||
pml_ob1_sendreq.c | ||
pml_ob1_sendreq.h | ||
pml_ob1_start.c | ||
pml_ob1.c | ||
pml_ob1.h | ||
post_configure.sh |
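
The `MPI_BSEND` hazard described in the commit message above can be illustrated with a small standalone C model. This is only a sketch of the idea, not the ob1 code: it uses no MPI or Open MPI calls, and every name in it (`toy_pending_send`, `toy_defer_by_reference`, `toy_defer_with_copy`, `toy_retry_send`, `toy_message_survives`, `MSG_MAX`) is hypothetical. The `attached` array stands in for the buffer supplied through `MPI_BUFFER_ATTACH`, and `wire` stands in for the data that is eventually transmitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_MAX 64 /* hypothetical size limit for this toy model */

/* Toy stand-in for a queued send request; not the real ob1 structure. */
typedef struct {
    const char *data;              /* where the payload is read from on retry */
    size_t      len;               /* payload length in bytes */
    char        attached[MSG_MAX]; /* stand-in for the attached Bsend buffer */
} toy_pending_send;

/* Problematic pattern: queue the request but keep pointing at the user buffer. */
void toy_defer_by_reference(toy_pending_send *req, const char *user_buf, size_t len)
{
    assert(len <= MSG_MAX);
    req->data = user_buf;
    req->len  = len;
}

/* Proposed pattern: copy the payload into the attached buffer before returning. */
void toy_defer_with_copy(toy_pending_send *req, const char *user_buf, size_t len)
{
    assert(len <= MSG_MAX);
    memcpy(req->attached, user_buf, len);
    req->data = req->attached;
    req->len  = len;
}

/* Later retry: "transmit" whatever the request points at now. */
void toy_retry_send(const toy_pending_send *req, char *wire)
{
    memcpy(wire, req->data, req->len);
}

/* Returns 1 if the bytes that reach the wire match the original message. */
int toy_message_survives(int copy_before_return)
{
    char user_buf[MSG_MAX] = "original";
    char wire[MSG_MAX] = {0};
    toy_pending_send req;
    size_t len = strlen(user_buf) + 1; /* include the terminating NUL */

    if (copy_before_return)
        toy_defer_with_copy(&req, user_buf, len);
    else
        toy_defer_by_reference(&req, user_buf, len);

    /* The application reuses its buffer as soon as the send call returns. */
    strcpy(user_buf, "clobber!");

    toy_retry_send(&req, wire);
    return strcmp(wire, "original") == 0;
}
```

Calling `toy_message_survives(0)` models the deferred request that still references the user buffer, so the retried send picks up the overwritten bytes. Calling `toy_message_survives(1)` models copying into the attached buffer first, so the original message is what gets sent.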