openmpi/ompi/mca/pml/ob1
KAWASHIMA Takahiro 0021616984 pml/ob1: Fix data corruption of MPI_BSEND
Data transferred by `MPI_BSEND` may be corrupted if all of the following
conditions are met.

- The message size is less than the eager limit.
- The `btl_alloc` function in the BTL interface returns `NULL`
  for some reason.
- The MPI program overwrites the send buffer after `MPI_BSEND`
  returns.

The problem lies in how the ob1 PML defers a send request.
The `mca_pml_ob1_send_request_start_copy` function returns
`OMPI_ERR_OUT_OF_RESOURCE` if the `mca_bml_base_alloc` function returns
`des = NULL`. In this case, the send request is added to the
`send_pending` list and `MPI_BSEND` returns immediately, without the
data having been copied out of the user buffer. The next time
the `mca_pml_ob1_send_request_start_copy` function tries to send,
the user buffer may already have been overwritten by the MPI program.
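
The failure mode can be illustrated with a small toy model. This is only a sketch and does not use Open MPI code: `toy_request`, `toy_bsend_deferred`, `toy_progress`, and `toy_demo_corruption` are hypothetical names, and the "wire" is just a local array. It assumes the deferred request keeps a pointer to the user buffer instead of a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a send request; not the real ob1 structure. */
struct toy_request {
    const char *user_buf;  /* points at the caller's buffer, no copy made */
    size_t      len;
    int         pending;   /* 1 while the request waits to be retried */
};

/* Models a buffered send whose descriptor allocation failed: the request
 * is only queued, and the call returns right away. */
static void toy_bsend_deferred(struct toy_request *req,
                               const char *buf, size_t len)
{
    req->user_buf = buf;
    req->len = len;
    req->pending = 1;
}

/* Models the later retry: the payload is read from the user buffer now,
 * not at the time the send was posted. */
static void toy_progress(struct toy_request *req, char *wire, size_t wire_cap)
{
    assert(req->pending && req->len <= wire_cap);
    memcpy(wire, req->user_buf, req->len);
    req->pending = 0;
}

/* Returns 1 if the retried send delivered overwritten data. */
static int toy_demo_corruption(void)
{
    char user[8] = "AAAAAAA";
    char wire[8] = {0};
    struct toy_request req;

    toy_bsend_deferred(&req, user, sizeof user);
    memcpy(user, "BBBBBBB", sizeof user);  /* allowed once the send returns */
    toy_progress(&req, wire, sizeof wire);

    return strcmp(wire, "AAAAAAA") != 0;
}
```

Calling `toy_demo_corruption()` from a test or `main` shows the receiver-side bytes differing from what was in the buffer when the send was posted.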

Call hierarchy of `MPI_BSEND`:

```
  MPI_Bsend
    mca_pml_ob1_send
      if (MCA_PML_BASE_SEND_BUFFERED == sendmode)
        mca_pml_ob1_isend
          MCA_PML_OB1_SEND_REQUEST_START_W_SEQ
            mca_pml_ob1_send_request_start_seq
              mca_pml_ob1_send_request_start_btl
                if (size <= eager_limit)
                  if (req_send_mode == MCA_PML_BASE_SEND_BUFFERED)
                    mca_pml_ob1_send_request_start_copy
                      mca_bml_base_alloc
                        btl_alloc
              if (OMPI_ERR_OUT_OF_RESOURCE == rc)
                add_request_to_send_pending
        ompi_request_free
```

To solve this problem, we should save the data to the buffer
attached by `MPI_BUFFER_ATTACH` before leaving `MPI_BSEND`.
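
The idea of the fix can be sketched in the same toy style. This is not the actual patch: `toy_attached`, `toy_saved_request`, `toy_bsend_saved`, and `toy_demo_fixed` are hypothetical names, and the bump allocation inside the attached buffer is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the buffer attached with MPI_BUFFER_ATTACH. */
struct toy_attached {
    char  *base;
    size_t cap;
    size_t used;
};

/* Hypothetical deferred request that owns a private copy of the payload. */
struct toy_saved_request {
    const char *payload;  /* points into the attached buffer */
    size_t      len;
};

/* Copies the payload into the attached buffer before returning.
 * Returns 0 on success, -1 if the attached buffer has too little room. */
static int toy_bsend_saved(struct toy_attached *att,
                           struct toy_saved_request *req,
                           const char *buf, size_t len)
{
    if (len > att->cap - att->used) {
        return -1;
    }
    char *slot = att->base + att->used;
    memcpy(slot, buf, len);
    att->used += len;
    req->payload = slot;
    req->len = len;
    return 0;
}

/* Returns 1 if the deferred send still delivers the original data. */
static int toy_demo_fixed(void)
{
    char storage[64];
    struct toy_attached att = { storage, sizeof storage, 0 };
    struct toy_saved_request req;
    char user[8] = "AAAAAAA";
    char wire[8] = {0};

    if (toy_bsend_saved(&att, &req, user, sizeof user) != 0) {
        return 0;
    }
    memcpy(user, "BBBBBBB", sizeof user);  /* no longer affects the send */
    assert(req.len <= sizeof wire);
    memcpy(wire, req.payload, req.len);    /* the later retry */

    return strcmp(wire, "AAAAAAA") == 0;
}
```

Because the copy happens before the call returns, a later overwrite of the user buffer cannot change what the retry sends.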

This problem was introduced by the ob1 optimization (commits 2b57f422
and a06e491c) in the v1.8 series.

Signed-off-by: KAWASHIMA Takahiro <t-kawashima@jp.fujitsu.com>
2018-07-12 14:30:58 +09:00

| Name | Last commit | Date |
|------|-------------|------|
| configure.m4 | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| help-mpi-pml-ob1.txt | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| Makefile.am | mca: Dynamic components link against project lib | 2017-08-24 11:56:16 -04:00 |
| owner.txt | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1_comm.c | pml/ob1: fixed out of sequence bug. | 2018-02-27 13:49:40 -05:00 |
| pml_ob1_comm.h | pml/ob1: fixed out of sequence bug. | 2018-02-27 13:49:40 -05:00 |
| pml_ob1_component.c | Remove warnings identified by clang. | 2018-04-14 17:14:12 -04:00 |
| pml_ob1_component.h | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1_cuda.c | pml/ob1: do not cache leave_pinned | 2017-03-14 09:00:40 -06:00 |
| pml_ob1_hdr.h | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1_iprobe.c | ompi/request: Fix a persistent request creation bug | 2016-12-08 21:42:05 +09:00 |
| pml_ob1_irecv.c | pml/ob1: have memchecker make recv buffer defined again when mca_pml_ob1_recv completes | 2017-09-04 11:18:05 +09:00 |
| pml_ob1_isend.c | Added Software-based Performance Counters driver code along with several counters. | 2018-06-11 22:48:16 -04:00 |
| pml_ob1_progress.c | opal/asm: rename existing arithmetic atomic functions | 2017-11-30 10:41:22 -07:00 |
| pml_ob1_rdma.c | pml/ob1: do not cache leave_pinned | 2017-03-14 09:00:40 -06:00 |
| pml_ob1_rdma.h | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1_rdmafrag.c | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1_rdmafrag.h | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1_recvfrag.c | Added Software-based Performance Counters driver code along with several counters. | 2018-06-11 22:48:16 -04:00 |
| pml_ob1_recvfrag.h | pml/ob1: fixed out of sequence bug. | 2018-02-27 13:49:40 -05:00 |
| pml_ob1_recvreq.c | Added Software-based Performance Counters driver code along with several counters. | 2018-06-11 22:48:16 -04:00 |
| pml_ob1_recvreq.h | opal/asm: rename existing arithmetic atomic functions | 2017-11-30 10:41:22 -07:00 |
| pml_ob1_sendreq.c | pml/ob1: Fix data corruption of MPI_BSEND | 2018-07-12 14:30:58 +09:00 |
| pml_ob1_sendreq.h | pml/ob1: Fix data corruption of MPI_BSEND | 2018-07-12 14:30:58 +09:00 |
| pml_ob1_start.c | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |
| pml_ob1.c | pml/ob1: fixed out of sequence bug. | 2018-02-27 13:49:40 -05:00 |
| pml_ob1.h | pml/ob1: fixed out of sequence bug. | 2018-02-27 13:49:40 -05:00 |
| post_configure.sh | Revert "Update to sync with OMPI master and cleanup to build" | 2016-11-22 15:03:20 -08:00 |