MTL OFI: Fix Deadlock in fi_cancel given completion during cancel
- If the message for a recv that is being cancelled completes after the call to fi_cancel, the OFI MTL deadlocks waiting for ofi_req->super.ompi_req->req_status._cancelled to be set, which never happens because the recv finished successfully.
- To resolve this, the OFI MTL now checks ofi_req->req_started inside the loop that waits for the cancel event. If the request has been started (that is, it is being completed), the loop is exited and the cancel routine returns with ofi_req->super.ompi_req->req_status._cancelled = false.

Signed-off-by: Spruit, Neil R <neil.r.spruit@intel.com>
(cherry picked from commit 767135c580f75d3dde9cb9c88601dd18afda949a)
Parent: 5704d4fab5
Commit: 9cc6bc1ea6
@@ -1003,8 +1003,11 @@ ompi_mtl_ofi_cancel(struct mca_mtl_base_module_t *mtl,
              */
             while (!ofi_req->super.ompi_req->req_status._cancelled) {
                 opal_progress();
+                if (ofi_req->req_started)
+                    goto ofi_cancel_not_possible;
             }
         } else {
+ofi_cancel_not_possible:
             /**
              * Could not cancel the request.
              */
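The race described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the Open MPI code: `mock_request`, `mock_progress`, and `wait_for_cancel` are hypothetical stand-ins for the OFI MTL request, `opal_progress()`, and the wait loop in `ompi_mtl_ofi_cancel`. It assumes that a single progress call eventually delivers either a cancel confirmation or a completion of the receive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the OFI MTL request; only the flags matter here. */
struct mock_request {
    bool cancelled;           /* mirrors req_status._cancelled */
    bool started;             /* mirrors ofi_req->req_started */
    int  polls_left;          /* progress calls before the simulated event arrives */
    bool event_is_completion; /* true: message completes; false: cancel confirmed */
};

/* Hypothetical stand-in for opal_progress(): delivers one event
 * on the call that brings polls_left down to zero. */
static void mock_progress(struct mock_request *req)
{
    if (req->polls_left > 0 && --req->polls_left == 0) {
        if (req->event_is_completion) {
            req->started = true;    /* the receive matched a message */
        } else {
            req->cancelled = true;  /* the cancel was confirmed */
        }
    }
}

/* Returns true if the request was cancelled, false if it completed first. */
static bool wait_for_cancel(struct mock_request *req)
{
    assert(req != NULL);
    while (!req->cancelled) {
        mock_progress(req);
        if (req->started) {
            /* Without this check the loop never exits, because a
             * completed receive never sets the cancelled flag. */
            req->cancelled = false;
            return false;
        }
    }
    return true;
}
```

With `polls_left` set to 3, `wait_for_cancel` returns `true` when the simulated event is a cancel confirmation and `false` when it is a completion. Removing the `started` check makes the second case spin forever, which is the deadlock this commit addresses.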