
[v4.1.x] ompi : add memory barrier in PMIx registration callback

PMIx registration callback functions are used when registering a PMIx
event handler.

This patch adjusts two such callback functions:

    model_registration_callback()
         in ompi/interlib/interlib.c and

    ompi_errhandler_registration_callback()
         in ompi/errhandler/errhandler.c

Both of them use the following code structure:

static void xxx_callback(int status,
			 size_t errhandler_ref,
			 void *cbdata)
{
    myreg_t *trk = (myreg_t*)cbdata;

    trk->status = status;
    interlibhandler_id = errhandler_ref;
    trk->active = false;
}

The workflow is:

1. The caller calls opal_pmix.register_evhandler(), passing the
   registration callback function as an argument.
2. The caller calls OMPI_LAZY_WAIT_FOR_COMPLETION(trk.active)
   to wait for trk->active to become false.
3. PMIx performs the registration on another thread, then calls the
   registration callback function, which sets trk->active
   to false.
4. The caller checks trk->status to determine whether the registration
   succeeded.
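The handshake in the steps above can be sketched in portable C11 with POSIX threads. This is a hypothetical, self-contained model rather than Open MPI code: tracker_t, fake_pmix_thread() and register_and_wait() are illustrative stand-ins for myreg_t, the PMIx thread and the caller. The required ordering is expressed here with a C11 release store on active paired with an acquire load in the wait loop, which is one way (not the patch's way) of guaranteeing that the caller sees the final status.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for myreg_t from the commit message. */
typedef struct {
    int status;          /* written by the callback, read by the caller */
    atomic_bool active;  /* true while the registration is still pending */
} tracker_t;

static size_t handler_id;  /* plays the role of interlibhandler_id */

typedef void (*reg_cbfunc_t)(int status, size_t ref, void *cbdata);

/* Step 3: the registration callback, run on the "PMIx" thread. */
static void registration_callback(int status, size_t ref, void *cbdata)
{
    tracker_t *trk = (tracker_t *)cbdata;
    trk->status = status;
    handler_id = ref;
    /* Release store: the two writes above are visible to any thread that
     * observes active == false with an acquire load. */
    atomic_store_explicit(&trk->active, false, memory_order_release);
}

/* Hypothetical registration request handed to the fake PMIx thread. */
struct reg_request {
    reg_cbfunc_t cb;
    void *cbdata;
};

static void *fake_pmix_thread(void *arg)
{
    struct reg_request *req = arg;
    req->cb(0 /* success */, 42 /* handler reference */, req->cbdata);
    return NULL;
}

/* Steps 1, 2 and 4 from the caller's side; returns the registration status. */
static int register_and_wait(void)
{
    tracker_t trk;
    trk.status = -1;
    atomic_init(&trk.active, true);

    /* Step 1: hand the callback to another thread (stands in for
     * opal_pmix.register_evhandler()). */
    struct reg_request req = { registration_callback, &trk };
    pthread_t tid;
    if (pthread_create(&tid, NULL, fake_pmix_thread, &req) != 0)
        return -1;

    /* Step 2: wait for active to become false
     * (cf. OMPI_LAZY_WAIT_FOR_COMPLETION). */
    while (atomic_load_explicit(&trk.active, memory_order_acquire))
        sched_yield();

    /* Step 4: the acquire load above makes the callback's status visible. */
    int status = trk.status;
    pthread_join(tid, NULL);
    return status;
}
```

Calling register_and_wait() should return 0 (the status the fake thread reports) and leave handler_id set to 42.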

The registration callback functions are therefore expected to update
trk->status first and only then set trk->active to false, so that once
the waiting caller sees trk->active == false it also sees the final
trk->status.

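However, on ARM-based systems this ordering is not guaranteed. ARM uses
a relaxed memory model, so the two stores can become visible to the
waiting thread in a different order than they were issued, and the
caller may observe trk->active == false while still reading a stale
trk->status.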

To address this issue, this patch adds a call to opal_atomic_wmb()
(write memory barrier) after trk->status is set and immediately before
trk->active is set to false, which enforces the expected ordering.
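The shape of the patched callbacks can be mirrored with C11 fences. The sketch below assumes a C11 release fence as a portable stand-in for opal_atomic_wmb(), since it orders earlier stores before later ones, which is what the patch relies on; trk_t, callback_with_barrier(), wait_then_read_status() and count_stale_reads() are hypothetical names. In the C11 model a writer-side fence only takes effect when the waiting side pairs it with an acquire fence (or acquire load) after its spin loop, so the reader is shown as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tracker: status is a plain int, active is an atomic flag. */
typedef struct {
    int status;
    atomic_bool active;
} trk_t;

/* Writer side, laid out like the patched callbacks. */
static void callback_with_barrier(trk_t *trk, int status)
{
    trk->status = status;
    /* Portable stand-in for opal_atomic_wmb(): the store to status is
     * ordered before the store to active below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&trk->active, false, memory_order_relaxed);
}

/* Reader side: spin on active, then the matching acquire fence. */
static int wait_then_read_status(trk_t *trk)
{
    while (atomic_load_explicit(&trk->active, memory_order_relaxed))
        sched_yield();
    atomic_thread_fence(memory_order_acquire);
    return trk->status;
}

struct job {
    trk_t *trk;
    int status;
};

static void *writer_thread(void *arg)
{
    struct job *j = arg;
    callback_with_barrier(j->trk, j->status);
    return NULL;
}

/* Runs `rounds` handshakes and counts how often the reader saw a status
 * other than the one the writer stored (expected: 0). */
static int count_stale_reads(int rounds)
{
    int stale = 0;
    for (int i = 0; i < rounds; i++) {
        trk_t trk;
        trk.status = -1;
        atomic_init(&trk.active, true);
        struct job j = { &trk, i };
        pthread_t tid;
        if (pthread_create(&tid, NULL, writer_thread, &j) != 0)
            return -1;
        if (wait_then_read_status(&trk) != i)
            stale++;
        pthread_join(tid, NULL);
    }
    return stale;
}
```

count_stale_reads(1000) is expected to return 0. A passing run on a strongly ordered machine such as x86 does not by itself demonstrate the ordering; weakly ordered hardware such as ARM is where a missing barrier can actually surface.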

Signed-off-by: Wei Zhang <wzam@amazon.com>
Wei Zhang 2020-12-08 01:46:09 +00:00
parent d472f5a40f
commit 6760d531d5
2 changed files with 2 additions and 0 deletions


@@ -229,6 +229,7 @@ void ompi_errhandler_registration_callback(int status,
     default_errhandler_id = errhandler_ref;
     errtrk->status = status;
+    opal_atomic_wmb();
     errtrk->active = false;
 }


@@ -52,6 +52,7 @@ static void model_registration_callback(int status,
     trk->status = status;
     interlibhandler_id = errhandler_ref;
+    opal_atomic_wmb();
     trk->active = false;
 }
 static void model_callback(int status,