From 8a329a797c3c174c1a4cfbff669885551c1ce268 Mon Sep 17 00:00:00 2001
From: Yossi Itigin
Date: Sun, 28 Oct 2018 19:00:23 +0200
Subject: [PATCH] SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func

mca_scoll_basic_alltoall() passed (pSync + 1) to barrier function, but
the value of _SHMEM_ALLTOALL_SYNC_SIZE is 1, which made the barrier
function use an invalid memory location.

In particular, this location was not initialized to _SHMEM_SYNC_VALUE,
which broke the barrier algorithm and it did not complete: One PE could
read 0 from its peer and assume the peer already started the barrier,
and then write 1 to the peer. Then, the peer entered the barrier and
overwrote the 1 with 0, and then it waited forever to see '1' in its
pSync.

Found with shmem_verifier test suite.

(picked from master 6754bf1)

Signed-off-by: Yossi Itigin
---
 oshmem/mca/scoll/basic/scoll_basic_alltoall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/oshmem/mca/scoll/basic/scoll_basic_alltoall.c b/oshmem/mca/scoll/basic/scoll_basic_alltoall.c
index 9843d985e7..1698ee1335 100644
--- a/oshmem/mca/scoll/basic/scoll_basic_alltoall.c
+++ b/oshmem/mca/scoll/basic/scoll_basic_alltoall.c
@@ -79,7 +79,7 @@ int mca_scoll_basic_alltoall(struct oshmem_group_t *group,
 
     /* Wait for operation completion */
     SCOLL_VERBOSE(14, "[#%d] Wait for operation completion", group->my_pe);
-    rc = BARRIER_FUNC(group, pSync + 1, SCOLL_DEFAULT_ALG);
+    rc = BARRIER_FUNC(group, pSync, SCOLL_DEFAULT_ALG);
 
     /* Restore initial values */
     SCOLL_VERBOSE(12, "PE#%d Restore special synchronization array",