SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func
mca_scoll_basic_alltoall() passed (pSync + 1) to the barrier function. However, _SHMEM_ALLTOALL_SYNC_SIZE is 1, so pSync + 1 points one element past the end of the pSync array, and the barrier function used an invalid memory location. In particular, this location was not initialized to _SHMEM_SYNC_VALUE. That broke the barrier algorithm, and the barrier never completed: one PE could read 0 from its peer and assume the peer had already entered the barrier, and then write 1 to the peer. The peer then entered the barrier, overwrote the 1 with 0, and waited forever to see 1 in its pSync.

Found with the shmem_verifier test suite.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
Parent: f2e6d7891e
Commit: 6754bf1465
@@ -79,7 +79,7 @@ int mca_scoll_basic_alltoall(struct oshmem_group_t *group,
 
 /* Wait for operation completion */
 SCOLL_VERBOSE(14, "[#%d] Wait for operation completion", group->my_pe);
-rc = BARRIER_FUNC(group, pSync + 1, SCOLL_DEFAULT_ALG);
+rc = BARRIER_FUNC(group, pSync, SCOLL_DEFAULT_ALG);
 
 /* Restore initial values */
 SCOLL_VERBOSE(12, "PE#%d Restore special synchronization array",