
SCOLL/BASIC: Fix invalid pSync pointer passed to barrier func

mca_scoll_basic_alltoall() passed (pSync + 1) to the barrier function,
but _SHMEM_ALLTOALL_SYNC_SIZE is 1, so the barrier function used an
invalid memory location. In particular, this location was not
initialized to _SHMEM_SYNC_VALUE, which broke the barrier algorithm and
kept it from completing: one PE could read 0 from its peer, assume the
peer had already started the barrier, and write 1 to the peer. The peer
then entered the barrier, overwrote the 1 with 0, and waited forever to
see 1 in its pSync.

Found with shmem_verifier test suite.

Signed-off-by: Yossi Itigin <yosefe@mellanox.com>
Yossi Itigin 2018-10-28 19:00:23 +02:00
parent f2e6d7891e
commit 6754bf1465


@@ -79,7 +79,7 @@ int mca_scoll_basic_alltoall(struct oshmem_group_t *group,
 /* Wait for operation completion */
 SCOLL_VERBOSE(14, "[#%d] Wait for operation completion", group->my_pe);
-rc = BARRIER_FUNC(group, pSync + 1, SCOLL_DEFAULT_ALG);
+rc = BARRIER_FUNC(group, pSync, SCOLL_DEFAULT_ALG);
 /* Restore initial values */
 SCOLL_VERBOSE(12, "PE#%d Restore special synchronization array",