
Make the smcuda BTL great again.

It has been broken for months because the HWLOC library was not initialized
before the backing file was created. The smcuda process that creates the
backing file (local rank 0) uses opal_cache_line_size to align the objects in
it, and at that point opal_cache_line_size still holds its default value of
128. Later on, when the other local processes attach to the same backing file,
HWLOC has been called and opal_cache_line_size has been updated to the real
cache line size. If that value differs from the default (and it usually does,
as most cache lines are 64 bytes today), the processes disagree on where the
objects sit in the backing file, and the objects are misaligned.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
George Bosilca 2020-07-14 01:48:08 -04:00
parent 96e8cbe25f
commit fd4ca394e2


@@ -42,6 +42,7 @@
 #include <sys/stat.h> /* for mkfifo */
 #endif /* HAVE_SYS_STAT_H */
+#include "opal/mca/hwloc/base/base.h"
 #include "opal/mca/shmem/base/base.h"
 #include "opal/mca/shmem/shmem.h"
 #include "opal/util/bit_ops.h"
@@ -866,6 +867,13 @@ mca_btl_smcuda_component_init(int *num_btls,
      * shared-memory segment. this routine sets component sm_max_procs. */
     calc_sm_max_procs(num_local_procs);
+    /* Before we can safely create the backend file we need to know minimal
+     * information about the local node. We need at least a size of a cache line
+     * as we align the data in the backing file to it. The simplest way for now is
+     * to force the HWLOC initialization.
+     */
+    opal_hwloc_base_get_topology();
     /* This is where the modex will live some day. For now, just have local rank
      * 0 create a rendezvous file containing the backing store info, so the
      * other local procs can read from it during add_procs. The rest will just