PSM2 enables support for GPU buffers and CUDA managed memory and it can
directly recognize GPU buffers, handle copies between HFIs and GPUs.
Therefore, it is not required for OMPI to handle GPU buffers for pt2pt cases.
In this patch, we allow the PSM2 MTL to specify when
it does not require CUDA convertor support. This allows us to skip CUDA
convertor init phases and lets PSM2 handle the memory transfers.
This translates to improvements in latency.
The patch enables blocking collectives and workloads with GPU contiguous,
GPU non-contiguous memory.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@intel.com>
* Don't overflow the internal datatype count.
Change the type of the count to be a size_t (it does not alter the total
size of the internal structures, so has no impact on the ABI).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Optimize the datatype creation.
The internal array of counts of predefined types is now only created
when needed, which is either in a heterogeneous environment, or when
one call get_elements. It saves space and makes the convertor creation a
little faster in some cases.
Rearrange the fields in the datatype description structs.
The macro OPAL_DATATYPE_INIT_PTYPES_ARRAY had a bug, and the
static array was only partially created. All predefined types should
have the ptypes array created and initialized.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Fix the boundary computation.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* test/datatype: add test for short unpack on heteregeneous cluster
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Trying to reduce the cost of creating a convertor.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
* Respect the unpack boundaries.
As Gilles suggested on #2535 the opal_unpack_general_function was
unpacking based on the requested count and not on the amount of packed
data provided.
Fixes#2535.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
since Open MPI now requires a C99, and ptrdiff_t type is part of C99,
there is no more need for the abstract OPAL_PTRDIFF_TYPE type.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
clang 7.0 with the picky option on is extremely verbose, and complains
about almost everything. Trying to make him happy, at least regarding
the datatype engine.
The heterogeneous code need to gracefully handly the contiguous
datatype loops in order to have the "#if 0" code path enabled again.
This is a performance issue (the correctness is guaranteed by the
current code).
provide external32 support). Add a pack function allowing to
provide send conversion (needed on little endian machine in
order to pack in the external32 format).
some of the collective modules. Added a new function
opan_datatype_span, to compute the memory span of
count number of datatype, excluding the gaps in the
beginning and at the end. If a memory allocation is
made using the returned value, the gap (also returned)
should be removed from the allocated pointer.
When you "autogen.pl --no-ompi", the AC_SIZEOF(bool) test is not run.
But we *do* run AC_SIZEOF(_Bool), which is the equivalent. So switch
the uses of SIZEOF_BOOL in the code base to be SIZEOF__BOOL, and it's
all good.
This commit does two things. It removes checks for C99 required
headers (stdlib.h, string.h, signal.h, etc). Additionally it removes
definitions for required C99 types (intptr_t, int64_t, int32_t, etc).
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
long double internal representation is arch specific,
and no conversion is done (yet), so MPI_LONG_DOUBLE should not
be used (yet) in heterogeneous mode
A few uninitialized common symbols are remaining:
common symbols generated by flex :
* opal/util/keyval/keyval_lex.l: opal_util_keyval_yyleng
* opal/util/keyval/keyval_lex.o: opal_util_keyval_yytext
* opal/util/show_help_lex.l: opal_show_help_yyleng
* opal/util/show_help_lex.l: opal_show_help_yytext
common symbol generated by "external" hwloc library:
* opal/mca/hwloc/hwloc191/hwloc/src/components.o: component_map