There are 2 reasons for this:
- pending CUDA events are not progressed by this BTL, so anything that becomes
asynchronous will never be completed.
- we use the packed data on the shared memory backing file, and this will be
returned to the peer process upon return (thus if we copy asynchronously we
might not copy the right data).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
We do not want to be patching upstream components anymore.
The proper method is to get this merged upstream, then
pull it in the next upstream release.
This reverts commit c39fb5758a772c062e20db9b42f2b06805884802.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
It has been broken for months because of the lack of initialization of the
HWLOC library. The smcuda process creating the backing file (local rank 0)
uses opal_cache_line_size to align the objects in the backing file, and the
opal_cache_line_size is initialized by default to 128. Later on, when the rest
of the processes attach the same backing file, HWLOC has been called and the
cache size has now been updated to the correct value. If this value
differs from the default (and it usually does, as most cache lines are
64 bytes today), the objects in the backing file will be misaligned.
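The effect can be illustrated with a minimal sketch. The helper names
(align_up, second_object_offset) and the 40-byte object size are
hypothetical and are not taken from the smcuda code; they only show how
two processes that disagree on the cache line size compute different
offsets for the same object.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align.
 * Assumption: align is a non-zero power of two. */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Offset of the second object in a file where objects of obj_size bytes
 * are placed one after another, each aligned to cache_line bytes. */
static size_t second_object_offset(size_t obj_size, size_t cache_line)
{
    size_t first = align_up(0, cache_line);
    return align_up(first + obj_size, cache_line);
}
```

With 40-byte objects, a creator that aligns to 128 places the second
object at offset 128, while a process that attaches with a cache line
size of 64 looks for it at offset 64.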
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
`--enable-mem-debug` `#define`s `realloc`/`free` as macros, and these
macros also match when the same names appear in references to members.
Rename the members to avoid this matching.
See #6995
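A minimal sketch of the clash, assuming a function-like macro similar
to what a memory-debugging build might define. The names debug_free,
struct allocator, and release are hypothetical and are not the actual
Open MPI identifiers.

```c
#include <assert.h>
#include <stddef.h>

static int tracked_frees = 0;
static int released = 0;

/* Stand-in for a debugging replacement of free(). */
static void debug_free(void *ptr) { (void) ptr; tracked_frees++; }

/* Stand-in for the macro installed by a memory-debugging build. */
#define free(ptr) debug_free(ptr)

struct allocator {
    /* Had this member been named "free", a call such as alloc.free(p)
     * would be rewritten by the macro into alloc.debug_free(p), which
     * does not compile. A different member name avoids the match. */
    void (*release)(void *ptr);
};

static void count_release(void *ptr) { (void) ptr; released++; }

static int demo_release(void)
{
    struct allocator alloc = { .release = count_release };
    alloc.release(NULL);   /* not touched by the macro */
    free(NULL);            /* expands to debug_free(NULL) */
    return released * 10 + tracked_frees;
}

/* Keep the stand-in macro local to this sketch. */
#undef free
```

The first call to demo_release() invokes the member once and the macro
once, so both counters end up at 1.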
Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
Add logic to handle different architectural capabilities
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.
Add validation checks for loadu 256 and 512 bits.
Add validation tests for MPI_Op.
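The runtime selection step could look roughly like the sketch below.
It assumes a GCC-compatible compiler on x86 (for the
__builtin_cpu_supports builtin) and falls back to a scalar flavor
elsewhere. All demo_* names are hypothetical, and every table slot
points at the scalar routine here; in a real build each slot would be a
routine compiled with the matching flags.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    DEMO_ISA_SCALAR = 0, DEMO_ISA_AVX, DEMO_ISA_AVX2, DEMO_ISA_AVX512
} demo_isa_t;

/* Pick the widest instruction set the running processor reports. */
static demo_isa_t demo_detect_isa(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return DEMO_ISA_AVX512;
    if (__builtin_cpu_supports("avx2"))    return DEMO_ISA_AVX2;
    if (__builtin_cpu_supports("avx"))     return DEMO_ISA_AVX;
#endif
    return DEMO_ISA_SCALAR;
}

typedef void (*demo_sum_fn_t)(const int *in, int *inout, size_t count);

/* Portable flavor of an element-wise sum reduction. */
static void demo_sum_scalar(const int *in, int *inout, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        inout[i] += in[i];
    }
}

/* One slot per flavor; all slots are scalar stand-ins in this sketch. */
static demo_sum_fn_t demo_select_sum(demo_isa_t isa)
{
    static const demo_sum_fn_t table[] = {
        demo_sum_scalar, demo_sum_scalar, demo_sum_scalar, demo_sum_scalar
    };
    assert((size_t) isa < sizeof(table) / sizeof(table[0]));
    return table[isa];
}
```

Selecting once at startup and storing the function pointer keeps the
detection cost out of the reduction path.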
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
It is important to have the mpi_f08 Type(MPI_Status) be the same
length (in bytes) as the mpif.h status (which is an array of
MPI_STATUS_SIZE INTEGERs). The reason is that MPI_Status_ctof()
basically does the following:
    MPI_Fint *f_status = ...;
    int *s = (int*) &c_status;
    for i=0..sizeof(MPI_Status)/sizeof(int)
        f_status[i] = s[i];
Meaning: the Fortran status needs to be able to hold as many INTEGERs
as there are C ints that can fit in sizeof(MPI_Status) bytes.
This is because a Fortran INTEGER may be larger than a C int (e.g.,
Fortran 8 bytes vs. C 4 bytes). Hence, the assignment on the Fortran
side will take sizeof(INTEGER) bytes for each sizeof(int) bytes in the
C MPI_Status.
This commit pads out the mpi_f08 Type(MPI_Status) with enough INTEGERs
to make it the same size as an array of MPI_STATUS_SIZE INTEGERs.
Hence, MPI_Status_ctof() will work properly, regardless of whether it
is assigning to an mpi_f08 Type(MPI_Status) or an mpif.h array of
MPI_STATUS_SIZE INTEGERs.
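The size requirement can be checked with a small sketch of the copy
loop described above. The types are hypothetical stand-ins (a six-int
C status and an 8-byte Fortran INTEGER), not the real MPI_Status or
MPI_Fint definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C status made only of C ints. */
typedef struct {
    int source, tag, error, cancelled, count_lo, count_hi;
} demo_c_status_t;

/* Hypothetical Fortran INTEGER built as 8 bytes. */
typedef int64_t demo_fint_t;

#define DEMO_STATUS_INTS (sizeof(demo_c_status_t) / sizeof(int))

/* Copy every C int of the status into one Fortran INTEGER each and
 * return the number of bytes written on the Fortran side. */
static size_t demo_status_c2f(const demo_c_status_t *c_status,
                              demo_fint_t *f_status, size_t f_len)
{
    int s[DEMO_STATUS_INTS];

    /* The Fortran side needs one INTEGER per C int. */
    assert(f_len >= DEMO_STATUS_INTS);
    memcpy(s, c_status, sizeof(s));
    for (size_t i = 0; i < DEMO_STATUS_INTS; i++) {
        f_status[i] = s[i];
    }
    return DEMO_STATUS_INTS * sizeof(demo_fint_t);
}
```

With these stand-in types the Fortran side receives
DEMO_STATUS_INTS * sizeof(demo_fint_t) bytes, which is more than
sizeof(demo_c_status_t). A Fortran type sized only to match the C
struct in bytes would be overrun.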
Thanks to @ahaichen for reporting the issue.
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
This commit updates the btl interface to change the parameters
passed to receive callbacks. The interface used to pass the tag,
a btl base descriptor, and the callback context. Most of the
values in the btl base descriptor were unused and only helped
simplify the callbacks from the self btl. All of the arguments
have now been replaced with a single receive callback descriptor.
This descriptor contains the incoming endpoint, data segment(s),
tag, and callback context. All btls have been updated to use
the new callback and the btl interface version has been bumped
to v3.2.0.
As part of this change the descriptor argument (and the segments
contained within it) have been marked as const. They were treated
as const before but this change could allow the compiler to make
better optimization decisions and will enforce that the callback
does not attempt to change the data in the descriptor.
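A rough sketch of the shape of such a callback follows. The type and
field names (demo_recv_descriptor_t and so on) are hypothetical
stand-ins chosen to mirror the fields listed above; they are not the
actual btl definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data segment: address and length of received bytes. */
typedef struct {
    const void *addr;
    size_t len;
} demo_segment_t;

/* Hypothetical receive descriptor bundling everything a callback needs. */
typedef struct {
    void *endpoint;                  /* incoming endpoint */
    const demo_segment_t *segments;  /* data segment(s) */
    size_t segment_count;
    unsigned char tag;
    void *cbdata;                    /* callback context */
} demo_recv_descriptor_t;

/* The descriptor is const: the callback may read it but not modify it. */
typedef void (*demo_recv_cb_t)(const demo_recv_descriptor_t *desc);

/* Example callback: add the received byte count to a counter in cbdata. */
static void demo_count_bytes(const demo_recv_descriptor_t *desc)
{
    size_t *total = desc->cbdata;
    assert(total != NULL);
    for (size_t i = 0; i < desc->segment_count; i++) {
        *total += desc->segments[i].len;
    }
}

/* Build a one-segment descriptor and hand it to the callback. */
static size_t demo_deliver(demo_recv_cb_t cb)
{
    static const char payload[] = "hello";
    size_t total = 0;
    const demo_segment_t seg = { payload, sizeof(payload) - 1 };
    const demo_recv_descriptor_t desc = { NULL, &seg, 1, 7, &total };
    cb(&desc);
    return total;
}
```

Because the parameter is a pointer to const, an assignment such as
desc->tag = 0 inside the callback is rejected at compile time.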
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
We completely disable C11 atomic op support for _Atomic for
all Intel compilers prior to 20200310 (which is currently the
latest release), by switching to our pre-C11 atomic
operations.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
This PR adds a new configure option: --with-wrapper-cc.
This option allows the user to set the compiler that will
be invoked by mpicc, shmemcc, etc. This allows the user
to build Open MPI with one compiler (a C standards compliant
compiler like clang or gcc) and have the wrapper use
another compiler (icc for example). This allows building
Open MPI with the best available compiler while still
supporting compiling Open MPI for a specific compiler
suite.
Signed-off-by: Nathan Hjelm <hjelmn@google.com>
The "-r" option now concatenates using the rpmbuild_options
variable. The "-r" option in prior versions of buildrpm.sh
concatenated using the configure_options variable, which uses special
delineation for Autoconf options (first word of argument string is
"configure_options"). This resulted in an RPM build failure as the
Autoconf options would contain nested RPM option statements.
Signed-off-by: John K. McIver III <john.mciver.iii@gmail.com>
Since we added the use of git submodules recently, this trivial script
has been helpful to me to "git clean" not only the top-level Open MPI
repo, but also all the included submodules.
NOTE: this script does the (harsh) "git clean -dfx" command, which
deletes everything that git does not know about. Use with care!
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
NULL pointer arithmetic is undefined behaviour in C.
The payload_ptr can be NULL when the mpool is not yet initialized.
References from the C11 standard:
- 6.5.6 Additive operators
- 6.3.2.3 Pointers
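A minimal sketch of the kind of guard this implies. The helper
demo_payload_at and its capacity parameter are hypothetical and are not
the code touched by this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Return payload_ptr advanced by offset bytes, or NULL when there is
 * no payload yet. The check comes before any arithmetic, so no offset
 * is ever added to a NULL pointer. */
static unsigned char *demo_payload_at(unsigned char *payload_ptr,
                                      size_t capacity, size_t offset)
{
    if (NULL == payload_ptr) {
        return NULL;
    }
    /* Stay within the buffer (one past the end is still allowed). */
    assert(offset <= capacity);
    return payload_ptr + offset;
}
```

Callers then test the result for NULL instead of relying on
NULL + offset producing a recognizable value.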
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
bugfix: provider selection did not differentiate between IPv4
and IPv6 addresses, which could leave some nodes unable
to communicate with each other. Add a check for address
format to provider selection to ensure that all nodes use the
same address format.
Signed-off-by: Nikola Dancejic <dancejic@amazon.com>