Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
make things happen before the terminal call
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
The default algorithm selections were out of date and not performing
well. After gathering data from OMPI developers, new default algorithm
decisions were selected for:
allgather
allgatherv
allreduce
alltoall
alltoallv
barrier
bcast
gather
reduce
reduce_scatter_block
reduce_scatter
scatter
These results were gathered using the ompi-collectives-tuning package
and then averaged across the runs contributed by multiple OMPI
developers on their clusters.
You can access the graphs and averaged data here:
https://drive.google.com/drive/folders/1MV5E9gN-5tootoWoh62aoXmN0jiWiqh3
Signed-off-by: William Zhang <wilzhang@amazon.com>
default
And other streamlining of aborting behavior.
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
Remove OMPI_COMM_ERRORS and use NOHANDLE macros instead.
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
route unbound errors to self error handler
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
Do not raise the error handler from within components
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
If building Open MPI with sanitizers, e.g.
$ configure CC=clang CFLAGS=-fsanitize=address ....
configure test programs are also built with the sanitizers and will
report errors, causing configure to fail.
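As an illustration only, here is one way a configure-style test program can trip a sanitizer. This is a sketch under assumptions: the probe functions below are hypothetical and are not taken from Open MPI's configure scripts, and the source does not say which errors were actually reported. The leaky variant never frees its buffer. If a test program's main() returned its result and was built with -fsanitize=address on a platform where leak detection is enabled, the leak report at exit can turn a passing check into a non-zero exit status.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical probe in the style of a configure test program.
 * Returns 0 when the checked feature "works". The buffer is never
 * freed, which a leak-detecting sanitizer can report at exit. */
static int probe_leaky(void)
{
    char *buf = malloc(16);
    if (buf == NULL) {
        return 1;
    }
    memset(buf, 0, 16);
    return buf[0] != 0;
}

/* Same hypothetical probe, releasing the buffer before returning
 * so that the sanitizer has nothing to report. */
static int probe_clean(void)
{
    char *buf = malloc(16);
    int rc;
    if (buf == NULL) {
        return 1;
    }
    memset(buf, 0, 16);
    rc = (buf[0] != 0);
    free(buf);
    return rc;
}
```

Without sanitizers both probes return 0, so the difference only becomes visible once the sanitizer flags are passed through to the test programs.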
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>
Use the same env to transmit the initial error handler to spawnees
Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
There are 2 reasons for this:
- pending CUDA events are not progressed by this BTL, so anything that becomes
asynchronous will never be completed.
- we use the packed data on the shared memory backing file, and this will be
returned to the peer process upon return (thus if we copy asynchronously we
might not copy the right data).
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
We do not want to be patching upstream components anymore.
The proper method is to get this merged upstream, then
pull it in the next upstream release.
This reverts commit c39fb5758a772c062e20db9b42f2b06805884802.
Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
It has been broken for months because of the lack of initialization of the
HWLOC library. The smcuda process creating the backing file (local rank 0)
uses opal_cache_line_size to align the objects in the backing file, and the
opal_cache_line_size is initialized by default to 128. Later on, when the rest
of the processes attach the same backing file, HWLOC has been called and the
cache line size has been updated to the correct value. If this value differs
from the default (as it usually does, since most cache lines are 64 bytes
today), the objects in the backing file will be misaligned.
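The misalignment can be illustrated with a small sketch. The helper names (align_up, object_offset) and the header/object layout are hypothetical and are not taken from the smcuda code; only the 128 and 64 byte cache line values come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Round `offset` up to the next multiple of `alignment`.
 * `alignment` must be a non-zero power of two. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical backing-file layout: a header of `header_size` bytes
 * followed by objects of `object_size` bytes, each starting on a
 * cache-line boundary. Returns the byte offset of object `index`. */
static size_t object_offset(size_t header_size, size_t object_size,
                            size_t index, size_t cache_line_size)
{
    size_t first  = align_up(header_size, cache_line_size);
    size_t stride = align_up(object_size, cache_line_size);
    return first + index * stride;
}
```

With a made-up 40-byte header and 72-byte objects, a creator computing offsets with 128 places object 1 at offset 256, while a process computing them with 64 looks for it at offset 192. Both sides have to use the same cache line value for the layout to agree.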
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>