1
1

31085 Коммитов

Автор SHA1 Сообщение Дата
Aurélien Bouteiller
3cd85a9ec5
Add the initial_errhandler info key to MPI_INFO_ENV and populate the
value from prun populated paremeters

Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>

Allow errhandlers to invoke the initial error handler before MPI_INIT

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>

Indentation

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-16 03:10:32 -04:00
Aurélien Bouteiller
703b8c356f
Make error_class and error_string callable before/after
MPI_INIT/FINALIZE

Signed-off-by: Aurélien Bouteiller <bouteill@icl.utk.edu>

make lazy initialization opal unlikely

Signed-off-by: Aurelien Bouteiller <bouteill@icl.utk.edu>
2020-07-16 03:10:32 -04:00
Ralph Castain
7702dfcdd2
Merge pull request #7942 from rhc54/topic/init
Ensure we init and protect values
2020-07-15 07:59:01 -07:00
George Bosilca
8bc1f3d8fb Don't allow any asynchronous CUDA operations.
There are 2 reasons for this:
- pending CUDA events are not progressed by this BTL, so anything that becomes
  asychronous will never be completed.
- we use the packed data on the shared memory backing file, and this will be
  returned to the peer process upon return (thus if we copy asynchronously we
  might not copy the right data).

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-15 01:37:09 -04:00
George Bosilca
0e32b0acef Avoid a lock if no CUDA IPC operations are pending.
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-15 01:35:34 -04:00
Ralph Castain
a574addce9
Ensure we init and protect values
Scrub the entire ompi_rte.c file to initialize and protect values
received from PMIx.

Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-07-14 15:25:14 -07:00
Austen Lauria
67d90166cf Revert "Address a race condition in libevent select."
We do not want to be patching upstream components anymore.
The proper method is to get this merged upstream, then
pull it in the next upstream release.

This reverts commit c39fb5758a772c062e20db9b42f2b06805884802.

Signed-off-by: Austen Lauria <awlauria@us.ibm.com>
2020-07-14 16:23:21 -04:00
George Bosilca
fd4ca394e2 Make the smcuda BTL great again.
It has been broken for months because of the lack of initialization of the
HWLOC library. The smcuda process creating the backing file (local rank 0)
uses opal_cache_line_size to align the objects in the backing file, and the
opal_cache_line_size is initialized by default to 128. Later on, when the rest
of the processes attach the same backing file, HWLOC has been called and the
cache size has now been updated to the correct value. If this value is
different than the default one (and they are as most cache sizes are 64 bytes
right now) the objects in the backing file will be misaligned.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-14 01:48:08 -04:00
George Bosilca
96e8cbe25f First step on fixing the BTL API conversion for the SMCUDA BTL
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-13 14:46:10 -04:00
Joshua Ladd
aa8f7f4ede
Merge pull request #7893 from bureddy/cuda-ucx
UCX: initialize cuda from ucx pml component
2020-07-13 14:18:48 -04:00
bosilca
1f237f5fc9
Merge pull request #7419 from bosilca/topic/avx512
Add support for AVX512/AVX2/SSE/MMX
2020-07-13 11:56:50 -04:00
Bert Wesarg
3111877ad9 oshmem/mca/sshmem: Fix build with --enable-mem-debug
`--enable-mem-debug` `#define`s `realloc`/`free` as macros, though macros
are also matched if they appear in references to members. Rename the
members to avoid this matching.

See #6995

Signed-off-by: Bert Wesarg <bert.wesarg@tu-dresden.de>
2020-07-13 08:56:38 +02:00
Devendar Bureddy
2547e24c55 UCX: initialize cuda from ucx pml component
Signed-off-by: Devendar Bureddy <devendar@mellanox.com>
2020-07-12 18:41:40 +03:00
Nathan Hjelm
d0c0cb7144
Merge pull request #7913 from hjelmn/btl_base_atomics_are_awesome
btl: change argument type of BTL receive callbacks
2020-07-11 12:13:26 -06:00
dongzhong
14b3c70628
Add supports for MPI_OP using AVX512, AVX2 and MMX
Add logic to handle different architectural capabilities
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.

Add validation checks for loadu 256 and 512 bits.
Add validation tests for MPI_Op.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-10 21:25:35 -04:00
Howard Pritchard
677b662295
Merge pull request #7912 from tomhers/fix_opal_ofi_compile_bug
BTL/OFI: Fix missing include file.
2020-07-10 14:43:38 -06:00
Jeff Squyres
b62545ef4c
Merge pull request #7921 from jsquyres/pr/fix-fortran8integer-vs-c4int-issue
mpi_f08: fix Fortran-8-byte-INTEGER vs. C-4-byte-int issue
2020-07-10 09:15:55 -04:00
Jeff Squyres
62d1f89622
Merge pull request #7917 from jmciver/fix/rpmbuild-rpm-build-parameter
Fix buildrpm.sh "-r" option used for RPM options specification
2020-07-09 20:58:07 -04:00
Brian Barrett
4a6e1629e8
Merge pull request #7906 from dancejic/multi
common/ofi: added address format check to fix provider selection
2020-07-09 16:59:02 -07:00
Jeff Squyres
98bc7af7d4 mpi_f08: fix Fortran-8-byte-INTEGER vs. C-4-byte-int issue
It is important to have the mpi_f08 Type(MPI_Status) be the same
length (in bytes) as the mpif.h status (which is an array of
MPI_STATUS_SIZE INTEGERs).  The reason is because MPI_Status_ctof()
basically does the following:

MPI_Fint *f_status = ...;
int *s = (int*) &c_status;
for i=0..sizeof(MPI_Status)/sizeof(int)
   f_status[i] = c_status[i];

Meaning: the Fortran status needs to be able to hold as many INTEGERs
are there are C int's that can fit in sizeof(MPI_Status) bytes.

This is because a Fortran INTEGER may be larger than a C int (e.g.,
Fortran 8 bytes vs. C 4 bytes).  Hence, the assignment on the Fortran
side will take sizeof(INTEGER) bytes for each sizeof(int) bytes in the
C MPI_Status.

This commit pads out the mpi_f08 Type(MPI_Status) with enough INTEGERs
to make it the same size as an array of MPI_TYPE_SIZE INTEGERs.
Hence, MPI_Status_ctof() will work properly, regardless of whether it
is assinging to an mpi_f08 Type(MPI_Status) or an mpif.h array of
MPI_STATUS_SIZE INTEGERs.

Thanks to @ahaichen for reporting the issue.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-07-09 15:12:43 -07:00
Nathan Hjelm
ad81839dd5
Merge pull request #7916 from hjelmn/allow_splitting_ompi_c_compiler_and_end_user_c_compiler_so_eventually_we_can_require_a_sane_standards_compliance_compiler_for_building_open_mpi
config: add support for setting the wrapper C compiler
2020-07-09 14:15:46 -06:00
Jeff Squyres
27d30f3d17
Merge pull request #7910 from jsquyres/pr/mr-clean-for-submodules
Trivial helper script to git clean submodules
2020-07-09 09:58:53 -04:00
Nathan Hjelm
88f51fbb8e btl: change argument type of BTL receive callbacks
This commit updates the btl interface to change the parameters
passed to receive callbacks. The interface used to pass the tag,
a btl base descriptor, and the callback context. Most of the
values in the btl base descriptor were unused and only helped
simplify the callbacks from the self btl. All of the arguments
have now been replaced with a single receive callback descriptor.
This descriptor contains the incoming endpoint, data segment(s),
tag, and callback context. All btls have been updated to use
the new callback and the btl interface version has been bumped
to v3.2.0.

As part of this change the descriptor argument (and the segments
contained within it) have been marked as const. The were treated
as const before but this change could allow the compiler to make
better optimization decisions and will enforce that the callback
does not attempt to change the data in the descriptor.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-07-08 07:38:46 -07:00
George Bosilca
ddfb4def2d Second take on fixing the Inel _Atomic atomic operation warning.
We completely disable C11 atomic op support for _Atomic for
all Intel compiler prior to 20200310 (which is currently the
latest released), by switching to our pre-C11 atomic
operations.

Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
2020-07-08 09:58:43 -04:00
Nathan Hjelm
ade1b58d84 config: add support for setting the wrapper C compiler
This PR adds a new configure option: --with-wrapper-cc.
This option allows the user to set the compiler that will
be invoked by mpicc, shmemcc, etc. This allows the user
to build Open MPI with one compiler (a C standards compliant
compiler like clang or gcc) and have the wrapper use
another compiler (icc for example). This allows building
Open MPI with the best available compiler while still
supporting compiling Open MPI for a specific compiler
suite.

Signed-off-by: Nathan Hjelm <hjelmn@google.com>
2020-07-07 22:43:43 -07:00
John McIver
da701042b5 Fix buildrpm.sh "-r" option used for RPM options specification
The "-r" option now concatenates using the rpmbuild_options
variable. The "-r" option in prior versions of buildrpm.sh
concatenated using the configure_options variable, which uses special
delineation for Autoconf options (first word of argument string is
"configure_options"). This resulted in an RPM build failure as the
Autoconf options would contain nested RPM option statements.

Signed-off-by: John K. McIver III <john.mciver.iii@gmail.com>
2020-07-07 22:31:51 -06:00
tomhers
88f9d2c90f BTL/OFI: Fix missing include file.
The missing include file causes an error when using an external version of LibEvent.

Signed-off-by: tomhers <tom.herschberg@gmail.com>
2020-07-06 16:32:37 -04:00
Austen Lauria
dbc56758b6
Merge pull request #7802 from badgerious/mtl_ofi_cqread_break
mtl/ofi: break from progress loop when events are read
2020-07-06 09:20:07 -04:00
Jeff Squyres
86f4128e12 Trivial helper script to git clean submodules
Since we added the use of git submodules recently, this trivial script
has been helpful to me to "git clean" not only the top-level Open MPI
repo, but also all the included submodules, too.

NOTE: this script does the (harsh) "git clean -dfx" command, which
deletes everything that git does not know about.  Use with care!

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
2020-07-06 04:57:42 -07:00
Jeff Squyres
a7ed13d74a
Merge pull request #7907 from cniethammer/null_pointer_arithmetic_fix
Fix null pointer arithmetic resulting in potential undefined behavior
2020-07-06 07:04:56 -04:00
Christoph Niethammer
a3483e4b71 Fix null pointer arithmetic resulting in potential undefined behavior
NULL pointer arithmetic is undefined behaviour in c.
The payload_ptr can be NULL in the moment when mpool is not initialized.

References from the c11 standard:
- 6.5.6 Additive operators
- 6.3.2.3 Pointers

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2020-07-03 22:18:17 +02:00
Nikola Dancejic
7e46371301 common/ofi: added address format check to fix provider selection
bugfix: provider selection would not differentiate between ipv4
and ipv6 addresses which would cause some nodes to be unable
to communicate between each other. Adding a check for address
format to provider selection to ensure that all nodes use the
same address format.

Signed-off-by: Nikola Dancejic <dancejic@amazon.com>
2020-07-02 23:45:59 +00:00
Jeff Squyres
e547e2d3ea
Merge pull request #7905 from cniethammer/integer_shift_fix
Fix shifting signed integer
2020-07-02 14:35:40 -04:00
Christoph Niethammer
8483ec41ab Fix shifting signed integer
Shifting signed int will lead to undefined behavior in case of 1<<31 here

Signed-off-by: Christoph Niethammer niethammer@hlrs.de
2020-07-02 17:43:45 +02:00
Austen Lauria
d1a9c8a092
Merge pull request #7904 from cniethammer/delete_duplicate_cpuset_initializer
Fix initialization warning reported by clang-10
2020-07-02 11:13:18 -04:00
Christoph Niethammer
6ac111c6dc Fix initialization warning reported by clang-10
Delete duplicate initializer for cpuset in opal_process_info.

Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2020-07-02 15:15:54 +02:00
Christoph Niethammer
7f6263cdd4 Update out of date documentation for opal_cmd_line_create()
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2020-06-30 22:04:55 +02:00
Christoph Niethammer
f9c42c96d6 Fix wrong enum type in default command line option category assignment
Signed-off-by: Christoph Niethammer <niethammer@hlrs.de>
2020-06-30 22:04:48 +02:00
Austen Lauria
410596ee1c
Merge pull request #7680 from janjust/oshmem-default-nb-params
oshmem spml ucx: set default nonblocking put/get progress thresholds
2020-06-30 13:56:15 -04:00
Austen Lauria
9b86f1442a
Merge pull request #7823 from jsquyres/pr/put-osc-pt2pt-back
Fix typos in OSC RDMA BTL allowlist
2020-06-30 10:55:16 -04:00
Austen Lauria
b6b300d25d
Merge pull request #6784 from abouteiller/export/event-infloop
Address a race condition in libevent select.
2020-06-30 10:26:57 -04:00
Todd Kordenbrock
4358e75a75
Merge pull request #7866 from tkordenbrock/topic/master/portals4.fix-inappropriate-use-of-abort
portals4: fix inappropriate use of abort() in mtl-portals4 and coll-portals4 components
2020-06-30 08:46:03 -05:00
Valentin Petrov
d366812030 coll/hcoll: compile warning fix
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2020-06-30 09:44:46 +03:00
Valentin Petrov
1d54071fc1 coll/hcoll: reduce_scatter(block) interface
Signed-off-by: Valentin Petrov <valentinp@mellanox.com>
2020-06-30 09:44:46 +03:00
Austen Lauria
868eee31c1
Merge pull request #7883 from hoopoepg/topic/fixed-potential-deadlock-wpool
UCX/WPOOL: fixed potential deadlock
2020-06-29 17:21:39 -04:00
Austen Lauria
3ed466e629
Merge pull request #7800 from abouteiller/mpi-next/errors_abort
MPI4: Add ERRORS_ABORT infrastructure
2020-06-29 15:45:29 -04:00
Austen Lauria
a26e494953
Merge pull request #7882 from devreal/osc-rdma-noncontig-requests
osc rdma: check for outstanding fragments before completing a request (II)
2020-06-29 09:51:47 -04:00
Sergey Oblomov
a383312393 UCX/WPOOL: fixed potential deadlock
- fixed funcs:
  opal_common_ucx_wpmem_putget
  opal_common_ucx_wpmem_cmpswp
  opal_common_ucx_wpmem_post
  opal_common_ucx_wpmem_fetch
  opal_common_ucx_wpmem_fetch_nb

Signed-off-by: Sergey Oblomov <sergeyo@mellanox.com>
2020-06-29 13:40:50 +03:00
Ralph Castain
f565ecf80a
Merge pull request #7886 from rhc54/topic/sync
Sync ot PMIx/PRRTE master branches
2020-06-27 19:59:01 -07:00
Ralph Castain
ba27fb79b5
Sync ot PMIx/PRRTE master branches
Signed-off-by: Ralph Castain <rhc@pmix.org>
2020-06-27 16:50:09 -07:00