Add logic to handle different architectural capabilities
Detect the compiler flags necessary to build specialized
versions of the MPI_OP. Once the different flavors (AVX512,
AVX2, AVX) are built, detect at runtime which is the best
match with the current processor capabilities.
Add validation checks for 256-bit and 512-bit loadu.
Add validation tests for MPI_Op.
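The runtime selection step could look like the sketch below. It assumes gcc or clang on x86, where `__builtin_cpu_supports` is available. The enum, `select_op_flavor()` and `flavor_name()` are hypothetical names used only for illustration; the real MPI_Op dispatch code may be organized quite differently. The checks go from the widest vector extension down, so the first match is the best flavor the CPU supports.

```c
/* Hypothetical flavors of a reduction kernel; in a real build each one
 * would be compiled separately with the matching flag (-mavx512f, -mavx2,
 * -mavx). Ordered from least to most capable. */
typedef enum {
    FLAVOR_GENERIC,
    FLAVOR_AVX,
    FLAVOR_AVX2,
    FLAVOR_AVX512
} op_flavor_t;

/* Pick the best flavor supported by the CPU we are running on. */
static op_flavor_t select_op_flavor(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* harmless here; required if called from a constructor */
    if (__builtin_cpu_supports("avx512f")) return FLAVOR_AVX512;
    if (__builtin_cpu_supports("avx2"))    return FLAVOR_AVX2;
    if (__builtin_cpu_supports("avx"))     return FLAVOR_AVX;
#endif
    return FLAVOR_GENERIC;  /* non-x86 targets or no AVX at all */
}

static const char *flavor_name(op_flavor_t f)
{
    switch (f) {
    case FLAVOR_AVX512: return "avx512";
    case FLAVOR_AVX2:   return "avx2";
    case FLAVOR_AVX:    return "avx";
    default:            return "generic";
    }
}
```

Only the dispatcher has to run on every CPU; the specialized objects are reached solely when the runtime check says their instructions are safe to execute.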
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Signed-off-by: dongzhong <zhongdong0321@hotmail.com>
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
(cherry picked from commit 14b3c70628)
Thanks to Stefan Teleman for identifying this issue and providing a
proof-of-concept patch. We ended up revamping the detection of
128-bit atomics to reduce duplicated code and to use a slightly simpler
(albeit perhaps more verbose) approach:
- Remove the --enable-cross-* options; they were confusing and
unnecessary.
- Always try to compile / link the compiler-intrinsic 128-bit atomic
functions.
- Strengthen the C tests we use to be more robust.
- Use m4 to avoid duplicating the C tests multiple times in the .m4
source.
- If not cross-compiling, try to run a short test and ensure that they
actually work (as of Aug 2018, there's at least one platform where
they don't: clang 6 on ARM64). If cross-compiling, just assume that
they work.
- Add more comments about what is going on with all the tests; it's
tricky stuff. Our Future Selves will thank us.
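A run-time test of the kind described in the list above might look roughly like this sketch. It assumes gcc or clang and only exercises the `__sync` variant of the 128-bit CAS; `check_int128_cas()` is a hypothetical name. The `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` guard (predefined by gcc/clang when a native 16-byte CAS is enabled, for example with `-mcx16` on x86-64) is only there so the sketch always compiles and links; configure itself relies on actual compile/link tests.

```c
#include <stdbool.h>

/* Returns true only if a 128-bit CAS swaps when the expected value matches
 * and leaves memory untouched when only the high 64 bits differ. */
static bool check_int128_cas(void)
{
#if defined(__SIZEOF_INT128__) && defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    /* High and low halves differ so a CAS that only compares 64 bits is caught. */
    __int128 expected = ((__int128)0x1122334455667788ULL << 64) | 0x99aabbccddeeff00ULL;
    __int128 desired  = ((__int128)0x0102030405060708ULL << 64) | 0x1112131415161718ULL;
    __int128 value    = expected;

    /* Matching case: the swap must happen. */
    if (!__sync_bool_compare_and_swap(&value, expected, desired) || value != desired)
        return false;

    /* Mismatch in the high half only: the swap must NOT happen. */
    __int128 wrong = desired ^ ((__int128)1 << 64);
    if (__sync_bool_compare_and_swap(&value, wrong, expected) || value != desired)
        return false;

    return true;
#else
    return false;  /* no native 16-byte CAS advertised by the compiler */
#endif
}
```

When cross-compiling, a program like this cannot be run on the build host, which is why the configury simply assumes the builtins work in that case.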
Signed-off-by: George Bosilca <bosilca@icl.utk.edu>
Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
(cherry picked from commit ff9df91887)
We no longer officially support MIPS or ARM before v6. This commit
updates the configury to check for sync builtins on these
architectures and removes the MIPS and IA64 assembly from
opal/include/opal/sys.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Every modern compiler supports either inline assembly or builtin atomic
operations. Because of this, it is time to delete all the code associated
with pre-built atomics.
This commit also cleans out the DEC and XLC asm checks. Neither check
does anything, and the XLC compiler supports GCC ASM.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This removes a copy-and-paste error where we were setting the
OPAL_ASM_SYNC_HAVE_64BIT more than once.
References #3993. Close when on master and v3.0.x.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Test for both 32-bit and 64-bit builtin atomics: clang only supports
32-bit builtin atomics when -m32 is used.
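Testing each width separately can be sketched as two independent probes, assuming gcc or clang; the function names are hypothetical. If only the 32-bit builtin is available, the 64-bit probe is the one that fails to build, so configure can still enable the 32-bit atomics.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit __sync compare-and-swap probe. */
static bool sync_cas32_works(void)
{
    int32_t v = 1;
    return __sync_bool_compare_and_swap(&v, 1, 2) && v == 2;
}

/* 64-bit probe; the operands set bits in both 32-bit halves so a
 * CAS that silently truncates to 32 bits would be detected. */
static bool sync_cas64_works(void)
{
    int64_t v = INT64_C(0x100000001);
    return __sync_bool_compare_and_swap(&v, INT64_C(0x100000001),
                                        INT64_C(0x200000002))
        && v == INT64_C(0x200000002);
}
```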
Thanks Paul Hargrove for reporting this.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
We accepted a change that enabled CMA on s390 and s390x. This change
had the side-effect that we were no longer using the builtin atomics
for these systems. This is a problem since we do not have ASM for
s390 and s390x. This commit restores the atomics.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Look for amd64 in addition to x86_64 as the platform
type for x86_64 assembly. The FreeBSD-packaged
Autoconf package has a patch to return
amd64-unknown-freebsd11.0 instead of the
x86_64-unknown-freebsd11.0 that a stock Autoconf
package would return. Since we want to run Jenkins
builds on FreeBSD, working around the FreeBSD patch
is probably the easiest thing.
Signed-off-by: Brian Barrett <bbarrett@amazon.com>
In this context, AMD64 really means amd64 or em64t, so rename it
to X86_64 to avoid any confusion.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
There is no need to look for an assembly file when BUILTIN_GCC is used.
Fixes open-mpi/ompi#3032
Refs open-mpi/ompi#3036
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
Perl is required by ompi/mpi/man/make_manpage.pl, which is used even in opal,
so simply abort at configure time if perl is not available.
Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
We disabled this support a long time ago. Probably safe to assume
whatever bug we were working around no longer exists.
Closes open-mpi/ompi#2044
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Compiler implementations are free to include support for atomics that
use locks. Unfortunately, lock-free and lock-based atomics do not mix. Older
versions of llvm on OS X use locks to provide
__atomic_compare_exchange on 128-bit values but are lock-free on
64-bit values. This breaks our lifo implementation, which mixes
64-bit and 128-bit atomics on the same values to improve
performance. This commit adds a configure-time check of whether 128-bit
atomics are lock-free. If they are not, then the 128-bit __atomic CAS
is disabled and we check for the __sync version as a fallback.
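One way to express such a lock-freedom check is with `__atomic_always_lock_free`, shown here as a sketch assuming gcc or clang. The helper names are hypothetical, and the actual configure test may be written differently (for instance by running a program). Because `__atomic_always_lock_free` is evaluated at compile time for the target, it answers "is this size guaranteed lock-free", which errs on the side of disabling the 128-bit path.

```c
#include <stdbool.h>

/* Passing 0 as the pointer means "assume typical alignment for this size". */
static bool int64_atomics_lock_free(void)
{
    return __atomic_always_lock_free(8, 0);
}

static bool int128_atomics_lock_free(void)
{
    return __atomic_always_lock_free(16, 0);
}

/* Mixing lock-based 128-bit operations with lock-free 64-bit operations on
 * the same memory is unsafe, so only use the 128-bit __atomic CAS when both
 * widths are lock-free; otherwise fall back to checking for __sync. */
static bool use_atomic_128_cas(void)
{
    return int64_atomics_lock_free() && int128_atomics_lock_free();
}
```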
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* atomic: add support for __atomic builtins
This commit adds support for the gcc __atomic builtins. The __sync
builtins are deprecated and have been replaced by these atomics. In
addition, the new atomics support atomic exchange, which was not
supported by __sync.
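A sketch of thin wrappers over the `__atomic` builtins, assuming gcc or clang. The wrapper names are hypothetical and are not the actual `opal_atomic_*` API. Unlike `__sync`, each call takes an explicit memory order, and `__atomic_exchange_n` provides a general atomic swap.

```c
#include <stdint.h>

/* Atomically add delta and return the NEW value. */
static int32_t atomic_add_32(int32_t *addr, int32_t delta)
{
    return __atomic_add_fetch(addr, delta, __ATOMIC_RELAXED);
}

/* Atomically store newval and return the OLD value. */
static int32_t atomic_swap_32(int32_t *addr, int32_t newval)
{
    return __atomic_exchange_n(addr, newval, __ATOMIC_ACQ_REL);
}

/* Strong compare-and-swap; returns nonzero on success. On failure the
 * builtin writes the current value into oldval (a local copy here). */
static int atomic_cmpset_32(int32_t *addr, int32_t oldval, int32_t newval)
{
    return __atomic_compare_exchange_n(addr, &oldval, newval, 0 /* strong */,
                                       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}
```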
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
* atomic: add support for transactional memory
This commit adds support for using transactional memory when using
opal atomic locks. This feature is enabled if the __HLE__ feature is
available and the gcc builtin atomics are in use.
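A spinlock that requests hardware lock elision could be sketched as follows, assuming gcc or clang on x86, which define `__HLE__` under `-mhle` and accept the `__ATOMIC_HLE_ACQUIRE`/`__ATOMIC_HLE_RELEASE` hints OR-ed into the memory order. The type and function names are hypothetical; the opal lock code may differ. Without `__HLE__`, the same code is an ordinary test-and-test-and-set spinlock.

```c
#include <stdint.h>

#if defined(__HLE__)
#define LOCK_ORDER   (__ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)
#define UNLOCK_ORDER (__ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE)
#else
#define LOCK_ORDER   __ATOMIC_ACQUIRE
#define UNLOCK_ORDER __ATOMIC_RELEASE
#endif

typedef struct { int32_t locked; } elided_lock_t;

static void elided_lock(elided_lock_t *l)
{
    /* The exchange carries the elision hint when HLE is enabled. */
    while (__atomic_exchange_n(&l->locked, 1, LOCK_ORDER)) {
        /* Wait with plain loads until the lock looks free, then retry. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            ;
    }
}

static void elided_unlock(elided_lock_t *l)
{
    __atomic_store_n(&l->locked, 0, UNLOCK_ORDER);
}
```

On processors without HLE support the hint prefixes are ignored, so the lock still behaves correctly, just without elision.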
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit changes the asm configure logic to fall back on inline asm
atomics on systems that 1) have __sync atomics, 2) do not have 64-bit
__sync atomics, and 3) support 64-bit asm.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit adds an additional check for 64-bit atomic support for __sync
builtins. If 64-bit support is not available the opal_atomic_*_64 atomics
are disabled.
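For illustration, the same gating can be approximated at compile time with the macros gcc and clang predefine when a native N-byte `__sync` CAS exists. This swaps in `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8` for the configure link test; `HAVE_SYNC_64` and `cmpset_64()` are hypothetical names. When `HAVE_SYNC_64` is 0, a build could instead use the inline asm fallback described above, or leave the 64-bit atomics disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the configure result that gates the 64-bit atomics. */
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
#define HAVE_SYNC_64 1
#else
#define HAVE_SYNC_64 0
#endif

#if HAVE_SYNC_64
/* Only defined when a native 64-bit __sync CAS is available. */
static bool cmpset_64(int64_t *addr, int64_t oldval, int64_t newval)
{
    return __sync_bool_compare_and_swap(addr, oldval, newval);
}
#endif
```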
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit updates the check for __sync builtin atomics to see if the
compiler supports both __sync_bool_compare_and_swap and
__sync_add_and_fetch. If either of these functions is not available,
then we can't use the __sync builtins.
Fixes #1487
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit fixes the check for sync builtin atomics.
AC_COMPILE_IFELSE is insufficient to check for the builtins; we need to
use AC_LINK_IFELSE.
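The body of such a link test might resemble this sketch, assuming gcc or clang. An unsupported `__sync` builtin can compile into a call to an external helper (for example `__sync_bool_compare_and_swap_8`) that only fails when linking, so compiling alone proves little. The probe exercises both required builtins; `sync_builtins_probe()` is a hypothetical name, and in an AC_LINK_IFELSE program this logic would live in `main()`.

```c
#include <stdint.h>

/* Returns 0 when both builtins link and behave as expected. */
static int sync_builtins_probe(void)
{
    int32_t tmp = 0;
    if (!__sync_bool_compare_and_swap(&tmp, 0, 1))
        return 1;                        /* CAS did not swap */
    if (__sync_add_and_fetch(&tmp, 1) != 2)
        return 2;                        /* add-and-fetch gave the wrong value */
    return 0;
}
```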
Fixes open-mpi/ompi#1487
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
This commit removes an erroneous else statement from the OSX built-in
atomics check. The else branch sets the built-in atomics support to
BUILTIN_NO if either opal_cv_asm_builtin is not BUILTIN_NO or OSX
atomics support is disabled.
Signed-off-by: Nathan Hjelm <hjelmn@me.com>
This commit removes alpha asm support. No current processor
manufacturer makes chips compatible with DEC alpha and no
participating organization has alpha processors. This makes it
difficult to support alpha via assembly.
This doesn't mean Open MPI will no longer build/work on alpha
processors. It should continue to work with gcc's builtin sync
atomics.
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>
Use of this configuration option can cause crashing, hanging, and
(worse) incorrect results when btl/sm, btl/scif, or btl/vader are
in use. We discussed this at the January 2015 developers meeting
and it was decided to remove the option entirely. This commit does
just that. All usage of OPAL_WANT_SMP_LOCKS has been removed.
Per several reports on the devel mailing list, the opal_lifo test hangs
with Intel icc 14.0.0.080 (aka 2013sp1) and Intel icc 14.0.1.106 (aka 2013sp1u1).
/* older and more recent compilers work fine
 * buggy compilers also work fine, but only with -O0 */
Before this commit we checked if the compiler supported compare-and-exchange
on 128-bit values. This turned out to be insufficient. This commit strengthens
the check to see if the processor supports the instruction (or built-in). This
check does not work when cross-compiling (it would always disable the 128-bit
atomics), so overrides have been added for this case.
Some versions of gcc require this flag to be set before the __sync
builtin atomic compare-and-swap will support 128-bit values. If the
flag is required, this check adds it to CFLAGS.
A 128-bit compare-and-swap enables a better atomic lifo implementation
that uses the pointer + counter method to avoid ABA issues. This commit
adds configury to check for the instruction (cmpxchg16b) and adds an
implementation that uses the __int128 type (a compiler extension, not part of C99).
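The pointer + counter idea can be sketched as below, assuming a 64-bit gcc or clang target with `__int128`. All type and function names are hypothetical, not the opal_lifo code. Every successful update bumps the counter, so a CAS that sees the same pointer with a stale counter fails instead of corrupting the list (the ABA problem). Without a native 16-byte CAS (for example gcc on x86-64 without `-mcx16`), the sketch falls back to a clearly non-atomic, single-threaded compare so it still builds. It also assumes popped items are not freed while other threads may still read them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch needs __int128 (64-bit gcc/clang)"
#endif

typedef struct item { struct item *next; } item_t;

/* Head pointer and modification counter packed into one 128-bit word. */
typedef union {
    __int128 word;
    struct { item_t *ptr; uintptr_t count; } f;
} head_t;

static bool head_cas(head_t *h, head_t expected, head_t desired)
{
#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16)
    return __sync_bool_compare_and_swap(&h->word, expected.word, desired.word);
#else
    /* NOT atomic: single-threaded stand-in for illustration only. */
    if (h->word != expected.word)
        return false;
    h->word = desired.word;
    return true;
#endif
}

static void lifo_push(head_t *h, item_t *it)
{
    head_t old, new_head;
    do {
        old = *h;  /* plain snapshot; with the atomic CAS a stale one just retries */
        it->next = old.f.ptr;
        new_head.f.ptr = it;
        new_head.f.count = old.f.count + 1;
    } while (!head_cas(h, old, new_head));
}

static item_t *lifo_pop(head_t *h)
{
    head_t old, new_head;
    do {
        old = *h;
        if (old.f.ptr == NULL)
            return NULL;  /* empty: no update, counter unchanged */
        new_head.f.ptr = old.f.ptr->next;
        new_head.f.count = old.f.count + 1;
    } while (!head_cas(h, old, new_head));
    return old.f.ptr;
}
```

Reading the head with a plain load is a simplification; a production version would use an atomic 128-bit load or otherwise tolerate torn reads, relying on the CAS to reject any inconsistent snapshot.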