The add_64, sub_64, and cmpset_64 atomics used "+m" (*addr) to
indicate the asm also writes the memory location. This is better than
using a memory clobber. PGI 16.9 introduced a bug that causes a
compiler failure on the "+m" constraint (input/output). It seems to
work with "=m" (output) which matches the 32-bit atomics.
Fixesopen-mpi/ompi#2086
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>