This commit contains the following changes:
- There is a bug in the PGI 16.x betas for ppc64 that causes them to
emit the incorrect instruction for loading 64-bit operands. If not
cast to void * the operands are loaded with lwz (load word and
zero) instead of ld. This does not affect optimized builds. The
work-around is to cast the operands to void *; it was implemented
similarly to an existing work-around for an xlc bug.
- Actually implement 64-bit add/sub. These functions were missing and
fell back to the less efficient compare-and-swap implementations.
Thanks to @PHHargrove for helping to track this down. With this update
the GCC inline assembly works as expected with PGI on ppc64.
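For illustration only, the two changes above can be sketched roughly as
follows. This is a minimal sketch, not the code in this commit: the names
SKETCH_VALUE64, sketch_atomic_add_64 and sketch_atomic_sub_64 are
hypothetical, the exact form of the cast is an assumption, and the ppc64
branch has not been verified against the affected PGI betas. On other
architectures the sketch uses a compare-and-swap loop, which is the kind
of slower fallback the missing functions previously resolved to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: route a 64-bit asm operand through void * so the
 * compiler treats it as pointer-sized and loads it with ld, not lwz.
 * The cast is only meaningful on LP64 targets such as ppc64. */
#define SKETCH_VALUE64(x) ((void *) (intptr_t) (x))

/* Atomically add delta to *addr and return the new value. */
static inline int64_t sketch_atomic_add_64(volatile int64_t *addr, int64_t delta)
{
    assert(addr != NULL);
#if defined(__powerpc64__) || defined(__PPC64__)
    int64_t result;

    /* Load-reserve / store-conditional loop on a doubleword. */
    __asm__ __volatile__("1: ldarx   %0, 0, %2   \n\t"
                         "   add     %0, %3, %0  \n\t"
                         "   stdcx.  %0, 0, %2   \n\t"
                         "   bne-    1b          \n\t"
                         : "=&r" (result), "+m" (*addr)
                         : "r" ((void *) addr), "r" (SKETCH_VALUE64(delta))
                         : "cc");
    return result;
#else
    /* Portable fallback (assumes GCC/Clang builtins): retry a
     * compare-and-swap until no other thread has changed *addr. */
    int64_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);

    while (!__atomic_compare_exchange_n(addr, &old, old + delta, 0,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
        /* old now holds the value observed by the failed attempt */
    }
    return old + delta;
#endif
}

/* Subtraction expressed through the add sketch; a dedicated asm version
 * would use subf in place of add. */
static inline int64_t sketch_atomic_sub_64(volatile int64_t *addr, int64_t delta)
{
    return sketch_atomic_add_64(addr, -delta);
}
```

The operands in a quick check would be chosen wider than 32 bits, since
a load with lwz would zero the upper word and give a wrong sum.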
Signed-off-by: Nathan Hjelm <hjelmn@lanl.gov>