as I understand how that one works and don't really understand what
was in the amd64 code (which was copied from before I started working
on the inline assembly). This fixes the race condition we were
seeing on PGI causing test failures
* sync non-inline assembly with inline assembly version
This needs to go to the v1.0 branch
This commit was SVN r9539.