MCA_BTL_SM_FRAG_SEND) and status success/fail in low bits of pointers we
are passing through circular buffer. The rank that receives ACK doesn't need
to look into data it received and this is a big win since this data is not in
the cache of the rank's CPU. (Note that we can use low bits of pointers because
free_list always return pointers aligned at least to cache line size).
This commit was SVN r13922.