This bug was first seen in a different product that's using the same
interception code as OMPI. But I think it's potentially in OMPI too.
In my vanilla build of OMPI master on RH8 if I "gdb libopen-pal.so" and
"disassemble intercept_brk", I'm seeing a suspicious extra instruction
in front of PATCHER_BEGIN:
0x00000000000d6778 <+40>: std r2,24(r1) // something gcc put in front
0x00000000000d677c <+44>: std r2,96(r1) // PATCHER_BEGIN's toc_save
0x00000000000d6780 <+48>: nop // NOPs from PATCHER_BEGIN
0x00000000000d6784 <+52>: nop // that get replaced
0x00000000000d6788 <+56>: nop // by instructions that
0x00000000000d678c <+60>: nop // change r2
0x00000000000d6790 <+64>: nop //
Later there are loads from that location like
0x000000000019e0e4 <+132>: ld r2,24(r1)
that make me nervous since that's the pre-updated value.
I believe this is the same thing Nathan is describing way back in a9bc692d
and his solution was to put a second call around each interception, where
the outer call is just
intercept_brk():
PATCHER_BEGIN
_intercept_brk()
PATCHER_END
and the inner call _intercept_brk() is where the bulk of the code goes.
What I'm seeing is that _intercept_brk() is being inlined and probably
negating Nathan's fix. So I want to add __opal_attribute_noinline__ to
restore the fix.
With this commit in place, the disassembly of intercept_brk becomes tiny
because it's no longer inlining _intercept_brk() and the susipicious
early save of r2 is gone. I made the same fix to all the intercept_*
functions, although intercept_brk was the only one that had a suspicious
save of r2.
As far as empirical failures though, we only have those from the non-OMPI
product that's using the same patcher code. I'm not actually getting OMPI
to fail from the above suspicious data being saved in r1+24.
Signed-off-by: Mark Allen <markalle@us.ibm.com>
(cherry picked from commit ddd1f578ecfc443d05250e09bf5e5077c6d6f304)