Jason A. Donenfeld
2018-10-11 16:07:22 UTC
Hey GCConauts,
I've noticed some strange behavior in gcc > 5.
unsigned int hmmm5(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b << c << c << c << c << c);
}
This function compiles how you'd expect:
    mov ecx, edx
    sal esi, cl
    sal esi, cl
    sal esi, cl
    sal esi, cl
    sal esi, cl
    lea eax, [rsi+rdi]
    ret
However, with more than five shifts, gcc switches from lea to add,
which in turn requires an extra mov instruction:
unsigned int hmmm6(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b << c << c << c << c << c << c);
}
Producing:
    mov ecx, edx
    mov eax, esi
    sal eax, cl
    sal eax, cl
    sal eax, cl
    sal eax, cl
    sal eax, cl
    sal eax, cl
    add eax, edi
    ret
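For anyone who wants to reproduce this, here is a minimal sketch of a standalone file with both functions, assuming a 32-bit `unsigned int` as on x86-64. The `shifted_sum` and `check_hmmm` helpers are hypothetical additions, not part of the original report; they only confirm that both forms compute the same value regardless of whether gcc picks lea or add. The post does not state which optimization flags were used, so compiling with something like `gcc -O2 -S -masm=intel` and comparing the listings from gcc 5 and gcc 6 is an assumption about how to observe the difference.

```c
#include <assert.h>

/* Five chained shifts: reported to end in lea eax, [rsi+rdi]. */
unsigned int hmmm5(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b << c << c << c << c << c);
}

/* Six chained shifts: reported to use an extra mov and end in add. */
unsigned int hmmm6(unsigned int a, unsigned int b, unsigned int c)
{
    return a + (b << c << c << c << c << c << c);
}

/* Hypothetical reference helper (assumes 32-bit unsigned int and c < 32):
 * n chained shifts by c equal a single shift by n*c, or 0 once n*c reaches
 * 32, because unsigned shifts discard the bits shifted out. */
static unsigned int shifted_sum(unsigned int a, unsigned int b,
                                unsigned int c, unsigned int n)
{
    return a + (n * c < 32 ? b << (n * c) : 0u);
}

/* Hypothetical self-check: compare both functions with the reference
 * for every shift count that is defined for a 32-bit unsigned int. */
void check_hmmm(unsigned int a, unsigned int b)
{
    for (unsigned int c = 0; c < 32; c++) {
        assert(hmmm5(a, b, c) == shifted_sum(a, b, c, 5));
        assert(hmmm6(a, b, c) == shifted_sum(a, b, c, 6));
    }
}
```

The loop stops at 31 because shifting a 32-bit `unsigned int` by 32 or more is undefined behavior in C, so larger counts say nothing about either code sequence.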
Thinking this might be a side effect of avoid_lea_for_addr, I tried
setting '-mtune-ctrl=^avoid_lea_for_addr', but to no avail.
I also couldn't find anything in the documentation or instruction
tables regarding the latencies or scheduling of these instructions that
I realize this is probably a fairly trivial matter, but I am very
curious if somebody knows which heuristic gcc is applying here, and
why exactly. It's not something done by any other compiler I could
find, and it only started happening with gcc 6.
Regards,
Jason