Discussion:
movmem pattern and missed alignment
Paul Koning
2018-10-08 13:57:11 UTC
I have a movmem pattern in my target that pays attention to the alignment argument.

GCC doesn't always pass in the alignment I expect. I have this test case:

extern int *i, *j;
extern int iv[40], jv[40];

void f1(void)
{
__builtin_memcpy (i, j, 32);
}

void f2(void)
{
__builtin_memcpy (iv, jv, 32);
}

When the movmem pattern is called for f1, alignment is 1. In f2, it is 2 (int is 2 bytes on pdp11), as expected.

The compiler clearly knows that int* points to aligned data, since it generates instructions that assume alignment (this is a strict-alignment target) when I dereference the pointer. But somehow it gets it wrong for block move.

I also see this for the individual move operations that are generated for very short memcpy operations; if the count is 4, I get four move byte operations for f1, but two move word operations for f2.

This seems like a bug. Am I missing something?

paul
Richard Biener
2018-10-08 15:09:49 UTC
Post by Paul Koning
I have a movmem pattern in my target that pays attention to the alignment argument.
extern int *i, *j;
extern int iv[40], jv[40];
void f1(void)
{
__builtin_memcpy (i, j, 32);
}
void f2(void)
{
__builtin_memcpy (iv, jv, 32);
}
When the movmem pattern is called for f1, alignment is 1. In f2, it is 2 (int is 2 bytes in pdp11) as expected.
The compiler clearly knows that int* points to aligned data, since it generates instructions that assume alignment (this is a strict-alignment target) when I dereference the pointer. But somehow it gets it wrong for block move.
I also see this for the individual move operations that are generated for very short memcpy operations; if the count is 4, I get four move byte operations for f1, but two move word operations for f2.
This seems like a bug. Am I missing something?
Yes, memcpy doesn't require anything bigger than byte alignment, and
GCC infers alignment only from actual memory references or from
declarations (like iv / jv). For i and j there are no dereferences,
and thus you get an alignment of 1.

Richard.
Paul Koning
2018-10-08 16:39:53 UTC
Post by Richard Biener
Post by Paul Koning
I have a movmem pattern in my target that pays attention to the alignment argument.
extern int *i, *j;
extern int iv[40], jv[40];
void f1(void)
{
__builtin_memcpy (i, j, 32);
}
void f2(void)
{
__builtin_memcpy (iv, jv, 32);
}
When the movmem pattern is called for f1, alignment is 1. In f2, it is 2 (int is 2 bytes in pdp11) as expected.
The compiler clearly knows that int* points to aligned data, since it generates instructions that assume alignment (this is a strict-alignment target) when I dereference the pointer. But somehow it gets it wrong for block move.
I also see this for the individual move operations that are generated for very short memcpy operations; if the count is 4, I get four move byte operations for f1, but two move word operations for f2.
This seems like a bug. Am I missing something?
Yes, memcpy doesn't require anything bigger than byte alignment, and
GCC infers alignment only from actual memory references or from
declarations (like iv / jv). For i and j there are no dereferences,
and thus you get an alignment of 1.
Richard.
Ok, but why is that not a bug? The whole point of passing alignment to the movmem pattern is to let it generate code that takes advantage of the alignment. So we get a missed optimization.

paul
Michael Matz
2018-10-08 17:20:48 UTC
Hi,
Post by Paul Koning
Post by Richard Biener
Post by Paul Koning
extern int *i, *j;
extern int iv[40], jv[40];
void f1(void)
{
__builtin_memcpy (i, j, 32);
}
void f2(void)
{
__builtin_memcpy (iv, jv, 32);
}
Yes, memcpy doesn't require anything bigger than byte alignment, and
GCC infers alignment only from actual memory references or from
declarations (like iv / jv). For i and j there are no dereferences,
and thus you get an alignment of 1.
Richard.
Ok, but why is that not a bug? The whole point of passing alignment to
the movmem pattern is to let it generate code that takes advantage of
the alignment. So we get a missed optimization.
Only if you somewhere visibly add accesses to *i and *j. Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements. The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned. I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.


Ciao,
Michael.
Andrew Haley
2018-10-08 17:29:15 UTC
Post by Michael Matz
Only if you somewhere visibly add accesses to *i and *j. Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements. The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned. I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.
So all you'd actually have to say is

void f1(void)
{
*i; *j;
__builtin_memcpy (i, j, 32);
}
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Paul Koning
2018-10-08 18:38:01 UTC
Post by Andrew Haley
Post by Michael Matz
Only if you somewhere visibly add accesses to *i and *j. Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements. The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned. I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.
So all you'd actually have to say is
void f1(void)
{
*i; *j;
__builtin_memcpy (i, j, 32);
}
No, that doesn't help. Not even if I make it:

void f1(void)
{
k = *i + *j;
__builtin_memcpy (i, j, 4);
}

The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.

paul
Michael Matz
2018-10-08 18:43:22 UTC
Hi,
Post by Paul Koning
Post by Andrew Haley
So all you'd actually have to say is
void f1(void)
{
*i; *j;
__builtin_memcpy (i, j, 32);
}
void f1(void)
{
k = *i + *j;
__builtin_memcpy (i, j, 4);
}
The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.
k is a global, so the loads from i/j can't be optimized away? If so, now
you have a missed optimization bug ;-) Might be non-trivial to fix for
general situations (basically the natural alignment can only be inferred
in regions that are dominated by such accesses, but not e.g. for:
if (cond()) k = *i+*j;
memcpy(i,j,4);
as cond() might be always false).


Ciao,
Michael.
Andrew Haley
2018-10-09 08:02:18 UTC
Post by Paul Koning
Post by Andrew Haley
Post by Michael Matz
Only if you somewhere visibly add accesses to *i and *j. Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements. The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned. I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.
So all you'd actually have to say is
void f1(void)
{
*i; *j;
__builtin_memcpy (i, j, 32);
}
No, that doesn't help.
It could do.
Post by Paul Koning
void f1(void)
{
k = *i + *j;
__builtin_memcpy (i, j, 4);
}
The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.
Right, so that is a missed optimization.
--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Richard Biener
2018-10-09 08:33:38 UTC
Post by Andrew Haley
Post by Paul Koning
Post by Andrew Haley
Post by Michael Matz
Only if you somewhere visibly add accesses to *i and *j. Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements. The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned. I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.
So all you'd actually have to say is
void f1(void)
{
*i; *j;
__builtin_memcpy (i, j, 32);
}
No, that doesn't help.
It could do.
Post by Paul Koning
void f1(void)
{
k = *i + *j;
__builtin_memcpy (i, j, 4);
}
The first line does word aligned references to *i and *j, but the memcpy stubbornly remains a byte move.
Right, so that is a missed optimization.
Yes. Note that in GIMPLE, pointer alignment info is carried as side-info
on SSA names, which makes the above cases difficult to deal with since the
dereference and the call argument use the same SSA name. So if you consider

if ((i_1 & 7) == 0)
{
k = *i_1;
__builtin_memcpy (i_1, j, 4);
}

then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away. We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.

So the option would be to change the representation of __builtin_memcpy
either by making it an aggregate assignment or by using a builtin with
explicit alignment or compute alignment at RTL expansion time.

Note the pass that "computes" alignment is currently SSA based (it's
the CCP pass).

Richard.
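
A user-level analogue of the "aggregate assignment" option can be sketched in C: copying through a struct type gives the compiler a typed access whose alignment is that of the struct. This is only a sketch under stated assumptions. The names `word_block`, `BLOCK_INTS`, and `copy_block` are hypothetical, the caller must guarantee that both pointers really address at least `BLOCK_INTS` ints and do not partially overlap, and whether a given port then sees the larger alignment in its movmem pattern is not verified here.

```c
#include <string.h>

/* Hypothetical block size: 8 ints per copy. */
#define BLOCK_INTS 8

/* A struct wrapping an int array has the alignment of int, so an
   assignment of this type is a typed, int-aligned access. */
struct word_block { int w[BLOCK_INTS]; };

/* Copy BLOCK_INTS ints from src to dst as one aggregate assignment.
   Assumption: dst and src each point to at least BLOCK_INTS ints and
   the two regions do not partially overlap. */
void copy_block(int *dst, const int *src)
{
    *(struct word_block *)dst = *(const struct word_block *)src;
}
```

Unlike a memcpy call, the assignment itself dereferences the pointers, so the alignment requirement is part of the statement and does not have to be inferred from a separate access.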
Alexander Monakov
2018-10-09 09:00:24 UTC
Post by Richard Biener
then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away. We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.
We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy

i_2 = __builtin_assume_aligned(i_1, 4);
__builtin_memcpy(j, i_2, 32);

in theory?

Alexander
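
For reference, the same builtin is available in user code (GCC and Clang). A minimal sketch, assuming the caller can promise that both buffers are aligned for int; the helper name `copy_int_aligned` is hypothetical, and as the replies in this thread discuss, it is not established that the alignment information survives all the way to the movmem expander.

```c
#include <string.h>

/* Hypothetical helper: copy n_bytes between two buffers that the
   caller promises are aligned for int. If the promise is false,
   the behavior is undefined. */
void copy_int_aligned(int *dst, const int *src, size_t n_bytes)
{
    /* __builtin_assume_aligned returns its first argument and lets the
       compiler assume the result is aligned to the given constant. */
    int *d = __builtin_assume_aligned(dst, _Alignof(int));
    const int *s = __builtin_assume_aligned(src, _Alignof(int));

    memcpy(d, s, n_bytes);
}
```

The builtin creates new pointer values (`d`, `s`) that carry the assumption, which corresponds to the fresh SSA name `i_2` in the GIMPLE fragment above.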
Richard Biener
2018-10-09 09:08:44 UTC
Post by Alexander Monakov
Post by Richard Biener
then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away. We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.
We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy
i_2 = __builtin_assume_aligned(i_1, 4);
__builtin_memcpy(j, i_2, 32);
in theory?
That's still before RTL expansion so I'm not sure that is enough.

Richard.
Jakub Jelinek
2018-10-09 09:23:49 UTC
Post by Richard Biener
Post by Alexander Monakov
Post by Richard Biener
then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away. We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.
We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy
i_2 = __builtin_assume_aligned(i_1, 4);
__builtin_memcpy(j, i_2, 32);
in theory?
That's still before RTL expansion so I'm not sure that is enough.
But we likely won't invalidate the computed SSA_NAME_INFO afterwards.

Jakub
Richard Biener
2018-10-09 09:29:38 UTC
Post by Jakub Jelinek
Post by Richard Biener
Post by Alexander Monakov
Post by Richard Biener
then we cannot set the alignment of i_1 at/after k = *i_1 because doing so would
affect the alignment test which we'd then optimize away. We'd need to introduce
a SSA copy to get a new SSA name but that would be optimized away quickly.
We preserve __builtin_assume_aligned up to pass-fold-all-builtins, so would it
work to emit it just before the memcpy
i_2 = __builtin_assume_aligned(i_1, 4);
__builtin_memcpy(j, i_2, 32);
in theory?
That's still before RTL expansion so I'm not sure that is enough.
But we likely won't invalidate the computed SSA_NAME_INFO afterwards.
But we've propagated out the i_2 = i_1 copy, no?

Richard.
Alexander Monakov
2018-10-08 17:29:38 UTC
Post by Michael Matz
Post by Paul Koning
Ok, but why is that not a bug? The whole point of passing alignment to
the movmem pattern is to let it generate code that takes advantage of
the alignment. So we get a missed optimization.
Only if you somewhere visibly add accesses to *i and *j. Without them you
only have the "accesses" via memcpy, and as Richi says, those don't imply
any alignment requirements. The i and j pointers might validly be char*
pointers in disguise and hence be in fact only 1-aligned. I.e. there's
nothing in your small example program from which GCC can infer that those
two global pointers are in fact 2-aligned.
Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an
'int *' value that is not suitably aligned:

A pointer to an object type may be converted to a pointer to a different
object type. If the resulting pointer is not correctly aligned for the
referenced type, the behavior is undefined.

So in addition to what you said, we should probably say that GCC decides
not to exploit this UB in order to allow code to round-trip pointer values
via arbitrary pointer types?


To put Michael's explanation in different words:

This is not obviously a bug, because static pointer type does not imply the
dynamic pointed-to type. The caller of 'f1' could look like

void call_f1(void)
{
short ibuf[20] = {0}, jbuf[20] = {0};
i = (void *) ibuf;
j = (void *) jbuf;
f1();
}

and it's valid to memcpy from jbuf to ibuf, memcpy does not "see" the
static pointer type, and works as if by dereferencing 'char *' pointers.
(although as mentioned above it's more subtly invalid when assigning to
i and j).

If 'f1' dereferences 'i', GCC may deduce that dynamic type of '*i' is 'int' and
therefore 'i' must be suitably aligned. But in absence of dereferences GCC
does not make assumptions about dynamic type and alignment.

Alexander
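
The point that memcpy itself needs only byte alignment can be illustrated without forming any misaligned `int *`. The sketch below is an illustration only: the helper names `is_aligned_for_int` and `copy_at_odd_offset` are hypothetical, and the alignment check relies on the usual (implementation-defined) mapping of pointers to `uintptr_t`.

```c
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Returns nonzero if p is suitably aligned for int. Assumes the
   common pointer-to-integer mapping, which is implementation-defined. */
int is_aligned_for_int(const void *p)
{
    return (uintptr_t)p % alignof(int) == 0;
}

/* Copies 8 bytes between addresses that sit one byte past an
   int-aligned base. Returns 1 if exactly those bytes arrived intact. */
int copy_at_odd_offset(void)
{
    alignas(int) unsigned char src[16];
    alignas(int) unsigned char dst[16] = {0};

    for (int k = 0; k < 16; k++)
        src[k] = (unsigned char)(k + 1);

    /* Only unsigned char pointers are involved, so this is valid
       even though src + 1 is not aligned for int on most targets. */
    memcpy(dst + 1, src + 1, 8);

    return memcmp(dst + 1, src + 1, 8) == 0 && dst[0] == 0 && dst[9] == 0;
}
```

A movmem expansion that assumed int alignment for every memcpy would be wrong for a call like this one, which is why the alignment has to come from somewhere other than the call itself.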
Michael Matz
2018-10-08 17:43:35 UTC
Hi,
Post by Alexander Monakov
Post by Michael Matz
Only if you somewhere visibly add accesses to *i and *j. Without them
you only have the "accesses" via memcpy, and as Richi says, those
don't imply any alignment requirements. The i and j pointers might
validly be char* pointers in disguise and hence be in fact only
1-aligned. I.e. there's nothing in your small example program from
which GCC can infer that those two global pointers are in fact
2-aligned.
Well, it's not that simple. C11 6.3.2.3 p7 makes it undefined to form an
So in addition to what you said, we should probably say that GCC decides
not to exploit this UB in order to allow code to round-trip pointer values
via arbitrary pointer types?
That's correct, I was explaining from the middle-end perspective. There
we are consciously more lenient as we have to support the real world and
other languages than C. This is one of the cases.


Ciao,
Michael.
Eric Botcazou
2018-10-08 21:43:00 UTC
Post by Michael Matz
That's correct, I was explaining from the middle-end perspective. There
we are consciously more lenient as we have to support the real world and
other languages than C. This is one of the cases.
This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
for every language on strict-alignment platforms. This was changed only
because of SSE on x86.
--
Eric Botcazou
Paul Koning
2018-10-09 00:03:52 UTC
Post by Eric Botcazou
Post by Michael Matz
That's correct, I was explaining from the middle-end perspective. There
we are consciously more lenient as we have to support the real world and
other languages than C. This is one of the cases.
This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
for every language on strict-alignment platforms. This was changed only
because of SSE on x86.
--
Eric Botcazou
So does that mean this should be a target-specific behavior, but it isn't at the moment?

paul
Richard Biener
2018-10-09 04:22:34 UTC
Post by Eric Botcazou
Post by Michael Matz
That's correct, I was explaining from the middle-end perspective. There
we are consciously more lenient as we have to support the real world and
other languages than C. This is one of the cases.
This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
for every language on strict-alignment platforms. This was changed only
because of SSE on x86.
And because we ended up ignoring all pointer casts.

Richard.
Alexander Monakov
2018-10-09 05:51:41 UTC
Post by Richard Biener
Post by Eric Botcazou
This had worked as Paul expects until GCC 4.4 IIRC and this was perfectly OK
for every language on strict-alignment platforms. This was changed only
because of SSE on x86.
And because we ended up ignoring all pointer casts.
It's not quite obvious what SSE has to do with this - any hint please?

(according to my quick check this changed between gcc-4.5 and gcc-4.6)

Alexander
Eric Botcazou
2018-10-09 06:41:40 UTC
Post by Alexander Monakov
It's not quite obvious what SSE has to do with this - any hint please?
SSE introduced alignment constraints into the non-strict-alignment target x86
so people didn't really want to play by the rules of strict-alignment targets.
Post by Alexander Monakov
(according to my quick check this changed between gcc-4.5 and gcc-4.6)
Possibly indeed, I remembered GCC 4.5 as being the turning point.
--
Eric Botcazou
Richard Biener
2018-10-09 08:27:48 UTC
Post by Eric Botcazou
Post by Alexander Monakov
It's not quite obvious what SSE has to do with this - any hint please?
SSE introduced alignment constraints into the non-strict-alignment target x86
so people didn't really want to play by the rules of strict-alignment targets.
Yeah. We've walked back and forth for that very issue though. We now require
all targets to play by the same rules -- if you have a *(double *) access then
that has to be aligned according to double.

We couldn't realistically walk back and rely on alignment of addresses based
on their type (like C would allow us to do) because we've thrown away types
on addresses. See also the thread about string-length warning stuff where
we've posted testcases that show you can get arbitrarily typed addresses
into your strlen() calls for example by means of CSE. The middle-end is
simply not prepared to preserve that information.

It was repeatedly suggested that we _could_ derive alignment info from
function parameter types since we rely on precise typing there for example
for points-to analysis (albeit only for restrict qualification processing and
for DECL_BY_REFERENCE "pointers"). That would fix the simple testcase
that was presented here.
Post by Eric Botcazou
Post by Alexander Monakov
(according to my quick check this changed between gcc-4.5 and gcc-4.6)
Possibly indeed, I remembered GCC 4.5 as being the turning point.
It was really changing over several releases, but yes.

Richard.
Eric Botcazou
2018-10-09 08:46:26 UTC
Post by Richard Biener
It was repeatedly suggested that we _could_ derive alignment info from
function parameter types since we rely on precise typing there for example
for points-to analysis (albeit only for restrict qualification processing
and for DECL_BY_REFERENCE "pointers"). That would fix the simple testcase
that was presented here.
OK, I keep forgetting it and that would be a good compromise indeed.
--
Eric Botcazou
Joseph Myers
2018-10-09 11:53:37 UTC
Post by Richard Biener
It was repeatedly suggested that we _could_ derive alignment info from
function parameter types since we rely on precise typing there for example
for points-to analysis (albeit only for restrict qualification processing and
for DECL_BY_REFERENCE "pointers"). That would fix the simple testcase
that was presented here.
Even in that case you mustn't assume alignment for pointer comparisons,
only for dereferences. Assuming it for comparisons breaks e.g. glibc's

# define LC_GLOBAL_LOCALE ((locale_t) -1L)

(locale_t is a pointer-to-pointer-aligned-struct) and other similar
constructs involving magic constants (not dereferenced) of pointer type;
comparisons of a locale_t value against LC_GLOBAL_LOCALE need to work.
--
Joseph S. Myers
***@codesourcery.com
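
The pattern described above can be sketched in isolation. This is not glibc's code: `registry`, `registry_t`, `GLOBAL_REGISTRY`, and `is_global_registry` are hypothetical names that only mirror the shape of the `LC_GLOBAL_LOCALE` definition, and the integer-to-pointer conversion is implementation-defined.

```c
/* A pointer-aligned struct, standing in for the locale structure. */
struct registry { void *slots[4]; };
typedef struct registry *registry_t;

/* Magic constant of pointer type. The value is not aligned for
   struct registry and must never be dereferenced. */
#define GLOBAL_REGISTRY ((registry_t) -1L)

/* Pure comparison, no dereference. If the compiler assumed that every
   registry_t value is aligned, it could fold this test to "always
   false", which is the breakage described above. */
int is_global_registry(registry_t r)
{
    return r == GLOBAL_REGISTRY;
}
```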
Richard Biener
2018-10-09 11:59:46 UTC
Post by Joseph Myers
Post by Richard Biener
It was repeatedly suggested that we _could_ derive alignment info from
function parameter types since we rely on precise typing there for example
for points-to analysis (albeit only for restrict qualification processing and
for DECL_BY_REFERENCE "pointers"). That would fix the simple testcase
that was presented here.
Even in that case you mustn't assume alignment for pointer comparisons,
only for dereferences. Assuming it for comparisons breaks e.g. glibc's
# define LC_GLOBAL_LOCALE ((locale_t) -1L)
(locale_t is a pointer-to-pointer-aligned-struct) and other similar
constructs involving magic constants (not dereferenced) of pointer type;
comparisons of a locale_t value against LC_GLOBAL_LOCALE need to work.
Heh! That's non-conforming!

But yes, looks like it won't fly after all.

Richard.