Gimpel lowering question.

Discussion:

Michael Eager

2018-11-28 17:00:56 UTC

I have a small test case which generates poor quality code on my target.
Here is the original:

if (cond1 == 2048 || cond2 == 8)
{
x = x + y;
}
return x;

This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.

The gimple is

_1 = cond1 == 2048;
_2 = cond2 == 8;
_3 = _1 | _2;
if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...

so this poor code sequence is essentially a correct translation.

On MIPS, for the same test case, the gimple is different:

if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
<D.1493>:
if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
<D.1491>:
...

which generates the better code sequence.

Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

--
Michael Eager ***@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306

Jeff Law

2018-11-28 17:10:36 UTC

Permalink

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
if (cond1 == 2048 || cond2 == 8)
    {
      x = x + y;
    }
return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
_1 = cond1 == 2048;
_2 = cond2 == 8;
_3 = _1 | _2;
if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

jeff

Michael Eager

2018-11-28 17:47:16 UTC

Permalink

Post by Jeff Law

BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

Thanks. I did look at BRANCH_COST, and played with the values, but it
didn't seem to have any affect. I'll take a closer look.

--
Michael Eager ***@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306

Jeff Law

2018-11-28 17:51:42 UTC

Permalink

Post by Jeff Law

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
   if (cond1 == 2048 || cond2 == 8)
     {
       x = x + y;
     }
   return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
   _1 = cond1 == 2048;
   _2 = cond2 == 8;
   _3 = _1 | _2;
   if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
   if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
   if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

Thanks. I did look at BRANCH_COST, and played with the values, but it
didn't seem to have any affect. I'll take a closer look.

I'd start by looking at the state the trees during gimplification then
walk forward or backward as necessary.

Jeff

Andrew Pinski

2018-11-28 22:37:39 UTC

Permalink

Post by Michael Eager

Post by Jeff Law

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
if (cond1 == 2048 || cond2 == 8)
{
x = x + y;
}
return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
_1 = cond1 == 2048;
_2 = cond2 == 8;
_3 = _1 | _2;
if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

Thanks. I did look at BRANCH_COST, and played with the values, but it
didn't seem to have any affect. I'll take a closer look.

Look at LOGICAL_OP_NON_SHORT_CIRCUIT . By defualt, it is defined as:
(BRANCH_COST (optimize_function_for_speed_p (cfun), \
false) >= 2)
But MIPS defines it as 0.

Changing LOGICAL_OP_NON_SHORT_CIRCUIT to 0 will disable this optimization.

LOGICAL_OP_NON_SHORT_CIRCUIT controls both places where (cond1 == 2048
|| cond2 == 8) would remove the branch.

NOTE I think MIPS defines LOGICAL_OP_NON_SHORT_CIRCUIT incorrectly but
that is a different story.

Thanks,
Andrew Pinski

Post by Michael Eager
--
1960 Park Blvd., Palo Alto, CA 94306

Michael Eager

2018-11-29 18:28:50 UTC

Permalink

Post by Andrew Pinski

Post by Michael Eager

Post by Jeff Law

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
if (cond1 == 2048 || cond2 == 8)
{
x = x + y;
}
return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
_1 = cond1 == 2048;
_2 = cond2 == 8;
_3 = _1 | _2;
if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

Thanks. I did look at BRANCH_COST, and played with the values, but it
didn't seem to have any affect. I'll take a closer look.

(BRANCH_COST (optimize_function_for_speed_p (cfun), \
false) >= 2)
But MIPS defines it as 0.
Changing LOGICAL_OP_NON_SHORT_CIRCUIT to 0 will disable this optimization.
LOGICAL_OP_NON_SHORT_CIRCUIT controls both places where (cond1 == 2048
|| cond2 == 8) would remove the branch.
NOTE I think MIPS defines LOGICAL_OP_NON_SHORT_CIRCUIT incorrectly but
that is a different story.
Thanks,
Andrew Pinski

I set BRANCH_COST to 1 for both predicted and non-predicted branches.
This generates the shorter sequence with a compare, but I'm concerned
that it will adversely affect code generation elsewhere. Branches are
higher cost than simple instructions.

Looking at the code generated with BRANCH_COST > 1, it doesn't do what
I indicated above. This did not reduce the number of branches.
In both cases there are two branches in this short test case. When
BRANCH_COST > 1, there are three instructions to do a compare vs one
when BRANCH_COST = 1. I'm at a loss to see where there is any benefit.

I'll take a look at the LOGICAL_OP_NON_SHORT_CIRCUIT settings and see
if this makes a difference.

--
Michael Eager ***@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306

Michael Eager

2018-11-30 19:09:06 UTC

Permalink

Post by Michael Eager

Post by Jeff Law

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
    if (cond1 == 2048 || cond2 == 8)
      {
        x = x + y;
      }
    return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
    _1 = cond1 == 2048;
    _2 = cond2 == 8;
    _3 = _1 | _2;
    if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
    if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
    if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees as well in passes which try to optimize branchy code into
straighter code.

Thanks. I did look at BRANCH_COST, and played with the values, but it
didn't seem to have any affect. I'll take a closer look.

(BRANCH_COST (optimize_function_for_speed_p (cfun), \
false) >= 2)
But MIPS defines it as 0.
Changing LOGICAL_OP_NON_SHORT_CIRCUIT to 0 will disable this
optimization.
LOGICAL_OP_NON_SHORT_CIRCUIT controls both places where (cond1 == 2048
|| cond2 == 8) would remove the branch.
NOTE I think MIPS defines LOGICAL_OP_NON_SHORT_CIRCUIT incorrectly but
that is a different story.
Thanks,
Andrew Pinski

I set BRANCH_COST to 1 for both predicted and non-predicted branches.
This generates the shorter sequence with a compare, but I'm concerned
that it will adversely affect code generation elsewhere. Branches are
higher cost than simple instructions.
Looking at the code generated with BRANCH_COST > 1, it doesn't do what
I indicated above. This did not reduce the number of branches.
In both cases there are two branches in this short test case. When
BRANCH_COST > 1, there are three instructions to do a compare vs one
when BRANCH_COST = 1. I'm at a loss to see where there is any benefit.
I'll take a look at the LOGICAL_OP_NON_SHORT_CIRCUIT settings and see
if this makes a difference.

Setting LOGICAL_OP_NON_SHORT_CIRCUIT to zero generates the better code
sequence. Thanks for the pointer.

I stepped through the code at the two places where
LOGICAL_OP_NON_SHORT_CIRCUIT is tested, with it set both to 0 and 1, and
it is not clear what the code is trying to do. Or what this parameter
is really supposed to mean. I'll have to go back and look again.

Setting LOGICAL_OP_NON_SHORT_CIRCUIT to 0 or 1 does not change the
number of branches. Why this would depend on BRANCH_COST is not clear,
and especially why it would depend on the costs of branches which are
predicted to fail makes no sense to me.

Is there a use case which would help this make some sense?

--
Michael Eager ***@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306

Richard Biener

2018-11-30 07:39:02 UTC

Permalink

Post by Jeff Law

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
if (cond1 == 2048 || cond2 == 8)
{
x = x + y;
}
return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
_1 = cond1 == 2048;
_2 = cond2 == 8;
_3 = _1 | _2;
if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees

Sth I don't like very much... maybe we can revisit removing
the few cases in fold-const.c (thus GENERIC folding) and rely
on later GIMPLE passes instead plus on RTL expansion to
do the reverse transform.

Not for GCC 9 of course.

Post by Jeff Law
as well in passes which try to optimize branchy code into
straighter code.
jeff

Jeff Law

2018-11-30 16:08:34 UTC

Permalink

Post by Richard Biener

Post by Jeff Law

Post by Michael Eager
I have a small test case which generates poor quality code on my target.
if (cond1 == 2048 || cond2 == 8)
{
x = x + y;
}
return x;
This ends up generating a series of instructions to compute a flag with
the result of the condition followed by a single compare with zero and
a jump. Better code would be two compares and two jumps.
The gimple is
_1 = cond1 == 2048;
_2 = cond2 == 8;
_3 = _1 | _2;
if (_3 != 0) goto <D.1464>; else goto <D.1465>;
...
so this poor code sequence is essentially a correct translation.
if (cond1 == 2048) goto <D.1491>; else goto <D.1493>;
if (cond2 == 8) goto <D.1491>; else goto <D.1492>;
...
which generates the better code sequence.
Can someone give me a pointer where to find where the lowering
pass decides to break up a condition into multiple tests? Is
there a target configuration option which I have overlooked or
maybe set incorrectly?

BRANCH_COST, which comes into play during generation of the initial
trees

Agreed, 100%. That was supposed to be where Kai's work was headed, but
we never seemed to make significant headway. Getting BRANCH_COST out
of the tree generation/folding would be a good thing at many levels.

There'll be fallout, of course.

jeff

Jakub Jelinek

2018-11-30 16:14:11 UTC

Permalink

Post by Jeff Law

Post by Richard Biener
Sth I don't like very much... maybe we can revisit removing
the few cases in fold-const.c (thus GENERIC folding) and rely
on later GIMPLE passes instead plus on RTL expansion to
do the reverse transform.
Not for GCC 9 of course.

Note, for the GCC 9 timeframe, for PR85368 I wrote today
--param logical-op-non-short-circuit={0,1} support that allows
to override LOGICAL_OP_NON_SHORT_CIRCUIT, because clearly it is impossible
to handle it in testcases otherwise, just -mbranch-cost= and the defaults
aren't good enough, as some targets override LOGICAL_OP_NON_SHORT_CIRCUIT
and figuring out all the details is too hard in tcl.

I prefer a param to make it clear that it won't last too long if we
implement the above for GCC 10 or 11.

Jakub

Jeff Law

2018-11-30 16:18:11 UTC

Permalink

Post by Jakub Jelinek

Post by Jeff Law

Yea, that sounds quite reasonable. The target conditions in those tests
are just insane and obviously unmaintainable. A param to control
behavior seems like a good idea.

The param also makes it easier to experiment with this stuff as we try
to kill the BRANCH_COST and LOGICAL_OP_NON_SHORT_CIRCUIT bits early.
jeff

Continue reading on narkive:

Search results for 'Gimpel lowering question.' (Questions and Answers)

replies

Why is it that whenever god decides to speak through someone, that someone is always the village idiot?

started 2007-09-25 12:38:55 UTC

religion & spirituality