Discussion:
Gfortran and using C99 cbrt for X ** (1./3.)
Toon Moene
2006-12-03 15:45:47 UTC
Permalink
Richard,

Somewhere, in a mail lost in my large e-mail clash with my ISP
(verizon), you said that gfortran couldn't profit from the pow(x, 1./3.)
-> cbrt(x) conversion because gfortran didn't "know" of cbrt.

Could you be more explicit about this - I'd like to repair this
deficiency, if at all possible.

Thanks in advance !
--
Toon Moene - e-mail: ***@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran:
http://gcc.gnu.org/ml/gcc/2006-01/msg00000.html
Richard Guenther
2006-12-04 16:27:35 UTC
Permalink
Post by Toon Moene
Richard,
Somewhere, in a mail lost in my large e-mail clash with my ISP
(verizon), you said that gfortran couldn't profit from the pow(x, 1./3.)
-> cbrt(x) conversion because gfortran didn't "know" of cbrt.
Could you be more explicit about this - I'd like to repair this
deficiency, if at all possible.
Thanks in advance !
It's a matter of making the cbrt builtin available - I have a patch for this,
but wondered if the fortran frontend can rely on the cbrt library call being
available? Or available in a fast variant, not a fallback implementation in
libgfortran which does pow (x, 1./3.) which will then of course pessimize
pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion.

Richard.
Howard Hinnant
2006-12-04 18:37:15 UTC
Permalink
Post by Richard Guenther
Post by Toon Moene
Richard,
Somewhere, in a mail lost in my large e-mail clash with my ISP
(verizon), you said that gfortran couldn't profit from the pow(x, 1./3.)
-> cbrt(x) conversion because gfortran didn't "know" of cbrt.
Could you be more explicit about this - I'd like to repair this
deficiency, if at all possible.
Thanks in advance !
It's a matter of making the cbrt builtin available - I have a patch for this,
but wondered if the fortran frontend can rely on the cbrt library call being
available? Or available in a fast variant, not a fallback
implementation in
libgfortran which does pow (x, 1./3.) which will then of course pessimize
pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion.
Is pow(x, 1./3.) == cbrt(x) ?

My handheld calculator says (imagining a 3 decimal digit machine):

pow(64.0, .333) == 3.99

In other words, can pow assume that if it sees .333, that the client
actually meant the non-representable 1/3? Or must pow assume that .
333 means .333? My inclination is that if pow(x, 1./3.) (computed
correctly to the last bit) ever differs from cbrt(x) (computed
correctly to the last bit) then this substitution should not be done.

-Howard
Richard Guenther
2006-12-04 21:57:38 UTC
Permalink
Post by Howard Hinnant
Post by Richard Guenther
Post by Toon Moene
Richard,
Somewhere, in a mail lost in my large e-mail clash with my ISP
(verizon), you said that gfortran couldn't profit from the pow(x, 1./3.)
-> cbrt(x) conversion because gfortran didn't "know" of cbrt.
Could you be more explicit about this - I'd like to repair this
deficiency, if at all possible.
Thanks in advance !
It's a matter of making the cbrt builtin available - I have a patch for this,
but wondered if the fortran frontend can rely on the cbrt library call being
available? Or available in a fast variant, not a fallback
implementation in
libgfortran which does pow (x, 1./3.) which will then of course pessimize
pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion.
Is pow(x, 1./3.) == cbrt(x) ?
pow(64.0, .333) == 3.99
In other words, can pow assume that if it sees .333, that the client
actually meant the non-representable 1/3? Or must pow assume that .
333 means .333?
It certainly will _not_ recognize 0.333 as 1/3 (or 0.333333 as used in
Polyhedron aermod). Instead it will require the exponent to be
exactly equal to the result of 1./3. in the precision of the exponent,
with correct rounding.
Post by Howard Hinnant
My inclination is that if pow(x, 1./3.) (computed
correctly to the last bit) ever differs from cbrt(x) (computed
correctly to the last bit) then this substitution should not be done.
C99 7.12.7.1 says "The cbrt functions compute the real cube root
of x." and "The cbrt functions return x**1/3". So it looks to me
that cbrt is required to return the same as pow(x, 1/3).

Richard.
Howard Hinnant
2006-12-04 22:44:50 UTC
Permalink
Post by Richard Guenther
Post by Howard Hinnant
My inclination is that if pow(x, 1./3.) (computed
correctly to the last bit) ever differs from cbrt(x) (computed
correctly to the last bit) then this substitution should not be done.
C99 7.12.7.1 says "The cbrt functions compute the real cube root
of x." and "The cbrt functions return x**1/3". So it looks to me
that cbrt is required to return the same as pow(x, 1/3).
<shrug> For me, this:

#include <math.h>
#include <stdio.h>

int main()
{
printf("pow(1000000., 1./3.) = %a\ncbrt(1000000.) = %a\n",
pow(1000000., 1./3.), cbrt(1000000.));
}

prints out:

pow(1000000., 1./3.) = 0x1.8fffffffffffep+6
cbrt(1000000.) = 0x1.9p+6

I suspect that both are correct, rounded to the nearest least
significant bit. Admittedly I haven't checked the computation by
hand for pow(1000000., 1./3.). But I did the computation using gcc
4.0 on both PPC and Intel Mac hardware, and on PPC using CodeWarrior
math libs, and got the same results on all three platforms.

The pow function is not raising 1,000,000 to the power of 1/3. It is
raising 1,000,000 to some power which is very close to, but not equal
to 1/3.

Perhaps I've misunderstood and you have some way of exactly
representing the fraction 1/3 in Gfortran. In C and C++ we have no
way to exactly represent that fraction except implicitly using cbrt.
(or with a user-defined rational type)

-Howard
Richard Guenther
2006-12-04 23:08:37 UTC
Permalink
Post by Howard Hinnant
Post by Richard Guenther
Post by Howard Hinnant
My inclination is that if pow(x, 1./3.) (computed
correctly to the last bit) ever differs from cbrt(x) (computed
correctly to the last bit) then this substitution should not be done.
C99 7.12.7.1 says "The cbrt functions compute the real cube root
of x." and "The cbrt functions return x**1/3". So it looks to me
that cbrt is required to return the same as pow(x, 1/3).
#include <math.h>
#include <stdio.h>
int main()
{
printf("pow(1000000., 1./3.) = %a\ncbrt(1000000.) = %a\n",
pow(1000000., 1./3.), cbrt(1000000.));
}
pow(1000000., 1./3.) = 0x1.8fffffffffffep+6
cbrt(1000000.) = 0x1.9p+6
I suspect that both are correct, rounded to the nearest least
significant bit. Admittedly I haven't checked the computation by
hand for pow(1000000., 1./3.). But I did the computation using gcc
4.0 on both PPC and Intel Mac hardware, and on PPC using CodeWarrior
math libs, and got the same results on all three platforms.
The pow function is not raising 1,000,000 to the power of 1/3. It is
raising 1,000,000 to some power which is very close to, but not equal
to 1/3.
Perhaps I've misunderstood and you have some way of exactly
representing the fraction 1/3 in Gfortran. In C and C++ we have no
way to exactly represent that fraction except implicitly using cbrt.
(or with a user-defined rational type)
1./3. is represented (round-to-nearest) as 0x1.5555555555555p-2
and pow (x, 1./3.) is of course not the same as the cube-root of x with
exact arithmetic. cbrt as defined by C99 suggests that an
approximation by pow (x, 1./3.) fullfils the requirements. The question
is whether a correctly rounded "exact" cbrt differs from the pow
replacement by more than 1ulp - it looks like this is not the case.

For cbrt (100.) I get the same result as from pow (100., nextafter (1./3., 1))
for example. So, instead of only recognizing 1./3. rounded to nearest
we should recognize both representable numbers that are nearest to
1./3..

Richard.
Joseph S. Myers
2006-12-04 23:24:27 UTC
Permalink
Post by Richard Guenther
is whether a correctly rounded "exact" cbrt differs from the pow
replacement by more than 1ulp - it looks like this is not the case.
They fairly obviously differ for negative arguments, which are valid for
cbrt but not for pow (raising to a fraction with even denominator). (The
optimization from pow to cbrt is valid if you don't care about no longer
getting a NaN from a negative argument. Converting the other way (cbrt to
pow) is only OK if you don't care about negative arguments to cbrt at
all.)
--
Joseph S. Myers
***@codesourcery.com
Richard Guenther
2006-12-04 23:37:55 UTC
Permalink
Post by Joseph S. Myers
Post by Richard Guenther
is whether a correctly rounded "exact" cbrt differs from the pow
replacement by more than 1ulp - it looks like this is not the case.
They fairly obviously differ for negative arguments, which are valid for
cbrt but not for pow (raising to a fraction with even denominator). (The
optimization from pow to cbrt is valid if you don't care about no longer
getting a NaN from a negative argument. Converting the other way (cbrt to
pow) is only OK if you don't care about negative arguments to cbrt at
all.)
True, F.9.4.4 says "pow (x, y) returns a NaN and raises the invalid
floating-point
exception for finite x < 0 and finite non-integer y. I'll adjust the expander
to cover these cases by conditionalizing on tree_nonnegative_p or
HONOR_NANS. I will probably also require flag_unsafe_math_optimizations
as we else will optimize cbrt (x) - pow (x, 1./3.) to zero. Or even
pow (x, 1./3.) - pow (x, nextafter (1./3, 1)).

Richard.
Howard Hinnant
2006-12-05 01:19:57 UTC
Permalink
Post by Richard Guenther
The question
is whether a correctly rounded "exact" cbrt differs from the pow
replacement by more than 1ulp - it looks like this is not the case.
If that is the question, I'm afraid your answer is not accurate. In
the example I showed the difference is 2 ulp. The difference appears
to grow with the magnitude of the argument. On my systems, when the
argument is DBL_MAX, the difference is 75 ulp.

pow(DBL_MAX, 1./3.) = 0x1.428a2f98d7240p+341
cbrt(DBL_MAX) = 0x1.428a2f98d728bp+341

And yes, I agree with you about the C99 standard. It allows the
vendor to compute pretty much any answer it wants from either pow or
cbrt. Accuracy is not mandated. And I'm not trying to mandate
accuracy for Gfortran either. I just had a knee jerk reaction when I
read that pow(x, 1./3.) could be optimized to cbrt(x) (and on re-
reading, perhaps I inferred too much right there). This isn't just
an optimization. It is also an approximation. Perhaps that is
acceptable. I'm only highlighting the fact in case it might be
important but not recognized.

-Howard
Brooks Moses
2006-12-05 04:08:19 UTC
Permalink
Post by Howard Hinnant
Post by Richard Guenther
The question
is whether a correctly rounded "exact" cbrt differs from the pow
replacement by more than 1ulp - it looks like this is not the case.
If that is the question, I'm afraid your answer is not accurate. In
the example I showed the difference is 2 ulp. The difference appears
to grow with the magnitude of the argument. On my systems, when the
argument is DBL_MAX, the difference is 75 ulp.
pow(DBL_MAX, 1./3.) = 0x1.428a2f98d7240p+341
cbrt(DBL_MAX) = 0x1.428a2f98d728bp+341
Another relevant question is whether it's monotonic, in the sense that
cbrt(DBL_MAX) is less than pow(DBL_MAX, 1./3. + 1ulp). If it's not,
that could potentially be trouble.
Post by Howard Hinnant
And yes, I agree with you about the C99 standard. It allows the
vendor to compute pretty much any answer it wants from either pow or
cbrt. Accuracy is not mandated. And I'm not trying to mandate
accuracy for Gfortran either. I just had a knee jerk reaction when I
read that pow(x, 1./3.) could be optimized to cbrt(x) (and on re-
reading, perhaps I inferred too much right there). This isn't just
an optimization. It is also an approximation. Perhaps that is
acceptable. I'm only highlighting the fact in case it might be
important but not recognized.
I think it's important (at least when -fwrong-math ... er, I mean,
-ffast-math is not specified) that pow(x, 1./3.) return the same value
as pow(x, y) when y has been set to 1./3. via some process at runtime.

However, this approximation is probably reasonable for -ffast-math.

- Brooks
Geert Bosch
2006-12-05 17:20:55 UTC
Permalink
Post by Howard Hinnant
If that is the question, I'm afraid your answer is not accurate.
In the example I showed the difference is 2 ulp. The difference
appears to grow with the magnitude of the argument. On my systems,
when the argument is DBL_MAX, the difference is 75 ulp.
pow(DBL_MAX, 1./3.) = 0x1.428a2f98d7240p+341
cbrt(DBL_MAX) = 0x1.428a2f98d728bp+341
And yes, I agree with you about the C99 standard. It allows the
vendor to compute pretty much any answer it wants from either pow
or cbrt. Accuracy is not mandated. And I'm not trying to mandate
accuracy for Gfortran either. I just had a knee jerk reaction when
I read that pow(x, 1./3.) could be optimized to cbrt(x) (and on re-
reading, perhaps I inferred too much right there). This isn't just
an optimization. It is also an approximation. Perhaps that is
acceptable. I'm only highlighting the fact in case it might be
important but not recognized.
This is really a very similar case as using a fused multiply-add
instead of two separate instructions with a separate rounding.
In computing cbrt(x) instead of pow(x, 1.0/3.0), we omit the rounding
of 1.0/3.0 and only round after computing the cube root.

As you should for IEEE double, for cbrt, the relative difference is
still quite small. For fused multiply-add, cancellation can make
the relative difference as arbitrarily large.

Still this is not nearly as bad as the re-association of summands,
which throws away any hope of being able to determine accuracy of
computations due to catastrophic cancellation. This is the
one of the biggest issues with -funsafe-math-optimizations.

Maybe we should have an option between IEEE math and
-funsafe--math-optimizations that says for the standard
arithmetic operations *, / and sqrt, the result can either
be correctly rounded floating-point value or the exact result?
This would allow useful optimizations such as
X * 4.0 / 2.0 -> X + X (otherwise invalid in case of overflow)
X / 2.0 * 4.0 -> X + X (otherwise invalid in case of overflow)
X * Y + Z -> fma(X, Y, Z)
X + Y - Y -> X
X / (1.0 / Y) -> X * Y
pow((sqrt (X), 2.0) -> X
sqrt (X * X) -> X
pow (x, 1.0 / 3.0) -> cbrt (x)
In addition we should have a built-in identity function that
only explicitly evaluate its argument to a correctly rounded
floating-point number. That would allow users (or front ends)
to selectively avoid contractions where necessary. Currently
the only brute force workaround is using volatile variables.
It would be great to keep the value in registers instead,
but still prevent other clever optimizations from wreaking havoc.

-Geert
Toon Moene
2006-12-05 20:32:18 UTC
Permalink
Post by Richard Guenther
It's a matter of making the cbrt builtin available - I have a patch for
this, but wondered if the fortran frontend can rely on the cbrt library call
being available? Or available in a fast variant, not a fallback
implementation in libgfortran which does pow (x, 1./3.) which will then of course pessimize
pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion.
Couldn't libgfortran just simply borrow, errr, include the glibc version
? That one seems simple and fast enough.

OTOH, I somehow assumed this expansion was protected by
-funsafe-math-optimizations, but further testing showed me that it is not.

This wouldn't scare me one bit (I'm used to the phrase "a processor
dependent approximation to the value of", which is used *a lot* in the
Fortran Standard), but some people might think this to be too valiant.
--
Toon Moene - e-mail: ***@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran:
http://gcc.gnu.org/ml/gcc/2006-01/msg00000.html
Toon Moene
2006-12-05 20:42:04 UTC
Permalink
Post by Richard Guenther
It's a matter of making the cbrt builtin available - I have a patch
for this,
Oh, BTW, my own version of this patch (plus your work in the area of
sqrt) had the following effect on a profile breakdown (every routine
above 2 %) of the forecasting code in HIRLAM (http://hirlam.org) - just
look at the %; the times are not comparable, as you can see from the
number of calls of each routine:

Before:

% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
14.29 755.79 755.79 116160 0.00 0.00 verint_
12.66 1425.23 669.43 __ieee754_powf
5.50 1716.19 290.97 116160 0.00 0.00 bixint_
4.96 1978.62 262.42 121 0.00 0.03 sl2tim_
4.94 2239.98 261.36 15730 0.00 0.00 hlcondcv_
4.26 2465.30 225.32 15730 0.00 0.00 hlvcbr_
3.62 2656.71 191.41 15730 0.00 0.00 hlradia_
2.85 2807.19 150.48 __ieee754_logf
2.64 2946.87 139.68 15730 0.00 0.00 phtask_
2.34 3070.80 123.93 1452 0.00 0.00 invlo4_
2.25 3189.55 118.75 __ieee754_expf
2.23 3307.65 118.10 fesetround
2.21 3424.45 116.80 fesetenv
2.08 3534.31 109.86 15730 0.00 0.00 hlprevap_

After:

% cumulative self self total
time seconds seconds calls Ks/call Ks/call name
14.22 575.66 575.66 93120 0.00 0.00 verint_
9.83 973.59 397.93 __ieee754_powf
5.84 1210.14 236.55 93120 0.00 0.00 bixint_
5.14 1418.12 207.98 97 0.00 0.03 sl2tim_
5.05 1622.62 204.50 12610 0.00 0.00 hlcondcv_
4.28 1795.91 173.29 12610 0.00 0.00 hlvcbr_
3.64 1943.48 147.56 12610 0.00 0.00 hlradia_
3.12 2069.66 126.19 __ieee754_logf
2.77 2181.96 112.30 12610 0.00 0.00 phtask_
2.39 2278.56 96.60 1164 0.00 0.00 invlo4_
2.34 2373.12 94.56 fesetround
2.26 2464.71 91.59 fesetenv
2.13 2551.07 86.36 12610 0.00 0.00 hlprevap_
2.06 2634.31 83.24 12610 0.00 0.00 phys_

2 Ghz Athlon 64, AMD64 Debian testing, gfortran 4.3.0.
--
Toon Moene - e-mail: ***@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran:
http://gcc.gnu.org/ml/gcc/2006-01/msg00000.html
Toon Moene
2006-12-05 22:51:11 UTC
Permalink
Post by Toon Moene
Post by Richard Guenther
It's a matter of making the cbrt builtin available - I have a patch
for this,
Oh, BTW, my own version of this patch (plus your work in the area of
sqrt) had the following effect on a profile breakdown
The speed up is around 5 %.
--
Toon Moene - e-mail: ***@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran:
http://gcc.gnu.org/ml/gcc/2006-01/msg00000.html
Richard Guenther
2006-12-05 22:53:43 UTC
Permalink
Post by Toon Moene
Post by Toon Moene
Post by Richard Guenther
It's a matter of making the cbrt builtin available - I have a patch
for this,
Oh, BTW, my own version of this patch (plus your work in the area of
sqrt) had the following effect on a profile breakdown
The speed up is around 5 %.
Is this because of cbrt or a combined effect? Can you measure the cbrt
effect in isolation?

Thanks,
Richard.
Toon Moene
2006-12-06 20:50:40 UTC
Permalink
Post by Richard Guenther
Post by Toon Moene
The speed up is around 5 %.
Is this because of cbrt or a combined effect? Can you measure the cbrt
effect in isolation?
This is a combined effect (assuming that things like x**1.5 didn't
expand to x * sqrt (x) before) - however, the number of opportunities of
converting x**N.5 -> x**N * sqrt(x) is very limited in our code.

I artificially up'd the number of opportunities for the x**(N+{1,2}./3.)
-> x**N * cbrt(x)[**2] by changing about two dozen places where
x**0.333333 or x**0.3333 or x**0.6666666 [sic] was used instead of
x**(1./3.) or x**(2./3.). In fact, I'd like to use the carrot of
potential optimizations to get people to use the clearer x**(1./3.) (we
also have a legitimate use of x**0.3, where the 0.3 really is 3./10.)

I can measure the contribution of the cbrt effect in isolation, though
(given the above change of HIRLAM source code).

Cheers,
--
Toon Moene - e-mail: ***@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran:
http://gcc.gnu.org/ml/gcc/2006-01/msg00000.html
Toon Moene
2006-12-08 00:25:44 UTC
Permalink
Post by Toon Moene
I can measure the contribution of the cbrt effect in isolation, though
(given the above change of HIRLAM source code).
Well, the effect of the cbrt change (X ** (1./3.) -> cbrt(X)) is close
to zero.

Some other change must have diminished the number of pow[f] calls
substantially. I think Andrew P's patch to declare the Fortran pow
functions constant might be a good candidate.

[ I did this exercise for g77 in 2000 - I'd never thought that that
wouldn't be the basis for people to work on gfortran ... ]

Cheers,
--
Toon Moene - e-mail: ***@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
A maintainer of GNU Fortran: http://gcc.gnu.org/fortran/
Who's working on GNU Fortran:
http://gcc.gnu.org/ml/gcc/2006-01/msg00000.html
Richard Guenther
2006-12-05 20:44:31 UTC
Permalink
Post by Toon Moene
Post by Richard Guenther
It's a matter of making the cbrt builtin available - I have a patch for
this, but wondered if the fortran frontend can rely on the cbrt library call
being available? Or available in a fast variant, not a fallback
implementation in libgfortran which does pow (x, 1./3.) which will then of course pessimize
pow (x, 2./3.) -> tmp = cbrt(x); tmp * tmp expansion.
Couldn't libgfortran just simply borrow, errr, include the glibc version
? That one seems simple and fast enough.
Does libgfortran have a cbrt now? The frontend could make the
builtin available conditional on TARGET_C99_FUNCTIONS, this
way we won't need a fallback.
Post by Toon Moene
OTOH, I somehow assumed this expansion was protected by
-funsafe-math-optimizations, but further testing showed me that it is not.
Yes, that was a mistake. I posted a patch for this:
http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00320.html
Post by Toon Moene
This wouldn't scare me one bit (I'm used to the phrase "a processor
dependent approximation to the value of", which is used *a lot* in the
Fortran Standard), but some people might think this to be too valiant.
Yes, I guess so ;)

Richard.
Mike Stump
2006-12-05 22:09:45 UTC
Permalink
Post by Toon Moene
Couldn't libgfortran just simply borrow, errr, include the glibc
version ?
No, not without asking the FSF (rms) as I think the license is
different (GPL v GPL+libgcc exception).
Loading...