Joel Sherrill
2018-11-07 05:28:58 UTC
Hi,
Over the past couple of years, I have hand-assembled a new floating point
library for the ARM Cortex M0 architecture. I know the M0 is not generally
regarded as a number-crunching machine, but I felt it deserved at least
some of the attention that has previously been bestowed on the AVR
architecture. As this work has been incidental to my employer's line of
business, they have tentatively agreed to assign the copyright and
facilitate a release of this library as open source.
I have efficient implementations of all of the integer and
* clzsi2, clzdi2, umulsidi3, mulsidi3, muldi3 (aeabi_lmul)
* ashldi3 (aeabi_llsl), lshrdi3 (aeabi_llsr), ashrdi3 (aeabi_lasr)
* aeabi_lcmp, aeabi_ulcmp
* udivsi3 (aeabi_uidivmod), divsi3 (aeabi_idivmod), udivdi3
(aeabi_uldivmod), divdi3 (aeabi_ldivmod)
* addsf3 (aeabi_fadd), subsf3 (aeabi_fsub, aeabi_frsub), mulsf3
(aeabi_fmul), divsf3 (aeabi_fdiv), fdimf
* cmpsf2 (aeabi_fcmpun), eqsf2 (aeabi_fcmpeq), nesf2 (aeabi_fcmpne),
gesf2 (aeabi_fcmpge), gtsf2, unordsf2
* floatundisf (aeabi_ul2f), floatunsisf (aeabi_ui2f), floatdisf
(aeabi_l2f), floatsisf (aeabi_i2f)
* fixsfdi (aeabi_f2lz), fixunssfdi (aeabi_f2ulz), fixsfsi (aeabi_f2iz),
fixunssfsi (aeabi_f2uiz)
* aeabi_f2d, aeabi_d2f, aeabi_h2f, aeabi_f2h
I also have efficient implementations of several of the simpler libm
* frexpf, ldexpf, scalbnf
* fmaxf, fminf
* rintf, lrintf, ulrintf, llrintf, ullrintf, roundf, lroundf, ulroundf,
llroundf, ullroundf
* truncf, ceilf, floorf
* fpclassifyf, isnormalf, isnanf, isinff, isfinitef, isposf, isnegf
* ilogbf, logbf, modff
* sqrtf, cbrtf
* log2f, logf, log10f, log1p2f, log1pf, log1p10f, logXf, log1pXf
* sinf, cosf, sincosf, sinpif, cospif, sincospif
* tanf, cotf, tanpif, cotpif
Presently, the library comprises about 40 files with about 8000 lines of
asm (unified syntax). The test vectors weigh significantly more. All of
the floating point functions are IEEE754 compliant. I can provide more
* Small: Less than 3kb for everything above. Only 450 bytes for basic
addsf3, subsf3, mulsf3, divsf3, and cmpsf2.
* Fast: addsf3 = 75 instruction cycles, subsf3 = 80, mulsf3 = 95, divsf3 =
260 to 360, cmpsf2 = 35.
* Correct: Simultaneous calculation of sincosf() in less than 500
instruction cycles, accurate within +/- 1 ulp, including arbitrarily large
values of 'x'.
* Bonus: round10iff(x, n) (a non-standard function) correctly rounds
floating point values 'x' to an integer power of 10 'n'; this function
simulates conversion to a decimal string, truncation, and conversion back
to binary32 without any string-handling overhead.
This sounds like a nice body of work. Congratulations.
Does paranoia pass?
To date, I have only built this library as part of a user space embedded
application. I have not attempted to build or patch the GCC toolchain
itself. If accepted, I suspect there will be at least a little work to
restructure it for inclusion with libgcc. But, before proceeding with that
work, I need to have some idea of direction and goal.
The first question, then, is what might the best home for this library
be? Many of the lower level functions (e.g. clzsi2, addsf3) replace the
generic implementations of libgcc. However, the higher level functions
(e.g. ldexpf, sincosf) traditionally link from libm, which I don't believe
is typically distributed with gcc. The compact nature of this library of
course follows from a tight integration between higher and lower level
* Add everything into the base libgcc,
* Add everything into libm (newlib?) and rely on link order to supersede
libgcc,
This will almost certainly break at some point, for someone, and it will be
hard to even figure out that it happened, because the code will still work
but just be bigger or slower.
* Split the implementation with some magic to ensure that libm functions
only link in the presence of the correct libgcc,
I think this is the proper solution. It just puts better implementations in
the place where the infrastructure already supports having a target-specific
option.
* Establish an independent library specific to the Cortex M0 architecture,
or
This is likely to get you the smallest number of users. People have to
find it and then integrate it on their own. Don't make it hard for folks to
find and use your work.
* Something else entirely...
If there is any interest in incorporating this work into GCC, please
advise.
I think so, but I am just one voice from the RTEMS community. But I think
any M0 user would be pleased.
--joel
Thanks,
Daniel Engel