Discussion:
"SSE instruction set disabled?"
Scott Robert Ladd
2002-12-03 23:20:39 UTC
Hi,

I'm compiling code with the following options:

gcc -march=i686 -mfpmath=sse -ml -o myprog myprog.c

And see the following warning:

cc1: SSE instruction set disabled, using 387 arithmetic

Why?

And yes, I am compiling on a Pentium III system... ;)

--
Scott Robert Ladd
Coyote Gulch Productions, http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
Andrew Pinski
2002-12-03 23:22:57 UTC
Post by Scott Robert Ladd
Hi,
gcc -march=i686 -mfpmath=sse -ml -o myprog myprog.c
try with the -msse option.

Thanks,
Andrew Pinski
Post by Scott Robert Ladd
cc1: SSE instruction set disabled, using 387 arithmetic
Why?
And yes, I am compiling on a Pentium III system... ;)
--
Scott Robert Ladd
Coyote Gulch Productions, http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
Scott Robert Ladd
2002-12-03 23:30:02 UTC
Post by Andrew Pinski
try with the -msse option.
Ah-ha! Thank you.

This makes me wonder if -msse should be implied by -mfpmath=sse, or vice
versa.

..Scott

--
Scott Robert Ladd
Coyote Gulch Productions, http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
Jan Hubicka
2002-12-03 23:33:35 UTC
Post by Scott Robert Ladd
Post by Andrew Pinski
try with the -msse option.
Ah-ha! Thank you.
This makes me wonder if -msse should be implied by -mfpmath=sse, or vice
versa.
If -mfpmath=sse implied -msse and not -msse2, most people on P4 systems
would never enable SSE2 support.
Perhaps we can drop -msse now and require people to use a
non-contradicting -march=XXX (i.e. not asking for -mfpmath=sse while tuning
for a CPU with no SSE support in it).

Honza
Post by Scott Robert Ladd
..Scott
--
Scott Robert Ladd
Coyote Gulch Productions, http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
Paolo Carlini
2002-12-03 23:35:34 UTC
Post by Scott Robert Ladd
This makes me wonder if -msse should be implied by -mfpmath=sse, or vice
versa.
It is with -march=pentium3 !

Paolo.
R. Kelley Cook
2002-12-04 19:07:51 UTC
Post by Jan Hubicka
Post by Scott Robert Ladd
Post by Andrew Pinski
try with the -msse option.
Ah-ha! Thank you.
This makes me wonder if -msse should be implied by -mfpmath=sse, or
vice versa.
If -mfpmath=sse implied -msse and not -msse2, most people on P4 systems
would never enable SSE2 support.
Perhaps we can drop -msse now and require people to use a
non-contradicting -march=XXX (i.e. not asking for -mfpmath=sse while
tuning for a CPU with no SSE support in it).
This makes absolute complete sense.

"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.

They already are implied by -march=blahblah that we have. If someone
wants code tuned for a pentium4, but without any SSE or MMX support then
they can type "-march=pentiumpro -mcpu=pentium4"

Although, in my opinion "-mcpu=" should be renamed "-mtune=" like was
debated ages ago: http://gcc.gnu.org/ml/gcc/1999-01n/msg01004.html

One more question: Why isn't "-mfpmath=sse" the default on archs that
support it?

Kelley Cook
Joseph S. Myers
2002-12-04 19:21:04 UTC
Post by R. Kelley Cook
This makes absolute complete sense.
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
They already are implied by -march=blahblah that we have. If someone
wants code tuned for a pentium4, but without any SSE or MMX support then
they can type "-march=pentiumpro -mcpu=pentium4"
Don't some CPU types exist in versions with and without some of these
features, or should this be allowed for by more fine-grained -march?

There's also the question of allowing people to specify the options that
are the closest approximation to right if they have some clone CPU that
doesn't have a specific GCC option.
Post by R. Kelley Cook
Although, in my opinion "-mcpu=" should be renamed "-mtune=" like was
debated ages ago: http://gcc.gnu.org/ml/gcc/1999-01n/msg01004.html
I think the last proposal was for consistently named --arch / --tune
options across all targets (with corresponding ability for --help to list
the supported types with descriptions) ("gcc/config/mips/mips.h reorg"
thread on gcc-patches, June 2001).
--
Joseph S. Myers
***@cam.ac.uk
Jan Hubicka
2002-12-04 19:29:22 UTC
Post by Joseph S. Myers
Post by R. Kelley Cook
This makes absolute complete sense.
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
They already are implied by -march=blahblah that we have. If someone
wants code tuned for a pentium4, but without any SSE or MMX support then
they can type "-march=pentiumpro -mcpu=pentium4"
I am not against this. What I think does not make sense is
-march=pentiumpro -msse2 -mfpmath=sse
(i.e. I have an imaginary PentiumPro with SSE2 support).
Perhaps -march=pentium4 -mno-sse would make more sense, but similarly we
may want -march=pentium4 -mno-cmov or any other extension...
Post by Joseph S. Myers
Don't some CPU types exist in versions with and without some of these
features, or should this be allowed for by more fine-grained -march?
This should be dealt with by -march. The situation is somewhat complicated
for Athlons, which exist in SSE and non-SSE versions. Additionally, on some
CPUs SSE support is enabled through a feature register by the BIOS, so on
some notebooks SSE is not available and appears only after a BIOS update, as
far as I know.

There are also operating systems not supporting SSE context switch so
one can't use SSE on these... So I am not quite sure how to deal with
all these dead ends.
Post by Joseph S. Myers
There's also the question of allowing people to specify the options that
are the closest approximation to right if they have some clone CPU that
doesn't have a specific GCC option.
Post by R. Kelley Cook
Although, in my opinion "-mcpu=" should be renamed "-mtune=" like was
debated ages ago: http://gcc.gnu.org/ml/gcc/1999-01n/msg01004.html
I think the last proposal was for consistently named --arch / --tune
options across all targets (with corresponding ability for --help to list
the supported types with descriptions) ("gcc/config/mips/mips.h reorg"
thread on gcc-patches, June 2001).
-mtune is a clearer name for the feature than -mcpu, so I would not
object to adding it. We can support both as well, to save people
from rewriting their makefiles once again...

Honza
Post by Joseph S. Myers
--
Joseph S. Myers
Scott Robert Ladd
2002-12-04 19:26:00 UTC
Post by R. Kelley Cook
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
They already are implied by -march=blahblah that we have. If someone
wants code tuned for a pentium4, but without any SSE or MMX support then
they can type "-march=pentiumpro -mcpu=pentium4"
I very much like this idea. The number of command-line options is very
confusing; anything that can rationally be done to simplify matters would be
in programmers' (and gcc's) best interest.
Post by R. Kelley Cook
Although, in my opinion "-mcpu=" should be renamed "-mtune=" like was
debated ages ago: http://gcc.gnu.org/ml/gcc/1999-01n/msg01004.html
The gcc docs state: "Moreover, specifying -march=cpu-type
implies -mcpu=cpu-type." That suggests that assuming "-msse" and such is
reasonable.

BTW, why would someone set "-mcpu" different from "-march" anyway? Why two
switches? Why wouldn't it be a good idea to always "tune" for the chosen
architecture?

..Scott
Jakub Jelinek
2002-12-04 19:43:05 UTC
Post by R. Kelley Cook
Post by Jan Hubicka
Post by Scott Robert Ladd
Post by Andrew Pinski
try with the -msse option.
Ah-ha! Thank you.
This makes me wonder if -msse should be implied by -mfpmath=sse, or
vice versa.
If -mfpmath=sse implied -msse and not -msse2, most people on P4 systems
would never enable SSE2 support.
Perhaps we can drop -msse now and require people to use a
non-contradicting -march=XXX (i.e. not asking for -mfpmath=sse while
tuning for a CPU with no SSE support in it).
This makes absolute complete sense.
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
So what do you do if you want your binaries or libraries run on any
CPU supporting SSE, ie. ATM pIII, p4, athlon-{4,xp,mp}?
-march=i686 -msse -mfpmath=sse is what you use now, using -march=pentium3
is not a good idea for the athlons and likewise -march=athlon-xp is
not a good idea for pentiums.

Jakub
Jan Hubicka
2002-12-04 19:53:27 UTC
Post by Jakub Jelinek
Post by R. Kelley Cook
Post by Jan Hubicka
Post by Scott Robert Ladd
Post by Andrew Pinski
try with the -msse option.
Ah-ha! Thank you.
This makes me wonder if -msse should be implied by -mfpmath=sse, or
vice versa.
If -mfpmath=sse implied -msse and not -msse2, most people on P4 systems
would never enable SSE2 support.
Perhaps we can drop -msse now and require people to use a
non-contradicting -march=XXX (i.e. not asking for -mfpmath=sse while
tuning for a CPU with no SSE support in it).
This makes absolute complete sense.
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
So what do you do if you want your binaries or libraries run on any
CPU supporting SSE, ie. ATM pIII, p4, athlon-{4,xp,mp}?
-march=i686 -msse -mfpmath=sse is what you use now, using -march=pentium3
is not a good idea for the athlons and likewise -march=athlon-xp is
not a good idea for pentiums.
What makes -march=pentium3 worse for Athlon than -march=pentiumpro ==
i686?
I was thinking about introducing some switch "optimize well for commonly
used CPUs - currently probably Pentium3, Athlon and Pentium4" that can
be used for distribution build.
Some targets do have such a switch; however, its definition seems to be
pretty weak.

Honza
Post by Jakub Jelinek
Jakub
Paolo Carlini
2002-12-03 23:27:16 UTC
Post by Scott Robert Ladd
And yes, I am compiling on a Pentium III system... ;)
So, why not -march=pentium3 ??

Paolo.
Scott Robert Ladd
2002-12-03 23:42:30 UTC
From Paolo Carlini
Post by Scott Robert Ladd
And yes, I am compiling on a Pentium III system... ;)
So, why not -march=pentium3 ??
I was under the impression -march=i686 is the same thing as
-march=pentium3. Am I mistaken? Why would they be different?

..Scott

--
Scott Robert Ladd
Coyote Gulch Productions, http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
Jan Hubicka
2002-12-03 23:45:06 UTC
Post by Scott Robert Ladd
From Paolo Carlini
Post by Scott Robert Ladd
And yes, I am compiling on a Pentium III system... ;)
So, why not -march=pentium3 ??
I was under the impression -march=i686 is the same thing as
-march=pentium3. Am I mistaken? Why would they be different?
Because i686 is PentiumPro and that one didn't have SSE.
Pentium3 has.

Honza
Post by Scott Robert Ladd
..Scott
--
Scott Robert Ladd
Coyote Gulch Productions, http://www.coyotegulch.com
No ads -- just very free (and somewhat unusual) code.
Scott Robert Ladd
2002-12-04 00:04:42 UTC
From Jan Hubicka
Because i686 is PentiumPro and that one didn't have SSE.
Pentium3 has.
I'm moderately embarrassed; for some reason, I thought i586 was PentiumPro,
and i686 was Pentium III. Thanks, all, for the pointers.

..Scott
R***@comerica.com
2002-12-04 21:49:29 UTC
Post by Jakub Jelinek
Post by R. Kelley Cook
This makes absolute complete sense.
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
So what do you do if you want your binaries or libraries run on any
CPU supporting SSE, ie. ATM pIII, p4, athlon-{4,xp,mp}?
-march=i686 -msse -mfpmath=sse is what you use now, using -march=pentium3
is not a good idea for the athlons and likewise -march=athlon-xp is
not a good idea for pentiums.
See this is why it is so confusing. To GCC, what you wrote is the same
thing since

"-march=pentium3" == "-march=i686 -msse"

To be even more anal, looking through specs

this                  implies
-march=i386           -mcpu=i386
-march=i486           -mcpu=i486
-march=i586           -mcpu=i586
-march=pentium        -mcpu=i586
-march=pentium-mmx    -mcpu=i586 -mmmx
-march=i686           -mcpu=i686
-march=pentiumpro     -mcpu=i686
-march=pentium2       -mcpu=i686 -mmmx
-march=pentium3       -mcpu=i686 -msse
-march=pentium4       -mcpu=pentium4 -msse2
-march=athlon         -mcpu=athlon -m3dnow
-march=athlon-tbird   -mcpu=athlon -m3dnow
-march=athlon-xp      -mcpu=athlon -m3dnow -msse
-march=athlon-mp      -mcpu=athlon -m3dnow -msse
-march=athlon-4       -mcpu=athlon -m3dnow -msse
-march=k6             -mcpu=k6
-march=k6-2           -mcpu=k6 -m3dnow

This is just far too difficult for the user to keep track of. It is VERY
confusing.

Now it really doesn't matter to me if everyone wants just the second column
or the first. Just please do not have both.

Personally, I was originally arguing for just the first column, but maybe
that was just me. And the more I think about it the more I think that only
having the second column is the way to go.

It is clear to me that even most technical users have trouble with this.
Many, for example, have no idea that the "Athlon XP, Athlon MP, and (their
horribly misnamed mobile chip) Athlon4" are all the same core and therefore
all have SSE. Even Mr. Ladd missed the fact that a PIII was the same basic
core as a PentiumPro (Intel's 6th generation x86 chip) with SSE added.

So maybe GCC should deprecate the first-column synonyms that are just too
confusing and are missing things like "Duron", "Celeron <500MHz" (which
doesn't have SSE), "Celeron >600MHz" (which does), "WinChip" and whatever
else the Marketing Geniuses come up with.

Just have items for the basic cores that GCC schedules for: i386, i486,
pentium/i586, k6, i686, athlon, pentium4 and whatever the k8-64 gets named.

....

I agree with others: "-msse" or "-msse2" should generally imply
"-mfpmath=sse". It's kind of silly to have to specify that twice.

...

And deprecate "-mcpu" in favor of "-mtune"; it is much more intuitive (at
least for English speakers).

Choose the minimum ARCHitecture (capabilities) that you wish to run the
binary on. Then, if you wish, choose a more recent model to TUNE (optimize)
the binary for.

Kelley Cook
Jan Hubicka
2002-12-04 22:07:54 UTC
Post by R***@comerica.com
Post by Jakub Jelinek
Post by R. Kelley Cook
This makes absolute complete sense.
"-mmmx", "-msse", "-msse2", and "-m3dnow" (and their corresponding
-mno-xxx) should just go away.
So what do you do if you want your binaries or libraries run on any
CPU supporting SSE, ie. ATM pIII, p4, athlon-{4,xp,mp}?
-march=i686 -msse -mfpmath=sse is what you use now, using -march=pentium3
is not a good idea for the athlons and likewise -march=athlon-xp is
not a good idea for pentiums.
See this is why it is so confusing. To GCC, what you wrote is the same
thing since
"-march=pentium3" == "-march=i686 -msse"
To be even more anal, looking through specs
this                  implies
-march=i386           -mcpu=i386
-march=i486           -mcpu=i486
-march=i586           -mcpu=i586
-march=pentium        -mcpu=i586
-march=pentium-mmx    -mcpu=i586 -mmmx
-march=i686           -mcpu=i686
-march=pentiumpro     -mcpu=i686
-march=pentium2       -mcpu=i686 -mmmx
-march=pentium3       -mcpu=i686 -msse
-march=pentium4       -mcpu=pentium4 -msse2
-march=athlon         -mcpu=athlon -m3dnow
-march=athlon-tbird   -mcpu=athlon -m3dnow
-march=athlon-xp      -mcpu=athlon -m3dnow -msse
-march=athlon-mp      -mcpu=athlon -m3dnow -msse
-march=athlon-4       -mcpu=athlon -m3dnow -msse
-march=k6             -mcpu=k6
-march=k6-2           -mcpu=k6 -m3dnow
....
I agree with others: "-msse" or "-msse2" should generally imply
"-mfpmath=sse". It's kind of silly to have to specify that twice.
It is not the same, unfortunately.
-msse2 simply enables the instruction set extension. The compiled
programs will behave the same as before; they just may be faster.
-mfpmath=sse changes the behaviour of floating-point execution by not using
80-bit temporaries like x87 code does. This breaks some things (glibc)
and makes others happy (those who want numeric stability).
I am inclined to argue for making -mfpmath=sse the default when SSE is
available; however, last time I lost the battle.

Honza
Post by R***@comerica.com
...
And deprecate "-mcpu" in favor of "-mtune"; it is much more intuitive (at
least for English speakers).
Choose the minimum ARCHitecture (capabilities) that you wish to run the
binary on. Then, if you wish, choose a more recent model to TUNE (optimize)
the binary for.
Kelley Cook
Scott Robert Ladd
2002-12-05 01:00:24 UTC
Post by Jan Hubicka
-mfpmath=sse changes the behaviour of floating-point execution by not using
80-bit temporaries like x87 code does. This breaks some things (glibc)
and makes others happy (those who want numeric stability).
I am inclined to argue for making -mfpmath=sse the default when SSE is
available; however, last time I lost the battle.
I hadn't even considered this problem -- and I was just about to send off a
raft of questions about gcc and IEC 60559 compliance, so floating-point is
definitely on my mind. I have a major free documentation project underway
about numerical programming in C99...

I need to dig out the Intel processor docs again; it's been a while since I
looked over all the various instructions sets and their implications.
Implied -mfpmath=sse may *not* be a good thing under certain circumstances...

..Scott
Ian Ollmann
2002-12-04 23:44:03 UTC
Post by Scott Robert Ladd
Why wouldn't it be a good idea to always "tune" for the chosen
architecture?
It is a bad idea when you are shipping a binary that needs to have high
performance but which also must run on a diversity of processors. In this
case, there is no "chosen architecture". For some apps, least common
denominator performance is not good enough.

In order to achieve superior broad spectrum performance, a frequent
approach is to have an application that has multiple parallel functions
for different architectures. So, for example, in the PowerPC world, we
might have one piece of code that just uses the scalar units for a G3
processor, and another piece of code that does the same thing using
AltiVec for newer processors. At run time, you make the decision about
what hardware is available and call the appropriate function or load the
appropriate library, etc.

This all falls flat on its face when a single flag, such as -msse does
multiple things. It (1) turns on the sse builtins, (2) replaces x87 scalar
code with xmm based scalar SSE code, and (3) may use the xmm register file
for other things like caching integer values that spill off the integer
register file. Number (1) we need in order to write SSE *vector* code.
You need that for performance. However, numbers (2) and (3) are a poison
pill if you are trying to have the same executable also run on a PPro --
it would crash when the PPro hits the automagically generated SSE code.
Also, the reduced precision available in SSE/SSE2 compared to x87 may
cause problems for some apps, because certain calculations that used to
work now return Inf.

Of course, we could move the vector code off to its own compilation unit.
However this is undesirable for many reasons. We can get into lengthy
religious discussions about exactly how undesirable it actually is.
However, I believe it is more productive to simply point out that there is
no apparent reason to require the -msse flag in order to use the
__builtins or vector types like V4SI. There does not seem to me to be any
potential for namespace collision.

I personally would advocate having the __builtins and vector types
available all the time. It would solve an awful lot of problems like the
one that spawned this thread.

Ian

---------------------------------------------------
Ian Ollmann, Ph.D. ***@cco.caltech.edu
---------------------------------------------------
Andi Kleen
2002-12-05 01:22:25 UTC
Post by Scott Robert Ladd
Implied -mfpmath=sse may *not* be a good thing under certain circumstances...
One problem is that SSE2 instructions are much bigger than x87. SSE is usually
three bytes or more for everything, while x87 has shorter encodings. You usually
bloat the program a lot when you use a lot of floating point.
Fortunately that's rare.

-Andi
