Discussion:
backslash whitespace newline
Howard Hinnant
2005-10-24 22:45:25 UTC
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc. For those not familiar with this issue, gcc takes
advantage of C99's 5.1.1.2p1b1 "implementation-defined manner" to
convert multibyte end-of-line indicators to newline characters. gcc
considers zero or more whitespace characters preceding a more
traditional CR and/or LF as the end-of-line indicator. This behavior
can cause differences in some code compared to compilers which do not
strip trailing whitespace off of lines. For example:

// comment \
int x;
int y;

Pretend there's one or more spaces or tabs after the '\'. gcc will
interpret this as:

A:

// comment int x;
int y;

while other compilers (Microsoft, EDG-based, CodeWarrior to name a
few) interpret it as:

B:

// comment
int x;
int y;

And depending on what you're trying to do, either A or B is the
"correct" answer. I've seen code broken either way (by assuming A
and having the compiler do B and vice-versa).
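To make the difference between A and B concrete, here is a minimal sketch of line splicing under both policies. It is not gcc's actual code; `splice_lines`, `is_hspace`, and the `strip_ws` flag are hypothetical names used only for illustration, and only spaces and tabs are treated as trailing whitespace.

```c
#include <stddef.h>

/* Hypothetical helper (not from gcc): the horizontal whitespace this
   sketch is willing to skip between a backslash and the newline. */
static int is_hspace(char c) { return c == ' ' || c == '\t'; }

/* Copy src to out, splicing continued lines.  out must have room for
   strlen(src) + 1 bytes.
   strip_ws != 0: backslash, optional spaces/tabs, newline is a
                  continuation (interpretation A).
   strip_ws == 0: only backslash immediately followed by newline is a
                  continuation (interpretation B). */
static void splice_lines(const char *src, char *out, int strip_ws)
{
    size_t i = 0, o = 0;
    while (src[i] != '\0') {
        if (src[i] == '\\') {
            size_t j = i + 1;
            if (strip_ws)
                while (is_hspace(src[j]))
                    j++;
            if (src[j] == '\n') {
                i = j + 1;  /* drop backslash, blanks, and newline */
                continue;
            }
        }
        out[o++] = src[i++];
    }
    out[o] = '\0';
}
```

Feeding it the three-line example with one space after the backslash gives the text shown under A when `strip_ws` is nonzero, and leaves the input unchanged (B) when it is zero.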

This issue has recently been discussed on the C standards reflector,
and though I was not privy to that discussion, my understanding is
that the likely resolution from this standards body will be that a
compiler implementing either A or B is conforming.

That being said, gcc, to the best of my knowledge, is the only modern
compiler to implement end-of-line whitespace stripping (yes I'm aware
of older compilers and dealing with punch cards). So on the basis of
conforming to a de-facto standard alone, I propose that gcc abandon
end-of-line whitespace stripping, or at least strip 2 or more
whitespace characters down to 1 space instead of to 0 spaces during
translation phase 1.

I realize that this change could break some existing code. But I am
also aware of existing code wishing to port to gcc which is broken by
gcc's current behavior. If we want gcc to "gain market share", does
it not make sense to "welcome" newcomers when possible by adopting
what is otherwise industry-wide practice?

Thanks,
Howard
Neil Booth
2005-10-24 22:52:36 UTC
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.

Neil.
Joe Buck
2005-10-24 22:57:20 UTC
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
It would probably be a good idea to warn about code where this will
make a difference, since the older gcc versions will be around for a long
time.
Vincent Lefevre
2005-10-25 00:52:31 UTC
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
But then, copy-paste would no longer always work since spaces are
sometimes added at the end of some lines (depending on the terminal
and the context).
--
Vincent Lefèvre <***@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Joe Buck
2005-10-25 01:02:12 UTC
Post by Vincent Lefevre
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
But then, copy-paste would no longer always work since spaces are
sometimes added at the end of some lines (depending on the terminal
and the context).
I believe that this is why gcc got this behavior in the first place, so
that files that visually look the same are handled the same.

gcc 2.95.3 does not have it; the oldest 3.x version I can locate around
here, gcc 3.0.3, does.

That is,

// this is a comment \
and so is this.

where there are two spaces after the backslash, is all one comment for gcc
3.x, and is a syntax error for 2.95.3.
Mike Stump
2005-10-25 01:59:44 UTC
Post by Vincent Lefevre
But then, copy-paste would no longer always work since spaces are
sometimes added at the end of some lines (depending on the terminal
and the context).
Please name such systems. We can then know to not use them, and can
document in the manual they are broken if we wish.
DJ Delorie
2005-10-25 02:07:33 UTC
Post by Mike Stump
Please name such systems. We can then know to not use them, and can
document in the manual they are broken if we wish.
IIRC the Windows cut-n-paste cuts a rectangle, not as-printed.

Also, beware of the \r\n line endings, where \r is considered
"whitespace" if it's not auto-converted.
Phil Edwards
2005-10-25 22:42:25 UTC
Post by DJ Delorie
Post by Mike Stump
Please name such systems. We can then know to not use them, and can
document in the manual they are broken if we wish.
IIRC the Windows cut-n-paste cuts a rectangle, not as-printed.
Yes, to this day, even using their latest command shells. Curse them.
--
"It won't be any more frightening than the time
I climbed up an elevator shaft with my teeth."
- Sunny Baudelaire
Florian Weimer
2005-10-25 05:39:41 UTC
Post by Mike Stump
Post by Vincent Lefevre
But then, copy-paste would no longer always work since spaces are
sometimes added at the end of some lines (depending on the terminal
and the context).
Please name such systems. We can then know to not use them, and can
document in the manual they are broken if we wish.
Emacs in an xterm, from time to time. I don't know if this is fixed
in Emacs 22, though.
Mike Stump
2005-10-25 19:53:23 UTC
Post by Florian Weimer
Emacs in an xterm, from time to time.
Yeah, I knew about that one, cutting and pasting from any full screen
program running in a terminal emulator tends to be wrong. Tab
characters are usually the first casualties, along with long
lines. :-( I'd propose to encourage people to submit bug reports
against xterm/emacs/terminfo/termcap/curses if they care much about
the issue. I think it might be possible to improve the situation a lot.

Doing this is a better solution, as then cutting and pasting would be
fundamentally more reliable in that environment.

Thanks for the pointer.
Daniel Jacobowitz
2005-10-25 20:01:43 UTC
Post by Mike Stump
Post by Florian Weimer
Emacs in an xterm, from time to time.
Yeah, I knew about that one, cutting and pasting from any full screen
program running in a terminal emulator tends to be wrong. Tab
characters are usually the first casualties, along with long
lines. :-( I'd propose to encourage people to submit bug reports
against xterm/emacs/terminfo/termcap/curses if they care much about
the issue. I think it might be possible to improve the situation a lot.
In fact, it really isn't (or at least it's defeated everyone who
tried); I spent some time talking with the ncurses maintainer about
this earlier in the month.
--
Daniel Jacobowitz
CodeSourcery, LLC
Vincent Lefevre
2005-10-25 21:16:23 UTC
Post by Daniel Jacobowitz
Post by Mike Stump
Yeah, I knew about that one, cutting and pasting from any full screen
program running in a terminal emulator tends to be wrong. Tab
characters are usually the first casualties, along with long
lines. :-( I'd propose to encourage people to submit bug reports
against xterm/emacs/terminfo/termcap/curses if they care much about
the issue. I think it might be possible to improve the situation a lot.
In fact, it really isn't (or at least it's defeated everyone who
tried); I spent some time talking with the ncurses maintainer about
this earlier in the month.
This can be improved a bit. The fact that ncurses clears the end of
a line with spaces instead of an escape sequence when there are a
few columns left is 100% ncurses fault, and easy to fix IMHO (just
remove the optimization).
--
Vincent Lefèvre <***@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Daniel Jacobowitz
2005-10-25 21:28:14 UTC
Post by Vincent Lefevre
Post by Daniel Jacobowitz
Post by Mike Stump
Yeah, I knew about that one, cutting and pasting from any full screen
program running in a terminal emulator tends to be wrong. Tab
characters are usually the first casualties, along with long
lines. :-( I'd propose to encourage people to submit bug reports
against xterm/emacs/terminfo/termcap/curses if they care much about
the issue. I think it might be possible to improve the situation a lot.
In fact, it really isn't (or at least it's defeated everyone who
tried); I spent some time talking with the ncurses maintainer about
this earlier in the month.
This can be improved a bit. The fact that ncurses clears the end of
a line with spaces instead of an escape sequence when there are a
few columns left is 100% ncurses fault, and easy to fix IMHO (just
remove the optimization).
I invite you to talk to Thomas about that one; he may well agree. That
doesn't affect the general case.
--
Daniel Jacobowitz
CodeSourcery, LLC
Vincent Lefevre
2005-10-26 00:53:18 UTC
Post by Daniel Jacobowitz
I invite you to talk to Thomas about that one; he may well agree.
That doesn't affect the general case.
I've just seen that it was fixed two years ago. From the ncurses
changelog:

20030719
+ use clr_eol in preference to blanks for bce terminals, so select and
paste will have fewer trailing blanks, e.g., when using xterm
(request by Vincent Lefevre).

Indeed I haven't had any problem with Mutt in an xterm for quite a
while, IIRC.

However, there are many problems with Mutt in an iTerm under Mac OS X:
I don't even always get newline characters. So, here, copy-paste of
C code with continuation lines would definitely lead to incorrect
behavior, whatever gcc does.
--
Vincent Lefèvre <***@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Florian Weimer
2005-10-25 20:42:49 UTC
Post by Mike Stump
Post by Florian Weimer
Emacs in an xterm, from time to time.
Yeah, I knew about that one, cutting and pasting from any full screen
program running in a terminal emulator tends to be wrong. Tab
characters are usually the first casualties, along with long
lines. :-( I'd propose to encourage people to submit bug reports
against xterm/emacs/terminfo/termcap/curses if they care much about
the issue. I think it might be possible to improve the situation a lot.
There's a mitigating factor: If you paste it into another Emacs in an
xterm of the same width, all lines are wrapped. This provides a
strong incentive to remove at least some of the trailing whitespace.
Vincent Lefevre
2005-10-25 08:33:32 UTC
Post by Mike Stump
Post by Vincent Lefevre
But then, copy-paste would no longer always work since spaces are
sometimes added at the end of some lines (depending on the terminal
and the context).
Please name such systems. We can then know to not use them, and can
document in the manual they are broken if we wish.
The problem occurs very often with iTerm under Mac OS X.

Also, this sometimes happens with *any* terminal when the application
has added spaces for a redraw. Emacs and Mutt at least are affected
by this problem (even when one uses bce). Concerning Mutt, I know
that ncurses sometimes clears the end of a line with spaces instead
of the escape sequence because this is more efficient when there are
only a few characters left.

Well, some terminals can be configured to strip the trailing spaces,
but then, one gets the inverse problem, i.e. lines with "\[LF]"
instead of "\[whitespace][LF]".
--
Vincent Lefèvre <***@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Joe Buck
2005-10-25 15:54:18 UTC
Post by Mike Stump
Post by Vincent Lefevre
But then, copy-paste would no longer always work since spaces are
sometimes added at the end of some lines (depending on the terminal
and the context).
Please name such systems. We can then know to not use them, and can
document in the manual they are broken if we wish.
Firefox. Emacs.

Often when I cut and paste a program example from Firefox into Emacs, I
wind up with extra whitespace.
Joe Buck
2005-10-25 19:12:31 UTC
Post by Joe Buck
Often when I cut and paste a program example from Firefox into Emacs, I
wind up with extra whitespace.
I've got an elisp file that removes trailing whitespace if you'd like
it...
Not needed; the following sequence removes trailing whitespace in Emacs:

ESC-x picture-mode NL
CTL-c CTL-c

(picture-mode cleans up trailing whitespace on exit).

The problem, though, is the program behavior is affected by something that
is invisible (the trailing whitespace).

If it's an issue that different compilers handle this differently and that
the standard does not specify the behavior, the answer, I think, is to
warn when trailing whitespace affects the behavior of the program.
I think that the only case is when the last character is a \ but there
may be others.
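A rough sketch of the check such a warning would need, covering only the backslash case mentioned above: flag every line whose backslash is followed by spaces or tabs before the newline. The function name and interface are hypothetical and this is not how cpplib implements its diagnostics; CR characters are deliberately not treated as whitespace here.

```c
#include <stddef.h>

/* Returns how many lines in src end with a backslash followed by at
   least one space or tab.  The 1-based numbers of the first `max`
   such lines are stored in lines[]. */
static size_t find_backslash_blank_eol(const char *src,
                                       size_t *lines, size_t max)
{
    size_t count = 0, lineno = 1;
    const char *line = src;

    while (*line != '\0') {
        /* Find the end of the current line. */
        const char *end = line;
        while (*end != '\0' && *end != '\n')
            end++;

        /* Walk back over trailing spaces and tabs. */
        const char *p = end;
        while (p > line && (p[-1] == ' ' || p[-1] == '\t'))
            p--;

        /* p < end means there was trailing whitespace to skip. */
        if (p < end && p > line && p[-1] == '\\') {
            if (count < max)
                lines[count] = lineno;
            count++;
        }

        line = (*end == '\n') ? end + 1 : end;
        lineno++;
    }
    return count;
}
```

A line that ends in a backslash with nothing after it is not reported, since every compiler discussed here treats that as a continuation.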
Tom Tromey
2005-10-25 19:51:13 UTC
Joe> Not needed; the following sequence removes trailing whitespace in Emacs:
Joe> ESC-x picture-mode NL
Joe> CTL-c CTL-c
Joe> (picture-mode cleans up trailing whitespace on exit).

There's also the more direct M-x delete-trailing-whitespace

Tom
Andreas Schwab
2005-10-25 20:54:18 UTC
Post by Joe Buck
Post by Joe Buck
Often when I cut and paste a program example from Firefox into Emacs, I
wind up with extra whitespace.
I've got an elisp file that removes trailing whitespace if you'd like
it...
ESC-x picture-mode NL
CTL-c CTL-c
(picture-mode cleans up trailing whitespace on exit).
There's also delete-trailing-whitespace (since Emacs 21).

Andreas.
--
Andreas Schwab, SuSE Labs, ***@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Vincent Lefevre
2005-10-25 20:56:14 UTC
Post by Joe Buck
ESC-x picture-mode NL
CTL-c CTL-c
(picture-mode cleans up trailing whitespace on exit).
What about M-x delete-trailing-whitespace?

Anyway by removing trailing whitespace, one assumes that it is not
significant. So, let the compiler regard it as not significant.
Post by Joe Buck
The problem, though, is the program behavior is affected by
something that is invisible (the trailing whitespace).
Yes.
Post by Joe Buck
If it's an issue that different compilers handle this differently and that
the standard does not specify the behavior, the answer, I think, is to
warn when trailing whitespace affects the behavior of the program.
I agree.
Post by Joe Buck
I think that the only case is when the last character is a \ but there
may be others.
--
Vincent Lefèvre <***@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA
Dave Korn
2005-10-25 10:18:20 UTC
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
Neil.
I would like it to be retained in at least one case: CRLF line endings
should still work, specifically backslash-CR-LF should be usable to indicate
a continued line. So how about having gcc accept

<cr>?<newline>

instead?

cheers,
DaveK
--
Can't think of a witty .sigline today....
'Neil Booth'
2005-10-25 12:53:08 UTC
Dave Korn wrote:-
Post by Dave Korn
I would like it to be retained in at least one case: CRLF line endings
should still work, specifically backslash-CR-LF should be usable to indicate
a continued line. So how about having gcc accept
<cr>?<newline>
instead?
This is entirely orthogonal; the two issues should not be confused.

Neil.
Dave Korn
2005-10-25 13:36:16 UTC
Post by 'Neil Booth'
Dave Korn wrote:-
Post by Dave Korn
I would like it to be retained in at least one case: CRLF line endings
should still work, specifically backslash-CR-LF should be usable to
indicate a continued line. So how about having gcc accept
<cr>?<newline>
instead?
This is entirely orthogonal; the two issues should not be confused.
Neil.
So it is. For a long time gcc accepted CRLF line ends everywhere *except*
after a continuation character; when it started working, I thought it was an
indirect consequence of the whitespace collapsing, but a quick browse
through cpplex.c/skip_escaped_newlines shows that it's only non-vertical
whitespace that gets collapsed, and that handle_newline specifically accepts
CR-LF (and LF-CR) as well as LF.
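The distinction between "which newline sequences count" and "whether blanks may precede them" can be sketched separately from the whitespace question. The following is an illustration only, not the code in cpplex.c; `newline_length` and `is_continuation` are hypothetical names, and the sketch simply mirrors the three terminators listed above (LF, CR-LF, LF-CR).

```c
#include <stddef.h>

/* Length of the line terminator starting at s:
   2 for CR-LF or LF-CR, 1 for a lone LF, 0 if s is not at one.
   A lone CR is not accepted in this sketch. */
static size_t newline_length(const char *s)
{
    if (s[0] == '\r' && s[1] == '\n')
        return 2;
    if (s[0] == '\n')
        return (s[1] == '\r') ? 2 : 1;
    return 0;
}

/* Nonzero if s points at a backslash that is followed directly by a
   line terminator, with no spaces or tabs in between. */
static int is_continuation(const char *s)
{
    return s[0] == '\\' && newline_length(s + 1) > 0;
}
```

With this split, accepting backslash-CR-LF needs no whitespace stripping at all: the CR is consumed as part of the terminator rather than as trailing whitespace.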

Apologies for the noise; I certainly agree it's a violation of the
language spec to allow tabs and spaces after the backslash.


cheers,
DaveK
--
Can't think of a witty .sigline today....
Eric Christopher
2005-10-25 22:14:27 UTC
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
I filed bugzilla 24531 about this.

Haven't heard Joseph weigh in on this issue, but here are the options
that I see:

a) We enable the conditional warning for line continuation in a
comment at all times
(just as we do for normal lines)

b) Change the preprocessor to remove the behavior and disable the
continuation
if we have a \ followed by a space

c) Do nothing.

Option c will leave us with the current behavior that I don't believe
I've heard
anyone want to keep (other than it's the current documented behavior).

I'll work up a patch for b since that's what Apple would like the
most. I believe that
option a is needed at least.

This is, as Mr. Buck has noted, a regression from 2.95.

-eric
Andrew Pinski
2005-10-25 22:20:29 UTC
OK, so it must be this, then
Installed.
That works. Thanks.
Andrew Pinski
2005-10-25 22:26:01 UTC
OK, so it must be this, then
Installed.
That works. Thanks.
Ignore this, this was a typo.

-- Pinski
Andrew Pinski
2005-10-25 22:22:21 UTC
Post by Eric Christopher
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
I filed bugzilla 24531 about this.
Note this is documented behavior and I don't think we should change it at
all, since it is one more thing that would break old gcc code, like stuff in the Linux kernel.

We have people already complaining about removing extensions. Why should we change
this implementation-defined, documented behavior?

-- Pinski
Andrew Pinski
2005-10-25 22:33:42 UTC
Post by Andrew Pinski
Post by Eric Christopher
Post by Neil Booth
Howard Hinnant wrote:-
Post by Howard Hinnant
I've been reviewing the age-old issue of interpreting
<whitespace>*<newline> as the end-of-line indicator as is the current
practice with gcc.
FWIW I support abandoning this behaviour too.
I filed bugzilla 24531 about this.
Note this is documented behavior and I don't think we should change it at
all, since it is one more thing that would break old gcc code, like stuff in the Linux kernel.
We have people already complaining about removing extensions. Why should we change
this implementation-defined, documented behavior?
Oh, one more thing. This seems like the normal problem of not reading the docs
if something does not work the way you want it to work.

The only thing we can do is point out that it is documented behavior and
then move on to the next issue. Also, why are we discussing this when
there are more important bugs to fix currently, as this behavior has been documented
for a long time, at least 4 years?

-- Pinski
Howard Hinnant
2005-10-25 23:41:11 UTC
Post by Andrew Pinski
We have people already complaining about removing extensions. Why should we change
this implementation-defined, documented behavior?
I'm not convinced that "extension" is a proper term for this
behavior. It is more like an incompatibility with the rest of the
world's compilers. The reason for change is to conform to a de-facto
standard, and thus ease the migration of future gcc customers to our
compiler.

These hypothetical customers coming from MS, EDG-based, or
CodeWarrior compilers might have code that looks like:

// A poorly formatted comment \\
int x = 0;
int y = 1;
...

Note I'm not trying to defend code that looks like this. But the
fact is that if you've got a million lines of code that's been
compiling with a non-gcc compiler for a decade, silly things like
this tend to add an extra hurdle to porting to gcc.

I don't claim that gcc's behavior is better or worse than everyone
else's. I only claim that gcc is unique in this regard, and that
isn't a good thing if you're trying to be friendly to customers
wanting to adopt your product.

-Howard
Andrew Pinski
2005-10-25 23:44:34 UTC
Post by Howard Hinnant
Post by Andrew Pinski
We have people already complaining about removing extensions. Why should we change
this implementation-defined, documented behavior?
I'm not convinced that "extension" is a proper term for this
behavior. It is more like an incompatibility with the rest of the
world's compilers. The reason for change is to conform to a de-facto
standard, and thus ease the migration of future gcc customers to our
compiler.
These hypothetical customers coming from MS, EDG-based, or
CodeWarrior compilers might have code that looks like:
// A poorly formatted comment \\
int x = 0;
int y = 1;
...
But this is not an extension at all. This is an implementation defined
behavior which is different than what an extension would do.

Depending on this is not the correct thing to do anyway, as
there could be another compiler besides GCC which does this.

Please read what implementation-defined means; this is what you
are talking about.

Also Note there is a much older PR about this, PR 8270: http://gcc.gnu.org/PR8270.

-- Pinski
Gabriel Dos Reis
2005-10-25 23:53:00 UTC
Andrew Pinski <***@physics.uc.edu> writes:

| Please read what implementation-defined means; this is what you
| are talking about.

Andrew --

Before taking your time to insult people; please do spend a little
bit of that time on your homework on their background.

-- Gaby
Howard Hinnant
2005-10-26 00:13:36 UTC
Post by Andrew Pinski
But this is not an extension at all. This is an implementation defined
behavior which is different than what an extension would do.
Depending on this is not the correct thing to do anyway, as
there could be another compiler besides GCC which does this.
<nod> I'm not disagreeing with anything you're saying. My only
point is that it might be in gcc's best interest to have the same
implementation defined behavior as MS, EDG-based compilers and
CodeWarrior when that is a reasonable choice (and probably others, I
know my list of compilers is incomplete).

-Howard
Andrew Pinski
2005-10-26 00:22:12 UTC
Post by Howard Hinnant
Post by Andrew Pinski
But this is not an extension at all. This is an implementation defined
behavior which is different than what an extension would do.
Depending on this is not the correct thing to do anyway, as
there could be another compiler besides GCC which does this.
<nod> I'm not disagreeing with anything you're saying. My only
point is that it might be in gcc's best interest to have the same
implementation defined behavior as MS, EDG-based compilers and
CodeWarrior when that is a reasonable choice (and probably others, I
know my list of compilers is incomplete).
Why not get other compilers to change to what GCC does? Why does GCC
have to follow what other compilers do, maybe other compilers
would be in the best interest of following what GCC does.

Why not instead get the standard changed and then GCC will just follow
then (and really should only follow at that point)?

(And yes, I know you wrote the MW C++ library and are part of the C++
standards committee.)


-- Pinski
Howard Hinnant
2005-10-26 00:34:42 UTC
Post by Andrew Pinski
Why not get other compilers to change to what GCC does? Why does GCC
have to follow what other compilers do, maybe other compilers
would be in the best interest of following what GCC does.
Why not instead get the standard changed and then GCC will just follow
then (and really should only follow at that point)?
<shrug> I'm just a pragmatist I guess.

It is only a suggestion. And a tiny suggestion at that. Imho, it
would be in gcc's best interest. I respect the fact that you feel
otherwise.

-Howard
Joe Buck
2005-10-26 00:40:56 UTC
Post by Andrew Pinski
Why not get other compilers to change to what GCC does? Why does GCC
have to follow what other compilers do, maybe other compilers
would be in the best interest of following what GCC does.
The problem, I think, is that the behavior of both GCC *and* the
other compilers does not serve the users.

The reason is that there simply isn't any reason why a user would
use a backslash to continue a C++ comment on purpose, and plenty of
reason why she might do it by accident.

There appear to be five relevant cases; the backslash-newline (or
whitespace/backslash/newline) can be in

1) a C++ comment
2) a C comment
3) a string literal
4) a preprocessor directive
5) somewhere else.

For cases 2 and 5, the behavior doesn't matter. For case 3, the
continuation is unlikely to be accidental, and the user is, I think,
less likely to leave a stray slash in case 4.

But case 1 is the nasty one, as users think they can put anything
in a comment. A backslash at the end is likely to be an accident,
since just starting the next line with a // is easy enough.

So, I think that a trailing backslash at the end of a C++ comment
(followed or not by whitespace) should be warned about. I think
Apple's customers would want this warning too, since many of them
will also be interested in compiling for other systems, and there
is a lot of gcc 3.x out there. Best to use warnings to get rid
of the problematic code.
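A minimal sketch of the check behind such a warning, under stated assumptions: it looks at one physical line at a time, skips `//` inside string and character literals, and does not track block comments or earlier continuation lines. `cxx_comment_ends_in_backslash` is a hypothetical name, not a cpplib function.

```c
#include <string.h>

/* Returns 1 if the line contains a // comment whose last non-blank
   character is a backslash (followed or not by spaces/tabs). */
static int cxx_comment_ends_in_backslash(const char *line)
{
    const char *p;
    const char *comment = NULL;
    char quote = 0;  /* delimiter of the literal we are in, or 0 */

    for (p = line; *p != '\0' && *p != '\n'; p++) {
        if (quote) {
            if (*p == '\\' && p[1] != '\0' && p[1] != '\n')
                p++;            /* skip the escaped character */
            else if (*p == quote)
                quote = 0;      /* literal closed */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;
        } else if (p[0] == '/' && p[1] == '/') {
            comment = p;
            break;
        }
    }
    if (comment == NULL)
        return 0;

    /* Trim trailing spaces and tabs, then look at what is left. */
    const char *end = comment + strcspn(comment, "\n");
    while (end > comment + 2 && (end[-1] == ' ' || end[-1] == '\t'))
        end--;
    return end > comment + 2 && end[-1] == '\\';
}
```

Because the check ignores whether whitespace follows the backslash, it fires for code that behaves differently under the two interpretations as well as for code that every compiler splices.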
Mike Stump
2005-10-26 01:51:45 UTC
Post by Joe Buck
1) a C++ comment
But case 1 is the nasty one, as users think they can put anything
in a comment. A backslash at the end is likely to be an accident,
since just starting the next line with a // is easy enough.
Be interesting to see the results of a grep on a large software
base. Does anyone have ready access to, say a linux distro handy?
Of all the hits I know about, none of them were an accident.
Joe Buck
2005-10-26 04:33:40 UTC
Post by Mike Stump
Post by Joe Buck
1) a C++ comment
But case 1 is the nasty one, as users think they can put anything
in a comment. A backslash at the end is likely to be an accident,
since just starting the next line with a // is easy enough.
Be interesting to see the results of a grep on a large software
base. Does anyone have ready access to, say a linux distro handy?
Of all the hits I know about, none of them were an accident.
You're forgetting something: GNU/Linux distros are built with gcc,
and everyone is now using 3.x. So there can't be buildable programs
that depend on behavior gcc doesn't support. There can't be any
currently maintained programs for Linux or BSD that do.

Only users that have never used gcc 3.x could be depending on non-gcc
behavior. Apple is free to do whatever, but please don't claim
that you're doing it for "portability" if you encourage your users
to write code that breaks with gcc 3.x shipped by folks other than
Apple.
Shantonu Sen
2005-10-26 04:46:23 UTC
You're forgetting something: GNU/Linux distros are built with
thousands of lines of patches to support new/different gcc behavior.
Thousands were added for the 2->3 transition, and thousands more for
3->4. Please don't claim that all upstream programs in all
distributions support gcc 3.4.4 and 4.0.2 without modification, and
thus gcc is the standard by which portability is defined.

Shantonu
Post by Joe Buck
Post by Mike Stump
Post by Joe Buck
1) a C++ comment
But case 1 is the nasty one, as users think they can put anything
in a comment. A backslash at the end is likely to be an accident,
since just starting the next line with a // is easy enough.
Be interesting to see the results of a grep on a large software
base. Does anyone have ready access to, say a linux distro handy?
Of all the hits I know about, none of them were an accident.
You're forgetting something: GNU/Linux distros are built with gcc,
and everyone is now using 3.x. So there can't be buildable programs
that depend on behavior gcc doesn't support. There can't be any
currently maintained programs for Linux or BSD that do.
Only users that have never used gcc 3.x could be depending on non-gcc
behavior. Apple is free to do whatever, but please don't claim
that you're doing it for "portability" if you encourage your users
to write code that breaks with gcc 3.x shipped by folks other than
Apple.
Joe Buck
2005-10-26 05:45:35 UTC
Post by Joe Buck
You're forgetting something: GNU/Linux distros are built with
thousands of lines of patches to support new/different gcc behavior.
Unfortunately, too many C++ programmers in particular never used
a compiler other than g++, and older g++ versions accepted all kinds
of amazing stuff that was not C++. There was never any promise
made that such things would continue to compile forever.
Post by Joe Buck
Thousands were added for the 2->3 transition, and thousands more for
3->4. Please don't claim that all upstream programs in all
distributions support gcc 3.4.4 and 4.0.2 without modification, and
thus gcc is the standard by which portability is defined.
Who's talking about 3.4.4 or 4.0.2?

The behavior we're discussing was in gcc 3.0, as well as in Red
Hat's "2.96", so it's pretty old. My point is not that gcc is
"the standard by which portability is defined", but rather is
that code that still doesn't work with any gcc 3.x release is
not portable, and you won't find such code in the distros because
it's been fixed long ago.
Andrew Pinski
2005-10-26 06:05:59 UTC
Post by Joe Buck
Post by Joe Buck
You're forgetting something: GNU/Linux distros are built with
thousands of lines of patches to support new/different gcc behavior.
Unfortunately, too many C++ programmers in particular never used
a compiler other than g++, and older g++ versions accepted all kinds
of amazing stuff that was not C++. There was never any promise
made that such things would continue to compile forever.
Post by Joe Buck
Thousands were added for the 2->3 transition, and thousands more for
3->4. Please don't claim that all upstream programs in all
distributions support gcc 3.4.4 and 4.0.2 without modification, and
thus gcc is the standard by which portability is defined.
Who's talking about 3.4.4 or 4.0.2?
Also, if you look at Apple, there has already been a revert of a patch which
went into 4.0.1, which fixes a C++ regression but also starts rejecting
invalid code which was not rejected before 4.0.1 (in 4.0.0).

Why did Apple revert that patch? Well, because there was push back from
internal developers who did not want to fix their code. Why should
this case be any different? In fact this case is different in a
way, because both behaviors are accepted by the standards committee as
acceptable. So you hurt one person who writes valid (but questionable)
C code and help another who writes valid (but still questionable) C code.
So from the looks of it, nobody can win. So the easiest (and in this
case, best) way is not to change and hurt the current customers, as they
are more likely to be repeat customers and might otherwise move to another
compiler or, even worse, fork GCC.

Now, the code in question is not even that hard to fix, which is weird in
a way, as Apple got push back from developers who don't want to change
their code. The easiest way to fix the problem under both behaviors is to
put C style comments around the C++ style comments.
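
For illustration, a minimal sketch of that fix. The function and variable
names are made up for this example. Once the line art sits inside a C style
comment, a backslash at the end of a line can only splice comment text, so
the declaration that follows is seen under either end-of-line rule.

```cpp
#include <cassert>

int x = 0;  // file-scope variable that the local declaration shadows

// The line art sits inside a C style comment, so a backslash at the end of
// a line (with or without blanks after it) can only splice comment text.
int read_local_x() {
    /*
    // +----\
    // |     \
    */
    int x = 1;  // never swallowed by the comment above
    return x;
}

// Usage: the local declaration is always seen, so this always holds.
void check_read_local_x() { assert(read_local_x() == 1); }
```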

The real question here is why should GCC do stuff like other compilers
in terms of implementation-defined behaviors? In my mind, we should keep
doing what we do right now and should not change just to conform to
what other people do. In a way we are thinking different; isn't that
what Apple is about anyway?

If the standards committee says we are within the limits of the standard,
why change, especially when it comes to ASCII art? It seems silly
even to ask GCC to change over that.


-- Pinski
(Hopefully this is a much better worded argument than before. I was upset
that this was even being asked about when the standards committee said it
was okay, and that it was over ASCII art.)
Mike Stump
2005-10-26 16:29:48 UTC
Permalink
Post by Andrew Pinski
Why did Apple revert that patch? Because there was push back from
internal developers who did not want to fix their code. Why should
this case be any different?
I'm sorry you don't understand the differences. In one, we have
every expectation that the code will compile on every C++ compiler
out there, except for gcc, and that makes it a gcc bug that has been
fixed. The other is a gcc feature that causes code that does compile
on every other C++ compiler that I know about to not compile on gcc,
so I guess you're right after all, there is no difference, we'd call
it a bug that needs to be fixed. :-) Wait, what side were you
arguing? :-)
Post by Andrew Pinski
or even worse fork GCC.
Is this like calling someone a Nazi?
Mike Stump
2005-10-26 16:02:56 UTC
Permalink
Post by Joe Buck
Post by Mike Stump
Be interesting to see the results of a grep on a large software
base. Does anyone have ready access to, say a linux distro handy?
Of all the hits I know about, none of them were an accident.
You're forgetting something: GNU/Linux distros are built with gcc,
and everyone is now using 3.x. So there can't be buildable programs
that depend on behavior gcc doesn't support.
? The claim was made that we cannot change it now, as it would break
the non-portable code that gcc now compiles. I said, hogwash, linux
doesn't have any such code, someone else did the grep, and sure
enough, there is no such code, so that cannot be a reason why we
cannot remove support for \ sp nl.

What did you think Andrew's point was? Maybe I entirely misread it?
Dale Johannesen
2005-10-26 17:16:13 UTC
Permalink
Post by Joe Buck
The problem, I think, is that the behavior of both GCC *and* the
other compilers does not serve the users.
The reason is that there simply isn't any reason why a user would
use a backslash to continue a C++ comment on purpose, and plenty of
reason why she might do it by accident.
...users think they can put anything in a comment. A backslash at the
end is likely to be an accident,
since just starting the next line with a // is easy enough.
Yes. From the user's point of view, the best thing appears to be
treating backslashes in C++ comments as part of the comment,
regardless of what follows them; that seems to follow the principle
of least surprise. That's not standard conforming, and therefore I'm
not advocating it for gcc, but it probably wouldn't break anything
outside compiler testsuites. Maybe this treatment should be made
standard conforming...?
Robert Dewar
2005-10-26 17:22:48 UTC
Permalink
Post by Dale Johannesen
Yes. From the user's point of view, the best thing appears to be
treating backslashes in C++ comments as part of the comment,
regardless of what follows them; that seems to follow the principle
of least surprise. That's not standard conforming, and therefore I'm
not advocating it for gcc, but it probably wouldn't break anything
outside compiler testsuites. Maybe this treatment should be made
standard conforming...?
I agree this would be a far preferable definition, allowing comments
to be continued with a \ seems a truly silly feature.
Mike Stump
2005-10-26 01:39:29 UTC
Permalink
Post by Andrew Pinski
People depending on this is not the correct thing to do anyway, as
there could be another compiler besides GCC which does this.
Let's enumerate them, what other compilers do this besides gcc?
Andrew Pinski
2005-10-26 01:45:09 UTC
Permalink
Post by Mike Stump
Post by Andrew Pinski
People depending on this is not the correct thing to do anyway, as
there could be another compiler besides GCC which does this.
Let's enumerate them, what other compilers do this besides gcc?
Does that really matter?

-- Pinski
Mike Stump
2005-10-26 02:07:08 UTC
Permalink
Post by Andrew Pinski
Does that really matter?
gcc is free to ignore users, existing code, porting problems from
other platforms and other C implementations, if we so choose. I'm
not used to writing such factors off wholesale. I tend to think a
balance is better weighing all the different factors. On one hand,
we have ascii line art in documentation, on the other hand, we have
this esthetic beauty. Existing, otherwise portable code against,
confusing a user that is trying to do cut and paste programming on
terminal emulators running curses packages that break cut-n-paste
that can be themselves fixed, to fix not just gcc cut-n-paste
programming, but all cut-n-pasting they might do.

Absent users, absent porting otherwise portable code, absent other
implementations, I'd stick with the esthetics.
Andrew Pinski
2005-10-26 02:10:00 UTC
Permalink
Post by Mike Stump
Post by Andrew Pinski
Does that really matter?
gcc is free to ignore users, existing code, porting problems from
other platforms and other C implementations, if we so choose. I'm
not used to writing such factors off wholesale. I tend to think a
balance is better weighing all the different factors. On one hand,
we have ascii line art in documentation, on the other hand, we have
this esthetic beauty. Existing, otherwise portable code against,
confusing a user that is trying to do cut and paste programming on
terminal emulators running curses packages that break cut-n-paste
that can be themselves fixed, to fix not just gcc cut-n-paste
programming, but all cut-n-pasting they might do.
Absent users, absent porting otherwise portable code, absent other
implementations, I'd stick with the esthetics.
Joe Buck
2005-10-26 04:28:44 UTC
Permalink
Post by Mike Stump
gcc is free to ignore users, existing code, porting problems from
other platforms and other C implementations, if we so choose.
You still have not demonstrated that this is a real problem. If someone
is having a real problem, then we can offer them a simple sed script to
fix it.

Are you really saying that someone is using ASCII line art in comments
that tweaks this behavior? Wow. Write them a script to put a vertical
bar character at the ends of the comment lines or something.
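
For illustration, a minimal sketch of such a fix-up, written in C++ rather
than sed. The function name is hypothetical and the `//` test is naive (it
does not understand string literals). The idea is that once a visible
character follows the backslash, no compiler treats the line as continued.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns `line` with " |" in place of the trailing blanks when the line
// contains `//` and ends in a backslash followed by spaces or tabs.
// All other lines are returned unchanged.
std::string guard_comment_backslash(const std::string& line) {
    if (line.find("//") == std::string::npos) return line;
    std::size_t end = line.size();
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t')) --end;
    const bool had_trailing_blanks = end < line.size();
    if (had_trailing_blanks && end > 0 && line[end - 1] == '\\') {
        return line.substr(0, end) + " |";
    }
    return line;
}

// Usage: only the backslash-plus-blanks comment line is rewritten.
void check_guard_comment_backslash() {
    assert(guard_comment_backslash("// art \\  ") == "// art \\ |");
    assert(guard_comment_backslash("// art \\") == "// art \\");
    assert(guard_comment_backslash("int x = 1;  ") == "int x = 1;  ");
}
```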
Kean Johnston
2005-10-26 09:22:55 UTC
Permalink
Post by Joe Buck
You still have not demonstrated that this is a real problem. If someone
is having a real problem, then we can offer them a simple sed script to
fix it.
If I am recalling the original posting correctly, the fact that
gcc behaves differently to "most other compilers" is the actual
problem. Issues relating to code correctness, user brain-deadedness,
or even just sensible practices are rather moot. gcc can be the
only Johnny in step, or it can behave, as Mr. Hinnant was suggesting,
the way everyone else does and adopt the de-facto standard.

Writing sed scripts that change source code is likely to be very
unpalatable to some users. If you're working in an ISO9000
environment where every single source line change is tracked
by a rather burdensome process, the last thing you want to do
is invoke that process for some source base simply because the
new compiler you are moving to behaves differently to the last 5
compilers you used.

Just my $0.02.

Kean
Paul Brook
2005-10-26 12:39:52 UTC
Permalink
Post by Kean Johnston
Writing sed scripts that change source code is likely to be very
unpalatable to some users. If you're working in an ISO9000
environment where every single source line change is tracked
by a rather burdensome process, the last thing you want to do
is invoke that process for some source base simply because the
new compiler you are moving to behaves differently to the last 5
compilers you used.
If you're working in that sort of environment the problem really should never
occur because your coding standards (enforced by an automatic checker) should
prohibit broken whitespace-at-end-of-line anyway.

Paul
Mike Stump
2005-10-26 16:11:08 UTC
Permalink
Post by Joe Buck
Are you really saying that someone is using ASCII line art in comments
that tweaks this behavior?
Yes, I'm sorry if previous message didn't make this clear.
Robert Dewar
2005-10-26 16:18:20 UTC
Permalink
Post by Mike Stump
Post by Joe Buck
Are you really saying that someone is using ASCII line art in comments
that tweaks this behavior?
Yes, I'm sorry if previous message didn't make this clear.
Why would line art ever tweak this problem, why would lines
in such art have trailing white space?
Howard Hinnant
2005-10-26 16:35:27 UTC
Permalink
Post by Robert Dewar
Post by Mike Stump
Post by Joe Buck
Are you really saying that someone is using ASCII line art in comments
that tweaks this behavior?
Yes, I'm sorry if previous message didn't make this clear.
Why would line art ever tweak this problem, why would lines
in such art have trailing white space?
Some programmers purposefully put trailing whitespace on their art in
order to prevent translation phase 2 line splicing. And it actually
works everywhere but gcc. Mind you I'm not defending this practice.
I'm just reporting what happens in the field, and giving the opinion
that gcc might as well adopt the same behavior as every other
compiler ... even though the standards say it doesn't have to.

-Howard
Andrew Pinski
2005-10-26 16:39:31 UTC
Permalink
Post by Howard Hinnant
Post by Robert Dewar
Post by Mike Stump
Post by Joe Buck
Are you really saying that someone is using ASCII line art in comments
that tweaks this behavior?
Yes, I'm sorry if previous message didn't make this clear.
Why would line art ever tweak this problem, why would lines
in such art have trailing white space?
Some programmers purposefully put trailing whitespace on their art in
order to prevent translation phase 2 line splicing. And it actually
works everywhere but gcc. Mind you I'm not defending this practice.
I'm just reporting what happens in the field, and giving the opinion
that gcc might as well adopt the same behavior as every other
compiler ... even though the standards say it doesn't have to.
But in a way you are defending it, as you want GCC to change. If there
were any other reason besides ASCII art, some people would be more willing
to change, but there is a simple fix to a person's code to get around this
issue: do not use C++ style comments and use C style ones instead.
It is a simple two edits to their code. I still am trying to figure
out why this was even brought up if it was only due to ASCII art; that
seems silly.

-- Pinski
Mike Stump
2005-10-26 16:59:13 UTC
Permalink
I still am trying to figure out why this was even brought up if it
was only due to ASCII art, that seems silly.
sorry ("I find ascii line art silly"); ;-)

We could do that!

If we didn't have any customers or if we expected they wouldn't bring
new code in to gcc that traditionally had been compiled by other
compilers, I'd see no point in doing it either. When one has no
customers or no code being ported into the platform, life is easier.
gcc has had a long colorful history of actually having customers (Hi
Joe) and having new code ported into the platform, I don't see a
reason to abandon that tradition just yet.
Andrew Pinski
2005-10-26 17:16:04 UTC
Permalink
Post by Mike Stump
I still am trying to figure out why this was even brought up if it
was only due to ASCII art, that seems silly.
sorry ("I find ascii line art silly"); ;-)
We could do that!
That is just stupid; that would in fact be invalid according to what the
standard says, so it is not silly at all. What I am trying to say is that
the only reason why this was brought up was because of some little
ASCII art (ASCII art does have its place in comments, see rs6000.c for
an example of where ASCII art actually helps). It would be different if
there were another reason, for example someone depending on
implementation-defined behavior which actually changes the meaning of the
code, like bitwise operators.
See: http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation

Another compiler could define >> with signed types the same as the
unsigned >> (I think there might be such a compiler too).

We could have it defined as such that >> with signed types is the same as >> with
unsigned types, and the major other compilers do it a different way.

This is one case where changing is much harder and even worse as almost all
people depend on >> with signed types to act like the other compilers.
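
For illustration, a minimal sketch of the signed/unsigned difference. The
function names are made up for this example. Before C++20, the result of
right-shifting a negative signed value is implementation-defined (the GCC
page linked above documents sign extension); from C++20 on, sign extension
is required.

```cpp
#include <cassert>

// Unsigned right shift is fully specified: vacated bits are zero-filled.
unsigned int shift_right_unsigned(unsigned int value, unsigned int count) {
    return value >> count;
}

// Signed right shift of a negative value is implementation-defined before
// C++20; C++20 and later require sign extension (arithmetic shift).
int shift_right_signed(int value, unsigned int count) {
    return value >> count;
}

// Usage: only the non-negative cases have one portable answer pre-C++20.
void check_shifts() {
    assert(shift_right_unsigned(8u, 1u) == 4u);
    assert(shift_right_signed(8, 1u) == 4);
    const int r = shift_right_signed(-8, 1u);
    const int zero_filled =
        static_cast<int>(static_cast<unsigned int>(-8) >> 1);
    assert(r == -4 || r == zero_filled);  // sign-extended or zero-filled
}
```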

But since we are only dealing with comments, this is not a hard decision
to make: stay with what we have already. I am just trying to point
out that it was silly to bring the issue up if it only concerns comments.

-- Pinski
Howard Hinnant
2005-10-26 19:19:36 UTC
Permalink
Post by Andrew Pinski
What I am trying to say is that
the only reason why this was brought up was because of some little
ASCII art (ASCII art does have its place in comments, see rs6000.c for
an example of where ASCII art actually helps). If there was another
reason, like for an example someone depends on implementation defined
behavior which actually changes the meaning of the code like
bitwise operators
I believe I may have not described the situation sufficiently
clearly. My apologies.

Yes, this does have to do with ASCII art. But no, it is not simply a
matter of messing up the comments. Code behavior can change. What
was working code can stop working. Or even worse, the code can
compile and have different run time behavior:

int x = 0;

void f()
{
    // ascii art \ <white space here>
    int x = 1;
    if (x)
        do_thing1();
    else
        do_thing2();
}

If end-of-line white space is stripped in phase 1, do_thing2() is
called. If end-of-line white space is not stripped, do_thing1() is
called.
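
For illustration, a toy model of the two end-of-line rules. All names here
are hypothetical, and real phase 1 and 2 processing (trigraphs, CR/LF, and
so on) is ignored. It only shows how the same bytes yield a different token
stream depending on whether blanks before the newline are stripped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Which end-of-line rule the toy preprocessor applies.
enum class EolPolicy {
    StripTrailingWhitespace,  // gcc's behavior as described in this thread
    KeepTrailingWhitespace    // the other compilers' behavior as described
};

// Joins a line with its successor when the line ends in a backslash. Under
// StripTrailingWhitespace, blanks after the backslash are ignored first.
std::string splice_lines(const std::string& source, EolPolicy policy) {
    std::string out;
    std::size_t pos = 0;
    while (pos < source.size()) {
        const std::size_t nl = source.find('\n', pos);
        const bool has_newline = (nl != std::string::npos);
        const std::string line =
            source.substr(pos, has_newline ? nl - pos : std::string::npos);
        pos = has_newline ? nl + 1 : source.size();

        std::size_t end = line.size();
        if (policy == EolPolicy::StripTrailingWhitespace) {
            while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
                --end;
        }
        const bool continued = has_newline && end > 0 && line[end - 1] == '\\';
        if (continued) {
            out.append(line, 0, end - 1);  // drop the backslash and the newline
        } else {
            out += line;
            if (has_newline) out += '\n';
        }
    }
    return out;
}

// Usage: backslash, one space, newline. Only the stripping policy splices.
void check_splice_lines() {
    const std::string src = "// comment \\ \nint x;\nint y;\n";
    assert(splice_lines(src, EolPolicy::StripTrailingWhitespace) ==
           "// comment int x;\nint y;\n");
    assert(splice_lines(src, EolPolicy::KeepTrailingWhitespace) == src);
}
```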

-Howard
Robert Dewar
2005-10-26 22:25:35 UTC
Permalink
Post by Howard Hinnant
If end-of-line white space is stripped in phase 1, do_thing2() is
called. If end-of-line white space is not stripped, do_thing1() is
called.
So this is truly appallingly bad code, given its behavior depends
so radically on an implementation defined feature!

Probably this shows that some warnings are needed in this situation
at the very least.
Robert Dewar
2005-10-26 16:54:53 UTC
Permalink
Post by Howard Hinnant
Some programmers purposefully put trailing whitespace on their art in
order to prevent translation phase 2 line splicing. And it actually
works everywhere but gcc. Mind you I'm not defending this practice.
I'm just reporting what happens in the field, and giving the opinion
that gcc might as well adopt the same behavior as every other compiler
... even though the standards say it doesn't have to.
-Howard
Seems a weak argument to me. Changing gcc would create incompatibilities
with previous behavior of gcc, and that is FAR more significant than
worrying about other compilers in my opinion. Having gcc compile
non-portable code accepted by other compilers is a useful goal, but
one of low priority compared to maintaining compatibility as far as
possible between gcc versions.
Kean Johnston
2005-10-26 17:17:16 UTC
Permalink
Post by Robert Dewar
worrying about other compilers in my opinion. Having gcc compile
non-portable code accepted by other compilers is a useful goal, but
one of low priority compared to maintaining compatibility as far as
possible between gcc versions.
You mean like the change between 2.95, which worked the way Howard
wanted it to, and 3.x (I don't know the value of x where the change
happened), where it doesn't?

This compiles on 2.95.3:

(There is whitespace at the end of the following line)
// comment \\
int x = 0;
int y = 1;
int foo() { return x + y;}

But doesn't on 3.4.4. I don't have earlier versions of 2.x or 3.x
to narrow it down further. Sorry. I believe Pinski posted the
actual changelog entry where the change occurred.

Kean
Scott Robert Ladd
2005-10-26 17:35:05 UTC
Permalink
Post by Robert Dewar
Seems a weak argument to me. Changing gcc would create incompatibilities
with previous behavior of gcc, and that is FAR more significant than
worrying about other compilers in my opinion. Having gcc compile
non-portable code accepted by other compilers is a useful goal, but
one of low priority compared to maintaining compatibility as far as
possible between gcc versions.
Wouldn't it be possible to implement a compile-time option to enable the
desired behavior only for those poor folk who have this problem?

It's not as if GCC has been shy about adding options before. :)

The *default* behavior of the compiler should follow published standards
first and past GCC behavior second (and both whenever possible). Support
for "other compilers" is desirable, if it can be implemented without
compromising those primary objectives.
--
Scott Robert Ladd <***@coyotegulch.com>

Coyote Gulch Productions
http://www.coyotegulch.com
Robert Dewar
2005-10-26 18:49:26 UTC
Permalink
Post by Scott Robert Ladd
Wouldn't it be possible to implement a compile-time option to enable the
desired behavior only for those poor folk who have this problem?
Of course this is possible, but it is only worth it if

a) there are a substantial number of such poor folk

b) it is not easy for them to do what they want without the option

I don't see either is true here.
Eric Christopher
2005-10-26 18:52:30 UTC
Permalink
Post by Robert Dewar
I don't see either is true here.
Actually, I agree. While I'd like the change to the compiler, I don't
want it to be a switch. Either we do it, or we don't.

-eric
Dave Korn
2005-10-26 18:52:51 UTC
Permalink
Post by Scott Robert Ladd
Post by Robert Dewar
Seems a weak argument to me. Changing gcc would create incompatibilities
with previous behavior of gcc, and that is FAR more significant than
worrying about other compilers in my opinion. Having gcc compile
non-portable code accepted by other compilers is a useful goal, but
one of low priority compared to maintaining compatibility as far as
possible between gcc versions.
Wouldn't it be possible to implement a compile-time option to enable the
desired behavior only for those poor folk who have this problem?
Yes, absolutely so. Just add a flag in the usual way, test for it in
cpplex.c/skip_escaped_newlines, and change the bit that says

  if (saved_cur != buffer->cur - 1
      && !pfile->state.lexing_comment)
    cpp_error (pfile, DL_WARNING,
               "backslash and newline separated by space");

so that according to the flag setting, it could either issue a DL_WARNING as
it currently does, or the level could be changed to DL_PEDWARN or DL_ERROR, or
it could skip the cpp_error call altogether.
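
For illustration, a minimal stand-alone sketch of that idea. Nothing here is
real cpplib code: the enum below only reuses the level names from the
snippet above, DL_NONE is invented to mean "skip the cpp_error call", and
the mode type stands for a hypothetical flag that does not exist in GCC.

```cpp
#include <cassert>

// Local stand-ins for the diagnostic levels named above. DL_NONE is
// invented for this sketch.
enum DiagLevel { DL_NONE, DL_WARNING, DL_PEDWARN, DL_ERROR };

// Hypothetical settings of a hypothetical command-line flag.
enum class BackslashSpaceMode { Silent, Warn, Pedwarn, Error };

DiagLevel level_for(BackslashSpaceMode mode) {
    switch (mode) {
        case BackslashSpaceMode::Silent:  return DL_NONE;
        case BackslashSpaceMode::Warn:    return DL_WARNING;
        case BackslashSpaceMode::Pedwarn: return DL_PEDWARN;
        case BackslashSpaceMode::Error:   return DL_ERROR;
    }
    return DL_WARNING;  // not reached for valid enumerators
}

// Same shape as the quoted test: diagnose only if blanks were skipped
// between the backslash and the newline and we are not lexing a comment.
DiagLevel diagnose_backslash_space(bool blanks_skipped, bool lexing_comment,
                                   BackslashSpaceMode mode) {
    if (blanks_skipped && !lexing_comment) return level_for(mode);
    return DL_NONE;
}

// Usage: the flag changes the level; comments stay silent either way.
void check_diagnose_backslash_space() {
    assert(diagnose_backslash_space(true, false, BackslashSpaceMode::Warn) == DL_WARNING);
    assert(diagnose_backslash_space(true, false, BackslashSpaceMode::Error) == DL_ERROR);
    assert(diagnose_backslash_space(true, true, BackslashSpaceMode::Error) == DL_NONE);
}
```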

Generating the actual patch is left as an exercise for the reader[*]!


cheers,
DaveK

[*] Or whoever else actually _cares_ about this more than I do!
--
Can't think of a witty .sigline today....
Joe Buck
2005-10-26 00:11:52 UTC
Permalink
Post by Howard Hinnant
I'm not convinced that "extension" is a proper term for this
behavior. It is more like an incompatibility with the rest of the
world's compilers. The reason for change is to conform to a de-facto
standard, and thus ease the migration of future gcc customers to our
compiler.
These hypothetical customers coming from MS, EDG-based, or
// A poorly formatted comment \\
int x = 0;
int y = 1;
...
Howard,

Have you tested the sequence above with various compilers? I just
have. The behavior depends on whether there is whitespace after the
\\ or not. If there is none, then EDG-based compilers will comment
out the declaration of x. If there is whitespace, gcc 3.x comments
it out and the others don't. I personally like the fact that gcc's
behavior does not depend on invisible characters; on the other hand,
there's a good argument that this behavior is ordinarily not desirable
and should be warned about, except to continue strings.

I have difficulty believing that it is desirable for production code
to contain surprises like this.
Howard Hinnant
2005-10-26 00:22:15 UTC
Permalink
Post by Joe Buck
Post by Howard Hinnant
// A poorly formatted comment \\
int x = 0;
int y = 1;
...
Howard,
Have you tested the sequence above with various compilers?
I only know of the results on gcc 4.x, MS, EDG-based, and Freescale
CodeWarrior.
Post by Joe Buck
I just
have. The behavior depends on whether there is whitespace after the
\\ or not. If there is none, then EDG-based compilers will comment
out the declaration of x. If there is whitespace, gcc 3.x comments
it out and the others don't.
Right, that's exactly the issue.
Post by Joe Buck
I personally like the fact that gcc's
behavior does not depend on invisible characters; on the other hand,
there's a good argument that this behavior is ordinarily not desirable
and should be warned about, except to continue strings.
I have difficulty believing that it is desirable for production code
to contain surprises like this.
Personally I'd fire a programmer that depended on the presence of
whitespace after a '\'. It is implementation defined behavior, and
invisible in the source at that (in most editors). I've seen code
broken both by gcc's behavior, and by other compilers' behavior, by
depending on what happens when whitespace is included after a '\'.

And it is not my assertion that gcc's behavior is better or worse
than other compilers. Only that gcc's behavior is unique in the
industry (I actually haven't tried all other modern compilers) and
that uniqueness in this way is not an asset to gcc.

-Howard
Joe Buck
2005-10-26 00:30:32 UTC
Permalink
Post by Howard Hinnant
And it is not my assertion that gcc's behavior is better or worse
than other compilers. Only that gcc's behavior is unique in the
industry (I actually haven't tried all other modern compilers) and
that uniqueness in this way is not an asset to gcc.
gcc is "unique in the industry" in any number of ways, as is every
other compiler -- in that each of them will have some kind of behavior
that is perhaps odd, but might have been accidentally exploited by
a programmer who just whacks away at code and accepts anything that
happens to compile.

I'm still waiting for an explanation as to why this is an important
issue, other than that someone has a customer who says that it is.
Why is it important to the customer? Why wouldn't a one-line sed
script that eliminates the issue altogether suffice?
Steven Bosscher
2005-10-26 06:16:31 UTC
Permalink
Post by Joe Buck
I'm still waiting for an explanation as to why this is an important
issue, other than that someone has a customer who says that it is.
Why is it important to the customer? Why wouldn't a one-line sed
script that eliminates the issue altogether suffice?
 
Anything that makes porting from one platform to another harder is a problem.
If some app doesn't compile, this customer (usually an ISV ;-) punts and
says "it doesn't work" and that is the end of it for them. Telling them
that GCC is unique and that they should use a sed script shouldn't be
necessary.

I don't understand why _anyone_ sane of mind would support the idea that
it is OK that GCC does B if everyone else in the industry does A. Maybe
B isn't such a good idea, then? Even if it is the technically superior
thing to do, it may not be the right thing to do. People actually use
this compiler, and if GCC does things differently from most other
compilers, it makes GCC a headache instead of a tool if those folks want
to switch compilers.

Gr.
Steven
Joe Buck
2005-10-26 16:28:23 UTC
Permalink
Post by Steven Bosscher
Post by Joe Buck
I'm still waiting for an explanation as to why this is an important
issue, other than that someone has a customer who says that it is.
Why is it important to the customer? Why wouldn't a one-line sed
script that eliminates the issue altogether suffice?
 
Anything that makes porting from one platform to another harder is a problem.
If some app doesn't compile, this customer (usually an ISV ;-) punts and
says "it doesn't work" and that is the end of it for them. Telling them
that GCC is unique and that they should use a sed script shouldn't be
necessary.
I have limited sympathy, because I've spent so much time getting past the
oddities of Sun's compiler, and HP's compiler, and Microsoft's compiler,
and everyone else's. Every single one of those compilers has some
oddities, bugs, and nonconformance to the standard, and in most of those
the compiler is provably wrong. But in many cases, the vendors will not
fix the bugs (or argued for years against doing so) because of backward
compatibility issues. And that is for code with specified behavior.

This is a case of unspecified behavior.
Post by Steven Bosscher
I don't understand why _anyone_ sane of mind would support the idea that
it is OK that GCC does B if everyone else in the industry does A.
OK, I'll be polite and assume that you don't understand me, rather
than that you think I am insane. :-)

That's what we have standards for: so that compilers work the same way
for standard-conformant code.

But in this case, we are talking about the behavior when the compiler is
given code with *unspecified* behavior. I believe that people should not
be writing such code. Also, I believe that people writing code that
depends on behavior A are laying time bombs for their employers, setting
traps that will blow up at some later date. Specifically, they are
creating code that will malfunction as soon as someone strips the
trailing whitespace from it.

I would prefer to detect and eliminate such code, rather than to enable
it, even if (and *especially if*) all compilers in the world implemented
behavior A and it were required by the standard (I would want a warning
option to object loudly to any code whose behavior would change if
trailing whitespace were eliminated).
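
For illustration, a minimal sketch of the kind of check such a warning would
need. The function name is hypothetical and this is not how any compiler
implements it. It reports every line whose meaning could change if trailing
whitespace were stripped, that is, a backslash followed only by blanks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the 1-based numbers of lines that end in a backslash followed by
// one or more spaces or tabs (a CR before the newline is tolerated).
std::vector<std::size_t> backslash_whitespace_lines(const std::string& source) {
    std::vector<std::size_t> hits;
    std::size_t line_no = 1;
    std::size_t pos = 0;
    while (pos < source.size()) {
        const std::size_t nl = source.find('\n', pos);
        std::size_t last = (nl == std::string::npos) ? source.size() : nl;
        if (last > pos && source[last - 1] == '\r') --last;  // CRLF files
        std::size_t text_end = last;
        while (text_end > pos &&
               (source[text_end - 1] == ' ' || source[text_end - 1] == '\t'))
            --text_end;
        const bool had_blanks = text_end < last;
        if (had_blanks && text_end > pos && source[text_end - 1] == '\\')
            hits.push_back(line_no);
        pos = (nl == std::string::npos) ? source.size() : nl + 1;
        ++line_no;
    }
    return hits;
}

// Usage: line 1 has a blank after the backslash, line 3 does not.
void check_backslash_whitespace_lines() {
    const std::string src = "// art \\ \nint x = 1;\n// ok \\\nint y;\n";
    const std::vector<std::size_t> hits = backslash_whitespace_lines(src);
    assert(hits.size() == 1 && hits[0] == 1);
}
```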

This will be my last message on this topic.
Steven Bosscher
2005-10-26 16:38:00 UTC
Permalink
Post by Joe Buck
That's what we have standards for: so that compilers work the same way
for standard-conformant code.
And we have de facto standards that you just want to ignore.

Gr.
Steven
Robert Dewar
2005-10-26 16:58:52 UTC
Permalink
Post by Steven Bosscher
Post by Joe Buck
That's what we have standards for: so that compilers work the same way
for standard-conformant code.
And we have de facto standards that you just want to ignore.
No, conflicting "de facto" behaviors (certainly not standards), that
cannot all be resolved. In this case, we have to worry about past
gcc behavior and behavior of foreign compilers. The former is far
more important. The burden of introducing gratuitous incompatibilities
with existing code is very high. It is met if the standard insists
on a change, or if everyone agrees that a change is important enough
to tolerate the incompatibility. It is clear that in this case neither
case holds. So given the argument for change has failed to create a
consensus, it fails and should be ignored.
Mike Stump
2005-10-26 17:04:12 UTC
Permalink
Post by Robert Dewar
No, conflicting "de facto" behaviors (certainly not standards), that
cannot all be resolved. In this case, we have to worry about past
gcc behavior and behavior of foreign compilers.
Yes. I've asked, how many lines exist that rely upon this, the
answer was zero. We can have someone that has ready access to
sourceforge or the google cache to count there (Hi Matt), to improve
the answer, but my guess is that it would remain fairly low.
Andrew Pinski
2005-10-26 17:16:46 UTC
Permalink
Post by Mike Stump
Post by Robert Dewar
No, conflicting "de facto" behaviors (certainly not standards), that
cannot all be resolved. In this case, we have to worry about past
gcc behavior and behavior of foreign compilers.
Yes. I've asked, how many lines exist that rely upon this, the
answer was zero. We can have someone that has ready access to
sourceforge or the google cache to count there (Hi Matt), to improve
the answer, but my guess is that it would remain fairly low.
How many lines depend on this the other way?

The answer: none.


-- Pinski
Robert Dewar
2005-10-26 17:19:21 UTC
Permalink
Yes. I've asked, how many lines exist that rely upon this, the answer
was zero. We can have someone that has ready access to sourceforge or
the google cache to count there (Hi Matt), to improve the answer, but
my guess is that it would remain fairly low.
of course that is only a part of the total source base, but I agree
it would be indicative. Of course you are probably also showing that
this is an unimportant issue not worth making a change for.
Steven Bosscher
2005-10-26 17:24:17 UTC
Permalink
Post by Robert Dewar
Post by Steven Bosscher
Post by Joe Buck
That's what we have standards for: so that compilers work the same way
for standard-conformant code.
And we have de facto standards that you just want to ignore.
No, conflicting "de facto" behaviors (certainly not standards), that
cannot all be resolved. In this case, we have to worry about past
gcc behavior and behavior of foreign compilers. The former is far
more important.
The behavior changed from GCC 2.95 to GCC 3, so we already broke
compatibility with past GCC releases.
And most "outsider" people are only now beginning to port things
from 2.95 to something newer...

Gr.
Steven
Mike Stump
2005-10-26 16:45:24 UTC
Permalink
Post by Joe Buck
This is a case of unspecified behavior.
?
Post by Joe Buck
That's what we have standards for: so that compilers work the same way
for standard-conformant code.
But in this case, we are talking about the behavior when the
compiler is given code with *unspecified* behavior. I believe that
people should not be writing such code.
? No. Please read the C++ standard again:

1.4.13 unspecified behavior [defns.unspecified]
behavior, for a well-formed program construct and correct data, that
depends on the implementation. The implementation is not required to
document which behavior occurs. [Note: usually, the range of possible
behaviors is delineated by the Standard. ]


1 Physical source file characters are mapped, in an
implementation-defined manner


Now, before you claim that people should not write code that relies
upon implementation defined behavior, realize that the sentence above
makes all possible programs reliant upon implementation defined
behavior.
Stan Shebs
2005-10-26 19:25:21 UTC
Permalink
Post by Joe Buck
Post by Howard Hinnant
And it is not my assertion that gcc's behavior is better or worse
than other compilers. Only that gcc's behavior is unique in the
industry (I actually haven't tried all other modern compilers) and
that uniqueness in this way is not an asset to gcc.
gcc is "unique in the industry" in any number of ways, as is every
other compiler -- in that each of them will have some kind of behavior
that is perhaps odd, but might have been accidentally exploited by
a programmer who just whacks away at code and accepts anything that
happens to compile.
I'm still waiting for an explanation as to why this is an important
issue, other than that someone has a customer who says that it is.
Why is it important to the customer?
With all that's going on in politics right now :-) , we Appleites
are being somewhat careful to not give away too much detail
about the users in question. I think it's safe to say that they
are large important code bases, and that this is a historic
opportunity for GCC to displace proprietary compilers out of
some longtime strongholds. Thus we want to ease the transition,
not put up additional obstacles, especially over borderline-
pedantic issues (one-line sed script is easy, revving multiple
multi-million-line source bases, not so much).

Again, I think this could be easily addressed in Apple's GCC only,
but that will mean that the software in question will compile on Macs,
but not on GNU/Linux. Of course, having apps on OS X that can't
be ported to Linux is not necessarily a bad thing from Apple's
point of view... :-)

Stan
Robert Dewar
2005-10-26 22:26:52 UTC
Permalink
Post by Stan Shebs
Again, I think this could be easily addressed in Apple's GCC only,
but that will mean that the software in question will compile on Macs,
but not on GNU/Linux. Of course, having apps on OS X that can't
be ported to Linux is not necessarily a bad thing from Apple's
point of view... :-)
Stan
Don't you think it is reasonable to fix horrible coding
errors like this? Otherwise you are just asking for maintenance
problems. In the short term, kludging may make sense;
in the long term it sounds like a bad idea to keep such
non-portable code around.

Mike Stump
2005-10-26 01:45:55 UTC
Permalink
I personally like the fact that gcc's behavior does not depend on
invisible characters
All other things being equal, this is a nice design goal. I like it
too. Should we break people's otherwise portable code to have an
implementation defined behavior that no one else has?
Andrew Pinski
2005-10-26 01:50:22 UTC
Permalink
Post by Mike Stump
I personally like the fact that gcc's behavior does not depend on
invisible characters
All other things being equal, this is a nice design goal. I like it
too. Should we break people's otherwise portable code to have an
implementation defined behavior that no one else has?
but it is not portable code. That is my point. Instead of all
this discussion, I went and found all the times this behavior
was mentioned and found it was mentioned each year since it
was added. It was only mentioned twice last year.

Here are the links to the previous discussion for other people's
benefit:
http://gcc.gnu.org/ml/gcc/2003-11/msg00105.html
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01685.html
http://gcc.gnu.org/ml/gcc-bugs/2000-10/msg00117.html
http://gcc.gnu.org/ml/gcc/2000-05/msg01032.html
http://gcc.gnu.org/ml/gcc/2001-03/msg00130.html
http://gcc.gnu.org/ml/gcc/2001-10/msg00012.html
http://gcc.gnu.org/ml/gcc/2002-02/msg01181.html
http://gcc.gnu.org/ml/gcc-patches/2001-04/msg00543.html
http://gcc.gnu.org/ml/gcc-patches/2000-08/msg01118.html
http://gcc.gnu.org/ml/gcc/2002-11/msg00267.html
http://gcc.gnu.org/ml/gcc-patches/2001-04/msg00603.html

-- Pinski
Mike Stump
2005-10-26 02:48:31 UTC
Permalink
Post by Andrew Pinski
but it is not portable code. That is my point.
I'm sorry, what word/phrase do you mean for code that compiles and
runs on a plethora of actual C++ implementations? Pretend I used
that word/phrase instead.
Joe Buck
2005-10-26 04:25:03 UTC
Permalink
Post by Mike Stump
I personally like the fact that gcc's behavior does not depend on
invisible characters
All other things being equal, this is a nice design goal. I like it
too. Should we break people's otherwise portable code to have an
implementation defined behavior that no one else has?
Code that depends on invisible whitespace to function correctly is
already broken. At some point, someone will do the equivalent of
delete-trailing-whitespace and break it.

And the code is easily cleaned up.
Mike Stump
2005-10-26 01:33:16 UTC
Permalink
I don't think we should change it at all since it is one more thing
to break old gcc code like stuff in Linux kernel.
To get concrete, how many times does \ SP SP * NL occur in old/
current linux kernels?
David Daney
2005-10-26 01:46:44 UTC
Permalink
I don't think we should change it at all since it is one more thing
to break old gcc code like stuff in Linux kernel.
To get concrete, how many times does \ SP SP * NL occur in old/ current
linux kernels?
$ egrep -r '\\ +$' *

On my 2.6.14-rc2 tree this reports no hits in any source files.
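
A minimal sketch of the same search as a script, assuming that tabs
should count as well as spaces and that CRLF files should be handled
(the egrep pattern above matches spaces only). The function name is
hypothetical and only for illustration.

```python
import re

# Backslash followed by one or more spaces or tabs at the end of a line.
# Accepting tabs is an assumption; the egrep pattern above checks spaces only.
TRAILING_WS_AFTER_BACKSLASH = re.compile(r"\\[ \t]+$")


def find_backslash_space_newline(text):
    """Return the 1-based numbers of lines that end in a backslash
    followed by trailing spaces or tabs."""
    hits = []
    # Split on "\n" only, so form feeds inside a file do not shift line numbers.
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate CRLF line endings
        if TRAILING_WS_AFTER_BACKSLASH.search(line):
            hits.append(number)
    return hits


if __name__ == "__main__":
    sample = "// comment \\ \nint x;\n#define F(a) \\\n  (a)\nint y; \\\t\n"
    # Line 3 is an ordinary continuation (backslash directly before the
    # newline), so only lines 1 and 5 are reported.
    print(find_backslash_space_newline(sample))  # prints [1, 5]
```

Reporting line numbers rather than just matching lines makes it easier
to review each hit by hand, since the right fix depends on whether the
author meant the line to continue.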

David Daney
Andrew Pinski
2005-10-26 01:46:41 UTC
Permalink
Post by Mike Stump
I don't think we should change it at all since it is one more thing
to break old gcc code like stuff in Linux kernel.
To get concrete, how many times does \ SP SP * NL occur in old/
current linux kernels?
I was just showing where this could show up.

In fact currently there is a testcase in GCC which tests this.

Hint, hint it was not an accident that this was done.

-- Pinski
Mike Stump
2005-10-26 02:25:41 UTC
Permalink
Post by Andrew Pinski
Hint, hint it was not an accident that this was done.
I am not unaware of the history. What we are addressing is whether this
was a mistake.
Joe Buck
2005-10-25 22:50:53 UTC
Permalink
Post by Eric Christopher
This is, as Mr. Buck has noted, a regression from 2.95.
No, it is not (and I did not say that); it is a behavior change, which I
vaguely recall was requested.

Still, many of us have to get code past multiple compilers, so the
warning would be useful.
Eric Christopher
2005-10-25 22:34:16 UTC
Permalink
Oh, one more thing. This seems like the normal problem of not reading
the docs if something does not work the way you want it to work.
So?
The only thing we can do is point out that it is documented behavior
and then move on to the next issue. Also why are we discussing this
when there are more important bugs to fix currently, as this behavior
has been documented for a long time, at least 4 years.
Your important and my important are two different things.

-eric
Andrew Pinski
2005-10-25 22:44:38 UTC
Permalink
Post by Eric Christopher
Oh, one more thing. This seems like the normal problem of not reading
the docs if something does not work the way you want it to work.
So?
The only thing we can do is point out that it is documented behavior
and then move on to the next issue. Also why are we discussing this
when there are more important bugs to fix currently, as this behavior
has been documented for a long time, at least 4 years.
Your important and my important are two different things.
But this is the FSF GCC mailing list so the important here should
be regressions. Hint hint. If people don't want FSF releases
any more say so please, otherwise we get into these fights about
what is really important.

This is not a regression really.


-- Pinski
Eric Christopher
2005-10-25 22:45:36 UTC
Permalink
Post by Andrew Pinski
This is not a regression really.
It is a regression against 2.95.

-eric
Joe Buck
2005-10-25 22:52:18 UTC
Permalink
Post by Eric Christopher
Post by Andrew Pinski
This is not a regression really.
It is a regression against 2.95.
As I said, no it is not. A behavior change is only a regression when
the first behavior is correct and the second is not.

In this case, there is no defined behavior.
Eric Christopher
2005-10-25 22:52:39 UTC
Permalink
Post by Joe Buck
As I said, no it is not. A behavior change is only a regression when
the first behavior is correct and the second is not.
Fair enough. :)

-eric
Stan Shebs
2005-10-25 23:05:55 UTC
Permalink
Post by Andrew Pinski
Post by Eric Christopher
Oh, one more thing. This seems like the normal problem of not reading
the docs if something does not work the way you want it to work.
So?
The only thing we can do is point out that it is documented behavior
and then move on to the next issue. Also why are we discussing this
when there are more important bugs to fix currently, as this behavior
has been documented for a long time, at least 4 years.
Your important and my important are two different things.
But this is the FSF GCC mailing list so the important here should
be regressions. Hint hint. If people don't want FSF releases
any more say so please, otherwise we get into these fights about
what is really important.
I think you've managed to get everything backwards. We have potential
customers (dunno if I'm allowed to name them, so I won't) who can't
use GCC because of its current behavior. We can fix Apple GCC in a
minute to make them happy, but of course then we'll have to tell them
"don't use FSF GCC, you'll lose". So we're offering to make FSF GCC
work for these users also, and asking for input on the idea. As
always, it's the community's choice as to whether this is a desirable
feature for FSF GCC, and that's part of the discussion, but at least
don't p*ss on us for making the offer in the first place!

Stan
Joe Buck
2005-10-25 23:25:46 UTC
Permalink
Post by Stan Shebs
I think you've managed to get everything backwards. We have potential
customers (dunno if I'm allowed to name them, so I won't) who can't
use GCC because of its current behavior.
I had thought that a number of users had requested the current
behavior back in the egcs days, though I can't track down the brain cell I
stored that info in; also, I'm having a hard time picturing source code that

a) exhibits different behavior because of this bug, and
b) is maintainable (remember, the behavior depends on the presence
of characters that are completely invisible to many tools).

Perhaps you could explain (without violating the customer's
confidentiality) how it is that this is an important problem?
Post by Stan Shebs
minute to make them happy, but of course then we'll have to tell them
"don't use FSF GCC, you'll lose". So we're offering to make FSF GCC
work for these users also, and asking for input on the idea. As
always, it's the community's choice as to whether this is a desirable
feature for FSF GCC, and that's part of the discussion, but at least
don't p*ss on us for making the offer in the first place!
I think that there was only pissing because this feature that many
considered useful was labeled a regression and a bug.
Ian Lance Taylor
2005-10-25 23:48:13 UTC
Permalink
Post by Joe Buck
I had thought that a number of users had requested the current
behavior back in the egcs days, though I can't track down the brain cell I
stored that info in;
For the record, the current behaviour was implemented here:
http://gcc.gnu.org/ml/gcc-patches/2000-09/msg00430.html

Ian
Mike Stump
2005-10-26 01:37:18 UTC
Permalink
Post by Joe Buck
I'm having a hard time picturing source code that
a) exhibits different behavior because of this bug, and
b) is maintainable (remember, the behavior depends on the presence
of characters that are completely invisible to many tools).
Perhaps you could explain (without violating the customer's
confidentiality) how it is that this is an important problem?
Existing ASCII line art in documentation embedded in the source code
as C++ style comments.
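
To make the line-art case concrete, here is a rough model of the two
interpretations, not gcc's actual implementation. It assumes only
spaces and tabs count as trailing whitespace, handles only // comments
(string literals and /* */ comments are ignored), and all function
names are hypothetical.

```python
import re


def splice_lines(source, strip_trailing_whitespace):
    """Join continued lines in a source string.

    strip_trailing_whitespace=True models the gcc behaviour discussed in
    this thread: backslash, optional blanks, newline is a continuation.
    False models compilers that splice only when the backslash is
    immediately followed by the newline.
    """
    if strip_trailing_whitespace:
        return re.sub(r"\\[ \t]*\n", "", source)
    return source.replace("\\\n", "")


def surviving_code(source, strip_trailing_whitespace):
    """Return the non-empty code lines left after splicing and after
    dropping // comments (a deliberately crude comment stripper)."""
    spliced = splice_lines(source, strip_trailing_whitespace)
    lines = []
    for line in spliced.split("\n"):
        code = line.split("//", 1)[0].strip()
        if code:
            lines.append(code)
    return lines


if __name__ == "__main__":
    # The last row of some line art ends in a backslash plus one space.
    art = "// tree: /\\ \nint x;\nint y;\n"
    print(surviving_code(art, True))   # prints ['int y;']
    print(surviving_code(art, False))  # prints ['int x;', 'int y;']
```

Under the stripping model the declaration of x is pulled into the
comment and disappears; under the other model it survives. Rows of art
followed by another comment line are harmless either way, so only the
last row before real code matters.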
Andrew Pinski
2005-10-26 01:43:46 UTC
Permalink
Post by Mike Stump
Post by Joe Buck
I'm having a hard time picturing source code that
a) exhibits different behavior because of this bug, and
b) is maintainable (remember, the behavior depends on the presence
of characters that are completely invisible to many tools).
Perhaps you could explain (without violating the customer's
confidentiality) how it is that this is an important problem?
Existing ASCII line art in documentation embedded in the source code
as C++ style comments.
In fact the removal of the warning for comment cases was that exact case
so ...

http://gcc.gnu.org/ml/gcc-patches/2001-04/msg00603.html


http://gcc.gnu.org/ml/gcc/2002-11/msg00267.html

-- Pinski
Mike Stump
2005-10-26 02:17:28 UTC
Permalink
Post by Andrew Pinski
In fact the removal of the warning for comment cases was that exact case
so ...
http://gcc.gnu.org/ml/gcc-patches/2001-04/msg00603.html
Curious, the backslash2.c testcase is now:

/* Test warnings for backslash-space-newline.
Source: Neil Booth. 6 Dec 2000. */

foo \
bar
/* { dg-warning "separated by space" "" { target *-*-* } 8 } */

/* foo \
bar */
/* { dg-bogus "separated by space" "" { target *-*-* } 12 } */

;-)

These are the cases that are uninteresting to us.