David Edelsohn <***@watson.ibm.com> writes:
>>>>>> Zack Weinberg writes:
>>> I hope we won't see so much rigid position in the future regarding
>>> target deprecation. Why is it so urgent to deprecate that target
> Zack> HOST_EBCDIC presents a severe hindrance to implementing multi-charset
> Zack> support in a C99 compliant manner.
> [This motivation should have been disclosed at the beginning
> of the discussion.]
It was. If you reread my original message you will see a long
paragraph about HOST_EBCDIC.
> Is it possible to implement the multi-charset support and say
> that HOST_EBCDIC will not be C99 compliant?
Maybe, at some cost in maintainability, though not as great a cost as
fully supporting C99 under HOST_EBCDIC would impose.
Let me go over the problem in some detail. C99 §6.4.3 defines
universal character names, \uXXXX or \UXXXXXXXX (where each X is a
hexadecimal digit), which may appear in identifiers, character
constants, and string literals, "to designate characters that are not
in the basic character set." The constraints and semantics of these
are defined in terms of ISO/IEC 10646 (Unicode), which is a superset of
ASCII. An implementation is allowed to accept "multibyte characters
that are not part of the basic source character set" in the same
contexts where UCNs are acceptable; if so, there shall be a (possibly
many-to-one) function mapping such characters to UCNs and therefore to
ISO10646 code points.
The most straightforward way to implement this functionality is to do
all internal processing of characters in an encoding of ISO10646.
Whatever the input character set actually is, it is transformed by
iconv() or similar utility into such an encoding (probably UTF8) in
translation phase 1. In phase 3, UCNs are replaced with the code points
they designate. Finally, in phase 5, iconv() is again employed
to convert to the user- or locale-selected execution character set,
which is not necessarily Unicode.
Now, cpplib, c-lex.c, and c-parse.in all make extensive use of
character constants to designate members of the basic source character
set. The actual values of these constants are defined by the
execution character set of the host compiler. If that set is ASCII or
any superset thereof, there is no problem -- ASCII corresponds exactly
to the portion of ISO10646 containing the basic source character set.
But if the execution character set of the host compiler is EBCDIC,
the character constants will have inappropriate values for working
with text encoded in UTF8.
There are four possible ways to get around this problem:
- Replace all these character constants with integer or enumeration
constants. I consider this infeasible; it would make these parts of
the compiler much harder to maintain.
- Under HOST_EBCDIC, use an internal encoding where members of the
basic source character set have their EBCDIC values. UTF-EBCDIC is
such an encoding, but it is not supported by GNU iconv; it would
have to be implemented, and appropriate #ifdeffage added to cpplib,
with a consequent (moderate) maintenance burden.
- Under HOST_EBCDIC disallow UCNs and use of extended character sets,
claiming only C90 compliance. This is doable at roughly the same
maintenance burden as option two: a certain amount of #ifdeffage and
rarely-tested code paths in cpplib.
- Abandon HOST_EBCDIC: require that GCC be built in an environment
where the execution character set of the host compiler is ASCII or a
superset thereof. This does /not/ entail that GCC would be unable
to accept input encoded in EBCDIC or use EBCDIC for its own
execution character set; in fact both are quite easy to accomplish,
by the aforementioned transformations in phases 1 and 5.
Obviously I am in favor of option four.
It turns out that only a small part of the i370 back end is dependent
on HOST_EBCDIC. I would be willing to compromise at the removal of
all support for this mode, leaving the rest of the back end intact.
However, I think that the demonstrated lack of interest in maintaining
the port in the FSF repository is still a strong argument for
obsoleting this entire back end.