Markus Scherer | 23 Apr 19:48

Comments on Unicode Format for Network Interchange

Dear Mr. Klensin and Mr. Padlipsky et al.,

I have reviewed and discussed your draft-klensin-net-utf8-03 with some
colleagues. We welcome the standardization on UTF-8 as the default
internet charset.

We would like to make the following suggestions
(each starting with *** and ending with *** *** among quotes from the
internet-draft):

[...]

2.  Net-Unicode

2.1.  Definition

   The Network Unicode (Net-Unicode) format is defined as follows:

   1.  Characters MUST be coded in UTF-8 as defined in [RFC3629].

   2.  Line-endings MUST be indicated by the sequence Carriage-Return
       (U+000D) followed by Line-Feed (U+000A).

*** Suggested change:
   2.  Line-endings MUST be indicated by the sequence Carriage-Return
       (U+000D) followed by Line-Feed (U+000A), or by a single
       Carriage-Return (U+000D), or by a single Line-Feed (U+000A).

Justification: We believe that single CR and LF are common because of
implementation practice on a variety of platforms, and that it is both
unrealistic and unnecessary to try to legislate them away.

[...]

Frank Ellermann | 25 Apr 01:19

Re: Comments on Unicode Format for Network Interchange

Markus Scherer wrote:

> *** Suggested change:
>    2.  Line-endings MUST be indicated by the sequence Carriage-Return
>        (U+000D) followed by Line-Feed (U+000A), or by a single
>        Carriage-Return (U+000D), or by a single Line-Feed (U+000A).

-1F

> Justification: We believe that single CR and LF are common because of
> implementation practice on a variety of platforms, and that it is both
> unrealistic and unnecessary to try to legislate them away.

No, it causes havoc.

> Applications already commonly handle all of CR, LF and CR+LF, and some
> support even more characters according to the Unicode Newline
> Guidelines.

The draft isn't about arbitrary text or XML (where you'd also need NEL);
it's about telnet.  It tries to extend ALPHA and DIGIT as used in some
syntax constructs for text in Internet protocols; it doesn't try to
introduce a new concept of "line" in these protocols.
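
To make the two positions concrete, here is a minimal Python sketch. It is
an illustration only and comes from neither the draft nor the messages
above; the names is_strict_crlf and normalize_to_crlf are hypothetical.
The first function checks the draft's rule (CR+LF is the only line ending),
the second shows the lenient handling the suggestion describes (accept CR,
LF, or CR+LF on input and emit CR+LF). NEL and the other characters from
the Unicode Newline Guidelines are deliberately left out.

```python
import re

# CR+LF is listed first so that it is matched as one unit,
# before a lone CR or a lone LF is considered.
_ANY_EOL = re.compile(r"\r\n|\r|\n")


def is_strict_crlf(text: str) -> bool:
    """Return True if CR and LF occur only as the pair CR+LF."""
    remainder = text.replace("\r\n", "")
    return "\r" not in remainder and "\n" not in remainder


def normalize_to_crlf(text: str) -> str:
    """Accept CR, LF, or CR+LF as a line ending and emit CR+LF for each."""
    return _ANY_EOL.sub("\r\n", text)


if __name__ == "__main__":
    mixed = "one\rtwo\nthree\r\nfour"
    print(is_strict_crlf(mixed))                      # lone CR and LF present
    print(is_strict_crlf(normalize_to_crlf(mixed)))   # only CR+LF remains
```

A sender that normalizes before transmission satisfies the strict rule on
the wire, whatever its local convention is.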

> *** Suggested change:
>    4. The UTF-8 signature byte sequence (EF BB BF, UTF-8 encoding of
>       U+FEFF, sometimes called Byte Order Mark ("BOM")), when it
>       appears at the beginning of the text, SHOULD be deleted by the
>       recipient.
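
For illustration, the recipient-side behaviour in the suggested change above
could look like the following Python sketch. The function name
strip_utf8_signature is hypothetical; codecs.BOM_UTF8 is the standard
library constant for the byte sequence EF BB BF. Only a signature at the
very beginning of the data is removed; the same bytes elsewhere in the text
are left alone.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Drop a leading UTF-8 signature (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


if __name__ == "__main__":
    received = b"\xef\xbb\xbfhello"
    print(strip_utf8_signature(received))
```

When the bytes are going to be decoded anyway, Python's "utf-8-sig" codec
has the same effect: it decodes UTF-8 and skips a single leading signature.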

[...]