Re: Tilde
Adam M. Costello <idn.amc+0 <at> nicemice.net.RemoveThisWord>
2004-08-10 07:09:43 GMT
tedd <tedd <at> sperling.com> wrote:
> in IDNA the Tilde (code point 007E) is prohibited, but the Tilde
> Operator (code point 223C) is not.
IDNA inherits the prohibition of U+007E from RFC-1123 (STD-3), which by
reference to RFC-952 defined host names as ASCII strings containing only
A-Z, a-z, 0-9, hyphen-minus, and dot. Therefore some ASCII characters
were explicitly allowed, all other ASCII characters were explicitly
forbidden, and non-ASCII characters were not even in the realm of
possibility.
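As a rough illustration, the RFC 952 / RFC 1123 host-name rule described above can be sketched as a character-set check. This is a simplified sketch, not a complete validator: the helper name `is_ldh_hostname` is hypothetical, and the RFCs' rules on label length and hyphen placement are deliberately left out. Only the allowed-character set is checked.

```python
import string

# Characters allowed in host names under the letter-digit-hyphen (LDH) rule,
# plus the dot that separates labels. Everything else, including '~', is out.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-.")

def is_ldh_hostname(name: str) -> bool:
    """Return True if name is non-empty and uses only LDH characters and dots.

    Hypothetical helper: checks the character set only, not label length
    or where hyphens may appear.
    """
    return len(name) > 0 and all(ch in LDH_CHARS for ch in name)

for candidate in ["example.com", "ex~ample.com", "b\u00fccher.de"]:
    # Non-ASCII input can never pass, since LDH_CHARS is pure ASCII.
    print(candidate, is_ldh_hostname(candidate))
```

Both the tilde and the non-ASCII letter fail. In the pre-IDNA world they were simply outside the definition.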
In order to extend the notion of host name to non-ASCII strings, we
needed to keep the existing prohibitions on ASCII characters in host
names (otherwise it wouldn't be a proper extension), but the rules
for non-ASCII characters were up to the working group to define. The
consensus was to allow all non-ASCII Unicode graphic characters (perhaps
because the group could never have reached agreement on any particular
non-empty set of prohibited graphic characters).
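The resulting policy (old rules for ASCII, "any graphic character" for non-ASCII) can be modeled roughly as follows. This is only a sketch under a loud assumption: "graphic" is approximated with Unicode general categories (anything other than C* or Z*). Real IDNA instead applies Nameprep's mapping and prohibition tables. The helper name `allowed_in_extended_hostname` is hypothetical.

```python
import string
import unicodedata

# The pre-existing ASCII rule: letters, digits, hyphen-minus, and dot.
LDH_CHARS = frozenset(string.ascii_letters + string.digits + "-.")

def allowed_in_extended_hostname(ch: str) -> bool:
    """Hypothetical model of the extension policy for a single character.

    ASCII keeps the LDH-plus-dot restriction. Non-ASCII is accepted if it
    looks graphic, approximated here as any general category outside
    C* (control, format, surrogate, private use, unassigned) and Z* (separators).
    """
    if ord(ch) < 0x80:
        return ch in LDH_CHARS
    return unicodedata.category(ch)[0] not in ("C", "Z")

# 'a', '~', TILDE OPERATOR, u-umlaut, ZERO WIDTH SPACE
for ch in ["a", "~", "\u223c", "\u00fc", "\u200b"]:
    print(f"U+{ord(ch):04X}", allowed_in_extended_hostname(ch))
```

Under this model U+007E stays forbidden because it is ASCII, while U+223C (a math symbol, category Sm) passes as a graphic non-ASCII character. That is the asymmetry the original question noticed.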
> Considering that keyboard space is at a premium, why isn't code point
> 007E mapped to 223C in PUNYCODE?
Punycode accepts and supports all Unicode characters, including
non-graphic characters and every ASCII character (U+007E among them).
It does no mapping; all mapping and prohibition are done at higher layers.
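This can be checked with Python's built-in `punycode` codec (RFC 3492). Punycode copies tilde through untouched and never turns it into U+223C. The check also calls `encodings.idna.nameprep`, which is an undocumented module-level helper in CPython's IDNA codec rather than a public API, so treat that part as an assumption that may not hold in every Python version. It shows that the Nameprep layer leaves tilde alone too. The prohibition of U+007E comes from the STD-3 ASCII rules applied on top of that.

```python
from encodings.idna import nameprep  # CPython internal helper for RFC 3491

# Punycode copies basic (ASCII) code points straight into the output and
# appends '-' as the delimiter. Nothing is mapped, so '~' survives as-is.
tilde = "~".encode("punycode")             # b'~-'
tilde_operator = "\u223c".encode("punycode")
print(tilde, tilde_operator)

# Both characters round-trip unchanged, and they encode differently.
assert tilde.decode("punycode") == "~"
assert tilde_operator.decode("punycode") == "\u223c"
assert tilde != tilde_operator

# One layer up, Nameprep's mapping step also leaves both characters alone.
print(nameprep("~") == "~", nameprep("\u223c") == "\u223c")
```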
I suppose you could instead ask why tilde isn't mapped to tilde
operator in Nameprep. The mapping step in Nameprep was designed to
avoid alternate representations of the same characters, and to erase