tedd | 9 Aug 16:14

Tilde

To whomever:

A simple question:

For reasons beyond me, in IDNA the Tilde (code point 007E) is 
prohibited, but the Tilde Operator (code point 223C) is not.

Considering that keyboard space is at a premium, why isn't code point 
007E mapped to 223C in PUNYCODE?

Many thanks in advance for any enlightenment.

tedd
--

-- 
--------------------------------------------------------------------------------
http://sperling.com/

Paul Hoffman / IMC | 9 Aug 17:01

Re: Tilde

At 10:14 AM -0400 8/9/04, tedd wrote:
>For reasons beyond me, in IDNA the Tilde (code point 007E) is 
>prohibited, but the Tilde Operator (code point 223C) is not.

All ASCII characters that STD3 did not allow remain disallowed, 
because many of them were being used as special characters in other 
protocol elements such as URIs.

--Paul Hoffman, Director
--Internet Mail Consortium

tedd | 9 Aug 22:29

Re: Tilde

>At 10:14 AM -0400 8/9/04, tedd wrote:
>>For reasons beyond me, in IDNA the Tilde (code point 007E) is 
>>prohibited, but the Tilde Operator (code point 223C) is not.
>
>All ASCII characters that STD3 did not allow remain disallowed, 
>because many of them were being used as special characters in other 
>protocol elements such as URIs.
>
>--Paul Hoffman, Director
>--Internet Mail Consortium

Paul:

Not that it makes any functional difference, but is the Tilde (code 
point 007E) actually used in other protocol elements, or is it just a 
member of a range (i.e., all ASCII) that is reserved for possible 
use? Do you know?

Thanks for your comment and reply.

tedd
--

-- 
--------------------------------------------------------------------------------
http://sperling.com/

Adam M. Costello | 10 Aug 09:09

Re: Tilde

tedd <tedd <at> sperling.com> wrote:

> in IDNA the Tilde (code point 007E) is prohibited, but the Tilde
> Operator (code point 223C) is not.

IDNA inherits the prohibition of U+007E from RFC-1123 (STD-3), which by
reference to RFC-952 defined host names as ASCII strings containing only
A-Z, a-z, 0-9, hyphen-minus, and dot.  Therefore some ASCII characters
were explicitly allowed, all other ASCII characters were explicitly
forbidden, and non-ASCII characters were not even in the realm of
possibility.

In order to extend the notion of host name to non-ASCII strings, we
needed to keep the existing prohibitions on ASCII characters in host
names (otherwise it wouldn't be a proper extension), but the rules
for non-ASCII characters were up to the working group to define.  The
consensus was to allow all non-ASCII Unicode graphic characters (perhaps
because the group could never have reached agreement on any particular
non-empty set of prohibited graphic characters).

> Considering that keyboard space is at a premium, why isn't code point
> 007E mapped to 223C in PUNYCODE?

Punycode accepts and supports all Unicode characters, including
non-graphic characters and all ASCII characters, including U+007E.  It
does no mapping.  All mapping and prohibition are done at higher layers.
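
As a rough illustration of that point, here is a minimal sketch using
the "punycode" codec from Python's standard library (Python is my choice
of language here, and the helper name roundtrip is made up for this
example).  It assumes the codec follows RFC 3492 without any Nameprep
step, which is how the codec is documented.

```python
def roundtrip(label):
    """Encode a label with bare Punycode, then decode it again."""
    encoded = label.encode("punycode")   # no mapping, no prohibition
    decoded = encoded.decode("punycode")
    return encoded, decoded

# ASCII tilde (U+007E) and TILDE OPERATOR (U+223C) are both accepted,
# and each decodes back to exactly the character that went in.
for label in ("foo~", "foo\u223c"):
    encoded, decoded = roundtrip(label)
    print(ascii(label), "->", encoded, "->", ascii(decoded))
```

Neither character is rejected or turned into the other at this layer.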

I suppose you could instead ask why tilde isn't mapped to tilde
operator in Nameprep.  The mapping step in Nameprep was designed to
avoid alternate representations of the same characters, and to erase
case distinctions, not to save typing.

tedd | 10 Aug 16:29

Re: Tilde

AMC wrote:

>I suppose you could instead ask why tilde isn't mapped to tilde
>operator in Nameprep.

Yes, that was my question.

>The mapping step in Nameprep was designed to
>avoid alternate representations of the same characters, and to erase
>case distinctions, not to save typing.

It's not a question of "saving typing" -- it's a question of keyboard 
real estate for the end-user.

If, as you say, the mapping step is to avoid alternate 
representations and to erase case distinctions, then it has failed 
because it doesn't produce anything. Instead, the process simply 
prohibits the character, and any replacement, which is not mentioned 
in the aforementioned design.

Now, if the tilde character is currently used in some fashion by 
behind the screens Internet techs, as Paul suggested, then I can 
understand why the tilde character is prohibited.

However, if the tilde character is not being used and if you want to 
take the position that "keyboard real estate" is of no concern to 
you, then that's your decision -- but please realize that you do so 
at the expense of the end-user and you do so without any real reason.

Please tell me why mapping the tilde to the tilde operator wouldn't work.

Martin v. Löwis | 10 Aug 22:50

Re: Tilde

tedd wrote:
> If, as you say, the mapping step is to avoid alternate representations 
> and to erase case distinctions, then it has failed because it doesn't 
> produce anything. 

Why do you say that? The mapping clearly avoids alternate
representations and erases case distinctions. For example,
"www.LÖWIS.de" is treated as if it was "www.löwis.de".

So I fail to see that the mapping step has failed. It is very
successful.
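
The example above can be checked with a small sketch.  It assumes
Python's standard library, whose encodings.idna module exposes a
nameprep() function and an "idna" codec based on RFC 3490; the helper
name same_name is invented for this illustration.

```python
from encodings.idna import nameprep

def same_name(a, b):
    """True if two domain names have the same IDNA (ACE) form."""
    return a.encode("idna") == b.encode("idna")

upper = "www.L\u00d6WIS.de"   # "www.LÖWIS.de"
lower = "www.l\u00f6wis.de"   # "www.löwis.de"

# Nameprep's mapping step folds case, so both spellings end up as one name.
print(ascii(nameprep("L\u00d6WIS")))
print(same_name(upper, lower))
```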

> Now, if the tilde character is currently used in some fashion by behind 
> the screens Internet techs, as Paul suggested, then I can understand why 
> the tilde character is prohibited.

I'd like to point out that it was always the intention, and is the
existing practice, that the IDNA RFCs are augmented by policies of the
registrars, which further constrain the set of characters that you can
use within a particular zone.

To my knowledge, none of the TLD registrars currently allows
registration of names which contain TILDE OPERATOR. So for
one-below-toplevel, the entire issue is irrelevant.

> Please tell me why mapping the tilde to the tilde operator wouldn't work.

Because it would not matter. Consider a domain label "foo~", and assume
we are applying the "ToAscii" function, trying to generate the IDNA
version of the label. Please follow me through chapter 4 of RFC 3490 now.

tedd | 11 Aug 02:42

Re: Tilde

Martin:

>If, as you say, the mapping step is to avoid 
>alternate representations and to erase case 
>distinctions, then it has failed because it 
>doesn't produce anything.
>
>Why do you say that?

I say that with respect to the Tilde code point 
only. Nameprep, in prohibiting the code point, 
has neither avoided an alternate representation 
nor erased a case distinction -- it just said 
"No".

>  The mapping clearly avoids alternate
>representations and erases case distinctions. For example,
>"www.LÖWIS.de" is treated as if it was "www.löwis.de".

I did not say that it didn't. I only said that it 
failed to do anything with respect to the Tilde 
except prohibit it -- and that statement is still 
true. For sake of argument, what's the alternate 
representation or case distinction problem 
presented by the Tilde?

>So I fail to see that the mapping step has failed. It is very successful.

Mapping has proved to be useful for most code 
points -- I'm not claiming otherwise (other than 
[...]

Adam M. Costello | 11 Aug 03:22

Re: Tilde

tedd <tedd <at> sperling.com> wrote:

> Please tell me why mapping the tilde to the tilde operator wouldn't
> work.

It wouldn't be backward compatible.  A primary design goal of IDNA was
that it should not alter the way ASCII domain names are treated.  When
an ASCII domain name contains a tilde, existing software might reject
the name because it expects a host name and RFC-1123 prohibits tilde in
host names, or it might pass the tilde straight through, either because
it is not taking responsibility for enforcing RFC-1123 or because it is
expecting a non-host-name domain name that permits tilde (DNS allows
all ASCII characters).  But in any case, existing software does not map
tilde to something else.

IDNA supports both behaviors.  When UseSTD3ASCIIRules is set, it
prohibits non-LDH ASCII characters, and when UseSTD3ASCIIRules is unset,
it permits all ASCII characters.
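
A minimal sketch of the two behaviors, written in Python for
illustration.  The function name std3_violation and its return values
are hypothetical; the checks follow my reading of ToASCII step 3 in
RFC 3490 (no non-LDH ASCII code points, no leading or trailing
hyphen-minus), applied to a single label.

```python
def std3_violation(label, use_std3_ascii_rules=True):
    """Return a reason string if the label fails the step 3 checks, else None."""
    if not use_std3_ascii_rules:
        return None  # flag unset: all ASCII characters are permitted
    for ch in label:
        if ord(ch) < 0x80 and not (ch.isalnum() or ch == "-"):
            # ASCII, but not a letter, digit, or hyphen-minus
            return "non-LDH ASCII code point U+%04X" % ord(ch)
    if label.startswith("-") or label.endswith("-"):
        return "leading or trailing hyphen-minus"
    return None

print(std3_violation("foo~"))                              # rejected: tilde is not LDH
print(std3_violation("foo~", use_std3_ascii_rules=False))  # passed straight through
print(std3_violation("foo\u223c"))                         # non-ASCII: not checked here
```

Note that the check only looks at ASCII code points, which is why TILDE
OPERATOR passes with the flag set while tilde does not.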

AMC

P.S.  For examples of non-host-name domain names, see RFC-2782
(SRV records) and RFC-2317 (PTR records for classless in-addr.arpa
delegation).

Adam M. Costello | 11 Aug 03:50

Re: Tilde

tedd <tedd <at> sperling.com> wrote:

> Nameprep, in prohibiting the code point, has neither avoided an
> alternate representation nor erased a case distinction -- it just said
> "No".

Nameprep does not prohibit tilde or any ASCII character.  ASCII
characters are prohibited in ToASCII step 3, only if UseSTD3ASCIIRules
is set.
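
This can be seen with the nameprep() function in Python's
encodings.idna module (a sketch only; it assumes that implementation
matches RFC 3491, and that Python's "idna" codec runs ToASCII with
UseSTD3ASCIIRules unset, which is my understanding of its source).

```python
from encodings.idna import nameprep

TILDE = "\u007e"            # ASCII tilde
TILDE_OPERATOR = "\u223c"   # TILDE OPERATOR

# Nameprep neither prohibits nor maps either character.
for ch in (TILDE, TILDE_OPERATOR):
    label = "foo" + ch
    print("U+%04X unchanged by nameprep: %s" % (ord(ch), nameprep(label) == label))

# With no mapping between them, the two labels remain different names.
distinct = ("foo" + TILDE).encode("idna") != ("foo" + TILDE_OPERATOR).encode("idna")
print("distinct names:", distinct)
```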

What I said about Nameprep was "The mapping step in Nameprep was
designed to avoid alternate representations of the same characters, and
to erase case distinctions, not to save typing."  The other steps of
Nameprep are there for other reasons.  For example, the prohibition step
of Nameprep is there to avoid names containing characters that you can't
see.  I focused on the mapping step because you were proposing to add a
mapping.

I don't understand the distinction between "save keyboard real estate"
and "save typing".  Tilde operator is allowed by IDNA, it's just
difficult to type (probably involves typing the Unicode number or
selecting from a menu).  Adding a mapping from tilde to tilde operator
would make it easier to type because it would allow the use of existing
keyboard real estate.  But it would not be compatible with the way ASCII
names have always been treated.

> it appears arbitrary to me until someone provides me with a reason
> otherwise.

The original restriction on host name syntax *was* arbitrary, but it was
[...]

Martin v. Löwis | 11 Aug 06:59

Re: Tilde

tedd wrote:
> You are misinformed -- domain names, which include the TILDE OPERATOR, 
> can be registered in both ".com" and ".net" TLDs and most likely other 
> registrars as well.

This is not true. Please take a look at

http://www.verisign.com/products-services/naming-and-directory-services/naming-services/internationalized-domain-names/page_001382.html

This is the list of scripts which are supported in the .com and .net
zones. Characters that don't belong to one of these scripts, or
labels that mix characters from more than one of these scripts, cannot
be registered. As TILDE OPERATOR is in none of the listed scripts,
no label containing it can be registered with Verisign when IDNA
leaves the testbed status in that zone.

For another example, please refer to DENIC's policies for the .de
zone:

http://www.denic.de/de/richtlinien.html

In the section "Anlage", they list all characters supported. So you
can only register labels with a few non-ASCII Latin characters, but
no other scripts - let alone TILDE OPERATOR.

> I don't see step #2.
> 
> If your argument is "It won't work, because it doesn't", then I can't 
> argue with that circular logic -- other than to say, I don't see any 
> "valid" reason for its foundation.

