Nick Teint | 6 Jul 2010 12:14

Re: UTF-8

2010/6/26 John C Klensin <klensin <at> jck.com>:
> --On Saturday, June 26, 2010 11:54 +0200 Nick Teint
> <nick.teint <at> googlemail.com> wrote:
>> Sometimes, you _do_ want ACEs to leak into the UI:
>> 1. Your user does not know the script. Displaying an ugly ACE
>> string is better than displaying some
>> known-to-be-unrecognisable characters.*
>> 2. You don't have the fonts. Displaying an ugly ACE string is
>> better than displaying "???????".
>
> Both of these points have been made many times before.  Note
> that  "???????" and its equivalents (e.g., row of little boxes)
> are universal confusables -- they can be confused with, and
> match in user perception, _any_ string for which the user does
> not have fonts.  For the cautious user, that should be a strong
> warning.

These two arguments are not about phishing.

While certainly not the preferred form for human consumption, the ACE
form does enable some basic usage patterns such as noting the name
down on paper, reading it aloud over the phone, etc. These tasks are
simply not possible with an unfamiliar script or ersatz characters.

Gibberish out of known characters (a subset of the Latin script) is
still better than gibberish that consists of unknown or
indistinguishable characters.
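
[Editorial illustration, not part of the original message.] The fallback argued for in points 1 and 2 can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the names `to_ace` and `display_form` and the `user_can_read` predicate are hypothetical, the "user reads only ASCII" policy is only a stand-in for a real script or font check, and Python's built-in `punycode` codec performs no IDNA validation or mapping.

```python
from typing import Callable

ACE_PREFIX = "xn--"


def to_ace(ulabel: str) -> str:
    """Punycode-encode one label and add the ACE prefix (no IDNA validation)."""
    if ulabel.isascii():
        return ulabel  # plain ASCII labels are shown unchanged
    return ACE_PREFIX + ulabel.encode("punycode").decode("ascii")


def display_form(ulabel: str, user_can_read: Callable[[str], bool]) -> str:
    """Return the native label if every character passes the (hypothetical)
    user_can_read test; otherwise fall back to the ACE form."""
    if all(user_can_read(ch) for ch in ulabel):
        return ulabel
    return to_ace(ulabel)


# Hypothetical policy: this user can read (or has fonts for) ASCII only.
shown = display_form("b\u00fccher", str.isascii)
print(shown)  # prints "xn--bcher-kva"
```

Whether the ACE form is actually the better thing to show is exactly what the thread disputes; the sketch only makes the proposed behaviour concrete.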

>> 3. The string contains conspicuous confusables. Displaying an
>> ugly ACE string is better than displaying a

Shawn Steele | 6 Jul 2010 18:45

RE: UTF-8

>> Sometimes, you _do_ want ACEs to leak into the UI:
>> 1. Your user does not know the script. Displaying an ugly ACE
>> string is better than displaying some
>> known-to-be-unrecognisable characters.*
>> 2. You don't have the fonts. Displaying an ugly ACE string is
>> better than displaying "???????".

I somewhat disagree.  To most users, xn--qwerty is no different than xn--querty.  In that case, EVERYTHING
is confusable, no different than your case #2.
(In case #1 the user can probably still notice at least a few differences).

> While certainly not the preferred form for human consumption, the ACE
> form does enable some basic usage patterns such as noting the name
> down on paper, reading it aloud over the phone, etc. These tasks are
> simply not possible with an unfamiliar script or ersatz characters.

Possibly true, but is that at all helpful?  If I can't read a script for a web site's domain name, then very
likely that web site is unusable to me, so it matters very little if I can transcribe it.  If your web site is
useful to me, you probably have a CNAME, DNAME, or something else with an ASCII address or other script I can
read.  Even if I'm collecting data for third-party users, like maybe an index of all the breweries I can find,
then I'd likely still want to be able to publish the URLs in a script targeted at the end users that can use the
web site.

> Gibberish out of known characters (a subset of the Latin script) is
> still better than gibberish that consists of unknown or
> indistinguishable characters.

Just barely.

> I doubt it be better. Big warnings have problems of their own,

Nicolas Williams | 6 Jul 2010 18:51

Re: UTF-8

On Tue, Jul 06, 2010 at 12:14:17PM +0200, Nick Teint wrote:
> 2010/6/26 John C Klensin <klensin <at> jck.com>:
> 
> These two arguments are not about phishing.
> 
> While certainly not the preferred form for human consumption, the ACE
> form does enable some basic usage patterns such as noting the name
> down on paper, reading it aloud over the phone, etc. These tasks are
> simply not possible with an unfamiliar script or ersatz characters.
> 
> Gibberish out of known characters (a subset of the Latin script) is
> still better than gibberish that consists of unknown or
> indistinguishable characters.

ACE certainly can look like gibberish, and ACE can be confusable.  For
example, xn--fo-6ja, xn--fo-3ja, xn--fo-oja and so on -- pretty close,
but not the same.
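
[Editorial illustration, not part of the original message.] To see what the three near-identical ACE strings stand for, they can be decoded with Python's built-in `punycode` codec. This is a minimal sketch: the helper name `ace_to_unicode` is hypothetical, and the codec only reverses the Punycode step, without any IDNA validity checking. Code points are printed rather than the characters themselves, so the output does not depend on the terminal's fonts or encoding.

```python
ACE_PREFIX = "xn--"

# The three labels quoted in the message above.
ace_labels = ["xn--fo-6ja", "xn--fo-3ja", "xn--fo-oja"]


def ace_to_unicode(label: str) -> str:
    """Decode the Punycode payload of an "xn--" label (no IDNA validation)."""
    if not label.lower().startswith(ACE_PREFIX):
        raise ValueError(f"not an xn-- label: {label!r}")
    payload = label[len(ACE_PREFIX):]
    return payload.encode("ascii").decode("punycode")


decoded = {ace: ace_to_unicode(ace) for ace in ace_labels}
for ace, text in decoded.items():
    # Show code points so the differences are visible without special fonts.
    print(ace, "->", " ".join(f"U+{ord(ch):04X}" for ch in text))
```

Each label differs from the next by a single ASCII character, yet each decodes to a different three-character string beginning with "fo".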

In other words, I'm not sure we can win here.

As for '?' and the like: if users treat them as wildcards, that is bad,
but if users treat them as an indication that something is broken, then
that's much better.  It'd be nice to know which way most users are
likely to respond.  My intuition says that if you display garbage that
can't even correctly be cut-n-pasted then we're reasonably safe.  But
that's just intuition, and it only applies when you either can't
represent a label in the user's locale or lack the fonts...

Hmmm, maybe apps and even systems could have a setting where a user can
say that any domainname labels using scripts that the user doesn't

=JeffH | 18 Jul 2010 17:07

-idnabis-defs-13: comment on figure 1

Hi,

the figures in -idnabis-defs-13 are very useful, thanks for including them.

Though, I found the labeling of the sets in figure 1 a tad confusing. If I'm 
interpreting the figure correctly, then this is the way I would label the sets...

       __________________________________________________________________
       |                              ASCII-LABEL                       |
       |    _________________________________________________________   |
       |    |            LDH-LABEL (1) (4)                          |   |
       |    |  ___________________________________                  |   |
       |    |  |IDN Reserved LDH Labels          |                  |   |
       |    |  | ("??--") or R-LDH LABELS        | _______________  |   |
       |    |  | _______________________________ | |NON-RESERVED |  |   |
       |    |  | |       XN LABELS             | | | LDH LABELS  |  |   |
       |    |  | | _____________   ___________ | | | (NR-LDH     |  |   |
       |    |  | | | A-labels  |   | Fake (3) || | |   LABELS)   |  |   |
       |    |  | | | "xn--"(2) |   | A-labels || | |_____________|  |   |
       |    |  | | |___________|   |__________|| |                  |   |
       |    |  | |_____________________________| |                  |   |
       |    |  |_________________________________|                  |   |
       |    |_______________________________________________________|   |
       |       __________________________________                       |
       |       |            NON-LDH-LABEL       |                       |
       |       |      ______________________    |                       |
       |       |      | Underscore labels  |    |                       |
       |       |      |  e.g. _tcp         |    |                       |
       |       |      |____________________|    |                       |
       |       |      | Labels with leading|    |                       |

=JeffH | 19 Jul 2010 02:13

Re: -idnabis-defs-13: comment on figure 1

when I sent the comment I didn't realize you all are sitting in AUTH48 with
this passel of specs.

never mind, it's just a nit.

sorry for any trouble.

=JeffH

=JeffH | 19 Jul 2010 02:42

non-IDNA LDH Label ?

Is it correct to define the notion of "non-IDNA LDH Labels" as the union of 
"NR-LDH labels" and "Fake A-labels" ?

(figure 1 from -idnabis-defs-13 included below for convenience)

In other words, do "Fake A-labels" exist in the wild (whether or not their 
creation was purposeful or inadvertent) ?

thanks,

=JeffH

        __________________________________________________________________
        |                              ASCII-LABEL                       |
        |    _________________________________________________________   |
        |    |            LDH-LABEL (1) (4)                          |   |
        |    |  ___________________________________                  |   |
        |    |  |IDN Reserved LDH Labels          |                  |   |
        |    |  | ("??--") or R-LDH LABELS        | _______________  |   |
        |    |  | _______________________________ | |NON-RESERVED |  |   |
        |    |  | |       XN LABELS             | | | LDH LABELS  |  |   |
        |    |  | | _____________   ___________ | | | (NR-LDH     |  |   |
        |    |  | | | A-labels  |   | Fake (3) || | |   LABELS)   |  |   |
        |    |  | | | "xn--"(2) |   | A-labels || | |_____________|  |   |
        |    |  | | |___________|   |__________|| |                  |   |
        |    |  | |_____________________________| |                  |   |
        |    |  |_________________________________|                  |   |
        |    |_______________________________________________________|   |
        |       __________________________________                       |
        |       |            NON-LDH-LABEL       |                       |

Nicolas Williams | 19 Jul 2010 18:16

Re: non-IDNA LDH Label ?

On Sun, Jul 18, 2010 at 05:42:12PM -0700, =JeffH wrote:
> Is it correct to define the notion of "non-IDNA LDH Labels" as the
> union of "NR-LDH labels" and "Fake A-labels" ?

I believe the answer is "no" because one should not register fake
A-labels.  However, this is a "soft" no because there's no strong
requirement to not allow registration of fake A-labels.

In the long run I'd expect that there will be apps and/or libraries that
will produce warnings, and even errors when fed fake A-labels, even if
that's neither required nor recommended today, and even if it were
explicitly not allowed.  Which means: we're all best off not allowing
the registration of fake A-labels in any DNS zones, and renaming any
where they exist.

> In other words, do "Fake A-labels" exist in the wild (whether or not
> their creation was purposeful or inadvertent) ?

It's possible.  It's always been accepted that the IDNA prefix (xn--)
could collide with existing labels.  It's just not very likely.

Nico

Shawn Steele | 19 Jul 2010 18:20

RE: non-IDNA LDH Label ?

I'd expect some "fake" a-labels due to the disparity of allowed code points between 2003 and 2008. 
Presumably some labels will be fake by one or the other.

-Shawn

 
http://blogs.msdn.com/shawnste




John C Klensin | 19 Jul 2010 18:41

RE: non-IDNA LDH Label ?


--On Monday, July 19, 2010 16:20 +0000 Shawn Steele
<Shawn.Steele <at> microsoft.com> wrote:

> I'd expect some "fake" a-labels due to the disparity of
> allowed code points between 2003 and 2008.  Presumably some
> labels will be fake by one or the other.

As a set of concrete examples, any label containing a symbol or
punctuation character (most of which were valid under IDNA2003)
becomes a "Fake A-label" with IDNA2008 because it cannot be
obtained by applying the Punycode algorithm to a valid U-label.
Although it was against ICANN (and other) advice, several zones
have permitted such things to be registered.   
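
[Editorial illustration, not part of the original message.] The label categories from figure 1 that this thread keeps referring to can be approximated in code. This is a rough sketch under stated assumptions: `classify_label` and its return strings are hypothetical names, and the check uses only Python's built-in `punycode` codec. It can therefore detect an XN label whose payload is not valid, canonical Punycode, but it cannot tell whether the decoded string is a valid IDNA2008 U-label. A symbol label of the kind described above would pass this check, and recognizing it as a Fake A-label would need a full IDNA2008 implementation.

```python
import re

# Letters, digits, hyphen; at most 63 octets per label.
LDH_CHARS = re.compile(r"[A-Za-z0-9-]{1,63}")


def classify_label(label: str) -> str:
    """Place one label in the figure 1 taxonomy (approximation only)."""
    if not label.isascii():
        return "not an ASCII label"
    if (not LDH_CHARS.fullmatch(label)
            or label.startswith("-") or label.endswith("-")):
        return "NON-LDH label"
    if label[2:4] != "--":
        return "NR-LDH label"
    if label[:2].lower() != "xn":
        return "R-LDH label (not XN)"
    payload = label[4:].lower()
    try:
        decoded = payload.encode("ascii").decode("punycode")
    except UnicodeError:
        return "Fake A-label (payload is not valid Punycode)"
    # An A-label encodes at least one non-ASCII character, and re-encoding
    # the decoded string must give back the same payload.
    if decoded.isascii() or decoded.encode("punycode").decode("ascii") != payload:
        return "Fake A-label (not a canonical encoding)"
    # Assumption: IDNA2008 validity of the decoded string is NOT checked here.
    return "XN label, Punycode-valid (A-label only if the U-label is valid)"


for name in ["example", "_tcp", "ab--cd", "xn--caf-dma", "xn--caf-9"]:
    print(f"{name:12} {classify_label(name)}")
```

The last category is deliberately worded as a maybe: the distinction between an A-label and a Fake A-label depends on the IDNA2008 code point rules, not on Punycode alone.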

The list would get much longer if zones registered things that
were dubious, but not invalid for lookup, under IDNA2003.  An
example of that would be a label that included characters new to
Unicode 4.0 or 5.0 for which someone has extrapolated Nameprep
handling.  I suspect that has been done but, unlike the symbol
case, I can't immediately point to specific examples.

      john
