JFC (Jefsey) Morfin | 22 Feb 22:59

Re: Re: nameprep, IDN spoofing and the registries

At 21:30 22/02/2005, Erik van der Poel wrote:
>JFC (Jefsey) Morfin wrote:
>>2. could someone list all the Unicode codes to blacklist that way?
>
>It will take a while to create a relatively complete table of homographs, 
>but here are a couple of starting points:
>
>https://bugzilla.mozilla.org/attachment.cgi?id=174139
>https://bugzilla.mozilla.org/show_bug.cgi?id=279099#c192
>
>Also, I've been thinking of writing a program that would look at the 
>"cmap" of every font on a Windows box and check to see which pairs of 
>Unicodes have the same glyph index (which leads to identical display).
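The grouping step of Erik's proposed program could look roughly like this. It is a minimal JavaScript sketch that assumes the font's "cmap" has already been extracted into a `Map` from code point to glyph index. Reading real font files is out of scope, and `findSharedGlyphs` and the sample data are hypothetical.

```javascript
// Group code points that one font maps to the same glyph index.
// `cmap` is assumed to be a Map from code point (number) to glyph index
// (number), already extracted from a font's "cmap" table by another tool.
function findSharedGlyphs(cmap) {
  const byGlyph = new Map();
  for (const [codePoint, glyph] of cmap) {
    if (glyph === 0) continue; // glyph 0 is .notdef, the "missing glyph"
    if (!byGlyph.has(glyph)) byGlyph.set(glyph, []);
    byGlyph.get(glyph).push(codePoint);
  }
  // Keep only glyphs shared by two or more code points: these display identically.
  const groups = [];
  for (const codePoints of byGlyph.values()) {
    if (codePoints.length > 1) {
      groups.push(codePoints.map(
        (cp) => 'U+' + cp.toString(16).toUpperCase().padStart(4, '0')));
    }
  }
  return groups;
}

// Hypothetical cmap excerpt: Latin "a" and Cyrillic "а" share glyph 68.
const sampleCmap = new Map([
  [0x0061, 68], // LATIN SMALL LETTER A
  [0x0430, 68], // CYRILLIC SMALL LETTER A
  [0x0062, 69], // LATIN SMALL LETTER B
  [0x2044, 0],  // not drawn by this font -> .notdef
]);

console.log(findSharedGlyphs(sampleCmap)); // [ [ 'U+0061', 'U+0430' ] ]
```

Running this over every installed font and taking the union of the groups would give a first, font-dependent homograph table.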

This would help.
But a ccTLD that manages IDNs in a computer environment, and wants to avoid
any mistake, in most cases manages names in ACE format, i.e. in ASCII.
I am not sure about existing dispute cases, but do we consider two IDNs
to be different if they differ in ACE format?
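The ACE-comparison rule can be sketched in JavaScript using Node's standard `url.domainToASCII`. That function applies UTS #46 (WHATWG URL) processing, which is close to, but not identical with, the 2005 IDNA/nameprep rules, so treat this only as an illustration. `sameAce` is a hypothetical helper.

```javascript
const { domainToASCII } = require('url');

// Treat two names as the same registration if their ACE forms match.
// NOTE: domainToASCII follows UTS #46, not exactly RFC 3490/3491 nameprep.
function sameAce(nameA, nameB) {
  const a = domainToASCII(nameA);
  const b = domainToASCII(nameB);
  // domainToASCII returns '' for names it rejects; never treat those as equal.
  return a !== '' && a === b;
}

console.log(domainToASCII('bücher.example'));             // xn--bcher-kva.example
console.log(sameAce('Bücher.example', 'bücher.example')); // true (case is mapped)
console.log(sameAce('bücher.example', 'bucher.example')); // false
```

Comparing ACE strings catches names that differ only in mapping (such as case), but it does not catch visual homographs, which have different ACE forms.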
Anyway, I answer you below.

>>3. could someone point to Perl code that takes an IDN as input and
>>punycodes it properly, and that could use such a list?
>
>I don't know about Perl, but I believe Python has IDN.
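Python does ship an "idna" codec. For comparison, here is a similar tool in JavaScript (Node.js) rather than Perl: a minimal sketch that rejects a name containing a blacklisted code point and otherwise returns its ACE form. The blacklist contents and `toAceChecked` are hypothetical placeholders. Also, `url.domainToASCII` uses UTS #46 processing rather than the exact 2005 nameprep.

```javascript
const { domainToASCII } = require('url');

// Hypothetical blacklist; a real one would come from the homograph tables
// being collected on this list.
const BLACKLIST = new Set([
  0x0430, // CYRILLIC SMALL LETTER A
  0x0441, // CYRILLIC SMALL LETTER ES
  0x1160, // HANGUL JUNGSEONG FILLER
]);

// Returns the ACE form of `name`, or throws if the raw input contains a
// blacklisted code point or the IDNA conversion rejects it.
// (This checks the input as typed; a stricter tool might also check after mapping.)
function toAceChecked(name) {
  for (const ch of name) {
    const cp = ch.codePointAt(0);
    if (BLACKLIST.has(cp)) {
      throw new Error('blacklisted code point U+' +
        cp.toString(16).toUpperCase().padStart(4, '0'));
    }
  }
  const ace = domainToASCII(name);
  if (ace === '') throw new Error('rejected by IDNA conversion: ' + name);
  return ace;
}

console.log(toAceChecked('bücher.example')); // xn--bcher-kva.example
```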

Thank you, but as I said, I have no resources for this. What would be
great is if this list actually helped prepare a Draft; maybe someone with
more technical skill and competence would be interested in leading it?
Then we could start working on something real. I listed my practical

Hans Aberg | 22 Feb 19:44

Re: [idn] IDN spoofing

At 13:33 +1100 2005/02/22, George W Gerrity wrote:
>it doesn't make sense for these rules to be part of a standard on how to extend
>Domain names to use scripts other than Latin: they are much better handled as
>(algorithmic where possible) regulations specified by the authority for a given
>TLD, or set of TLDs, in the case of the universal TLDs.

It seems simplest to merely require the names to be 8-bit bytes, UTF-8
encoded.

>At the TLD itself, one can allow a limited, but finite number of character
>strings to be equivalent, including the rule that script mixtures are
>inadmissible, but maybe case folding will be allowed.

Then, if DNS name lookup software is not updated, only ASCII letter cases
will be identified with each other, as before, and no other case pairs,
not even Latin-script letters with diacritical marks. (In retrospect, when
facing the full Unicode set, it might have been better to identify ASCII
letter cases.)
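To illustrate the point about un-updated DNS software, here is a minimal JavaScript sketch of a label comparison that is case-insensitive only for ASCII letters A-Z. This is how DNS compares labels (RFC 4343 spells it out). The labels are UTF-8 bytes, as Hans suggests, and `dnsLabelsEqual` is a hypothetical helper.

```javascript
// Compare two labels as DNS does: octet by octet, folding only ASCII
// A-Z (0x41-0x5A) to a-z. Labels are encoded as UTF-8 bytes.
function dnsLabelsEqual(a, b) {
  const x = Buffer.from(a, 'utf8');
  const y = Buffer.from(b, 'utf8');
  if (x.length !== y.length) return false;
  const fold = (octet) => (octet >= 0x41 && octet <= 0x5a ? octet + 0x20 : octet);
  for (let i = 0; i < x.length; i++) {
    if (fold(x[i]) !== fold(y[i])) return false;
  }
  return true;
}

console.log(dnsLabelsEqual('Example', 'example')); // true: ASCII case folded
console.log(dnsLabelsEqual('Übung', 'übung'));     // false: Ü is C3 9C, ü is C3 BC
```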

>...it doesn't make sense for these rules to be part of a standard on how to
>extend Domain names to use scripts other than Latin: they are much better
>handled as (algorithmic where possible) regulations specified by the authority
>for a given TLD, or set of TLDs, in the case of the universal TLDs.

Then all confusable problems will be handled at the registry.

>By using this approach, and starting off with a set of rules that disallow most
>forms of script mixes, then where appeals to common sense and the wishes of a
>reasonable number of potential clients suggest a loosening of the rules, this
>can be done with little disruption to the existing state of affairs.
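A registry rule of the kind George describes, "disallow most forms of script mixes", might start as simply as the following JavaScript sketch. It uses Unicode `Script` property escapes in regular expressions. The choice of watched scripts is a hypothetical example of registry policy.

```javascript
// Hypothetical registry rule: reject a label whose letters come from more
// than one watched script. Digits and "-" have Script=Common, so they never
// count toward any watched script.
const WATCHED = ['Latin', 'Greek', 'Cyrillic'];
const SCRIPT_RE = Object.fromEntries(
  WATCHED.map((s) => [s, new RegExp(`\\p{Script=${s}}`, 'u')])
);

function scriptsUsed(label) {
  const found = new Set();
  for (const ch of label) {
    for (const s of WATCHED) {
      if (SCRIPT_RE[s].test(ch)) found.add(s);
    }
  }
  return [...found];
}

function isMixedScript(label) {
  return scriptsUsed(label).length > 1;
}

console.log(isMixedScript('paypal'));      // false
console.log(isMixedScript('p\u0430ypal')); // true: Cyrillic "а" among Latin letters
```

A rule like this does not catch whole-script homographs (a label spelled entirely in Cyrillic look-alikes). Loosening it later, as George suggests, amounts to editing the watched list or allowing specific script pairs.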


Erik van der Poel | 22 Feb 20:14

nameprep2 and the slash homograph issue

All,

In a way, it was pretty sensible for the IETF to decide to avoid 
subsetting Unicode too much so that national registries could make those 
decisions on their own. After all, those countries know more about their 
own characters than the IETF does, and it seems more fair to let them 
make such a decision.

One could see this as an instance of "Push the problem downstream, so 
that we cannot be blamed for being overly restrictive up here".

Now I'm wondering if we could make a similar argument in the slash 
homograph case. If nameprep2 bans the slash homograph, then there is no 
way for any community to use it in a domain name, even if that domain 
name appears in a context where slash means nothing. Consider the email 
case. There are no slashes in the vicinity of a domain name in an email 
app. The URI case is, of course, different. Here you often see slashes, 
so a slash homograph could easily spoof someone.

So, could nameprep2's position be, "Push the slash homograph problem 
downstream, to the app, so that we cannot be blamed for being overly 
restrictive up here"?

Or is the slash fundamentally different from national characters? And if
so, who are we to make that statement? Shouldn't the countries be
deciding that? (Not that TLDs can restrict names in 3LDs and up; only
the apps can address those.)
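Pushing the slash homograph "downstream, to the app" could mean a check that only fires where "/" is actually a delimiter next to the host, as in a URI but not an email address. Here is a minimal JavaScript sketch. The slash look-alike list is hypothetical and not exhaustive, and `hasDelimiterSpoof` is an illustrative name.

```javascript
// Hypothetical, non-exhaustive list of characters that resemble "/".
const SLASH_LIKE = new Set([
  0x2044, // FRACTION SLASH
  0x2215, // DIVISION SLASH
  0x29f8, // BIG SOLIDUS
  0xff0f, // FULLWIDTH SOLIDUS
]);

// `context` is 'uri' or 'email'. Only in a URI does "/" sit right after the
// host, so only there is a slash look-alike in the host treated as a spoof.
function hasDelimiterSpoof(host, context) {
  if (context !== 'uri') return false;
  for (const ch of host) {
    if (SLASH_LIKE.has(ch.codePointAt(0))) return true;
  }
  return false;
}

console.log(hasDelimiterSpoof('www.microsoft.com\u2044login.example', 'uri'));   // true
console.log(hasDelimiterSpoof('www.microsoft.com\u2044login.example', 'email')); // false
```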

Another argument against banning the slash homograph is that any new 
banning would require a new ACE prefix, which is a lot of work, and, as 

Adam M. Costello | 22 Feb 00:44

Re: upstream and downstream

I wrote:

> IDNA's special treatment of "." is insufficient to prevent homograph
> attacks against ".".
>
> For example, someone could register a name that looks like
> "foo.bar.com", where the first dot was really U+0702.  This attack
> would be equally effective no matter what larger structure (URI, email
> address, etc) the domain name appeared in.
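RFC 3490 (section 3.1) treats only four characters as label separators: U+002E, U+3002, U+FF0E and U+FF61. A dot look-alike such as U+0702 therefore stays inside a label. Here is a minimal JavaScript sketch that splits on the real separators and reports labels containing a dot look-alike. The look-alike list is hypothetical and not exhaustive.

```javascript
// The four label separators recognized by IDNA (RFC 3490, section 3.1).
const IDNA_DOTS = /[\u002E\u3002\uFF0E\uFF61]/;

// Hypothetical, non-exhaustive list of dot look-alikes that are NOT separators.
const DOT_LIKE = new Set([
  0x06d4, // ARABIC FULL STOP
  0x0701, // SYRIAC SUPRALINEAR FULL STOP
  0x0702, // SYRIAC SUBLINEAR FULL STOP
]);

// Return the labels of `host` that contain a dot look-alike.
function labelsWithFakeDots(host) {
  return host.split(IDNA_DOTS).filter((label) =>
    [...label].some((ch) => DOT_LIKE.has(ch.codePointAt(0)))
  );
}

// "foo\u0702bar.com" looks like "foo.bar.com" but has only two labels.
console.log(labelsWithFakeDots('foo\u0702bar.com').length); // 1
console.log(labelsWithFakeDots('foo.bar.com').length);      // 0
```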

On second thought, the "." homograph attack is less severe than the "/"
homograph attack.  The former only allows the attacker to spoof names
in the same domain that the attacker is registered in; therefore new
registrants can protect themselves from this attack by registering in
a domain with reasonable admission policies.  The "/" attack, however,
allows the attacker to spoof names in *any* domain, so there's nowhere
registrants can go and be safe from it.

The more severe attack can happen only when domain names are embedded
in larger structures, so a case could be made that each of these larger
structures should create its own recommendations for dealing with spoofs
of its delimiters.

On the other hand, non-technical users might be misled by all sorts of
punctuation, even symbols that don't resemble the true delimiters.

AMC

William Tan | 21 Feb 12:36

Re: IDN spoofing

George W Gerrity wrote:

> For the second-level (or third-level where the top is a country code) 
> domain tag, it should be the legal responsibility of the name 
> authorities for the domain above to ensure that spoofed names cannot 
> be registered (or if registered, all belong to one owner). In the 
> Western world, if that is not already the case, then I'm sure that the 
> first time a spoof of, say Coca-Cola (or Pepsi — let's be even-handed) 
> is registered, then we can be certain that afterwards, the issuing 
> authority will never do it again.

While it is true that TLDs are responsible for preventing the 
registration of spoofs, commercial TLDs that have automated registration 
systems never perform that check. Does registering coca-cola.com prevent 
someone else from getting coca-co1a.com?

> In the case of countries whose law systems are still a bit wild and 
> wooly (The former Soviet Union?), then I suspect that for the time 
> being it will remain ‘Caveat Emptor’. In either case, a domain name 
> holder should be able to license all spoofs for free, in order to 
> limit its exposure to spoofing, whether or not there is adequate legal 
> recourse.

If the TLD operator is careful, there is no need to license spoofs to
protect one's domain from being spoofed. On the other hand, if the TLD
does not even perform that check (as with .com), then you are unlikely
to get to license all spoofs for free anyway; you have to pay for
each and every permutation of it.
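To see how quickly "each and every permutation" adds up, here is a small JavaScript sketch. It enumerates every spelling produced by a substitution table and uses only the two ASCII look-alikes from the coca-co1a example. The table and `variants` are hypothetical. A realistic table would be far larger, and the count grows multiplicatively with each substitutable character.

```javascript
// Hypothetical substitution table: each character and its ASCII look-alikes.
const LOOKALIKES = { l: ['l', '1'], o: ['o', '0'] };

// Enumerate every spelling a registrant would have to register to cover the table.
function variants(name) {
  let results = [''];
  for (const ch of name) {
    const options = LOOKALIKES[ch] || [ch];
    const next = [];
    for (const prefix of results) {
      for (const opt of options) next.push(prefix + opt);
    }
    results = next;
  }
  return results;
}

const v = variants('coca-cola');
console.log(v.length);                // 8 (two "o"s and one "l": 2 * 2 * 2)
console.log(v.includes('coca-co1a')); // true
```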


Soobok Lee | 21 Feb 03:31

another homograph attack: BIDI char

javascript:void(window.open(unescape("http://www.microsoft.com%u202e.uni.cc/%u1160%u1160%u1160"),"_self"))

If an IDNA implementation does not handle BIDI filtering/verification
well, you can get results similar to the "slash-space combination".
%u202e is a bidi directional formatting character (RLO, RIGHT-TO-LEFT
OVERRIDE) and should not be filtered on a char-by-char basis, because
the char plays a crucial role in Arabic/Hebrew writing. See the
stringprep/nameprep documents for details of the BIDI checking part.
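Nameprep (RFC 3491) prohibits the characters in stringprep (RFC 3454) table C.8, "Change display properties or deprecated characters", and that table includes U+202E. Here is a minimal JavaScript sketch of a check an application that does not run full IDNA processing could apply to a hostname before display. `hasDisplayControl` is a hypothetical helper, and the check is not a substitute for the full RFC 3454 bidi rules.

```javascript
// RFC 3454 table C.8: U+0340, U+0341, U+200E, U+200F, U+202A-U+202E,
// U+206A-U+206F. Nameprep prohibits all of these in domain names.
const TABLE_C8 = /[\u0340\u0341\u200E\u200F\u202A-\u202E\u206A-\u206F]/u;

function hasDisplayControl(host) {
  return TABLE_C8.test(host);
}

// The URL from the example above, after JavaScript's unescape() decodes %u202e.
const url = unescape('http://www.microsoft.com%u202e.uni.cc/');
console.log(hasDisplayControl(url));                 // true
console.log(hasDisplayControl('www.microsoft.com')); // false
```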

Good IDNA implementations would not suffer from the above attack.
But the current MSIE does not support IDNA, while it still
allows arbitrary UTF-8 chars, so the current MSIE is exploitable for
malicious phishing attempts. I don't know whether this works
for Firefox/Mozilla.

The previous example,
javascript:void(window.open(unescape("http://www.microsoft.com%u2044%u1160%u1160%u1160.uni.cc/"),"_self"))
You can replace %u2044 with %u2205, %u3033, etc. I am now searching for
more slash-like and space-like chars and will post them here.

Soobok

Erik van der Poel | 20 Feb 18:25

[Fwd: Re: IDN spoofing]

George gave me permission to forward his email to this list.

Erik
From: George W Gerrity <g.gerrity <at> gwg-associates.com.au>
Subject: Re: [idn] IDN spoofing
Date: 2005-02-20 01:42:46 GMT
It has been stated quite clearly that the problem of spoofing TLD tags 
should not exist because the authority for a given TLD can (and should) 
accept only one coding for a TLD tag. However, where the TLD tag is a 
country code, second-level TLD tags can (and should) also be unique 
(e.g., .co.uk, .net.au). I don't think that has been stated clearly
before.

For the second-level (or third-level where the top is a country code) 
domain tag, it should be the legal responsibility of the name 
authorities for the domain above to ensure that spoofed names cannot be 
registered (or if registered, all belong to one owner). In the Western 
world, if that is not already the case, then I'm sure that the first 
time a spoof of, say Coca-Cola (or Pepsi — let's be even-handed) is 
registered, then we can be certain that afterwards, the issuing 
authority will never do it again.

In the case of countries whose law systems are still a bit wild and 

Soobok Lee | 20 Feb 06:27

space-like unicode char

You can paste this HTML/JavaScript snippet into an HTML file on your
web server and view it in your MSIE browser.
You will see "www.microsoft.com" visually separated in the address bar
from the "mozilla.org" domain suffix.
Fortunately, you will see only blank space (no phishing page) if you have
the recent IE patch.
This won't work in Firefox 1.x, which strips off those special chars
for unknown reasons before sending the URL to the address bar.

<script>
window.open(unescape("http://www.microsoft.com%u1160%u1160%u1160%u1160%u1160%u1160.mozilla.org/"),"_blank");
</script>

U+1160 is a space-like char, and even stringprep/nameprep does not
filter it out, because the char is not a punctuation character.
U+1160 is just one example, and I guess there may be many other chars
that can be used as blank-char substitutes.

U+1160 in the above example is placed in the 3rd-level domain name label,
over which the .org registry cannot impose any regulations.
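The .org registry cannot police 3rd-level labels, so a check against blank-looking characters would have to run in the application. Here is a minimal JavaScript sketch. It lists the Hangul filler characters explicitly and also uses the Unicode `White_Space` and `Default_Ignorable_Code_Point` properties to catch other blank or invisible characters. The exact set an application should reject is an assumption here.

```javascript
// Hangul fillers render as blank space: CHOSEONG FILLER, JUNGSEONG FILLER,
// HANGUL FILLER, HALFWIDTH HANGUL FILLER.
const HANGUL_FILLERS = new Set([0x115f, 0x1160, 0x3164, 0xffa0]);

// Other whitespace and default-ignorable (usually invisible) characters.
const BLANKISH = /[\p{White_Space}\p{Default_Ignorable_Code_Point}]/u;

function hasBlankLookalike(host) {
  for (const ch of host) {
    if (HANGUL_FILLERS.has(ch.codePointAt(0)) || BLANKISH.test(ch)) return true;
  }
  return false;
}

console.log(hasBlankLookalike('www.microsoft.com\u1160\u1160.mozilla.org')); // true
console.log(hasBlankLookalike('www.mozilla.org'));                           // false
```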

Soobok Lee

Adam M. Costello | 19 Feb 23:17

quick & dirty (but not too dirty) homograph defense

Here's an idea for a quick-and-dirty enhancement to existing
applications:  Rather than disable IDNA entirely (which is quick but
too dirty), or flag all IDNs (almost as quick but still too dirty),
just flag all IDNs in .com and .net.  This would be significantly less
damaging to IDN deployment (which could proceed unhindered in the other
TLDs, particularly the ccTLDs), but is still extremely simple and could
be rolled out immediately while more sophisticated heuristics are
developed.
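Adam's heuristic is small enough to state as code. Here is a minimal JavaScript sketch, assuming the host uses "." as its label separator and may be in either ACE ("xn--") or Unicode form. `shouldFlag` is a hypothetical name.

```javascript
// Flag a host for special display if it is an IDN under .com or .net.
const FLAGGED_TLDS = new Set(['com', 'net']);

function shouldFlag(host) {
  const labels = host.toLowerCase().replace(/\.$/, '').split('.');
  const tld = labels[labels.length - 1];
  // A label is internationalized if it is in ACE form or contains non-ASCII.
  const isIdn = labels.some((l) => l.startsWith('xn--') || /[^\x00-\x7f]/.test(l));
  return isIdn && FLAGGED_TLDS.has(tld);
}

console.log(shouldFlag('xn--bcher-kva.com')); // true
console.log(shouldFlag('bücher.de'));         // false: IDNs in ccTLDs proceed
console.log(shouldFlag('example.com'));       // false: plain ASCII
```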

AMC

Doug Ewell | 19 Feb 22:31

Re: [idn] IDN spoofing

Philippe Verdy <vpi92 at yahoo dot fr> wrote:

> The bad thing about your argument is that you are trying to mix
> uppercase and lowercase letters. But for IDN, only lowercase letters
> are really distinct and encodable, as uppercase letters are case
> folded to lowercase. So let's just concentrate on the set of letters
> that really are distinct in lowercase, because this is the form in which
> DNS servers will distinguish ASCII letters.

Fine, get rid of all the examples that involve uppercase.  The problem
is still there:

iι
pρ
uυ
vν
wω
yγ
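One common way to detect pairs like these is to map each look-alike to a shared "skeleton" and compare skeletons. Here is a minimal JavaScript sketch that covers only the six Latin/Greek lowercase pairs listed above. A real table would be much larger, and `skeleton` and `confusable` are hypothetical helpers.

```javascript
// Map each Greek letter from the pairs above to the Latin letter it resembles.
const GREEK_TO_LATIN = new Map([
  ['ι', 'i'], // GREEK SMALL LETTER IOTA
  ['ρ', 'p'], // GREEK SMALL LETTER RHO
  ['υ', 'u'], // GREEK SMALL LETTER UPSILON
  ['ν', 'v'], // GREEK SMALL LETTER NU
  ['ω', 'w'], // GREEK SMALL LETTER OMEGA
  ['γ', 'y'], // GREEK SMALL LETTER GAMMA
]);

function skeleton(label) {
  let out = '';
  for (const ch of label) out += GREEK_TO_LATIN.get(ch) || ch;
  return out;
}

// Two different labels are confusable if they share a skeleton.
function confusable(a, b) {
  return a !== b && skeleton(a) === skeleton(b);
}

console.log(confusable('wiki', 'ωιki')); // true
console.log(confusable('wiki', 'wiki')); // false: identical, not a spoof
```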

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Erik van der Poel | 19 Feb 15:01

RRP and language tags

[Removing unicode <at> unicode.org and adding dam <at> icann.org]

Yesterday, it seemed like some people agreed with me that, in the 
general case, the language tag being passed in the RRP protocol is not 
just naming a character table to filter against, but, more generally, a 
set of rules that the registry applies to determine registerability.

Ignoring for now whether we can decide that there is consensus on this 
issue, I'd like to explore the issues some more, and ask a few 
questions, if that's OK.

First, the language tag is just a name, so in some sense it does not 
really matter *what* it is naming, whether that be a human language, a 
registry's table of characters, or a registry's set of rules. But if we 
can show that ISO 639 is not an adequate namespace, then we may wish to 
consider a new namespace.

One reason that 639 might not be adequate is that some languages are 
used in more than one part of the world, and that usage may differ 
enough from one community to another that we want to assign two separate 
codes instead of a single 639 code. A very common way to go about this 
is to simply add a country code (ISO 3166).

However, currently it seems like the trend is to have a number of sets 
of rules, one at each NIC. The .jp registry has a table and some rules, 
the .de registry has a table, and so on. Some organizations appear to be 
working together to unify their methods. China and Taiwan come to mind. 
This trend might lead one to think that the RRP tag should not be a 
language, but simply the name of a NIC or TLD or pair of TLDs.
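The point that the RRP tag "is just a name" for a set of rules can be illustrated with a registry that looks up a rule set by tag. The tag may come from ISO 639 ("de"), from 639 plus 3166 ("zh-TW"), or may simply name a TLD. Here is a minimal JavaScript sketch. The tables are toy examples and are not the actual .de, .jp or Chinese/Taiwanese registration rules, and `RULE_SETS` and `isRegisterable` are hypothetical.

```javascript
// Hypothetical registry: each tag simply names a set of registration rules.
// The patterns are toy examples, NOT any real registry's tables.
const RULE_SETS = new Map([
  ['de', { allowed: /^[a-z0-9äöüß-]+$/u }],             // ISO 639 language code
  ['zh-TW', { allowed: /^[\p{Script=Han}a-z0-9-]+$/u }], // 639 code + ISO 3166 region
  ['tld:jp', {                                           // a TLD name as the tag
    allowed: /^[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}a-z0-9-]+$/u,
  }],
]);

function isRegisterable(label, tag) {
  const rules = RULE_SETS.get(tag);
  if (!rules) throw new Error('unknown tag: ' + tag);
  return rules.allowed.test(label);
}

console.log(isRegisterable('müller', 'de'));    // true
console.log(isRegisterable('müller', 'zh-TW')); // false
```

The lookup code does not care which namespace the tag comes from, which is why the choice between ISO 639, 639 plus 3166, or NIC/TLD names is a policy question rather than a protocol one.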
