JFC (Jefsey) Morfin | 22 Feb 22:59

Re: Re: nameprep, IDN spoofing and the registries

At 21:30 22/02/2005, Erik van der Poel wrote:
>JFC (Jefsey) Morfin wrote:
>>2. could someone list all the Unicode codes to blacklist that way?
>
>It will take a while to create a relatively complete table of homographs, 
>but here are a couple of starting points:
>
>https://bugzilla.mozilla.org/attachment.cgi?id=174139
>https://bugzilla.mozilla.org/show_bug.cgi?id=279099#c192
>
>Also, I've been thinking of writing a program that would look at the 
>"cmap" of every font on a Windows box and check to see which pairs of 
>Unicodes have the same glyph index (which leads to identical display).

This would help.
But a ccTLD that manages IDNs in a computing environment and wants to avoid 
any mistake will, in most cases, manage names in ACE format, that is, in 
ASCII. I am not sure about existing dispute cases, but do we consider that 
two IDNs are different if they differ in ACE format?
Anyway, I answer you below.

>>3. could someone point to Perl code that takes an IDN and gets it 
>>properly punycoded, and which could use such a list?
>
>I don't know about Perl, but I believe Python has IDN.
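
For reference, a minimal sketch of what such a tool could look like in 
Python, assuming its built-in "idna" codec (IDNA 2003: nameprep plus 
Punycode). The names to_ace and BLACKLIST are hypothetical, and the 
two blacklisted code points are only placeholders for a real table.

```python
# Sketch: convert an IDN to ACE (Punycode) form after a blacklist check.
# BLACKLIST is a hypothetical, illustrative set, not a complete homograph table.
BLACKLIST = {
    "\u2044",  # FRACTION SLASH, looks like "/"
    "\u1160",  # HANGUL JUNGSEONG FILLER, renders as blank space
}


def to_ace(name, blacklist=BLACKLIST):
    """Return the ACE form of `name`; raise ValueError on a blacklisted code point."""
    for ch in name:
        if ch in blacklist:
            raise ValueError("blacklisted code point U+%04X" % ord(ch))
    # Python's built-in "idna" codec applies nameprep and Punycode per label.
    return name.encode("idna").decode("ascii")


# "b\u00fccher" is "bücher" written with an escape for the u-umlaut.
print(to_ace("b\u00fccher.example"))  # xn--bcher-kva.example
```

Two names that look alike but use different code points come out as 
different ACE strings, so a comparison in ACE form alone does not 
reveal a homograph.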

Thank you, but as I said, I have no resources for this. So what would be 
great would be for this list to actually help prepare a Draft. Maybe 
someone with more technical skill and competence would be interested in 
leading it? Then we can start working on something real. I listed my 
practical needs. I suppose others would have more to add.

Stephane is a key person in supporting many ccTLDs in real life. I am sure he 
will be of great help. So would Gervase, with his ability to test in a 
Firefox environment.

I have reported the problem and my request on the ccTLD list. I asked about 
the additional requirements they might have. I will inform this list of any 
additional demands they may have regarding a practical solution for them. I 
also documented that my concern was not the phishing issue but the ccTLDs' 
own operations. This leaves the legal aspects aside and may be more 
motivating, since their own Registry could be the first victim of a 
confusion (in Whois display, for example).

jfc

Hans Aberg | 22 Feb 19:44

Re: [idn] IDN spoofing

At 13:33 +1100 2005/02/22, George W Gerrity wrote:
>it doesn't make sense for these rules to be part of a standard on how to extend
>Domain names to use scripts other than Latin: they are much better handled as
>(algorithmic where possible) regulations specified by the authority for a given
>TLD, or set of TLDs, in the  case of the universal TLDs.

It seems simplest to merely require the names to be 8-bit bytes, UTF-8
encoded.

>At the TLD itself, one can allow a limited, but finite number of character
>strings to be equivalent, including the rule that script mixtures are
>inadmissible, but maybe case folding will be allowed.

Then if DNS name lookup software is not updated, only ASCII cases will be
identified, as before, but no other casings, not even for Latin script
letters with diacritical marks. (In retrospect, when facing the full Unicode
set, it might have been better to identify ASCII letter cases.)

>...it doesn't make sense for these rules to be part of a standard on how to
>extend Domain names to use scripts other than Latin: they are much better
>handled as (algorithmic where possible) regulations specified by the authority
>for a given TLD, or set of TLDs, in the  case of the universal TLDs.

Then all confusable problems will be handled at the registry.

>By using this approach, and starting off with a set of rules that disallow most
>forms of script mixes, then where appeals to common sense and the wishes of a
>reasonable number of potential clients suggest a loosening of the rules, this
>can be done with little disruption to the existing state of affairs.

If one uses the method I indicated to define equivalences, then script mixes
can be allowed. If cases are not identified in scripts, then these
equivalences will be between characters of different scripts. Thus, they
should not cut down on manuscript names. (I want to avoid throwing in
general equivalences such as that of casings, as different equivalences can
combine to generate unwanted equivalence chains.)

>The problems for universal TLDs (<.com>,  <.net>) are far more complex, because
>they are required to accept all language scripts.

If all language scripts are already deemed admissible at these levels,
these will be the battleground for confusables. So there might not be a
point in restricting other levels. One should also note that country
codes are not language codes or script codes, and most nations are
multilingual today. It might be controversial to restrict country codes to
just certain scripts.

>c) At this point, the <.com.ru> registrar will need to exercise some common
>sense. For instance, it seems unreasonable that this domain should accept codes
>outside the Latin and Cyrillic code blocks, and if they do, then  mixes should
>be strongly discouraged. Certainly, the use of, say, Hebrew vowel pointing with
>Latin Codes, while perhaps acceptable in Israel TLD, should be unacceptable in
>the Russia TLD. In fact, as a general rule, mixes of diacritics from one code
>block with code points from another, should never be allowed.

So this assumes that there are no Hebrews in Russia. This restriction might
be interpreted politically as those speaking Hebrew in Russia should go to
Israel, at least as far as defining their Internet domain names goes. It
might be wise to avoid this kind of political controversy. :-)

I think one can define a lot of homograph equivalences, which are then used
only for an automated first check when attempting to register a new name.
The cases that fail to register automatically will then be reviewed by a
human. One will then discover if one has defined too many equivalences. It
might be wise to set up a report system, where the public can report
confusable names. Then a committee will have to review those cases, and
decide what to do about them.
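
A minimal sketch of such an automated first check, assuming a small 
equivalence table. EQUIVALENT, skeleton and first_check are 
hypothetical names, and the five Cyrillic entries are only examples 
of what a registry's table might hold.

```python
# Hypothetical equivalence table: each confusable character maps to one
# representative. A real registry table would be far larger.
EQUIVALENT = {
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
}


def skeleton(label):
    """Replace every character of the lowercased label by its representative."""
    return "".join(EQUIVALENT.get(ch, ch) for ch in label.lower())


def first_check(candidate, registered):
    """Return the registered labels that collide with `candidate`.

    An empty list means the name may register automatically; anything
    else would be handed to a human reviewer.
    """
    key = skeleton(candidate)
    return [name for name in registered if skeleton(name) == key]


# "p\u0430yp\u0430l" uses CYRILLIC SMALL LETTER A in place of both "a"s.
print(first_check("p\u0430yp\u0430l", ["paypal", "example"]))  # ['paypal']
```

Mapping every character to a single representative keeps the 
equivalence transitive by construction, which is one way to see 
where long equivalence chains would come from as the table grows.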

(I also like the idea that sites that use a non-ASCII name must register a
parallel ASCII name, for international access: It might be difficult to
exercise proper control over sites if one has to be an expert on
international scripts in order to access them. One easy way for a criminal
to "hide away" a site might otherwise be to give it a strange name.)

  Hans Aberg

Erik van der Poel | 22 Feb 20:14

nameprep2 and the slash homograph issue

All,

In a way, it was pretty sensible for the IETF to decide to avoid 
subsetting Unicode too much so that national registries could make those 
decisions on their own. After all, those countries know more about their 
own characters than the IETF does, and it seems more fair to let them 
make such a decision.

One could see this as an instance of "Push the problem downstream, so 
that we cannot be blamed for being overly restrictive up here".

Now I'm wondering if we could make a similar argument in the slash 
homograph case. If nameprep2 bans the slash homograph, then there is no 
way for any community to use it in a domain name, even if that domain 
name appears in a context where slash means nothing. Consider the email 
case. There are no slashes in the vicinity of a domain name in an email 
app. The URI case is, of course, different. Here you often see slashes, 
so a slash homograph could easily spoof someone.
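
As a rough illustration of what an application-side check might look 
like, here is a sketch that scans a host name for characters imitating 
"/" or ".". DELIMITER_LOOKALIKES and delimiter_lookalikes are 
hypothetical names; U+2044 and U+0702 come up elsewhere in this 
thread, U+2215 is an added assumption, and the list is far from 
complete.

```python
import unicodedata

# Illustrative, incomplete list of characters that can pass for "/" or ".".
DELIMITER_LOOKALIKES = {
    "\u2044": "/",  # FRACTION SLASH
    "\u2215": "/",  # DIVISION SLASH (assumed addition)
    "\u0702": ".",  # SYRIAC SUBLINEAR FULL STOP
}


def delimiter_lookalikes(host):
    """List (index, character name, imitated delimiter) for each hit in `host`."""
    return [
        (i, unicodedata.name(ch), DELIMITER_LOOKALIKES[ch])
        for i, ch in enumerate(host)
        if ch in DELIMITER_LOOKALIKES
    ]


print(delimiter_lookalikes("www.microsoft.com\u2044x.uni.cc"))
# [(17, 'FRACTION SLASH', '/')]
```

Whether a hit leads to a warning or a refusal would be up to the 
application, which fits the "push it downstream" position.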

So, could nameprep2's position be, "Push the slash homograph problem 
downstream, to the app, so that we cannot be blamed for being overly 
restrictive up here"?

Or is the slash fundamentally different from national characters? And if 
so, who are we to make that statement? Shouldn't the countries be 
deciding that? (Not that TLDs can restrict names in 3LDs and up, only 
the apps can address those.)

Another argument against banning the slash homograph is that any new 
banning would require a new ACE prefix, which is a lot of work, and, as 
John said, there should be a high threshold for any demonstration that 
tries to show that a new prefix is necessary.

Instead of banning the slash homograph, nameprep2 could simply warn 
implementors of the spoofing problem, giving some vague advice (without 
overly restricting the apps).

Erik

Adam M. Costello | 22 Feb 00:44

Re: upstream and downstream


I wrote:

> IDNA's special treatment of "." is insufficient to prevent homograph
> attacks against ".".
>
> For example, someone could register a name that looks like
> "foo.bar.com", where the first dot was really U+0702.  This attack
> would be equally effective no matter what larger structure (URI, email
> address, etc) the domain name appeared in.

On second thought, the "." homograph attack is less severe than the "/"
homograph attack.  The former only allows the attacker to spoof names
in the same domain that the attacker is registered in; therefore new
registrants can protect themselves from this attack by registering in a
domain with reasonable admission policies.  The "/" attack, however,
allows the attacker to spoof names in *any* domain, so there's nowhere
registrants can go and be safe from it.

The more severe attack can happen only when domain names are embedded
in larger structures, so a case could be made that each of these larger
structures should create its own recommendations for dealing with spoofs
of its delimiters.

On the other hand, non-technical users might be misled by all sorts of
punctuation, even symbols that don't resemble the true delimiters.

AMC

William Tan | 21 Feb 12:36

Re: IDN spoofing

George W Gerrity wrote:

> For the second-level (or third-level where the top is a country code) 
> domain tag, it should be the legal responsibility of the name 
> authorities for the domain above to ensure that spoofed names cannot 
> be registered (or if registered, all belong to one owner). In the 
> Western world, if that is not already the case, then I'm sure that the 
> first time a spoof of, say Coca-Cola (or Pepsi — let's be even-handed) 
> is registered, then we can be certain that afterwards, the issuing 
> authority will never do it again.

While it is true that TLDs are responsible for preventing the 
registration of spoofs, commercial TLDs that have automated registration 
systems never perform that check. Does registering coca-cola.com prevent 
someone else from getting coca-co1a.com?

> In the case of countries whose law systems are still a bit wild and 
> wooly (The former Soviet Union?), then I suspect that for the time 
> being it will remain ‘Caveat Emptor’. In either case, a domain name 
> holder should be able to license all spoofs for free, in order to 
> limit its exposure to spoofing, whether or not there is adequate legal 
> recourse.

If the TLD operator is careful, there is no need to license spoofs to 
protect one's domain from being spoofed. On the other hand, if the TLD 
does not even perform that check (such as .com), then it is unlikely 
that you get to license all spoofs for free anyway - you have to pay for 
each and every permutation of it.

>
> The point I'm making is that while the authorities for .com.au or 
> .com.ru may do what they like, we can at least give them advice plus 
> some tables that will detect many, if not most, spoofs. In the case 
> where the authority allows (for whatever reason) a name with mixed 
> orthographies, then clearly the first to apply whose signature is not 
> a spoof for an (already well-established) trade-marked name or domain 
> name, should get the license, and all other applicants with a similar 
> name be refused. The name authority should be protected by the laws of 
> the countries in which it operates from being sued for refusing to 
> register confusable names.

This is a fairly interesting proposal, i.e. to use the bundling (see 
draft-klensin-reg-guidelines or RFC 3743) to solve the homograph problem 
at the registry level, provided we can come up with a satisfactory table 
of lookalikes.

As an example, the word "coke" can be represented completely in Cyrillic 
homographs, so one can generate 16 combinations of ASCII and Cyrillic 
characters forming strings that look like "coke". When you register 
"coke.com", the other 15 variants are automatically tied to this domain 
(for free or for a fee). They can be either all activated (put into the 
zone file) or simply blocked from registration.
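
The count can be checked with a short sketch, assuming a one-to-one 
lookalike table for the four letters. CYRILLIC_LOOKALIKE and variants 
are hypothetical names, not part of any registry system.

```python
from itertools import product

# Hypothetical lookalike table: ASCII letter -> Cyrillic homograph.
CYRILLIC_LOOKALIKE = {
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "k": "\u043a",  # CYRILLIC SMALL LETTER KA
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
}


def variants(label, table=CYRILLIC_LOOKALIKE):
    """Return every string obtained by swapping letters for their lookalikes."""
    # Each position offers either one choice (no lookalike) or two.
    choices = [(ch, table[ch]) if ch in table else (ch,) for ch in label]
    return ["".join(combo) for combo in product(*choices)]


bundle = variants("coke")
print(len(bundle))  # 16, including the all-ASCII "coke" itself
```

With two choices per letter, a label with n swappable letters yields 
2**n strings, and more with several lookalikes per letter.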

The good thing about this is that the lookalikes mapping table does not 
have to be set-in-stone at the protocol level, but individual registries 
may choose to implement whatever makes sense for them.

The problem with this is that the number of variants gets out of hand 
pretty quickly, and most registry systems aren't equipped to deal with 
bundles.

wil.

Soobok Lee | 21 Feb 03:31

another homograph attack: BIDI char

javascript:void(window.open(unescape("http://www.microsoft.com%u202e.uni.cc/%u1160%u1160%u1160"),"_self"))

If some IDNA implementation does not handle BIDI filtering/verifying 
well, you can see results similar to the "slash-space combination".
%u202e is a bidi directional formatting character (RLO, right-to-left 
override) and should not be filtered on a char-by-char basis, because the 
char plays a crucial role in Arabic/Hebrew writing. You can refer to the 
stringprep/nameprep documents for details of the BIDI checking part.
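
To see where such a formatter sits in a host name, a small sketch 
using Python's unicodedata module may help. It only reports explicit 
bidi controls and does not implement the stringprep/nameprep bidi 
rules; BIDI_CONTROL_CLASSES and bidi_controls are hypothetical names.

```python
import unicodedata

# Bidirectional classes of the explicit embedding, override and isolate controls.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_controls(host):
    """Return (index, bidi class) for each explicit bidi control in `host`."""
    return [
        (i, unicodedata.bidirectional(ch))
        for i, ch in enumerate(host)
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES
    ]


print(bidi_controls("www.microsoft.com\u202e.uni.cc"))  # [(17, 'RLO')]
```

What to do with a hit (reject, warn, or show the ACE form) is a 
policy decision left to the implementation.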

Good implementations of IDNA would not suffer from the above attack. 
But current MSIE does not support IDNA, while it still allows arbitrary 
UTF-8 chars. So current MSIE is exploitable for malicious phishing 
attempts. I don't know whether this works in Firefox/Mozilla.

The previous example,
javascript:void(window.open(unescape("http://www.microsoft.com%u2044%u1160%u1160%u1160.uni.cc/"),"_self"))
You can replace %u2044 with %u2205, %u3033 etc. I am now searching for more 
slash-like and space-like chars. I will post them here.

Soobok

Erik van der Poel | 20 Feb 18:25

[Fwd: Re: IDN spoofing]

George gave me permission to forward his email to this list.

Erik
From: George W Gerrity <g.gerrity <at> gwg-associates.com.au>
Subject: Re: [idn] IDN spoofing
Date: 2005-02-20 01:42:46 GMT
It has been stated quite clearly that the problem of spoofing TLD tags 
should not exist because the authority for a given TLD can (and should) 
accept only one coding for a TLD tag. However, where the TLD tag is a 
country code, second-level TLD tags can (and should) also be unique 
(eg, .co.uk, .net.au). I don't think that has been stated clearly 
before.

For the second-level (or third-level where the top is a country code) 
domain tag, it should be the legal responsibility of the name 
authorities for the domain above to ensure that spoofed names cannot be 
registered (or if registered, all belong to one owner). In the Western 
world, if that is not already the case, then I'm sure that the first 
time a spoof of, say Coca-Cola (or Pepsi — let's be even-handed) is 
registered, then we can be certain that afterwards, the issuing 
authority will never do it again.

In the case of countries whose law systems are still a bit wild and 
wooly (The former Soviet Union?), then I suspect that for the time 
being it will remain ‘Caveat Emptor’. In either case, a domain name 
holder should be able to license all spoofs for free, in order to limit 
its exposure to spoofing, whether or not there is adequate legal 
recourse.

The point I'm making is that while the authorities for .com.au or 
.com.ru may do what they like, we can at least give them advice plus 
some tables that will detect many, if not most, spoofs. In the case 
where the authority allows (for whatever reason) a name with mixed 
orthographies, then clearly the first to apply whose signature is not a 
spoof for an (already well-established) trade-marked name or domain 
name, should get the license, and all other applicants with a similar 
name be refused. The name authority should be protected by the laws of 
the countries in which it operates from being sued for refusing to 
register confusable names.

Thus, our rôle reduces to providing some automatic methods to help the 
authorities deal with the homograph problem, and we can quit discussing 
the question of how to enforce authorities to adopt sensible naming 
conventions: that ultimately belongs to the realm of law and 
regulation.

George

Soobok Lee | 20 Feb 06:27

space-like unicode char

You can paste this HTML/JavaScript snippet into an HTML file on your 
web server and view it in your MSIE browser.
You will see "www.microsoft.com" isolated in the address bar from the 
"mozilla.org" domain suffix.
Fortunately, you will see a blank page (no phishing page) if you have a 
recent IE patch.
This won't work in Firefox 1.x, which strips off those special chars 
for unknown reasons before sending them to the address bar.

<script>
window.open(unescape("http://www.microsoft.com%u1160%u1160%u1160%u1160%u1160%u1160.mozilla.org/"),"_blank");
</script>

U+1160 is a space-like char, and even stringprep/nameprep does not 
filter it out because the char is not a punctuation character.
U+1160 is just one example, and I guess there may be many other chars 
that can be used as blank substitutes.

U+1160 in the above example is placed in the 3rd-level domain name label,
over which the .org registry cannot impose any regulations.
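
Since the registry cannot police that label, an application would 
have to. A minimal sketch of such a check follows, assuming a small 
hand-picked set of blank-looking code points. BLANK_LOOKING and 
blank_like_labels are hypothetical names, and the set is only a 
starting point.

```python
import unicodedata

# Illustrative, incomplete set of code points that render as blank space
# although they are not classified as space separators.
BLANK_LOOKING = {
    "\u115f",  # HANGUL CHOSEONG FILLER
    "\u1160",  # HANGUL JUNGSEONG FILLER
    "\u3164",  # HANGUL FILLER
    "\uffa0",  # HALFWIDTH HANGUL FILLER
}


def blank_like_labels(host):
    """Return the labels of `host` that contain a blank-looking character."""
    flagged = []
    for label in host.split("."):
        # "Zs" is the Unicode general category for space separators.
        if any(ch in BLANK_LOOKING or unicodedata.category(ch) == "Zs" for ch in label):
            flagged.append(label)
    return flagged


host = "www.microsoft.com" + "\u1160" * 6 + ".mozilla.org"
print(len(blank_like_labels(host)))  # 1: the label that starts with "com"
```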

Soobok Lee

Adam M. Costello | 19 Feb 23:17

quick & dirty (but not too dirty) homograph defense

Here's an idea for a quick-and-dirty enhancement to existing
applications:  Rather than disable IDNA entirely (which is quick but
too dirty), or flag all IDNs (almost as quick but still too dirty),
just flag all IDNs in .com and .net.  This would be significantly less
damaging to IDN deployment (which could proceed unhindered in the other
TLDs, particularly the ccTLDs), but is still extremely simple and could
be rolled out immediately while more sophisticated heuristics are
developed.
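
In code, the rule is about this small. The sketch below assumes the 
application sees the host name either in Unicode or in ACE form; 
FLAGGED_TLDS, is_idn_label and should_flag are hypothetical names.

```python
FLAGGED_TLDS = {"com", "net"}


def is_idn_label(label):
    """True for a label that is non-ASCII or already in ACE ("xn--") form."""
    return label.lower().startswith("xn--") or not label.isascii()


def should_flag(hostname):
    """Flag a host name only if it contains an IDN label under .com or .net."""
    labels = hostname.rstrip(".").split(".")
    if labels[-1].lower() not in FLAGGED_TLDS:
        return False
    return any(is_idn_label(label) for label in labels)


print(should_flag("xn--bcher-kva.com"))  # True
print(should_flag("xn--bcher-kva.de"))   # False
```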

AMC

Doug Ewell | 19 Feb 22:31

Re: [idn] IDN spoofing


Philippe Verdy <vpi92 at yahoo dot fr> wrote:

> The bad thing about your argument is that you are trying to mix
> uppercase and lowercase letters. But for IDN, only lowercase letters
> are really made distinct and encodable, as uppercase letters are case
> folded to lowercase. So let's just concentrate on the set of letters
> that really are distinct in lowercase because this is the form where
> DNS servers will make distinctions in ASCII letters.

Fine, get rid of all the examples that involve uppercase.  The problem
is still there:

iι  pρ  uυ  vν  wω  yγ

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Erik van der Poel | 19 Feb 15:01

RRP and language tags

[Removing unicode <at> unicode.org and adding dam <at> icann.org]

Yesterday, it seemed like some people agreed with me that, in the 
general case, the language tag being passed in the RRP protocol is not 
just naming a character table to filter against, but, more generally, a 
set of rules that the registry applies to determine registerability.

Ignoring for now whether we can decide that there is consensus on this 
issue, I'd like to explore the issues some more, and ask a few 
questions, if that's OK.

First, the language tag is just a name, so in some sense it does not 
really matter *what* it is naming, whether that be a human language, a 
registry's table of characters, or a registry's set of rules. But if we 
can show that ISO 639 is not an adequate namespace, then we may wish to 
consider a new namespace.

One reason that 639 might not be adequate is that some languages are 
used in more than one part of the world, and that usage may differ 
enough from one community to another that we want to assign two separate 
codes instead of a single 639 code. A very common way to go about this 
is to simply add a country code (ISO 3166).

However, currently it seems like the trend is to have a number of sets 
of rules, one at each NIC. The .jp registry has a table and some rules, 
the .de registry has a table, and so on. Some organizations appear to be 
working together to unify their methods. China and Taiwan come to mind. 
This trend might lead one to think that the RRP tag should not be a 
language, but simply the name of a NIC or TLD or pair of TLDs.

But this might be too restrictive. What if some organization or 
individual comes up with a set of rules that is desirable for a number 
of reasons, such as the inclusion of a large number of characters? Some 
registries may wish to allow registrars (and by extension, registrants) 
to specify the use of such a set of rules.

This would argue for maximum flexibility in the RRP tag namespace. I.e. 
it should not be limited to 639 or to 639+3166 or to ccTLD name. It 
could just be a separate namespace, possibly with an IANA registry, 
similar to the numerous other namespaces registered there. Charsets come 
to mind, but there are many others. Such a new namespace would even 
allow for private agreements, which usually have names that start with 
"x-" in other IANA namespaces.

Any thoughts on these ideas?

And now for the questions: Where is this RRP spec? Is it an RFC? Which 
RRP document specifies the language tag we have been talking about?

Do all of the registries use RRP? If not, what do they use? And do they 
have language tags?

Thanks,

Erik

