An implementer's lament
2012-11-28 03:28:44 GMT
http://annevankesteren.nl/2012/11/idna-hell
I am loath to return to the debates of the 2008-2010 period, but the strong utility of canonical forms that are unambiguous as to identity (i.e., between A-label and U-label) should not be underestimated. Mapping has the unfortunate side effect of making things "equivalent" when they are not in fact identical. I think many who were in favor of the IDNA2008 formulation were persuaded that this powerful feature was worth some breakage with regard to backward compatibility. It is obvious that there is a value judgment here and people's opinions varied.
vint
_______________________________________________ Idna-update mailing list Idna-update <at> alvestrand.no http://www.alvestrand.no/mailman/listinfo/idna-update
On Nov 16, 2012, at 6:09 AM, John C Klensin <klensin <at> jck.com> wrote:
> I don't want to drag this out, but even that change implies that
> we dismissed the "backward compatibility" issues as unimportant.
> That wasn't the case.
I am someone who, often vocally, disagreed with the way IDNA2008 went with respect to backward compatibility. Having said that, I think Mark's characterization of the people who were promoting IDNA2008 as "people who did not feel that it was an important concern" is simply wrong. The long discussions about backward compatibility on the mailing list very much showed that the authors were concerned about it and were willing to incorporate changes for backward compatibility that had WG consensus (of which I was often on the wrong side).
We have IDNA2003 and IDNA2008 in deployment, both partially. We knew that this would happen, we talked about it, and we did IDNA2008 anyway. Name-calling at this point is not helpful to developers and end users of the two protocols.
--Paul Hoffman
On Wed, Nov 14, 2012 at 3:23 PM, JFC Morfin <jefsey <at> jefsey.com> wrote:
> May I remind that all what IUsers need is the ability to "filter out the
> filters" as an option, i.e. that the browser transparently transmit the user
> entry. The reason why is that I should get to the same place whatever the
> application or the browser on my machine, and for that I do prefer to use
> the same IDNEngine for my browsers and applications. IDNEngines are partly
> documented in RFC 5895 (partly) because IDNA2008 does not support some key
> features (like majuscules).
What is an "IUser"? Also, what other than "a" (U+0061) would "A" (U+FF21) map to? Host names have been case-insensitive from the start; the Turkish I is not going to change that.
Also, the focus on end users over the stability of URLs found in markup and elsewhere feels like a distraction. Most users, for better or worse, use a search engine these days to get to a particular domain. They no longer enter addresses in the address bar.
--
http://annevankesteren.nl/
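To make the U+FF21 question concrete, here is a minimal sketch of the kind of pre-lookup mapping that RFC 5895 describes: lowercase first, then replace fullwidth and halfwidth characters by their decomposition, then normalize to NFC. The function name `map_user_input` is hypothetical, and the exact step list is an assumption based on a simplified reading of that RFC (dot mapping and everything else is left out); it is not a conformant implementation.

```python
import unicodedata


def map_user_input(text: str) -> str:
    """Hypothetical helper: simplified RFC 5895-style mapping of user input."""
    # Step 1 (assumed): map uppercase characters to lowercase.
    lowered = text.lower()

    # Step 2 (assumed): replace fullwidth/halfwidth characters by their
    # compatibility decomposition, as listed in the Unicode character database.
    mapped = []
    for ch in lowered:
        decomp = unicodedata.decomposition(ch)
        if decomp.startswith(("<wide>", "<narrow>")):
            code_points = decomp.split()[1:]
            mapped.append("".join(chr(int(cp, 16)) for cp in code_points))
        else:
            mapped.append(ch)

    # Step 3 (assumed): normalize the result to NFC.
    return unicodedata.normalize("NFC", "".join(mapped))


if __name__ == "__main__":
    # U+FF21 is FULLWIDTH LATIN CAPITAL LETTER A.
    print(repr(map_user_input("\uFF21")))
```

Under these assumptions U+FF21 ends up as U+0061, which matches the expectation voiced in the message above.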
Dear all,
I had this small discussion with Mark and Markus and, despite our threefold homonymy, we couldn't get to common ground, so I've decided to get a reading of the standard directly from the IDNA2008 editors.
According to my interpretation (cf. RFC 5891, Section 5), the lookup protocol relies on the assumption that names that are already present in the DNS are valid. And, in fact, I have a bunch of domains in my database with hyphens in the third and fourth position, so-called reserved LDH labels that are not XN-labels (see the nomenclature in RFC 5890, Section 2.3.2.1). Take for instance "ad--acta.de". My expectation would be that the protocol doesn't fail on those*.
Mark, however, reminded me of the restrictions in RFC 5891, Section 4.2.3.1. But those are for registration, which I am not interested in at the moment. If at all relevant, we'd have Section 5.4:
"Putative U-labels with any of the following characteristics MUST be rejected prior to DNS lookup: [...] o Labels containing '--' (two consecutive hyphens) in the third and fourth character positions."
On my side, I claim that this restriction simply does not apply, because "ad--acta.de" is not a "putative U-label"; in fact, it is no U-label at all (cf. the U-label definition in RFC 5890, Section 2.3.2.1). Thus, the protocol should never fail on lookup for "ad--acta.de". Is that correct?
Best regards,
Marcos
* FWIW idnkit-2.2 works according to my expectations, ICU does not.
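The label categories referred to in this message can be sketched as a small classifier. This is only an illustration of the RFC 5890, Section 2.3.2.1 terminology as used above; the function name `classify_label` and the returned strings are hypothetical, and the sketch says nothing about whether a lookup implementation should accept or reject a given label.

```python
import re

# An LDH label: ASCII letters, digits and hyphens, 1 to 63 characters,
# with no hyphen at the start or at the end.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Hypothetical helper: name the RFC 5890 category of a single label."""
    if LDH_LABEL.fullmatch(label) is None:
        return "not an LDH label"
    # "--" in the third and fourth positions marks a reserved LDH label.
    if label[2:4] == "--":
        if label[:2].lower() == "xn":
            return "XN-label"
        return "reserved LDH label (not an XN-label)"
    return "non-reserved LDH label"


if __name__ == "__main__":
    for label in ("ad--acta", "xn--bcher-kva", "example"):
        print(label, "->", classify_label(label))
```

Here "ad--acta" falls into the reserved-but-not-XN category, which is exactly the case the question is about.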
The following errata report has been submitted for RFC5892, "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)".
--------------------------------------
You may review the report below and at:
http://www.rfc-editor.org/errata_search.php?rfc=5892&eid=3312
--------------------------------------
Type: Editorial
Reported by: Patrik Fältström <paf <at> netnod.se>
Section: A and A.1
Original Text
-------------
In A:
Code point: The code point, or code points, to which this rule is to be applied. Normally, this implies that if any of the code points in a label is as defined, then the rules should be applied. If evaluated to True, the code point is OK as used; if evaluated to False, it is not OK.
In A.1:
Rule Set:
False;
If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
(Joining_Type:T)*(Joining_Type:{R,D})) Then True;
Corrected Text
--------------
In A:
Code point: The code point, or code points, to which this rule is to be applied. Normally, this implies that if any of the code points in a label is as defined, then the rules should be applied. If evaluated to True, the code point is OK as used; if evaluated to False, it is not OK. For the rule to be evaluated to True for the label, it MUST be evaluated to True for every occurrence of Code point in the label.
In A.1:
Rule Set:
False;
If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
If cp .eq. \u200C And RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*cp
(Joining_Type:T)*(Joining_Type:{R,D})) Then True;
Notes
-----
The original text did not make it clear whether the actual rule is to be applied once for every occurrence of the code point in the label. This is a regular expression that can be interpreted in multiple ways, plus it does not take into account the case where more than one U+200C exists in a label.
Instructions:
-------------
This errata is currently posted as "Reported". If necessary, please use "Reply All" to discuss whether it should be verified or rejected. When a decision is reached, the verifying party (IESG) can log in to change the status and edit the report, if necessary.
--------------------------------------
RFC5892 (draft-ietf-idnabis-tables-09)
--------------------------------------
Title : The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)
Publication Date : August 2010
Author(s) : P. Faltstrom, Ed.
Category : PROPOSED STANDARD
Source : Internationalized Domain Names in Applications (Revised)
Area : Applications
Stream : IETF
Verifying Party : IESG
Hi,
RFC5892 contains the following rule about the contextual validity of U+200C:
> If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
> (Joining_Type:T)*(Joining_Type:{R,D})) Then True;
By intuition, I understand that "\u200C" within the regex means the code
point in question. So, a feasible interpretation would be:
(*) The code point MUST occur between Joining_Type:{L,D} and
Joining_Type:{R,D}, where arbitrary occurrences of Joining_Type:T MAY be
in between.
On the other hand, the statement literally defines just a regex that
should match the string somewhere (with no reference to "cp" as in other
rules), such that the rule would be satisfied already if any U+200C
fulfills the requirement.
The literal interpretation sounds stupid, but I found both variants
within IDNA2008 implementations.
For instance, consider the Perl module Net::IDN::UTS46 on CPAN. Here,
it's taken literally and hence the sequence
U+0628 U+200C U+0627 U+200C U+0627
is considered to be valid, although U+0627 is Joining_Type:R and thus
the second U+200C doesn't meet the requirement (*).
On the other hand, the (probably more reliable) implementation idnkit-2
from the Japan Registry reports a CONTEXTJ rule violation for the same
string. Now, who is right?
regards, Sebastian
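The two readings described in this message can be compared side by side in a short sketch. The function names are hypothetical, and the Joining_Type table is a hand-written assumption covering only the two Arabic letters in the example (U+0628 as D, U+0627 as R); a real implementation would take these values from the Unicode data files. The per-occurrence reading corresponds to requirement (*); the "literal" reading treats the regular expression as satisfied if it matches anywhere in the label.

```python
import unicodedata

ZWNJ = "\u200C"
VIRAMA_CCC = 9  # Canonical_Combining_Class value named Virama

# Assumption: minimal Joining_Type table, just enough for the example label.
JOINING_TYPE = {"\u0628": "D", "\u0627": "R"}


def joining_type(ch: str) -> str:
    # Anything not in the toy table is treated as non-joining ("U").
    return JOINING_TYPE.get(ch, "U")


def virama_before(label: str, i: int) -> bool:
    return i > 0 and unicodedata.combining(label[i - 1]) == VIRAMA_CCC


def joining_context_at(label: str, i: int) -> bool:
    """True if {L,D} T* <position i> T* {R,D} holds around index i."""
    left = i - 1
    while left >= 0 and joining_type(label[left]) == "T":
        left -= 1
    if left < 0 or joining_type(label[left]) not in ("L", "D"):
        return False
    right = i + 1
    while right < len(label) and joining_type(label[right]) == "T":
        right += 1
    return right < len(label) and joining_type(label[right]) in ("R", "D")


def valid_per_occurrence(label: str) -> bool:
    """Reading (*): every U+200C must satisfy the rule at its own position."""
    return all(
        virama_before(label, i) or joining_context_at(label, i)
        for i, ch in enumerate(label) if ch == ZWNJ
    )


def valid_literal(label: str) -> bool:
    """Literal reading: the regex only needs to match somewhere in the label."""
    positions = [i for i, ch in enumerate(label) if ch == ZWNJ]
    matches_somewhere = any(joining_context_at(label, i) for i in positions)
    return all(virama_before(label, i) or matches_somewhere for i in positions)


if __name__ == "__main__":
    label = "\u0628\u200C\u0627\u200C\u0627"
    print("per occurrence:", valid_per_occurrence(label))
    print("literal:", valid_literal(label))
```

For the sequence U+0628 U+200C U+0627 U+200C U+0627 the two functions disagree, which mirrors the difference between the two implementations mentioned above.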
Greetings all,
It might be a little bit odd to ask this question at the moment; I know it is a little bit late, and I tried my best to search for it. What is the main reason for not supporting Unicode directly in the DNS protocol, instead of using the current hack-and-slash way to solve this issue?
I tried to think through the possible reasons below, but none of them holds up, because for each one I found another scenario that contradicts it:
1)
It is hard to update the Internet; the old legacy machines will be impossible to maintain:
Well, considering the current support for IPv6 RRs, DNSSEC RRs, … and other records within the protocol, I don’t think it is hard to use Unicode as the base encoding in DNS servers.
2)
It is bad to increase the size of the zone, which will slow down caching and increase paging, which in turn will cause slow responses:
Again, with ICANN allowing the new gTLDs and supporting DNSSEC (a big chunk of hashes), these things have already increased the size.
3)
On the technical side, it is hard to maintain support for other languages within the zone; it is hard to work with them without understanding them:
Aren’t IDNAs considered to be hashes?
I am looking forward to answers!
AbdulRahman,
Greetings again. The websec WG is in WG Last Call for draft-ietf-websec-strict-transport-sec, an interesting document that is likely to be widely deployed in web browsers and servers. There are a few places in the document that touch (slam?) into IDNA 2003 and IDNA 2008, so I thought this list should pay attention now rather than later. In specific, please see section 8 (just the beginning), section 9, and section 13. The WG LC ends April 9. Please send any comments to the websec WG mailing list, not here. Thanks! --Paul Hoffman
At 03:13 12/03/2012, Shawn Steele wrote:
>What kinds of applications are expected to consume this
>data? What's the target?
Shawn,
The browsers want to use them. To validate the IDNs. This is why we need to get them synced.
jfc