Mark.Andrews | 1 Jun 02:17

Re: I-D ACTION:draft-ietf-idn-idna-08.txt


	Below is the wire to presentation table from RFC 1034 / 1035
	for label octets.  BIND also escapes '"' but that is not
	strictly required.

     00 \000  01 \001  02 \002  03 \003  04 \004  05 \005  06 \006  07 \007
     08 \008  09 \009  0a \010  0b \011  0c \012  0d \013  0e \014  0f \015
     10 \016  11 \017  12 \018  13 \019  14 \020  15 \021  16 \022  17 \023
     18 \024  19 \025  1a \026  1b \027  1c \028  1d \029  1e \030  1f \031
     20 \032  21  !    22  "    23  #    24 \$    25  %    26  &    27  '
     28 \(    29 \)    2a  *    2b  +    2c  ,    2d  -    2e \.    2f  /
     30  0    31  1    32  2    33  3    34  4    35  5    36  6    37  7
     38  8    39  9    3a  :    3b \;    3c  <    3d  =    3e  >    3f  ?
     40 \@    41  A    42  B    43  C    44  D    45  E    46  F    47  G
     48  H    49  I    4a  J    4b  K    4c  L    4d  M    4e  N    4f  O
     50  P    51  Q    52  R    53  S    54  T    55  U    56  V    57  W
     58  X    59  Y    5a  Z    5b  [    5c \\    5d  ]    5e  ^    5f  _
     60  `    61  a    62  b    63  c    64  d    65  e    66  f    67  g
     68  h    69  i    6a  j    6b  k    6c  l    6d  m    6e  n    6f  o
     70  p    71  q    72  r    73  s    74  t    75  u    76  v    77  w
     78  x    79  y    7a  z    7b  {    7c  |    7d  }    7e  ~    7f \127
     80 \128  81 \129  82 \130  83 \131  84 \132  85 \133  86 \134  87 \135
     88 \136  89 \137  8a \138  8b \139  8c \140  8d \141  8e \142  8f \143
     90 \144  91 \145  92 \146  93 \147  94 \148  95 \149  96 \150  97 \151
     98 \152  99 \153  9a \154  9b \155  9c \156  9d \157  9e \158  9f \159
     a0 \160  a1 \161  a2 \162  a3 \163  a4 \164  a5 \165  a6 \166  a7 \167
     a8 \168  a9 \169  aa \170  ab \171  ac \172  ad \173  ae \174  af \175
     b0 \176  b1 \177  b2 \178  b3 \179  b4 \180  b5 \181  b6 \182  b7 \183
     b8 \184  b9 \185  ba \186  bb \187  bc \188  bd \189  be \190  bf \191
     c0 \192  c1 \193  c2 \194  c3 \195  c4 \196  c5 \197  c6 \198  c7 \199
(Continue reading)
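
The visible rows of the table can be sketched as a small conversion routine. This is only an illustration, not BIND's code: the function name is made up, the set of backslash-escaped characters is read off the rows shown above, and it assumes the rows cut off after c7 continue with \DDD decimal escapes in the same way.

```python
# Octets the table shows with a single backslash: $ ( ) . ; @ \
# (read off the visible rows only; '"' is left alone, as in the table)
BACKSLASH_ESCAPED = frozenset(b"$().;@\\")


def label_to_presentation(label: bytes) -> str:
    """Convert one wire-format label to presentation format."""
    out = []
    for octet in label:
        if octet in BACKSLASH_ESCAPED:
            # Printable but special in master files: prefix a backslash.
            out.append("\\" + chr(octet))
        elif octet <= 0x20 or octet >= 0x7F:
            # Control, space, DEL and 8-bit octets: three decimal digits.
            out.append(f"\\{octet:03d}")
        else:
            out.append(chr(octet))
    return "".join(out)
```

For example, the two octets c3 a5 (UTF-8 for "å") come out as \195\165, which is why a UTF-8 label is legal in a zone file but unreadable to a human.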

Adam M. Costello | 1 Jun 06:52

Re: I-D ACTION:draft-ietf-idn-idna-08.txt

Dan Oscarsson <Dan.Oscarsson <at> trab.se> wrote:

> Why is it that IETF can change the definition of, for example, what
> characters are allowed in host names without an identifier of some
> kind,

That's not what we're doing.  In data intended for machine consumption
(protocol messages, function arguments, machine-readable file formats,
etc) we are insisting that domain names continue to be ASCII-only;
non-ASCII domain names may appear only where they are explicitly invited
by new protocols/interfaces/formats (or new versions of old ones).

In data intended for human consumption (like email message bodies and
user interfaces) we (IDNA) are being more lax, and encouraging domain
names to use the same charset that is used for all text in that context.
The rationale is that compared to machines, humans are better able to
cope with change, and much less tolerant of unintelligible garbage.

(There is a case to be made against this laxity, and Eric has made it.)

> For example, the URI definition has changed, making those applications
> following the original specification reject what later revisions see
> as valid URIs.

That's fine.  If the only difference between the old standard and the
new standard is that old software rejects things that new software
accepts, that's the best you can hope for.

What's bad is if old software and new software both accept the same
things, but behave differently.  That's the situation we're afraid of
(Continue reading)

Dan Oscarsson | 2 Jun 10:11

Re: I-D ACTION:draft-ietf-idn-idna-08.txt

Adam M. Costello wrote:

>That's not what we're doing.  In data intended for machine consumption
>(protocol messages, function arguments, machine-readable file formats,
>etc) we are insisting that domain names continue to be ASCII-only;
>non-ASCII domain names may appear only where they are explicitly invited
>by new protocols/interfaces/formats (or new versions of old ones).
>
>What's bad is if old software and new software both accept the same
>things, but behave differently.  That's the situation we're afraid of
>with 8-bit domain names in DNS.  Existing DNS servers already accept
>8-bit names in queries.  If we were to declare that such queries must
>now be handled differently, we'd create interoperability problems.

But domain names in use today are not ASCII-only. I know of at least
two DNS servers serving names using UTF-8 (Microsoft's and .NU-bind).
IDNA will change how things are handled. Applications that previously
sent UTF-8 will now send ACE-names, breaking what worked before.
This creates interoperability problems.
It will force changes in DNS servers, despite IDNA saying that
no changes are needed in DNS servers. The .NU-bind and Microsoft
servers will have to be changed so they can recognise both the
old native UTF-8 names and the new ACE-names, for the same name.

So you see, IDNA will break the current handling of non-ASCII
names in DNS. As things will break anyway, why not standardise the
usage of non-ASCII in domain names? This would result in
some servers breaking, but would give better stability for the future.

   Dan
(Continue reading)

Dan Oscarsson | 2 Jun 10:58

Using a new class for IDN

John C Klensin published the idea of using a new class in DNS
for IDN. As I now feel that there is no way to stop IDNA, I have
looked at how to introduce native handling of non-ASCII in DNS
in an easy way. While it can be done with EDNS and a new label type,
I think a cleaner, and probably easier to implement, way is
to use a new class. Using a new class you get:
- a simple flag telling the DNS server that all text data (both
  labels and other) are in a well defined encoding of UCS.
- clients get an error response if the server does not support the new
  class, telling the client to retry using the old IN class.
- Basic support in DNS can be added by just defining all records
  in IN and new class. When in IN, use ACE encoded labels.
  When in new class, use UCS. A more efficient implementation can
  be added later.
- A clean break from the old DNS world.
- Compared to IDNA you get the same native encoding of
  characters everywhere (all labels get internationalised, and
  text fields like those in HINFO and TXT get well defined) and
  case can be preserved in responses.
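
The retry behaviour in the second point could look roughly like the sketch below. Everything in it is hypothetical: the class name "UCS", the ClassNotSupported exception and the send_query callable are placeholders, since no such class is assigned. The ACE fallback uses Python's "idna" codec, which implements the later RFC 3490 encoding rather than the draft under discussion.

```python
class ClassNotSupported(Exception):
    """Hypothetical signal that the server rejected the new class."""


NEW_CLASS = "UCS"  # placeholder name; no such DNS class is assigned


def lookup(name, send_query):
    """Try the new class with native UTF-8, then fall back to IN with ACE.

    send_query(qname_octets, qclass) is a caller-supplied, hypothetical
    transport function that raises ClassNotSupported on an error response.
    """
    try:
        return send_query(name.encode("utf-8"), NEW_CLASS)
    except ClassNotSupported:
        # Old server: retry in class IN using the ACE form of the name.
        return send_query(name.encode("idna"), "IN")
```

The point of the sketch is that the failure is explicit: an old server never sees UTF-8 octets in class IN, so it cannot misinterpret them.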

I could write a draft for this, if some of you would support it.

IDNA, as it is today, complicates the above approach in at least two
ways:
- The number of characters that can fit into 63 octets differs
  between ACE-names and native UCS-names.
  The above solution is very simple if we can require all
  names to fit in 63 octets. Then no new label types are needed
  and the DNS server uses exactly the same records in both
  classes; only the encoding of labels differs.
(Continue reading)
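
The difference in octet counts is easy to see for a single label. The sketch below is an illustration only: the function name is made up, UTF-8 is assumed as the native UCS encoding, and Python's "idna" codec (RFC 3490, the later standardised form) stands in for the ACE encoding of the draft.

```python
MAX_LABEL_OCTETS = 63  # label length limit from RFC 1035


def label_octets(label: str) -> tuple[int, int]:
    """Return (octets as UTF-8, octets as ACE) for one label."""
    utf8_len = len(label.encode("utf-8"))
    ace_len = len(label.encode("idna"))
    return utf8_len, ace_len


utf8_len, ace_len = label_octets("bücher")
print(utf8_len, ace_len)  # 7 13
```

A label can therefore fit the 63-octet limit in one encoding and not in the other, which is what makes sharing the same records between the two classes awkward.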

Adam M. Costello | 2 Jun 11:02

Re: I-D ACTION:draft-ietf-idn-idna-08.txt

Dan Oscarsson <Dan.Oscarsson <at> trab.se> wrote:

> IDNA will change how things are handled.  Applications that before sent
> UTF-8 will now send ACE-names breaking what worked before.

Applications that started using proprietary IDN techniques before
anything was standardized knew the risk they were taking.

> The .NU-bind and Microsoft servers will have to be changed so they can
> both recognise the old native UTF-8 names and the new ACE-names, for
> the same name.

You don't need to change the servers, you merely need to add the ACE
forms to the zone files.  One could write a perl script to do that.
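
A rough sketch of such a script, in Python rather than perl. It is only an illustration under several assumptions: the function name is made up, the zone file is assumed to be UTF-8 text with the owner name as the first field of the line, and Python's "idna" codec (RFC 3490) stands in for the ACE encoding, whose prefix may differ from the draft's.

```python
from typing import Optional


def ace_twin(line: str) -> Optional[str]:
    """Return a copy of a zone-file line with its owner name in ACE form.

    Returns None when there is nothing to add: comments, lines that inherit
    the previous owner (leading whitespace), or owners already in ASCII.
    """
    if not line or line[0].isspace() or line.startswith(";"):
        return None
    parts = line.split(None, 1)
    if len(parts) < 2:
        return None
    owner, rest = parts
    if owner.isascii():
        return None
    # Only the owner name is converted; names inside RDATA are left alone.
    return owner.encode("idna").decode("ascii") + "\t" + rest


print(ace_twin("bücher.example.\tIN\tA\t192.0.2.1"))
```

The ACE line would be written next to the original UTF-8 line, so both forms of the name resolve to the same data.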

> As things will break, why not standardise the usage of non-ASCII in
> domain names.

I have no problem with that, but as long as you're willing to break
things, why not segregate the non-ASCII requests using EDNS or a new
class, so that things break in a predictable fashion rather than getting
randomly misinterpreted?
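
To illustrate what "randomly misinterpreted" can mean: the same 8-bit octets in a query give different names depending on which charset the receiving software assumes. A minimal sketch, with a made-up function name and Latin-1 chosen only as an example of a second charset:

```python
def interpretations(octets: bytes) -> dict:
    """Decode the same label octets under two assumed charsets."""
    return {
        "utf-8": octets.decode("utf-8"),
        "latin-1": octets.decode("latin-1"),
    }


wire = "å".encode("utf-8")    # the two octets c3 a5
print(interpretations(wire))  # {'utf-8': 'å', 'latin-1': 'Ã¥'}
```

Neither side gets an error; they simply disagree about the name.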

AMC

simon+idn | 2 Jun 14:44

Re: :Re: Last Call: Preparation of Internationalized Strings

[Resending with different From: address.]

Patrik Fältström <paf <at> cisco.com> writes:

> --On 2002-05-30 12.16 +0200 Simon Josefsson <simon+idn <at> josefsson.org> wrote:
>
>> This is interesting -- has the Unicode consortium promised to always
>> update the CK normalization tables in a forward compatible way?
>
> Yes.

The reference for that statement seems to be (correct me if I'm wrong)
http://www.unicode.org/unicode/standard/policies.html:

,----
| Normalization. Once a character is encoded, its canonical combining
| class and decomposition mapping will not be changed in a way that will
| destabilize normalization.
`----

Which looks good.  However, reading on:

,----
| The decomposition mapping may only be changed at all in the following
| _exceptional_ set of circumstances:
| 
| + when there is a clear and evident mistake identified in the Unicode
|   Character Database (such as a typographic mistake), and
| 
| + when that error constitutes a clear violation of the identity
(Continue reading)

simon+idn | 2 Jun 14:45

Re: :Re: Last Call: Preparation of Internationalized Strings

[Resending with different From: address.]

Patrik Fältström <paf <at> cisco.com> writes:

> --On 2002-05-31 00.26 +0900 Soobok Lee <lsb <at> postel.co.kr> wrote:
>
>> These issues were raised at the time of IDN WG last call,  3 months ago.
>
> And my answer was exactly the same then as now. We can not do better. If
> Unicode is updated, you will either get inconsistency with the parties
> using the new version of Unicode (if IDN is not upgraded) or inconsistency
> with old registered names (if IDN is upgraded).
>
> The point is that the current scheme (a) trusts Unicode when they say that
> changes will be backward compatible and (b) IF they do make an incompatible
> change, we can at that point in time make the decision -- we don't have to
> decide now what path we take.

Yes.  I think it could be useful to have this discussion in the
specifications, so it is stored in the collective memory.  I am not
sure everyone will remember or understand all subtle problems that
will follow from changes in normalization mapping tables in a few
years (I know I won't).  Is the following reasonable?

,----
| If or when this specification is updated to use a more recent Unicode
| normalization table, the new normalization table must be compared with
| the old to spot backwards incompatible changes.  If there are such
| changes, they must be handled somehow, or there will be security as
| well as operational implications.  Methods to handle the conflicts
(Continue reading)
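
The comparison of an old and a new normalization table could be sketched as follows. This is an illustration only: the function name is made up, and it relies on CPython shipping a frozen Unicode 3.2.0 database as unicodedata.ucd_3_2_0 next to its current one, so "new" here means whatever Unicode version the running interpreter carries.

```python
import unicodedata

OLD = unicodedata.ucd_3_2_0  # Unicode 3.2.0 data bundled with CPython


def nfkc_changes(code_points):
    """List (code point, old NFKC, new NFKC) where the two tables disagree.

    Only characters already assigned in the old version are compared, since
    newly added characters cannot break existing names.
    """
    changes = []
    for cp in code_points:
        ch = chr(cp)
        if OLD.category(ch) in ("Cn", "Cs"):  # unassigned or surrogate
            continue
        before = OLD.normalize("NFKC", ch)
        after = unicodedata.normalize("NFKC", ch)
        if before != after:
            changes.append((cp, before, after))
    return changes


for cp, before, after in nfkc_changes(range(0x110000)):
    print(f"U+{cp:04X}: {before!r} -> {after!r}")
```

Any line printed is a character whose normalized form differs between the two versions, which is the kind of backwards incompatible change that would have to be handled somehow.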

Patrik Fältström | 2 Jun 16:04

Re: Re: :Re: Last Call: Preparation of Internationalized Strings

--On 2002-06-02 14.45 +0200 simon+idn <at> josefsson.org wrote:

> Is the following reasonable?

I think so personally, but the way a last call works (and the way I want
it to work, as one of the editors of the documents) is that during the
last call period we collect issues like this one, and _then_ we editors
summarize them and, together with the AD and WG chair, raise a list of issues.

I.e. the whole last call period is a "collection of issues" period. If we
start updating the documents now, then comments will be based on multiple
versions of the document -- which doesn't make things easier.

So, I have added this issue to my list of things.

   paf

Patrik Fältström | 2 Jun 16:01

Re: Re: :Re: Last Call: Preparation of Internationalized Strings

--On 2002-06-02 14.44 +0200 simon+idn <at> josefsson.org wrote:

> So it appears as if the statement isn't strictly true?

This is also true. I am discussing the issue of 3.1 versus 3.2 with
officers of the Unicode Consortium as we speak. I have requested some
conclusion no later than Friday this coming week, which I will report back
to this wg (unless they send it to the mailing list directly).

  paf

Doug Ewell | 2 Jun 17:27

Re: Last Call: Preparation of Internationalized Strings

Simon,

There have been two corrections to normalization since Unicode 3.0.  One
involved a Chinese (Han) compatibility character that was mapped to the
wrong "normal" character by error.  The other involved a Yiddish
(Hebrew) compatibility character that should have had a compatibility
mapping, but did not, also by error.

Both corrections were made to characters that are supposedly "very rare"
in actual use, so that the real-world impact would be minimal.  Neither
one has anything to do with transcoding tables.

I know you are very concerned that Unicode has "broken its promise" by
making changes to the normalization tables after claiming they would not
do so.  I think if the corrections had not been made, there would have
been an equal but opposite reaction that Unicode was too stubborn to
correct its own mistakes, and that NFKC was rendered "useless" because
of these two incorrect mappings.

The pages explaining the corrigenda include lengthy, detailed
explanations of why the Technical Committee felt they were necessary and
justified.  As someone already mentioned, one of the justifications
given for the Yiddish change was that no normative references existed
*yet* for the Unicode normalization tables (i.e. from IDN).  This
implies that once such normative references *do* exist, a similar
decision to correct an error might not be made.

I imagine these were very difficult decisions for the UTC, who knew that
someone would jump on the changes immediately as evidence that
normalization is inherently unstable and Unicode is therefore "not
(Continue reading)
