Adam M. Costello | 1 Dec 2002 01:13

Re: Re: Fwd: Unicode letter ballot

Simon Josefsson <jas <at> extundo.com> wrote:

> Authentication identity "admin", authorization identity U+4711,
> password X.  For the argument, let's say U+4711 decomposes into U+1234
> in Unicode 3.2 but is later changed to U+4321.
>
> The SASL library, acting as a proxy in front of the application
> software, implements the current libstringprep correctly.  It checks
> that admin's password is X and that he is authorized to log in as
> U+1234 (which is the result after stringprep of U+4711, which was sent
> because the client hadn't been updated to use stringprep, which should
> cause no problem) and says OK to the application.
>
> Now, in 1a the application is using updated tables from a more recent
> stringprep that incorporates the fixed decomposition mapping, causing
> it to admit the user to an account U+4321.  This is bad.
>
> In 2a, the application sees that the character is deprecated because
> its decomposition mapping changed, and rejects the user.  This is
> good.

It looks like the security hole in 1a stems from the existence of two
Unicode strings X and Y such that now Stringprep(X) != Stringprep(Y)
(so that two distinct accounts for X and Y can be created), but later
(after the update of the decomposition mappings) Stringprep(X) ==
Stringprep(Y), so the two accounts will get confused.
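
The collision described above can be sketched with a toy stand-in for Stringprep. This is only an illustration: `toy_stringprep`, `collides`, and both tables are hypothetical, and the code points are the made-up ones from the quoted example (U+4711, U+1234, U+4321), not real decomposition data.

```python
# Toy stand-in for Stringprep: it applies only a one-entry decomposition table.
# The code points are the made-up ones from the example, not real mappings.
OLD_TABLE = {"\u4711": "\u1234"}  # hypothetical mapping "in Unicode 3.2"
NEW_TABLE = {"\u4711": "\u4321"}  # hypothetical mapping after the correction


def toy_stringprep(s, table):
    """Replace each character by its mapping in `table`, if it has one."""
    return "".join(table.get(ch, ch) for ch in s)


def collides(x, y, table):
    """True if two account names prepare to the same string under `table`."""
    return toy_stringprep(x, table) == toy_stringprep(y, table)


X, Y = "\u4711", "\u4321"
print(collides(X, Y, OLD_TABLE))  # False: two distinct accounts can be created
print(collides(X, Y, NEW_TABLE))  # True: after the update the accounts are confused
```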

But I think the same phenomenon can happen with 2a.  There are CNS
11643 strings A and B such that now Stringprep(CNS11643toUnicode(A))
!= Stringprep(CNS11643toUnicode(B)) (so that two distinct accounts for
[...]

Martin v. Löwis | 1 Dec 2002 10:54

Re: Re: Fwd: Unicode letter ballot

"Adam M. Costello" <idn.amc+0 <at> nicemice.net.RemoveThisWord> writes:

> An approach that would really avoid this pitfall would be to deprecate
> these characters not only in Unicode, but also in CNS 11643 and any
> other character sets that contain them, and create new characters
> in all these character sets, and leave all the mappings of the old
> deprecated characters unchanged in both the Unicode database and the
> WhateverToUnicode tables.

That, of course, defeats the purpose of compatibility characters in
the first place. It is my understanding that CNS 11643 contains
characters that are really duplicates of each other, in different
planes (of CNS 11643). The duplicates originate from creating the
standard from different sources, and not applying a unification
procedure. When integrating them to Unicode, the unification procedure
found out that they are duplicates, and added the compatibility
characters for round-tripping.

[Somebody please correct me if this report on the history is incorrect]

Now, if changing CNS 11643 was an option, you could have just
deprecated the duplicate characters in that standard.

Regards,
Martin

Simon Josefsson | 1 Dec 2002 16:26

Re: Fwd: Unicode letter ballot

"Adam M. Costello" <idn.amc+0 <at> nicemice.net.RemoveThisWord> writes:

> Simon Josefsson <jas <at> extundo.com> wrote:
>
>> Authentication identity "admin", authorization identity U+4711,
>> password X.  For the argument, let's say U+4711 decomposes into U+1234
>> in Unicode 3.2 but is later changed to U+4321.
>>
>> The SASL library, acting as a proxy in front of the application
>> software, implements the current libstringprep correctly.  It checks
>> that admin's password is X and that he is authorized to log in as
>> U+1234 (which is the result after stringprep of U+4711, which was sent
>> because the client hadn't been updated to use stringprep, which should
>> cause no problem) and says OK to the application.
>>
>> Now, in 1a the application is using updated tables from a more recent
>> stringprep that incorporates the fixed decomposition mapping, causing
>> it to admit the user to an account U+4321.  This is bad.
>>
>> In 2a, the application sees that the character is deprecated because
>> its decomposition mapping changed, and rejects the user.  This is
>> good.
>
> It looks like the security hole in 1a stems from the existence of two
> Unicode strings X and Y such that now Stringprep(X) != Stringprep(Y)
> (so that two distinct accounts for X and Y can be created), but later
> (after the update of the decomposition mappings) Stringprep(X) ==
> Stringprep(Y), so the two accounts will get confused.
>
> But I think the same phenomenon can happen with 2a.  There are CNS
[...]

Kent Karlsson | 2 Dec 2002 10:19

RE: Re: Fwd: Unicode letter ballot


> Yes, you are right.  I was thinking of when changes like the one
> proposed here apply to any character that doesn't happen to be a
> compatibility character.

By "compatibility character" you seem to mean "character that has
a (compatibility or canonical) decomposition".  But there are also
other compatibility characters, among them "characters that should
have had a canonical decomposition, but have no decomposition".

I'm thinking about what Soobok Lee wrote:

> If option A wins in the coming ballot, then I will push forward
> "correct NFC's hangul jamo handling errors!" in Unicode 4.0, because
> NFC backward compatibility promises would be regarded as being broken
> already in those cases of 5 CJK characters. I think "backward
> compatibility promise of NFC" is useful and necessary but proved to be
> premature at least in current version 3.2.  It is proved that
> proofreading takes much time.

The correction mentioned here involves characters that have no
decompositions in Unicode 3.2, although they had compatibility
decompositions in Unicode 2.0 and logically should have canonical
decompositions.  For those not familiar with Hangul (jamo) in Unicode,
these characters are those that stand for a sequence of two or three Hangul
letters. For instance HANGUL CHOSEONG NIEUN-KIYEOK ('nk') should have
a canonical decomposition into HANGUL CHOSEONG NIEUN ('n') followed by
[...]

Mark Davis | 4 Dec 2002 18:14

Fw: BALLOT on Five Canonical Mapping Errors

(Forwarded from another list)

> Patrik, the consortium does take seriously the relationship with the IETF.
> Were it not for that, there probably would not have been a letter ballot in
> the first place -- it would have simply been dealt with in the last meeting.
>
> As the discussion on this list and on
> http://www.imc.org/idn/mail-archive/threads.html attest, this is not a
> simple decision; there are good arguments on both sides.
>
> I myself favor option B -- I think it will cause the least disruption
> overall, especially since those 5 characters are rare, and the mapping
> tables from CNS 11643 to those characters are not widely deployed (and one
> already has to handle cases of variant mapping tables: the JIS characters
> that vary in mappings between different operating systems are a heck of a
> lot more common!)
>
> On the other hand, should the UTC decide to go with option A, the
> NormalizationCorrections.txt file does allow IDN to migrate to future
> versions of Unicode while being strictly backwards compatible. And because
> the mappings are one-way, to characters that themselves do not have
> canonical/compatibility mappings in either case, an implementation can
> *still* make use of a stock version of NFKC or NFC to do mappings. The
> implementation just needs to have an additional preprocessing step, mapping
> those 5 characters according to NormalizationCorrections.txt before applying
> normalization. (We should make this point clear somewhere in the
[...]
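
A minimal sketch of the preprocessing step described above, assuming the five entries quoted in the ballot result elsewhere in this thread and assuming the field order is code point, old mapping, corrected mapping, version. The names `parse_corrections`, `OLD_MAPPINGS`, and `normalize_compat` are hypothetical; only `unicodedata.normalize` is a real library call. The idea is to map each of the five characters to its original (pre-correction) target first, then hand the result to a stock normalizer.

```python
import unicodedata

# The five entries quoted in the ballot result. Assumed field order:
# code point ; old mapping ; corrected mapping ; version # comment
CORRECTION_LINES = """\
2F868;2136A;36FC;4.0.0 # Corrigendum 4
2F874;5F33;5F53;4.0.0 # Corrigendum 4
2F91F;43AB;243AB;4.0.0 # Corrigendum 4
2F95F;7AAE;7AEE;4.0.0 # Corrigendum 4
2F9BF;4D57;45D7;4.0.0 # Corrigendum 4
"""


def parse_corrections(text):
    """Return {character: old target character} for each entry in `text`."""
    table = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code, old, _new, _version = data.split(";")
        table[chr(int(code, 16))] = chr(int(old, 16))
    return table


OLD_MAPPINGS = parse_corrections(CORRECTION_LINES)


def normalize_compat(s, form="NFKC"):
    """Pre-map the five characters to their old targets, then normalize."""
    premapped = "".join(OLD_MAPPINGS.get(ch, ch) for ch in s)
    return unicodedata.normalize(form, premapped)
```

Because the old targets have no decompositions of their own, the stock normalizer leaves them alone, so the output for these five characters does not depend on which Unicode version the normalizer was built from.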

Simon Josefsson | 6 Dec 2002 04:52

Unassigned code points discussion in draft-hoffman-stringprep-07.txt

It has been pointed out that the handling of unassigned code points in
stringprep is unclear, and I agree.  Quoting the document:

,----
| 2. Preparation Overview
|
| The steps for preparing strings are:
|
| 1) Map -- For each character in the input, check if it has a mapping
| and, if so, replace it with its mapping. This is described in section 3.
|
| 2) Normalize -- Possibly normalize the result of step 1 using Unicode
| normalization. This is described in section 4.
|
| 3) Prohibit -- Check for any characters that are not allowed in the
| output. If any are found, return an error. This is described in section
| 5.
|
| 4) Check bidi -- Possibly check for right-to-left characters, and if any
| are found, make sure that the whole string satisfies the requirements
| for bidirectional strings. If the string does not satisfy the requirements
| for bidirectional strings, return an error. This is described in section 6.
|
| The above steps MUST be performed in the order given to comply with this
| specification.
`----

A step for handling unassigned code points would make it clearer:

5) Check unassigned code points -- Possibly check the output for
[...]

Paul Hoffman / IMC | 6 Dec 2002 17:35

Re: Unassigned code points discussion in draft-hoffman-stringprep-07.txt

At 4:52 AM +0100 12/6/02, Simon Josefsson wrote:
>A step for handling unassigned code points would make it clearer:
>
>5) Check unassigned code points -- Possibly check the output for
>    unassigned code points, according to the profile.  This is
>    described in section 7.
>
>A comment on whether this is what was intended or not would be
>appreciated.

You can do it last, you can do it first, or you can check for 
unassigned code points during the prohibit step. It will work the 
same regardless of when you check. The mapping, normalization, and 
bidi steps will never add any unassigned characters, so checking for 
them in any step has the same effect.
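
As a rough illustration of that point, here is a sketch of the four quoted steps with the unassigned check placed either first or last. It uses the table functions from Python's standard `stringprep` module and the Unicode 3.2 database exposed as `unicodedata.ucd_3_2_0`; the function names `prep` and `reject_unassigned`, the `check_unassigned` parameter, and the particular choice of prohibited tables are assumptions made for illustration, not any real profile.

```python
import stringprep
from unicodedata import ucd_3_2_0 as unicodedata_3_2  # Unicode 3.2 data


def reject_unassigned(s):
    """Raise ValueError if `s` contains a code point from table A.1."""
    for ch in s:
        if stringprep.in_table_a1(ch):
            raise ValueError("unassigned code point U+%04X" % ord(ch))


def prep(s, check_unassigned="last"):
    """Hypothetical profile: map, normalize, prohibit, check bidi."""
    if check_unassigned == "first":
        reject_unassigned(s)

    # 1) Map: drop table B.1 characters, apply the table B.2 mapping.
    s = "".join(stringprep.map_table_b2(ch) for ch in s
                if not stringprep.in_table_b1(ch))

    # 2) Normalize with NFKC.
    s = unicodedata_3_2.normalize("NFKC", s)

    # 3) Prohibit: an illustrative selection of the C tables.
    for ch in s:
        if (stringprep.in_table_c12(ch) or stringprep.in_table_c21_c22(ch)
                or stringprep.in_table_c3(ch) or stringprep.in_table_c4(ch)
                or stringprep.in_table_c5(ch) or stringprep.in_table_c6(ch)
                or stringprep.in_table_c7(ch) or stringprep.in_table_c8(ch)
                or stringprep.in_table_c9(ch)):
            raise ValueError("prohibited code point U+%04X" % ord(ch))

    # 4) Check bidi: with any RandALCat character present, no LCat character
    #    is allowed and the string must start and end with RandALCat.
    if any(stringprep.in_table_d1(ch) for ch in s):
        if any(stringprep.in_table_d2(ch) for ch in s):
            raise ValueError("mixed right-to-left and left-to-right text")
        if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
            raise ValueError("bidi string must start and end with RandALCat")

    if check_unassigned == "last":
        reject_unassigned(s)
    return s
```

Calling `prep("a\u0378", check_unassigned="first")` and `prep("a\u0378", check_unassigned="last")` should both raise `ValueError`, since U+0378 is unassigned and none of the other steps touches it.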

>It could be argued that step 3 covers for unassigned code points, but
>prohibited characters and unassigned characters are treated separately
>elsewhere, and the forward reference does not include section 7.  So
>unless it is stated explicitly that case 3 covers for unassigned code
>points too, one will not likely reach that conclusion when reading the
>document.

Given the length of section 7, and the many forward references to it, 
it seems likely that it would be noticed. If we revised the document, 
we could add explicit text saying you can do the check whenever you 
want.

--Paul Hoffman, Director
--Internet Mail Consortium

James Seng | 7 Dec 2002 09:54

Fw: Results of BALLOT on Five Canonical Mapping Errors

fyi

----- Original Message -----
From: "Lisa Moore" <lisam <at> us.ibm.com>
To: "unicore" <unicore <at> unicode.org>
Sent: Saturday, December 07, 2002 6:09 AM
Subject: Results of BALLOT on Five Canonical Mapping Errors

> Folks,
>
> We are closing the ballot - 14 out of 18 members have voted, a majority of
> full members voted for Option A:
>
> A) Fix the canonical mappings and issue another normalization corrigendum
> with corrected mappings in the Unicode 4.0 time frame, as follows:
>
>    Make the following corrections in UnicodeData.txt for 4.0:
>    Correct canonical mapping for 2F868 from 2136A to 36FC.
>    Correct canonical mapping for 2F874 from 5F33 to 5F53.
>    Correct canonical mapping for 2F91F from 43AB to 243AB.
>    Correct canonical mapping for 2F95F from 7AAE to 7AEE.
>    Correct canonical mapping for 2F9BF from 4D57 to 45D7.
>    Add the following entries to NormalizationCorrections.txt:
>    2F868;2136A;36FC;4.0.0 # Corrigendum 4
>    2F874;5F33;5F53;4.0.0 # Corrigendum 4
>    2F91F;43AB;243AB;4.0.0 # Corrigendum 4
>    2F95F;7AAE;7AEE;4.0.0 # Corrigendum 4
>    2F9BF;4D57;45D7;4.0.0 # Corrigendum 4
>
> The vote was:
[...]

Soobok Lee | 7 Dec 2002 14:17

Re: Fw: Results of BALLOT on Five Canonical Mapping Errors


Thanks. I welcome Option A. This is the right decision!

I hope to see that Unicode 4.0 NFC also has a corrected version of
Hangul jamo processing. My concern was that such false mappings would
become too heavily deployed and established to be corrected later,
due to their critical use in IDN identifiers. But today the UTC
decision opened the right way for correction. I wholeheartedly
appreciate their honest efforts on this.

Then, what will be done about Stringprep? Will it follow the
future Unicode 4.0? Of course, it will take some time for the authors
to give the answer. So, I won't prompt them today... :-)

Soobok Lee

On Sat, Dec 07, 2002 at 04:54:17PM +0800, James Seng wrote:
> fyi
> 
> ----- Original Message -----
> From: "Lisa Moore" <lisam <at> us.ibm.com>
> To: "unicore" <unicore <at> unicode.org>
> Sent: Saturday, December 07, 2002 6:09 AM
> Subject: Results of BALLOT on Five Canonical Mapping Errors
> 
> 
> > Folks,
> >
> > We are closing the ballot - 14 out of 18 members have voted, a majority of
> > full members voted for Option A:
[...]

Kent Karlsson | 7 Dec 2002 15:31

RE: Fw: Results of BALLOT on Five Canonical Mapping Errors


Soobok Lee:
> Thanks. I welcome Option A. This is the right decision!
> 
> I hope to see that Unicode 4.0 NFC also has a corrected version of
> Hangul jamo processing. My concern was that such false mappings would
> become too heavily deployed and established to be corrected later,
> due to their critical use in IDN identifiers. But today the UTC
> decision opened the right way for correction. I wholeheartedly
> appreciate their honest efforts on this.
> 
> Then, what will be done about Stringprep? Will it follow the
> future Unicode 4.0? Of course, it will take some time for the authors
> to give the answer. So, I won't prompt them today... :-)

Since IDNs have not yet been deployed, there is still a VERY SMALL
window of opportunity to fix this for IDNs.  Once deployed, it will
be much harder to fix this for IDNs, if at all possible.

	/Kent Karlsson

> Soobok Lee

