James Seng | 1 Jul 02:56

Re: Problems in normalisation and matching

Hi Dan,

I remember the "dot issues" were extensively discussed by the Nameprep Design
Team. It was decided that dots (other than U+002E) should be included because
there are IMEs which generate these dots in place of the normal dots (it
becomes a hassle to switch in and out of the IME just for the dot). Now, some
may say IME is out of scope, but on the other hand, we really don't need to
rehash a topic which has been concluded. Let's move forward.

If we can agree on the above, then the "many problems" you point out are
really just misunderstandings of the Nameprep/IDNA relationship.

First, Nameprep/Stringprep is designed to handle domain names on a _per
label_ basis. Before an IDN goes through Nameprep, it has already been broken
up into its individual labels, so Nameprep isn't the place to fix this.

The place where IDNs get broken down into labels is IDNA. What IDNA now
specifies is that to break an IDN down into its labels, you look for this set
of separators (U+002E, U+3002, U+FF0E, U+FF61). (See IDNA Requirement 1)
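
As a rough illustration of the label-splitting step described above, here is
a minimal Python sketch. Only the four separator code points come from the
message; the names LABEL_SEPARATORS and split_labels are hypothetical and do
not appear in the IDNA draft.

```python
import re

# The four separators named in the message: U+002E, U+3002, U+FF0E, U+FF61.
LABEL_SEPARATORS = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def split_labels(name):
    """Split a domain name into labels on any of the four dot characters."""
    return LABEL_SEPARATORS.split(name)

# A name typed with an ideographic full stop and a fullwidth full stop.
print(split_labels("www\u3002example\uFF0Ecom"))  # ['www', 'example', 'com']
```

Each resulting label would then be handed to Nameprep on its own, which is
why Nameprep never sees a separator.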

Comparison is also done on a per label basis. Two IDNs are considered
equivalent if and only if all their individual labels are equivalent. The
separators are also irrelevant during comparison. (See IDNA Requirement 4)
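
The per-label comparison can be sketched along the following lines. This is
an assumption-laden illustration: prepare_label is a hypothetical stand-in
that applies only NFKC and case folding, which is NOT the full Nameprep
profile, and names_equivalent is likewise an invented name.

```python
import re
import unicodedata

# Separators as listed in the message: U+002E, U+3002, U+FF0E, U+FF61.
LABEL_SEPARATORS = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def prepare_label(label):
    # Stand-in for Nameprep (assumption): NFKC plus case folding only.
    return unicodedata.normalize("NFKC", label).casefold()

def names_equivalent(a, b):
    """Compare two names label by label; which dot was used is ignored."""
    labels_a = [prepare_label(x) for x in LABEL_SEPARATORS.split(a)]
    labels_b = [prepare_label(x) for x in LABEL_SEPARATORS.split(b)]
    return labels_a == labels_b

print(names_equivalent("Example.COM", "example\u3002com"))  # True
print(names_equivalent("example.com", "example.org"))       # False
```

Because the names are split before any label is compared, the choice of
separator never enters the comparison.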

If the individual labels need to be pieced back together into an FQDN, then
IDNA has already clearly specified that U+002E should be used. (See IDNA
Requirement 2)
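
Putting the labels back together is the simplest of the three steps. A
minimal sketch, assuming a hypothetical helper named rejoin_with_full_stop
(not a name from the draft):

```python
import re

# Separators as listed in the message: U+002E, U+3002, U+FF0E, U+FF61.
LABEL_SEPARATORS = re.compile("[\u002E\u3002\uFF0E\uFF61]")

def rejoin_with_full_stop(name):
    """Split on any recognised dot, then rejoin using only U+002E."""
    labels = LABEL_SEPARATORS.split(name)
    return "\u002E".join(labels)

print(rejoin_with_full_stop("www\uFF61example\u3002com"))  # www.example.com
```

The per-label Nameprep and encoding steps are omitted here; the sketch only
shows that whatever separator came in, U+002E is what goes out.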

-James Seng


Dave Crocker | 1 Jul 04:09

Re: Problems in normalisation and matching

At 08:56 AM 7/1/2002 +0800, James Seng wrote:
>I remember the "dot issues" were extensively discussed by the Nameprep Design
>Team. It was decided that dots (other than U+002E) should be included because
>there are IMEs which generate these dots in place of the normal dots (it
>becomes a hassle to switch in and out of the IME just for the dot).

This is a confuses user interface issues with protocol issues.  The IETF 
tries to stay away from user interface standardization, even though domain 
names do have a human representation.

User interfaces must adapt to a wide range of usability issues.  Protocols 
are not supposed to suffer that burden.

It is the job of the user interface to map whatever typing codes it chooses 
to, into the constrained protocol codes.  The theory behind typical 
Internet protocols -- and most other modern protocol standards -- is that 
the world chooses ONE way to do a thing and everyone with other ways maps 
to that one way.

The concern for cut-and-paste is obviously valid, but it is not the job of 
the IETF protocol standards to operate well within a user cut-and-paste 
environment.

>  Now, some may
>say IME is out of scope but on the other hand, we really don't need to rehash
>a topic which has been concluded. Let's move forward.

Introducing user interface issues into a protocol design is a good way to 
impair interoperability, because it adds variability.  That makes the 
protocol not work.

Dave Crocker | 1 Jul 05:03

Re: Problems in normalisation and matching

At 07:09 PM 6/30/2002 -0700, Dave Crocker wrote:
>This is a confuses user interface issues with protocol issues.

and it confuses my typing.  sorry.

The sentence was supposed to read:

         This confuses user interface issues with protocol issues.

d/

----------
Dave Crocker <mailto:dave <at> tribalwise.com>
TribalWise, Inc. <http://www.tribalwise.com>
tel +1.408.246.8253; fax +1.408.850.1850

Soobok Lee | 1 Jul 07:01

Re: Problems in normalisation and matching

The copy-and-paste *operation* is not only a user interface but also a
trigger for a critical system service for interprocess communication between
independent applications, like a Unix or NT pipe or a socket, and it
transfers data from one application to another with conversions or
transcodings.

Some IETF protocols (TCP/IP) are often very strict about forcing on-the-wire
octet streams to be little-endian or big-endian. Cut-and-paste is a popular
communication tool and has much stricter rules and conventions for various
data formats (images, sounds, URLs and text). IDN specifications can
recommend special treatment of IDN or IDN-like strings in text copy-and-paste
operations.

Identifier security and integrity issues around copy-and-paste operations
are still a concern of this WG.

Soobok Lee

----- Original Message ----- 
From: "Dave Crocker" <dhc <at> dcrocker.net>
 > 
> The concern for cut-and-paste is obviously valid, but it is not the job of 
> the IETF protocol standards to operate well within a user cut-and-paste 
> environment.

James Seng | 1 Jul 17:06

Re: Problems in normalisation and matching

Right. But the IETF deals mostly with wire protocols. UI issues such as
cut-and-paste should be handled in a more appropriate forum, e.g. POSIX.

-James Seng

----- Original Message -----
From: "Soobok Lee" <lsb <at> postel.co.kr>
To: <idn <at> ops.ietf.org>
Sent: Monday, July 01, 2002 1:01 PM
Subject: Re: [idn] Problems in normalisation and matching

> The copy-and-paste *operation* is not only a user interface but also a
> trigger for a critical system service for interprocess communication
> between independent applications, like a Unix or NT pipe or a socket, and
> it transfers data from one application to another with conversions or
> transcodings.
>
> Some IETF protocols (TCP/IP) are often very strict about forcing
> on-the-wire octet streams to be little-endian or big-endian. Cut-and-paste
> is a popular communication tool and has much stricter rules and
> conventions for various data formats (images, sounds, URLs and text).
> IDN specifications can recommend special treatment of IDN or IDN-like
> strings in text copy-and-paste operations.
>
> Identifier security and integrity issues around copy-and-paste operations
> still

Soobok Lee | 1 Jul 17:59

Re: Problems in normalisation and matching

To be more specific, the double-click, drag and drop mouse operation is a UI
for the cut-and-paste data transfer *system* service. This should clarify my
point. It's clear that only an "IDN-compliant OS" can support "IDN-aware"
applications faithfully.

Soobok Lee

----- Original Message ----- 
From: "James Seng" <jseng <at> pobox.org.sg>
To: "Soobok Lee" <lsb <at> postel.co.kr>; <idn <at> ops.ietf.org>
Sent: Tuesday, July 02, 2002 12:06 AM
Subject: Re: [idn] Problems in normalisation and matching

> Right. But the IETF deals mostly with wire protocols. UI issues such as
> cut-and-paste should be handled in a more appropriate forum, e.g. POSIX.
> 
> -James Seng
> 
> ----- Original Message -----
> From: "Soobok Lee" <lsb <at> postel.co.kr>
> To: <idn <at> ops.ietf.org>
> Sent: Monday, July 01, 2002 1:01 PM
> Subject: Re: [idn] Problems in normalisation and matching
> 
> 
> > Copy-and-paste *operation*   is not only a user interface but also a
> trigger to
> > a critical system service  for interprocess communications between
> independant applications
> > like unix & NT-pipe or socket ,and it transfers some data from one

Dan Oscarsson | 2 Jul 08:30

Re: Problems in normalisation and matching


>First, Nameprep/Stringprep is designed to handle domain names on a _per
>label_ basis. Before an IDN goes through Nameprep, it has already been broken
>up into its individual labels, so Nameprep isn't the place to fix this.
>
>The place where IDNs get broken down into labels is IDNA. What IDNA now
>specifies is that to break an IDN down into its labels, you look for this set
>of separators (U+002E, U+3002, U+FF0E, U+FF61). (See IDNA Requirement 1)
>

IDNA does define how to handle labels, not complete domain names.

A domain name can be used in many places and is included in many
protocols. For example, it is used in HTTP, SMTP and HTML.

Today there are many restrictions on "hostnames", meant to simplify
handling of domain names by software and to make them easier for
users to identify. To make this possible, the restrictions
do not allow normal separator characters in a name and require labels
to be separated by ONE separator character.
Now that we are expanding the allowed characters in a domain name, the
allowed characters and syntax should follow the same rules:
- The labels of a domain name are separated by "full stop" U+002E
  and are written from left to right with the least significant label
  first.
  Other characters or display forms may be used in user interfaces
  but have to be converted into the standard form in protocols.
- Separator characters like SPACE (U+0020) may not be used.
  (this results in that a domain name cannot use NFKC as it
   decomposes non-spacing accents into a space character followed

Adam M. Costello | 2 Jul 16:35

Re: Problems in normalisation and matching

Dan Oscarsson <Dan.Oscarsson <at> trab.se> wrote:

> Now that we are expanding the allowed characters in a domain name, the
> allowed characters and syntax should follow the same rules:
> - The labels of a domain name are separated by "full stop" U+002E
>   and are written from left to right with the least significant label
>   first.
>   Other characters or display forms may be used in user interfaces
>   but have to be converted into the standard form in protocols.

IDNA says virtually nothing about how IDNs are to be represented in
new protocols.  New protocols can use the ASCII representation, or an
unconstrained Unicode representation (like UTF-8), or can define their
own more restricted representation (for example, nameprepped UTF-8 using
only U+002E as separator).

Although IDNA says that the other dot characters must be recognized as
dots, it does not say that they must be allowed in new protocols.  If a
new protocol forbids the other dot characters, recognizing them as dots
will be a no-op.

IDNA says that old protocols must use the ASCII representation, using
only U+002E as dots.
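
For a concrete feel of the ASCII representation, one can look at Python's
built-in "idna" codec. This is only an illustration under an assumption: that
codec follows the IDNA specification as later published (RFC 3490), which may
differ in detail from the draft under discussion here. The example name is
made up.

```python
# "buecher" written with u-umlaut, followed by IDEOGRAPHIC FULL STOP (U+3002).
name = "b\u00FCcher\u3002example"

# The codec splits on the recognised dots, processes each label, and joins
# the results with U+002E only.
ascii_form = name.encode("idna")
print(ascii_form)  # b'xn--bcher-kva.example'

# The same name typed with a plain U+002E yields the identical ASCII form.
print(ascii_form == "b\u00FCcher.example".encode("idna"))  # True
```

The alternate dot is recognised on input but never appears in the output,
which matches the distinction drawn above between recognising the other dots
and allowing them on the wire.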

There has never been a consensus on a particular non-ASCII
representation for use in new protocols, and we don't need one in order
to start deploying IDNA.  That's why IDNA is silent on that issue.

AMC


Erik Nordmark | 3 Jul 10:28

Re: I-D ACTION:draft-ietf-idn-idna-10.txt


>  |  2. Perform the steps specified in [NAMEPREP] and fail if there is
>  |     an error. The AllowUnassigned flag is used in [NAMEPREP].
> 
> "allowunassigned" does not appear in draft-ietf-idn-nameprep-11.txt

Searching for "unassigned" in nameprep-11 results in:
	7. Unassigned Code Points in Internationalized Domain Names

	If the processing in [IDNA] specifies that a list of unassigned code
	points be used, the system uses table A.1 from [STRINGPREP] as its list
	of unassigned code points.

What is the problem?
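
To make the table A.1 lookup quoted above concrete, here is a small Python
sketch. It assumes that the standard library's stringprep module (built from
the published Stringprep tables) is an acceptable stand-in for the draft's
table A.1; the helper name has_unassigned is hypothetical.

```python
import stringprep

def has_unassigned(label):
    """Return True if any code point in the label is listed in table A.1."""
    return any(stringprep.in_table_a1(ch) for ch in label)

print(has_unassigned("example"))    # False: all code points are assigned
print(has_unassigned("ex\u0221"))   # True: U+0221 is listed in table A.1
```

A caller that does not allow unassigned code points would reject the second
label; one that allows them would let it through.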

> Is this the intended mechanism for allowing alternative profiles to be
> encoded in IDNA? I see that section 1.1 says that nameprep is mandatory:
> 
>  | IDNA requires that implementations process input strings with
>  | Nameprep [NAMEPREP], which is a profile of Stringprep [STRINGPREP],
>  | and then with Punycode [PUNYCODE]. Implementations of IDNA MUST
>  | fully implement Nameprep and Punycode; neither Nameprep nor Punycode
>  | are optional.
> 
> Collectively this means that nameprep is still the gatekeeper, and that
> profiles other than nameprep cannot be encoded with IDNA.

When there are additional profiles for domain names, it might
make sense to have a standards track RFC that either updates the IDNA RFC
to say when the new profile can be used, or (depending on how different

Eric A. Hall | 3 Jul 16:37

Re: I-D ACTION:draft-ietf-idn-idna-10.txt


on 7/3/2002 3:28 AM Erik Nordmark wrote:
>> |  2. Perform the steps specified in [NAMEPREP] and fail if there is
>> |     an error. The AllowUnassigned flag is used in [NAMEPREP].
>>
>> "allowunassigned" does not appear in draft-ietf-idn-nameprep-11.txt

> What is the problem?

Anal types like me see StudlyCapped functions and flags and expect to find
them documented. Other than that, none I suppose.

>> Collectively this means that nameprep is still the gatekeeper, and
>> that profiles other than nameprep cannot be encoded with IDNA.
>
> When there are additional profiles for domain names, [...]

There are already profiles described for iSCSI names and Kerberos realms.

> it might make
> sense to have a standards track RFC that either updates the IDNA RFC to
> say when the new profile can be used, or (depending on how different
> things would be for the new usage) it might make sense to have a
> standards track RFC which uses "newprep" plus punycode.

Requiring new codecs for new profiles is all cost and zero benefit. What
possible value is there in forcing applications to define new codecs with
different outputs for their alternative names?

> But such a

