Adam M. Costello | 1 May 03:57

Re: ToUnicode output can be longer than input

Dan Oscarsson <Dan.Oscarsson <at> kiconsulting.se> wrote:

> IDNA defines a way to compare names in ASCII context because it
> requires names to be in IDNA ACE format.

It requires names to be in ASCII format in IDN-unaware contexts.  It
does not require names to be in ASCII format when they are compared.  It
says:

    Whenever two labels are compared, they MUST be considered to match
    if and only if they are equivalent, that is, their ASCII forms
    (obtained by applying ToASCII) match using a case-insensitive ASCII
    comparison.

That doesn't say you must compare the ASCII forms, it says you must
reach the same answer as if you compared the ASCII forms.  And the rule
doesn't say it applies only in certain contexts, it says "whenever"
two labels are compared.  If that's not clear enough, the point is
underscored in the introduction:

    Applications can also define protocols and interfaces that support
    IDNs directly using non-ASCII representations.  IDNA does not
    prescribe any particular representation for new protocols, but it
    still defines which names are valid and how they are compared.
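
As a rough illustration of the comparison rule quoted above, the following Python sketch compares two labels that may be given in any form. It assumes the standard library's `encodings.idna.ToASCII`, which implements RFC 3490 ToASCII with UseSTD3ASCIIRules unset; the helper name `labels_match` is hypothetical and not part of any specification.

```python
from encodings import idna


def labels_match(a: str, b: str) -> bool:
    """Return True if two labels are equivalent under the IDNA rule.

    The caller may pass Unicode or ACE labels; only the answer has to
    agree with a case-insensitive comparison of the ToASCII forms.
    """
    # ToASCII returns bytes; bytes.lower() folds ASCII letters only.
    return idna.ToASCII(a).lower() == idna.ToASCII(b).lower()


if __name__ == "__main__":
    print(labels_match("Bücher", "BÜCHER"))
    print(labels_match("bücher", "xn--bcher-kva"))
    print(labels_match("bücher", "bucher"))
```

An implementation is free to compare in some other representation, as long as it reaches the same answer this function would give.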

> Comparing names in an international context must be done using UCS
> characters directly.

I assume that's your opinion.  It's certainly not a requirement of any
standard.

Edmon Chung | 2 May 22:28

Re: IDN's with any ASCII character

It should be noted however that given an input: "this&that.com"
The proper output for ToACE MUST be "this&that.com" and NOT
"xn--this&that-.com"
I specifically raise this issue because we have found that some
IDNA/Punycode implementations actually exhibit this behaviour.
Edmon

----- Original Message -----
From: "Adam M. Costello" <idn.amc+0 <at> nicemice.net.RemoveThisWord>
To: "IETF idn working group" <idn <at> ops.ietf.org>
Sent: Wednesday, April 30, 2003 1:09 AM
Subject: Re: [idn] IDN's with any ASCII character

> Jarrod Hollingworth <jarrod <at> backslash.com.au> wrote:
>
> > Will IDN's allow encoding of domain names with *any* ASCII character?
> >
> > For example, let's say that I want to register the domain name
> > "this&that.com" or "100^10.com".
> >
> > Will IDN allow this or does it only facilitate international
> > languages?
>
> IDNA allows the addition of non-ASCII characters to domain names.  For
> ASCII characters, IDNA adds no new restrictions, but nor does it relax
> the old restrictions.  The ASCII characters & and ^ (and every other
> ASCII character besides letters, digits, and hyphen) are not allowed in
> the "preferred syntax", which is used for domain names that name hosts
> and mail exchangers.
>

Edmon Chung | 3 May 23:04

Re: IDN's with any ASCII character

Sorry everyone.  In my previous message I misused the term ToACE; it
should be ToASCII (there is no ToACE),
and I should have said that ToASCII SHOULD result ... instead of MUST...

Anyway, just to reiterate my point.

Given an input "this&that.com" the output for ToASCII MUST NOT be
"xn--this&that-.com"

And we have come across implementations that mis-convert it to
"xn--this&that-.com"

Also, while ToASCII should fail in this case, it is important that an
implementation not terminate processing at that point.  More specifically,
when ToASCII fails, the implementation should leave further interpretation
up to the original application and should not attempt to alter or terminate
its path.
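
A minimal sketch of that behaviour in Python, assuming the standard library's `encodings.idna.ToASCII` as the conversion routine. The wrapper name `try_to_ascii` is hypothetical. Note that the stdlib routine does not apply the STD3 ASCII rules, so "this&that" passes through unchanged rather than failing; an over-long label is used here to show the failure path.

```python
from encodings import idna


def try_to_ascii(label: str):
    """Attempt ToASCII; on failure hand the untouched label back.

    Returns (ascii_form, None) on success, or (original_label, error)
    on failure, so the calling application decides what happens next.
    """
    try:
        return idna.ToASCII(label).decode("ascii"), None
    except UnicodeError as exc:
        # Do not alter the label or abort; just report the failure.
        return label, exc


if __name__ == "__main__":
    print(try_to_ascii("this&that"))   # all-ASCII input gets no "xn--" prefix
    print(try_to_ascii("a" * 64))      # longer than 63 octets, so ToASCII fails
```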

Edmon

----- Original Message -----
From: "Edmon Chung" <edmon <at> neteka.com>
To: "IETF idn working group" <idn <at> ops.ietf.org>
Sent: Friday, May 02, 2003 4:28 PM
Subject: Re: [idn] IDN's with any ASCII character

> It should be noted however that given an input: "this&that.com"
> The proper output for ToACE MUST be "this&that.com" and NOT
> "xn--this&that-.com"
> I specifically raise this issue because we have found that some

Simon Josefsson | 4 May 00:52

Re: IDN's with any ASCII character

"Edmon Chung" <edmon <at> neteka.com> writes:

> Given an input "this&that.com" the output for ToASCII MUST NOT be
> "xn--this&that-.com"
>
> And we have come across implementations that mis-convert it to
> "xn--this&that-.com"

As far as I can tell, this is simply a bug in the implementation, not
in the specification.

> Also, while ToASCII should fail in this case,

It depends.  If the UseSTD3ASCIIRules flag is set, it should fail.
Otherwise, it shouldn't.
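
The effect of the flag can be sketched in Python as follows. This is only an illustration: the function `to_ascii` and its `use_std3_ascii_rules` parameter are hypothetical names, and the code assumes the standard library's `encodings.idna` module, whose `ToASCII` always behaves as if the flag were unset. The extra check follows step 3 of ToASCII in RFC 3490 (no non-LDH ASCII characters, no leading or trailing hyphen).

```python
import string
from encodings import idna

# Letters, digits and hyphen: the only ASCII characters STD3 allows.
LDH = set(string.ascii_letters + string.digits + "-")


def to_ascii(label: str, use_std3_ascii_rules: bool = False) -> str:
    """ToASCII with an optional UseSTD3ASCIIRules check (sketch)."""
    if use_std3_ascii_rules:
        prepped = idna.nameprep(label)
        for ch in prepped:
            if ord(ch) < 128 and ch not in LDH:
                raise ValueError(f"non-LDH ASCII character {ch!r} in label")
        if prepped.startswith("-") or prepped.endswith("-"):
            raise ValueError("label starts or ends with a hyphen")
    return idna.ToASCII(label).decode("ascii")


if __name__ == "__main__":
    print(to_ascii("this&that"))            # flag unset: returned unchanged
    try:
        to_ascii("this&that", use_std3_ascii_rules=True)
    except ValueError as exc:
        print("failed:", exc)               # flag set: conversion fails
```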

Stephane Bortzmeyer | 13 May 17:08

Searching for IDN free software implementations (Perl and Java)

Hello,

Since http://www.i-d-n.net/ is terribly outdated, I'm looking for help
here.

We plan to register IDNs "soon" in ".fr". Our legacy software is in
Perl and Java. I need IDN "free as in free speech, not free as in free
beer" implementations for both.

For Perl, I could write an XS to the excellent GNU libidn or to MDNkit
but I hope someone else did it before the lazy guy I am. The
implementation in <URL:http://www.imc.org/nameprep/> is quite outdated
and I wonder if I can reasonably use it?

For Java, one (not me) could use the JNI to call GNU libidn but I
would prefer a native Java solution, easing the deployment to various
platforms without recompilation.

Any advice? ("Reprogram everything in Python" is good advice but
unrealistic, I'm afraid.)

Mark Davis | 13 May 17:56

Re: Searching for IDN free software implementations (Perl and Java)

In the ICU 2.6 release, we are including an IDNA implementation for C,
if you can call into that. ICU is open source, under the X Licence:
http://oss.software.ibm.com/icu/

Mark
________
mark.davis <at> jtcsv.com
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799

----- Original Message ----- 
From: "Stephane Bortzmeyer" <bortzmeyer <at> nic.fr>
To: <idn <at> ops.ietf.org>
Sent: Tuesday, May 13, 2003 08:08
Subject: [idn] Searching for IDN free software implementations (Perl and Java)

> Hello,
>
> Since http://www.i-d-n.net/ is terribly outdated, I'm looking for help
> here.
>
> We plan to register IDNs "soon" in ".fr". Our legacy software is in
> Perl and Java. I need IDN "free as in free speech, not free as in free
> beer" implementations for both.
>
> For Perl, I could write an XS to the excellent GNU libidn or to

Paul Hoffman / IMC | 13 May 18:04

Re: Searching for IDN free software implementations (Perl and Java)

At 5:08 PM +0200 5/13/03, Stephane Bortzmeyer wrote:
>We plan to register IDNs "soon" in ".fr". Our legacy software is in
>Perl and Java. I need IDN "free as in free speech, not free as in free
>beer" implementations for both.
>
>For Perl, I could write an XS to the excellent GNU libidn or to MDNkit
>but I hope someone else did it before the lazy guy I am. The
>implementation in <URL:http://www.imc.org/nameprep/> is quite outdated
>and I wonder if I can reasonably use it?

No, and I thought I had removed it a while ago. I have now. Please 
see <http://www.imc.org/idna/>. All the IMC-created Perl code there 
is free as in speech and beer. It relies on charlint.pl, the W3C's 
Perl-based normalizer that Martin Dürst created; that has a separate 
license.

--Paul Hoffman, Director
--Internet Mail Consortium

Stephane Bortzmeyer | 14 May 13:35

[SUMMARY] Searching for IDN free software implementations (Perl and Java)

On Tue, May 13, 2003 at 05:08:32PM +0200,
 Stephane Bortzmeyer <bortzmeyer <at> nic.fr> wrote 
 a message of 20 lines which said:

> We plan to register IDNs "soon" in ".fr". Our legacy software is in
> Perl and Java. I need IDN "free as in free speech, not free as in free
> beer" implementations for both.

Solutions I've found:

Java) Verisign GRS distributes an SDK (Software Development Kit) to its
registrars. It includes a full IDN (RFC 3490 and friends)
implementation in Java. It seems well written and documented. There is
no public site for the distribution; if you are not a Verisign
registrar, you have to ask Verisign for a copy. The licence is BSD
(without the ad clause).

ICU for Java <URL:http://oss.software.ibm.com/icu4j/index.html> will
include IDN in a future version.

Perl) Paul Hoffman's test tool is written in Perl, and free. It is not
a ready-to-use, shrinkwrapped library (for instance, there are
hardcoded paths in the code). Requires Perl >=
5.8. <URL:http://www.imc.org/idna/> 

For Perl, we'll try to develop an XS interface that allows the use of
GNU libidn, which is written in C.

Roozbeh Pournader | 14 May 16:56

Re: IDN's with any ASCII character

On Wed, 30 Apr 2003, John C Klensin wrote:

> While, as far as I know, "high ASCII" has never been a standard term,
> "ASCII-8" and "8-Bit ASCII" definitely have been.  And both terms are
> still used informally, both inside and outside the US, to refer to the
> Standardized form of Latin-1, i.e., ISO 8859-1.

May be true, but the terms are dangerous, as in different cults they mean
different things. For example, for older PC & MS-DOS hackers the 8-bit
ASCII means CP437.
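
A small Python illustration of the ambiguity, using only the standard codecs `latin-1` and `cp437`: the same byte above 0x7F decodes to different characters depending on which "8-bit ASCII" the reader has in mind.

```python
# One byte, two readings of "8-bit ASCII".
raw = b"\xe9"

as_latin1 = raw.decode("latin-1")  # ISO 8859-1: LATIN SMALL LETTER E WITH ACUTE
as_cp437 = raw.decode("cp437")     # IBM PC code page 437: GREEK CAPITAL LETTER THETA

print(f"latin-1: U+{ord(as_latin1):04X}")
print(f"cp437:   U+{ord(as_cp437):04X}")
```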

roozbeh

John C Klensin | 14 May 19:36

Re: IDN's with any ASCII character


--On Wednesday, 14 May, 2003 19:26 +0430 Roozbeh Pournader 
<roozbeh <at> sharif.edu> wrote:

> On Wed, 30 Apr 2003, John C Klensin wrote:
>
>> While, as far as I know, "high ASCII" has never been a
>> standard term, "ASCII-8" and "8-Bit ASCII" definitely have
>> been.  And both terms are still used informally, both inside
>> and outside the US, to refer to the Standardized form of
>> Latin-1, i.e., ISO 8859-1.
>
> May be true, but the terms are dangerous, as in different
> cults they mean different things. For example, for older PC &
> MS-DOS hackers the 8-bit ASCII means CP437.

The _only_ thing I was trying to clarify was the apparent 
assertion in Doug's note that ASCII-8 wasn't well defined in 
some standard.  It was and is.  I wouldn't suggest using the 
term either.

      john

>
> roozbeh
>

