Doug Ewell | 2 Jul 00:58

Solving the UTF-8 problem

This is intentionally cross-posted to LTRU and ietf-languages, since it 
deals with both implementation policy and proposed changes to RFC 4646bis 
and 4645bis.

CE Whitehead <cewcathar at hotmail dot com> wrote on ietf-languages:

> I want to update the 1694acad comments field to include a transliteration 
> into Basic Latin (also--perhaps???--to fix the inconsistency as 4eme is 
> missing the accent grave on the e!!):
>
> Comments: 17th century French, as catalogued in the "Dictionnaire de 
> l'acad&#xE9;mie fran&#xE7;oise" ("l'academie francoise"), 4&#xE8;me (4eme) 
> ed. 1694; frequently includes elements of Middle French, as this is a 
> transitional period.
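For readers unfamiliar with the escapes in the quoted Comments field: they are XML-style hexadecimal character references, so a processor can recover the real characters mechanically. A minimal sketch, assuming Python's standard `html.unescape` is an acceptable decoder for this escape style (the `comment` string is abridged from the text quoted above):

```python
import html

# Abridged from the proposed Comments field quoted above.
comment = (
    'catalogued in the "Dictionnaire de l\'acad&#xE9;mie fran&#xE7;oise", '
    "4&#xE8;me ed. 1694"
)

# html.unescape resolves hexadecimal character references such as &#xE9;
decoded = html.unescape(comment)
print(decoded)
```

Nothing here depends on a parallel Basic Latin copy being stored in the registry.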

I really, really don't like the direction this is headed.  Ultimately we 
will find ourselves having to provide duplicate Description and Comments 
content for every non-ASCII character in the Language Subtag Registry, 
removing most of the advantage of being able to represent non-ASCII in the 
first place.

What are we going to do when the ISO 639-3 code list is finalized and we 
have to deal with adding the following pairs of languages, whose names 
differ only by diacritical marks?

aru  Arua
arx  Aruá

bfa  Bari
mot  Barí

Mark Davis | 2 Jul 01:18

Suggested text for future compatibility of registry processors.

Doug noted the following in his email on UTF-8 (on which topic I agree with him, btw).

> Addison is correct; any structural change to the Registry will break RFC
> 4646-conformant processors.  This is true not only for UTF-8, but also for
> new fields such as "Macrolanguage" or "Modified."  (Section 3.1 says the
> Type "MUST" be one of the seven currently defined values.)

I suggest that we add language to 4646bis noting this, with the following suggested text.

Add to the end of 3.1.2.  Record Definitions:

Future versions of the language subtag registry may add more fields. Processors of the registry that are not intended to be updated with each successive version of BCP 47 and thus need to be compatible with future versions of the registry, SHOULD be written so as to ignore additional fields.

--
Mark

John Cowan | 2 Jul 01:28

Re: Solving the UTF-8 problem

Doug Ewell scripsit:

> 1.  UTF-8 doesn't play well with e-mail, which is invaluable for 
> discussing changes on the ietf-languages list and sending the changes to 
> IANA (stated by several).

I note for the record that your message arrived encoded in Latin-1
but tagged "utf-8".

-- 
We pledge allegiance to the penguin             John Cowan
and to the intellectual property regime         cowan <at> ccil.org
for which he stands, one world under            http://www.ccil.org/~cowan
Linux, with free music and open source
software for all.               --Julian Dibbell on Brazil, edited

Randy Presuhn | 2 Jul 01:39

Re: Suggested text for future compatibility of registry processors.

Hi -

As a technical contributor...

> From: "Mark Davis" <mark.davis <at> icu-project.org>
> To: "LTRU Working Group" <ltru <at> ietf.org>
> Sent: Sunday, July 01, 2007 4:18 PM
> Subject: [Ltru] Suggested text for future compatibility of registry processors.
....
> Add to the end of 3.1.2.  Record Definitions:
> 
> Future versions of the language subtag registry may add more fields.
> Processors of the registry that are not intended to be updated with each
> successive version of BCP 47 and thus need to be compatible with future
> versions of the registry, SHOULD be written so as to ignore additional
> fields.
...

I agree with the intent, but I'd like to propose a slightly different wording:

  Future versions of this memo MAY define additional field types for
  use in the language subtag registry.  Consequently, software to
  process the content of the registry SHOULD tolerate unrecognized
  field types.

Rationales:

(1) I think this is a bit clearer and more concise.
(2) "tolerate" rather than "ignore": consider the result of a typo such
as "Suppers-Script:" for "Suppress-Script:".  I think one would like an
implementation to be able to issue a warning about a field that it
doesn't recognize, rather than simply ignoring it.
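To make the "tolerate" versus "ignore" distinction concrete, here is a minimal sketch of a registry reader that keeps unrecognized fields but warns about them. It assumes the record-jar layout of the registry ("%%" separators, "Name: body" fields, indented continuation lines). The `KNOWN_FIELDS` set, the function name `parse_records`, and the `SAMPLE` text are illustrative assumptions for this sketch, not text from any draft; the "Modified" field is hypothetical.

```python
import warnings

# Field names assumed for this sketch; a real processor should take the
# list from the version of the RFC it actually implements.
KNOWN_FIELDS = {
    "File-Date", "Type", "Tag", "Subtag", "Description", "Added",
    "Deprecated", "Preferred-Value", "Prefix", "Suppress-Script", "Comments",
}


def parse_records(text):
    """Parse record-jar text into a list of records.

    Each record is a list of (field_name, field_body) pairs. Unrecognized
    field names are kept in the result, and a warning is issued for each.
    """
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "%%":                  # record separator
            if current:
                records.append(current)
            current = []
        elif line[:1] in (" ", "\t") and current:
            # Continuation line: fold it into the previous field body.
            name, body = current[-1]
            current[-1] = (name, body + " " + line.strip())
        elif ":" in line:
            name, _, body = line.partition(":")
            name = name.strip()
            if name not in KNOWN_FIELDS:
                warnings.warn(f"unrecognized field type: {name!r}")
            current.append((name, body.strip()))
    if current:
        records.append(current)
    return records


# Made-up sample: one typo ("Suppers-Script") and one hypothetical
# future field ("Modified").
SAMPLE = """\
Type: language
Subtag: fr
Description: French
Suppers-Script: Latn
%%
Type: variant
Subtag: 1694acad
Description: Early Modern French
Modified: 2007-07-01
"""
```

A processor that would rather fail on the typo can escalate with `warnings.simplefilter("error")`; one that must survive future registry versions can leave the default in place and still see the diagnostic.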

Randy

Doug Ewell | 2 Jul 01:50

Re: Solving the UTF-8 problem

John Cowan <cowan at ccil dot org> wrote:

>> 1.  UTF-8 doesn't play well with e-mail, which is invaluable for 
>> discussing changes on the ietf-languages list and sending the changes to 
>> IANA (stated by several).
>
> I note for the record that your message arrived encoded in Latin-1 but 
> tagged "utf-8".

(For reference: Microsoft Outlook Express 6.00.2900.2180.)

When I saved the sent message as a text file and then opened it in a hex 
editor, I saw the UTF-8 sequences.  So it may have been changed somewhere 
along the way.  Maybe I should have included a non-1252 character somewhere.
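For anyone who wants to repeat the hex-editor check without a hex editor, a minimal sketch follows. It only tests whether a byte string is well-formed UTF-8; the helper name `looks_like_utf8` is made up for this example, and the sample name is one of the ISO 639-3 names from the earlier message.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "Aruá"
as_utf8 = name.encode("utf-8")        # b'Aru\xc3\xa1'
as_latin1 = name.encode("latin-1")    # b'Aru\xe1'

print(looks_like_utf8(as_utf8))       # True
print(looks_like_utf8(as_latin1))     # False: lone 0xE1 is not valid UTF-8

# What a reader sees when UTF-8 bytes are displayed as Latin-1:
print(as_utf8.decode("latin-1"))      # AruÃ¡
```

Note that a pure ASCII message passes this check under either label, so the test is only informative when the text contains at least one non-ASCII character.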

In any case, I don't dispute that e-mail support for UTF-8, whether in 
clients or gateways, is unreliable.  I suggested an approach to get around 
that problem.

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

Mark Davis | 2 Jul 01:55

Re: Suggested text for future compatibility of registry processors.

Your changes sound fine by me. I do have a question. I thought MAY was only capitalized when it was in reference to conformance to this document. But in that statement it doesn't seem to apply: it doesn't appear that we in this document can make any conformance promises (positive or negative) about future versions since that is entirely in the hands of the IETF.

Mark

Martin Duerst | 1 Jul 04:01

Re: Resolving issues

with chair hat on:

At 07:06 07/06/30, Mark Davis wrote:
>Addison and I have been looking over some of the remaining issues, and
>have worked out some suggested language to resolve some open issues.
>
>http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-06.html

You talk about issue*s* here, but below, I can find proposed language for
only one issue. Are we supposed to look at the above link for the
other issues, or will language for other issues be proposed in
forthcoming email, or what?

Also, it would be good to give a more specific pointer, i.e.
http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-06.html#subtagreviewer

>The IESG will solicit nominees for the position (initially or upon a
>vacancy) and seek to ascertain the candidates' qualifications.
>
>=>
>
>The IESG will solicit nominees for the position (upon adoption of this
>document or upon a vacancy) and then solicit feedback on the nominees'
>qualifications.
>
>Qualified candidates should be familiar with BCP 47 and its requirements;
>be willing to fairly, responsively, and judiciously administer the
>registration process; and be suitably informed about the issues of
>language identification so that they can draw upon and assess the claim
>and contributions of language experts and subtag requesters.

as a technical (or, in this case, procedural) contributor:

I think one big issue that isn't dealt with here is that the appointment
is for an infinite term. I think it would be much better to have a
limited term (e.g. two years), with a possibility for renewal.
This gives a good chance for reviewing from both sides (both the
reviewer as well as the IESG).

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst <at> it.aoyama.ac.jp     

Randy Presuhn | 2 Jul 02:16

Re: Suggested text for future compatibility of registry processors.

Hi -

As a technical contributor...

> From: "Mark Davis" <mark.davis <at> icu-project.org>
> To: "Randy Presuhn" <randy_presuhn <at> mindspring.com>
> Cc: "LTRU Working Group" <ltru <at> ietf.org>
> Sent: Sunday, July 01, 2007 4:55 PM
> Subject: Re: [Ltru] Suggested text for future compatibility of registry processors.
>
> Your changes sound fine by me. I do have a question. I thought MAY was only
> capitalized when it was in reference to conformance to this document. But in
> that statement it doesn't seem to apply: it doesn't appear that we in this
> document can make any conformance promises (positive or negative) about
> future versions since that is entirely in the hands of the IETF.
...

I was looking at it from a protocol specification perspective.
When defining an extensible protocol, using SHOULD, MUST, etc.
makes sense when specifying the nature of permissible extensions.

If we wanted to be really pedantic, for example, we could spell out that
future extensions MUST NOT violate the syntax established for "record"
in the ABNF.  (If we were indeed going to make a commitment to forward
compatibility in the registry format.  At this point, we haven't constrained
ourselves in this way, and please don't read this example as an argument
that we should constrain ourselves in this way.  Indeed, I think we really
need to discourage implementations (other than very specialized developer
tools) from reading the registry itself, in much the same way as network
management applications don't dynamically query the IANA enterprise number
assignments.)

Randy

Mark Davis | 2 Jul 02:32

Re: Resolving issues



On 6/30/07, Martin Duerst <duerst <at> it.aoyama.ac.jp> wrote:
> with chair hat on:
>
> At 07:06 07/06/30, Mark Davis wrote:
>> Addison and I have been looking over some of the remaining issues, and
>> have worked out some suggested language to resolve some open issues.
>>
>> http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-06.html
>
> You talk about issue*s* here, but below, I can find proposed language for
> only one issue. Are we supposed to look at the above link for the
> other issues, or will language for other issues be proposed in
> forthcoming email, or what?

There will be some more with separate email titles as we work through them.

> Also, it would be good to give a more specific pointer, i.e.
> http://www.inter-locale.com/ID/draft-ietf-ltru-4646bis-06.html#subtagreviewer

Good suggestion.

>> The IESG will solicit nominees for the position (initially or upon a
>> vacancy) and seek to ascertain the candidates' qualifications.
>>
>> =>
>>
>> The IESG will solicit nominees for the position (upon adoption of this
>> document or upon a vacancy) and then solicit feedback on the nominees'
>> qualifications.
>>
>> Qualified candidates should be familiar with BCP 47 and its
>> requirements; be willing to fairly, responsively, and judiciously
>> administer the registration process; and be suitably informed about the
>> issues of language identification so that they can draw upon and assess
>> the claim and contributions of language experts and subtag requesters.
>
> as a technical (or, in this case, procedural) contributor:
>
> I think one big issue that isn't dealt with here is that the appointment
> is for an infinite term. I think it would be much better to have a
> limited term (e.g. two years), with a possibility for renewal.
> This gives a good chance for reviewing from both sides (both the
> reviewer as well as the IESG).

That sounds reasonable to me. Let's see what others think.

> Regards,    Martin.
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst <at> it.aoyama.ac.jp




--
Mark

Doug Ewell | 2 Jul 04:19

Re: Solving the UTF-8 problem

I noticed that the Web-based LTRU mail archives bungled my UTF-8 
mailing -- which once again is an e-mail problem and not a UTF-8 problem per 
se -- so I'm re-posting a fragment, this time in ISO 8859-1.

> What are we going to do when the ISO 639-3 code list is finalized and we 
> have to deal with adding the following pairs of languages, whose names 
> differ only by diacritical marks?
>
> aru  Arua
> arx  Aruá
>
> bfa  Bari
> mot  Barí
>
> kgm  Karipúna
> kuq  Karipuná
>
> sbe  Saliba
> slc  Sáliba
>
> wbf  Wara
> tci  Wára
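The collision in the list above can be checked mechanically. A minimal sketch, assuming that "transliteration into Basic Latin" means nothing more than stripping combining marks after Unicode decomposition (real transliteration schemes may differ); the helper names `strip_marks` and `collisions` are made up for this example:

```python
import unicodedata

# Subtag -> name, copied from the list above.
NAMES = {
    "aru": "Arua", "arx": "Aruá",
    "bfa": "Bari", "mot": "Barí",
    "kgm": "Karipúna", "kuq": "Karipuná",
    "sbe": "Saliba", "slc": "Sáliba",
    "wbf": "Wara", "tci": "Wára",
}


def strip_marks(name):
    """Decompose to NFD and drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def collisions(names):
    """Group subtags whose names become identical once marks are stripped."""
    groups = {}
    for subtag, name in names.items():
        groups.setdefault(strip_marks(name), []).append(subtag)
    return {folded: tags for folded, tags in groups.items() if len(tags) > 1}


for folded, tags in collisions(NAMES).items():
    print(folded, tags)
```

Under that assumption every one of the ten names collides with its partner, so an ASCII-only Description could not tell the two languages in each pair apart.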

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

