WGLC: remove new region subtags from 4646bis/4645bis (long)
Doug Ewell <dewell <at> roadrunner.com>
2008-02-10 06:39:08 GMT
This is a technical comment on both draft-4646bis and, by implication,
draft-4645bis.
Summary:
I propose that the new region subtags based on ISO 3166-1 "exceptionally
reserved" code elements, and the wording that authorizes them, be
removed from both drafts.
Discussion:
This topic was discussed at significant length during November 2007.
Most of what I am writing here is a summary of what has gone before,
repackaged for Last Call. An earlier argument, which itself contains
links to still earlier arguments, is at
http://www.ietf.org/mail-archive/web/ltru/current/msg08851.html .
Historically, RFC 4646 and its predecessors (including common practice
in the days before RFC 1766) limited two-letter region subtags to those
corresponding to code elements formally assigned in ISO 3166-1. The
primary reason for adding the exceptionally reserved ("ER") code
elements was apparently to allow 'EU' to be used with the meaning
"European Union," not for language tagging per se, but in CLDR and other
works that are not related to language tagging. The other ER code
elements, such as 'DG' for "Diego Garcia" and 'EA' for "Ceuta [and]
Melilla," were ancillary to this objective; they were added in
preference to cherry-picking 'EU' out of the list of ER codes (a
preference I certainly shared).
In justifying this change, Mark Davis wrote, "BCP 47 is not only the
source of language tags; it is *the* reference for stabilized,
maintainable codes for languages, scripts, and regions." While I
appreciate the desire to have a work such as CLDR derive its code list
from a stable reference, this does not require the BCP 47 code list to
include all entities that might be useful to a derivative work.
Instead, CLDR could be defined as a profile of BCP 47, and add code
elements beyond those provided in BCP 47 as necessary for its own needs.
There are other region subtags already in the Language Subtag Registry
that are known, or highly suspected, not to be useful for identification
of language variations. Some of the better-known examples are 'AQ' for
"Antarctica" and 'BV' for "Bouvet Island." However, those subtags were
included in the Registry in strict accordance with the philosophy of
incorporating all of the *formally assigned* ISO 3166-1 code elements,
without cherry-picking them for perceived relevance.
For RFC 4646, the LTRU WG added the UN M.49 numeric code elements
corresponding to supra-national regions, to provide additional region
subtags for demonstrated language tagging needs. The poster child for
this particular need was '419' for "Latin America and the Caribbean,"
useful for Spanish as used in certain Spanish-speaking areas of the New
World but not in Spain.
But the WG explicitly excluded UN codes categorized under "Selected
economic and other groupings," because those codes represent groups of
geographically and linguistically unrelated nations and have no
pertinence to language tagging. An example is '432' for "Landlocked
developing countries," a grouping that includes Bolivia, Chad, Laos, and
Moldova. The WG correctly decided to consider *relevance to language
tagging* in deciding which categories of UN code to include and which to
exclude.
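
To make the distinction concrete, here is a minimal sketch in Python.
It assumes only the region subtag shape from RFC 4646 (two letters or
three digits). The two sample sets and the function name are
hypothetical and hold only codes mentioned in this message; they are
not the Registry. The point is that syntax alone cannot tell '419' from
'432': both are well-formed, and only a decision about relevance
separates them.

```python
import re

# Shape of a region subtag in RFC 4646: two letters or three digits.
_REGION_SHAPE = re.compile(r"[A-Za-z]{2}|[0-9]{3}")

# Illustrative samples drawn from this message only; NOT the real Registry.
SAMPLE_REGISTERED = {"ES", "AQ", "BV", "419"}
SAMPLE_EXCLUDED = {"432"}  # "Landlocked developing countries"


def classify_region(subtag):
    """Classify a candidate region subtag against the sample sets above."""
    if not _REGION_SHAPE.fullmatch(subtag):
        return "malformed"
    key = subtag.upper()  # subtags are case-insensitive
    if key in SAMPLE_REGISTERED:
        return "registered"
    if key in SAMPLE_EXCLUDED:
        return "well-formed but excluded"
    return "unknown to this sample"


if __name__ == "__main__":
    for candidate in ("419", "432", "es", "Latn"):
        print(candidate, "->", classify_region(candidate))
```

A shape check accepts '432' just as readily as '419', so the exclusion
has to live in the list of registered codes, not in the syntax.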
Making an explicit change to RFC 4646bis to add the ISO 3166-1 ER code
elements gives the impression that they must fulfill a particular
language tagging need, just as the act of adding the UN codes fulfilled
such a need. However, no such need has been demonstrated *for language
tagging*, only for locale designation and ease of cross-checking between
standards. John Cowan mentioned that "English as spoken on Tristan da
Cunha" is a recognized variety of English, but there is no actual
evidence of a need to tag this variety.
Claims of a requirement to tag "European Union English" as distinct from
"English as spoken in {Northern, Southern, Western, Eastern} Europe"
(all of which could be implemented with numeric UN-based region subtags)
have been based on the notion of an EU "bureaucratese" which has not
been shown to exist, at least not in any way beyond ordinary
bureaucratese, and which in any case would not be truly region-based and
thus not appropriate for a region subtag. A variant subtag is the
normal mechanism for tagging specialized jargon.
In short, to make a change like this in BCP 47, there should be a good
reason why the change is necessary and the status quo is inadequate for
language tagging. No good reason has been demonstrated IMHO, although
several reasons ranging from marginal to bogus have appeared.
None of my objection to these subtags has anything to do with the
additional overhead of adding seven new region subtags to the RFC
4645bis Registry, which already contains 289 region subtags and 8,304
tags and subtags overall. Also, none of my objection constitutes a
judgment of the political or institutional legitimacy of the European
Union; this is not the purpose of the Registry or of the LTRU effort.
My specific proposal is that all wording in draft-4646bis and
draft-4645bis related to the addition of these region subtags be
removed, including the actual subtags in the 4645bis Registry. I can
itemize the specific passages to be changed if desired.
I do not intend to oppose the drafts or file an appeal if this comment
is rejected, but I do ask that the comment be duly considered by the WG
and ruled upon by the co-chairs.
--
Doug Ewell * Fullerton, California, USA * RFC 4645 * UTN #14
Editor, draft-ietf-ltru-4645bis
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages