Peter Constable | 1 Dec 05:17

Re: Support of ISO 639 (was: Survey Tool pre-alpha)

From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of Doug Ewell

>>> The use of collection codes in language tags is dubious, like saying
>>> "it's language group so-and-so, but information about individual
>>> language is not available".
>>
>> I'm of the same general opinion.
>
> So am I, when we are talking about the "traditional" uses of language
> tags, viz. tagging Web pages and e-mails, or specifying desired matches
> in Web search engines.
>
> But there may be other applications of language tags...

Which is why I included the "general" qualifier in my comment. I didn't mean to re-open this issue: we
debated this a long time ago, and the consensus reached was to continue to include collective language
categories in the registry. There's no reason to re-visit that.

Peter
Peter Constable | 1 Dec 05:48

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

> Anyway, the fact that families ARE encoded in addition with languages,
> and the fact that families are hierarchized as well, creates a hole
> that must be filled between families and languages...

If you mean that there is action needed on the part of the ISO 639 RAs or JAC, then that is out of scope for this WG
and should be taken up elsewhere (with those bodies).

If you mean that something needs to be changed in the language-subtag registry, or in 4646bis, then I don't
see any such need: ISO 639-3 provides very comprehensive coverage of languages, and there is not a lot of
likelihood that users would need to tag content for some language not covered by 639-3 but potentially
(perhaps not clearly) covered by one or more collective categories in 639-5. Also, the use case for
collective categories in language tagging is, I think, not that great.

> it is ESSENTIAL that the labels displayed when selecting any collective
> code from a list containing ISO 639 codes of various scopes MUST reflect
> the fact that this is effectively a collection of distinct languages

If you mean that it must be clear when a subtag in the Language Subtag Registry represents a collection, then
I completely agree.

> That's exactly the reverse decision that CLDR made...

That is an issue for the CLDR TC to consider and is out of scope for the LTRU WG.

Btw, can we please *not* cross-post items between LTRU and CLDR Users. (I've moved CLDR Users to bcc to that
end.) CLDR Users is an informal discussion list for users of CLDR, while LTRU is a technical working group:
IMO there cannot possibly be a discussion that would be appropriate for both lists.

> but ISO 639 is always wrong about these letters when it uses
> ASCII punctuation...

Doug Ewell | 1 Dec 06:08

Re: Support of ISO 639 (was: Survey Tool pre-alpha)

Peter Constable <petercon at microsoft dot com> wrote:

>> it is ESSENTIAL that the labels displayed when selecting any 
>> collective code from a list containing ISO 639 codes of various 
>> scopes MUST reflect the fact that this is effectively a collection of 
>> distinct languages
>
> If you mean that it must be clear when a subtag in the Language Subtag 
> Registry represents a collection, then I completely agree.

Draft-4645bis does this, as described in sections 2.2 and 2.3, and as 
seen in the included Registry:

%%
Type: language
Subtag: bh
Description: Bihari
Added: 2005-10-16
Scope: collection
%%

This does not mean that tags containing 'bh' or any other collection 
subtag would themselves carry any indication that the language subtag 
represents a collection, nor that any UI used to create language tags 
would necessarily identify collection subtags.  Someone could write a UI 
to do this, and it might be a nice feature, but we do not and cannot 
require this.
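As a rough illustration of the optional UI feature mentioned above, a tool could read the 'Scope' field from registry records and mark collection subtags in its labels, without changing the tags themselves. This is a minimal sketch assuming the record-jar layout shown in the excerpt ('Name: value' fields, records separated by '%%', indented continuation lines); the function names and the "(collection)" suffix are hypothetical, not something the drafts define.

```python
# Illustrative records: 'bh' is copied from the excerpt above; the 'fr'
# record is abbreviated and included only for contrast.
SAMPLE = """%%
Type: language
Subtag: bh
Description: Bihari
Added: 2005-10-16
Scope: collection
%%
Type: language
Subtag: fr
Description: French
%%"""


def parse_records(text):
    """Parse record-jar text into dicts mapping field name -> list of values."""
    records, current, last_field = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":              # record separator
            if current:
                records.append(current)
            current, last_field = {}, None
        elif line.startswith(" ") and last_field:
            # Indented line continues the previous field's value
            current[last_field][-1] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            last_field = name.strip()
            current.setdefault(last_field, []).append(value.strip())
    if current:
        records.append(current)
    return records


def display_label(record):
    """Hypothetical UI label: mark collection subtags explicitly."""
    label = record["Description"][0]
    if record.get("Scope", [""])[0] == "collection":
        label += " (collection)"
    return label
```

With this, a picker would show "Bihari (collection)" for 'bh', while a tag such as "bh-IN" would still carry no such indication, as noted above.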

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14

Mark Davis | 1 Dec 07:10

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

I agree with Peter, on all counts. 


In particular, it muddies the waters when you cross-post, because it becomes completely unclear what you are asking for, why, and from whom.

Mark


On Sun, Nov 30, 2008 at 20:48, Peter Constable <petercon <at> microsoft.com> wrote:
> Anyway, the fact that families ARE encoded in addition with languages,
> and the fact that families are hierarchized as well, creates a hole
> that must be filled between families and languages...

If you mean that there is action needed on the part of the ISO 639 RAs or JAC, then that is out of scope for this WG and should be taken up elsewhere (with those bodies).

If you mean that something needs to be changed in the language-subtag registry, or in 4646bis, then I don't see any such need: ISO 639-3 provides very comprehensive coverage of languages, and there is not a lot of likelihood that users would need to tag content for some language not covered by 639-3 but potentially (perhaps not clearly) covered by one or more collective categories in 639-5. Also, the use case for collective categories in language tagging is, I think, not that great.


> it is ESSENTIAL that the labels displayed when selecting any collective
> code from a list containing ISO 639 codes of various scopes MUST reflect
> the fact that this is effectively a collection of distinct languages

If you mean that it must be clear when a subtag in the Language Subtag Registry represents a collection, then I completely agree.


> That's exactly the reverse decision that CLDR made...

That is an issue for the CLDR TC to consider and is out of scope for the LTRU WG.

Btw, can we please *not* cross-post items between LTRU and CLDR Users. (I've moved CLDR Users to bcc to that end.) CLDR Users is an informal discussion list for users of CLDR, while LTRU is a technical working group: IMO there cannot possibly be a discussion that would be appropriate for both lists.


> but ISO 639 is always wrong about these letters when it uses
> ASCII punctuation...

Please address concerns regarding ISO 639 to the relevant RA.


Peter


-----Original Message-----
From: verdy_p [mailto:verdy_p <at> wanadoo.fr]
Sent: Sunday, November 30, 2008 3:22 AM
To: Peter Constable; LTRU list; CLDR Users
Subject: RE: [Ltru] [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

"Peter Constable" <petercon <at> microsoft.com>
> The need for a language hierarchy (by families) is to simplify the search...
>
> An informal suggestion: while Ethnologue is not formally part of ISO 639, it is maintained so as to stay
> consistent with ISO 639, and ISO 639-3 makes use of Ethnologue as a source to clarify the denotation of its encoded
> categories. Since the Ethnologue site provides a comprehensive language-family classification, one could search on
> the Ethnologue site to find particular languages, and then follow the links provided to get to the corresponding
> ISO 639-3 entry.

That's exactly the kind of reason why we need such a classification ALSO in languages other than English. But without
a reliable codification of families, of their hierarchy (at least a minimal classification in the most important
groups, possibly excluding finely tuned intermediate subdivisions), and more importantly of the membership of
isolated languages and macrolanguages that are direct children of those families, building such hierarchy and
making it usable is illusory.

Anyway, the fact that families ARE encoded in addition with languages, and the fact that families are hierarchized
as well, creates a hole that must be filled between families and languages (this will close the mess that was
introduced in ISO 639-1/2 when exclusive (and unstable) family names were given, with various and non-interoperable
results about which languages get included or not in a search of results by family names).

Believe me, searching for terms within a complete language family, rather than a precise language name or even just a
macrolanguage, is not an unbelievable situation. Linguists do such things very often, notably when
looking for etymologies; translators also look for translated terms that were chosen in other related
languages; terminologists and advertisers or "brand builders" want to look for terms in families to check if a newly
chosen term for a given language may be misinterpreted by less qualified translators or readers of another
language.

Yes it's true that encoded texts should never be tagged and indexed directly by a family language code. But family
codes are as essential as language codes for full-text searches.

In addition, it is ESSENTIAL that the labels displayed when selecting any collective code from a list containing
ISO 639 codes of various scopes MUST reflect the fact that this is effectively a collection of distinct languages
(so, no more label that just displays "Apache" or "Bihari").

That's exactly the reverse decision that CLDR made, and I do think that this is an error (on the other hand, I
support the decision of dropping the "(Other)" word). If a short name is needed (without any plural mark and
without the "languages" word that generally comes with the language adjective), it should be encoded as a separate
variant in CLDR: this short name should be used only when displaying filtered lists that contain only collections.

Note that isolated short language names are generally nouns, but if they are used as a complement to an expression
containing "language(s)", then they are adjectives and may be written differently (sometimes not even with the same
words, despite the fact that, in general, the adjectives are simple derivations still needing some changes for marking the
plural, feminine or genitive cases, depending on the language used to name the referenced language).

Note also that some English names/descriptions used by ISO 639 and in the RFC 4645bis draft or in the IANA database
for BCP 47 may contain some non-capitalizable letters, but ISO 639 is always wrong about these letters when it uses
ASCII punctuation like "!" and math symbols like "/", "//" or "=/" or the ASCII apostrophe instead of true Latin clicks,
or drops the apostrophe letters in a way that makes the language name ambiguous or unreadable (note also that
the Ethnologue lists, for some but not all of these languages, synonyms using capitalizable letters only).

The ISO 639 documents say that they are themselves normally encoded with UTF-8 (possibly using numeric character
entities for the plain-text version), meaning that these documents should support Unicode characters and should not
use any ASCII substitutes... This is also true for the HTML version displayed online on the ISO 639/RA sites
(including on SIL.org), and the language names that were finally used in the English locale of the CLDR!



_______________________________________________
Ltru mailing list
Ltru <at> ietf.org
https://www.ietf.org/mailman/listinfo/ltru

CE Whitehead | 1 Dec 19:43

Re: 13th-hour WG Last Call comments

Hi!

Regarding the draft http://tools.ietf.org/html/draft-ietf-ltru-4646bis-18: notes on Section 2 (I sent this as a whole; it is mostly editorial. I can, however, break it up into two or maybe three emails: one containing strictly editorial comments/corrections, and one containing requests for clarifications).

I've just put in all my comments for Section 2 below; I had none on Section 1; I will have a few on Section 4.6; not sure about comments on other sections as I've not had time to go over those carefully.

Sorry for repeating my comments on Section 2; in my editing of Section 2.2 I forgot about the private-use subtags 'Qaaa' - 'Qabx' that are reserved for script codes, so I fixed that and decided it was best to send all the corrections as a unit.

Whether or not you take my edits, I do think you need to make one change:


2.2.6; Page 17; par 1; item 9; Example




{COMMENT: please correct the spelling of "extention.".}

> For example, if an extension were defined for the singleton 'r' and
it defined the subtags shown, then the following tag would be a valid
example: "en-Latn-GB-boont-r-extended-sequence-x-private"
Thanks.

C. E. Whitehead
cewcathar <at> hotmail.com


* * *
2.1; Page 4; par 2.
There are different types of subtag, each of which is distinguished
by length, position in the tag, and content: each subtag's type can
be recognized solely by these features. This makes it possible to
extract and assign some semantic information to the subtags, even if
the specific subtag values are not recognized. Thus, a language tag
processor need not have a list of valid tags or subtags (that is, a
copy of some version of the IANA Language Subtag Registry) in order
to perform common searching and matching operations. The only
exceptions to this ability to infer meaning from subtag structure are
the grandfathered tags listed in the productions 'regular' and
'irregular' below. These tags were registered under [RFC3066] and
are a fixed list that can never change. {COMMENTS:
1. The above paragraph is a bit awkward, hard to read}
> There are several/different types of subtags. A subtag's length and position in the language tag, as well as its content, serve to distinguish its subtag type. Thus it is possible to extract from and assign to the subtag some semantic information, even when the subtag's specific value is not recognized. Because of this, a language tag processor need not have a list of all valid tags or subtags (that is, a copy of the IANA Language Subtag Registry) in order to perform common searching and matching operations on the tag. Exceptions are the 'regular' and 'irregular' grandfathered tags, listed below. Semantic information cannot be extracted from the individual subtags that compose the grandfathered language tags because these tags are interpretable only as wholes. The order and length of the subtags that make up the 'irregular' grandfathered tags may in fact vary from the rules governing order and length of subtags specified in this document (see below). It is therefore possible, in an irregular grandfathered language tag, to have a variant subtag composed of fewer than four characters or a subtag denoting a region preceding one denoting a language, as in, for example, "sgn-BE-FR" (which denotes Belgian-French Sign Language, the sign language used in French-speaking Belgium). The grandfathered tags were registered under [RFC3066] and are a fixed list that can never change.
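The idea in this paragraph, that a processor can label subtags from length, position and character class alone, can be illustrated with a short Python sketch. It is a simplification that assumes a non-grandfathered tag, and it labels subtags without checking that the whole tag is well-formed; the function name `classify_subtags` and the label strings are hypothetical.

```python
def _alpha(s):
    return s.isascii() and s.isalpha()


def _digit(s):
    return s.isascii() and s.isdigit()


def _alnum(s):
    return s.isascii() and s.isalnum()


def classify_subtags(tag):
    """Label each subtag using only length, position and character class.

    Sketch only: assumes a non-grandfathered tag and does not verify that
    the whole tag is well-formed.
    """
    subtags = tag.split("-")
    if subtags[0].lower() == "x":                      # private use tag
        return [(s, "privateuse") for s in subtags]
    labels = [(subtags[0], "language")]
    i, n = 1, len(subtags)
    # Up to three 3-letter extlang subtags may follow a 2-3 letter language
    if 2 <= len(subtags[0]) <= 3:
        while i < n and i <= 3 and len(subtags[i]) == 3 and _alpha(subtags[i]):
            labels.append((subtags[i], "extlang"))
            i += 1
    if i < n and len(subtags[i]) == 4 and _alpha(subtags[i]):
        labels.append((subtags[i], "script"))
        i += 1
    if i < n and ((len(subtags[i]) == 2 and _alpha(subtags[i]))
                  or (len(subtags[i]) == 3 and _digit(subtags[i]))):
        labels.append((subtags[i], "region"))
        i += 1
    while i < n and _alnum(subtags[i]) and (
            5 <= len(subtags[i]) <= 8
            or (len(subtags[i]) == 4 and subtags[i][0].isdigit())):
        labels.append((subtags[i], "variant"))
        i += 1
    # Remaining subtags belong to extension or private use sequences,
    # each introduced by a single-character subtag (singleton)
    label = "unexpected"
    for s in subtags[i:]:
        if label != "privateuse" and len(s) == 1:
            label = "privateuse" if s.lower() == "x" else "extension"
        labels.append((s, label))
    return labels
```

Running it on "sgn-BE-FR" labels 'FR' as "unexpected", which is one way to see why the grandfathered tags have to be recognized as wholes.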
  * * *
* * *
2.1; page 4; par. 3
The syntax of the language tag in ABNF [RFC5234] is:
{COMMENTS:
1. It is confusing to have the whole list that follows with no real introduction; what about a short definition of the syntax first?}

> The language tag (referred to below as langtag) in ABNF [RFC5234] is composed of one (1) language (or macrolanguage?) subtag, followed by--in order--zero or one extended language subtag, zero or one script subtag, zero or one region subtag, zero or more variant subtags, and zero or more singletons (single-character prefixes) introducing extension subtags, with each singleton followed by one or more extension subtags. The language tag may end with, or consist entirely of, a sequence of private use subtags. These are introduced by the singleton x. The singleton x indicates that all content that follows is governed by private agreement only, and that its meaning and arrangement are not covered in this document (see Sections 2.2.7 and 4.6 below for more information on private use subtags).

Rules governing the length of the various types of subtags are specified below and in Section 2.2.
 
As noted above, grandfathered language tags are not governed by all of the above rules of syntax (for more on grandfathered tags, see 2.2.8, below).

* * *
* * *
2.1, pages 4-6
{COMMENTS:

1. My main comment on the section below is that I find it very distracting and bizarre to use the asterisk [*] to indicate a variable number of letters and numbers; for example, 5*8 indicates 5 to 8 letters or digits in a variant subtag beginning with a letter. I would prefer some other symbol: 5 - 8, or 5-8, or 5to8?? Thanks.}

Language-Tag = langtag ; normal language tags
/ privateuse ; private use tag
/ grandfathered ; grandfathered tags

langtag = language
["-" script]
["-" region]
*("-" variant)
*("-" extension)
["-" privateuse]

language = 2*3ALPHA ; shortest ISO 639 code
["-" extlang] ; sometimes followed by
; extended language subtags
/ 4ALPHA ; or reserved for future use
/ 5*8ALPHA ; or registered language subtag




Phillips & Davis Expires May 4, 2009 [Page 5]

Internet-Draft language-tags October 2008


extlang = 3ALPHA ; selected ISO 639 codes
*2("-" 3ALPHA) ; permanently reserved

script = 4ALPHA ; ISO 15924 code

region = 2ALPHA ; ISO 3166-1 code
/ 3DIGIT ; UN M.49 code

variant = 5*8alphanum ; registered variants
/ (DIGIT 3alphanum)

extension = singleton 1*("-" (2*8alphanum))

; Single alphanumerics
; "x" reserved for private use
singleton = %x41-57 ; A - W
/ %x59-5A ; Y - Z
/ %x61-77 ; a - w
/ %x79-7A ; y - z
/ DIGIT ; 0 - 9


privateuse = "x" 1*("-" (1*8alphanum))

grandfathered = irregular ; non-redundant tags registered
/ regular ; during the RFC 3066 era


irregular = "en-GB-oed" ; irregular tags do not match
/ "i-ami" ; the 'langtag' production and
/ "i-bnn" ; would not otherwise be
/ "i-default" ; considered 'well-formed'
/ "i-enochian" ; These tags are all valid,
/ "i-hak" ; but most are deprecated
/ "i-klingon" ; in favor of more modern
/ "i-lux" ; subtags or subtag
/ "i-mingo" ; combination
/ "i-navajo"
/ "i-pwn"
/ "i-tao"
/ "i-tay"
/ "i-tsu"
/ "sgn-BE-FR"
/ "sgn-BE-NL"
/ "sgn-CH-DE"








regular = "art-lojban" ; these tags match the 'langtag'
/ "cel-gaulish" ; production, but their subtags
/ "no-bok" ; are not extended language
/ "no-nyn" ; or variant subtags: their meaning
/ "zh-guoyu" ; is defined by their registration
/ "zh-hakka" ; and all of these are deprecated
/ "zh-min" ; in favor of a more modern
/ "zh-min-nan" ; subtag or sequence of subtags
/ "zh-xiang"

alphanum = (ALPHA / DIGIT) ; letters and numbers

Figure 1: Language Tag ABNF
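For a quick check, the productions in Figure 1 can be transcribed into a regular expression. This is a minimal Python sketch that assumes a regex is an acceptable stand-in for a real ABNF parser; it tests well-formedness only and does not consult the registry, and the constant and function names are illustrative.

```python
import re

# Regex transcription of the Figure 1 productions (a sketch, not a parser)
LANGUAGE = r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"  # incl. extlang
SCRIPT = r"[a-z]{4}"                                    # ISO 15924
REGION = r"(?:[a-z]{2}|[0-9]{3})"                       # ISO 3166-1 / UN M.49
VARIANT = r"(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3})"
SINGLETON = r"[0-9a-wyz]"                               # alphanumerics except x
EXTENSION = SINGLETON + r"(?:-[a-z0-9]{2,8})+"
PRIVATEUSE = r"x(?:-[a-z0-9]{1,8})+"

LANGTAG = (rf"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?"
           rf"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?")

# IGNORECASE: tags are case-insensitive. ASCII: keeps [a-z] from matching
# non-ASCII characters through Unicode case folding.
LANGUAGE_TAG = re.compile(rf"(?:{LANGTAG}|{PRIVATEUSE})", re.IGNORECASE | re.ASCII)

# The 'irregular' and 'regular' productions, lowercased
GRANDFATHERED = frozenset({
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
})


def is_well_formed(tag):
    """True if tag matches the 'Language-Tag' production of Figure 1."""
    if tag.lower() in GRANDFATHERED:
        return True
    return LANGUAGE_TAG.fullmatch(tag) is not None
```

For example, "en-Latn-GB-boont-r-extended-sequence-x-private" and "en-GB-oed" are accepted, while "en-a" (a singleton with nothing after it) is not.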
* * *
* * *
2.1, p. 6, par 5
There is a subtlety in the ABNF
production 'variant':

{COMMENTS:
1. Do you need this comment about 'subtlety'? The information that follows this--see below--is sufficient!}

> {NULL, nothing here needed.}
* * *
* * *
a variant starting with a digit has a minimum
length of four characters, while those starting with a letter have a
minimum length of five characters.

{COMMENT:
1. If you delete the sentence above, beginning, "There is a subtlety . . . ," then make the 'a' upper case.}

> A variant starting with a digit . . .
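The rule quoted above (a minimum of four characters when the variant starts with a digit, five when it starts with a letter, and at most eight per the 'variant' production) is small enough to express directly. A Python sketch; `is_variant` is a hypothetical helper name.

```python
def is_variant(subtag):
    """Apply the 'variant' production: 5*8alphanum / (DIGIT 3alphanum)."""
    if not (subtag.isascii() and subtag.isalnum()):
        return False
    if subtag[0].isdigit():
        return 4 <= len(subtag) <= 8   # a leading digit allows 4 characters
    return 5 <= len(subtag) <= 8       # a leading letter requires at least 5
```

So "1901" and "boont" qualify, while "abcd" does not (four letters in that position would be read as a script subtag).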


* * *

* * *



2.2. Language Subtag Sources and Interpretation


p. 8, par 3

Language tags are designed so that each subtag type has unique length
and content restrictions. These make identification of the subtag's
type possible, even if the content of the subtag itself is
unrecognized. This allows tags to be parsed and processed without
reference to the latest version of the underlying standards or the
IANA registry and makes the associated exception handling when
parsing tags simpler.

{COMMENTS:

1. last sentence is awkward and wordy; change.}

> This allows tags to be parsed and processed without reference to either the latest version of the underlying standards or the IANA registry, and simplifies exception handling associated with the parsing of language tags.

* * *


* * *
2.2 p. 8, par 5

{CLARIFICATION??
NEED CLARIFICATION of the first sentence in the paragraph below:

What about the reserved sequences for private language subtags ranging from 'qaa' to 'qtz', and the reserved sequences for private region subtags, including 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ'?
These do not follow the singleton x and so occur in their normal position in the language tag, right?
}

Sequences of private use and extension subtags MUST occur at the end
of the sequence of subtags and MUST NOT be interspersed with subtags
defined elsewhere in this document. These sequences are introduced
by single-character subtags, which are reserved as follows:

> Sequences of private use (excepting those sequences reserved for language subtags--ranging from 'qaa' to 'qtz', those reserved for script subtags--ranging from 'Qaaa' to 'Qabx', and those reserved for region subtags--including 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ') and extension subtags
MUST occur at the end of the sequence of subtags and MUST NOT be interspersed with subtags defined elsewhere in this document.

* * *



* * *


2.2.1. Primary Language Subtag, p. 10, par 1, item 5




5. Any language subtags of 5 to 8 characters in length in the IANA
registry were defined via the registration process in Section 3.5
and MAY be used to form the primary language subtag. An example
of what such a registration might include: one of the
grandfathered IANA registrations is "i-enochian". The subtag
'enochian' could be registered in the IANA registry as a primary
language subtag (assuming that ISO 639 does not register this
language first), making tags such as "enochian-AQ" and "enochian-
Latn" valid.

At the time this document was created, there were no examples of
this kind of subtag and future registrations of this type are
discouraged: primary languages are strongly RECOMMENDED for
registration with ISO 639, and proposals rejected by ISO 639/
RA-JAC will be closely scrutinized before they are registered
with IANA.

{ COMMENT:
1. A transition--such as "However," "The above notwithstanding," etc.--is needed to go to the second paragraph in item 5 above.}

> However, at the time this document was created, there were no examples of
this kind of subtag and future registrations of this type are
discouraged: primary languages are strongly RECOMMENDED for
registration with ISO 639, and proposals rejected by ISO 639/
RA-JAC will be closely scrutinized before they are registered
with IANA.

* * *


* * *
2.2.1, p. 11, par. 6



{COMMENT/CLARIFICATION?:

What is the reference for "these" in the first sentence of the paragraph below?
I do not think "these" is needed; that is I do not think you need to refer back to a prior mention of these problems.
}
To avoid these problems with versioning and subtag choice (as
experienced during the transition between RFC 1766 and RFC 3066), as
well as to ensure the canonical nature of subtags defined by this
document, the ISO 639 Registration Authority Joint Advisory Committee
(ISO 639/RA-JAC) has included the following statement in
[iso639.prin]:

> To avoid problems with versioning and subtag choice
(as
experienced during the transition between RFC 1766 and RFC 3066), as
well as to ensure the canonical nature of subtags defined by this
document, the ISO 639 Registration Authority Joint Advisory Committee
(ISO 639/RA-JAC) has included the following statement in
[iso639.prin]:

* * *
"A language code already in ISO 639-2 at the point of freezing ISO
639-1 shall not later be added to ISO 639-1. This is to ensure
consistency in usage over time, since users are directed in
Internet applications to employ the alpha-3 code when an alpha-2
code for that language is not available."

* * *

2.2.2. Extended Language Subtags; p. 11, par 1

{COMMENT: I think the fourth sentence of this paragraph should say "REQUIRED", not "RECOMMENDED"; this is not the same as the capitalization rules, which are simply recommendations; I assume that a language tag without a primary language subtag will not be parsed properly.}

Extended language subtags are used to identify certain specially-
selected languages that, for various historical and compatibility
reasons, are closely identified with or tagged using an existing
primary language subtag. Extended language subtags are always used
with their enclosing primary language subtag (indicated with a
'Prefix' field in the registry) when used to form the language tag.
All languages that have an extended language subtag in the registry
also have an identical primary language subtag record in the
registry. This primary language subtag is RECOMMENDED for forming
the language tag. The following rules apply to the extended language
subtags:
> This primary language subtag is REQUIRED. {END OF SENTENCE.}

* * *
* * *
2.2.2. p. 11; par 1; item 4.

{CLARIFICATION?? Item 4 is confusing to me; does this mean that while 3 extended language subtags are permitted in theory, only one is permitted in fact? Or does this mean that three extended language subtags are permitted in fact when they have the same language subtag for a prefix, and that they must all share that one prefix? Does this ever happen? It would be nice to clarify this in the draft.}

Although the ABNF production 'extlang' permits up to three
extended language tags in the language tag, extended language
subtags MUST NOT include another extended language subtag in
their Prefix. That is, the second and third extended language
subtag positions in a language tag are permanently reserved and
tags that include subtags in that position are invalid.

* * *
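Read literally, item 4 means a valid tag carries at most one extended language subtag, and that can be checked from the tag alone: any 3-letter subtag directly after a 2- or 3-letter primary language subtag can only be an extlang under the ABNF. A rough Python sketch, assuming a well-formed, non-grandfathered tag; the helper names are hypothetical.

```python
def extlang_count(tag):
    """Count 3-letter subtags directly after a 2- or 3-letter primary
    language subtag (the slots the 'extlang' production allows)."""
    subtags = tag.split("-")
    if not 2 <= len(subtags[0]) <= 3:
        return 0
    count = 0
    for sub in subtags[1:4]:                # at most three extlang slots
        if len(sub) == 3 and sub.isascii() and sub.isalpha():
            count += 1
        else:
            break
    return count


def extlang_slots_ok(tag):
    """Second and third extlang positions are permanently reserved."""
    return extlang_count(tag) <= 1
```

Under this reading, "zh-cmn-Hans-CN" passes, while a tag such as "zh-gan-yue" is well-formed but invalid.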
* * *
2.2.2; page 12; par 2.


For example, the macrolanguage Chinese ('zh') encompasses a number of
languages. For compatibility reasons, each of these languages has
both a primary and extended language subtag in the registry. A few
selected examples of these include Gan Chinese ('gan'), Cantonese
Chinese ('yue') and Mandarin Chinese ('cmn'). Each is encompassed by
the macrolanguage 'zh' (Chinese). Therefore, they each have the
prefix "zh" in their registry records. Thus Gan Chinese is
represented with tags beginning "zh-gan" or "gan"; Cantonese with
tags beginning either "yue" or "zh-yue"; and Mandarin Chinese with
"zh-cmn" or "cmn". The language subtag 'zh' can still be used
without an extended language subtag to label a resource as some
unspecified variety of Chinese, while the primary language subtag
('gan', 'yue', 'cmn') is preferred to using the extended language
form ("zh-gan", "zh-yue", "zh-cmn").

{CLARIFICATION??: Are there ever instances where zh-cmn might be preferable? For example, with written Chinese, targeting primarily a Mandarin-speaking audience?}

* * *
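The preference stated in this paragraph (primary language form over extended language form) amounts to a simple rewrite whenever the extlang's registered Prefix matches the tag's primary language subtag. A Python sketch; the three-entry `EXTLANG_PREFIX` table is an illustrative subset built from the examples above, standing in for the registry's extlang records.

```python
# Illustrative subset of extlang records (Subtag -> Prefix), taken from the
# examples in this paragraph; a real tool would read the registry instead.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary_form(tag):
    """Rewrite 'prefix-extlang-...' as 'extlang-...' when the extlang's
    registered Prefix matches the tag's primary language subtag."""
    subtags = tag.split("-")
    if len(subtags) >= 2:
        prefix, ext = subtags[0].lower(), subtags[1].lower()
        if EXTLANG_PREFIX.get(ext) == prefix:
            return "-".join([subtags[1]] + subtags[2:])
    return tag
```

Tags such as "zh-Hant" or plain "zh" are left alone, matching the note that 'zh' on its own still labels an unspecified variety of Chinese.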

* * *
2.2.3. Script Subtag; page 12; par 1; item 4.


4. There MUST be at most one script subtag in a language tag, and
the script subtag SHOULD be omitted when it adds no
distinguishing value to the tag or when the primary or extended
language subtag's record in the subtag registry includes a
'Suppress-Script' field listing the applicable script subtag.

For example: "sr-Latn" represents Serbian written using the Latin
script.

{COMMENT: I want to add a second example here that would not be appropriate, so it will be clearer to readers--might even prefer to replace the first example. Or is the ex. getting too detailed?}

INSERT > The tag "uk-Latn" similarly represents Ukrainian written in the Latin script. The tag "uk", on the other hand, indicates Ukrainian written in Cyrillic script. The tag "uk-Cyrl" is well-formed, but it is not recommended: a Suppress-Script of Cyrillic is registered for Ukrainian, so the script subtag should not be used; the tag "uk" is all that is needed in this case.

* * *
* * *
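The Suppress-Script rule in item 4 of Section 2.2.3 lends itself to a small normalization step. In this Python sketch the two table entries ('en' with Latn, 'uk' with Cyrl) are an illustrative assumption standing in for the registry's Suppress-Script fields, and only a script subtag that immediately follows the primary language subtag is handled.

```python
# Illustrative stand-in for registry Suppress-Script fields
SUPPRESS_SCRIPT = {"en": "Latn", "uk": "Cyrl"}


def drop_suppressed_script(tag):
    """Remove a script subtag that merely repeats the language's
    registered Suppress-Script value (case-insensitive)."""
    subtags = tag.split("-")
    if len(subtags) >= 2 and len(subtags[1]) == 4:
        suppressed = SUPPRESS_SCRIPT.get(subtags[0].lower())
        if suppressed is not None and suppressed.lower() == subtags[1].lower():
            return "-".join([subtags[0]] + subtags[2:])
    return tag
```

With this table, "uk-Cyrl" becomes "uk", while "uk-Latn" and "sr-Latn" keep their script subtags.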



2.2.4. Region Subtag; page 12; par 1; item 2




2. Two-letter region subtags were defined according to the
assignments found in [ISO3166-1] ("Codes for the representation
of names of countries and their subdivisions -- Part 1: Country
codes") using the list of alpha-2 country codes, or using
assignments subsequently made by the ISO 3166-1 maintenance
agency or governing standardization bodies. In addition, the
codes that are "exceptionally reserved" (as opposed to
"assigned") in ISO 3166-1 were also defined in the registry, with
the exception of 'UK', which is an exact synonym for the assigned
code 'GB'.

{COMMENT: First, elsewhere you use the present tense with "defined"; these subtags are still so defined, right? If so, I think the present tense is best, as you use it elsewhere. Second, I inserted "in the registry" after "defined"; it seems more explicit. Third, in the clause that begins "or using assignments," "using" appears to be in parallel to the word "using" in the phrase "using the list of alpha-2 . . ."--because of the repetition of "using"; at least it sounds this way to my ear; but it's actually parallel grammatically and semantically to the clause that begins "according to"--so I changed "using" to "according to" to emphasize this parallelism.}

> 2. Two-letter region subtags are defined in the registry according to assignments found in [ISO3166-1] ("Codes for the representation
of names of countries and their subdivisions -- Part 1: Country
codes") using the list of alpha-2 country codes, or according to
assignments subsequently made by the ISO 3166-1 maintenance
agency or governing standardization bodies.
* * *


* * *
2.2.4; page 13; par 1; item 4C

C. When ISO 3166-1 reassigns a code formerly used for one
country or area to another country or area and that code
already is present in the registry, the UN numeric code for
that country or area MUST be registered in the registry as
described in Section 3.4 and MUST be used to form language
tags that represent the country or region for which it is
defined (rather than the recycled ISO 3166-1 code).

{COMMENT: The wording is a bit confusing; insert "to which the code has been reassigned"; change the order a bit so that "rather than the recycled ISO 3166-1 code" is next to the info. on the U.N. numeric code.}

> When ISO 3166-1 reassigns a code formerly used for one
country or area to another country or area and that code
already is present in the registry, the UN numeric code for
the country or area to which the code has been reassigned--and not the recycled ISO 3166-1 code--MUST be registered in the registry as
described in Section 3.4 and MUST be used to form language
tags that represent the country or region for which it is
defined.

* * *

* * *


2.2.5. Variant Subtags; p. 15; par 1; item 1

Variant subtags are used to indicate additional, well-recognized
variations that define a language or its dialects that are not
covered by other available subtags. The following rules apply to the
variant subtags:

1. Variant subtags MUST follow any primary language, extended
language, script, or region subtags, and MUST precede any
extension or private use subtag sequences
{COMMENTS: again, I think you only mean those private-use subtags preceded by the singleton x; you might insert at end the following}
INSERT > (again, private use subtags do not include those reserved codes 'qaa'- . . .).

* * *

* * *
2.2.6; Page 17; par 1; item 9; Example





9. In the event that more than one extension appears in a single
tag, the tag SHOULD be canonicalized as described in Section 4.5.

For example, if an extention were defined for the singleton 'r' and
it defined the subtags shown, then the following tag would be a valid
example: "en-Latn-GB-boont-r-extended-sequence-x-private"

{COMMENT: please correct the spelling of "extention." This you have to do I think.}

>
For example, if an extension were defined for the singleton 'r' and
it defined the subtags shown, then the following tag would be a valid
example: "en-Latn-GB-boont-r-extended-sequence-x-private"
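Item 9 defers to the canonicalization in Section 4.5. Assuming that step orders extension sequences by their singleton in case-insensitive ASCII order, with any private use sequence kept last, a Python sketch might look like this. It assumes a well-formed, non-grandfathered tag that does not itself start with 'x', and the 'a' extension used in the tests is as hypothetical as the draft's 'r'.

```python
def order_extensions(tag):
    """Reorder extension sequences by singleton (case-insensitive),
    leaving the leading subtags and any private use sequence in place."""
    subtags = tag.split("-")
    head = [subtags[0]]         # language, extlang, script, region, variants
    sequences = []              # one list per extension sequence
    private = []                # 'x' and everything after it
    for pos in range(1, len(subtags)):
        sub = subtags[pos]
        if sub.lower() == "x":
            private = subtags[pos:]
            break
        if len(sub) == 1:
            sequences.append([sub])             # a singleton starts a sequence
        elif sequences:
            sequences[-1].append(sub)
        else:
            head.append(sub)
    # list.sort is stable; a valid tag never repeats a singleton anyway
    sequences.sort(key=lambda seq: seq[0].lower())
    flat = [s for seq in sequences for s in seq]
    return "-".join(head + flat + private)
```

The draft's example tag has a single extension, so it comes back unchanged; the ordering only matters once two or more extensions appear.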

* * *





* * *
2.2.9; page 20; par 4


reference [RFC3066]. A wider array of tags was considered 'well-
formed' under that document. Any tags that were valid for use under
RFC 3066 are both 'well-formed' and 'valid' under this document's
syntax; only invalid or illegal tags were well-formed by the early
definition but no longer are. The language tag syntax under RFC 3066
was:

obs-language-tag = primary-subtag *( "-" subtag )
primary-subtag = 1*8ALPHA
subtag = 1*8(ALPHA / DIGIT)

Figure 2: RFC 3066 Language Tag Syntax


{CLARIFICATION?: I'm still not clear as to what tags were well-formed but no longer are; I personally do not see the need of this comment; I guess though that you mean that these
tags used subtags that were not in the registry but were otherwise well-formed??? If that's what you mean it's not clear.}
* * *


Best,

C. E. Whitehead
cewcathar <at> hotmail.com
Internet-Drafts | 1 Dec 22:30

I-D ACTION:draft-ietf-ltru-4645bis-08.txt

A New Internet-Draft is available from the on-line Internet-Drafts 
directories.
This draft is a work item of the Language Tag Registry Update Working Group of the IETF.

	Title		: Update to the Language Subtag Registry
	Author(s)	: D. Ewell
	Filename	: draft-ietf-ltru-4645bis-08.txt
	Pages		: 948
	Date		: 2008-12-1
	
This memo defines the procedure used to update the IANA Language
   Subtag Registry in conjunction with the publication of RFC 4646bis
   [RFC EDITOR NOTE: replace with actual RFC number], for use in forming
   tags for identifying languages.  As an Internet-Draft, it also
   contained a complete replacement of the contents of the Registry to
   be used by IANA in updating it.  To prevent confusion, this material
   was removed before publication.

A URL for this Internet-Draft is:
http://www.ietf.org/internet-drafts/draft-ietf-ltru-4645bis-08.txt

Internet-Drafts are also available by anonymous FTP at:
ftp://ftp.ietf.org/internet-drafts/

Below is the data which will enable a MIME compliant mail reader
implementation to automatically retrieve the ASCII version of the
Internet-Draft.
Attachment (draft-ietf-ltru-4645bis-08.txt): message/external-body, 69 bytes
Phillips, Addison | 3 Dec 06:15

Re: 13th-hour WG Last Call comments

(editor hat)

 

I find the comments I can read to be editorial in nature or wrong altogether. Alas, much of the document below was rendered to whitespace by the mailer, so I can’t examine it.  I have examined Page 18 (which I believe you mean below, as page numbers are at the BOTTOM of each page, not the top). Not being able to see your comment, I interpolate that you mean that the item probably could be more direct and I have made it clearer.

 

Addison

 

Addison Phillips

Globalization Architect -- Lab126

 

Internationalization is not a feature.

It is an architecture.

 

From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of CE Whitehead
Sent: Monday, December 01, 2008 10:44 AM
To: ltru <at> ietf.org
Subject: Re: [Ltru] 13th-hour WG Last Call comments

 


Hi!

Regarding the draft  http://tools.ietf.org/html/draft-ietf-ltru-4646bis-18 notes
  Section 2 Comments (I sent this as a whole; it is mostly editorial; I can, however, break it up into two or maybe three emails, one containing strictly editorial comments/corrections and one containing requests for clarifications.)

I've just put in all my comments for Section 2 below; I had none on Section 1; I will have a few on Section 4.6; not sure about comments on other sections as I've not had time to go over those carefully.

Sorry for repeating my comments on Section 2; in my editing of Section 2.2 I forgot about the private-use subtags 'Qaaa' - 'Qabx' that are reserved for script codes, so I fixed that and decided it best to send the whole set of corrections as a unit.

Whether or not you take my edits, I do think you need to make one change:

2.2.6; Page 17; par 1; item 9; Example
    


 
{COMMENT:  please correct the spelling of "extention.".}

> For example, if an extension were defined for the singleton 'r' and
   it defined the subtags shown, then the following tag would be a valid
   example: "en-Latn-GB-boont-r-extended-sequence-x-private"


Thanks.

C. E. Whitehead
cewcathar <at> hotmail.com


* * *
2.1; Page 4; par 2.
   There are different types of subtag, each of which is distinguished
   by length, position in the tag, and content: each subtag's type can
   be recognized solely by these features.  This makes it possible to
   extract and assign some semantic information to the subtags, even if
   the specific subtag values are not recognized.  Thus, a language tag
   processor need not have a list of valid tags or subtags (that is, a
   copy of some version of the IANA Language Subtag Registry) in order
   to perform common searching and matching operations.  The only
   exceptions to this ability to infer meaning from subtag structure are
   the grandfathered tags listed in the productions 'regular' and
   'irregular' below.  These tags were registered under [RFC3066] and
   are a fixed list that can never change.

 

{COMMENTS:
1.  The above paragraph is a bit awkward, hard to read}

>  There are several/different types of subtags.  A subtag's length and position in the language tag, as well as its content, serves to distinguish its subtag type.  Thus it is possible to extract from and assign to the subtag some semantic information, even when the subtag's specific value is not recognized.  Because of this, a language tag processor need not have a list of all valid tags or subtags (that is, a copy of the IANA Language Subtag Registry) in order to perform common searching and matching operations on the tag.  Exceptions are the 'regular' and 'irregular' grandfathered tags, listed below.  Semantic information cannot be extracted from the individual subtags that compose the grandfathered language tags because these tags are interpretable only as wholes.  The order and length of the subtags that make up the 'irregular' grandfathered tags may in fact vary from the rules governing order and length of subtags specified in this document (see below).  It is therefore possible, in an irregular grandfathered language tag, to have a variant subtag composed of fewer than four characters or a subtag denoting a region preceding one denoting a language, as in, for example, "sgn-BE-fr" (which denotes Belgian-French Sign Language).  The grandfathered tags were registered under [RFC3066] and are a fixed list that can never change.
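The point of the paragraph above (that a subtag's type follows from its length, position, and content alone) can be illustrated with a small classifier. This is a sketch only: the function name classify_subtags is hypothetical, it consults no registry data, and it assumes the input follows the 'langtag' production, so grandfathered tags such as "sgn-BE-FR" would have to be looked up as whole strings instead.

```python
import re


def classify_subtags(tag):
    """Label each subtag of a 'langtag'-shaped tag using only its length,
    position, and content (no registry lookup).

    Hypothetical helper: grandfathered tags are not handled and must be
    matched as whole strings instead.
    """
    parts = tag.split("-")
    labels = [("language", parts[0])]
    i = 1
    # After a 2-3 letter primary subtag, up to three 3-letter subtags
    # occupy the extlang positions.
    if len(parts[0]) <= 3:
        while i <= 3 and i < len(parts) and re.fullmatch(r"[A-Za-z]{3}", parts[i]):
            labels.append(("extlang", parts[i]))
            i += 1
    # Script: exactly four letters.
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{4}", parts[i]):
        labels.append(("script", parts[i]))
        i += 1
    # Region: two letters or three digits.
    if i < len(parts) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", parts[i]):
        labels.append(("region", parts[i]))
        i += 1
    # Variants: 5-8 alphanumerics, or a digit followed by three alphanumerics.
    while i < len(parts) and re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", parts[i]):
        labels.append(("variant", parts[i]))
        i += 1
    # The rest belongs to extension sequences (each introduced by a singleton)
    # or to the private use sequence (introduced by "x", running to the end).
    while i < len(parts):
        kind = "privateuse" if parts[i].lower() == "x" else "extension"
        labels.append((kind, parts[i]))
        i += 1
        while i < len(parts) and (kind == "privateuse" or len(parts[i]) > 1):
            labels.append((kind, parts[i]))
            i += 1
    return labels
```

Nothing in the function knows that 'boont' is a registered variant; in "en-Latn-GB-boont" it is labelled a variant purely because it is five characters long and follows the region.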

 

    

* * *

* * *

2.1; page 4; par. 3
The syntax of the language tag in ABNF [RFC5234] is:
{COMMENTS:
1.  It is confusing to have the whole list that follows with no real introduction; what about a short definition of the syntax first?}

 

>  The language tag (referred to below as langtag) in ABNF [RFC5234] is composed of one (1) language (or macrolanguage?) subtag, followed by, in order, zero or one extended language subtag, zero or one script subtag, zero or one region subtag, zero or more variant subtags, and zero or more singletons (single-character prefixes) for extension language subtags, with each singleton followed by one or more extension language subtags.

The language tag may be interrupted anywhere in the tag (including initial position) by private use subtags.  These are introduced by the singleton x.  The singleton x indicates that all content that follows is governed by private agreement only, and that its meaning and arrangement are not covered in this document (see Sections 2.2.7 and 4.6 below for more information on private use subtags).

Rules governing the length of the various types of subtags are specified below and in Section 2.2.
 
As noted above, grandfathered language tags are not governed by all of the above rules of syntax (for more on grandfathered tags, see 2.2.8, below).

* * *

* * *
2.1, pages 4-6

{COMMENTS:

1.  My main comment on the section below is that I find it very distracting and bizarre to use the asterisk [*] to indicate a variable number of letters and numbers, for example 5*8 indicates 5 to 8 letters in a variant subtag beginning with a letter;

I would prefer some other symbol-- 5 - 8; or 5-8 or 5to8??  Thanks.}

Language-Tag  = langtag             ; normal language tags
               / privateuse          ; private use tag
               / grandfathered       ; grandfathered tags

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

 language      = 2*3ALPHA            ; shortest ISO 639 code
                 ["-" extlang]       ; sometimes followed by
                                     ;   extended language subtags
               / 4ALPHA              ; or reserved for future use
               / 5*8ALPHA            ; or registered language subtag

 extlang       = 3ALPHA              ; selected ISO 639 codes
                 *2("-" 3ALPHA)      ; permanently reserved

 script        = 4ALPHA              ; ISO 15924 code

 region        = 2ALPHA              ; ISO 3166-1 code
               / 3DIGIT              ; UN M.49 code

 variant       = 5*8alphanum         ; registered variants
               / (DIGIT 3alphanum)

 extension     = singleton 1*("-" (2*8alphanum))

                                     ; Single alphanumerics
                                     ; "x" reserved for private use
 singleton     = %x41-57             ; A - W
               / %x59-5A             ; Y - Z
               / %x61-77             ; a - w
               / %x79-7A             ; y - z
               / DIGIT               ; 0 - 9


 privateuse    = "x" 1*("-" (1*8alphanum))

 grandfathered = irregular           ; non-redundant tags registered
               / regular             ;   during the RFC 3066 era


 irregular     = "en-GB-oed"         ; irregular tags do not match
               / "i-ami"             ; the 'langtag' production and
               / "i-bnn"             ; would not otherwise be
               / "i-default"         ; considered 'well-formed'
               / "i-enochian"        ; These tags are all valid,
               / "i-hak"             ; but most are deprecated
               / "i-klingon"         ; in favor of more modern
               / "i-lux"             ; subtags or subtag
               / "i-mingo"           ; combination
               / "i-navajo"
               / "i-pwn"
               / "i-tao"
               / "i-tay"
               / "i-tsu"
               / "sgn-BE-FR"
               / "sgn-BE-NL"
               / "sgn-CH-DE"

 regular       = "art-lojban"        ; these tags match the 'langtag'
               / "cel-gaulish"       ; production, but their subtags
               / "no-bok"            ; are not extended language
               / "no-nyn"            ; or variant subtags: their meaning
               / "zh-guoyu"          ; is defined by their registration
               / "zh-hakka"          ; and all of these are deprecated
               / "zh-min"            ; in favor of a more modern
               / "zh-min-nan"        ; subtag or sequence of subtags
               / "zh-xiang"

 alphanum      = (ALPHA / DIGIT)     ; letters and numbers

                        Figure 1: Language Tag ABNF
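For readers who find the ABNF repetition notation (the '5*8' syntax questioned above) hard to follow, the productions in Figure 1 can be transcribed into a regular expression. The Python sketch below is an illustration, not a reference implementation: the names LANGTAG, PRIVATEUSE, GRANDFATHERED, and is_well_formed are hypothetical, and the check covers well-formedness only. It does not consult the registry, so it says nothing about validity.

```python
import re

# Transcription of the 'langtag' production in Figure 1.
LANGTAG = re.compile(
    r"""
    (?: [A-Za-z]{2,3} (?:-[A-Za-z]{3}){0,3}   # 2*3ALPHA, optional extlang (up to 3)
      | [A-Za-z]{4}                           # reserved for future use
      | [A-Za-z]{5,8} )                       # registered language subtag
    (?:-[A-Za-z]{4})?                         # script
    (?:-(?:[A-Za-z]{2}|[0-9]{3}))?            # region
    (?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*    # variant
    (?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*     # extension (any singleton but x)
    (?:-[Xx](?:-[A-Za-z0-9]{1,8})+)?          # privateuse
    """,
    re.VERBOSE,
)

# A tag consisting only of private use subtags.
PRIVATEUSE = re.compile(r"[Xx](?:-[A-Za-z0-9]{1,8})+")

# The 'irregular' and 'regular' grandfathered tags, compared case-insensitively.
GRANDFATHERED = {
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
}


def is_well_formed(tag):
    """True if tag matches the Language-Tag production of Figure 1."""
    if tag.lower() in GRANDFATHERED:
        return True
    return bool(LANGTAG.fullmatch(tag) or PRIVATEUSE.fullmatch(tag))
```

The grandfathered list is checked first because the 'irregular' tags, such as "sgn-BE-FR", do not match the 'langtag' production at all.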
* * *

* * *
2.1, p. 6, par 5

There is a subtlety in the ABNF
   production 'variant':

{COMMENTS:
1.  ???  Do you need this comment about 'subtlety'?; the information that follows this--see below--is sufficient!}

> {NULL, nothing here needed.}
* * *
* * *

a variant starting with a digit has a minimum
   length of four characters, while those starting with a letter have a
   minimum length of five characters.

{COMMENT:
1.  If you delete the sentence above, beginning, "There is a subtlety . . . ," then make the 'a' upper case.}

> A variant starting with a digit . . .
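The length rule for 'variant' can be seen in a short Python check. The helper name is_variant is hypothetical; the pattern is transcribed directly from the 'variant' production in Figure 1.

```python
import re

# 5*8alphanum / (DIGIT 3alphanum)
VARIANT = re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")


def is_variant(subtag):
    """Four characters suffice only when the subtag starts with a digit."""
    return VARIANT.fullmatch(subtag) is not None
```

So "1996" (as in "de-1996") qualifies, while a four-letter subtag such as "Latn" can never be a variant.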


* * *

* * *



2.2.  Language Subtag Sources and Interpretation

 
p. 8, par 3


    Language tags are designed so that each subtag type has unique length
   and content restrictions.  These make identification of the subtag's
   type possible, even if the content of the subtag itself is
   unrecognized.  This allows tags to be parsed and processed without
   reference to the latest version of the underlying standards or the
   IANA registry and makes the associated exception handling when
   parsing tags simpler.

{COMMENTS:

1.  last sentence is awkward and wordy; change.}

> This allows tags to be parsed and processed without reference to either the latest version of the underlying standards or the IANA registry, and simplifies exception handling associated with the parsing of language tags.

* * *
 

* * *
2.2 p. 8, par 5

{CLARIFICATION??
NEED CLARIFICATION of the first sentence in the paragraph below:

What about the reserved sequences for private language subtags ranging from 'qaa' to 'qtz,' and the reserved sequences for private region subtags including 'AA,' 'QM'-'QZ,' 'XA-XZ,' and 'ZZ?'
These do not follow the singleton x and so occur in their normal position in the language tag, right?
}
 
  Sequences of private use and extension subtags MUST occur at the end
   of the sequence of subtags and MUST NOT be interspersed with subtags
   defined elsewhere in this document.  These sequences are introduced
   by single-character subtags, which are reserved as follows:

> Sequences of private use (excepting those sequences reserved for language subtags, ranging from 'qaa' to 'qtz'; those reserved for script subtags, ranging from 'Qaaa' to 'Qabx'; and those reserved for region subtags, including 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ') and extension subtags
MUST occur at the end of the sequence of subtags and MUST NOT be interspersed with subtags defined elsewhere in this document.
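The reserved private-use ranges listed in the proposed text can be checked with simple string comparisons, assuming the caller already knows which position (primary language, script, or region) the subtag occupies. The Python sketch below is hypothetical (the function name is_reserved_private_use is invented for illustration) and covers only the alphabetic ranges named above: 'qaa'-'qtz', 'Qaaa'-'Qabx', and 'AA', 'QM'-'QZ', 'XA'-'XZ', 'ZZ'.

```python
def is_reserved_private_use(subtag):
    """True if subtag falls in one of the reserved private-use ranges.

    Comparison is case-insensitive; plain string comparison works here
    because the values being compared always have the same length.
    """
    s = subtag.lower()
    if not (s.isascii() and s.isalpha()):
        return False
    if len(s) == 3:  # primary language subtags 'qaa'..'qtz'
        return "qaa" <= s <= "qtz"
    if len(s) == 4:  # script subtags 'Qaaa'..'Qabx'
        return "qaaa" <= s <= "qabx"
    if len(s) == 2:  # region subtags 'AA', 'QM'..'QZ', 'XA'..'XZ', 'ZZ'
        return s in ("aa", "zz") or "qm" <= s <= "qz" or s[0] == "x"
    return False
```

Because these codes occupy the ordinary language, script, and region positions, they are not introduced by the singleton 'x'.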

* * *



* * *


 
2.2.1.  Primary Language Subtag, p. 10, par 1, item 5



   5.  Any language subtags of 5 to 8 characters in length in the IANA
       registry were defined via the registration process in Section 3.5
       and MAY be used to form the primary language subtag.  An example
       of what such a registration might include: one of the
       grandfathered IANA registrations is "i-enochian".  The subtag
       'enochian' could be registered in the IANA registry as a primary
       language subtag (assuming that ISO 639 does not register this
       language first), making tags such as "enochian-AQ" and "enochian-
       Latn" valid.

       At the time this document was created, there were no examples of
       this kind of subtag and future registrations of this type are
       discouraged: primary languages are strongly RECOMMENDED for
       registration with ISO 639, and proposals rejected by ISO 639/
       RA-JAC will be closely scrutinized before they are registered
       with IANA.

{ COMMENT:
1.  A transition (such as "However," "The above notwithstanding," etc.) is needed to lead into the second paragraph in item 5 above.}

> However, at the time this document was created, there were no examples of
       this kind of subtag and future registrations of this type are
       discouraged: primary languages are strongly RECOMMENDED for
       registration with ISO 639, and proposals rejected by ISO 639/
       RA-JAC will be closely scrutinized before they are registered
       with IANA.

* * *
 

* * *
2.2.1, p. 11, par. 6



{COMMENT/CLARIFICATION?:

What is the reference for "these" in the first sentence of the paragraph below?
I do not think "these" is needed; that is I do not think you need to refer back to a prior mention of these problems.
}
   To avoid these problems with versioning and subtag choice (as
   experienced during the transition between RFC 1766 and RFC 3066), as
   well as to ensure the canonical nature of subtags defined by this
   document, the ISO 639 Registration Authority Joint Advisory Committee
   (ISO 639/RA-JAC) has included the following statement in
   [iso639.prin]:

> To avoid problems with versioning and subtag choice (as
   experienced during the transition between RFC 1766 and RFC 3066), as
   well as to ensure the canonical nature of subtags defined by this
   document, the ISO 639 Registration Authority Joint Advisory Committee
   (ISO 639/RA-JAC) has included the following statement in
   [iso639.prin]:

* * *
      "A language code already in ISO 639-2 at the point of freezing ISO
      639-1 shall not later be added to ISO 639-1.  This is to ensure
      consistency in usage over time, since users are directed in
      Internet applications to employ the alpha-3 code when an alpha-2
      code for that language is not available."

 

* * *

2.2.2.  Extended Language Subtags; p. 11, par 1

 

{COMMENT:  I think the fourth sentence of this paragraph should say "REQUIRED" not "RECOMMENDED"; this is not the same as capitalization rules, which are simply recommendations; I assume that a language tag without a primary language subtag will not be parsed properly.}

Extended language subtags are used to identify certain specially-
   selected languages that, for various historical and compatibility
   reasons, are closely identified with or tagged using an existing
   primary language subtag.  Extended language subtags are always used
   with their enclosing primary language subtag (indicated with a
   'Prefix' field in the registry) when used to form the language tag.
   All languages that have an extended language subtag in the registry
   also have an identical primary language subtag record in the
   registry.  This primary language subtag is RECOMMENDED for forming
   the language tag.  The following rules apply to the extended language
   subtags:

> This primary language subtag is REQUIRED.

 

{END OF SENTENCE.}

 

* * *

* * *

2.2.2. p. 11; par 1; item 4.

{CLARIFICATION??  Item 4 is confusing to me; does this mean that while 3 extended language subtags are permitted in theory, only one is permitted in fact?

Or does this mean that three extended language subtags are permitted in fact when they have the same language subtag for a prefix & that they must all share that one prefix.

Does this ever happen?   It would be nice to clarify this in the draft.}



 Although the ABNF production 'extlang' permits up to three
       extended language tags in the language tag, extended language
       subtags MUST NOT include another extended language subtag in
       their Prefix.  That is, the second and third extended language
       subtag positions in a language tag are permanently reserved and
       tags that include subtags in that position are invalid.
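Item 4 answers the question in the comment: the ABNF leaves room for three extlang subtags purely as syntax, but only the first position can hold a subtag in a valid tag. The hypothetical Python check below (names invented for illustration; grandfathered tags such as "zh-min-nan" are assumed to be handled separately, and "zh-cmn-yue" is a made-up test string) makes that split between the syntax and the validity rule explicit.

```python
import re


def extlang_count(tag):
    """Count the 3-letter subtags in the extlang positions of a tag."""
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        return 0  # extlang positions exist only after a 2-3 letter primary subtag
    count = 0
    for part in parts[1:4]:  # the ABNF provides at most three extlang positions
        if not re.fullmatch(r"[A-Za-z]{3}", part):
            break
        count += 1
    return count


def extlang_positions_valid(tag):
    """Per item 4, the second and third extlang positions must stay empty."""
    return extlang_count(tag) <= 1
```

A string such as "zh-cmn-yue" is therefore well-formed under the ABNF but can never be valid.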

* * *

* * *

2.2.2; page 12; par 2.




   For example, the macrolanguage Chinese ('zh') encompasses a number of
   languages.  For compatibility reasons, each of these languages has
   both a primary and extended language subtag in the registry.  A few
   selected examples of these include Gan Chinese ('gan'), Cantonese
   Chinese ('yue') and Mandarin Chinese ('cmn').  Each is encompassed by
   the macrolanguage 'zh' (Chinese).  Therefore, they each have the
   prefix "zh" in their registry records.  Thus Gan Chinese is
   represented with tags beginning "zh-gan" or "gan"; Cantonese with
   tags beginning either "yue" or "zh-yue"; and Mandarin Chinese with
   "zh-cmn" or "cmn".  The language subtag 'zh' can still be used
   without an extended language subtag to label a resource as some
   unspecified variety of Chinese, while the primary language subtag
   ('gan', 'yue', 'cmn') is preferred to using the extended language
   form ("zh-gan", "zh-yue", "zh-cmn").

 

{CLARIFICATION??:  are there ever instances where zh-cmn might be preferable?  For example, with written Chinese, targeting primarily a Mandarin-speaking audience?}
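The preference stated above (the primary language form "yue" over the extended form "zh-yue") amounts to a simple rewrite. In the Python sketch below, EXTLANG_PREFIX is a hypothetical three-entry excerpt holding only the examples named in the text, and prefer_primary_form is an invented name; a real implementation would read the 'Prefix' fields from the registry.

```python
# Hypothetical excerpt of registry data: extlang subtag -> its 'Prefix'.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary_form(tag):
    """Rewrite an extlang tag such as 'zh-yue-HK' to its primary language
    form 'yue-HK' when the extlang's Prefix matches; otherwise return tag."""
    parts = tag.split("-")
    if len(parts) >= 2 and EXTLANG_PREFIX.get(parts[1].lower()) == parts[0].lower():
        return "-".join(parts[1:])
    return tag
```

A bare "zh" (or "zh-HK") is left untouched, since it still labels an unspecified variety of Chinese.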

* * *

* * *
2.2.3.  Script Subtag; page 12; par 1; item 4.


   4.  There MUST be at most one script subtag in a language tag, and
       the script subtag SHOULD be omitted when it adds no
       distinguishing value to the tag or when the primary or extended
       language subtag's record in the subtag registry includes a
       'Suppress-Script' field listing the applicable script subtag.

   For example: "sr-Latn" represents Serbian written using the Latin
   script. 

{ COMMENT:  I want to add a second example here that would not be appropriate, so it will be clearer to readers--might even prefer to replace the first example.  Or is the ex. getting too detailed?}

INSERT > The tag "uk-Latn" similarly represents Ukrainian written in the Latin script.  The tag "uk", on the other hand, indicates Ukrainian written in the Cyrillic script.  However, the tag "uk-Cyrl" is not an appropriate tag, because a Suppress-Script of Cyrillic is registered for Ukrainian, and so the script subtag should not be used; the tag "uk" is all that is needed in this case.
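The Suppress-Script rule in item 4 can be sketched as a rewrite that drops a script subtag matching the primary language's 'Suppress-Script' field. The table here is a hypothetical two-entry excerpt (Ukrainian with Cyrillic, English with Latin), drop_suppressed_script is an invented name, and the function only looks at a script subtag placed directly after the primary language subtag.

```python
# Hypothetical excerpt: primary language subtag -> its 'Suppress-Script' value.
SUPPRESS_SCRIPT = {"uk": "Cyrl", "en": "Latn"}


def drop_suppressed_script(tag):
    """Remove a script subtag that adds nothing because the language's
    registry record already names it in 'Suppress-Script'."""
    parts = tag.split("-")
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        suppressed = SUPPRESS_SCRIPT.get(parts[0].lower(), "")
        if suppressed.lower() == parts[1].lower():
            return "-".join([parts[0]] + parts[2:])
    return tag
```

"uk-Cyrl" becomes "uk", while "uk-Latn" and "sr-Latn" keep their script subtags because the script is distinguishing there.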

* * *

* * *


2.2.4. Region Subtag; page 12; par 1; item 2


  


   2.  Two-letter region subtags were defined according to the
       assignments found in [ISO3166-1] ("Codes for the representation
       of names of countries and their subdivisions -- Part 1: Country
       codes") using the list of alpha-2 country codes, or using
       assignments subsequently made by the ISO 3166-1 maintenance
       agency or governing standardization bodies.  In addition, the
       codes that are "exceptionally reserved" (as opposed to
       "assigned") in ISO 3166-1 were also defined in the registry, with
       the exception of 'UK', which is an exact synonym for the assigned
       code 'GB'.

{COMMENT:  

First,  elsewhere you use the present tense with "defined;" these subtags are still so defined, right?  If so I think the present tense is best, as you use it elsewhere.

Second,  I inserted "in the registry" after defined; it seems more explicit;

Third,  in the clause that begins "or using assignments," "using" appears to be in parallel to the word "using" in the phrase, "using the list of alpha-2 . . ."--because of the repetition of "using;" at least it sounds this way to my ear; but it's actually parallel grammatically and semantically to the clause that begins "according to"--so I changed "using" to "according to" to emphasize this parallelism

}

> 2.  Two-letter region subtags are defined in the registry according to assignments found in [ISO3166-1] ("Codes for the representation
       of names of countries and their subdivisions -- Part 1: Country
       codes") using the list of alpha-2 country codes, or according to
       assignments subsequently made by the ISO 3166-1 maintenance
       agency or governing standardization bodies.
* * *
 

 

* * *
 

2.2.4; page 13; par 1; item 4C

     C.  When ISO 3166-1 reassigns a code formerly used for one
           country or area to another country or area and that code
           already is present in the registry, the UN numeric code for
           that country or area MUST be registered in the registry as
           described in Section 3.4 and MUST be used to form language
           tags that represent the country or region for which it is
           defined (rather than the recycled ISO 3166-1 code).

{COMMENT:  the wording is a bit confusing; insert "to which the code has been reassigned;" change the order a bit so that "rather than the recycled ISO 3166-1 code" is next to
the info. on the U.N. numeric code. }

 

> When ISO 3166-1 reassigns a code formerly used for one
           country or area to another country or area and that code
           already is present in the registry, the UN numeric code for
           the country or area to which the code has been reassigned--and not the recycled ISO 3166-1 code--MUST be registered in the registry as
           described in Section 3.4 and MUST be used to form language
           tags that represent the country or region for which it is
           defined.

* * *

* * *



2.2.5.  Variant Subtags; p. 15; par 1,; item 1

   Variant subtags are used to indicate additional, well-recognized
   variations that define a language or its dialects that are not
   covered by other available subtags.  The following rules apply to the
   variant subtags:

   1.  Variant subtags MUST follow any primary language, extended
       language, script, or region subtags, and MUST precede any
       extension or private use subtag sequences
{COMMENTS: again, I think you only mean those private-use subtags preceded by the singleton x; you might insert at end the following}
INSERT > (again, private use subtags do not include those reserved codes 'qaa'- . . .).

* * *
  
* * *

2.2.6; Page 17; par 1; item 9; Example
    



   9.  In the event that more than one extension appears in a single
       tag, the tag SHOULD be canonicalized as described in Section 4.5.

   For example, if an extention were defined for the singleton 'r' and
   it defined the subtags shown, then the following tag would be a valid
   example: "en-Latn-GB-boont-r-extended-sequence-x-private"

{COMMENT:  please correct the spelling of "extention."  This you have to do I think.}

>
   For example, if an extension were defined for the singleton 'r' and
   it defined the subtags shown, then the following tag would be a valid
   example: "en-Latn-GB-boont-r-extended-sequence-x-private"
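Item 9 defers to Section 4.5 for canonicalization. Assuming that Section 4.5 puts extension sequences in case-insensitive ASCII order by their singleton (the rule is paraphrased here, not quoted), the ordering step can be sketched in Python as below. The function name is hypothetical, case normalization is not shown, and the 'a' extension in the test string is invented for illustration.

```python
def order_extensions(tag):
    """Sort a tag's extension sequences by singleton, leaving the subtags
    before the first singleton and the private use sequence in place."""
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    # The private use sequence (from a standalone "x") always stays last.
    end = lowered.index("x") if "x" in lowered else len(parts)
    # Extensions start at the first single-character subtag after position 0.
    start = next((i for i in range(1, end) if len(parts[i]) == 1), end)
    sequences = []
    for part in parts[start:end]:
        if len(part) == 1:
            sequences.append([part])  # a singleton opens a new sequence
        else:
            sequences[-1].append(part)
    sequences.sort(key=lambda seq: seq[0].lower())
    ordered = [p for seq in sequences for p in seq]
    return "-".join(parts[:start] + ordered + parts[end:])
```

The example tag above has a single extension, so it is already in this order; a tag carrying both an 'r' and an 'a' extension would have the 'a' sequence moved first.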

* * *




* * *
2.2.9; page 20; par 4
   reference [RFC3066].  A wider array of tags was considered 'well-
   formed' under that document.  Any tags that were valid for use under
   RFC 3066 are both 'well-formed' and 'valid' under this document's
   syntax; only invalid or illegal tags were well-formed by the early
   definition but no longer are.  The language tag syntax under RFC 3066
   was:

       obs-language-tag = primary-subtag *( "-" subtag )
       primary-subtag   = 1*8ALPHA
       subtag           = 1*8(ALPHA / DIGIT)

                  Figure 2: RFC 3066 Language Tag Syntax


{CLARIFICATION?:  I'm still not clear as to what tags were well-formed but no longer are; I personally do not see the need of this comment; I guess though that you mean that these
tags used subtags that were not in the registry but were otherwise well-formed???  If that's what you mean it's not clear.}
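One way to see which strings the sentence is about is to compare the two grammars directly. The Python sketch below uses hypothetical names and omits private-use-only and grandfathered tags from the new grammar for brevity; it tests only syntax. Whether any of these strings was also valid under RFC 3066 depended on the registrations of that time, which the sketch cannot show.

```python
import re

# Figure 2: obs-language-tag = 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
OBS_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# The 'langtag' production of Figure 1, condensed.
LANGTAG = re.compile(
    r"(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8})"
    r"(?:-[A-Za-z]{4})?"
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"
    r"(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"
    r"(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*"
    r"(?:-[Xx](?:-[A-Za-z0-9]{1,8})+)?"
)


def only_obs_well_formed(tag):
    """True if tag matches Figure 2 (RFC 3066) but not Figure 1."""
    return bool(OBS_TAG.fullmatch(tag)) and not LANGTAG.fullmatch(tag)


for tag in ["en-Latn-GB", "en-Latn-Latn", "en-US-abc", "en-a"]:
    print(tag, only_obs_well_formed(tag))
```

Here "en-Latn-GB" satisfies both grammars, while "en-Latn-Latn" (a second script-like subtag), "en-US-abc" (a three-letter subtag after the region), and "en-a" (a singleton with nothing after it) satisfy only the RFC 3066 syntax.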
* * *


Best,

C. E. Whitehead
cewcathar <at> hotmail.com

 

Phillips, Addison | 3 Dec 06:17

Re: rfc4646bis-18 section 4.1 capitalization

(editor hat) Done.

 

Addison Phillips

Globalization Architect -- Lab126

 

Internationalization is not a feature.

It is an architecture.

 

From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of Mark Davis
Sent: Tuesday, November 04, 2008 1:54 PM
To: Randy Presuhn
Cc: LTRU Working Group
Subject: Re: [Ltru] rfc4646bis-18 section 4.1 capitalization

 

+1


Mark

On Sun, Nov 2, 2008 at 1:56 AM, Randy Presuhn <randy_presuhn <at> mindspring.com> wrote:

Hi -

As a technical contributor...

In section 4.1 are a couple of cases where key word capitalization
should be checked.

4.1 4 1 says:
      1.  Collections are interpreted inclusively, so the subtag 'gem'
          (Germanic languages) could, but should not, be used with
          content that would be better tagged with "en" (English), "de"
          (German), or "gsw" (Swiss German, Alemannic).  While 'gem'
          collects all of these (and other) languages, most
          implementations will not match 'gem' to the individual
          languages; thus using the subtag will not produce the desired
          result.

"should not" -> "SHOULD NOT"

4.1 5 says
      *  The 'und' (Undetermined) primary language subtag identifies
         linguistic content whose language is not determined.  This
         subtag SHOULD NOT be used unless a language tag is required
         and language information is not available or cannot be
         determined.  Omitting the language tag (where permitted) is
         preferred.  The 'und' subtag MAY be useful for protocols that
         require a language tag to be provided or where a primary
         language subtag is required (such as in "und-Latn").  The
         'und' subtag MAY also be useful when matching language tags in
         certain situations.

"MAY" -> "might"

Minor, but I'm certain other reviewers will bump into this as well.

Randy

_______________________________________________
Ltru mailing list
Ltru <at> ietf.org
https://www.ietf.org/mailman/listinfo/ltru

 

Phillips, Addison | 3 Dec 06:20

Re: rfc4646bis-18 propageted subtag churn

(editor hat)

I fixed this (I'm not ignoring the lengthy other comment Randy submitted. I'm just not responding to it in
this message)

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> (BTW, there is an editorial "the a" to be fixed in 3.4.14)
> 
> Randy
Phillips, Addison | 3 Dec 06:20

Re: rfc4646bis-18 section 3.1.7

(editor hat)

This was not incorporated per discussion elsewhere.

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On
> Behalf Of Randy Presuhn
> Sent: Sunday, November 02, 2008 2:15 AM
> To: LTRU Working Group
> Subject: [Ltru] rfc4646bis-18 section 3.1.7
> 
> Hi -
> 
> As a technical contributor...
> 
> In section 3.1.7
>    Occasionally the deprecated code is preferred in certain contexts.
>    For example, both "iw" and "he" can be used in the Java programming
>    language, but "he" is converted on input to "iw", which is thus the
>    canonical form in Java.
> 
> This example of a programming environment idiosyncrasy isn't helpful.
> Suggest deleting it, as what is "preferred" isn't an application of
> the BCP.
> 
> Very minor.
> 
> Randy
> 