Hi!
Regarding the draft http://tools.ietf.org/html/draft-ietf-ltru-4646bis-18 notes
Section 2 Comments (I sent this as a whole; it is mostly editorial; I can, however, break it up into two or maybe three emails, one containing strictly editorial comments/corrections and one containing requests for clarifications.)
I've just put in all my comments for Section 2 below; I had none on Section 1; I will have a few on Section 4.6; I am not sure about comments on other sections, as I've not had time to go over those carefully.
Sorry for the repeat of comments on Section 2; in my editing of Section 2.2 I forgot about the private-use subtags 'Qaaa' - 'Qabx' that are reserved for script codes, so I fixed that and decided it best to send the whole set of corrections as a unit.
Whether or not you take my edits, I do think you need to make one change:
2.2.6; Page 17; par 1; item 9; Example
{COMMENT: please correct the spelling of "extention.".}
> For example, if an extension were defined for the singleton 'r' and
it defined the subtags shown, then the following tag would be a valid
example: "en-Latn-GB-boont-r-extended-sequence-x-private"
Thanks.
C. E. Whitehead
cewcathar <at> hotmail.com
* * *
2.1; Page 4; par 2.
There are different types of subtag, each of which is distinguished
by length, position in the tag, and content: each subtag's type can
be recognized solely by these features. This makes it possible to
extract and assign some semantic information to the subtags, even if
the specific subtag values are not recognized. Thus, a language tag
processor need not have a list of valid tags or subtags (that is, a
copy of some version of the IANA Language Subtag Registry) in order
to perform common searching and matching operations. The only
exceptions to this ability to infer meaning from subtag structure are
the grandfathered tags listed in the productions 'regular' and
'irregular' below. These tags were registered under [RFC3066] and
are a fixed list that can never change.
{COMMENTS:
1. The above paragraph is a bit awkward, hard to read}
> There are several/different types of subtags. A subtag's length and position in the language tag, as well as its content, serve to distinguish its subtag type. Thus it is possible to extract from and assign to the subtag some semantic information, even when the subtag's specific value is not recognized. Because of this, a language tag processor need not have a list of all valid tags or subtags (that is, a copy of the IANA Language Subtag Registry) in order to perform common searching and matching operations on the tag. Exceptions are the 'regular' and 'irregular' grandfathered tags, listed below. Semantic information cannot be extracted from the individual subtags that compose the grandfathered language tags because these tags are interpretable only as wholes. The order and length of the subtags that make up the 'irregular' grandfathered tags may in fact depart from the rules governing the order and length of subtags specified in this document (see below). It is therefore possible, in an irregular grandfathered language tag, to have a variant subtag composed of fewer than four characters or a subtag denoting a region preceding one denoting a language, as in, for example, "sgn-BE-FR" (which denotes Belgian-French Sign Language, the sign language of the French-speaking community in Belgium). The grandfathered tags were registered under [RFC3066] and are a fixed list that can never change.
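To illustrate the point of this paragraph (a subtag's type follows from its length, position, and content alone), here is a short sketch. `classify_subtags` is a hypothetical helper of my own, not anything defined in the draft; it assumes the input is already a well-formed, non-grandfathered tag, and it does no registry lookup at all.

```python
def classify_subtags(tag: str) -> list[tuple[str, str]]:
    """Hypothetical helper: label each subtag of a well-formed,
    non-grandfathered tag using only length, position, and content."""
    labels: list[tuple[str, str]] = []
    mode = "main"  # switches to "extension" or "privateuse" after a singleton
    for pos, sub in enumerate(tag.split("-")):
        if mode == "privateuse":
            kind = "privateuse"      # everything after 'x' is private use
        elif len(sub) == 1:
            # A singleton introduces an extension or private use sequence.
            mode = "privateuse" if sub.lower() == "x" else "extension"
            kind = "singleton"
        elif mode == "extension":
            kind = "extension"
        elif pos == 0:
            kind = "language"        # the first subtag is the primary language
        elif len(sub) == 3 and sub.isalpha():
            kind = "extlang"         # three letters (legal only right after language)
        elif len(sub) == 4 and sub.isalpha():
            kind = "script"          # four letters
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            kind = "region"          # two letters or three digits
        else:
            kind = "variant"         # 5-8 alphanumerics, or a digit plus 3 alphanumerics
        labels.append((sub, kind))
    return labels


print(classify_subtags("en-Latn-GB-boont-r-extended-sequence-x-private"))
```

Note that the sketch would mislabel the irregular grandfathered tag "en-GB-oed" ('oed' comes out as an extlang), which is exactly why those tags have to be interpreted as wholes.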
* * *
* * *
2.1; page 4; par. 3
The syntax of the language tag in ABNF [RFC5234] is:
{COMMENTS:
1. It is confusing to have the whole list that follows with no real introduction; what about a short definition of the syntax first?}
> The language tag (referred to below as langtag) in ABNF [RFC5234] is composed of one (1) language (or macrolanguage?) subtag, followed, in order, by zero or one extended language subtag, zero or one script subtag, zero or one region subtag, zero or more variant subtags, and zero or more extension sequences, each consisting of a singleton (a single-character prefix) followed by one or more extension subtags. The language tag may end, at any point after the primary language subtag, with a sequence of private use subtags, and a tag may also consist entirely of private use subtags. Private use sequences are introduced by the singleton x. The singleton x indicates that all content that follows is governed by private agreement only, and that its meaning and arrangement are not covered in this document (see Sections 2.2.7 and 4.6 below for more information on private use subtags).
Rules governing the length of the various types of subtags are specified below and in Section 2.2.
As noted above, grandfathered language tags are not governed by all of the above rules of syntax (for more on grandfathered tags, see 2.2.8, below).
* * *
* * *
2.1, pages 4-6
{COMMENTS:
1. My main comment on the section below is that I find it very distracting and bizarre to use the asterisk [*] to indicate a variable number of letters and numbers; for example, "5*8alphanum" indicates a variant subtag of 5 to 8 letters or digits. I would prefer some other symbol: 5 - 8, 5-8, or 5to8?? Thanks.}
Language-Tag  = langtag             ; normal language tags
              / privateuse          ; private use tag
              / grandfathered       ; grandfathered tags
langtag       = language
                ["-" script]
                ["-" region]
                *("-" variant)
                *("-" extension)
                ["-" privateuse]
language      = 2*3ALPHA            ; shortest ISO 639 code
                ["-" extlang]       ; sometimes followed by
                                    ; extended language subtags
              / 4ALPHA              ; or reserved for future use
              / 5*8ALPHA            ; or registered language subtag
Phillips & Davis Expires May 4, 2009 [Page 5] Internet-Draft language-tags October 2008
extlang       = 3ALPHA              ; selected ISO 639 codes
                *2("-" 3ALPHA)      ; permanently reserved
script        = 4ALPHA              ; ISO 15924 code
region        = 2ALPHA              ; ISO 3166-1 code
              / 3DIGIT              ; UN M.49 code
variant       = 5*8alphanum         ; registered variants
              / (DIGIT 3alphanum)
extension     = singleton 1*("-" (2*8alphanum))
                                    ; Single alphanumerics
                                    ; "x" reserved for private use
singleton     = %x41-57             ; A - W
              / %x59-5A             ; Y - Z
              / %x61-77             ; a - w
              / %x79-7A             ; y - z
              / DIGIT               ; 0 - 9
privateuse    = "x" 1*("-" (1*8alphanum))
grandfathered = irregular           ; non-redundant tags registered
              / regular             ; during the RFC 3066 era
irregular     = "en-GB-oed"         ; irregular tags do not match
              / "i-ami"             ; the 'langtag' production and
              / "i-bnn"             ; would not otherwise be
              / "i-default"         ; considered 'well-formed'
              / "i-enochian"        ; These tags are all valid,
              / "i-hak"             ; but most are deprecated
              / "i-klingon"         ; in favor of more modern
              / "i-lux"             ; subtags or subtag
              / "i-mingo"           ; combination
              / "i-navajo"
              / "i-pwn"
              / "i-tao"
              / "i-tay"
              / "i-tsu"
              / "sgn-BE-FR"
              / "sgn-BE-NL"
              / "sgn-CH-DE"
regular       = "art-lojban"        ; these tags match the 'langtag'
              / "cel-gaulish"       ; production, but their subtags
              / "no-bok"            ; are not extended language
              / "no-nyn"            ; or variant subtags: their meaning
              / "zh-guoyu"          ; is defined by their registration
              / "zh-hakka"          ; and all of these are deprecated
              / "zh-min"            ; in favor of a more modern
              / "zh-min-nan"        ; subtag or sequence of subtags
              / "zh-xiang"
alphanum      = (ALPHA / DIGIT)     ; letters and numbers
Figure 1: Language Tag ABNF
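To make Figure 1 concrete, here is a sketch that checks well-formedness with one regular expression built production by production. It is my own translation (the names `LANGTAG`, `TAG_RE`, `GRANDFATHERED`, and `is_well_formed` are hypothetical), it matches case-insensitively, and it tests syntax only: it does not consult the registry, so it says nothing about validity. The ABNF asterisks map directly onto regex counts: "5*8alphanum" becomes `[a-z0-9]{5,8}`.

```python
import re

# Hypothetical translation of Figure 1, production by production
# (ASCII only, case-insensitive).
LANGTAG = (
    r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"  # language ["-" extlang]
    r"(?:-[a-z]{4})?"                                       # ["-" script]
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"                          # ["-" region]
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"             # *("-" variant)
    r"(?:-[a-wyz0-9](?:-[a-z0-9]{2,8})+)*"                  # *("-" extension)
    r"(?:-x(?:-[a-z0-9]{1,8})+)?"                           # ["-" privateuse]
)
PRIVATEUSE = r"x(?:-[a-z0-9]{1,8})+"
TAG_RE = re.compile(rf"(?:{LANGTAG})|(?:{PRIVATEUSE})", re.ASCII | re.IGNORECASE)

# Grandfathered tags from Figure 1, matched only as whole strings.
GRANDFATHERED = {
    # irregular
    "en-gb-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao",
    "i-tay", "i-tsu", "sgn-be-fr", "sgn-be-nl", "sgn-ch-de",
    # regular
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
}


def is_well_formed(tag: str) -> bool:
    """Syntax check only (hypothetical helper); no registry lookup."""
    return tag.lower() in GRANDFATHERED or TAG_RE.fullmatch(tag) is not None


for t in ["en-Latn-GB-boont-r-extended-sequence-x-private", "sgn-BE-FR", "en-a"]:
    print(t, is_well_formed(t))
```

Under this sketch "en-GB-oed" passes only because it is listed as a whole; the regular expression alone rejects it, since 'oed' is too short to be a variant. By contrast, "zh-min-nan" also matches the 'langtag' part, which is what the figure's comments mean by 'regular'.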
* * *
* * *
2.1, p. 6, par 5
There is a subtlety in the ABNF production 'variant':
{COMMENTS:
1. ??? Do you need this comment about 'subtlety'?; the information that follows this--see below--is sufficient!}> {NULL, nothing here needed.}
* * *
* * *
a variant starting with a digit has a minimum
length of four characters, while those starting with a letter have a
minimum length of five characters.
{COMMENT:
1. If you delete the sentence above, beginning, "There is a subtlety . . . ," then make the 'a' upper case.}
> A variant starting with a digit . . .
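The rule is easy to state as code. Here is a minimal sketch of the 'variant' production (the function name is hypothetical), showing that the minimum length depends only on whether the first character is a digit:

```python
def is_variant(subtag: str) -> bool:
    """Hypothetical helper: test 'variant = 5*8alphanum / (DIGIT 3alphanum)'."""
    if not (subtag.isascii() and subtag.isalnum()):
        return False  # empty, non-ASCII, or containing punctuation
    if subtag[0].isdigit():
        # DIGIT 3alphanum gives exactly 4; 5*8alphanum also allows a leading digit.
        return 4 <= len(subtag) <= 8
    return 5 <= len(subtag) <= 8  # a leading letter needs 5*8alphanum


for s in ["1901", "boont", "abcd", "1ab", "123456789"]:
    print(s, is_variant(s))
```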
* * *
* * *
2.2. Language Subtag Sources and Interpretation; p. 8, par 3
Language tags are designed so that each subtag type has unique length
and content restrictions. These make identification of the subtag's
type possible, even if the content of the subtag itself is
unrecognized. This allows tags to be parsed and processed without
reference to the latest version of the underlying standards or the
IANA registry and makes the associated exception handling when
parsing tags simpler.
{COMMENTS:
1. last sentence is awkward and wordy; change.}
> This allows tags to be parsed and processed without reference to either the latest version of the underlying standards or the IANA registry, and simplifies exception handling associated with the parsing of language tags.
* * *
* * *
2.2 p. 8, par 5
{CLARIFICATION??
NEED CLARIFICATION of the first sentence in the paragraph below:
What about the reserved sequences for private use language subtags ranging from 'qaa' to 'qtz', for private use script subtags ranging from 'Qaaa' to 'Qabx', and for private use region subtags including 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ'?
These do not follow the singleton x and so occur in their normal position in the language tag, right?
}
Sequences of private use and extension subtags MUST occur at the end
of the sequence of subtags and MUST NOT be interspersed with subtags
defined elsewhere in this document. These sequences are introduced
by single-character subtags, which are reserved as follows:
> Sequences of private use subtags (excepting those sequences reserved for language subtags, ranging from 'qaa' to 'qtz'; those reserved for script subtags, ranging from 'Qaaa' to 'Qabx'; and those reserved for region subtags, including 'AA', 'QM'-'QZ', 'XA'-'XZ', and 'ZZ') and extension subtags
MUST occur at the end of the sequence of subtags and MUST NOT be interspersed with subtags defined elsewhere in this document.
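To illustrate the question above, here is a sketch that tests whether a subtag falls in one of the reserved private use ranges named in the suggested wording (language 'qaa'-'qtz', script 'Qaaa'-'Qabx', region 'AA', 'QM'-'QZ', 'XA'-'XZ', 'ZZ'). The function name and its `kind` parameter are hypothetical; the point is that such subtags are recognized in their ordinary positions, with no 'x' singleton in front of them.

```python
def in_private_use_range(subtag: str, kind: str) -> bool:
    """Hypothetical helper: True if subtag lies in a reserved private use
    range for its kind ('language', 'script', or 'region'), ignoring case."""
    s = subtag.lower()
    if not (s.isascii() and s.isalpha()):
        return False
    # Fixed-length lowercase strings compare lexicographically, so a range
    # like 'qaa'..'qtz' is just a pair of string comparisons.
    if kind == "language":
        return len(s) == 3 and "qaa" <= s <= "qtz"
    if kind == "script":
        return len(s) == 4 and "qaaa" <= s <= "qabx"
    if kind == "region":
        return len(s) == 2 and (s in ("aa", "zz") or "qm" <= s <= "qz" or "xa" <= s <= "xz")
    return False


# Each subtag sits in its normal position; there is no "x" singleton.
language, script, region = "qaa-Qaaa-QM".split("-")
print(in_private_use_range(language, "language"),
      in_private_use_range(script, "script"),
      in_private_use_range(region, "region"))  # True True True
```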
* * *
* * *
2.2.1. Primary Language Subtag, p. 10, par 1, item 5
5. Any language subtags of 5 to 8 characters in length in the IANA
registry were defined via the registration process in
Section 3.5 and MAY be used to form the primary language subtag. An example
of what such a registration might include: one of the
grandfathered IANA registrations is "i-enochian". The subtag
'enochian' could be registered in the IANA registry as a primary
language subtag (assuming that ISO 639 does not register this
language first), making tags such as "enochian-AQ" and "enochian-
Latn" valid.
At the time this document was created, there were no examples of
this kind of subtag and future registrations of this type are
discouraged: primary languages are strongly RECOMMENDED for
registration with ISO 639, and proposals rejected by ISO 639/
RA-JAC will be closely scrutinized before they are registered
with IANA.
{ COMMENT:
1. A transition (such as "However," or "The above notwithstanding,") is needed to lead into the second paragraph in item 5 above.}
> However, at the time this document was created, there were no examples of
this kind of subtag and future registrations of this type are
discouraged: primary languages are strongly RECOMMENDED for
registration with ISO 639, and proposals rejected by ISO 639/
RA-JAC will be closely scrutinized before they are registered
with IANA.
* * *
* * *
2.2.1, p. 11, par. 6
{COMMENT/CLARIFICATION?:
What is the
reference for "these" in the first sentence of the paragraph below?
I do not think "these" is needed; that is I do not think you need to refer back to a prior mention of these problems.
}
To avoid these problems with versioning and subtag choice (as
experienced during the transition between RFC 1766 and RFC 3066), as
well as to ensure the canonical nature of subtags defined by this
document, the ISO 639 Registration Authority Joint Advisory Committee
(ISO 639/RA-JAC) has included the following statement in
[iso639.prin]:
> To avoid problems with versioning and subtag choice (as
experienced during the transition between RFC 1766 and RFC 3066), as
well as to ensure the canonical nature of subtags defined by this
document, the ISO 639 Registration Authority Joint Advisory Committee
(ISO 639/RA-JAC) has included the following statement in
[iso639.prin]:
* * *
"A language code already in ISO 639-2 at the point of freezing ISO
639-1 shall not later be added to ISO 639-1. This is to ensure
consistency in usage over time, since users are directed in
Internet applications to employ the alpha-3 code when an alpha-2
code for that language is not available."
* * *
2.2.2. Extended Language Subtags; p. 11, par 1
{COMMENT: I think the fourth sentence of this paragraph should say "REQUIRED", not "RECOMMENDED"; this is not the same as the capitalization rules, which are simply recommendations; I assume that a language tag without a primary language subtag will not be parsed properly.}
Extended language subtags are used to identify certain specially selected languages that, for various historical and compatibility
reasons, are closely identified with or tagged using an existing
primary language subtag. Extended language subtags are always used
with their enclosing primary language subtag (indicated with a
'Prefix' field in the registry) when used to form the language tag.
All languages that have an extended language subtag in the registry
also have an identical primary language subtag record in the
registry. This primary language subtag is RECOMMENDED for forming
the language tag. The following rules apply to the extended language
subtags:
> This primary language subtag is REQUIRED. {END OF SENTENCE.}
* * *
* * *
2.2.2. p. 11; par 1; item 4.
{CLARIFICATION?? Item 4 is confusing to me; does this mean that while 3 extended language subtags are permitted in theory, only one is permitted in fact? Or does this mean that three extended language subtags are permitted in fact when they have the same language subtag for a prefix, and that they must all share that one prefix? Does this ever happen? It would be nice to clarify this in the draft.}
Although the ABNF production 'extlang' permits up to three
extended language subtags in the language tag, extended language
subtags MUST NOT include another extended language subtag in
their Prefix. That is, the second and third extended language
subtag positions in a language tag are permanently reserved and
tags that include subtags in that position are invalid.
* * *
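Here is my reading of item 4 as a sketch: the ABNF lets up to three three-letter subtags follow a two- or three-letter primary language subtag, but a tag that actually fills the second or third of those positions is invalid. The helper below (a hypothetical name) just counts them; note that the 'regular' grandfathered tag "zh-min-nan" would be flagged, so grandfathered tags would have to be checked as wholes first.

```python
def uses_reserved_extlang(tag: str) -> bool:
    """Hypothetical helper: True if the tag fills the second or third
    extlang position, which item 4 says is permanently reserved."""
    parts = tag.split("-")
    if not (2 <= len(parts[0]) <= 3):
        return False  # only 2-3 letter primary language subtags take extlangs
    count = 0
    for sub in parts[1:]:
        if len(sub) == 3 and sub.isascii() and sub.isalpha():
            count += 1  # another three-letter subtag in extlang position
        else:
            break       # a script, region, etc. ends the extlang run
    return count > 1


for t in ["zh-yue-HK", "zh-yue-abc", "sgn-ase"]:
    print(t, uses_reserved_extlang(t))
```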
* * *
2.2.2; page 12; par 2.
For example, the macrolanguage Chinese ('zh') encompasses a number of
languages. For compatibility reasons, each of these languages has
both a primary and extended language subtag in the registry. A few
selected examples of these include Gan Chinese ('gan'), Cantonese
Chinese ('yue') and Mandarin Chinese ('cmn'). Each is encompassed by
the macrolanguage 'zh' (Chinese). Therefore, they each have the
prefix "zh" in their registry records. Thus Gan Chinese is
represented with tags beginning "zh-gan" or "gan"; Cantonese with
tags beginning either "yue" or "zh-yue"; and Mandarin Chinese with
"zh-cmn" or "cmn". The language subtag 'zh' can still be used
without an extended language subtag to label a resource as some
unspecified variety of Chinese, while the primary language subtag
('gan', 'yue', 'cmn') is preferred to using the extended language
form ("zh-gan", "zh-yue", "zh-cmn").
{CLARIFICATION??: Are there ever instances where zh-cmn might be preferable? For example, with written Chinese, targeting primarily a Mandarin-speaking audience?}
* * *
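Here is a sketch of the preference described in this paragraph: if a tag begins with a primary language subtag followed by an extended language subtag whose registry 'Prefix' is that primary subtag, drop the prefix. The three-entry table is a hypothetical stand-in for the real registry data, limited to the examples in the paragraph.

```python
# Hypothetical excerpt of registry data: extlang subtag -> its 'Prefix' field.
EXTLANG_PREFIX = {"gan": "zh", "yue": "zh", "cmn": "zh"}


def prefer_primary(tag: str) -> str:
    """Hypothetical helper: rewrite e.g. "zh-yue-HK" as "yue-HK"."""
    parts = tag.split("-")
    if len(parts) >= 2 and EXTLANG_PREFIX.get(parts[1].lower()) == parts[0].lower():
        return "-".join(parts[1:])  # the extlang becomes the primary language
    return tag  # e.g. plain "zh" still labels some unspecified Chinese


for t in ["zh-yue-HK", "zh-cmn", "zh", "zh-Hant"]:
    print(t, "->", prefer_primary(t))
```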
* * *
2.2.3. Script Subtag; page 12; par 1; item 4.
4. There MUST be at most one script subtag in a language tag, and
the script subtag SHOULD be omitted when it adds no
distinguishing value to the tag or when the primary or extended
language subtag's record in the subtag registry includes a
'Suppress-Script' field listing the applicable script subtag.
For example: "sr-Latn" represents Serbian written using the Latin
script.
{COMMENT: I want to add a second example here that would not be appropriate, so it will be clearer to readers; I might even prefer to replace the first example. Or is the example getting too detailed?}
INSERT > The tag "uk-Latn" similarly represents Ukrainian written in the Latin script. The tag "uk", on the other hand, indicates Ukrainian written in Cyrillic script. The tag "uk-Cyrl", however, should not be used: the registry record for Ukrainian has a 'Suppress-Script' field of Cyrillic ('Cyrl'), so the script subtag adds no distinguishing value, and the tag "uk" is all that is needed in this case.
* * *
* * *
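Here is a sketch of item 4 as applied to these examples: drop a script subtag that merely repeats the primary language's 'Suppress-Script' value. The two-entry table is a hypothetical stand-in for the registry (it assumes 'uk' has Suppress-Script 'Cyrl' and 'en' has 'Latn'); Serbian ('sr') is left out since it has no single default script.

```python
# Hypothetical excerpt of registry data: language subtag -> Suppress-Script.
SUPPRESS_SCRIPT = {"uk": "Cyrl", "en": "Latn"}


def drop_redundant_script(tag: str) -> str:
    """Hypothetical helper: remove a script subtag (assumed to directly
    follow the primary language subtag) that equals its Suppress-Script."""
    parts = tag.split("-")
    if len(parts) >= 2 and len(parts[1]) == 4 and parts[1].isalpha():
        if SUPPRESS_SCRIPT.get(parts[0].lower(), "").lower() == parts[1].lower():
            del parts[1]  # e.g. "uk-Cyrl" -> "uk"
    return "-".join(parts)


for t in ["uk-Cyrl", "uk-Latn", "sr-Latn", "en-Latn-GB"]:
    print(t, "->", drop_redundant_script(t))
```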
2.2.4. Region Subtag; page 12; par 1; item 2
2. Two-letter region subtags were defined according to the
assignments found in [ISO3166-1] ("Codes for the representation
of names of countries and their subdivisions -- Part 1: Country
codes") using the list of alpha-2 country codes, or using
assignments subsequently made by the ISO 3166-1 maintenance
agency or governing standardization bodies. In addition, the
codes that are "exceptionally reserved" (as opposed to
"assigned") in ISO 3166-1 were also defined in the registry, with
the exception of 'UK', which is an exact synonym for the assigned
code 'GB'.
{COMMENT: First, elsewhere you use the present tense with "defined"; these subtags are still so defined, right? If so, I think the present tense is best, as you use it elsewhere. Second, I inserted "in the registry" after "defined"; it seems more explicit. Third, in the clause that begins "or using assignments," "using" appears to be parallel to the word "using" in the phrase "using the list of alpha-2 . . ." (at least it sounds this way to my ear, because of the repetition of "using"); but it is actually parallel, grammatically and semantically, to the phrase that begins "according to", so I changed "using" to "according to" to emphasize this parallelism.}
> 2. Two-letter region subtags are defined in the registry according to assignments found in [ISO3166-1] ("Codes for the representation
of names of countries and their subdivisions -- Part 1: Country
codes") using the list of alpha-2 country codes, or according to
assignments subsequently made by the ISO 3166-1 maintenance
agency or governing standardization bodies.
* * *
* * *
2.2.4; page 13; par 1; item 4C
C. When ISO 3166-1 reassigns a code formerly used for one
country or area to another country or area and that code
already is present in the registry, the UN numeric code for
that country or area MUST be registered in the registry as
described in Section 3.4 and MUST be used to form language
tags that represent the country or region for which it is
defined (rather than the recycled ISO 3166-1 code).
{COMMENT: The wording is a bit confusing; insert "to which the code has been reassigned"; change the order a bit so that "rather than the recycled ISO 3166-1 code" is next to the information on the U.N. numeric code.}
> When ISO 3166-1 reassigns a code formerly used for one
country or area to another country or area and that code
already is present in the registry, the UN numeric code for
the country or area to which the code has been reassigned, and not the recycled ISO 3166-1 code, MUST be registered in the registry as
described in Section 3.4 and MUST be used to form language
tags that represent the country or region for which it is
defined.
* * *
* * *
2.2.5. Variant Subtags; p. 15; par 1; item 1
Variant subtags are used to indicate additional, well-recognized
variations that define a language or its dialects that are not
covered by other available subtags. The following rules apply to the
variant subtags:
1. Variant subtags MUST follow any primary language, extended
language, script, or region subtags, and MUST precede any
extension or private use subtag sequences.
{COMMENTS: Again, I think you only mean those private use subtags preceded by the singleton x; you might insert the following at the end:}
INSERT > (again, private use subtags do not include the reserved codes 'qaa'- . . .).
* * *
* * *
2.2.6; Page 17; par 1; item 9; Example
9. In the event that more than one extension appears in a single
tag, the tag SHOULD be canonicalized as described in Section 4.5.
For example, if an extention were defined for the singleton 'r' and
it defined the subtags shown, then the following tag would be a valid
example: "en-Latn-GB-boont-r-extended-sequence-x-private"
{COMMENT: please correct the spelling of "extention." This you have to do I think.}
>
For example, if an extension were defined for the singleton 'r' and
it defined the subtags shown, then the following tag would be a valid
example: "en-Latn-GB-boont-r-extended-sequence-x-private"
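Since item 9 points to Section 4.5, here is a sketch of the ordering step as I understand it: extension sequences are sorted by their singleton (case-insensitively), while any private use sequence stays at the end. This is my own simplification with a hypothetical function name, not the full canonicalization procedure, and it assumes a well-formed, non-grandfathered input tag.

```python
def order_extensions(tag: str) -> str:
    """Hypothetical helper: sort extension sequences by singleton,
    keeping everything from the 'x' singleton onward in place."""
    parts = tag.split("-")
    # Everything from the first "x" singleton on is private use.
    cut = next((i for i, p in enumerate(parts) if p.lower() == "x"), len(parts))
    head, private = parts[:cut], parts[cut:]
    # The first singleton in the head starts the extension sequences.
    first = next((i for i, p in enumerate(head) if len(p) == 1), len(head))
    prefix, ext_parts = head[:first], head[first:]
    # Group each singleton with the subtags that follow it.
    groups: list[list[str]] = []
    for p in ext_parts:
        if len(p) == 1:
            groups.append([p])
        else:
            groups[-1].append(p)
    groups.sort(key=lambda g: g[0].lower())
    return "-".join(prefix + [s for g in groups for s in g] + private)


print(order_extensions("en-r-extended-a-bbb-x-private"))
```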
* * *
* * *
2.2.9; page 20; par 4
reference [RFC3066]. A wider array of tags was considered 'well-formed'
under that document. Any tags that were valid for use under
RFC 3066 are both 'well-formed' and 'valid' under this document's
syntax; only invalid or illegal tags were well-formed by the early
definition but no longer are. The language tag syntax under RFC 3066
was:
obs-language-tag = primary-subtag *( "-" subtag )
primary-subtag = 1*8ALPHA
subtag = 1*8(ALPHA / DIGIT)
Figure 2: RFC 3066 Language Tag Syntax
{CLARIFICATION?: I'm still not clear as to what tags were well-formed but no longer are; I personally do not see the need for this comment. I guess, though, that you mean that these tags used subtags that were not in the registry but were otherwise well-formed??? If that's what you mean, it's not clear.}
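For what it's worth, here is one purely syntactic answer as a sketch: a tag like "en-a" fits the RFC 3066 production in Figure 2 but not Figure 1, where a singleton must be followed by at least one subtag of two to eight characters; likewise "a-bc", whose one-letter primary subtag Figure 1 does not allow. The regular expression is my own translation of Figure 2, and the function name is hypothetical.

```python
import re

# Hypothetical translation of Figure 2 (RFC 3066 syntax), case-insensitive.
OBS_LANGUAGE_TAG = re.compile(r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*", re.ASCII | re.IGNORECASE)


def is_rfc3066_well_formed(tag: str) -> bool:
    """primary-subtag = 1*8ALPHA, then any number of 1*8(ALPHA / DIGIT)."""
    return OBS_LANGUAGE_TAG.fullmatch(tag) is not None


for t in ["en-a", "a-bc", "en-GB-oed", "en--GB", "abcdefghi"]:
    print(t, is_rfc3066_well_formed(t))
```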
* * *
Best,
C. E. Whitehead
cewcathar <at> hotmail.com