Hi!
I.
I still think Mark’s text reads more clearly than the other candidate introductory text, and that it is the text to start with in the introduction to canonicalization. I can't do anything with:
"A language tag is in canonical form, either default canonical form or
extlang canonical form, when the tag is well-formed according to the
rules in <xref target="syntax"/> and <xref target="sources"/> and
canonicalising it does not change it"
Mark's text clearly introduces the two types of canonicalization, which you all seem to think you need to do.
Mark’s suggested introductory text:
"4.5. Canonicalization of Language Tags
Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical
form.
There are two canonical forms for language tags. The 'default'
canonical form maps each 'extlang' subtag to its Preferred-Value.
The 'extended' canonical form includes the macrolanguage primary
language subtag before eligible (extended) language subtags."
However, Kent points out that,
"Mapping just the subtag will not do anything here (if we just consider the
strings, ignoring the primary language vs. extlang classification). One
needs to map the language tag prefix, up to and including the extlang subtag
in question, to the preferred value."
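
To make Kent's point concrete, here is a minimal Python sketch. The table is a tiny illustrative excerpt that I am assuming for the example (not the full IANA registry), and the function names are made up for illustration:

```python
# Illustrative excerpt, assumed for this sketch: extlang -> (Prefix, Preferred-Value).
EXTLANG = {"yue": ("zh", "yue"), "cmn": ("zh", "cmn")}

def map_subtag_only(tag):
    """Replace just the extlang subtag (second position) by its Preferred-Value."""
    subtags = tag.split("-")
    if len(subtags) > 1 and subtags[1].lower() in EXTLANG:
        subtags[1] = EXTLANG[subtags[1].lower()][1]
    return "-".join(subtags)

def map_prefix(tag):
    """Replace the whole language-extlang prefix by the extlang's Preferred-Value."""
    subtags = tag.split("-")
    if len(subtags) > 1:
        prefix, preferred = EXTLANG.get(subtags[1].lower(), (None, None))
        if prefix == subtags[0].lower():
            subtags[:2] = [preferred]
    return "-".join(subtags)

print(map_subtag_only("zh-yue-HK"))  # zh-yue-HK (unchanged)
print(map_prefix("zh-yue-HK"))       # yue-HK
```

Mapping only the subtag leaves the string as it was, because the extlang's Preferred-Value is the same string; only mapping the prefix changes the tag.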
So then, do we replace:
"each 'extlang' subtag"
with
"each subtag in the primary language-extension language combination" ???
Also, I want to clarify: is Kent’s suggested change to the steps in canonicalization going to affect the use of the term "macrolanguage" in the above? (Kent's worried about sign language subtags that are extension language subtags but don't have a macrolanguage.) Kent's step 5 reads:
"5. For the extlang canonical form (but not for the default canonical form), a primary language subtag that is also registered as an 'extlang' subtag is replaced by the corresponding language-extlang combination, where the
primary language subtag here is the Prefix registered for the extlang."
Does the above mean that we need to say here:
"The ‘extlang’ canonical form includes the registered prefix for the extension language before eligible (extlang) language subtags"
???
(I noted that Kent did not like "extended canonical form" and acted accordingly.)
These changes would result in:
"Since a particular language tag is sometimes used by many processes,
language tags SHOULD always be created or generated in a canonical
form.
There are two canonical forms for language tags. The 'default'
canonical form maps each subtag in the primary language-extension language combination to its Preferred-Value.
The 'extlang' canonical form includes the registered prefix for the extension language before eligible (extlang) language subtags."
II.
I also find the following text useful:
<t>Normally, the 'default' canonicalization is preferred. However, the 'extlang' canonical form is useful
in environments where the presence of the macrolanguage is beneficial in matching or selection (see <xref target="choiceUsingExtlang"></xref>).</t>
III.
My goof on Addison’s text (below). As Kent pointed out, Addison means that “mapping a subtag to its preferred value should occur before any additional steps in canonicalization”:
>"These mappings MUST be done before additional processing, since there can be additional changes to subtag values."
But then wouldn't Kent's suggested step 2 (below) become step 1? Otherwise, everything else already comes after the mapping:
>Canonicalisation of a well-formed [or well-defined, see comment above]
>language tag is defined by doing the following steps, in order, using data
>from the current IANA language subtag registry (<xref
>target="ianaformat"/>).
>1. Extension sequences in the tag are ordered into case-insensitive ASCII
> order by the singleton subtags. (At the time of publication of this
> document, there were no extension subtags registered.)
>2. A redundant or grandfathered tag that has a Preferred-Value field in
> the IANA registry is replaced with its preferred value.
> 3. A non-extlang subtag that has a Preferred-Value field in the IANA
> registry is replaced with its preferred value.
> 4. A tag prefix of the form language-extlang is replaced by the
> preferred value registered for the extlang.
> 5. For the extlang canonical form (but not for the default canonical form),
> a primary language subtag that is also registered as an 'extlang' subtag
> is replaced by the corresponding language-extlang combination, where the
> primary language subtag here is the Prefix registered for the extlang.
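
For what it's worth, here is a rough Python sketch of how I read the five steps above. The registry tables are small illustrative excerpts that I am assuming for the example, the names `split_tag` and `canonicalize` are made up, and the subtag table ignores the subtag type (language, region, and so on) to keep the sketch short:

```python
# Illustrative excerpts, assumed for this sketch (not the full IANA registry).
WHOLE_TAG_PREFERRED = {"i-klingon": "tlh", "art-lojban": "jbo"}  # grandfathered
SUBTAG_PREFERRED = {"iw": "he", "bu": "MM"}     # deprecated language / region
EXTLANG = {"yue": ("zh", "yue"), "cmn": ("zh", "cmn"),
           "ase": ("sgn", "ase")}               # extlang -> (Prefix, Preferred-Value)

def split_tag(tag):
    """Split a tag into (main subtags, extension sequences, private-use subtags)."""
    main, extensions, private = [], [], []
    subtags = tag.split("-")
    for i, s in enumerate(subtags):
        if len(s) == 1 and s.lower() == "x":
            private = subtags[i:]        # private use runs to the end of the tag
            break
        if len(s) == 1:
            extensions.append([s])       # a singleton starts a new extension
        elif extensions:
            extensions[-1].append(s)
        else:
            main.append(s)
    return main, extensions, private

def join_tag(main, extensions, private):
    return "-".join(main + [s for seq in extensions for s in seq] + private)

def canonicalize(tag, extlang_form=False):
    main, extensions, private = split_tag(tag)
    # Step 1: order extension sequences by singleton, case-insensitively.
    extensions.sort(key=lambda seq: seq[0].lower())
    tag = join_tag(main, extensions, private)
    # Step 2: a redundant or grandfathered tag is replaced as a whole.
    if tag.lower() in WHOLE_TAG_PREFERRED:
        tag = WHOLE_TAG_PREFERRED[tag.lower()]
        main, extensions, private = split_tag(tag)
    # Is the second subtag an extlang used with its registered Prefix?
    has_extlang = (len(main) > 1 and
                   EXTLANG.get(main[1].lower(), ("",))[0] == main[0].lower())
    # Step 3: map non-extlang subtags that have a Preferred-Value.
    for i, s in enumerate(main):
        if not (i == 1 and has_extlang):
            main[i] = SUBTAG_PREFERRED.get(s.lower(), s)
    # Step 4: map the language-extlang prefix to the extlang's Preferred-Value.
    if has_extlang:
        main[:2] = [EXTLANG[main[1].lower()][1]]
    # Step 5 (extlang form only): put the registered Prefix back in front.
    if extlang_form and main and main[0].lower() in EXTLANG:
        main[:1] = [EXTLANG[main[0].lower()][0], main[0]]
    return join_tag(main, extensions, private)

print(canonicalize("zh-yue-HK"))                     # yue-HK
print(canonicalize("yue-HK", extlang_form=True))     # zh-yue-HK
print(canonicalize("ase", extlang_form=True))        # sgn-ase
print(canonicalize("en-b-warble-a-babble"))          # en-a-babble-b-warble
```

Note that step 5 only looks at the registered Prefix, so the sign-language case ("ase" with Prefix "sgn") works without anything having to be called a macrolanguage.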
(NOTE: Hope my suggestions make some sense. Sorry if I'm not keeping up with the discussion on this; the postings online aren't quite up-to-date???)
--C. E. Whitehead
cewcathar <at> hotmail.com