Hi. This is a response (mostly proofreading nits) to the working draft you've got at:
http://unicode.org/repos/cldr/trunk/docs/rfc/draft-davis-t-langtag-ext.txt
{GENERAL: Thanks very much for this link!
To me the intro has sufficient examples.
Also it's nice to get the list of various conventions/standards for transliterations at the end of the intro.
I do think the paragraphs in it could flow a tiny bit better.
So, for the Intro,
text I think might be removed is enclosed by inverted brackets > <
text I think might be inserted is marked by {}
All my own comments are indicated as { COMMENT: . . . }
}
1. Introduction par 2 ff
"Language tags, as defined by [BCP47], are useful for identifying the
language of content. There are mechanisms for specifying variant
subtags for special purposes. However, these variants are
insufficient for specifying content that has undergone
transformations, including content that has been transliterated,
transcribed, or translated. The correct interpretation of the
content may depend upon knowledge of {how the source script or language has affected the transformation and even upon knowledge of} the conventions used for the transformation.
{ COMMENT: I don't quite see how the following is an example of needing to specify conventions used for transformation -- what you've been talking about above. }
"> For example, < suppose that Italian or Russian cities on a map are
transcribed for Japanese users. Each name needs to be transliterated
into katakana using rules appropriate for the specific source and
target language. When tagging such data, it is important to be able
to indicate not only the resulting content language ("ja" in this
case), but also the source language.
* * *
{ COMMENT: in the text below I do not think "not only . . . but also" is quite right;
we've already been told that the language is important; this is not new info to introduce with a "but also" clause;
you can stress that language is important here, but do you need the "not only . . . but also"? }
"Transforms such as transliterations may vary depending >not only< on
the basis of the source and target script, >but< {and} also on the source and
target language. Thus the Russian <U+041F U+0443 U+0442 U+0438
U+043D> (which corresponds to the Cyrillic <PE, U, TE, I, EN>)
transliterates into "Putin" in English but "Poutine" in French. The
identifier could be used to indicate a desired mechanical
transformation in an API, or could be used to tag data that has been
converted (mechanically or by hand) according to a transliteration
method.
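
{ COMMENT: A small illustration of the "tag data that has been converted" use in the paragraph above might help readers. What follows is only a sketch: the tags "en-t-ru" and "fr-t-ru" are my assumption, formed by analogy with the draft's own "ja-t-it", and the table and function names are hypothetical. }

```python
# The Cyrillic sequence <PE, U, TE, I, EN>, written with the code points quoted above.
SOURCE_NAME = "\u041f\u0443\u0442\u0438\u043d"

# Hypothetical table of converted data, keyed by (source text, language tag).
# The tags are assumed by analogy with "ja-t-it"; they are not taken from the draft.
TRANSLITERATIONS = {
    (SOURCE_NAME, "en-t-ru"): "Putin",
    (SOURCE_NAME, "fr-t-ru"): "Poutine",
}


def tagged_rendering(source_text, tag):
    """Look up the stored rendering of source_text for the given tag."""
    return TRANSLITERATIONS[(source_text, tag)]
```

The same source string yields two different results, and the tag is what records which one a given piece of data is.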
"{In addition, }Many different conventions have arisen for how to transform text,
even between the same languages and scripts. For example, "Gaddafi"
is commonly transliterated from Arabic to English as any of (G/Q/K/
Kh)a(d/dh/dd/dhdh/th/zz)af(i/y). Some examples of standardized
conventions used for transcribing or transliterating text include:
" . . . "
{ COMMENT: I do like having the info. at the end of this section . . . }
* * *
* * *
2.1 par 4
"The t extension is not intended for use in structured data that
already provides for source and target language identifiers. For
example, this is the case in localization interchange formats such as
XLIFF. In such cases, it would be inappropriate to use "ja-t-it" for
the target language tag because the source language tag "it" would
already be present in the data. Instead one would use the language
tag "ja"."
{ COMMENT: The phrase "already present in the data" is confusing. If I have text in Italian or French transliterated from Latin script to Arabic script, I can of course use the "it" or "fr" subtag twice; but this text seems, to me at least, to say that if the language is part of the original subtag then you should not mention it again after -t. Otherwise this section is fine. }
* * *
2.1 par 5
"It is sometimes necessary to indicate additional information about
the transformation. This additional information is optionally
supplied after the source in a series of one or more fields, where
each field consists of a field separator subtag followed by one or
more non-separator subtags. Each field separator subtag consists of
a single letter followed by a single digit.
{ COMMENT: I personally would insert "As noted" or "As noted earlier" or something similar at the beginning of this paragraph; I also did not see why you say "the transformation" here instead of just "a transformation" in general. }
=>
"As pointed out in section 1, it is sometimes necessary to indicate additional information about a transformation. This additional information is optionally
supplied after the source in a series of one or more fields, where
each field consists of a field separator subtag followed by one or
more non-separator subtags. Each field separator subtag consists of
a single letter followed by a single digit."
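
{ COMMENT: To check my own reading of the field structure described in this paragraph, I tried the sketch below. It relies only on the wording quoted here (source first, then fields, each introduced by a letter-plus-digit separator) and not on the draft's full grammar. The function name is hypothetical, and it assumes that -t- is the last extension in the tag. }

```python
import re

# From the quoted paragraph: a field separator subtag is one letter followed by one digit.
FIELD_SEPARATOR = re.compile(r"[a-z][0-9]")


def split_t_extension(tag):
    """Return (target, source, fields) for a tag such as "ja-t-it"."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return "-".join(subtags), None, {}

    t_index = subtags.index("t")
    target = "-".join(subtags[:t_index])
    source = []
    fields = {}
    current = None  # separator of the field currently being filled, if any
    for subtag in subtags[t_index + 1:]:
        if FIELD_SEPARATOR.fullmatch(subtag):
            current = subtag
            fields[current] = []
        elif current is None:
            source.append(subtag)  # still inside the source language tag
        else:
            fields[current].append(subtag)

    # Each field needs one or more non-separator subtags after its separator.
    for separator, values in fields.items():
        if not values:
            raise ValueError(f"field {separator!r} has no subtags after it")
    return target, "-".join(source) or None, fields


print(split_t_extension("ja-t-it"))  # ('ja', 'it', {})
```

A tag that ends in a bare separator (for instance a made-up "ja-t-it-m0") is rejected, which is how I understand "followed by one or more non-separator subtags".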
* * *
2.1 Editorial Note
"The data and specification will be available by the time this internet draft has
been approved."
{ COMMENT: O.k. for now; I am assuming here you will put in more details, for example a date, by the time you send this draft for approval.}
* * *
From: "Martin J. Dürst" <duerst at it.aoyama.ac.jp>
Date: Thu, 21 Jul 2011 19:14:26 +0900
> On 2011/07/08 6:00, Doug Ewell wrote:
>> Pete Resnick<presnick at qualcomm dot com> wrote:
> . . .
>> I can't find any indication of where within CLDR the list of allowable
>> values will be located. Saying they're in core.zip is almost useless.
>> Saying they're in common/bcp47 is better, but I'd still like to know
>> what file name, what XML element, etc. An example would help.
> Agreed here again.
I tend to agree too.
Best,
--C. E. Whitehead
cewcathar <at> hotmail.com
> Regards, Martin.