Doug Ewell | 7 Dec 19:14

Availability of 't' extension document and data

According to the IETF Datatracker, draft-davis-t-langtag-ext-07 ("BCP 47
Extension T - Transformed Content") has been approved by IESG and
forwarded to the RFC Editor queue.

The time a document normally spends in the RFC Editor queue varies
dramatically, and can be unexpectedly long (as BCP 47 veterans know),
but the RFC Editor FAQ notes that "Typical time to publish is 1-2
months."

Section 2.9 of draft-davis-t-langtag-ext-07 says, "The data and
specification will be available by the time this internet draft has been
approved.  The description field is in the process of being added to
CLDR."  The first sentence is repeated in Section 2.1.  This was an
ongoing concern of mine during the draft process, which was partially
addressed by including sample data in Section 2.9.

According to the CLDR "Releases/Downloads" page, Version 2.1 of CLDR is
scheduled to be released on February 1, 2012.  This is eight weeks from
now.

What is the likelihood that the data for the 't' extension actually will
be made available in time for RFC publication?

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

_______________________________________________
Ltru mailing list
Ltru <at> ietf.org

yoshito_umaoka | 31 Aug 05:45

Fw: draft-davis-t-langtag-ext

Sorry,

>If my interpretation is correct, Section 2.2 d should be changed to "The order of the fields in a t extension is significant" (a field consists of <sep> and subtags represented by 3*8alphanum).

My suggestion above was wrong - I meant

1) The order of the fields in a t extension is not significant.
2) The order of subtags within a field is significant.


Yoshito Umaoka


----- Forwarded by Yoshito Umaoka/Westford/IBM on 08/30/2011 11:41 PM -----

From:        Yoshito Umaoka/Westford/IBM
To:        Mark Davis ☕ <mark <at> macchiato.com>
Cc:        Doug Ewell <doug <at> ewellic.org>, "Gordon P. Hemsley" <gphemsley <at> gmail.com>, ltru <at> ietf.org
Date:        08/30/2011 11:35 PM
Subject:        Re: [Ltru] draft-davis-t-langtag-ext



Thanks for the feedback!

Updated working drafts: Mark
— Il meglio è l’inimico del bene —


One thing on which we may need clarification...

>Section 2.2 Structure
>
>d. The order of the subtags in a t extension is significant (see Section 2.3, Canonicalization).

I think this line does not match 2.3 Canonicalization, because the referenced section says "with the fields ordered by the separators, alphabetically."
I think we don't want to make the order of <field> significant. For example, assume a new <sep> "x0" is introduced:

und-Cyrl-t-und-latn-m0-ungegn-2007-x0-foo

and

und-Cyrl-t-und-latn-x0-foo-m0-ungegn-2007

would be equivalent (but the canonical representation is "und-Cyrl-t-und-latn-m0-ungegn-2007-x0-foo", because the fields "m0-*" and "x0-*" are sorted in alphabetical order).
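
[Editor's sketch] The equivalence described here can be illustrated with a small Python function that reorders the fields of a t extension by separator while leaving the subtags inside each field in their original order. This is only a sketch under stated assumptions: the function name canonicalize_t_fields is hypothetical, "x0" is the made-up separator from the example, and the code does not validate the tag or apply any other canonicalization step from the draft.

```python
import re

# A field separator is one letter followed by one digit (e.g. "m0", "x0").
SEP = re.compile(r"[a-z][0-9]")

def canonicalize_t_fields(tag):
    """Sort the fields of the t extension by separator.

    Subtag order inside each field is kept as written.
    Hypothetical helper; it does not validate the tag.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "t" not in lowered:
        return tag
    start = lowered.index("t") + 1
    # The extension ends at the next one-character subtag or at the end of the tag.
    end = next((i for i in range(start, len(subtags)) if len(subtags[i]) == 1),
               len(subtags))
    body = subtags[start:end]
    # Everything before the first separator is the source language tag.
    first_sep = next((i for i, s in enumerate(body) if SEP.fullmatch(s.lower())),
                     len(body))
    source, rest = body[:first_sep], body[first_sep:]
    fields = []
    for s in rest:
        if SEP.fullmatch(s.lower()):
            fields.append([s])      # a separator starts a new field
        else:
            fields[-1].append(s)    # other subtags stay in their written order
    fields.sort(key=lambda f: f[0].lower())
    ordered = [s for f in fields for s in f]
    return "-".join(subtags[:start] + source + ordered + subtags[end:])

print(canonicalize_t_fields("und-Cyrl-t-und-latn-x0-foo-m0-ungegn-2007"))
```

Under these assumptions, both tags in the example reduce to the same string, which is what makes them comparable after canonicalization.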

If my interpretation is correct, Section 2.2 d should be changed to "The order of the fields in a t extension is significant" (a field consists of <sep> and subtags represented by 3*8alphanum).



Otherwise, the latest edition looks clean and ready to go.

Yoshito Umaoka
Phillips, Addison | 31 Aug 04:34

draft-davis-t-langtag-ext...

I suppose it should go without saying, since I'm one of the authors, but I think that the current draft of the
-t- extension looks like it covers the important cases and provides the necessary future extensibility,
so I think it's ready for advancement before the IESG. I also intend to use it internally at Lab126 in some of our
future products, since it addresses needs in language identification.

Regards,

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Randy Presuhn | 3 Aug 21:10

Fw: New Non-WG Mailing List: happiana -- IETF/W3C/IANA Registry Happiness

Hi -

I don't know anything more about this than what the announcement
says, but it sounds like something that some of the people on this
list might care about.

Randy

----- Original Message ----- 
> From: "IETF Secretariat" <ietf-secretariat <at> ietf.org>
> To: "IETF Announcement list" <ietf-announce <at> ietf.org>
> Cc: <presnick <at> qualcomm.com>; <happiana <at> ietf.org>
> Sent: Wednesday, August 03, 2011 11:54 AM
> Subject: New Non-WG Mailing List: happiana -- IETF/W3C/IANA Registry Happiness 
>
> 
> 
> A new IETF non-working group email list has been created.
> 
> List address: happiana <at> ietf.org
> Archive: http://www.ietf.org/mail-archive/web/happiana/
> To subscribe: https://www.ietf.org/mailman/listinfo/happiana
> 
> Purpose: This list is for discussion of IANA Registry issues to
> result in Happy IETF, Happy W3C, and Happy IANA.
> 
> For additional information, please contact the list administrators.
> _______________________________________________
> IETF-Announce mailing list
> IETF-Announce <at> ietf.org
> https://www.ietf.org/mailman/listinfo/ietf-announce

CE Whitehead | 22 Jul 15:51

Mostly Proofreading Nits (Was: Re: draft-davis-t-langtag-ext)




Hi.  This is a response (mostly proofreading nits) to the working draft you've got at:
http://unicode.org/repos/cldr/trunk/docs/rfc/draft-davis-t-langtag-ext.txt

{GENERAL:  Thanks very much for this link!
To me the intro has sufficient examples.
Also it's nice to get the list of various conventions/standards for transliterations at the end of the intro.
I do think the paragraphs in it could flow a tiny bit better. 

So, for the Intro, 
text I think might be removed is enclosed by inverted brackets  > <  
text I think might be inserted is marked by {} 
All my own comments are indicated as { COMMENT: . . .  } 
}

1.  Introduction par 2 ff
   "Language tags, as defined by [BCP47], are useful for identifying the
   language of content.  There are mechanisms for specifying variant
   subtags for special purposes.  However, these variants are
   insufficient for specifying content that has undergone
   transformations, including content that has been transliterated,
   transcribed, or translated.  The correct interpretation of the
   content may depend upon knowledge of {how the source script or language has affected the transformation and even upon knowledge of} the conventions used for the transformation.

{ COMMENT: I don't quite see how the following is an example of needing to specify conventions used for transformation -- what you've been talking about above. }
   "> For example, < suppose that Italian or Russian cities on a map are
   transcribed for Japanese users.  Each name needs to be transliterated
   into katakana using rules appropriate for the specific source and
   target language.  When tagging such data, it is important to be able
   to indicate not only the resulting content language ("ja" in this
   case), but also the source language.
* * *

{ COMMENT:  in the text below I do not think "not only . . . but also" is quite right; 
we've already been told that the language is important; this is not new info to introduce with a "but also" clause;
you can stress that language is important here, but do you need the "not only . . . but also"? }

"Transforms such as transliterations may vary depending >not only< on
   the basis of the source and target script, >but<  {and} also on the source and
   target language.  Thus the Russian <U+041F U+0443 U+0442 U+0438
   U+043D> (which corresponds to the Cyrillic <PE, U, TE, I, EN>)
   transliterates into "Putin" in English but "Poutine" in French.  The
   identifier could be used to indicate a desired mechanical
   transformation in an API, or could be used to tag data that has been
   converted (mechanically or by hand) according to a transliteration
   method.

"{In addition, }Many different conventions have arisen for how to transform text,
   even between the same languages and scripts.  For example, "Gaddafi"
   is commonly transliterated from Arabic to English as any of (G/Q/K/
   Kh)a(d/dh/dd/dhdh/th/zz)af(i/y).  Some examples of standardized
   conventions used for transcribing or transliterating text include:

" . . . "

{ COMMENT: I do like having the info. at the end of this section . . . }

* * *
* * *
2.1 par 4
   "The t extension is not intended for use in structured data that
   already provides for source and target language identifiers.  For
   example, this is the case in localization interchange formats such as
   XLIFF.  In such cases, it would be inappropriate to use "ja-t-it" for
   the target language tag because the source language tag "it" would
   already be present in the data.  Instead one would use the language
   tag "ja"."

{ COMMENT:  The phrase "already present in the data" is confusing. If I have text in Italian or French transliterated from French script to Arabic script, I can of course use the it or fr subtag twice; but this text seems to say that if the language is part of the original subtag, then you should not mention it again after -t. At least that is how it reads to me. Otherwise this section is fine. }

* * *

2.1 par 5
   "It is sometimes necessary to indicate additional information about
   the transformation.  This additional information is optionally
   supplied after the source in a series of one or more fields, where
   each field consists of a field separator subtag followed by one or
   more non-separator subtags.  Each field separator subtag consists of
   a single letter followed by a single digit.

{ COMMENT: I personally would insert "As noted" or "As noted earlier" or something similar at the beginning of this paragraph; I also did not see why you say "the transformation" here instead of just "a transformation" in general. }

=>
"As pointed out in section I, it is sometimes necessary to indicate additional information about a transformation.  This additional information is optionally
   supplied after the source in a series of one or more fields, where
   each field consists of a field separator subtag followed by one or
   more non-separator subtags.  Each field separator subtag consists of
   a single letter followed by a single digit."
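
[Editor's sketch] The field structure described in this paragraph (a separator of one letter plus one digit, followed by one or more non-separator subtags) can be checked with a few lines of Python. This is a minimal sketch, not the draft's reference grammar: the name split_fields is hypothetical, and the 3-to-8 alphanumeric length limit is taken from the "3*8alphanum" wording quoted elsewhere in this thread.

```python
import re

SEPARATOR = re.compile(r"[a-z][0-9]")        # one letter, then one digit
FIELD_SUBTAG = re.compile(r"[a-z0-9]{3,8}")  # assumed: 3*8alphanum, as quoted on the list

def split_fields(subtags):
    """Group the subtags that follow the source into (separator, [subtags]) pairs.

    Raises ValueError if a subtag appears before any separator, if a subtag
    has an unexpected shape, or if a separator has no subtags after it.
    """
    fields = []
    for raw in subtags:
        s = raw.lower()
        if SEPARATOR.fullmatch(s):
            fields.append((s, []))
        elif FIELD_SUBTAG.fullmatch(s) and fields:
            fields[-1][1].append(s)
        else:
            raise ValueError(f"unexpected subtag: {raw!r}")
    for sep, values in fields:
        if not values:
            raise ValueError(f"field {sep!r} has no subtags")
    return fields

print(split_fields(["m0", "ungegn", "2007"]))
```

A bare separator such as ["m0"] is rejected, which matches the "one or more non-separator subtags" requirement in the quoted text.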

* * *
2.1 Editorial Note
"The data and specification will be available by the time this internet draft has
   been approved."  
{ COMMENT:  O.k. for now; I am assuming here you will put in more details, for example a date, by the time you send this draft for approval.}

* * *

From: "Martin J. Dürst" <duerst at it.aoyama.ac.jp>
Date: Thu, 21 Jul 2011 19:14:26 +0900

> On 2011/07/08 6:00, Doug Ewell wrote:
>> Pete Resnick<presnick at qualcomm dot com>  wrote:
> . . .

>> I can't find any indication of where within CLDR the list of allowable
>> values will be located.  Saying they're in core.zip is almost useless.
>> Saying they're in common/bcp47 is better, but I'd still like to know
>> what file name, what XML element, etc.  An example would help.

> Agreed here again.
I tend to agree too.

Best,

--C. E. Whitehead
cewcathar <at> hotmail.com
> Regards,   Martin.

CE Whitehead | 20 Jul 15:37

Re: Proposed -t0- subtag

Forgot to cc the list.

Best,

--C. E. Whitehead
cewcathar <at> hotmail.com 

From: cewcathar <at> hotmail.com
To: addison <at> lab126.com
Subject: RE: [Ltru] Proposed -t0- subtag
Date: Wed, 20 Jul 2011 09:36:33 -0400


Thanks for the info.
Best,

--C. E. Whitehead
cewcathar <at> hotmail.com 
From: addison <at> lab126.com
To: cewcathar <at> hotmail.com; ltru <at> ietf.org
Date: Mon, 18 Jul 2011 07:36:49 -0700
Subject: RE: [Ltru] Proposed -t0- subtag


No. Mark is saying that we should publish the draft without a ‘t0’ “mechanism” subtag and then consider adding one later if there is need.

 

Addison

 

From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of CE Whitehead
Sent: Monday, July 18, 2011 6:41 AM
To: ltru <at> ietf.org
Subject: Re: [Ltru] Proposed -t0- subtag

 

Hi.

 

From: Mark Davis ☕ <mark at macchiato.com>
Date: Fri, 15 Jul 2011 10:56:38 -0700
> I agree; so I think we ought to see how everything works first, before extending.

 

 

I assume this means the proposal is on hold.  Is there any point now in forwarding it on to another list (such as lingualist; I can do so as I have joined that) to get feedback, or are you just waiting to get feedback on the transliteration scheme from places like ala-lc (and then perhaps on how the scheme might work for transcription & speech recognition from the ISPs/developers/researchers on this list)?

 

Thanks if you can clarify this.

 

Best,

 

--C. E. Whitehead

> Mark

> — Il meglio è l’inimico del bene —

 

 

John Cowan | 14 Jul 18:23

Proposed -t0- subtag

In case you missed it (it was embedded in another posting), I proposed
the -t0- subtag to indicate a transformation path: thus en-t-fr-t0-sq
would indicate text translated from Albanian to French and then to
English (as is often done with Albanian literature because of the lack
of clear copyright law in Albania, so that no one knows who has rights
to what).

Formally, this subtag is needed because stacked -t- extensions are
forbidden by RFC 5646.
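
[Editor's sketch] The constraint behind this remark is that RFC 5646 allows each extension singleton to appear at most once in a tag, so a second -t- cannot be stacked onto the first. A minimal Python sketch of that check follows; the function name repeated_singletons is hypothetical, and "t0" is only the subtag proposed in this message, not something defined by the draft.

```python
def repeated_singletons(tag):
    """Return the singleton subtags that occur more than once in a tag.

    Scanning stops at "x", because everything after it is private use.
    Hypothetical helper; it performs no other well-formedness checks.
    """
    seen = set()
    repeated = []
    for s in tag.lower().split("-"):
        if s == "x":
            break
        if len(s) == 1:
            if s in seen and s not in repeated:
                repeated.append(s)
            seen.add(s)
    return repeated

print(repeated_singletons("en-t-fr-t-sq"))   # stacked -t- extensions
print(repeated_singletons("en-t-fr-t0-sq"))  # the proposed t0 form
```

The first tag reports "t" as repeated, while the proposed form passes because "t0" is a two-character subtag rather than a singleton.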

-- 
He played King Lear as though           John Cowan <cowan <at> ccil.org>
someone had played the ace.             http://www.ccil.org/~cowan
        --Eugene Field
CE Whitehead | 14 Jul 00:59

Minor proofreading nits again (was: Re: draft-davis-t-langtag-ext-03)

Hi.

I looked over http://tools.ietf.org/html/draft-davis-t-langtag-ext-03 quickly; just a couple of minor things.

1.  Intro

"Transforms such as transliteration may vary depending not only on the
 basis of the source and target script, but also on language.  Thus
 the Russian <U+041F U+0443 U+0442 U+0438 U+043D> (which corresponds
 to the Cyrillic <PE, U, TE, I, EN>) transliterates into "Putin" in
 English but "Poutine" in French.  The identifier could be used to
 indicate a desired mechanical transformation in an API, or could be
 used to tag data that has been converted (mechanically or by hand)
 according to a transliteration method."

{ COMMENT:  Try "Transforms such as transliterations?"  (that is, make "transliterations" plural I think).  Also I would change "could" to "can" because you are using the present tense elsewhere in this paragraph}
=>

"Transforms such as transliterations may vary depending not only on the
 basis of the source and target script, but also on language.  Thus
 the Russian <U+041F U+0443 U+0442 U+0438 U+043D> (which corresponds
 to the Cyrillic <PE, U, TE, I, EN>) transliterates into "Putin" in
 English but "Poutine" in French.  The identifier can be used to
 indicate a desired mechanical transformation in an API, or can be
 used to tag data that has been converted (mechanically or by hand)
 according to a transliteration method."

* * *

2.5
{ QUESTION:  do you want to mention that BCP 47 language subtags are updated from time to time and this does not mean that the -t extension RFC will be updated at the same time (or does it?). }

* * *
2.6 last par

"Accepted tickets result an a new entry in the machine-readable CLDR
   BCP47 data, or in the case of a clarified description, modifications
   to the description attribute value for an existing entry."

{ ***IMPORTANT COMMENT:  typo it seems:  "Accepted tickets result in a new entry . . . " should replace "Accepted tickets result an a new entry . . . " }

=>

"Accepted tickets result in a new entry in the machine-readable CLDR
   BCP47 data, or in the case of a clarified description, modifications
   to the description attribute value for an existing entry." 


That's all I found but I read this pretty quickly this time.  

Best,

--C. E. Whitehead
cewcathar <at> hotmail.com  
Doug Ewell | 13 Jul 16:41

Re: Title of draft-davis-t-langtag-ext

Mark Davis 🍮 <mark at macchiato dot com> wrote:

> How about: *BCP 47 Extension T - Content Transforms*

+1

--
Doug Ewell | Thornton, Colorado, USA | RFC 5645, 4645, UTN #14
www.ewellic.org | www.facebook.com/doug.ewell | @DougEwell

CE Whitehead | 12 Jul 21:18

Re: Language tags and (localization) processes (Re: draft-davis-t-langtag-ext)

Hi again.
From: cewcathar <at> hotmail.com
To: ltru <at> ietf.org; ietf-languages <at> iana.org
Subject: Language tags and (localization) processes (Re: [Ltru] draft-davis-t-langtag-ext)
Date: Tue, 12 Jul 2011 14:35:13 -0400


> Felix Sasaki felix.sasaki at fh-potsdam.de 
> Tue Jul 12 09:23:36 CEST 2011

>> Language tags so far have described *states*: an object is in a language, a
>> script etc. The proposed extension extends languages to describe the outcome
>> of a *process*: objects have been transformed, with a source object as the
>> basis for this process. According to the paragraph above, this
>> transformation includes also translation.

> I do personally agree that it's good to discuss and then document in the draft some of the concerns you have described.  
> And yes, translation/transliteration is a process.

> . . .

> I do think this is briefly mentioned (intro, last paragraph):

> "The usage of this extension is not limited to formal transformations,
> and may include other instances where the content is in some other
> way influenced by the source. For example, this extension could be
> used to designate a request for a speech recognizer that is tailored
> specifically for 2nd-language speakers who are 1st-language speakers
> of a particular language (e.g. a recognizer for "English spoken with
> a Chinese accent")."

> Maybe there could be very brief info (in the intro or where the M0 part of the extension is discussed) on the methods/mechanism used in transcription, why they are relevant to indicate, a sentence or something?

Actually you have brought this up sufficiently for me in section 2.5:
"A language tag with the t extension MAY be used to request a specific transform of content. In such a case, the recipient SHOULD return content that corresponds as closely as feasible to the requested transform, including the specification of the mechanism. For example, if the request is ja-t-it-m0-xxx-v21a-2007, and the recipient has content corresponding to both ja-t-it-m0-xxx-v21a-2007 and ja-t-it-m0-xxx-v21a-2009, then the 2007 version would be preferred. As is the case for language matching as discussed in [BCP47], different implementations MAY have different measures of "closeness"."
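
[Editor's sketch] The quoted text leaves the measure of "closeness" to each implementation. One possible measure, shown here purely as an illustration, is the number of leading subtags a candidate shares with the request. The function names shared_prefix_len and closest are hypothetical and are not taken from the draft or from BCP 47.

```python
def shared_prefix_len(a, b):
    """Count how many leading subtags two tags have in common (case-insensitive)."""
    n = 0
    for x, y in zip(a.lower().split("-"), b.lower().split("-")):
        if x != y:
            break
        n += 1
    return n

def closest(requested, available):
    """Pick the available tag sharing the longest subtag prefix with the request.

    One possible measure of closeness among many; `available` must be non-empty.
    """
    return max(available, key=lambda tag: shared_prefix_len(requested, tag))

request = "ja-t-it-m0-xxx-v21a-2007"
content = ["ja-t-it-m0-xxx-v21a-2009", "ja-t-it-m0-xxx-v21a-2007"]
print(closest(request, content))
```

With the tags from the example, the 2007 version shares all seven subtags with the request and the 2009 version only six, so the 2007 version is chosen, as the draft text expects.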

Best,

--C. E. Whitehead
cewcathar <at> hotmail.com 
 
Felix Sasaki | 12 Jul 09:23

Language tags and (localization) processes (Re: draft-davis-t-langtag-ext)

The current draft states


"Language tags, as defined by [BCP47], are useful for identifying the
language of content. There are mechanisms for specifying variant subtags for special purposes. However, these variants are insufficient for specifying text transformations, including content 
that has been transliterated, transcribed, or translated."

I am requesting a clarification from the editors, that includes a liaison with the Unicode ULI TC http://uli.unicode.org/ , and a clarification in the draft.

Language tags so far have described *states*: an object is in a language, a script etc. The proposed extension extends languages to describe the outcome of a *process*: objects have been transformed, with a source object as the basis for this process. According to the paragraph above, this transformation includes also translation.

So far formats like TBX, XLIFF or others have been used for aligning source and target contents. These formats also use language tags, via xml:lang. However, the transformation, i.e. the process information, is not expressed via the language tag, but via XML structures (pairs of source and target elements). The language tags are purely for identifying the state of an object.

To avoid confusion for users of the above and other, process related formats about where to put language identification information and where to put process related information, I am asking you to
1) Liaise with the ULI TC about the issue described above and see what issues they see here
2) Document the outcome of this liaison on this list and in the draft  
There is no need to have long explanations in the draft, but guidance about the topic will be very helpful to avoid confusion.

As a side note, formats like TBX, XLIFF and others reduce the usage of a language tag for good reasons: information related to processes like translation can be very complex, e.g. expressing translation state, cycle, quality. So I have the general concern that language tags might be overloaded with key value pairs in areas that would require more complex information and that potentially overlap with formats that provide that information. Nevertheless I won't object to moving this extension forward, if the concerns are explained properly in the draft.

Felix

2011/7/12 Mark Davis ☕ <mark <at> macchiato.com>
We've posted a new version of http://tools.ietf.org/html/draft-davis-t-langtag-ext

Diffs are here: http://tools.ietf.org/rfcdiff?url2=draft-davis-t-langtag-ext-02.txt

The changes are:

* Made it clear that application to the case of speech was included, added Peter C's example.
* Fixed references, adding authors, removing unneeded reference.
* Changed ABNF. Mostly just the table form, but also defined alphanum.
* Made it clear that the CLDR committee must post proposals publicly.
* Added more information on the XML structure, including the description attribute. (Note that the CLDR committee had decided to add the description attribute before this process began.)
* Added fixes for typos noted by CEW.

Please let us know of further feedback.

Note to Doug: The CLDR committee had agreed to move the descriptions into the bcp47 files, such as http://unicode.org/repos/cldr/trunk/common/bcp47/calendar.xml. Yoshito has the action to do that, and was able to accelerate it. So please take a look if you have the time.

Mark





