Randy Presuhn | 1 Oct 05:10

Re: WG meeting at Vancouver IETF?

Hi -

As co-chair -

I believe we have a rough consensus to *not* request an ltru WG
session at the Vancouver IETF meeting.  So, everybody, PLEASE
redouble your efforts to wrap up the remaining issues.

Randy

Addison Phillips | 2 Oct 06:43

teleconference information...

Hi All,

We had a teleconference last week as a kick-off. We'll be holding a 
follow-on call this week, on Wednesday 3 October 16:00 UTC.

Time: 9:00-10:00 Pacific Daylight Time
USA Dial-In #:+1.888.371.8922
International Dial-In #:+1.617.224.4792
Participant Passcode: 58371972

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.

Randy Presuhn | 3 Oct 20:09

Summary of 2007-10-03 teleconference

Hi -

There was a teleconference to discuss remaining open issues in
the ltru drafts this morning.  I volunteered to serve as scribe.
I hope I got it right.  If I didn't, I'm sure someone will
correct me.  In any case, let's see if we can finish this discussion
by working through the tradeoffs and agreeing on something, since
it's pretty clear at this point that neither approach is perfect.

The primary topic of discussion was the question of what to do
about macrolanguages and extlangs.  The two major alternatives
both had implications for matching.

There was general agreement that remove-from-right matching is not
a panacea.  There was less agreement about whether a macrolanguage-
extlang construct is ever helpful, or actually counterproductive,
in its interactions with matching.
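For illustration, remove-from-right matching can be sketched roughly as below. This is a simplified sketch of the truncation style of lookup, not text from any draft; the function name `lookup` and its parameters are assumptions made for the example.

```python
def lookup(language_range, available, default=None):
    """Find the best available tag by truncating the range from the right."""
    # Compare case-insensitively, but return the tag as the caller spelled it.
    by_lower = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        # Remove the rightmost subtag and try again.
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so drop it together with the subtag that followed it.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default
```

Under this sketch a request for "zh-cmn-Hant-HK" never reaches content tagged "zh-Hant-HK", and a request for "cmn-Hant-HK" never reaches content tagged "zh", which is one way to see why truncation alone does not settle the question for either alternative.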

One consequence of the "flat" approach (not including encapsulating
subtags in a language tag encoding) is that more intelligent matching
would require additional information from the registry, which would
increase the risk that someone might build applications that would
read the registry, presenting a possible DoS with respect to IANA.

There was some disagreement about what is meant by a macrolanguage
subtag by itself.  Two examples: does zh mean the same thing as zh-cmn,
and does ar mean the same thing as ar-arb?  A *rough* consensus
of those participating in the teleconference (this must not be
understood as a determination of WG consensus) was that they mean
different things.  In this view, "zh" means "undifferentiated Chinese"

Randy Presuhn | 4 Oct 00:08

Clarifying Suppress-Script

Hi -

Based on recent discussion on the ietf-languages <at> iana.org list,
as a technical contributor I think we should clarify section 3.1.8
in the current draft by adding the following paragraph to that section:

  Suppress-Script is a compatibility measure, and as such is
  only appropriate for languages whose subtags were added to
  the registry before RFC 4646 was approved.  Even for such
  languages, Suppress-Script SHOULD NOT be used unless there
  exists a significant body (as determined by the Language
  Subtag Reviewer) of material tagged before the approval of
  RFC 4646 without a script subtag.

Thoughts?

Randy

John Cowan | 4 Oct 02:34

Re: Clarifying Suppress-Script

Randy Presuhn scripsit:

>   Suppress-Script is a compatibility measure, and as such is
>   only appropriate for languages whose subtags were added to
>    the registry before RFC 4646 was approved.

+1

>   Even for such
>   languages, Suppress-Script SHOULD NOT be used unless there
>   exists a significant body (as determined by the Language
>   Subtag Reviewer) of material tagged before the approval of
>   RFC 4646 without a script subtag.

Neither the LSR nor anyone else (not even people who work for
search-engine companies) is in a position to make any such determination.
Not all tagged material is open to public scrutiny.  I am therefore
opposed to adding this sentence.

-- 
With techies, I've generally found              John Cowan
If your arguments lose the first round          http://www.ccil.org/~cowan
    Make it rhyme, make it scan                 cowan <at> ccil.org
    Then you generally can
Make the same stupid point seem profound!           --Jonathan Robie

McDonald, Ira | 4 Oct 17:59

RE: Clarifying Suppress-Script

Hi,

I object to this whole paragraph.  Because RFC 4646
*inserted* scripts between the two previously main
subtags (language and region), Suppress-Script is
necessary for MANY languages tagged in the future.
It has nothing to do with an existing body of tagged
material.

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Chair - Linux Foundation Open Printing WG
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: imcdonald <at> sharplabs.com

-----Original Message-----
From: John Cowan [mailto:cowan <at> ccil.org]
Sent: Wednesday, October 03, 2007 7:35 PM
To: Randy Presuhn
Cc: LTRU Working Group
Subject: Re: [Ltru] Clarifying Suppress-Script

Randy Presuhn scripsit:

>   Suppress-Script is a compatibility measure, and as such is
>   only appropriate for languages whose subtags were added to

Mark Davis | 4 Oct 19:06

Re: Clarifying Suppress-Script

I am pulled in different directions on this. What people should do when tagging is inexorably linked with what implementers should do when interpreting tags. If there is a mismatch in expectations, then we get problems.

What I as an implementer want to know is: when I see xxx-YY and xxx-ZZZZ-YY, what am I to do?

  1. Do I treat these as essentially synonyms for filtering and lookup? (e.g. en-US and en-Latn-US)
  2. Do I treat them as completely different for filtering and lookup? (e.g. ru-RU and ru-Latn-RU, or zh-CN and zh-Latn-CN)
  3. Do I treat them as different for filtering, but as synonyms for lookup? (e.g. zh-CN and zh-Hans-CN, or zh-TW and zh-Hant-TW)

On the one hand, Suppress-Script, as discussed before, is a very imperfect tool: it doesn't tell me about case 3, and it is ambiguous. A missing field could mean that there are multiple common scripts for a language, or that no information is available.

On the other hand, whether or not the language code was pre or post 4646 doesn't seem to make much difference as far as the question of what I should do when interpreting these codes, so it seems from that that we should not make Suppress-Script depend on pre or post 4646.
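To make the implementer's question concrete, here is a minimal sketch of how Suppress-Script might be applied when normalizing tags before comparison. The table `SUPPRESS_SCRIPT` is a hypothetical two-entry snapshot, and the function name is an assumption; the sketch also assumes the script subtag directly follows the primary language subtag, so it ignores extlang subtags.

```python
# Hypothetical snapshot of Suppress-Script values. A real application would
# load these from a local copy of the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl"}


def strip_suppressed_script(tag):
    """Drop the script subtag when it equals the language's Suppress-Script."""
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    language, second = subtags[0].lower(), subtags[1]
    suppressed = SUPPRESS_SCRIPT.get(language)
    # Script subtags are exactly four letters long.
    is_script = len(second) == 4 and second.isalpha()
    if suppressed and is_script and second.lower() == suppressed.lower():
        return "-".join([subtags[0]] + subtags[2:])
    return tag
```

With this table, "en-Latn-US" normalizes to "en-US" (case 1) while "ru-Latn-RU" is left alone (case 2). "zh-Hans-CN" is also left alone, because the absence of an entry says nothing about case 3.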

Mark

On 10/4/07, McDonald, Ira <imcdonald <at> sharplabs.com> wrote:
Hi,

I object to this whole paragraph.  Because RFC 4646
*inserted* scripts between the two previously main
subtags (language and region), Suppress-Script is
necessary for MANY languages tagged in the future.
It has nothing to do with an existing body of tagged
material.

Cheers,
- Ira


Ira McDonald (Musician / Software Architect)
Chair - Linux Foundation Open Printing WG
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: imcdonald <at> sharplabs.com

-----Original Message-----
From: John Cowan [mailto: cowan <at> ccil.org]
Sent: Wednesday, October 03, 2007 7:35 PM
To: Randy Presuhn
Cc: LTRU Working Group
Subject: Re: [Ltru] Clarifying Suppress-Script


Randy Presuhn scripsit:

>   Suppress-Script is a compatibility measure, and as such is
>   only appropriate for languages whose subtags were added to
>    the registry before RFC 4646 was approved.

+1

>   Even for such
>   languages, Suppress-Script SHOULD NOT be used unless there
>   exists a significant body (as determined by the Language
>   Subtag Reviewer) of material tagged before the approval of
>   RFC 4646 without a script subtag.

Neither the LSR nor anyone else (not even people who work for
search-engine companies) is in a position to make any such determination.
Not all tagged material is open to public scrutiny.  I am therefore
opposed to adding this sentence.

--
With techies, I've generally found              John Cowan
If your arguments lose the first round          http://www.ccil.org/~cowan
    Make it rhyme, make it scan                 cowan <at> ccil.org
    Then you generally can
Make the same stupid point seem profound!           --Jonathan Robie


_______________________________________________
Ltru mailing list
Ltru <at> ietf.org
https://www1.ietf.org/mailman/listinfo/ltru







--
Mark

Mark Davis | 4 Oct 19:20

Re: Summary of 2007-10-03 teleconference



On 10/3/07, Randy Presuhn <randy_presuhn <at> mindspring.com> wrote:
Hi -

There was a teleconference to discuss remaining open issues in
the ltru drafts this morning.  I volunteered to serve as scribe.
I hope I got it right.  If I didn't, I'm sure someone will
correct me.  In any case, let's see if we can finish this discussion
by working through the tradeoffs and agreeing on something, since
it's pretty clear at this point that neither approach is perfect.

The primary topic of discussion was the question of what to do
about macrolanguages and extlangs.  The two major alternatives
both had implications for matching.

There was general agreement that remove-from-right matching is not
a panacea.  There was less agreement about whether a macrolanguage-
extlang construct is ever helpful, or actually counterproductive,
in its interactions with matching.

One consequence of the "flat" approach (not including encapsulating
subtags in a language tag encoding) is that more intelligent matching

is that some people felt
[this was not universal]

would require additional information from the registry, which would
increase the risk that someone might build applications that would
read the registry, presenting a possible DoS with respect to IANA.

(Suppress Script already requires access to the registry. One could theoretically argue that this is only required on the tagging side, not the matching side, but in practice any reasonable application is going to probably need to have a snapshot of information from the registry.)

There was some disagreement about what is meant by a macrolanguage
subtag by itself.  Two examples: does zh mean the same thing as zh-cmn,
and does ar mean the same thing as ar-arb?  A *rough* consensus
of those participating in the teleconference (this must not be
understood as a determination of WG consensus) was that they mean
different things.  In this view, "zh" means "undifferentiated Chinese"
(much of which happens to be "zh-cmn"), and "ar" means "undifferentiated
Arabic".  A minority perspective was that because so much content
tagged with "zh" is "zh-cmn", they should be understood as effectively
synonymous.

One of the key issues we need to face is the set of problems introduced by either of the models:

no-NO
nn-NO (or no-nn-NO*)
nb-NO (or no-nb-NO*)

vs.

zh-Hant-HK
zh-yue-Hant-HK or yue-Hant-HK
zh-cmn-Hant-HK or cmn-Hant-HK

While theoretically no-NO and nb-NO are different, in practice every implementation I know of ends up having to treat them as synonyms -- that wouldn't have changed if they had been encoded as no-nb-NO and no-NO. The same is true of zh-Hant-HK and zh-cmn-Hant-HK, or ar-SA and ar-arb-SA.
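As a rough sketch of what treating such tags as synonyms can look like in an implementation, the function below maps each tag to a comparison key. Both tables are hypothetical and cover only the examples above; the names `PREFERRED_PRIMARY`, `DEFAULT_EXTLANG`, and `match_key` are assumptions for illustration.

```python
# Hypothetical tables covering only the examples discussed in this message.
PREFERRED_PRIMARY = {"nb": "no"}              # treat nb-NO like no-NO
DEFAULT_EXTLANG = {"zh": "cmn", "ar": "arb"}  # treat zh-cmn like zh, ar-arb like ar


def match_key(tag):
    """Return a lowercase key under which 'effective synonyms' compare equal."""
    subtags = tag.lower().split("-")
    subtags[0] = PREFERRED_PRIMARY.get(subtags[0], subtags[0])
    # Drop the second subtag when it is the assumed default for the first.
    if len(subtags) > 1 and DEFAULT_EXTLANG.get(subtags[0]) == subtags[1]:
        del subtags[1]
    return "-".join(subtags)
```

Here "zh-cmn-Hant-HK" and "zh-Hant-HK" share a key, as do "ar-arb-SA" and "ar-SA", while "zh-yue-Hant-HK" keeps a distinct key. Whichever encoding is chosen, the implementation needs a table of this kind.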


In fleshing out the details of the alternative approaches, we need to
consider the potential for synonyms, and what guidance we provide to
taggers in the cases where synonyms could arise, as well as the extent
that we might want to distinguish between "true" and "effective" ones.

There was a side discussion about the possibility of additional
macrolanguages being registered, and about the possibility of the
registration of additional encapsulated languages.  None of the
likely scenarios appeared to create any serious problems for either
proposal.

Yet another consideration discussed is the extent to which either approach
would require the retagging of existing data.

A final consideration will be how to explain it all to the BCP's users.

Randy






--
Mark

Randy Presuhn | 4 Oct 19:42

Re: Clarifying Suppress-Script

Hi -

As a technical contributor...

> From: "Mark Davis" <mark.davis <at> icu-project.org>
> To: "McDonald, Ira" <imcdonald <at> sharplabs.com>
> Cc: "John Cowan" <cowan <at> ccil.org>; "Randy Presuhn" <randy_presuhn <at> mindspring.com>; "LTRU Working Group" <ltru <at> ietf.org>
> Sent: Thursday, October 04, 2007 10:06 AM
> Subject: Re: [Ltru] Clarifying Suppress-Script
...
> On the other hand, whether or not the language code was pre or post 4646
> doesn't seem to make much difference as far as the question of what I should
> do when interpreting these codes, so it seems from that that we should not
> make Suppress-Script depend on pre or post 4646.
...

As I recall, we added Suppress-Script because there was a large body
of already-tagged material which lacked script subtags, and we wanted
to be able to process that material as though it had been given
script subtags.  In this context, suppress-script is only meaningful
for coping with such material.

Somewhere along the way, the language in what became RFC 4646 got a bit
stronger, eventually ending up as "5.  There MUST be at most one script
subtag in a language tag, and the script subtag SHOULD be omitted when
it adds no distinguishing value to the tag or when the primary language
subtag's record includes a Suppress-Script field listing the applicable
script subtag."  This is still OK with me, but...

There seems to be a bit of semantic drift in the understanding of "Suppress-
Script", from its original meaning as a legacy data compatibility kluge
to a way of indicating for a particular language that a script subtag
would probably not add "distinguishing value".  It's clear from this
discussion that not all of us (myself included) have made such a shift.

If we as a WG want to adopt such a semantic shift, we should do so consciously
and explicitly, because it has a dramatic impact on the number of languages
for which the ietf-languages <at> iana.org list will have to debate the merits of
adding this information to the registry.

Randy

Mark Davis | 4 Oct 19:44

Re: Clarifying Suppress-Script

I agree that we need to have clearer language, whichever way we go. As I said, I'm not sure of the right answer; I could see going either direction.

Mark

On 10/4/07, Randy Presuhn <randy_presuhn <at> mindspring.com> wrote:
Hi -

As a technical contributor...

> From: "Mark Davis" <mark.davis <at> icu-project.org>
> To: "McDonald, Ira" <imcdonald <at> sharplabs.com>
> Cc: "John Cowan" <cowan <at> ccil.org>; "Randy Presuhn" <randy_presuhn <at> mindspring.com>; "LTRU Working Group" <ltru <at> ietf.org>
> Sent: Thursday, October 04, 2007 10:06 AM
> Subject: Re: [Ltru] Clarifying Suppress-Script
...
> On the other hand, whether or not the language code was pre or post 4646
> doesn't seem to make much difference as far as the question of what I should
> do when interpreting these codes, so it seems from that that we should not
> make Suppress-Script depend on pre or post 4646.
...

As I recall, we added Suppress-Script because there was a large body
of already-tagged material which lacked script subtags, and we wanted
to be able to process that material as though it had been given
script subtags.  In this context, suppress-script is only meaningful
for coping with such material.

Somewhere along the way, the language in what became RFC 4646 got a bit
stronger, eventually ending up as "5.  There MUST be at most one script
subtag in a language tag, and the script subtag SHOULD be omitted when
it adds no distinguishing value to the tag or when the primary language
subtag's record includes a Suppress-Script field listing the applicable
script subtag."  This is still OK with me, but...

There seems to be a bit of semantic drift in the understanding of "Suppress-
Script", from its original meaning as a legacy data compatibility kluge
to a way of indicating for a particular language that a script subtag
would probably not add "distinguishing value".  It's clear from this
discussion that not all of us (myself included) have made such a shift.

If we as a WG want to adopt such a semantic shift, we should do so consciously
and explicitly, because it has a dramatic impact on the number of languages
for which the ietf-languages <at> iana.org list will have to debate the merits of
adding this information to the registry.

Randy






--
Mark
