Doug Ewell | 28 Aug 03:09

Re: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> JFTR, nothing new, I still prefer standards track:
>
> 3066bis   could obsolete 3066 (BCP 47) also as PS.
> 3066ter   could integrate ISO 639-3 as DS (minor update)
> 3066tetra could integrate ISO 639-6 as STD

I still have significant concerns about the assumption that ISO 639-6
will be, or should be, automatically integrated into a language tagging
scheme.  The Linguasphere database upon which 639-6 is based is still
not freely available to the public for evaluation and examination, and
seems unlikely ever to be, unlike the Ethnologue database upon which ISO
639-3 is based.

Moreover, with ISO 639-3 already claiming to provide coverage for "all
known human languages," numbering a bit over 7,600, and with
Linguasphere listing "over 20,000 languages and constituent dialects,"
one is left to wonder just what the 13,400 new identifiers will
contribute from the standpoint of identifying and requesting linguistic
content.  Will "en-US", "en-GB", "en-AU", and "en-IN" be given their own
four-letter codes in ISO 639-6, and what will be the rules for
correlating and equating these with the identifiers already in place?

Meanwhile, the claim that there are "over 20,000 languages" to be tagged
is being used as an argument against the current RFC 3066bis effort and
the plan to support 7,600 languages in RFC 3066ter.

I'm not saying anything against the Linguasphere effort per se, but with
the limited knowledge available to me, I don't think its eventual role

Doug Ewell | 28 Aug 07:15

Re: Minor regret

Addison Phillips <addison dot phillips at quest dot com> wrote:

> 1. As an implementer I don't like the idea of having to analyze the
> content of each character in a subtag in order to recognize the subtag
> type. All subtags distinguished by content in the current grammar are
> distinguished on the first character. There are two such cases now:
>
> - 4 characters: Variant (initial digit) vs. script (initial alpha)
> - 3 characters: extlang (initial alpha) vs. UN M.49 code (initial
>   digit)

You have to look at every character anyway if you are going to check for
well-formedness.
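
To make the two positions concrete, here is a minimal Python sketch. It assumes only the two ambiguous cases listed in the quoted text (4 characters: variant vs. script; 3 characters: extlang vs. UN M.49 code); the function names and the per-type character rules are hypothetical illustrations, not taken from the draft's ABNF. The first function decides the type from the first character alone, while the second has to visit every character to confirm well-formedness.

```python
import string

ALPHA = set(string.ascii_letters)
DIGIT = set(string.digits)


def classify_by_first_char(subtag):
    """Resolve the two ambiguous lengths from the first character alone."""
    if not subtag:
        return "other"
    first_is_digit = subtag[0] in DIGIT
    if len(subtag) == 4:
        return "variant" if first_is_digit else "script"
    if len(subtag) == 3:
        return "region" if first_is_digit else "extlang"
    return "other"  # lengths outside the two cases are not handled here


def is_well_formed(subtag):
    """Check every character against what the guessed type allows (assumed rules)."""
    kind = classify_by_first_char(subtag)
    if kind in ("script", "extlang"):
        return all(c in ALPHA for c in subtag)
    if kind == "region":
        return all(c in DIGIT for c in subtag)
    if kind == "variant":
        return all(c in ALPHA or c in DIGIT for c in subtag)
    return False
```

Under these assumptions "Latn" is classified as a script and "1996" as a variant from the first character, but a malformed subtag such as "La7n" is only caught by the full scan in is_well_formed.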

--
Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

Frank Ellermann | 28 Aug 07:14

Re: STD

Doug Ewell wrote:

> I still have significant concerns about the assumption that
> ISO 639-6 will be, or should be, automatically integrated
> into a language tagging scheme.

Yes, it wasn't my intention to say otherwise; I used it only
for my standards track example.  More about "BCP" on the
general list; a better solution could be a separate series
for "non-technical" (admin / meta) documents.

> the claim that there are "over 20,000 languages" to be tagged
> is being used as an argument against the current RFC 3066bis
> effort and the plan to support 7,600 languages in RFC 3066ter

Jefsey mumbled something in this direction, but IMHO it was not
convincing - we discussed potential size issues here.  His real
concern is "who controls IANA", that's interesting, but I'm not
worried, and it's no reason to block 3066bis.

BTW, I like number [7] in the rather long list. <g>  Bye, Frank

Doug Ewell | 28 Aug 07:36

Reloading the registry (was: STD)

John.Cowan <jcowan at reutershealth dot com> wrote:

> Indeed.  We'll probably want to issue another I-D to reload the
> registry with all the thousands of new values.  Another reason for
> going dormant rather than shutting up shop when RFC 3066bis and
> friends are finished.

I've already built a hypothetical RFC 3066ter registry.  The changes
alone add up to 35,700 lines, or more than 740 pages in RFC format.  It
might reopen the question of whether an I-D is the best vehicle for
delivering this amount of information to IANA.

--
Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

Doug Ewell | 28 Aug 07:44

Re: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

I wrote:

> Moreover, with ISO 639-3 already claiming to provide coverage for "all
> known human languages," numbering a bit over 7,600, and with
> Linguasphere listing "over 20,000 languages and constituent dialects,"
> one is left to wonder just what the 13,400 new identifiers will
> contribute from the standpoint of identifying and requesting
> linguistic content.

Recte: 12,400

--
Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

Debbie Garside | 28 Aug 13:22

RE: Re: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)


Hi

See in line comments

> -----Original Message-----
> From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of
> Doug Ewell
> Sent: 28 August 2005 02:10
> To: LTRU Working Group
> Subject: [Ltru] Re: STD (was: Last Call: 'Tags for Identifying Languages'
> toBCP)
> 
> Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:
> 
> > JFTR, nothing new, I still prefer standards track:
> >
> > 3066bis   could obsolete 3066 (BCP 47) also as PS.
> > 3066ter   could integrate ISO 639-3 as DS (minor update)
> > 3066tetra could integrate ISO 639-6 as STD
> 
> I still have significant concerns about the assumption that ISO 639-6
> will be, or should be, automatically integrated into a language tagging
> scheme.  The Linguasphere database upon which 639-6 is based is still
> not freely available to the public for evaluation and examination, and
> seems unlikely ever to be, unlike the Ethnologue database upon which ISO
> 639-3 is based.

I don't think anyone should make assumptions where technology is concerned!
What I can say is that the database for the 639-6 standard will be made

JFC (Jefsey) Morfin | 28 Aug 15:55

Re: Last Call: language root file size

Last Call: http://www.ietf.org/internet-drafts/draft-ietf-ltru-initial-04.txt

I started documenting some of the problems resulting from the 
expected size of the language tag registry and the capacity of the 
langtag solution to fulfill the WG-ltru Charter. Here are two inputs 
from the author of the Draft above on the WG-ltru list, Doug Ewell:

- "I've already built a hypothetical RFC 3066ter registry.  The 
changes alone add up to 35,700 lines, or more than 740 pages in RFC format.  It
might reopen the question of whether an I-D is the best vehicle for 
delivering this amount of information to IANA."

- "I still have significant concerns about the assumption that ISO 
639-6 will be, or should be, automatically integrated into a language tagging
scheme. [snip] Meanwhile, the claim that there are "over 20,000 
languages" to be tagged is being used as an argument against the 
current RFC 3066bis effort and the plan to support 7,600 languages in 
RFC 3066ter."

I fully share the concerns of Doug Ewell. There is only a difference 
of timing. I raised questions which they took as opposition and which 
he now discovers as problems. Had the WG-ltru first analysed its 
charter, we would have identified them long ago. I list some in annex.

The Charter says: "The RFC 3066 standard for language tags has been 
widely adopted in various protocols and text formats, including HTML, 
XML, and CLDR, [... the first document] is also expected to provide 
mechanisms to support the evolution of the underlying ISO standards, 
in particular ISO 639-3, mechanisms to support variant registration 
and formal extensions, as well as allowing generative private use 

r&d afrac | 28 Aug 15:54

RE: Re: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

Debbie,
you will see my comments in the mail of comments/support to Doug 
Ewell. All this is good and clear. Except your timing. When do you 
think we can have a test sample, to be able to develop? I suppose 
IANA would be interested too. Ditto, Doug: could you release your 
complete test ISO-639-3 included base?
jfc

At 13:22 28/08/2005, Debbie Garside wrote:
>See in line comments
> > -----Original Message-----
> > From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of
> > Doug Ewell
> > Sent: 28 August 2005 02:10
> > To: LTRU Working Group
> > Subject: [Ltru] Re: STD (was: Last Call: 'Tags for Identifying Languages'
> > toBCP)
> >
> > Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:
> >
> > > JFTR, nothing new, I still prefer standards track:
> > >
> > > 3066bis   could obsolete 3066 (BCP 47) also as PS.
> > > 3066ter   could integrate ISO 639-3 as DS (minor update)
> > > 3066tetra could integrate ISO 639-6 as STD
> >
> > I still have significant concerns about the assumption that ISO 639-6
> > will be, or should be, automatically integrated into a language tagging
> > scheme.  The Linguasphere database upon which 639-6 is based is still
> > not freely available to the public for evaluation and examination, and

Debbie Garside | 28 Aug 16:54

RE: Re: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

Hi JFC

I would not release any data until the November draft.

Cheers

Debbie

> -----Original Message-----
> From: r&d afrac [mailto:rd <at> afrac.org]
> Sent: 28 August 2005 14:54
> To: Debbie Garside; 'Doug Ewell'; 'LTRU Working Group'
> Subject: RE: [Ltru] Re: STD (was: Last Call: 'Tags for Identifying
> Languages' toBCP)
> 
> Debbie,
> you will see my comments in the mail of comments/support to Doug
> Ewell. All this is good and clear. Except your timing. When do you
> think we can have a test sample, to be able to develop? I suppose
> IANA would be interested too. Ditto, Doug: could you release your
> complete test ISO-639-3 included base?
> jfc
> 
> 
> At 13:22 28/08/2005, Debbie Garside wrote:
> >See in line comments
> > > -----Original Message-----
> > > From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of
> > > Doug Ewell

Peter Constable | 28 Aug 18:10

RE: STD (was: Last Call: 'Tags for Identifying Languages' to BCP)

> From: ltru-bounces <at> ietf.org [mailto:ltru-bounces <at> ietf.org] On Behalf Of
> Addison Phillips

> ISO 639-6 codes are easily dealt with: there is a slot reserved for them
> in the ABNF that implementers ignore at their peril and the only change is
> the management of registrations.
> 
> ISO 639-3 is slightly more complicated.

Actually, I think it could well be the other way around. 

Yes, from a purely syntactic perspective 639-6 would be straightforward.
But the same would be true for 639-3: any alpha-4 from 639-6 can go into
the initial subtag position, and any alpha-3 from 639-3 can go into
initial or extlang subtags. But 639-3 won't be quite that simple because
we want to avoid certain things:

- e.g. don't use 639-3 alpha-3 in initial subtag when there's an
equivalent alpha-2

- there must be some constraint on the relationship between lang and
extlang subtags to avoid uninterpretable anomalies like en-fre, or
arb-ar (ar = generic Arabic, arb = specifically Standard Arabic) --
assuming the second subtag qualifies the first subtag

- we want to avoid the wrong matching behaviour that would arise from
tags like arb-ar
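
The first two constraints above can be sketched in Python as follows. This is only an illustration under stated assumptions: the two lookup tables are tiny hypothetical samples rather than real ISO 639 data files, the function name is invented, and the rule "the second subtag must be a language encompassed by the first" is one possible reading of the required lang/extlang relationship, not something settled in this thread.

```python
# Hypothetical sample tables; a real implementation would load the ISO 639 code lists.
ALPHA3_TO_ALPHA2 = {"eng": "en", "fra": "fr", "fre": "fr", "ara": "ar"}

# Assumed relationship for illustration: language -> more specific languages it encompasses.
ENCOMPASSED = {"ar": {"arb", "arz"}, "zh": {"cmn", "yue"}}


def pair_problems(lang, ext):
    """Return the problems found when `ext` is used to qualify `lang`."""
    lang, ext = lang.lower(), ext.lower()
    problems = []
    # Constraint 1: do not use an alpha-3 code when an equivalent alpha-2 exists.
    if lang in ALPHA3_TO_ALPHA2:
        shorter = ALPHA3_TO_ALPHA2[lang]
        problems.append(f"use '{shorter}' instead of '{lang}'")
        lang = shorter  # continue checking with the preferred code
    # Constraint 2 (assumed rule): the second subtag must be encompassed by the first.
    if ext not in ENCOMPASSED.get(lang, set()):
        problems.append(f"'{ext}' does not qualify '{lang}'")
    return problems
```

With these sample tables, the pair ("ar", "arb") produces no problems, while ("en", "fre") and ("arb", "ar") are each reported as pairs in which the second subtag does not qualify the first.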

