Re: [semi-OT] ISO 639-3, record-jars, and Boontling


At 07:41 21/03/2005, Randy Presuhn wrote:
>Hi -
>(co-chair hat off)
> > From: "John Cowan" <jcowan <at> reutershealth.com>
> > To: "Randy Presuhn" <randy_presuhn <at> mindspring.com>
> > Cc: <ltru <at> ietf.org>
> > Sent: Sunday, March 20, 2005 8:41 PM
> > Subject: Re: [Ltru] [semi-OT] ISO 639-3, record-jars, and Boontling
>...
> > From what I understand of the IETF process, this WG could go dormant
> > until ISO 639-3 was released, and then wake up for a relatively brief
> > period to incorporate the necessary few paragraphs in RFC 3066bis to
> > create RFC 3066ter.
>...
>
>I think it would be really nice if we could put in text now that would not
>need revision later.

I fully agree with that. In particular ISO 639-4, which should allow
BCP 047 to be made consistent with ISO 639-3, ISO 639-6 and the IANA entries
- and obviously ISO 3166-2.
jfc

L.Gillam | 6 Apr 02:16

RE: [semi-OT] ISO 639-3, record-jars, and Boontling

 
639-4 is going to be "a while" in process. N.B. 5 month DIS ballot for 639-3, possible FDIS stage then publication (final editorial phase with ISO central secretariat). Unless activity on -4 suddenly happens at the Berlin metadata event next week, which it might or it might not, I'm not aware of a current timescale for this document beyond ISO time limits (the clock is certainly ticking).
 
Perhaps Peter Constable can comment on whether existing comments received at CD ballot have resulted in any alteration of the essence of -3, comments on the codes themselves (not part of the published standard) aside.

From: ltru-bounces <at> lists.ietf.org on behalf of JFC (Jefsey) Morfin
Sent: Wed 06/04/2005 00:48
To: Randy Presuhn; ltru
Subject: Re: [Ltru] [semi-OT] ISO 639-3, record-jars, and Boontling

At 07:41 21/03/2005, Randy Presuhn wrote:
>Hi -
>(co-chair hat off)
> > From: "John Cowan" <jcowan <at> reutershealth.com>
> > To: "Randy Presuhn" <randy_presuhn <at> mindspring.com>
> > Cc: <ltru <at> ietf.org>
> > Sent: Sunday, March 20, 2005 8:41 PM
> > Subject: Re: [Ltru] [semi-OT] ISO 639-3, record-jars, and Boontling
>...
> > From what I understand of the IETF process, this WG could go dormant
> > until ISO 639-3 was released, and then wake up for a relatively brief
> > period to incorporate the necessary few paragraphs in RFC 3066bis to
> > create RFC 3066ter.
>...
>
>I think it would be really nice if we could put in text now that would not
>need revision later.

I fully agree with that. In particular ISO 639-4, which should allow
BCP 047 to be made consistent with ISO 639-3, ISO 639-6 and the IANA entries
- and obviously ISO 3166-2.
jfc


_______________________________________________
Ltru mailing list
Ltru <at> lists.ietf.org
https://www1.ietf.org/mailman/listinfo/ltru

John Cowan | 6 Apr 04:04

Re: Matching metrics (was: Registry in record-jar format)

Frank Ellermann scripsit:

> I've added 0 for '*' = '*' and no match.  Otherwise 8/4/2/1,
> is this what you wanted ?  Your metrics has apparently a
> problem with en-Latn-US-scouse:

I'm not clear on whether * = * should count as a match or a no-match.
Originally I thought it should count as a match, but perhaps not.

> If one side wants en-GB-scouse, and the other side offers
> en-Latn-US-scouse (9) or en-Latn-GB (10), and it also has
> en-Brai-GB-scouse (11), then en-Brai-GB-scouse "wins".  All
> in the 2nd column for en-GB-scouse.

Fortunately en-US-scouse doesn't exist.

> Not okay, but not completely unintentional, for some languages
> I can guess what the text is about, as long as it's Latn:  The
> combined power of forgotten school Latin plus miserable French
> sometimes helps with es or pt.  But with ru I'd be lost - with
> luck I can decode some Cyrl.  For fy my chances are lousy, for
> dk or nl it's better than zero.

Fair enough.

> One effect you see with both metrics;  If one side wants
> en-scouse, and the other side has only en-Latn-US-scouse and
> en-Brai-GB-scouse, you get a draw.  Apparently your algorithm
> cannot completely replace the "default script" approach.

Well, that problem applies at all levels: if you ask for en-AU,
then no algorithm can choose between the offered en-GB and en-US
(except RFC 2616, which will simply fail).

For that matter, if you ask for de and nn and nb are all that's
available, the matching algorithm won't help then either.

-- 
Is a chair finely made tragic or comic? Is the          John Cowan
portrait of Mona Lisa good if I desire to see           jcowan <at> reutershealth.com
it? Is the bust of Sir Philip Crampton lyrical,         www.ccil.org/~cowan
epical or dramatic?  If a man hacking in fury           www.reutershealth.com
at a block of wood make there an image of a cow,
is that image a work of art? If not, why not?               --Stephen Dedalus
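
The draws discussed above can be reproduced with a small sketch. It is not the algorithm from any draft: it assumes tags are already split into (language, script, region, variant) slots, uses the 8/4/2/1 weights from the thread, and gives 0 points to an unspecified ('*') slot, which is the reading that yields the quoted 9, 10 and 11. The names `score` and `best_offers` are made up for illustration.

```python
# Weights for the language, script, region and variant slots (8/4/2/1).
WEIGHTS = (8, 4, 2, 1)

def score(request, offer):
    """Sum the weights of the slots where both sides name the same subtag.

    Assumption: a '*' (unspecified) slot in the request earns 0 points.
    """
    return sum(w for w, r, o in zip(WEIGHTS, request, offer)
               if r != "*" and r == o)

def best_offers(request, offers):
    """Return every offer that shares the top score; more than one is a draw."""
    scored = [(score(request, o), o) for o in offers]
    top = max(s for s, _ in scored)
    return [o for s, o in scored if s == top]

# Request en-GB-scouse against the three offers quoted in the message.
want = ("en", "*", "GB", "scouse")
offers = [
    ("en", "Latn", "US", "scouse"),
    ("en", "Latn", "GB", "*"),
    ("en", "Brai", "GB", "scouse"),
]
print([score(want, o) for o in offers])

# Request en-AU when only en-GB and en-US are on offer: both tie.
print(best_offers(("en", "*", "AU", "*"),
                  [("en", "*", "GB", "*"), ("en", "*", "US", "*")]))
```

Under these assumptions en-Brai-GB-scouse wins the first request, while the en-AU request and the en-scouse request both return two equally ranked offers, so some tie-breaker outside the metric is still needed.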

Doug Ewell | 6 Apr 04:56

Re: Date of deprecation

Addison Phillips <addison dot phillips at quest dot com> wrote:

>>> One of our chairs proposed to prepare the LTRU registry as far
>>> as possible for this case.  IMHO a good idea, of course not for
>>> what you plan to publish tomorrow.
>>
>> I missed something there.  Did Randy or Martin say something about
>> working on the registry itself?
>
> To prepare the 3066bis regime to handle ISO 639-3 as much as possible
> is what I believe Frank means.

and Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> What I meant was Randy's (hat off) question...
>
> | If we already have a good idea how the ISO specifications
> | will look, wouldn't it make sense to ensure that 3066bis
> | works correctly for those cases, so we don't have to bother
> | doing a 3066ter?
>
> | I think it would be really nice if we could put in text now
> | that would not need revision later.  It sounds like all that
> | is needed is text that spells out precisely when using the
> | macro-language code is required.

Ah, you are right.  Technically that's preparing the draft, not the
registry per se, which is why I missed something.

But I completely agree, and think we should include as much information
as we can -- depending on how much confidence we have in the stability
of the 639-3 draft between now and the time it's published -- about how
extlangs will work, and which types of 639-3 codes will be primary
language subtags and which will be extlangs.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

Randy Presuhn | 6 Apr 05:00

Re: Re: Proposed changes to region subtags

Hi -

> From: "Frank Ellermann" <nobody <at> xyzzy.claranet.de>
> To: <ltru <at> ietf.org>
> Sent: Saturday, April 02, 2005 1:43 PM
> Subject: [Ltru] Re: Proposed changes to region subtags
...
> And one reason pro XML (clear support for Unicode character
> references in an otherwise US-ASCII text).  JFTR, I prefer
> record-jar in the body of the I-D registry.
...

Pardon the naive question:
Couldn't we simply use Unicode character references (as needed)
in a record-jar format?

Randy
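
As a rough illustration of the question above, the sketch below reads a record-jar file that stays pure US-ASCII and decodes hexadecimal character references of the form `&#xNNNN;` in field values. The record layout (one `Name: value` field per line, records separated by `%%`) and the sample entries are assumptions made for the example, not text taken from the draft registry; `decode_refs` and `parse_record_jar` are hypothetical names.

```python
import re

def decode_refs(text):
    """Replace hexadecimal references such as &#xE5; with the character itself."""
    return re.sub(r"&#x([0-9A-Fa-f]+);",
                  lambda m: chr(int(m.group(1), 16)), text)

def parse_record_jar(data):
    """Split on '%%' separators and read one 'Name: value' field per line."""
    records = []
    for chunk in data.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            fields[name.strip()] = decode_refs(value.strip())
        if fields:
            records.append(fields)
    return records

# Illustrative sample only; the real registry content may differ.
SAMPLE = """Type: language
Subtag: nb
Description: Norwegian Bokm&#xE5;l
%%
Type: language
Subtag: vo
Description: Volap&#xFC;k
"""

records = parse_record_jar(SAMPLE)
for record in records:
    print(record["Subtag"], record["Description"])
```

The file on disk never leaves ASCII; only the consumer turns the references into Unicode characters, which is the property usually credited to the XML form.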

Randy Presuhn | 6 Apr 05:32

Re: Re: Registry in RDF/XML format

Hi -

> From: "Doug Ewell" <dewell <at> adelphia.net>
> To: "LTRU Working Group" <ltru <at> ietf.org>
> Sent: Tuesday, April 05, 2005 7:40 AM
> Subject: [Ltru] Re: Registry in RDF/XML format
...
> Can we specify *how* they build the alternative-format registry files?
> If so, great.
...

Yes, we could do that, but that would be a normative part of the
document via the IANA considerations section, even if what is
thereby generated is considered "informative".

I suspect that the IESG would want to see a pretty good rationale
for having IANA generate technically redundant representations of
the registry content.

Randy (as process-aware contributor)

Randy Presuhn | 6 Apr 05:51

Re: Re: Language transformation

Hi -

(<co-chair hat ON>)

> From: "JFC (Jefsey) Morfin" <jefsey <at> jefsey.com>
> To: "Mark Davis" <mark.davis <at> jtcsv.com>
> Cc: <ltru <at> ietf.org>
> Sent: Thursday, March 31, 2005 10:29 AM
> Subject: Re: [Ltru] Re: Language transformation
>
> On 19:09 31/03/2005, Mark Davis said:
> >The plan is for transliterations to be added to CLDR at some point, and
> >that would provide stable, predictable IDs that could be used for
> >variants. I suggest that we not try to incorporate transliteration
> >variants prematurely. For some more formal, structured mechanism of
> >describing the variants, it would be a perfect case for the extension
> >mechanism proposed in the draft, where some other RFC could have
> >something like ru-Latn-t-ungegn-t-en.
>
> This kind of remark obviously makes me react. Whose plan: IETF, IESG, IAB?

Obviously not, since CLDR is a Unicode project.
http://www.unicode.org/cldr/

> 1. this CLDR plan should be documented in the Draft as per Charter. At
> least to explain why it is not supported now.
...

There is no such requirement in the charter.  The charter mentions CLDR as
one of the users of RFC 3066.

Randy

Re: Re: Language transformation


At 05:51 06/04/2005, Randy Presuhn wrote:
>This kind of remark obviously makes me react. Whose plan: IETF, IESG, IAB?
>
>Obviously not, since CLDR is a Unicode project.
>http://www.unicode.org/cldr/
>
> > 1. this CLDR plan should be documented in the Draft as per Charter. At
> > least to explain why it is not supported now.
>...
>
>There is no such requirement in the charter. The charter mentions CLDR as
>one of the users of RFC 3066.

I don't think this is correct. The CLDR is not documented by any RFC. It
is a copyrighted private project. There are other projects with comparable
aims and different needs. The needs of the CLDR are part of the needs
calling for the BCP 047 review. By not citing CLDR, the current draft
implies that the CLDR needs are satisfied, which puts it at an advantage
over other projects, or project studies, whose needs are not satisfied.
I fully understand that CLDR IPRs should be respected, the same as those of
other projects. But the Internet standard process calls for protected
solutions to be considered after open ones.
jfc

Doug Ewell | 6 Apr 06:11

Re: Matching metrics (was: Registry in record-jar format)


Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> I've added 0 for '*' = '*' and no match.  Otherwise 8/4/2/1,
> is this what you wanted ?

No, that's not right.  '*' = '*' is a match, just the same as
'de' = 'de', and should be assigned the full complement of points,
depending on which subtag it is.  'de' = 'en', by contrast, would be a
zero.

Suppose I request "sr-CS".  The script is not specified, so in our
matching syntax, this is equivalent to "sr-*-CS".  Accordingly, this
request should retrieve content labeled "sr-CS", "sr-Latn-CS",
"sr-Cyrl-CS", or "sr-Yiii-CS" with equal success.  '*' matches
everything, and earns full points.

Try your metrics again.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/

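
The rule described in this message can be sketched as follows. This is only an illustration under stated assumptions: the slot detection (a 4-letter subtag is a script, a 2-letter or 3-digit subtag is a region, anything else is a variant) is a simplification, extlangs and extensions are ignored, and the helper names `slots`, `points` and `score` are invented for the example.

```python
# Weights for the language, script, region and variant slots (8/4/2/1).
WEIGHTS = (8, 4, 2, 1)

def slots(tag):
    """Split a tag into (language, script, region, variant); '*' if absent."""
    parts = tag.split("-")
    lang, script, region, variant = parts[0].lower(), "*", "*", "*"
    for p in parts[1:]:
        if p == "*":
            continue  # an explicit wildcard leaves the slot unspecified
        if len(p) == 4 and p.isalpha():
            script = p.title()
        elif (len(p) == 2 and p.isalpha()) or (len(p) == 3 and p.isdigit()):
            region = p.upper()
        else:
            variant = p.lower()
    return (lang, script, region, variant)

def points(request, offer):
    """Per-slot points: '*' on either side matches anything and earns the
    full weight; two different named subtags earn zero for that slot."""
    return tuple(w if r == "*" or o == "*" or r == o else 0
                 for w, r, o in zip(WEIGHTS, slots(request), slots(offer)))

def score(request, offer):
    """Total of the per-slot points."""
    return sum(points(request, offer))

for offer in ("sr-CS", "sr-Latn-CS", "sr-Cyrl-CS", "sr-Yiii-CS"):
    print(offer, score("sr-CS", offer))
```

With this reading "sr-CS" and "sr-*-CS" parse to the same slots, all four offers receive the same total, and a pair such as 'de' against 'en' earns nothing for the language slot.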
Doug Ewell | 6 Apr 06:17

Re: Matching metrics (was: Registry in record-jar format)


John Cowan <jcowan at reutershealth dot com> wrote:

> I'm not clear on whether * = * should count as a match or a no-match.
> Originally I thought it should count as a match, but perhaps not.

It has to be a match.  Otherwise, additional elements don't serve the
purpose of narrowing down the scope (for either requested or tagged
content).  They just get in the way.

-Doug Ewell
 Fullerton, California
 http://users.adelphia.net/~dewell/
