Tracey, Niall | 1 Sep 11:39

RE: acade - LANGUAGE SUBTAG REGISTRATION FORM

Yury Tarasievich said:

> Why want the subtags' names to be hierarchical (to contain genealogy)
> at all?

From a systems point of view it's very helpful, in terms of search and
backwards compatibility, where there is a high level of similarity
between forms -- even more so when we are talking about current
standards.

If I wrote a search today that identifies all Academy-standard
Belarusian text in a database or library, I'd want it to work tomorrow
and next year, and the year after that. However, a search on be-1959acad
would cease to bring up new text as soon as be-2008acad became the
official norm. These dated tags are transient, and there is no permanent
umbrella tag that a librarian, systems developer or member of the public
can use to tie the variants together.
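
A minimal sketch of the search problem described above, assuming the search behaves like RFC 4647 basic filtering (a case-insensitive prefix match on subtag boundaries). The function name basic_filter, the two sample catalogues, and the hierarchical tags be-academy-1959 and be-academy-2008 are hypothetical illustrations, not registered subtags or an existing system.

```python
def basic_filter(language_range: str, tags: list[str]) -> list[str]:
    """Return the tags matched by language_range under RFC 4647 basic filtering."""
    rng = language_range.lower()
    if rng == "*":
        return list(tags)
    # A tag matches if it equals the range, or starts with the range
    # followed by a subtag separator.
    return [t for t in tags if t.lower() == rng or t.lower().startswith(rng + "-")]


# Hypothetical catalogue that uses only dated variant subtags.
dated_only = ["be-1959acad", "be-2008acad", "be-tarask"]

# Hypothetical catalogue that uses an umbrella subtag followed by a date.
hierarchical = ["be-academy-1959", "be-academy-2008", "be-tarask"]

# A saved search on the dated tag never sees texts in the newer norm.
print(basic_filter("be-1959acad", dated_only))

# A saved search on the umbrella subtag keeps working when a new norm appears.
print(basic_filter("be-academy", hierarchical))
```

With dated-only tags, the only range that covers both Academy norms is "be", which also pulls in be-tarask.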

A library may set up separate sections for classical and academy texts,
but we would expect 2008 standard academy texts to be shelved alongside
books written in the current standard.  As such, we want an identifiable
common element, so the librarian can say to a trainee "all be-academy go
here, all be-tarask go here".  This is vastly preferable to the
alternative of "all be go here, unless they are be-tarask" -- what if
someone was to introduce be-arabic?  We wouldn't want to shelve that
alongside the cyrillic texts, but our official procedures would say to
do so.

In a normal library human common sense may prevail in this case, but in
automated systems that option isn't available.

Yury Tarasievich | 1 Sep 12:45

Re: acade - LANGUAGE SUBTAG REGISTRATION FORM

Tracey, Niall wrote:
> Yury Tarasievich said:
> 
>> Why want the subtags' names to be hierarchical (to contain genealogy)
> at all?
...
> If I wrote a search today that identifies all Academy-standard
...
<search example skipped>

As I see it, you are just presupposing a certain architecture of the 
search system(s) which *would* dictate the need of a certain metadata 
structure.
And search systems common today are already more sophisticated than that.

-Yury

CE Whitehead | 1 Sep 23:32

RE: acade - LANGUAGE SUBTAG REGISTRATION FORM


Hi

I think there are precedents for adding academy or akademy or akadem now and then adding 1950 and 2008 or
2010 variants later (the Resian orthography subtags), but there are also precedents for doing things
Yury's way, adding two varieties, be-1959acad and be-2010acad (the 16th-17th century French subtags,
1694acad and 1606nict; the latter is troubling because it is so cryptic, when in school I always
learned 16eme siecle, 17eme siecle, but . . . ).

I want to allow Yuri and others who use this language to decide, and I'm happy to abstain from voting for one or
the other version.

(P.S.  I would not ask for 1959acad-rev1959; maybe 1950acad-1959 or 1959acad-unrev  but these are all
cryptic now anyway when I look at them, and since this suggestion was not well-received, forget it).

--C. E. Whitehead

From: "Tracey, Niall" niall.tracey <at> logica.com

> Yury Tarasievich said:

>> Why want the subtags' names to be hierarchical (to contain genealogy)
>> at all?

> From a systems point of view it's very helpful, in terms of search and
> backwards compatibility, where there is a high level of similarity
> between forms -- even more so when we are talking about current
> standards.

> If I wrote a search today that identifies all Academy-standard

ISO639-3 | 2 Sep 18:44

RE: [Ltru] Results of Duplicate Busters Survey #1


The disambiguation of Dimli and Kirmanjki is complete.

These changes are now posted publicly both online and in download tables.

Joan Spanne
ISO 639-3/RA
SIL International
7500 W Camp Wisdom Rd
Dallas, TX 75236
ISO639-3 <at> sil.org


Doug Ewell wrote:
>
> Type: language
> Subtag: diq
> Description: Dimli
> --> REPLACE WITH: Description: Dimli (individual language)
> Added: 2029-09-09
> Macrolanguage: zza
>
> Type: language
> Subtag: kiu
> Description: Kirmanjki
> --> REPLACE WITH: Description: Kirmanjki (individual language)
> Added: 2029-09-09
> Macrolanguage: zza
>
> Type: language
> Subtag: zza
> Description: Zaza
> Description: Dimili
> Description: Dimli
> --> REPLACE WITH: Description: Dimli (macrolanguage)
> Description: Kirdki
> Description: Kirmanjki
> --> REPLACE WITH: Description: Kirmanjki (macrolanguage)
> Description: Zazaki
> Added: 2006-08-24
> Scope: macrolanguage
>
> Discussion:
> These three cases are handled together due to their commonality.  Both
> Dimli and Kirmanjki are individual languages encompassed within Zaza,
> which may also be called Dimli or Kirmanjki, neatly exemplifying the
> concept of a macrolanguage.  The strings "individual language" and
> "macrolanguage" are used extensively in 639-3 for this
> purpose; see, for
> example, Dogri.

_______________________________________________
Ietf-languages mailing list
Ietf-languages <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages

Tracey, Niall | 2 Sep 19:27

RE: acade - LANGUAGE SUBTAG REGISTRATION FORM


CE Whitehead
> I think there are precedents for adding academy or akademy or
> akadem now and then adding in 1950 and 2008 or 2010 variants
> later (the Resian orthography subtags), but there are also
> precedents for doing things Yuri's way (adding two varieties,
> be-1959acad and be-2010acad (the 16th-17th century French subtags,
> 1694acad and 1606nict--the latter being troubling because it is so
> cryptic when in school I always learned 16eme siecle 17eme siecle,
> but . . . )

I don't see those as valid precedents, as those tags describe historical
varieties.
As they were never current during the lifespan of the standard, we
didn't encounter the problem of the nominal norm changing overnight.

Yury said:
> As I see it, you are just presupposing a certain architecture
> of the search system(s) which *would* dictate the need of a
> certain metadata structure.

I am not suggesting that this is the *only* architecture that will
exist, nor am I suggesting that it is a *good* architecture. I am simply
saying that we can confidently predict that such systems *will* exist,
and an explicit hierarchy can pre-emptively address the problems this
would cause.

In fact, I'd say that defining a metadata structure with minimal
redundancy presupposes a certain (efficient) architecture, when defining
one with substantial redundancy presupposes nothing.

Yury Tarasievich | 2 Sep 20:13

Re: acade - LANGUAGE SUBTAG REGISTRATION FORM

Tracey, Niall wrote:
...

> ...I am simply
> saying that we can confidently predict that such systems *will* exist,
> and an explicit hierarchy can pre-emptively address the problems this
> would cause.
>
> In fact, I'd say that defining a metadata structure with minimal
> redundancy presupposes a certain (efficient) architecture, when defining
> one with substantial redundancy presupposes nothing.
>
>   
I strongly doubt both 1) and 2). And major tagging systems that deal
with linguistic minutiae already exist.

-Yury

Frank Ellermann | 3 Sep 08:47

Re: [Ltru] Results of Duplicate Busters Survey #1

Joan Spanne wrote:

> The disambiguation of Dimli and Kirmanjki is complete.

Thanks.  Seeing that the LTRU review process sometimes
helps to find and fix minor oddities such as Dimli or
the "ork" proposal, and that the subtag review process
also sometimes helps to find things like "eur" or the
"frs" vs. "stq" case, I wonder if this could be a less
ad hoc institution.

If ISO 639-3 languages were added to the IANA registry only on
request, instead of in a bulk update, this list would have a
chance to find Suppress-Script issues and note other
potential problems.

Otherwise what Doug found in his surveys will be more
or less all you ever get from here as feedback.

 Frank

Peter Constable | 3 Sep 16:00

RE: LANGUAGE SUBTAG REGISTRATION FORM (R3): pinyin

From: ietf-languages-bounces <at> alvestrand.no [mailto:ietf-languages-bounces <at> alvestrand.no] On
Behalf Of Doug Ewell

> Here are the proposed new records and registration forms, for a two-week
> review period.  (Sorry, guys: RFC 4646, Section 3.7.)  Eligible to be
> added Wednesday, September 9 at 3:00 UTC, unless someone objects or
> finds a problem.
>
> ===
>
> LANGUAGE SUBTAG MODIFICATION
...

> Prefix: zh

IMO this should be "zh-Latn".

More generally, it has always been my opinion that variant subtags denoting a particular written form
should always be prefixed by a script subtag except when Suppress-Script applies.

Peter

Peter Constable | 3 Sep 16:24

RE: wadegile and pinyin LANGUAGE SUBTAG REGISTRATION FORMs

From: ietf-languages-bounces <at> alvestrand.no [mailto:ietf-languages-bounces <at> alvestrand.no] On
Behalf Of Michael Everson
Sent: Tuesday, August 26, 2008 6:31 AM

>> Is that to say you approve them with a Prefix value of
>> "zh-Latn", as shown on Mark's "R2" registration forms?
>
> Erm, no. Both Wade Giles and Hanyu Pinyin imply Latin
> inherently, in my opinion.

I think we all agree that Latin is implied. Chinese is also implied. By this rationale, a complete tag of
"wadegile" would work just as well as "zh-wadegile" (BCP47 syntax requirements aside). In terms of
semantic representation, that is true: "wadegile" contains just as much information as does "zh-wadegile".

But in processing operations, they are not equal: having a separate subtag denoting the 'Chinese'
semantic (or, in a 4646bis era, the 'Mandarin' semantic) makes it easy for processes to recognize that
without needing to have tables recording the relationship between "wadegile" and "zh". In just the same
way, including "Latn" frees processes from needing to have tables recording the relationship between
"wadegile" and "Latn".

We need to consider how tags will get used -- in matching -- together with the matching algorithms described
in BCP47 (RFC 4647). Realistic scenarios include

- matching a request for "zh-Latn" content with content tagged to indicate Wade Giles or Hanyu Pinyin Romanizations

- matching a request for Wade Giles or Pinyin content with the best-available match, which may be content
tagged "zh-Latn"

Those are made more complicated if "Latn" is not part of the prefix for "wadegile" and "pinyin".
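
The two scenarios can be sketched as follows, assuming RFC 4647 basic filtering (Section 3.3.1) for the first and RFC 4647 lookup (Section 3.4) for the second. The function names matches_range and lookup and the sample tag lists are hypothetical, and the lookup is simplified (no wildcard or private-use handling).

```python
def matches_range(language_range: str, tag: str) -> bool:
    """Basic filtering: case-insensitive prefix match on subtag boundaries."""
    rng, t = language_range.lower(), tag.lower()
    return t == rng or t.startswith(rng + "-")


def lookup(language_range: str, available: list[str], default=None):
    """Simplified lookup: truncate the range from the end until a tag matches."""
    by_lower = {t.lower(): t for t in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # A single-character subtag left at the end is dropped as well.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# Scenario 1: a request for "zh-Latn" against tagged content.
content = ["zh-Latn-wadegile", "zh-pinyin"]
print([t for t in content if matches_range("zh-Latn", t)])

# Scenario 2: a request for Wade-Giles when only broader tags are available.
available = ["zh", "zh-Latn"]
print(lookup("zh-Latn-wadegile", available))  # falls back to "zh-Latn"
print(lookup("zh-wadegile", available))       # falls back to plain "zh"
```

In this sketch, content tagged "zh-pinyin" is invisible to a "zh-Latn" request, and a "zh-wadegile" request skips the "zh-Latn" content entirely.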

Peter

John Cowan | 3 Sep 16:52

Re: wadegile and pinyin LANGUAGE SUBTAG REGISTRATION FORMs

Peter Constable scripsit:

> I think we all agree that Latin is implied. Chinese is also implied. By
> this rationale, a complete tag of "wadegile" would work just as
> well as "zh-wadegile" (BCP47 syntax requirements aside). In terms of
> semantic representation, that is true: "wadegile" contains just as
> much information as does "zh-wadegile".

However, all tags MUST have a language subtag, so this analogy is not on all fours.

> Those are made more complicated if "Latn" is not part of the prefix for
> "wadegile" and "pinyin".

You still need to know that "wadegile" and "pinyin" imply Latin, because
the Prefix for a subtag is only a SHOULD, so people are still free to
send you "zh-wadegile" whether the Prefix says "zh" or "zh-Latn".
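
A minimal sketch of the table a receiver would still need, assuming it wants to normalize incoming tags before matching. The table IMPLIED_SCRIPT and the function add_implied_script are hypothetical, and the script check is simplified: it assumes no extended language subtag precedes the script.

```python
# Illustrative table only: variant subtag -> script it implies.
IMPLIED_SCRIPT = {"wadegile": "Latn", "pinyin": "Latn"}


def add_implied_script(tag: str) -> str:
    """Insert the implied script subtag when a tag omits it."""
    subtags = tag.split("-")
    # Simplification: treat a four-letter second subtag as the script.
    has_script = len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha()
    if has_script:
        return tag
    for subtag in subtags[1:]:
        script = IMPLIED_SCRIPT.get(subtag.lower())
        if script:
            return "-".join([subtags[0], script] + subtags[1:])
    return tag


print(add_implied_script("zh-wadegile"))     # zh-Latn-wadegile
print(add_implied_script("zh-Latn-pinyin"))  # unchanged
```

After this step, "zh-wadegile" and "zh-Latn-wadegile" match the same ranges, whichever Prefix the registry records.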

-- 
My confusion is rapidly waxing          John Cowan
For XML Schema's too taxing:            cowan <at> ccil.org
    I'd use DTDs                        http://www.ccil.org/~cowan
    If they had local trees --
I think I best switch to RELAX NG.
