Mark Davis | 1 Aug 01:09

Re: LANGUAGE SUBTAG REGISTRATION FORM: pinyin

Ok, good. I'll continue with that then.

Mark


On Thu, Jul 31, 2008 at 12:51 PM, John Cowan <cowan <at> ccil.org> wrote:
Phillips, Addison scripsit:

> Note: we can have both Prefixes. Variants are allowed to have multiples.
>
> I think requiring the use of the 'Latn' subtag goes against the spirit
> of one of the language tagging maxims, which is: only use a subtag if
> it adds information. In fact, I think that the 'Latn' subtag adds no
> information to 'pinyin' or, for that matter, 'fonipa'. We shouldn't
> try to use the registry to enforce the exact tag for every situation.

Fortunately, prefixes for variant tags are a SHOULD, not a MUST, so
there is no requirement/enforcement here.  On that understanding, and
for the sake of useful documentation, I'm currently okay with zh-Latn.
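
[Editorial illustration, not part of the original message.] The SHOULD-not-MUST point can be sketched in Python: a checker that warns when a variant appears without its recommended Prefix but never rejects the tag. The function name, the dictionary, and its single 'pinyin' entry are hypothetical and only mirror the zh-Latn prefix discussed here; they are not taken from the real registry or from any library.

```python
from typing import Dict, List, Optional

# Hypothetical excerpt: variant subtag -> recommended Prefix values.
# Only the zh-Latn prefix from this thread is listed; a variant may have several.
VARIANT_PREFIXES: Dict[str, List[str]] = {"pinyin": ["zh-Latn"]}


def check_variant_prefix(tag: str, variant: str,
                         prefixes: Dict[str, List[str]]) -> Optional[str]:
    """Return a warning if `tag` lacks a recommended prefix, else None.

    Prefixes are advisory (SHOULD), so this never raises or rejects the tag.
    """
    recommended = prefixes.get(variant, [])
    if not recommended:
        return None  # nothing is recommended for this variant
    lowered = tag.lower()  # language tags are case-insensitive
    for prefix in recommended:
        p = prefix.lower()
        if lowered == p or lowered.startswith(p + "-"):
            return None  # tag begins with a recommended prefix
    return ("'%s': variant '%s' is normally used with prefix %s"
            % (tag, variant, " or ".join(recommended)))
```

Under these assumptions, "zh-Latn-pinyin" passes silently, while "zh-pinyin" produces a warning and is still accepted.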

--
John Cowan   cowan <at> ccil.org    http://ccil.org/~cowan
I come from under the hill, and under the hills and over the hills my paths
led. And through the air. I am he that walks unseen.  I am the clue-finder,
the web-cutter, the stinging fly. I was chosen for the lucky number.  --Bilbo
_______________________________________________
Ietf-languages mailing list
Ietf-languages <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages

Doug Ewell | 1 Aug 03:54

Re: Duplicate Busters: Survey #1

Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

> If a 4646bis would be published it apparently offers a new
> field "Scope" to disambiguate "macrolanguage" from the same
> description used for an "individual language".
>
> If that is so the context explaining why there is a dupe is
> in the registry, a similar situation as for the "deprecated"
> dupes.  Therefore the source description should stay as is.

I guess this comment applies to Dimli/Kirmanjki/Zaza.

> For three of the four remaining cases (Aruá, Awa, Murik) it
> would be nice to get the disambiguation in the *source*, or
> as a comment.  Don't touch their Descriptions, it is the job
> of the source to get it right, and it is the job of Comments
> to offer critical missing info.

I agree with Frank (and Debbie) that it would be good to have ISO 639-3 
disambiguate these.  At first I was concerned about their turnaround 
time, but then I thought of LTRU's milestones and laughed at myself.

The changes I am proposing to the source names consist only of adding 
parenthetical annotations to the existing names, in the same (and only) 
way that ISO 639 already uses parentheses in language names:

Interlingua (International Auxiliary Language Association)
Malay (macrolanguage)
Occitan (post 1500)
Slave (Athapascan)
Tonga (Tonga Islands)

> The last case (Borna) is arguably no real dupe, bwo is also
> known as Boro, bxx is only known as Borna.

ISO 639-3 makes a distinction between the reference and non-reference 
names.  RFC 4646bis does not, although it will non-normatively place the 
ISO 639-3 reference name (if any) first within the record.  So this is 
not a problem for 639-3, but it is for us.

> Survey #1, are there coming more ?

There will be a #2.  Stay tuned.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ

Doug Ewell | 1 Aug 04:22

Re: LANGUAGE SUBTAG REGISTRATION FORM: pinyin

CE Whitehead <cewcathar at hotmail dot com> wrote:

> The prefix is zh-Latn.
>
> The solution would be to use suppress-script except that that is used 
> only for language subtags, not for variant subtags.
>
> A suppress-script for the language part of the tag, zh , that would 
> work here--because Latin is clearly not the default script for zh.

This is not an accurate depiction of the purpose or use of 
Suppress-Script.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ

Doug Ewell | 1 Aug 04:35

Re: Duplicate Busters: Survey #1 [bwo] and [bxx]

Joan Spanne <ISO639 dash 3 at sil dot org> wrote:

> I will take action to resolve the matter for each of the four (Aruá; 
> Awa; Borna; Murik) by early next week.

I'm delighted to see that the RA is willing to resolve these conflicts 
within ISO 639-3, so we don't have to deviate from them.  I'll hold off 
on any action here until the new 639-3 files come out.

> I am inclined to accept Doug's recommendations, with the exception of 
> [Aruá]. Precedent within the standard uses a state or province level 
> geographic qualifier, so those would be [arx] "Aruá (Rodonia State)" 
> and [aru] "Aruá (Amazonas State)". If they were geographically 
> proximal to the district level, the next choice of qualifier would be 
> classification based (the highest level where they are distinct).

That is perfectly fine.  I wanted to stay consistent with the precedent, 
but couldn't find state-level identifications in the Ethnologue pages 
(though I see it now for 'arx').

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ

Doug Ewell | 1 Aug 07:02

Duplicate Busters: Survey #2

This is the second of two surveys being sent to both LTRU and 
ietf-languages on the subject of removing certain duplicate Description 
fields in the Language Subtag Registry.  Some of these issues affect the 
current Registry, while others affect only the proposed RFC 4646bis 
Registry being considered by LTRU.

Whereas the first survey dealt with eliminating duplicates across 
records by adding differentiating text, this survey deals with removing 
essentially duplicate Description fields within a record.  "Essentially 
duplicate" in this sense means either of two things:

1.  Two Description fields are identical, except for different 
punctuation marks (hyphens or apostrophes), or one contains letters with 
diacritical marks while the other is a pure-ASCII equivalent (i.e. all 
diacritical marks stripped).  No other types of spelling differences are 
considered (such as Kirghiz vs. Kyrgyz, or Dhivehi vs. Divehi).  The 
premise is that both Description fields convey the exact same content, 
but using slightly different typography.  The goal is to pick one and 
discard the other.

2.  Two Description fields are identical, except that one includes a 
parenthetical comment signifying a region or individual/macrolanguage 
status, and the other does not.  In each case, the description with 
comment is the ISO 639-3 name, while the description without comment is 
the ISO 639-1 and/or -2 name.  The premise is that the commented names 
convey the same content, but are less likely to be confused with other 
similarly named languages.  The goal (I hope) is to pick the commented 
(639-3) name and discard the uncommented (639-2) name; a reasonable 
alternative would be to continue to list both names.
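
[Editorial illustration, not part of the original message.] The two categories above can be approximated in a short Python sketch. The function names are hypothetical, and the folding rule (strip combining marks, hyphens, apostrophe-like characters, and spaces) is an assumption about what "slightly different typography" covers. Spaces are included so that "Macedo Romanian" and "Macedo-Romanian" compare equal.

```python
import re
import unicodedata

# Characters treated as "punctuation only" differences (assumed set):
# hyphen, ASCII apostrophe, curly quotes, U+02BB (as in Ge&#x2BB;ez), whitespace.
PUNCT = re.compile("[-'\u2018\u2019\u02BB\\s]")
# A trailing parenthetical such as " (macrolanguage)" or " (Japan)".
PAREN = re.compile(r"\s*\([^)]*\)\s*$")


def fold(text):
    """Strip diacritics and punctuation-like characters, then lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return PUNCT.sub("", no_marks).lower()


def duplicate_category(a, b):
    """Return 1 or 2 (the survey's categories), or None if neither applies."""
    if a == b:
        return None  # literally identical, not a typographic variant
    if fold(a) == fold(b):
        return 1     # same letters, different punctuation or diacritics
    if PAREN.sub("", a) == b or PAREN.sub("", b) == a:
        return 2     # same name, one with a parenthetical qualifier
    return None      # a real spelling difference, e.g. Kirghiz vs. Kyrgyz
```

With these rules, "Hangul" and "Hangeul" fall outside both categories, consistent with the note that Hangeul is a different transcription.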

Records are presented here in the order they will appear in the 
Registry, and are not segregated into categories 1 and 2.  Records are 
shown in hex-NCR format to allow the content to be "visible" on 
UTF-8-deprived or font-deprived systems, and in the Digest.
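
[Editorial illustration, not part of the original message.] For readers unfamiliar with hex-NCR (hexadecimal numeric character reference) format, a minimal Python sketch of the conversion follows. The helper names are hypothetical; html.unescape is the standard-library call, and it also decodes named entities such as &amp;, which is more than strictly needed here.

```python
import html


def to_hex_ncr(text):
    """Replace every non-ASCII character with a &#xHEX; reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x%X;" % ord(ch)
        for ch in text
    )


def from_hex_ncr(text):
    """Decode character references back to the characters they stand for."""
    return html.unescape(text)
```

For example, to_hex_ncr applied to the Ethi description with U+02BB yields "Ge&#x2BB;ez", the form used in the records below.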

For each of the subtags listed below, please examine the two Description 
fields and indicate whether you think the revised Registry should keep 
the first, the second, or both.  Only the Description fields that 
conflict are shown -- for example, both "Ge'ez" and "Ge‘ez" are shown, 
but not "Ethiopic", which is also listed for the same subtag.  When the 
rate of responses slows to a trickle, I will ask the Language Subtag 
Reviewer (not myself) to make the final determination, taking list 
feedback into account as appropriate.

===

Type: language
Subtag: ms
Description: Malay (macrolanguage)
Description: Malay

---

Type: language
Subtag: sw
Description: Swahili (macrolanguage)
Description: Swahili

---

Type: language
Subtag: ain
Description: Ainu (Japan)
Description: Ainu

---

Type: language
Subtag: bas
Description: Basa (Cameroon)
Description: Basa

---

Type: language
Subtag: bem
Description: Bemba (Zambia)
Description: Bemba

---

Type: language
Subtag: chm
Description: Mari (Russia)
Description: Mari

---

Type: language
Subtag: doi
Description: Dogri (macrolanguage)
Description: Dogri

---

Type: language
Subtag: fan
Description: Fang (Equatorial Guinea)
Description: Fang

---

Type: language
Subtag: gba
Description: Gbaya (Central African Republic)
Description: Gbaya

---

Type: language
Subtag: kam
Description: Kamba (Kenya)
Description: Kamba

---

Type: language
Subtag: kok
Description: Konkani (macrolanguage)
Description: Konkani

---

Type: language
Subtag: men
Description: Mende (Sierra Leone)
Description: Mende

---

Type: language
Subtag: rup
Description: Macedo Romanian
Description: Macedo-Romanian

---

Type: language
Subtag: war
Description: Waray (Philippines)
Description: Waray

---

Type: script
Subtag: Ethi
Description: Ge&#x2BB;ez
Description: Ge'ez

---

Type: script
Subtag: Hang
Description: Hangul
Description: Hang&#x16D;l
Description: Hangeul

(Technically I should not be including Hangeul, which is a different 
transcription of the same Korean word, not a genuinely different name. 
Make your own judgment.)

---

Type: script
Subtag: Hano
Description: Hanunoo
Description: Hanun&#xF3;o

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ

John Cowan | 1 Aug 07:54

Re: Duplicate Busters: Survey #2

Doug Ewell scripsit:

[snip]

In each case keep the first except as noted.

> ---
> 
> Type: script
> Subtag: Ethi
> Description: Ge&#x2BB;ez
> Description: Ge'ez

Keep the second.

> ---
> 
> Type: script
> Subtag: Hang
> Description: Hangul
> Description: Hang&#x16D;l
> Description: Hangeul

Keep the first and third.

-- 
John Cowan  cowan <at> ccil.org  http://ccil.org/~cowan
And now here I was, in a country where a right to say how the country should
be governed was restricted to six persons in each thousand of its population.
For the nine hundred and ninety-four to express dissatisfaction with the
regnant system and propose to change it, would have made the whole six
shudder as one man, it would have been so disloyal, so dishonorable, such
putrid black treason.  --Mark Twain's Connecticut Yankee
Kent Karlsson | 1 Aug 10:50

RE: Duplicate Busters: Survey #2

Doug Ewell wrote:
> 1.  Two Description fields are identical, [...]
> or one contains letters with 
> diacritical marks while the other is a pure-ASCII
> equivalent (i.e. all 
> diacritical marks stripped).  [...]  The 
> premise is that both Description fields convey
> the exact same content, 
> but using slightly different typography. ...

I do **NOT** agree with the position that removing diacritical
marks would be "slightly different typography". It is a difference
in spelling, much the same as differences in spelling that you
excluded from your list ["(such as Kirghiz vs. Kyrgyz, or Dhivehi
vs. Divehi)"] and thus want to keep as multiple names.

As for the other items in your "#2" list, keep just the ISO 639-3
names. (Don't generalise my statement here. As you know, I think
some of the items not on this "#2" list need spell correction.)

	/kent k
Kent Karlsson | 1 Aug 10:50

RE: Duplicate Busters: Survey #1

Doug Ewell wrote:
> This is the first of two surveys that are being distributed 

I agree with Doug's suggested changes in the "#1" list, in particular
if 639-3 RA misses out on some name changes (promised for next week).

	/kent k
Doug Ewell | 1 Aug 15:35

Re: Duplicate Busters: Survey #2

Kent Karlsson <kent dot karlsson14 at comhem dot se> wrote:

> I do **NOT** agree with the position that removing diacritical marks 
> would be "slightly different typography". It is a difference in 
> spelling, much the same as differences in spelling that you excluded 
> from your list ["(such as Kirghiz vs. Kyrgyz, or Dhivehi vs. Divehi)"] 
> and thus want to keep as multiple names.

Kent is right here, and I phrased that poorly.  Of course, the presence 
or absence of diacritical marks may (depending on language and writing 
system) represent a change in spelling, or even meaning (Spanish 'ano' 
vs. 'año').

What I meant to point out was that diacritical marks are sometimes 
removed, not as an intentional change in spelling, but rather as a 
typographical convenience, or out of concern that the correct character 
won't be available or rendered correctly.  This may or may not be 
justified given the circumstances.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages  ˆ

Frank Ellermann | 1 Aug 16:48

Re: Duplicate Busters: Survey #1

Doug Ewell wrote:

> At first I was concerned about their turnaround time

Yes, but you could use a Comment to "announce" expected
changes; and later remove the Comment after it happened.

I'm more concerned about having multiple sources with
slight variations.  In Wikipedia they now try to mirror
the complete 639-3 list.  Divided in 26 sublists (A-Z).

Checking my favourites I could add fy to fry, and maybe
fix the Eastern Frisian word for Frisian to "Seeltersk".

In the edit history I found that somebody had the quite
plausible but wrong idea to rename Silesian to "Polish
Silesian", and Lower Silesian to "German Silesian".  It
was immediately corrected by somebody knowing what it is
about.  But it is a good example how "just add some nice
qualifier" can miss the point and hit a rat-hole.

BTW, one advantage of the Wikipedia list is that it has
the native names (as far as they are known, and editors
agree on what it is, szl is simple, stq is harder), see
<http://en.wikipedia.org/wiki/ISO_639:s>

> ISO 639-3 makes a distinction between the reference and
> non-reference names.  RFC 4646bis does not, although it
> will non-normatively place the ISO 639-3 reference name
> (if any) first within the record.

The 4646bis proponents could decree that it is normative.

Or why not invent a convention to flag "secondary" dupes,
e.g., add (*) to "secondary" dupes.  Including the few
"dupe by macrolanguage" and "dupe by deprecation" cases.

 Frank
