verdy_p | 27 Nov 19:54

Re: [OT] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

"Doug Ewell" wrote:
> Warning: this is completely OT for the Unicode list.  Future discussion 
> should be on the LTRU list (ltru <at> ietf.org) or CLDR list 
> (cldr-users <at> unicode.org) as appropriate.

You have just replied to the Unicode list yourself (even though I was replying to you with a CC to the CLDR list...)

> "verdy underscore p" <verdy underscore p at wanadoo dot fr> wrote:
> 
> > If only we could have some access to ISO 639-5 data (for managing the 
> > language families instead of using the historic and badly designed 
> > language collections of ISO 639-1 (code [bh] only) and ISO 639-2...
> 
> I wish the ISO 639-5 Registration Authority, which is the same as that 
> for ISO 639-2 (Library of Congress), would set up an official 639-5 Web 
> site.  It has been a long time coming.

Well, still waiting. (Sorry, my interest in the subject is mostly personal, although I could make use of it
professionally; I can't afford to buy a copy of the published standard myself, as it's too expensive for me.)

> I don't agree with characterizing 639-1 and 639-2 as "badly designed." 
> They were designed for different purposes.

Apparently not. Your description just indicates that 639-5 is effectively continuing the 639-2 (and 639-1 for
Bihari) model, and does not create what was expected (a comprehensive hierarchy similar to the Ethnologue); in
addition, 639-5 is now incompatible with 639-2 and 639-1, making it mostly unusable within the RFC 4645/4646
bis framework. For me, this means that 639-5 is already a dead standard before its publication, unless the
[...]

Kent Karlsson | 28 Nov 18:37

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)


On 2008-11-28 17.29, "verdy_p" <verdy_p <at> wanadoo.fr> wrote:

> there ARE combinations of a collection code and a
> language code, they are listed ONLY for the collection code [sgn] (Sign
> languages), within the proposed registry as

Yes, sgn was included in that LTRU compromise. Still not sure why it was.

> full "Tag:" elements, rather than just "Subtag-Type:" elements.

No, it goes via the "Type: extlang" and "Prefix: sgn" mechanisms.
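
To make that mechanism concrete, here is a minimal Python sketch (not part of the draft) of how extlang records carrying a "Prefix: sgn" field could be read. The sample records only imitate the registry's "Field: value" layout with "%%" separators; the field values, including the subtag "ase", and the helper names parse_records and extlangs_for are assumptions made for illustration.

```python
# Illustrative records that imitate the registry layout.
# The values are assumptions for this sketch, not copied from the draft.
SAMPLE = """\
Type: language
Subtag: sgn
Description: Sign languages
Scope: collection
%%
Type: extlang
Subtag: ase
Description: American Sign Language
Preferred-Value: ase
Prefix: sgn
"""


def parse_records(text):
    """Split the text on '%%' separators and return one dict per record.

    Simplification: a repeated field name overwrites the earlier value.
    """
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            name, _, value = line.partition(":")
            fields[name.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def extlangs_for(prefix, records):
    """Return the subtags of all extlang records that name `prefix`."""
    return [
        r["Subtag"]
        for r in records
        if r.get("Type") == "extlang" and r.get("Prefix") == prefix
    ]


records = parse_records(SAMPLE)
print(extlangs_for("sgn", records))
print([f"sgn-{subtag}" for subtag in extlangs_for("sgn", records)])
```

In this sketch the combination "sgn-ase" is never stored as a whole tag; it is derived from the extlang record and its Prefix field, which is the distinction being drawn in the reply above.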

> And I was NOT speaking about the case of macrolanguages (that are already
> correctly handled in ISO 639-3, and well
> integrated in RFC 4645bis, except a few differences that should be corrected to
> match what ISO 639-3 indicates, but

No, the compromise does not handle all macrolanguages the same. Only some
were selected as extlang prefixes.

> anyway, RFC 4647 already contains the statements needed to avoid or
> correct these small discrepancies).
[...]

> I remain convinced that the unexpected change of scope for most collections of
> ISO 639-1/2 (where their exclusive
> scope was also very fuzzy, undetermined across versions of ISO 639-1/2 that
> constantly reduced their scope) when
> converted to ISO 639-5 is a defect. And that for the future, a comprehensive
[...]

verdy_p | 28 Nov 17:29

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

"Kent Karlsson" <kent.karlsson14 <at> comhem.se> wrote:
> > It just says that they are just added as possible subtags, usable as prefixes,
> > but immediately, the included list
> > of tags make these combinations of a collection subtag plus a language subtag
> 
> This is for some of the so-called macrolanguage codes. While macrolanguage
> codes are (informally) like collection codes, they are "special" collections
> (in a particular way), and they are not formally collection codes.

I must have read the current RFC 4645bis draft more closely than you: there ARE combinations of a collection
code and a language code. They are listed ONLY for the collection code [sgn] (Sign languages), within the
proposed registry, as full "Tag:" elements rather than just "Subtag-Type:" elements.

And I was NOT speaking about the case of macrolanguages (which are already correctly handled in ISO 639-3 and
well integrated in RFC 4645bis, except for a few differences that should be corrected to match what ISO 639-3
indicates; anyway, RFC 4647 already contains the statements needed to avoid or correct these small
discrepancies).

Note: my message was not supposed to be off-topic. I posted it to the LTRU list at the suggestion of Doug
Ewell, the signing author of the drafts for RFC 4645bis and 4646bis. Initially it was a post on the Unicode
list (then Doug Ewell suggested that I forward it to the Unicode CLDR list as well), which explains the "[OT]"
label that remained and the title of this thread (related to the "CLDR Survey Tool" and its list, which
discusses its current use of the RFC 4645-4647 series).
[...]

verdy_p | 28 Nov 19:53

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

"Kent Karlsson" <kent.karlsson14 <at> comhem.se>
> > NB : For your information I have just built yesterday (temporarily: if it's
> > not acceptable there, I can remove it)
> > an easily navigable and sortable version of the proposed registry that is
> > part of RFC 4645bis Draft 07 on
> > <URL:http://fr.wiktionary.org/wiki/Wiktionnaire:RFC_4645>, on a site that
> 
> (Well, note that RFC 4645 had no macrolanguage concept, nor covered the new
> codes in ISO 639-3. RFC 4645bis is what you should refer to.)

That's what I'm referring to throughout these pages, which indicate "bis" explicitly. (However, I did not give
the article the title "RFC 4645bis", thinking that this is not the definitive name for this draft RFC
revision. I should also correct the link that MediaWiki autogenerates in the middle of the name "RFC 4645bis",
which leaves "bis" separated in the rendered page; disabling this automatically generated link would stop it
from pointing to the wrong version.)

> > currently needs a comprehensive list of
> > language families to allow searches of words across "similar" languages, and
> > an easy way to search for language
> > synonyms (or dialect names) localized in other languages than just English.
> > I've correctly stated that this version
> > is a draft with a publication date and the validity date, and a direct
> > reference to the draft text currently
> > published by the IETF.

My purpose is just to make the registry more easily searchable and readable (I've not attempted to translate
[...]

Doug Ewell | 28 Nov 20:10

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

Just a quick comment on one point; I'll have to spend some time reading 
through all the others.

"verdy underscore p" <verdy underscore p at wanadoo dot fr> wrote:

> For now the "bis" version is not fully released, so I don't think the 
> page name should adopt the "bis" name, unless it has been said that 
> this will be the final name. If the name changes, I don't want to 
> redirect it again.
>
> Will the "bis" be kept after release, or won't that be the same RFC 
> number (pointing directly to the revised text), or another RFC number?

RFCs are never "updated" with the same number; they are superseded or 
replaced by a new RFC with a different number.  This is unlike ISO and 
other standards, which normally keep the same number through revisions.

"RFC 4645bis" means roughly "the Internet-Draft that is intended to 
supersede or replace RFC 4645."  If and when it is approved as an 
RFC, it will be assigned a new number, known only to the RFC Editor 
until the moment of publication.

So any page that describes RFC 4645bis, or reformats its content in a 
different way, should not be labeled "RFC 4645."  As you will see by 
reading RFC 4645, that is a completely different document.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org

verdy_p | 28 Nov 22:31

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

> From: "Kent Karlsson" <kent.karlsson14 <at> comhem.se>
> To: "verdy_p" <verdy_p <at> wanadoo.fr>, "Doug Ewell" <doug <at> ewellic.org>
> Cc: "LTRU list" <ltru <at> ietf.org>, "CLDR Users" <cldr-users <at> unicode.org>
> Subject: Re: [Ltru] [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)
> 
> On 2008-11-28 17.29, "verdy_p" <verdy_p <at> wanadoo.fr> wrote:
> 
> > there ARE combinations of a collection code and a
> > language code, they are listed ONLY for the collection code [sgn] (Sign
> > languages), within the proposed registry as
> 
> Yes, sgn was included in that LTRU compromise. Still not sure why it was.
> 
> > full "Tag:" elements, rather than just "Subtag-Type:" elements.
> 
> No, it goes via the "Type: extlang" and "Prefix: sgn" mechanisms.

That's not what draft 07 of RFC 4645bis says. It clearly states that [sgn] is a collection, not a
macrolanguage, but that it is included there to be treated like the macrolanguages, whose subtags are used in
FULL "Tag:" elements for entries related to "Type: redundant"... This is the only exception made that allows
a collection to be used in locale tags.

And I've NEVER said that language collection codes should be part of locale tags. My need for a comprehensive
hierarchy is for something else: organizing long lists of languages (and their many synonyms, including those
in languages other than the English used in BCP 47).
[...]

Peter Constable | 29 Nov 23:44

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

As I've explained in the past on this list and various other loci, the exclusive nature of some collections
in 639-2 has all along been a problem because of the dynamic nature of the standard: not only are the
denotations fuzzy, they are unstable. Broadening the scope of those existing collections does not
introduce any compatibility issues with existing applications, given that conforming applications
can continue to treat certain collections as exclusive in that given application context if desired, and
it eliminates the general problem of instability.
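
The instability described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the family "fam" and the languages "lang_a", "lang_b" and "lang_c" are hypothetical placeholders, not real ISO 639 codes, and the two functions are illustrative names rather than anything defined by the standard.

```python
# Hypothetical family with three member languages (placeholder names).
FAMILY_MEMBERS = {"fam": {"lang_a", "lang_b", "lang_c"}}


def inclusive_denotation(family):
    """Inclusive reading: the collection covers every member of the family."""
    return set(FAMILY_MEMBERS[family])


def exclusive_denotation(family, individually_coded):
    """Exclusive reading: the collection covers only the members that do
    not yet have a code of their own ("other languages of this family")."""
    return FAMILY_MEMBERS[family] - set(individually_coded)


# Two successive editions of the code table: "lang_b" gains its own code.
coded_in_edition_1 = {"lang_a"}
coded_in_edition_2 = {"lang_a", "lang_b"}

print(sorted(exclusive_denotation("fam", coded_in_edition_1)))
print(sorted(exclusive_denotation("fam", coded_in_edition_2)))
print(sorted(inclusive_denotation("fam")))
```

Under the exclusive reading, the set denoted by "fam" shrinks whenever a member language receives its own code, so data tagged earlier silently changes meaning. Under the inclusive reading the denotation does not depend on the edition of the code table.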

CLDR and other applications of ISO 639 may experience a one-time change in the name data published with ISO
639, but it is known that ISO 639 can and not infrequently does make name changes, and that applications
must allow for that. So, this is not an exceptional problem.

Also, adding new categories to replace the existing ones is incredibly destabilizing, with
compatibility impact on all applications.

Thus, I strongly disagree with Philippe.

Peter

From: cldr-users-bounce <at> unicode.org [mailto:cldr-users-bounce <at> unicode.org] On Behalf Of verdy_p

[snip]

I remain convinced that the unexpected change of scope for most collections of ISO 639-1/2 (where their
exclusive scope was also very fuzzy, undetermined across versions of ISO 639-1/2 that constantly reduced their
scope) when converted to ISO 639-5 is a defect. And that for the future, a comprehensive list containing only
inclusive families coded distinctly from the old exclusive codes would have been better:

...
Under this scheme, there would have been fewer problems in the CLDR collections (that were updated inconsistently
[...]

Peter Constable | 30 Nov 00:17

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

From: cldr-users-bounce <at> unicode.org [mailto:cldr-users-bounce <at> unicode.org] On Behalf Of verdy_p
Sent: Friday, November 28, 2008 1:31 PM

The need for a language hierarchy (by families) is to simplify the search...

An informal suggestion: while Ethnologue is not formally part of ISO 639, it is maintained so as to stay
consistent with ISO 639, and ISO 639-3 makes use of Ethnologue as a source to clarify the denotation of its
encoded categories. Since the Ethnologue site provides a comprehensive language-family
classification, one could search on the Ethnologue site to find particular languages, and then follow
the links provided to get to the corresponding ISO 639-3 entry.

For example, starting at the Ethnologue language-family index
(http://www.ethnologue.com/family_index.asp), you can follow the link for the Iroquoian family to
get the complete hierarchy of Iroquoian languages, then select the link for (say) Mohawk to get the entry
for that language, and then from there follow the link to get to the entry on the ISO 639-3 site for "moh".
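
The same navigation can be sketched as a small tree walk in Python. The tree below is a tiny hand-made sample, heavily simplified and not Ethnologue data: only "moh" comes from the example above, while the intermediate node names, the code "chr" and the function names are assumptions made for illustration.

```python
# Hand-made, simplified sample hierarchy (an assumption, not Ethnologue data).
# Keys are family or branch names; leaves are language codes.
TREE = {
    "Iroquoian": ["Northern Iroquoian", "Southern Iroquoian"],
    "Northern Iroquoian": ["moh"],
    "Southern Iroquoian": ["chr"],
}


def languages_under(node):
    """Return every leaf (language code) found below `node`."""
    children = TREE.get(node)
    if children is None:
        return [node]  # a leaf: the node is itself a language code
    found = []
    for child in children:
        found.extend(languages_under(child))
    return found


def path_to(code, node="Iroquoian"):
    """Return the chain of nodes from `node` down to `code`, or []."""
    if node == code:
        return [node]
    for child in TREE.get(node, []):
        sub_path = path_to(code, child)
        if sub_path:
            return [node] + sub_path
    return []


print(languages_under("Iroquoian"))
print(path_to("moh"))
```

Searching "by family" then amounts to expanding a family node into its language codes with languages_under and running the search once per code.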

Caveat: as with any hierarchical language classification, the classification used by Ethnologue is one
of several possible analyses (Ethnologue primarily follows the _International Encyclopaedia of
Linguistics_), and not all experts would necessarily posit the same hierarchy, though most linguists
would likely be somewhat familiar with that analysis and find it reasonably workable for searching by
language-family hierarchy.

Peter

verdy_p | 30 Nov 12:22

Re: [CLDR] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

"Peter Constable" <petercon <at> microsoft.com>
> The need for a language hierarchy (by families) is to simplify the search...
> 
> An informal suggestion: while Ethnologue is not formally part of ISO 639, it is maintained so as to stay
> consistent with ISO 639, and ISO 639-3 makes use of Ethnologue as a source to clarify the denotation of its
> encoded categories. Since the Ethnologue site provides a comprehensive language-family classification, one
> could search on the Ethnologue site to find particular languages, and then follow the links provided to get
> to the corresponding ISO 639-3 entry.

That's exactly the kind of reason why we need such a classification ALSO in languages other than English. But
without a reliable codification of families, of their hierarchy (at least a minimal classification into the
most important groups, possibly excluding finely tuned intermediate subdivisions), and more importantly of the
membership of isolated languages and macrolanguages that are direct children of those families, building such
a hierarchy and making it usable is illusory.

Anyway, the fact that families ARE encoded in addition to languages, and the fact that families are
hierarchized as well, creates a hole that must be filled between families and languages. This will close the
mess that was introduced in ISO 639-1/2 when exclusive (and unstable) family names were given, with various
and non-interoperable results about which languages get included or not in a search of results by family
names.

Believe it, searching for terms within a complete language family rather than a precise language name or
[...]

