verdy_p | 27 Nov 19:54

Re: [OT] Re: Support of ISO 639 (was: Survey Tool pre-alpha)

"Doug Ewell" wrote:
> Warning: this is completely OT for the Unicode list.  Future discussion 
> should be on the LTRU list (ltru <at> ietf.org) or CLDR list 
> (cldr-users <at> unicode.org) as appropriate.

You have just replied to the Unicode list yourself (even though I was replying to you with a CC to the CLDR list...)

> "verdy underscore p" <verdy underscore p at wanadoo dot fr> wrote:
> 
> > If only we could have some access to ISO 639-5 data (for managing the 
> > language families instead of using the historic and badly designed 
> > language collections of ISO 639-1 (code [bi] only) and ISO 639-2...
> 
> I wish the ISO 639-5 Registration Authority, which is the same as that 
> for ISO 639-2 (Library of Congress), would set up an official 639-5 Web 
> site.  It has been a long time coming.

Well, still waiting (sorry, my interest in the subject is mostly personal, although I could make use of it 
professionally, but I can't afford a copy of the published paper myself; it's too expensive for me).

> I don't agree with characterizing 639-1 and 639-2 as "badly designed." 
> They were designed for different purposes.

Apparently not. Your description just indicates that 639-5 is effectively continuing the 639-2 (and 639-1, for
Bihari) model, and does not create what was expected (a comprehensive hierarchy similar to the Ethnologue). In
addition, 639-5 is now incompatible with 639-2 and 639-1, making it mostly unusable within the RFC 4645/4646
bis framework. For me, this means that 639-5 is already a dead standard before its publication, unless the
(Continue reading)

Naz Gassiep | 23 Jan 18:00

Translation of numbers revisited

After progressing a little further into this area, it seems that 
including numerals in the CLDR would be not only appropriate, but would 
significantly increase the usefulness of the repository in localization. 
What is being proposed is a simple mapping between values and their 
localized counterparts. This would only appear in locales that had local 
numeral sets that were in widespread use. An indication of the use of 
the local script numerals could even be included in the supplemental 
data allowing apps to decide whether or not to translate numbers using 
these glyphs.

Using Arabic as an example, the following XML will serve as an illustration:

Contained in ar.xml
<ldml>
<numbers>
<numerals>
<numeral type="0">٠</numeral>
<numeral type="1">١</numeral>
<numeral type="2">٢</numeral>
<numeral type="3">٣</numeral>
<numeral type="4">٤</numeral>
<numeral type="5">٥</numeral>
<numeral type="6">٦</numeral>
<numeral type="7">٧</numeral>
<numeral type="8">٨</numeral>
<numeral type="9">٩</numeral>
</numerals>
</numbers>
</ldml>
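
A minimal sketch of how an application might consume such a mapping, written in Python. The <numerals>/<numeral> elements are the format proposed in this message, not part of the published LDML specification, and the function names load_numeral_table and localize_digits are hypothetical. The sample document is generated from the code points U+0660..U+0669 so that it matches the ar.xml illustration above.

```python
import xml.etree.ElementTree as ET

# Sample document in the proposed (hypothetical) format.
# chr(0x0660 + i) yields the Arabic-Indic digit for the value i.
SAMPLE = (
    "<ldml><numbers><numerals>"
    + "".join(
        f'<numeral type="{i}">{chr(0x0660 + i)}</numeral>' for i in range(10)
    )
    + "</numerals></numbers></ldml>"
)


def load_numeral_table(xml_text):
    """Build a str.translate table from the proposed <numeral> elements."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for element in root.findall("./numbers/numerals/numeral"):
        # The "type" attribute holds the ASCII digit, the text its local glyph.
        mapping[element.get("type")] = element.text
    return str.maketrans(mapping)


def localize_digits(text, table):
    """Replace each ASCII digit in text; other characters pass through."""
    return text.translate(table)


table = load_numeral_table(SAMPLE)
localized = localize_digits("2008", table)
```

This only covers scripts whose numerals are positional and decimal, so a one-to-one digit substitution is enough.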

(Continue reading)

Voting is weird?

What I do not understand (as I am trying to trace a bug inside CLDR) is how
voting works.

I got one part of a locale here that has a score of 5 versus a score of 4 and
the score of 4 won. Now, perhaps I am really misunderstanding something, but
last time I checked 5 still beat 4. Is it due to being included in 1.4 that it
gets preference over the weightier vote? (Which is a pity since it pulled a
mistake over to 1.5.x.)

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
To conquer fear is the beginning of wisdom...

Why tabs?

Is there a reason the CLDR XML files use tabs instead of the more standard
2-space indents for XML?

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/
As one lamp serves to dispel a thousand years of darkness, so one flash of
wisdom destroys ten thousand years of ignorance...

Rick McGowan | 19 Jan 17:57

Unicode Transliteration Guidelines released

The Unicode CLDR committee has released
"Unicode Transliteration Guidelines":
	http://www.unicode.org/cldr/transliteration_guidelines.html

Regards,
	Rick McGowan
	Unicode, Inc.

Patrick Andries | 18 Jan 19:28

Timezone in CLDR and corresponding ICU4J 3.6.1 & 3.8.1


I'm not too sure where this would be best mentioned: one of the ICU lists or the CLDR list. Since the data ultimately comes from CLDR, I decided after some hesitation to send it here.  Please let me know if the queries I am posting here are inappropriate.

Out of curiosity, I ran a few tests on the DateFormat found in ICU4J 3.6.1 (built on CLDR 1.4.0) and in ICU4J 3.8.1 (built on CLDR 1.5.1, if I'm right).


First, a word of appreciation. I noticed an improvement over 3.6.1/CLDR 1.4.0, where the FULL time was printed as, for instance,

   23 h 42 min 42 s HNP (ÉUA)

using a fr_CA Locale... The "ÉUA" (USA) was really not necessary for Canadians.

In 3.8.1 this is now:

   23 h 45 min 28 s HP

Where the absence of "(ÉUA)" is appreciated.



Second, I have a small question about the FULL vs LONG format (I suppose this is rather an ICU question, but I'm not too sure).

See the way the same time is printed using FULL and LONG formats:

LONG :     17:07:12 HNP
FULL :      17 h 07 min 12 s HP

What is the logic behind the timezone name in the FULL time format being shorter than the one in the corresponding LONG format? I would intuitively have expected the opposite (see the Java documentation at http://java.sun.com/javase/6/docs/api/java/text/DateFormat.html, where LONG < FULL).




Third, when the time is given as GMT ± an offset, I was a bit surprised to read "HMG" in French.

At least in Canada, this is not at all common; TUC (or UTC, supposedly an "international" form) is used instead...

http://inms-ienm.nrc-cnrc.gc.ca/time_services/leap_second_f.html  (government link).


«Si la vitesse et le temps sont coordonnés à l'aide de comparaisons internationales organisées sous l'égide la Convention du mètre, on obtient le **TUC** ou Temps universel coordonné qui est l'application moderne du GMT et constitue la base de temps officielle dans le monde.»

http://www.nrc-cnrc.gc.ca/aboutUs/nrc90/achievements/atomicclock_f.html

«Le TUC a remplacé le temps universel en 1972 pour devenir le fondement du temps officiel dans chaque pays. Les fuseaux horaires qui divisent la planète sont désormais exprimés en écart positif ou négatif par rapport au TUC. Ainsi, l'Heure normale de l'Est correspond au TUC moins cinq heures. On l'écrit donc **TUC-5**.»

Maybe someone has more information on the use of HMG in French versus TUC or UTC...

Patrick l

Mark Davis | 18 Jan 18:41

Re: Translation of numbers

Right, thanks Dave.

On Jan 18, 2008 9:08 AM, Dave Opstad <dave.opstad <at> monotypeimaging.com> wrote:
> Mark Davis wrote:
>
> > George Iftah has a great book on numbering systems. Some libraries,
> > like ICU, also offer some degree of non-decimal number support.
>
> A minor correction: it's Georges Ifrah, not George Iftah.
>
> Dave Opstad
>
>

-- 
Mark

Naz Gassiep | 18 Jan 17:32

Translation of numbers

Is there a CLDR/Unicode specified method for the translation of numbers? 
For example, to translate a number from the Latin numerals to Arabic or 
Thai would be trivial, as both numbering systems are syntactically 
identical to the Latin numeral system, just with different glyphs 0-9.

However translating a Hebrew number would be more of a challenge, as the 
system, while decimal, uses a different numeral set, with 22 (or 27 if 
you consider the extended glyphs) numerals rather than 10.

If there is no Unicode method for this I will have to write my own, but 
I thought I'd check if such a standard existed first. Does the CLDR 
specify at least when it is OK to do a simple glyph substitution? Any 
direction in this area would be greatly appreciated.
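
To illustrate why Hebrew is not a simple glyph substitution, here is a rough Python sketch of the traditional additive letter-numeral scheme for values 1 to 999. It is not a CLDR or Unicode specified method; the function name to_hebrew_numeral is hypothetical, only the 22 basic letters are used (no final forms), and the geresh/gershayim punctuation normally added to such numerals is omitted. Letters are written as code point escapes to avoid bidi display problems in mail clients.

```python
# (value, letter) pairs for the 22 basic Hebrew letters, largest first.
HEBREW_VALUES = [
    (400, "\u05EA"), (300, "\u05E9"), (200, "\u05E8"), (100, "\u05E7"),
    (90, "\u05E6"), (80, "\u05E4"), (70, "\u05E2"), (60, "\u05E1"),
    (50, "\u05E0"), (40, "\u05DE"), (30, "\u05DC"), (20, "\u05DB"),
    (10, "\u05D9"), (9, "\u05D8"), (8, "\u05D7"), (7, "\u05D6"),
    (6, "\u05D5"), (5, "\u05D4"), (4, "\u05D3"), (3, "\u05D2"),
    (2, "\u05D1"), (1, "\u05D0"),
]


def to_hebrew_numeral(n):
    """Return the additive Hebrew letter numeral for 1 <= n <= 999."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles 1..999")
    letters = []
    for value, letter in HEBREW_VALUES:
        if n in (15, 16):
            # By convention 15 and 16 are written 9+6 and 9+7, not 10+5 and 10+6.
            letters.append("\u05D8" + ("\u05D5" if n == 15 else "\u05D6"))
            n = 0
            break
        while n >= value:
            # Additive system: repeat or combine letters until the value is used up.
            letters.append(letter)
            n -= value
    return "".join(letters)


numeral_24 = to_hebrew_numeral(24)    # kaf (20) + dalet (4)
numeral_115 = to_hebrew_numeral(115)  # qof (100) + tet (9) + vav (6)
```

Unlike the Arabic or Thai case, the output length does not match the number of decimal digits, so a per-digit lookup table cannot express it.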

Best regards,
- Naz.

Naz Gassiep | 18 Jan 09:20

Checking

Hello,
    Please let me know if the queries I am posting here are 
inappropriate. I am unable to find another list to post these questions 
to, but if they are not welcome here I shall refrain from sending them here.

I currently have an algorithm for determining valid locales and their 
search paths that works like this:

   1. Get the list of files, except for root.xml
   2. Discard files where //ldml/alias returns a value (alias files are
      unnecessary)
   3. Split the filename on underscores and collect every shorter prefix
      (sr_Latn_RS yields sr_Latn and sr).
   4. Add root.xml to the search path

This results, for example, in paths like this:

Serbian (Latin script, in Serbia):
    sr_Latn_RS.xml
    sr_Latn.xml
    sr.xml
    root.xml

Chinese (Traditional script, Hong Kong):
    zh_Hant_HK.xml
    zh_Hant.xml
    zh.xml
    root.xml

Is this procedure sufficient for handling search paths? I have not yet 
dealt with multiple inheritance of resources, but I think that will best 
be handled after the file search path has been built. Is this correct or 
am I doing this wrong?
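
For reference, the four steps above can be sketched in Python roughly as follows. The function names is_alias_file and search_path and the optional "available" parameter are my own illustration, not anything defined by CLDR, and the sketch deliberately ignores multiple inheritance of resources.

```python
import xml.etree.ElementTree as ET


def is_alias_file(xml_text):
    """Step 2: true when <ldml> has a direct <alias> child (//ldml/alias)."""
    return ET.fromstring(xml_text).find("alias") is not None


def search_path(locale_id, available=None):
    """Steps 3 and 4: longest filename first, root.xml always last.

    locale_id is the filename without ".xml", e.g. "sr_Latn_RS".
    available, if given, is a set of existing non-alias filenames (steps 1-2);
    candidates missing from it are skipped.
    """
    parts = locale_id.split("_")
    chain = ["_".join(parts[:n]) + ".xml" for n in range(len(parts), 0, -1)]
    if available is not None:
        chain = [name for name in chain if name in available]
    chain.append("root.xml")
    return chain


path = search_path("sr_Latn_RS")
```

Here path holds sr_Latn_RS.xml, sr_Latn.xml, sr.xml and root.xml, in that order, matching the Serbian example.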
Regards,
- Naz.

Rick McGowan | 18 Jan 02:40

Unicode 5.1.0 beta period ends soon!

The Unicode Consortium would like to remind everyone of the deadline for  
review of the content and data for the pending release of Unicode 5.1.

The Unicode Technical Committee meeting on February 4-8, 2008 will be  
making the final decisions on the content of the release, based on public  
review feedback and member submissions. Over the past months, there have  
been a number of changes to the text of 5.1.0, the text of the Unicode  
5.1.0 Standard Annexes, and the UCD data files to reflect decisions of the  
previous UTC meetings. These cover such areas as Bidi, Line breaking,  
Normalization, Segmentation, Identifiers, and others.

For a description of what to review and how to provide feedback, see:
http://www.unicode.org/versions/beta.html

There have been a number of changes on that page, in the section "Notable  
Issues for Beta Testers", that should help focus your review on important  
changes.

Regards,
	Rick McGowan
	Unicode, Inc.

Naz Gassiep | 15 Jan 12:14

Data set relevance

I am a relatively new user of the CLDR datasets.

I was wondering if someone can explain to me how to determine which 
datafiles are relevant to a given locale selection. For example, if I 
would like to select Chinese, in Hong Kong for the Traditional Han 
script, which files would I use? Do I only use zh_Hant_HK.xml or do I 
have to merge in zh.xml with that file?

Also, if a language is written in multiple scripts, then surely each 
script/language combination constitutes a whole locale, distinct from 
other scripts. I will use Serbian as an example here:

If a user wants to get the locale relating to Serbian in Montenegro 
using the Latin script, how do I know if the file sr_ME has any 
relevance? How do I know if the data in it pertains to the Latin or 
Cyrillic scripts? Can I assume that data in the sr_ME is relevant to 
*both* script variants of Serbian?

Some pointers on this would be great.
Thanks in advance,
- Naz.

