John Cowan | 1 Jan 2005 10:07

Re: Language attributes- what are they?

Tex Texin scripsit:

> I can take your mail two ways. 
> a) sorted lists should be dynamically changed to the user's preference.

Yes, when practicable.

> b) Documents written in a language are presumably targeted at readers of that
> language, and therefore should be sorted for the intended audience (as opposed
> to the actual preference of that individual). And this is true even if the list
> contents are in another language.

The rest of your email reads to me as if you thought I was arguing *for* using
language-tagging for sorting, but I was actually arguing against that position.
Sort order should be inferred from the locale specifier (which is per-user)
rather than the language tag (which is per-document, or per-section of document).

-- 
John Cowan      jcowan <at> reutershealth.com        http://www.reutershealth.com
        "Not to know The Smiths is not to know K.X.U."  --K.X.U.
John Cowan | 1 Jan 2005 10:14

Re: Language attributes- what are they?

Peter Constable scripsit:

> Where the boundary on "writing system" is uncertain for me would be
> with things like conventions for hyphenation or use of quotation
> marks. But something like date formats are IMO well outside the
> boundary. I sometimes write "12/31" and sometimes "31/12", but I'd
> say I don't ever change my writing system. I think a "writing system"
> is something that is generally very stable for a given individual,
> and even speaker communities (except when undergoing a transition).

But do you, qua Canadian in the U.S., consistently use either Canadian or
U.S. spelling, or do you occasionally use one and occasionally the other?
It seems to me that writing 12/31 and using Canadian spellings, or 31/12
and using U.S. spellings, is simply inconsistent writing, and that the
date format is as much part of the writing system as whether you write
"tire center", "tire centre", or "tyre centre".

> And are you really going to run a parser on the stuff I enter into
> a document manually?

It happens all the time.  After all, all data is entered manually at some stage.

> I doubt that's a common scenario. On the other
> hand, a very common scenario would be that you're requesting data from
> my server that you intend to parse, and you either need date strings
> to be in a particular format or you want to be told what the format
> is. That's an API: your process interacting with my process.

"want to be told what the format is":  that's a case for per-document tagging,
aka language-tagging.  If I request a document in either Welsh or English,

John Cowan | 1 Jan 2005 10:18

Re: Language attributes- what are they?

Peter Constable scripsit:

> That certainly indicates that a sort ID, in principle, need not have any
> intrinsic relationship to the language or spelling conventions of the
> content, although the writing systems must have some similarity in their
> character inventories.

Not absolutely necessary.  There might, for example, exist a conventional
English-language rule for sorting Cyrillic text (say, sort it according
to Russian alphabetical order with various extensions to handle non-Russian
letters).  But I agree that similar writing systems are a clearer and better
case.

> But the other part of "you need to sort them according to the reader's
> expectation" is that it involves control over a display process and thus
> implies an API rather than simply declaring attributes of static
> content, which in my mind is a significant distinction in this matter.

Yes: in short, we agree: sort order is not suitable for language-tagging
except in the exceptional case where you want to mark that a static document is
*already* sorted according to some order.

-- 
Business before pleasure, if not too bloomering long before.
        --Nicholas van Rijn
                John Cowan <jcowan <at> reutershealth.com>
                        http://www.ccil.org/~cowan  http://www.reutershealth.com
Tex Texin | 1 Jan 2005 04:33

Re: Language attributes- what are they?

Hi, 

I agree that a user's locale identifier specifies sort order for how
information is presented to a user.

However, I also believe that a language tag implies the sort order used within
the content it represents.
The argument for this is essentially the same as you presented for language tag
specifying ordering of date elements.

A language tag labeling a document or more generally text content should imply
the language and all language attributes that the AUTHOR uses to create the
content.

A locale identifier (as we are using it at the moment) should represent the
international preferences of the USER (i.e., the recipient) and determines how
the application's interface presents information to the user.

There is a gray area in between when the application assembles or constructs
content to present to the user.

So, a dynamically generated list of results might be ordered by the locale. A
list generated personally by someone for publication would follow that author's
language. Analogously for date element ordering.

I would like to avoid if possible discussion of locales, and focus on what the
language identifier entails.
I think that implies not discussing user interface, and requires some
presumption of pre-built content and the choice of language tag to be used to
label that content.

Tex Texin | 1 Jan 2005 04:37

Re: Language attributes- what are they?


John Cowan wrote:
> Yes: in short, we agree: sort order is not suitable for language-tagging
> except in the exceptional case where you want to mark that a static document is
> *already* sorted according to some order.

Yes, I agree with this, although I don't think it is that exceptional: it is
rather common for content to contain sorted lists (references and such).

I was trying to bear down on the point that the language tag does specify the
sort order used within the document.

tex
John Cowan | 1 Jan 2005 11:19

Re: Language attributes- what are they?

Tex Texin scripsit:

> Wouldn't it be surprising for a non-Swedish sort order to be used with
> content that was labeled as Swedish?  (Regardless of who the content
> is given to...)

I agree with all of your points except this one.  The difference
between (normalized) Old Norse text and Modern Icelandic text is
primarily one of spelling: there's a nice example at the bottom of
http://en.wikipedia.org/wiki/Old_Norse_language, the center and right
columns.  Nonetheless, I'd expect the sorted index of a modern printing of
an Old Norse text to use Icelandic order if it's intended for Icelanders
(thorn after z), and English order if it's intended for anglophones
(thorn after t).
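
A minimal sketch of the point above: the same Old Norse word list comes out
in a different order depending on the audience's collation, not on the
language of the words. The two alphabets are deliberately simplified
assumptions (plain a-z plus thorn), not real Icelandic or English collation
tables, and the names make_key, ICELANDIC_LIKE and ENGLISH_LIKE are
hypothetical.

```python
BASE = "abcdefghijklmnopqrstuvwxyz"

# Assumption: simplified alphabets that differ only in where thorn sorts.
ICELANDIC_LIKE = BASE + "þ"               # thorn after z
ENGLISH_LIKE = BASE.replace("t", "tþ")    # thorn immediately after t


def make_key(alphabet):
    """Return a sort key that ranks letters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        # Letters missing from the alphabet sort after all known letters.
        return [rank.get(ch, len(alphabet)) for ch in word.lower()]

    return key


# The index entries are the same Old Norse words in both editions.
words = ["una", "þing", "tal", "vargr"]

for_icelanders = sorted(words, key=make_key(ICELANDIC_LIKE))
for_anglophones = sorted(words, key=make_key(ENGLISH_LIKE))

print(for_icelanders)   # ['tal', 'una', 'vargr', 'þing']
print(for_anglophones)  # ['tal', 'þing', 'una', 'vargr']
```

The content's language tag would be the same in both editions; only the
collation chosen for the intended reader changes.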

-- 
The experiences of the past show                John Cowan
that there has always been a discrepancy        jcowan <at> reutershealth.com
between plans and performance.                  http://www.reutershealth.com
        --Emperor Hirohito, August 1945         http://www.ccil.org/~cowan
Tex Texin | 1 Jan 2005 08:23

Re: Language attributes- what are they?

oy vey.
It's a good point.
I agree with you about the sorting, but I guess in that case we disagree on the
language tag.
It seems to me the document is an Icelandic or English document which contains
some Old Norse text. Alternatively, we can tag the Norse text as Old Norse
separately from the sorted index tagged as Icelandic.

If an author writes for an audience, the content is presumably in the language
of the audience, even if there are elements which are in another language.

If we were talking HTML, I might tag each list element to reflect the language
of each item (Old Norse), but the overall list would be tagged to provide the
right ordering, numbering, etc. (Icelandic).

<ol lang=[language of the author or audience]>
<li lang=[language of the list element]>...
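
As a rough illustration of the two-level tagging sketched above, the
following reads the list-level and item-level lang attributes separately
using Python's standard html.parser. The sample markup and the LangCollector
class are hypothetical; "is" (Icelandic) and "non" (Old Norse) are used only
as example tag values.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Record the lang of the enclosing <ol> and of each <li> separately."""

    def __init__(self):
        super().__init__()
        self.list_lang = None   # would drive ordering and numbering
        self.item_langs = []    # would drive rendering of each item

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if tag == "ol":
            self.list_lang = lang
        elif tag == "li":
            self.item_langs.append(lang)


# Hypothetical document: an Icelandic-ordered list of Old Norse items.
doc = '<ol lang="is"><li lang="non">tal</li><li lang="non">þing</li></ol>'

collector = LangCollector()
collector.feed(doc)

print(collector.list_lang)   # is
print(collector.item_langs)  # ['non', 'non']
```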

Tex

John Cowan wrote:
> 
> Tex Texin scripsit:
> 
> > Wouldn't it be surprising for a non-Swedish sort order to be used with
> > content that was labeled as Swedish?  (Regardless of who the content
> > is given to...)
> 
> I agree with all of your points except this one.  The difference
> between (normalized) Old Norse text and Modern Icelandic text is

John Cowan | 1 Jan 2005 18:02

Re: Language attributes- what are they?

Tex Texin scripsit:

> It seems to me the document is an Icelandic or English document which contains
> some Old Norse text. Alternatively, we can tag the Norse text as Old Norse
> separately from the sorted index tagged as Icelandic.

How could the anglophone-directed version be an English document?
It doesn't contain a word of English!  Just the ON text and its sorted
index (or maybe concordance is a better term).

> If an author writes for an audience, the content is presumably in the language
> of the audience, even if there are elements which are in another language.

Not if *all* of it is in another language.  If I prepare an edition of Plato
in Greek, then it's in Greek, even if I intend it for my anglophone students.
(Hypothetical example.)

-- 
John Cowan <jcowan <at> reutershealth.com>     http://www.reutershealth.com
I amar prestar aen, han mathon ne nen,    http://www.ccil.org/~cowan
han mathon ne chae, a han noston ne 'wilith.  --Galadriel, LOTR:FOTR
Tex Texin | 1 Jan 2005 11:43

Re: Language attributes- what are they?

John,

Well, I admit that the solution of calling the English-sorted Old Norse text an
English document is bizarre and unappealing, but let's also admit the example
is a bit odd. What does it mean to sort Old Norse by English rules when some
of the characters are not used in English? It's even a stretch for Swedish...

But if you don't like the bizarre, the other alternative I proposed is more
reasonable: move to tagging with greater granularity.

Just to be clear, I am not dismissing the example. I can imagine other cases
that might be more common.
I have in mind listing Japanese ideographs for a Chinese audience, requiring
Japanese language tags to ensure the right fonts are used as the text is
rendered, and sorting by Chinese pronunciations of the characters (or some
such).

But regardless of the details, the issue is (it seems to me) if a document is
tagged as some language, and sorting within the content is performed in a way
that does not correspond to the language of the document, then would that
surprise readers of that language (as opposed to other languages)? I think the
answer is yes and that sorting is an attribute of language.
(I am finding it hard even to write about sorting text without using
language-based names for the collation or referencing language in some way.)

And if the audience needs a different collation from (one of the ones
associated with) the document language, then it is because their primary
language is different from that of the document, and doesn't undermine the
argument that sorting is an aspect of language.


Bruce Lilly | 1 Jan 2005 16:57

Re: draft-phillips-langtags-08, process, specifications, "stability", and extensions

>  Date: 2004-12-30 16:02
>  From: Tex Texin <tex <at> xencraft.com>
>  To: 
>  CC: ietf-languages <at> alvestrand.no, ietf <at> ietf.org
>  
> As the number of question marks, exclamation marks, asterisks and other forms
> of expressing digital shock and awe increase with each mail, I would like to
> suggest a temporary cease and desist policy with respect to responding to JFC
> until the chair chimes in, presumably after the holidays.

What "chair"? There is no WG associated with the ietf-languages
list, nor with the individual-effort draft under discussion.
There is of course an IETF/IESG chair, and he has in the past
(most recently about six months ago) participated in the ietf-
languages discussion of a predecessor of the draft under
discussion, but I see no reason to suggest an end to discussion
of issues arising out of a NEW Last Call for comments on that
draft.  I would agree that some of the rhetoric could be toned
down...
