Stephane Bortzmeyer | 3 Jan 2008 13:50

Happy International Year of Languages

Happy New Year to everyone!

http://portal.unesco.org/culture/en/ev.php-URL_ID=35344&URL_DO=DO_TOPIC&URL_SECTION=201.html

 2008, International Year of Languages
 Languages matter !

 On 16 May 2007, the United Nations General Assembly proclaimed 2008 to
 be the International Year of Languages. As language issues are central
 to UNESCO's mandate in education, science, social and human
 sciences, culture, and communication and information, the Organization
 has been named the lead agency for this event.

I owe to Don Osborn a reference to that excellent paper on languages
and what we can do for the IYL:

http://www.crystalreference.com/DC_articles/Langdeath20.pdf

What can we do ourselves for the IYL? Finishing 4646bis? 

Karen_Broome | 3 Jan 2008 20:33

Re: Addition request: alsatian


I have no objection to this tag. However, it does seem to create the possibility (mentioned before) that the same dialect could be identified as gsw-FR (assuming there are no other gsw dialects in France).  

Is it simply up to the user to decide whether to use regional or variant tagging? Or should some guidelines be written to indicate a preference for variant tagging over regional tagging if both exist?
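
To make the two competing forms concrete, here is a minimal sketch that classifies subtags purely by their shape, following the RFC 4646 syntax rules (4 letters = script, 2 letters or 3 digits = region, 5 to 8 alphanumerics or a digit plus 3 alphanumerics = variant). The function name `split_tag` is hypothetical, and the sketch deliberately ignores extlang, extension, private-use and grandfathered tags. The "alsatian" subtag is only the one requested in this thread, not a registered one.

```python
import re

def split_tag(tag):
    """Classify the subtags of a simple language tag by shape alone.

    Simplified on purpose: no extlang, extensions, private use or
    grandfathered tags, and no lookup against the registry.
    """
    subtags = tag.split("-")
    parts = {"language": subtags[0].lower(), "script": None,
             "region": None, "variants": []}
    for sub in subtags[1:]:
        if re.fullmatch(r"[A-Za-z]{4}", sub):
            parts["script"] = sub.title()
        elif re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", sub):
            parts["region"] = sub.upper()
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", sub):
            parts["variants"].append(sub.lower())
        else:
            raise ValueError(f"unsupported subtag in this sketch: {sub!r}")
    return parts

# The same dialect, tagged two different ways:
print(split_tag("gsw-FR"))        # region set, no variant
print(split_tag("gsw-alsatian"))  # variant set, no region
```

Nothing in the syntax relates the two results to each other, which is why the question of a stated preference comes up at all.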

Regards,

Karen Broome




Stephane Bortzmeyer <bortzmeyer <at> nic.fr>
Sent by: ietf-languages-bounces <at> alvestrand.no

01/03/2008 02:24 AM

To
ietf-languages <at> iana.org
cc
Subject
Re: Addition request: alsatian





Request in the body, for the MIME-impaired.

LANGUAGE SUBTAG REGISTRATION FORM

1. Name of requester: St&#xE9;phane Bortzmeyer

2. E-mail address of requester: bortzmeyer+langtag <at> nic.fr

3. Record Requested:

  Type: variant
  Subtag: alsatian
  Description: Alsatian variant of Alemannic
  Description: Els&#xE4;ssisch
  Prefix: gsw
  Comments:

4. Intended meaning of the subtag: There is a distinct variety of
Alemannic spoken in Alsace. It is distinct from the language spoken in
Germany and Switzerland partly for political reasons, because Alsace
has been a French province for a long time.

5. Reference to published description
of the language (book or article):

   * (fr) "L'alsacien, deuxième langue régionale de France" Insee,
   Chiffres pour l'Alsace no. 12, December 2002
   http://www.insee.fr/fr/insee_regions/alsace/rfc/docs/cpar12_1.pdf

   * (fr) Brunner, Jean-Jacques. L'alsacien sans peine. ASSiMiL,
     2001. ISBN 2-7005-0222-1

   * (fr) Laugel-Erny, Elsa. Cours d'alsacien. Les Editions du Quai,
     1999. ISBN 978-2903548018

   * (fr) Matzen, Raymond, and Léon Daul. Wie Geht's ? Le dialecte à
     la portée de tous La Nuée Bleue, 1999. ISBN 2-7165-0464-4

   * (fr) Matzen, Raymond, and Léon Daul. Wie Steht's ? Lexiques
     alsacien et français, Variantes dialectales, Grammaire La Nuée
     Bleue, 2000. ISBN 2-7165-0525-X

   * (de) Frédéric Hartweg: Die Sprachen im Elsass: Kalter Krieg oder
     versöhntes Miteinander?. In: Ingo Kolboom und Bernd Rill
     (Hrsg.): Frankophonie -- nationale und internationale
     Dimensionen. Argumente und Materialien zum Zeitgeschehen 35,
     München: Hanns Seidel Stiftung, ISBN
     3-88795-249-9. http://www.hss.de/downloads/argumente_materialien_35.pdf

   * (de) Hubert Klausmann, Konrad Kunze und Renate Schrambke (1994):
     Kleiner Dialektatlas - Alemannisch und Schwäbisch in
     Baden-Württemberg. Veröff. Alem. Inst. Frbg. Themen der
     Landeskunde 6, Bühl (Baden): Konkordia, 1994.

   * (de) Friedrich Maurer: Neue Forschungen zur südwestdeutschen
     Sprachgeschichte. In: Sprachgeographie Beih. Wirkendes Wort. 21,
     S. 119-163, Düsseldorf: Schwann, 1972.

6. Any other relevant information: Do note there exist several
"sub-dialects" (specifically between North and South of Alsace) but I
do not know if there is a comprehensive list of them yet. Do note also
that some Alsatian local dialects are *not* variants of Alemannic at
all but Franconian or even Romance languages. Alsatian is still in
common use in Alsace, spoken and written. There is a localization of
Microsoft Word
(http://www.faz.net/s/Rub4C34FD0B1A7E46B88B0653D6358499FF/Doc~E7E48128AB8C348E1BCEB1EAF2D4105EA~ATpl~Ecommon~Scontent.html)
but I do not know if they use proper language tags.
_______________________________________________
Ietf-languages mailing list
Ietf-languages <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages


Stephane Bortzmeyer | 4 Jan 2008 11:29

Description on tags, not just subtags?

A registration request of a variant recently triggered a discussion on
whether we should be able to add Description fields to tags, not just
subtags (ignoring the specific issue of grandfathered tags).

For instance, if we use "alsatian" to refer to the variant of Alemannic
spoken in Alsace (France), we cannot put a:

  Description: Alsatian 
  Description: Els&#xE4;ssisch

to the tag gsw-FR.

Is it a real problem? Should we address it? How?

Markus Scherer <markus.icu <at> gmail.com> wrote:

The Unicode Consortium encountered the same sort of problem for
"encode this 'character' as that sequence of Unicode code points". A
partial solution was to register names for some of those recommended
sequences. Adding registration for select language tags, not just
subtags, might help here as well.

Doug Ewell | 4 Jan 2008 17:19

Re: Description on tags, not just subtags?

Stephane Bortzmeyer <bortzmeyer at nic dot fr> wrote:

> A registration request of a variant recently triggered a discussion on 
> whether we should be able to add Description fields to tags, not just 
> subtags (ignoring the specific issue of grandfathered tags).
>
> For instance, if we use "alsatian" to refer to the variant of 
> Alemannic spoken in Alsace (France), we cannot put a:
>
>   Description: Alsatian
>   Description: Els&#xE4;ssisch
>
> to the tag gsw-FR.
>
> Is it a real problem? Should we address it? How?

It might be a real problem, if the dialect or other language variant is 
commonly known by a distinct name (not just, say, "Australian English") 
and if tag users are unlikely to associate the dialect with the name of 
the base language.

In 
http://www.alvestrand.no/pipermail/ietf-languages/2008-January/007279.html I 
proposed using the Comments field for this purpose, to avoid introducing 
a new mechanism that would probably be used by only a handful of 
(cherry-picked) variations:

Type: language
Subtag: gsw
Description: Swiss German
Description: Alemannic
Added: 2006-03-08
Suppress-Script: Latn
Comment: gsw-FR represents the Alsatian dialect
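
A record like the one above is just a list of `Name: value` fields, some of which repeat. As a minimal sketch of how a tool might read it, assuming one field per line and no folded continuation lines (the real registry does fold long lines), the hypothetical helper `parse_record` below collects every field into a list of values. The record text is copied from this message as written, including the proposed comment, and is not the registry's actual entry.

```python
# Record text as quoted in this message (a proposal, not the live registry).
RECORD = """\
Type: language
Subtag: gsw
Description: Swiss German
Description: Alemannic
Added: 2006-03-08
Suppress-Script: Latn
Comment: gsw-FR represents the Alsatian dialect
"""

def parse_record(text):
    """Parse one record into {field name: [values]}.

    Assumes one field per line; folded continuation lines are not handled.
    """
    record = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        record.setdefault(name.strip(), []).append(value.strip())
    return record

record = parse_record(RECORD)
print(record["Description"])  # both Description values, in order
print(record["Comment"])      # free text that software cannot interpret
```

The comment comes back as an opaque string, so any tag named inside it is visible to a human reader but not to a program working from the parsed fields.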

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

Addison Phillips | 4 Jan 2008 17:51

Re: Re: Description on tags, not just subtags?

> In 
> http://www.alvestrand.no/pipermail/ietf-languages/2008-January/007279.html 
> I proposed using the Comments field for this purpose, to avoid 
> introducing a new mechanism that would probably be used by only a 
> handful of (cherry-picked) variations:
> 
> Type: language
> Subtag: gsw
> Description: Swiss German
> Description: Alemannic
> Added: 2006-03-08
> Suppress-Script: Latn
> Comment: gsw-FR represents the Alsatian dialect
> 

I dislike this idea. We already have experience with it and its 
"unintended consequences". See: sign language tags.

For that matter, we have ample experience with region subtags being 
imbued with artificial meaning. For a long time, "zh-TW" 'meant' 
Traditional Chinese, for example. Part of the reason behind this whole 
effort is to reduce the need to attach artificial meaning to subtags.

This isn't to say that the tag "gsw-FR" doesn't encompass or "mean" the 
Alsatian dialect of Alemannic. It certainly encompasses Alsatian. But to 
start packing the registry with these sorts of comments (and the 
registration process with requests for same) strikes me as 
counter-productive to the point that I'd almost favor adding a MUST NOT 
to draft-4646bis.

Guidance on tag formation should, incidentally and in my opinion, be in 
the form of external documentation (such as on Langtag.net, in CLDR, or 
via the W3C articles on same, just to name a few) or in an additional 
informational document (possibly an Informational RFC). Burying tag 
choice information like this in the registry seems complicated and 
haphazard.

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.

Randy Presuhn | 5 Jan 2008 01:29

Re: Re: Description on tags, not just subtags?

Hi -

As a technical contributor...

> From: "Addison Phillips" <addison <at> yahoo-inc.com>
> To: "Doug Ewell" <dewell <at> roadrunner.com>
> Cc: "LTRU Working Group" <ltru <at> ietf.org>
> Sent: Friday, January 04, 2008 8:51 AM
> Subject: Re: [Ltru] Re: Description on tags, not just subtags?
...
> > Comment: gsw-FR represents the Alsatian dialect
> > 
> 
> I dislike this idea. We already have experience with it and its 
> "unintended consequences". See: sign language tags.
> 
> For that matter, we have ample experience with region subtags being 
> imbued with artificial meaning. For a long time, "zh-TW" 'meant' 
> Traditional Chinese, for example. Part of the reason behind this whole 
> effort is to reduce the need to attach artificial meaning to subtags.
> 
> This isn't to say that the tag "gsw-FR" doesn't encompass or "mean" the 
> Alsatian dialect of Alemannic. It certainly encompasses Alsatian. But to 
> start packing the registry with these sorts of comments (and the 
> registration process with requests for same) strikes me as 
> counter-productive to the point that I'd almost favor adding a MUST NOT 
> to draft-4646bis.
> 
> Guidance on tag formation should, incidentally and in my opinion, be in 
> the form of external documentation (such as on Langtag.net, in CLDR, or 
> via the W3C articles on same, just to name a few) or in an additional 
> informational document (possibly an Informational RFC). Burying tag 
> choice information like this in the registry seems complicated and 
> haphazard.
...

There are multiple questions here.

    1) Do we need to be explicit about the conditions under which a
       registration request would be considered to refer to something
       which can be clearly expressed with existing subtags? 
       (That is, should a request for "alsatian" be denied because it
       would mean precisely the same thing as "gsw-FR"? (I leave it to
       experts in this language to determine whether that is indeed
       the case.))

    2) There may be cases where the correct tag for a particular
       language may not be immediately obvious from the data in
       the registry.  I can see several possible responses:
           a) attempt to add the information to the registry
           b) (as a WG) take on the effort of defining a repository
              for this information
           c) leave the documentation of such cases to others

    3) Do we need to provide additional clarification of precisely
       what is meant by the use of a region subtag?  The concept
       seems to be relatively fuzzy, encompassing both dialect and
       orthography.

My own answers:
    (1) I think some additional clarification might be helpful, as
        the Alsatian request shows.
    (2) I'm inclined to go along with (a), but only so far as to
        recognize that this would not be systematic or comprehensive
        by any stretch of the imagination.  As soon as we had more
        than a handful of "Alsatian" cases, my preference would shift
        to (c).
    (3) I think the fuzziness is a feature (meaning a bug we can't
        fix).  Getting rid of the fuzziness would require a re-architecting
        of language tagging in general, but this shouldn't come as
        a surprise to anyone who has waded through the discussions
        on scripts and modalities and orthographies and historical
        forms over the years.  Consequently, I do not think we need
        additional clarification on this point.

Randy

Nicholas Shanks | 7 Jan 2008 15:21

Re: Progressing beyond borders—making subtags inclusive

Note: I haven't yet read most of the other replies to my email,
sorry if I overlap with what others have said. Also moved email to new
list per request from Mister Presuhn.

On 4 Jan 2008, at 19:59, John Cowan wrote:

> Nicholas Shanks scripsit:
>
>> the guy that submitted the scouse request has a vested
>> interest in his own dialect,
>
> Actually not:  I speak pure old classical East Coast American.

Oh. I was pretty sure in my mind that the person who registered it  
(whose name I could not recall) had referenced a dictionary that he  
had authored himself, partly it had stuck in my mind, and this book  
was published in the city of Liverpool, so I assumed he was from that  
area too. I visited a website referred to by the registration and it  
was all about why his accent was better than everyone else's and  
seemed generally derogatory to non-scousers. It offended me. Maybe my  
memory is playing tricks on me, in which case I apologise unreservedly.

>> en-US represents a cluster of dialects and accents, with a unified
>> orthography, and en-GB represents a cluster of accents and dialects
>> (some overlapping with en-US), but a different orthography. Thus
>> en-GB/US is pretty useless to people who are tagging audio data,
>> but quite useful to those tagging written data.
>
> Although of course there are many individual isoglosses that cross the
> Atlantic, I think you greatly overstate the case.

My point was not that there are accents similar on both sides, but
more that there is no unified "en-US" accent, but there is a single
en-US orthography. Likewise for -GB. Thus the country codes aren't a
precise way of identifying an accent, but are a precise way of
disambiguating dates and the meaning of words, especially potentially
upsetting ones like 'fanny'.

> I don't think there are any U.K. varieties that even a half-trained  
> person could mistake for
> American ones, and certainly not vice versa.

I agree

>> I believe that having a subtag registered is at present too difficult
>> (requirement for dictionaries!? what if it's mostly just an accent
>> with only phonemic changes relative to surrounding accents).
>
> Dictionaries are not required; they are just an example of the kind
> of documentation that's acceptable.  We need to be sure, when we are
> registering a tag, that it is not substantially identical with some
> existing tag, that's all, so there must exist documentation of the
> variety being tagged.

This sounds like an "en-scots" vs. "sco" kind of debate.

>> As an example, things like the supposedly "British English" speech
>> synthesizer voices on my computer (which the OS processes using the
>> tag "en_GB" from the voice's property list) sound nothing like most  
>> of
>> the accents of the United Kingdom,
>
> I doubt the en-US accent is much like the vast majority of U.S.  
> speakers
> nowadays either.

precisely :)

>> they would be better marked as "en-received" or similar.
>
> Register it, then.

Okay. All in favo(u)r say "Aye". ;-)

> If the document was tagged "en-US-dixie", then a synth with an "en-US"
> voice (probably so-called General American) would at least get it
> approximately right (assuming, that is, that it doesn't speak  
> Hawkingese,
> which sounds American to the English and pseudo-Swedish to the  
> Americans).

"Dixie" — I forgot what it was called!
Didn't quite get whether you were agreeing or disagreeing with me here.
What I was thinking was to allow content written with one orthography
to be read by a voice designed for a different one. Some conversion could
happen, e.g. if an American voice was reading "aluminium" it could drop
the penultimate syllable and move the stress, instead of trying to
pronounce it with a Gen. Am. accent.
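
The fallback John describes (content tagged "en-US-dixie" being read by an "en-US" voice) can be sketched as a truncating lookup in the style of RFC 4647: drop subtags from the right until something available matches. The function name `lookup` and the list of installed voices are hypothetical, and "dixie" is only the example subtag used in this thread, not a registered variant.

```python
def lookup(tag, available, default=None):
    """Return the best available tag for `tag` by progressive truncation.

    Matching is case-insensitive. Subtags are removed from the right one
    at a time; a trailing single-character subtag is removed as well, as
    in the RFC 4647 lookup scheme.
    """
    by_lower = {candidate.lower(): candidate for candidate in available}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default

voices = ["en-US", "en-GB"]           # hypothetical installed voices
print(lookup("en-US-dixie", voices))  # falls back to the en-US voice
print(lookup("en-scouse", voices))    # nothing left to fall back to
```

The second call shows the cost of omitting the region: a variant hung directly off "en" truncates to plain "en", which matches neither voice here.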

>> I'm sure we can all agree on commonly recognised dialects for  
>> English,
>
> I wish I shared your certitude.

Which ones would you suggest as being problematic? I suggested
en-scots above, though I can't think of any more.
I would have thought looking through the lexicon of a language and  
seeing which accents/dialects/patois already have their own names  
would be a good start. Am I wrong?

- Nicholas.
Nicholas Shanks | 7 Jan 2008 16:02

Re: Progressing beyond borders-making subtags inclusive

On 4 Jan 2008, at 20:28, Randy Presuhn wrote:

>> I say this because en-US represents a cluster of
>> dialects and accents, with a unified orthography, and en-GB  
>> represents
>> a cluster of accents and dialects (some overlapping with en-US),
>
> Could you give an example of such an overlap?  The divergence in
> pronunciation was already marked in the 1700s.

I have personally noticed a lot of convergence occurring here in
the UK, with the large quantity of US sitcoms and music consumed. US
speech patterns and pronunciations are being picked up, especially in
specific phrases, but also in the general patterns of speech. I don't
believe it causes people to gradually start speaking with a different
accent, but it certainly can lead to blended speech in the young which
differs from that of their parents. I have not looked for academic
evidence for this though.

>> I believe that having a subtag registered is at present too difficult
>> (requirement for dictionaries!? what if it's mostly just an accent
>> with only phonemic changes relative to surrounding accents). A
>> relaxation of the barriers would lead to more de facto recognised
>> dialects being available to choose from.
>
> I'm not able to figure out what you're trying to say here.

I want to say "Here's some subtags for English that I think should be  
registered. Anyone disagree?" and just list them by name, using  
commonly understood terms that lay people wanting to use the tags will  
be able to identify with ease.
By lay people I mean tens of millions of folks like my girlfriend, who
wants to make a website about dolphins, and uses one ov dem HTML
programs to do so.

>> As an example, things like the supposedly "British English" speech
>> synthesizer voices on my computer (which the OS processes using the
>> tag "en_GB" from the voice's property list) sound nothing like most  
>> of
>> the accents of the United Kingdom, they would be better marked as  
>> "en-
>> received" or similar.
>
> This is not a tagging problem.  It's a complaint about a speech
> synthesizer

I disagree here. It's not the synthesizer and/or website's fault that  
the palette of choices is so restricted. If those creating the voice  
were to have a wider choice of subtags, they could more accurately  
mark it up. The synth was just one example of electronic consumption  
of content. It may be a search engine or something as yet undreamt of.  
Age and the rural/urban split also have significant influences on  
language, though those are at present not dealt with.

> and could be made for any language not tagged right
> down to the level of some person's idiolect.

Agreed. The line has to be drawn somewhere. I just think it's too  
coarse at present. We have the facility for creating 'approved'  
subtags without anything breaking. We might as well use it to the  
maximum :-)

>> I'm sure we can all agree on commonly recognised dialects for  
>> English,
>
> I'd be surprised.  The "cowboy" dialects spoken by my relatives in
> South Dakota differ from what the ones in Wyoming speak, and
> neither sounds much like Bush-speak.  With variation seemingly on
> the rise in US English, compiling an agreed list might be harder
> than you think.

Should have used 'dixie' as was pointed out, but as I am not familiar  
with US dialects in general I wouldn't be suggesting any :)

>> as it is a first language for many people on this list, and familiar
>> for many others. For other languages compiling a list might involve
>> asking a scholar for suggestions.
>
> That's not how ietf-languages <at> iana.org is supposed to work.

Okay. You referred to the mailing list. I was more referring to how  
the standard should be created.
I presume there are people out there (and on these lists) who get paid  
to create these things. These people would conduct or locate relevant  
research and create a map, rather like a barometric or elevation map,  
with contours encircling different dialects.

> Rather, someone (anyone) who has a need of a subtag for a
> particular dialect submits a registration request, the request is
> discussed, and the Language Subtag Reviewer decides whether
> to accept the registration.

The thing is most people using them in my field (web authors) have no  
idea this list exists, nor that they even have the need for a new  
subtag when they are creating new content. Quite often the tags are  
added without the author even knowing.
The tags have to be created beforehand and given as options on a  
platter to these people by the software.
If you expect users to register their own codes, would you like to see  
dialog boxes like this when people press 'save as HTML' ?

Please choose the most appropriate language tag for this page:
[ ]  en    English (language WizzoWebWhacker is running in)
[ ]  en-IE Hibernian English (taken from your time zone)
[x]  en-enteryourdialecthere (automatically sends a registration email  
to IETF)

[Cancel] [Okay]

(I write that only half in jest)

>> It occurred to me while writing this that perhaps a good solution
>> would be to use country codes for written content that uses the
>> national orthography, and dialect tags when transcribing spoken
>> content or for audio data. You would only combine the two if you were
>> transcribing the speech of someone with that dialect into the
>> orthography of a country (maybe not the country of the speaker).
>
> Interesting idea.  Discussion of such a proposal belongs on ltru <at> ietf.org 
> ,
> not here.

Moved. No comments on this? Obviously changes like this would have to  
be best practice suggestions, and not rules, for compatibility.

- Nicholas.
Nicholas Shanks | 7 Jan 2008 16:22

Re: Progressing beyond borders: making subtags inclusive

On 7 Jan 2008, at 07:40, Stephane Bortzmeyer wrote:

> This is a bad start for the discussion. If you call each proposal
> "partizan", we will soon go to flame wars...

Well that's why I want someone who can be seen to be impartial to go  
through and do them all!

> If I requested the registration of "alsatian" and not a general
> registration of all alemannic dialects, it's because I know about this
> dialect (or know where to find information).

Well you probably know enough about the others to decide how many  
there are and give them names. It will then be up to users to pick the  
closest/most appropriate for their use case.

>> nobody has bothered to register the others.
>
> The IETF is volunteer work.

Well the standards produced are used by corporations like Sony.  
Couldn't there be a clause that public companies contribute some  
nominal fee to get things looked after professionally?
Do these standards not get adopted by the ISO, IEEE and W3C, all of  
whom have money? They could also pay someone to do a thorough job.

>> I believe that having a subtag registered is at present too
>> difficult
>
> I would partly agree (I wrote some of the texts on
> http://www.langtag.net/ because I wanted to help registration
> beginners) but only partly because registration of a subtag is a
> serious matter (once a subtag, always a subtag, there is no way back)
> and should not be done lightly.

Thanks for the URL! I will go through the registration page and see if  
I can get many UK dialects of English officially sanctioned by the end  
of this. It looks very useful for pointing beginners and
not-beginners-but-not-experts-either at. Thanks for creating this resource :-)

>> I'm sure we can all agree on commonly recognised dialects for
>> English, as it is a first language for many people on this list, and
>> familiar for many others.
>
> I am not sure you properly assess the amount of work it
> means.

Perhaps not for some dialects, but surely every language has
low-hanging fruit that all can agree on? Getting all of these would at
least be a start.

>> For other languages compiling a list might involve asking a scholar
>> for suggestions.
>
> This would be a huge change from "Anyone can request a registration,
> if he backs his request with serious facts" to "A committee playing
> ISO-639, but without the resources of SIL". I'm not sure I would
> approve it. But, anyway, this discussion belongs to the LTRU Working
> Group.

The SIL could help. Good idea :)

- Nicholas.
John Cowan | 7 Jan 2008 16:36

Re: Progressing beyond borders: making subtags inclusive

Nicholas Shanks scripsit:

> Oh. I was pretty sure in my mind that the person who registered it  
> (whose name I could not recall) had referenced a dictionary that he
> had authored himself, partly it had stuck in my mind, and this book  
> was published in the city of Liverpool, so I assumed he was from that  
> area too. I visited a website referred to by the registration and it  
> was all about why his accent was better than everyone else's and  
> seemed generally derogatory to non-scousers. It offended me. Maybe my  
> memory is playing tricks on me, in which case I apologise unreservedly.

No problem.  Both Stan Kelly-Bootle and I are Olde Hacqueres these
days (born in 1929 and 1958 respectively), but we are quite distinct.
For example, while we both have written folk songs, his have actually
been performed by singers of note, whereas mine have been sung only
by myself.  His book on Scouse is in its sixteenth edition, whereas
my book on Lojban is still mired between its first and second edition,
without much prospect of progress.  He was awarded the first Diploma in
Computer Science from Cambridge University, whereas I have no degrees
at all.  There are many other such points.

As for Kelly-Bootle's alleged belief in the superiority of his own accent:
given his known and evil sense of humor, I diffidently suggest that he
was taking the Minnie out of you, as I believe y'all say in Rightpondia.
("The English-speaking nations are Rightpondia and Begrudgeria in Europe,
Leftpondia and Northicia (pronounced with four syllables) in America,
Bharattia in Asia, Sarfeffrica and Cecilia (now called Zimbabwe) in
Africa, and Downundria and Aotearoa in Oceania. Rightpondia may further
be divided into the four sub-nations of Londonia, Eboracia, Bagpipia,
and Quaint." -- me)

> This sounds like an "en-scots" vs. "sco" kind of debate.

A fine example of why trying to make firm decisions about dialects
is a tricky business.

> >>they would be better marked as "en-received" or similar.
> >
> >Register it, then.
> 
> Okay. All in favo(u)r say "Aye". ;-)

Not so fast.  You have to follow the RFC 4646 procedures, and do it back
on ietf-languages.

> Didn't quite get whether you were agreeing or disagreeing with me here.
> What I was thinking was to allow content written with one orthography  
> be read by a voice designed for a different one. Some conversion could  
> happen e.g. if an american voice was reading aluminium it could drop  
> the penultimate syllable and move the stress, instead of trying to  
> pronounce it with a Gen. Am. accent.

Quite.  I was addressing your point that "en-dixie" is superior to
"en-US-dixie" by pointing out that on the contrary the latter has
better default fallback properties.  A synthesizer that can
recognize the "en-US" part can produce at least some American voice,
whereas with "en-dixie" it will probably produce its default "en",
which may or may not be American.
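
A minimal sketch of that fallback behaviour, assuming a synthesizer that
picks a voice by truncating the requested tag one subtag at a time (in
the spirit of RFC 4647 lookup). The function name, the voice list, and
the "dixie" subtag are hypothetical and used only for illustration:

```python
def lookup(requested, available, default="en"):
    """Return the closest available tag by progressive truncation."""
    # Language tags compare case-insensitively; remember the original spelling.
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        # Drop the last subtag and try the shorter tag.
        subtags.pop()
        # A dangling single-character subtag (e.g. "x") is dropped as well.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


# Hypothetical set of voices shipped with a synthesizer.
voices = ["en", "en-US", "en-GB"]

print(lookup("en-US-dixie", voices))  # falls back to the en-US voice
print(lookup("en-dixie", voices))     # falls back to the generic en voice
```

With the region subtag present, truncation passes through "en-US" before
reaching "en"; without it, the only remaining candidate is "en".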

> Which ones would you suggest as being problematic? I suggested
> en-scots above, though I can't think of any more.

(I believe we settled on "en-scotland".)

The difficulty is to know where to stop.  For example, the American
South is a fairly well-defined dialect area, but how many dialects does
it contain?  There is the centuries-old distinction between the northern
and the southern portions, and a more recent split (of the last fifty
years or so) between the inland and coastal parts of the latter.  But much
finer subdivision can be and has been done when necessary and appropriate.
ietf-languages has therefore always taken an agnostic approach, waiting
for people to declare their actual needs rather than trying to impose a
particular systematism on the data ("en-scouse" and "en-boont" were in
the way of test cases rather than ones required by my personal needs).

> I would have thought looking through the lexicon of a language and  
> seeing which accents/dialects/patois already have their own names  
> would be a good start. Am I wrong?

It isn't very reliable, as dialects often don't become popularly visible
for a long time after they are tenaciously established.  For example,
the "Northern Cities" accent of en-us is a matter of the last fifty years
or so, like the Inland South I mentioned above, yet very few Americans,
*including the speakers*, are even aware of its existence.  Yet it is
radically different phonologically, having massive differences in the
short vowels as great as (but not the same as) traditional Cockney,
Australian, or New Zealander.  Only when specific instances arise, as
when a boy named "Ian" newly moved to the region is teased because his
peers hear his name as "Ann", does any awareness of a dialect spoken by
millions of people in the U.S. come to the surface.

(The U.S. isn't the only place with new dialects; Scouse only arose in
the 19th century, when Liverpool's original dialect became heavily mixed
with Hiberno-English.)

[from another email]

> I have personally noticed a lot of convergence is occurring here in
> the UK, with the large quantity of US sitcoms and music consumed. US
> speech patterns and pronunciations, especially when used in specific
> phrases, but also in the general patterns of speech. I don't believe
> it causes people to gradually start speaking with a different accent,
> but certainly can lead to blended speech in the young which differs
> from that of their parents. I have not looked for academic evidence
> for this though.

Quite right: convergence is going on.  But divergence is too.

> I want to say "Here's some subtags for English that I think should be
> registered. Anyone disagree?" and just list them by name, using
> commonly understood terms that lay people wanting to use the tags will
> be able to identify with ease.

Unfortunately, counterexamples like Northern Cities make that impossible --
it unquestionably exists, but most people haven't heard of it.

> Agreed. The line has to be drawn somewhere. I just think it's too
> coarse at present. We have the facility for creating 'approved'
> subtags without anything breaking. We might as well use it to the
> maximum :-)

To, but not beyond.  Waiting (for the most part) for actual need
rather than pre-registering en masse is an attempt to avoid such
false positives, particularly since "once registered, always registered".

> I presume there are people out there (and on these lists) who get paid
> to create these things. These people would conduct or locate relevant
> research and create a map, rather like a barometric or elevation map,
> with contours encircling different dialects.

Things are not so cut and dried, not even at the level of languages.
We have (for better and worse) adopted ISO 639-3, which is a splitter
among language taxonomies -- that is, it is much more likely to call a
difference between varieties a language rather than a dialect difference
than other taxonomies are.  What's left we have to work our way through
carefully.

> The thing is most people using them in my field (web authors) have no
> idea this list exists, nor that they even have the need for a new
> subtag when they are creating new content. Quite often the tags are
> added without the author even knowing.

Indeed, search engines are known to entirely discard all "en" tags,
as they are applied by broken software to texts in entirely different
languages so often as to be worthless.  Fortunately, authorial tagging
of random Web documents is not the only BCP 47 application.

> The tags have to be created beforehand and given as options on a
> platter to these people by the software.

True, which means it waits until some volunteer armed with sufficient
knowledge, documentation, and dedication steps forward (whether the
volunteer is paid by someone else to do it is not a consideration in
the IETF).  Fortunately, even a little accurate tagging goes a long way.

> Moved. No comments on this? Obviously changes like this would have to
> be best practice suggestions, and not rules, for compatibility.

We do have similar remarks in the current RFC 4646bis draft.

-- 
John Cowan  cowan <at> ccil.org  http://ccil.org/~cowan
Linguistics is arguably the most hotly contested property in the academic
realm. It is soaked with the blood of poets, theologians, philosophers,
philologists, psychologists, biologists and neurologists, along with
whatever blood can be got out of grammarians. - Russ Rymer

