Don Osborn | 9 Apr 2009 01:40

How to handle macrolanguage when no code?

In looking at the BBC website's offerings in African languages, one notes that they have grouped Kinyarwanda and Kirundi together under http://www.bbc.co.uk/greatlakes/  . This makes sense from a linguistic point of view since as I understand it, the two languages are almost the same. When looking at the view (page) source, one notes that they use lang="rw" (for Kinyarwanda). It may be that the pages I checked are properly Kinyarwanda and an expert would know that they are not Kirundi (rn), but it is in any event true that there is no code element to cover both languages.

 

I'm curious if there is any other recommended way to handle such a situation where web content may be deliberately and easily designed to cover more than one language as defined by ISO 639 when there is not currently any macrolanguage code for them. Could one for example define a whole page as having two languages? E.g., something like lang="rw, rn"?

 

Thanks in advance for any feedback.

 

Don

 

 

_______________________________________________
Ietf-languages mailing list
Ietf-languages <at> alvestrand.no
http://www.alvestrand.no/mailman/listinfo/ietf-languages
John Cowan | 9 Apr 2009 02:39

Re: How to handle macrolanguage when no code?

Don Osborn scripsit:

> I'm curious if there is any other recommended way to handle such a situation
> where web content may be deliberately and easily designed to cover more than
> one language as defined by ISO 639 when there is not currently any
> macrolanguage code for them. Could one for example define a whole page as
> having two languages? E.g., something like lang="rw, rn"?

Petition 639-3/RA for a new macrolanguage, I guess.  For sure this list
can't help you.

-- 
John Cowan  cowan <at> ccil.org   http://www.ccil.org/~cowan
Dievas dave dantis; Dievas duos duonos          --Lithuanian proverb
Deus dedit dentes; deus dabit panem             --Latin version thereof
Deity donated dentition;
  deity'll donate doughnuts                     --English version by Muke Tever
God gave gums; God'll give granary              --Version by Mat McVeagh
Phillips, Addison | 9 Apr 2009 02:53

RE: [Ltru] How to handle macrolanguage when no code?

HTML certainly allows you to declare that some content is applicable to more than one language audience. See:

 

   http://www.w3.org/TR/i18n-html-tech-lang/#ri20040728.121358444

 

Otherwise, John Cowan’s advice seems appropriate… ISO 639-3 or ISO 639-5 would be your next stop. Note that macrolanguages are sometimes problematical, so you might also consider a collection code instead.

 

Addison Phillips

Globalization Architect -- Lab126

 

Internationalization is not a feature.

It is an architecture.

 

Peter Constable | 9 Apr 2009 05:11

RE: [Ltru] How to handle macrolanguage when no code?

If it is content in one linguistic variety and crafted to serve two audiences deemed in 639-3 to be distinct languages, then that strikes me as a potential macrolanguage scenario.

 

One key question is how narrow a scope of content is needed and how much deliberate effort is needed to craft something like that. For instance, a document consisting of “Papa!” can serve many different audiences, but that is solely because the scope of content is so constrained, and for that reason the bar is not met for a macrolanguage. But if it’s easy for a content provider to come up with content that serves both, then that’s interesting.

 

Another key question is why that content is functional for both audiences. Is it because it is expressed in a variety that can really be considered common, or is it because it’s actually in language A and 90% of speakers in language B are functionally bilingual in A? Does the common-identity label reflect actual linguistic commonality, or is it a logistic tool used in the repository to reflect merely a dual tasking?

 

 

Some thoughts. Discuss it with the 639-3 RA.

 

 

Peter

 

Doug Ewell | 9 Apr 2009 06:09

Re: How to handle macrolanguage when no code?

Don Osborn <dzo at bisharat dot net> wrote:

> In looking at the BBC website's offerings in African languages, one 
> notes that they have grouped Kinyarwanda and Kirundi together under 
> http://www.bbc.co.uk/greatlakes/  . This makes sense from a linguistic 
> point of view since as I understand it, the two languages are almost 
> the same. When looking at the view (page) source, one notes that they 
> use lang="rw" (for Kinyarwanda). It may be that the pages I checked 
> are properly Kinyarwanda and an expert would know that they are not 
> Kirundi (rn), but it is in any event true that there is no code 
> element to cover both languages.

Ethnologue says the two are mutually intelligible, which isn't quite the 
same as saying they are the same language.  This is one of those many 
gray areas in language identification.

The fact is that we rely on the distinctions that ISO 639 makes, and if 
they decide that Kinyarwanda and Kirundi (Rundi) are different languages 
then that's what we have to go with.  We can narrow language usage down 
to dialects or other variations, but we have no mechanism to create 
broader categories in such a way that a more specific tag would still 
match.
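
The "more specific tag would still match" behaviour can be illustrated with a minimal Python sketch, assuming the prefix-style matching that RFC 4647 calls basic filtering. The function name basic_filter is hypothetical and not part of any library.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering (assumed matching scheme).

    A range matches a tag that equals it, or that starts with the range
    followed by a hyphen. Comparison is case-insensitive.
    """
    language_range = language_range.lower()
    tag = tag.lower()
    if language_range == "*":
        # The wildcard range matches every tag.
        return True
    return tag == language_range or tag.startswith(language_range + "-")


if __name__ == "__main__":
    # "rw" can be narrowed (rw-RW still matches), but nothing broader
    # than "rw" exists that would also match "rn".
    for candidate in ("rw", "rw-RW", "rn", "rn-BI"):
        print(candidate, basic_filter("rw", candidate))
```

Under this scheme a request for "rw" accepts "rw-RW" but never "rn", which is why covering both languages would need a single broader subtag rather than a pair of tags.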

> I'm curious if there is any other recommended way to handle such a 
> situation where web content may be deliberately and easily designed to 
> cover more than one language as defined by ISO 639 when there is not 
> currently any macrolanguage code for them. Could one for example 
> define a whole page as having two languages? E.g., something like 
> lang="rw, rn"?

If you can, in any markup language or other protocol, it would be a 
feature of that markup language or protocol, and not of language tags or 
subtags per se.  This is similar to protocols that allow something like 
<lang="">.  It doesn't mean that the empty string is a valid language 
tag; it means the "lang" syntax exceptionally allows the empty string as 
a value.
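
A small Python sketch of that distinction, under loud assumptions: the pattern below is a deliberately simplified shape check, not the full RFC 4646 grammar, and both function names are hypothetical. It only shows that a host format may accept values (such as the empty string) that are not themselves language tags.

```python
import re

# Simplified shape check (an assumption for illustration, NOT the full
# RFC 4646 grammar): a 2-3 letter primary subtag, then optional
# hyphen-separated alphanumeric subtags of 1-8 characters.
SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*")


def looks_like_language_tag(value: str) -> bool:
    """True if the value has the rough shape of a single language tag."""
    return SIMPLE_TAG.fullmatch(value) is not None


def is_acceptable_lang_attribute(value: str) -> bool:
    """Hypothetical host-format rule: a single tag, or the empty string."""
    return value == "" or looks_like_language_tag(value)


if __name__ == "__main__":
    for value in ("rw", "", "rw, rn"):
        print(repr(value),
              looks_like_language_tag(value),
              is_acceptable_lang_attribute(value))
```

Here the empty string passes the attribute rule without being a tag, while "rw, rn" fails both checks because a comma-separated list is not a single tag.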

I don't think we want to go down the path of offering aliases.  If the 
content truly is in "rw", it should be tagged as "rw" even if all 
speakers of "rn" can understand it perfectly.

We have 61 "collection code" subtags available in the Registry, with 
another 55 on the way when 4646bis is approved; one of those might do if 
you really need a different solution.  Asking ISO 639-3/RA to create a 
new macrolanguage to encompass these two languages might (or might not) 
create more confusion than it resolves.  Remember that a new 
macrolanguage would not result in new extlangs for RFC 4646bis.

--
Doug Ewell  *  Thornton, Colorado, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

Phillips, Addison | 9 Apr 2009 06:16

RE: [Ltru] How to handle macrolanguage when no code?

> On 2009/04/09 9:53, Phillips, Addison wrote:
> > HTML certainly allows you to declare that some content is
> > applicable to more than one language audience. See:
>
> Read "HTTP" for "HTML" here. HTML allows you to declare that
> different parts of a Web page are in different languages, but not
> that one and the same (part of a) Web page is in more than one
> language.

That's correct. Although this also applies to HTML via the <meta> tag.

The 'lang' attribute in HTML (as with the xml:lang attribute in XML) allows only a single language tag to be
applied to a specific scope *within* a document. I should point out that the best practice link I pointed to
also makes this distinction: you can declare the "target audience" in one way and the "document
processing language" separately in another.
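
The two declarations can be sketched side by side in Python. This is only an illustration: the function name build_page and the placeholder body text are assumptions, and the choice of "rw" as the processing language simply mirrors what the BBC pages were reported to use.

```python
from html import escape


def build_page(processing_lang: str, audience_langs: list[str], body_text: str) -> str:
    """Return a small HTML page with both kinds of language declaration.

    processing_lang: the single tag placed in the lang attribute.
    audience_langs:  the list of tags placed in the Content-Language meta.
    """
    audience = ", ".join(audience_langs)
    return (
        "<!DOCTYPE html>\n"
        # One tag only: the language the text is actually written in.
        f'<html lang="{escape(processing_lang, quote=True)}">\n'
        "<head>\n"
        # A list is allowed here: the intended audience(s).
        f'<meta http-equiv="Content-Language" content="{escape(audience, quote=True)}">\n'
        "<title>Example</title>\n"
        "</head>\n"
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(build_page("rw", ["rw", "rn"], "Placeholder text"))
```

The lang attribute carries exactly one tag, while the meta element carries the comma-separated audience list.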

Addison
Martin J. Dürst | 9 Apr 2009 06:11

Re: [Ltru] How to handle macrolanguage when no code?

On 2009/04/09 9:53, Phillips, Addison wrote:
> HTML certainly allows you to declare that some content is applicable to more than one language audience. See:

Read "HTTP" for "HTML" here. HTML allows you to declare that different 
parts of a Web page are in different languages, but not that one and the 
same (part of a) Web page is in more than one language.

Regards,    Martin.


-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst <at> it.aoyama.ac.jp
CE Whitehead | 9 Apr 2009 19:00

How to handle macrolanguage when no code?

Hi, Don, for your reference:
 
http://www.w3.org/International/questions/qa-http-and-lang
"It is . . . worth noting that the meta element and the HTTP header both support a list of values. "
<meta http-equiv="Content-Language" content="rw, rn"/>
 
I don't know offhand of any collection code that would work here . . .
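
For the list-valued Content-Language declaration quoted above, here is a minimal Python sketch of how a consumer might read and write such a value. The helper names parse_content_language and format_content_language are hypothetical, and the sketch assumes a plain comma-separated list with optional whitespace.

```python
def parse_content_language(value: str) -> list[str]:
    """Split a comma-separated Content-Language value into individual tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def format_content_language(tags: list[str]) -> str:
    """Join tags back into a header or meta content value."""
    return ", ".join(tags)


if __name__ == "__main__":
    tags = parse_content_language("rw, rn")
    print(tags)
    print("Content-Language: " + format_content_language(tags))
```

Each entry in the list is still an ordinary single language tag; the list itself only describes the intended audience.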
 
--C. E. Whitehead
cewcathar <at> hotmail.com

 


 

Don Osborn | 12 Apr 2009 18:01

RE: [Ltru] How to handle macrolanguage when no code?

Thanks to all who replied on this question with suggestions, additional questions, and pointers.

 

I will try (to find the time) to get an answer from BBC on their approach and intentions, and also to get some feedback from someone familiar with Kinyarwanda and Kirundi.  This sort of situation is one that I think could arise with a number of languages (per some past threads), and in such cases the idea of a clear-cut single language definition and/or audience for page content may not hold. More information on such situations will certainly become available as more web content in diverse languages is created.

 

As for requesting macrolanguage codes, that is another level, but obviously one to keep in mind. I think it is viable in many circumstances, but in others it may be difficult to make the case. The ad hoc way that ISO 639 evolved, however, means that there are similar cases of related tongues that are sometimes given a common code (interpreted after the fact as macrolanguage) and sometimes not.  I think that developments such as more web content in diverse languages and efforts such as the locales sub-project of ANLoc (African Network for Localisation) have the potential to highlight such issues.

 

Thanks again and all the best.

 

Don

 

 

CE Whitehead | 15 Apr 2009 18:23

[Ltru] How to handle macrolanguage when no code?

Don, just one more note:

The HTTP header and the meta tag
<meta http-equiv="Content-Language" content="rw, rn"/>
are for indicating the target audience language (and are fine if both languages are mutually comprehensible to the point that a speaker of one would be happy to read text in the other), but not for indicating the text processing language.
The text processing language is indicated in the element tags themselves and has to be one specific language, the actual language used (or, I guess, a macrolanguage that encompasses the actual language, but my understanding is that the more specific the better).
 
Hope the info you got helped some.
best wishes,
 
C. E. Whitehead
cewcathar <at> hotmail.com