McAulay, Elizabeth | 3 Jan 22:31 2011

language@ident for unknown language

Happy New Year TEI folks!

I am reviewing a TEI Header for a colleague and I'm looking for the best way to note in it that an "unknown African language" is used in part of the document. I wanted to declare that information in

<langUsage>
   <language ident="">unknown African language</language>
</langUsage>

I've reviewed the P5 guidelines, but the emphasis seems to be on declaring known languages.

Thanks,
Lisa

--------------------------------------------
Elizabeth "Lisa" McAulay
Librarian for Digital Collection Development
UCLA Digital Library Program
http://digital.library.ucla.edu/
email: emcaulay <at> library.ucla.edu

Sebastian Rahtz | 4 Jan 00:15 2011

Re: language@ident for unknown language

ISO 639 had/has "und" for "undetermined" language, which I believe is still the right answer 
for this situation.
--
Sebastian Rahtz      
Information and Support Group Manager, Oxford University Computing Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431

I only ask of God
that I not be indifferent to the future

Paul F. Schaffner | 4 Jan 00:18 2011

Re: language@ident for unknown language

Lisa,

I think we'd probably just use the three-letter ISO-639-2 code "und" 
('undetermined'), but that of course fails to capture the "African"
part of the description. The ISO-3166 region designations can
be used to further specify, but I do not think that they include
continents (or anything bigger than countries), so you might
have to make up a private-use qualification to "und-". I've not
done that, so I am speaking in ignorance, as always.

pfs

--------------------------------------------------------------------
Paul Schaffner | PFSchaffner <at> umich.edu | http://www.umich.edu/~pfs/
316-C Hatcher Library N, Univ. of Michigan, Ann Arbor MI 48109-1190
--------------------------------------------------------------------

McAulay, Elizabeth | 4 Jan 00:38 2011

Re: language@ident for unknown language

Excellent -- thanks! Foggy headed ... I was searching and searching for "unknown" ... 

McAulay, Elizabeth | 4 Jan 00:40 2011

Re: language@ident for unknown language

I think I will just stick with "und". And I will let the free text declare the regional assertion.

Felix Sasaki | 4 Jan 00:45 2011

Re: language@ident for unknown language

Hello all,

If BCP 47 language tags can be used here (as, e.g., in xml:lang), you might want to choose the sequence of
"und" (undetermined, as Paul and Sebastian stated) and
"002", the UN M.49 area code for Africa.
See section 2.2.4 of http://www.rfc-editor.org/bcp/bcp47.txt for details about these codes. The BCP 47 language tag would then be und-002; see an analysis at
http://fabday.fh-potsdam.de/~sasaki/lta/language-tags/q?input=und-002
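
For illustration only, here is a minimal Python sketch of what the resulting declaration could look like. The helper name language_element and the loose shape check are assumptions made for this example, not part of TEI or BCP 47. The check accepts only a three-letter language subtag with an optional region subtag, so it is no substitute for a real BCP 47 validator.

```python
import re
import xml.etree.ElementTree as ET

# Loose shape check only (an assumption for this sketch): a 3-letter primary
# language subtag, optionally followed by a region subtag (2 letters or
# 3 digits). It ignores script, variant, extension, and private-use subtags
# and does not consult the IANA registry.
SIMPLE_TAG = re.compile(r"[a-z]{3}(-([A-Z]{2}|[0-9]{3}))?")


def language_element(ident: str, description: str) -> ET.Element:
    """Build a TEI <language> element for use inside <langUsage>."""
    if not SIMPLE_TAG.fullmatch(ident):
        raise ValueError(f"unexpected tag shape: {ident!r}")
    elem = ET.Element("language", {"ident": ident})
    elem.text = description
    return elem


lang_usage = ET.Element("langUsage")
lang_usage.append(language_element("und-002", "unknown African language"))
print(ET.tostring(lang_usage, encoding="unicode"))
```

The element content still carries the human-readable description, so nothing is lost if a processor only understands the "und" part of the tag.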

Felix

Sebastian Rahtz | 4 Jan 23:34 2011

TEI Guidelines as ePub

http://tei.oucs.ox.ac.uk/Guidelines.epub is a revised ePub edition of the
TEI P5 Guidelines. I have finally removed all the errors that caused it to
fail epubcheck, and I'd be glad to hear from anyone whether it works or not.
My iPad is not working at the moment, so I have not been able to test it
under iBooks. It seems OK with the Firefox ePub reader.

This is now part of the P5 production process.

Lou Burnard | 5 Jan 00:02 2011

Re: TEI Guidelines as ePub

Looking good to me, except for some formatting problems in examples. See p. 199 for a few random cases.

from the pad 

Paul F. Schaffner | 6 Jan 00:52 2011

number-separators

In the interest (and at the risk) of 'exposing' our practice
with regard to trivial yet vexing issues of text-capture,
I reproduce the following internal message from this morning,
for the benefit of those who like to chew on such things early
in the new year.

Has anyone dealt with similar issues differently?

pfs

-----

I've been thinking a little about number-separators, and how most
practically and consistently to capture them with minimal offense
to Unicode, existing practice, and utility.

The immediate inspiration here was noticing a few books
that use the high comma (aka the closing curvy quote mark or apostrophe)
as a thousands separator, as they do nowadays in Switzerland and
perhaps elsewhere, e.g. 1'234,56, and one book that used the same
apostrophe as thousands separator, miles/furlongs separator, and
decimal separator indifferently. In the texts as they've been keyed
this has been captured sometimes as the straight apostrophe/single quote
('), and sometimes as the minutes/prime sign. And probably as other
things as well that I have failed to notice.

Unicode itself is relatively silent on the subject, though what it
does say is as usual slightly inconsistent.

About *decimal* separators, it says (I think) that when the full stop
is used as a decimal separator, it should be captured as full stop
(period), and that when a comma serves as decimal separator it should
be captured as a comma, but that when the mid-dot serves as a decimal
separator, it should be regarded as a glyph variant of full stop.
I.e., there is no character 'decimal separator': this is simply a
use of the ordinary punctuation marks (except for middot, which in
this case, but not others, is regarded as a glyph variant of full stop).

It says nothing about what to do when other characters are used
in this role, but I infer that they too should be regarded as special
uses of existing characters--assuming that the characters exist
in their own right.

About *thousands* separators, it says nothing.

About other *non-decimal* separators e.g. shillings/pence or
miles'furlongs or volume:page it says (I believe) nothing, except
that the ordinary virgule (SOLIDUS or slash) should be used
to capture the ordinary virgule-like shillings/pence separator ("3/6").

Since in our case there is no likelihood of these separators being
processed in any mathematical way (i.e., the strings are unlikely
ever to be parsed as number values), but there is some likelihood
of someone's wanting to search for odd uses of punctuation, I think
we should in general:

(1) reserve the prime sign for its intended purpose: minutes (of arc or
     of the hour) and feet (as a unit of measure), i.e. use it only when
     it marks one of those specific units of measure, not when it acts
     as a generic separator. And likewise for any other
     semantically-weighted characters that might serve in such a role.

(2) adopt the Unicode recommendations with respect to decimal periods
     and commas--i.e. capture them as ordinary periods and commas

(3) adopt the Unicode recommendation with respect to the shilling/pence
     virgule--i.e. capture it as an ordinary virgule (/).

(4) ignore the Unicode recommendation with respect to the decimal
     middot--i.e. capture this as a middot (U+00B7), not as a
     period/full-stop.

(5) follow the spirit of Unicode usage with respect to other lightly
     loaded characters used as separators--i.e. capture them as ordinary
     punctuation characters, preferring a more formally precise
     option if there are several, e.g. if the right-single-quote
     or left-single-quote is used as a decimal-, or furlongs-, or
     thousands-separator, capture it as &rsquo; (U+2019) etc., regardless
     of which function it is serving.

(6) if a novel glyph is used or a heavily-weighted character is misused
     in the separator role, invent a new PUA character. We have already
     done this in the case of the American (?) elongated s-shaped
     shilling separator (in order to avoid confusing it with
     ordinary tall-s, which we are apt to convert to round-s); of
     the "L"-shaped decimal separator; and (?perhaps simply a glyph variant
     of the "L") of the decimal separator that resembles a mirror-image
     comma or small subscripted letter "c".  Only in this instance (6)
     will we indicate the semantics of the character, i.e. as separator,
     and maybe not always then.
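
As a rough aid for applying rules (1) and (5) to text that has already been keyed, the following Python sketch lists places where a straight apostrophe or a prime sits between two digits. The function name suspect_separators and the sample string are invented for this illustration. It only reports candidates: a prime between digits can be a legitimate minutes mark, so a person still has to decide each case.

```python
import re
import unicodedata

# Characters that, when found between two digits, may be a mis-keyed
# separator: the straight apostrophe (U+0027) and the prime (U+2032).
# The curly quotes U+2018/U+2019 are deliberately not flagged, since
# rule (5) accepts them in the separator role.
SUSPECT = re.compile(r"(?<=\d)['\u2032](?=\d)")


def suspect_separators(text: str) -> list[tuple[int, str]]:
    """Return (offset, Unicode character name) for each candidate to review."""
    return [
        (match.start(), unicodedata.name(match.group()))
        for match in SUSPECT.finditer(text)
    ]


# Hypothetical sample: straight apostrophe, prime, right single quote, virgule.
sample = "1'234,56 and 1\u2032234 but 1\u2019234 and 3/6"
for offset, name in suspect_separators(sample):
    print(offset, name)
```

Reporting rather than rewriting keeps the decision with the reviewer, which seems closer to the spirit of the rules than an automatic substitution.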

Some examples at http://www.lib.umich.edu/tcp/docs/dox/separators.html

Piotr Bański | 6 Jan 02:32 2011

tei-c.org.uk

Hi All,

This domain was once registered for the Consortium, I'm sure. I still
have it bookmarked as the very old, initial, address for TEI Wiki:

http://www.tei-c.org.uk/wiki/index.php/Main_Page

Perhaps it's worth mentioning, then, that it's no longer TEI-related,
just so we're careful when introducing the TEI to new audiences, some of
whom may during our presentation stumble upon this other TEI-C.

Best,

  Piotr

