M. J. Driscoll | 1 Dec 2005 16:52

Papyri

Lund University Library in Sweden has a small collection of Greek papyri which 
they would like to catalogue using XML (as they have done with their other 
manuscripts, see http://laurentius.lub.lu.se/). In particular they would be 
interested in finding out if anyone has developed an XML application based on 
or consistent with the metadata standards of the APIS project 
(http://www.columbia.edu/cu/libraries/inside/projects/apis/), who are 
themselves still database-based. They have approached me to ask if I know of 
anyone using XML for this purpose; I don't, but perhaps someone out there does.

Matthew

M. J. Driscoll
Arnamagnæan Institute
University of Copenhagen

Alan Morrison | 1 Dec 2005 17:35

Re: Papyri

M. J. Driscoll wrote:

>In particular they would be 
>interested in finding out if anyone has developed an XML application based on 
>or consistent with the metadata standards of the APIS project
>
There are a couple of projects based in Oxford which might be worth
contacting for more information in this area:

- The Oxyrhynchus Papyri. They are certainly using standards based
on or consistent with the APIS project for their online database, although
I'm not sure of their use of XML.

The main web site can be found at:
http://www.papyrology.ox.ac.uk/index.html

and information on imaging and markup can be found here:
http://www.papyrology.ox.ac.uk/imaging/imaging.html

- Vindolanda Tablets Online, which provides an online XML transcription of
Roman tablets and also incorporates metadata compatible with
other APIS databases.

http://vindolanda.csad.ox.ac.uk/index.shtml

Hope this helps,

Alan Morrison
Collections Manager

Julia Flanders | 1 Dec 2005 18:37

brevigraphs: abbreviation or typography?

The WWP has always treated brevigraphs as a form of abbreviation and 
encoded them with <abbr>, with an expanded reading encoded on expan=. 
However, it has just occurred to me that there might be a rationale 
for treating them instead as a form of old-style typography, and 
using <orig> with reg= instead. I am curious whether any projects 
have taken this approach--or does it seem self-evident that they 
should be treated as a form of abbreviation?

I'm also prepared to discover that the general category of 
"brevigraphs" (or worse yet, "things the WWP thinks of as 
brevigraphs") actually contains things which should be treated 
differently.

I'm not sure whether there are any great consequences to this 
choice--just trying to sort it out in my head. Any thoughts welcome.

Best wishes and thanks, Julia

Julia Flanders
Women Writers Project
Brown University

Sylvain Loiseau | 2 Dec 2005 01:15

Parsing the customization file

Hello,

I am writing an application that uses TEI vocabularies. I would like it to
handle any document written in TEI, and to let the user tell the program, for
instance, that in his document "TEI" is renamed "foo", as the customization
mechanisms allow, or that class x is extended with elements y and z.

A nice way to do this is perhaps to expect the user to provide a TEI
configuration file. With this file, if I understand correctly, it would be
possible to retrieve the name used for the "TEI" element, the final code
looking, for instance (in Java), like this:

    String localName = customizationFile.getCustomizedNameOfElement("TEI");

with customizationFile being an instance of a class able to scan the
configuration file and retrieve the definition.

Does something like this already exist? Or do you think it could be done?

Best regards,
Sylvain

-- 
Sylvain Loiseau
sloiseau <at> u-paris10.fr
http://panini.u-paris10.fr/~sloiseau

« Le chef de la police du dictateur local Duvalier s'appelait Desyr. » G.
Deleuze, l'Anti-OEdipe

Rare Book School | 2 Dec 2005 01:11

Rare Book School 2006

[Cross-posted. Please excuse any duplication.]

RARE BOOK SCHOOL (RBS) is pleased to announce its 2006 Session, a collection
of five-day, non-credit courses on topics concerning rare books,
manuscripts, the history of books and printing, and special collections to
be held at the University of Virginia from 6-10 March, 5-9 June, 12-16 June,
and 17-28 July 2006.

Among the courses offered in 2006 and 2007 will be Introduction to Special
Collections Librarianship, Rare Book Cataloging, Introduction to the History
of the Book in the West, Developing Collections of African-American
Literature, Electronic Texts and Images, Introduction to the History of
Bookbinding, and Implementing Encoded Archival Description (EAD).

Subscribers to this list may find the following Rare Book School courses to
be of particular interest:

(L-80) IMPLEMENTING ENCODED ARCHIVAL DESCRIPTION (MONDAY-FRIDAY, March
6-10). Encoded Archival Description (EAD) provides standardized
machine-readable access to primary resource materials. This course is aimed
at archivists, librarians, and museum personnel who would like an
introduction to EAD that includes an extensive supervised hands-on
component. Students will learn SGML encoding techniques in part using
examples selected from among their own institution's finding aids. Topics:
the context out of which EAD emerged; introduction to the use of SGML
authoring tools and browsers; the conversion of existing finding aids to
EAD. Instructor: Daniel Pitti.

DANIEL PITTI became Project Director at the University of Virginia's
Institute for Advanced Technology in 1997, before which he was Librarian for

Lou Burnard | 2 Dec 2005 01:50

Re: Parsing the customization file

This is what the ODD system is for.

Your system should process the ODD declarations which define the TEI 
customization in question. This will contain a great deal of information 
about the classes concerned, the canonical names, etc. for all the 
elements in question. Not to mention internationalization!

ODD is an XML format (indeed, a TEI one: see the section of the 
Guidelines confusingly called "tag documentation"), so you can write 
your java classes to do this in the same way as they currently handle 
TEI documents.
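
As a rough illustration of that suggestion, here is a minimal sketch of a class that reads an ODD file with the standard Java DOM API and answers the kind of query Sylvain describes. It assumes the P5-style convention in which a renamed element appears as an <elementSpec> carrying an <altIdent> child; the class name OddCustomization and the sample ODD fragment are hypothetical, and only the method name is taken from the original question. Class extensions (adding elements y and z to class x) are not handled here.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: loads an ODD customization and answers name queries. */
final class OddCustomization {
    private static final String TEI_NS = "http://www.tei-c.org/ns/1.0";
    private final Document odd;

    OddCustomization(InputStream in) {
        this.odd = parse(in);
    }

    private static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // ODD elements live in the TEI namespace
            return factory.newDocumentBuilder().parse(in);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse ODD file", e);
        }
    }

    /** Returns the altIdent declared for the element, or the canonical name if none. */
    String getCustomizedNameOfElement(String ident) {
        NodeList specs = odd.getElementsByTagNameNS(TEI_NS, "elementSpec");
        for (int i = 0; i < specs.getLength(); i++) {
            Element spec = (Element) specs.item(i);
            if (!ident.equals(spec.getAttribute("ident"))) {
                continue;
            }
            // Look only at direct children, so an altIdent on a nested
            // attribute definition is not mistaken for the element's name.
            for (Node c = spec.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE
                        && TEI_NS.equals(c.getNamespaceURI())
                        && "altIdent".equals(c.getLocalName())) {
                    return c.getTextContent().trim();
                }
            }
        }
        return ident; // not customized: keep the canonical name
    }

    public static void main(String[] args) {
        // Invented sample customization, for demonstration only.
        String sample =
            "<schemaSpec xmlns='http://www.tei-c.org/ns/1.0' ident='myTEI'>"
            + "<elementSpec ident='TEI' mode='change'><altIdent>foo</altIdent></elementSpec>"
            + "</schemaSpec>";
        OddCustomization customizationFile = new OddCustomization(
            new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(customizationFile.getCustomizedNameOfElement("TEI"));
        System.out.println(customizationFile.getCustomizedNameOfElement("p"));
    }
}
```

Falling back to the canonical name when no <altIdent> is found means the same call works for customized and uncustomized elements alike.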

  Sylvain Loiseau wrote:

> Hello,
> 
> I am writing an application that uses TEI vocabularies. I would like it to
> handle any document written in TEI, and to let the user tell the program, for
> instance, that in his document "TEI" is renamed "foo", as the customization
> mechanisms allow, or that class x is extended with elements y and z.
> 
> A nice way to do this is perhaps to expect the user to provide a TEI
> configuration file. With this file, if I understand correctly, it would be
> possible to retrieve the name used for the "TEI" element, the final code
> looking, for instance (in Java), like this:
> 
>     String localName = customizationFile.getCustomizedNameOfElement("TEI");
> 
> with customizationFile being an instance of a class able to scan the

Sebastian Rahtz | 2 Dec 2005 10:05

Re: Parsing the customization file

Lou Burnard wrote:

> This is what the ODD system is for.
>
> Your system should process the ODD declarations which define the TEI 
> customization in question. This will contain a great deal of 
> information about the classes concerned, the canonical names, etc. for 
> all the elements in question. Not to mention internationalization!
>
> ODD is an XML format (indeed, a TEI one: see the section of the 
> Guidelines confusingly called "tag documentation"), so you can write 
> your java classes to do this in the same way as they currently handle 
> TEI documents.
>
You might also want to consider the way Roma works on the web, which is
to grab element or class information from an eXist database populated with
the source of the TEI, using XQuery. This is even more fun than parsing the
whole source file yourself.

-- 
Sebastian Rahtz      
Information Manager, Oxford University Computing Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431

OSS Watch: JISC Open Source Advisory Service
http://www.oss-watch.ac.uk

Young, John T | 2 Dec 2005 12:37

Re: brevigraphs: abbreviation or typography?

My feeling has always strongly been that brevigraphs are distinct things from
abbreviations, or at least a very distinct and specific form of abbreviation,
which merits being treated differently from others.  In an abbreviation like
'Dr', the letters 'octo' are simply omitted, whereas in a brevigraph such as
crossed p for 'per' there is something on the page to indicate the 'er', even
though it isn't the letters 'er'.

The Newton Project's current policy is to tag them as (for instance)
<expan abbr="crossed p">per</expan>, and our current stylesheet displays the
expanded version.  But we're coming round to the idea of using entities
instead and doing something like
<abbr expan="per">[entity for crossed p]</abbr> so that in our 'normalised'
view people would see the expanded version and in the 'diplomatic' they would
actually see an image of the brevigraph in question.  For one thing, this
seems more honest; for another the use of brevigraphs is of considerable
interest to historians of handwriting and typography.
 
John
John Young
The Newton Project
Imperial College London

From: TEI (Text Encoding Initiative) public discussion list on behalf of Julia Flanders
Sent: Thu 01/12/2005 17:37
To: TEI-L <at> listserv.brown.edu
Subject: brevigraphs: abbreviation or typography?

The WWP has always treated brevigraphs as a form of abbreviation and
encoded them with <abbr>, with an expanded reading encoded on expan=.
However, it has just occurred to me that there might be a rationale
for treating them instead as a form of old-style typography, and
using <orig> with reg= instead. I am curious whether any projects
have taken this approach--or does it seem self-evident that they
should be treated as a form of abbreviation?

I'm also prepared to discover that the general category of
"brevigraphs" (or worse yet, "things the WWP thinks of as
brevigraphs") actually contains things which should be treated
differently.

I'm not sure whether there are any great consequences to this
choice--just trying to sort it out in my head. Any thoughts welcome.

Best wishes and thanks, Julia

Julia Flanders
Women Writers Project
Brown University

M. J. Driscoll | 2 Dec 2005 14:52

Re: brevigraphs: abbreviation or typography?

Dear all,

For years I've been collecting materials (and boring people to death) on the 
subject of abbreviations and their expansions, and in particular how best to 
mark them up. So anyone with any sense should stop reading now.

The short answer to the question is yes, brevigraphs are abbreviations, one of 
four basic types, seen from the point of view of the means through which the 
abbreviation is achieved, the other three being: suspension, where only the 
first letter or letters of a word are written, generally followed (and 
frequently also preceded) by a point or with a superscript stroke; contraction, 
where the first and last letters are written, normally also with a superscript 
stroke, or, less commonly, a point or points; and superscript letters, where a 
superscript vowel will normally represent that vowel preceded by r or v, a 
superscript consonant that consonant preceded by a.

The most common sign of abbreviation is the superscript stroke or bar, which 
can indicate the suppression of one or more nasal consonants (m or n), and is 
also used as a more general mark of abbreviation in suspensions and 
contractions. Although in appearance there is no discernible difference between 
the two signs, in terms of their use they are quite distinct. From the point of 
view of their function, therefore, abbreviation signs may be said to fall into 
two categories, those which indicate that something has been omitted, without 
suggesting what that something may be, and those which always refer to a 
particular combination of graphemes, regardless of the lexical item in which 
they occur. There is obviously a correlation between the two systems: 
suspensions and contractions by necessity make use of a general mark of 
abbreviation, while superscript letters and tittles (some of which actually 
derive from letters) have a specific graphemic reference. The brevigraphs are 
of both types, since some, such as the inverted c representing con, have a 
specific graphemic reference, while others, for example the nomina sacra, have 
more in common with suspensions and contractions.

It is customary in scholarly editions to expand abbreviations, that is to 
supply the letters which have been omitted; the letters so supplied are often 
typographically distinct from the others in order to indicate exactly what is in 
the original and what has been supplied.  The TEI Guidelines provide a means of 
encoding abbreviations and their expansions through the use of the <abbr> and 
<expan> tags. It is up to the encoder to decide whether the abbreviation or its 
expansion is to be the base form; in P4 the other form could be given as an 
attribute value, while in P5 both can be provided, wrapped within <choice> 
tags.
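
To make the P5 arrangement concrete, here is a minimal sketch of how a processor might pick one branch of a <choice> when producing the "diplomatic" and "normalised" views mentioned earlier in this thread. It uses only the standard Java DOM API; the class name ChoiceRenderer, the View enum and the sample sentence are hypothetical, and it assumes each <choice> holds exactly one <abbr> and one <expan>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch: flattens TEI text, choosing abbr or expan inside choice. */
final class ChoiceRenderer {
    enum View { DIPLOMATIC, NORMALISED }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName() to work
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse sample", e);
        }
    }

    static String render(Node node, View view) {
        StringBuilder out = new StringBuilder();
        append(node, view, out);
        return out.toString();
    }

    private static void append(Node node, View view, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        boolean inChoice = node.getNodeType() == Node.ELEMENT_NODE
                && "choice".equals(node.getLocalName());
        String wanted = (view == View.DIPLOMATIC) ? "abbr" : "expan";
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!inChoice) {
                append(c, view, out);
            } else if (c.getNodeType() == Node.ELEMENT_NODE
                    && wanted.equals(c.getLocalName())) {
                // Inside <choice>, emit only the branch this view asks for.
                append(c, view, out);
            }
        }
    }

    public static void main(String[] args) {
        // Invented sample, reusing the 'Dr' abbreviation discussed in the thread.
        Document doc = parse("<p xmlns='http://www.tei-c.org/ns/1.0'>See "
            + "<choice><abbr>Dr</abbr><expan>Doctor</expan></choice> Newton</p>");
        System.out.println(render(doc, View.DIPLOMATIC));
        System.out.println(render(doc, View.NORMALISED));
    }
}
```

Because both readings sit side by side in the document, neither has to be privileged as the base form; the choice is deferred to the view.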

The problem is that the two elements, <abbr> and <expan>, do not really mirror 
each other, at least not completely (there are historical reasons for this 
which I won't go into here). Underlying this problem is the fundamental 
question of what, exactly, "the abbreviation" is. Is it the mark, sign or 
letter that indicates that something has been suppressed, or is it the entire 
word? And similarly, is "the expansion" only the letters which have been 
suppressed and have therefore been supplied by the transcriber, or is it, 
again, the whole word? Given the distinction between abbreviations with a 
lexical reference (suspensions, contractions and some of the brevigraphs) and 
those with a graphemic reference (superscript letters and signs and the 
remainder of the brevigraphs), the most reasonable answer would appear to be 
that it is both. It strikes one as counter-intuitive to treat abbreviations 
with a lexical reference on anything other than the whole-word level. In 
English and other languages "p." is a common abbreviation for the word "page" 
(or the equivalent), but the dot can in no way be said to "stand for" the 
letters "age": rather it is the first letter, followed by a point, which 
represents the whole word. Moreover, this initial letter can be doubled to 
indicate the plural, e.g. "pp.", for "pages". It seems nothing short of 
perverse to maintain that in such a case, the first p somehow "really is 
there", whereas the second and attendant dot are not, but are rather an 
abbreviation representing the suppressed "ages". At the same time, however, the 
superscript 9-shaped sign does only stand for "us", regardless of the lexical 
item in which it occurs, and it would be somewhat forced to treat it any other 
way, especially in cases where there is more than one abbreviation within a 
single word; treating such abbreviations on a whole-word basis would blur the 
connection between the abbreviation sign and its expansion. One obvious 
solution would be to distinguish between the two types of abbreviations (and 
expansions), using a type attribute on <abbr> (and <expan>), with the values, 
say, "lex" or "graph". Another would be to tag abbreviations and expansions 
solely on a whole-word basis and use some other means, the <supplied> tag for 
example, to indicate which letters have been supplied. A third possibility 
would be to rethink the entire system, which is what I have been trying to do 
for the last several years, thus far without any great success.

Should anyone still be reading, and be interested in reading more, I've written 
an article on abbreviation practices in some modern European languages, 
available at: http://www.staff.hum.ku.dk/mjd/thoughts.html

all the best,
Matthew

M. J. Driscoll
Arnamagnaean Institute
Copenhagen University

David Sewell | 2 Dec 2005 15:50

Re: brevigraphs: abbreviation or typography?

On Fri, 2 Dec 2005, Young, John T wrote:

> My feeling has always strongly been that brevigraphs are distinct
> things from abbreviations, or at least a very distinct and specific
> form of abbreviation, which merits being treated differently from
> others.  In an abbreviation like 'Dr', the letters 'octo' are simply
> omitted, whereas in a brevigraph such as crossed p for 'per' there is
> something on the page to indicate the 'er', even though it isn't the
> letters 'er'.  The Newton Project's current policy is to tag them as
> (for instance) <expan abbr="crossed p">per</expan>, and our current
> stylesheet displays the expanded version.  But we're coming round to
> the idea of using entities instead and doing something like <abbr
> expan="per">[entity for crossed p]</abbr> (...)

For what it's worth, earlier this year the Unicode Consortium officially
added an entry for the "per sign", U+214C; see

  http://www.fileformat.info/info/unicode/char/214c/index.htm

For some reason they don't replicate the glyph, but a full discussion of
the character, with illustration, is in this document:

  http://ra.dkuug.dk/JTC1/SC2/WG2/docs/n2590.pdf

Of course the symbol isn't in the current Mac or Windows standard
font sets, but you can render it with some alternative either with TEI
choice-like tagging or run-time substitution.
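
A minimal sketch of the run-time substitution option, assuming the text is held as a Java string and that a plain-text fallback is acceptable where no font has the glyph. The class name PerSignFallback and the fallback string "per" are hypothetical choices.

```java
/** Hypothetical sketch: swaps the per sign for a fallback at display time. */
final class PerSignFallback {
    /** U+214C PER SIGN, as a one-character string (it is in the BMP). */
    static final String PER_SIGN = "\u214C";

    /** Replaces every per sign in the text with the given fallback. */
    static String substitute(String text, String fallback) {
        return text.replace(PER_SIGN, fallback);
    }

    public static void main(String[] args) {
        // Invented sample phrase, for demonstration only.
        System.out.println(substitute("sent " + PER_SIGN + " Captain Smith", "per"));
    }
}
```

The stored text keeps the Unicode character, so the substitution can be dropped once suitable fonts are available.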

(The per sign is used widely in the print edition of the Papers of
George Washington, so its adoption as a Unicode symbol was timely since
we've just started converting the volumes to TEI-XML.)

DS

-- 
David Sewell, Editorial and Technical Manager
Electronic Imprint, The University of Virginia Press
PO Box 400318, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell <at> virginia.edu   Tel: +1 434 924 9973
Web: http://www.ei.virginia.edu/