Stefan Dumont | 1 Sep 18:45 2014

Web service correspSearch (beta version) now online

Dear list-members,

With the new web service “correspSearch” (http://correspsearch.bbaw.de) 
you can search the metadata of various scholarly letter editions by 
sender, addressee, and place and date of origin. For this purpose, a 
website and an application programming interface (API) are provided.

The web service assembles and analyzes data based on the proposed TEI 
module “correspDesc” (put forward by the TEI SIG Correspondence). With 
correspDesc, the metadata of scholarly letter editions can be recorded 
in a homogeneous way, using standardized identifiers (GND, VIAF, etc.) 
to reference persons and places unambiguously. This approach 
facilitates the interchange of data among letter editions. The 
correspDesc proposal is currently being examined and evaluated by the 
TEI Council.
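
For illustration, metadata recorded with correspDesc looks roughly like 
this (a sketch only; element and attribute names follow the current 
proposal and may still change during the Council's review):

   <correspDesc>
      <correspAction type="sent">
         <persName ref="http://d-nb.info/gnd/118607626">Friedrich Schiller</persName>
         <placeName>Jena</placeName>
         <date when="1794-06-13"/>
      </correspAction>
      <correspAction type="received">
         <persName ref="http://d-nb.info/gnd/118540238">Johann Wolfgang von Goethe</persName>
         <placeName>Weimar</placeName>
      </correspAction>
   </correspDesc>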

The web service draws its data from digital indexes of letters 
provided by the editors of printed or digital scholarly letter 
editions. Any printed or digital scholarly edition may register its 
index of letters, in correspDesc format, with the correspSearch web 
service (see http://correspsearch.bbaw.de/index.xql?id=participate).

This web service is developed and provided by the TELOTA initiative at 
the Berlin-Brandenburg Academy of Sciences and Humanities in cooperation 
with the TEI SIG Correspondence and other scholars.

Please note that this web service is a beta version and still under 
development.


Martin Mueller | 1 Sep 15:05 2014

Early Registration discount for TEI 2014 conference extended to September 19

We have extended the deadline for early registration to September 19 so that it coincides with the end of the hotel discount at the Hilton Orrington. Do take advantage of it, and remember that the hotel's deadline is a very immoveable feast.

Register for the conference by going to https://www.conftool.net/tei2014.

Hotel information can be found at http://tei.northwestern.edu/local-info/, but for convenience's sake I copy the main points below:

The TEI and DHCS conferences will be held at the Hilton Orrington Hotel. A preferential room rate of $179.00 will be available if you reserve a room by September 19, 2014. If you make your reservation by telephone (1-800-HILTONS, 1-800-445-8667), identify yourself as an attendee of the TEI/DHCS Conference.

You can also reserve a room on the hotel’s website at www.orringtonevanston.hilton.com. If you do so, enter the code TEIDHC in the box marked “Group/Convention Code.”

The TEI program will start on Wednesday morning, October 22, and will end with the Members Meeting on Friday morning, October 24. Workshops will be held on Saturday, October 25. The detailed schedule will be up by early next week.

A list of the Chicago DHCS Colloquium papers is available at http://dhcs.northwestern.edu/papers/


Martin Mueller
Professor emeritus of English and Classics
Northwestern University
Charles Muller | 1 Sep 11:22 2014

Workshop on Buddhist Studies and Digital Humanities at Oxford, Sept 4-5

Subject: Workshop on Buddhist Studies and Digital Humanities at 
Oxford, Sept 4-5
Date: Mon, 01 Sep 2014 18:20:41 +0900
From: Jan Westerhoff jan.westerhoff <at> lmh.ox.ac.uk

Dear Colleagues,

We will be holding a two-day workshop on
Buddhist Studies and Digital Humanities at Lady Margaret Hall,
University of Oxford, on 4th and 5th September 2014. I include the
programme below. There is no charge for attendance, but please email
Jan Westerhoff at jan.westerhoff <at> lmh.ox.ac.uk if you are planning to
come.
We look forward to seeing many of you there.

Very best wishes

Jan Westerhoff

Workshop on Buddhist Studies and Digital Humanities

Lady Margaret Hall
University of Oxford
4-5th September 2014

Programme

Thursday, 4 Sept

11.30-12.30 David Gold (Bridgeton Research): Śastravid: A new
research tool for the study of Indian philosophical texts

14.00-15.00 Birgit Kellner (Heidelberg University): The SARIT
Project: Enriching Digital Text Collections in Indology

15.00-16.00 Andrew Ollett (Columbia University): Sarit-prasāraṇam:
Developing SARIT beyond “Search and Retrieval”.

16.30-17.30 Nathan Hill (SOAS): Using an annotated corpus to
facilitate the philological study of Tibetan texts

19.00 Dinner for speakers

Friday, 5 Sept

10.00-11.00 Jack Petranker/Ligeia Lugli (Mangalam Research Center
for Buddhist Languages): Thinking like a translator: the Buddhist
Translators Workbench

11.00-11.30 Tea

11.30-12.30 Charles Muller (Tokyo University): Strategies for
Project Development, Management, and Sustainability: The Example of
the DDB and CJKV-E Dictionaries.

13.00-14.00 Lunch for speakers

14.00-15.00 Paul Hackett (Columbia University/American Institute of
Buddhist Studies): Extending Buddhist Canonical Research: New Data
and New Approaches

15.00-16.00 Yigal Bronner (Hebrew University): A Prosopographical
Database for Sanskrit Works in the Early Modern Era (and Beyond):
The Appayya Dīkṣita Project, Phase 3

16.00-16.30 Tea

16.30-17.30 Kiyonori Nagasaki (International Institute for Digital
Humanities, Tokyo): Technical Possibilities of Digital Research
Environments for Buddhist Studies

19.00 Dinner for speakers

**************************
JC Westerhoff
Lady Margaret Hall
University of Oxford
Norham Gardens
Oxford OX2 6QA
United Kingdom

jan.westerhoff <at> lmh.ox.ac.uk
www.janwesterhoff.net

________________________________________
H-Buddhism-Mail mailing list
H-Buddhism-Mail <at> mailmanlist.net
http://mailmanlist.net/cgi-bin/mailman/listinfo/h-buddhism-mail
H-Buddhism Web Site: https://networks.h-net.org/h-buddhism
Twitter:  <at> H_Buddhism

Kathryn_Tomasek | 30 Aug 15:07 2014

Re: TEI-L Digest - 28 Aug 2014 to 29 Aug 2014 (#2014-188)

For Ben Brumfield:

The folks in Biblical Studies do a lot of work on comparing translations.  I don't know whether they include
both the original language and the translation in the same file, though.

Check in with <at> tan_randall at biblicalhumanities.org; I had an interesting conversation with him in June
re: how to detect translation in 19th-century texts. Sadly, it sounds like a project for another lifetime. :D

Kathryn

Sent from my iPad

> On Aug 30, 2014, at 12:00 AM, TEI-L automatic digest system <LISTSERV <at> LISTSERV.BROWN.EDU> wrote:
> 
> There are 6 messages totaling 724 lines in this issue.
> 
> Topics of the day:
> 
>  1. Solutions for dealing with a series of large 18th and 19th century book
>     catalogues (2)
>  2. Examples of edition + parallel translation? (4)
> 
> ----------------------------------------------------------------------
> 
> Date:    Fri, 29 Aug 2014 06:07:32 +0200
> From:    "Janusz S. Bien" <jsbien <at> MIMUW.EDU.PL>
> Subject: Re: Solutions for dealing with a series of large 18th and 19th century book catalogues
> 
> Quote/Cytat - Kevin Hawkins <kevin.s.hawkins <at> ULTRASLAVONIC.INFO> (Fri  
> 29 Aug 2014 03:00:58 AM CEST):
> 
>> Hello Siobhán,
>> 
>> If your OCR output for these documents isn't very accurate and you  
>> are already using good OCR software, I don't think there's much a  
>> grant can do to improve the accuracy.  OCR is a longstanding  
>> research area, and I don't think that your particular documents will  
>> lead to any particular advances in that area, or that a grant to a  
>> team that isn't already engaged in this research will lead to  
>> improvements in your project.
> 
> Really good OCR software can be trained to recognize specific documents.
> Cf., e.g.:
> 
> http://dl.psnc.pl/2012/07/20/raport-dotyczacy-porownania-silnikow-ocr-finereader-i-tesseract/lang-pref/en/
> http://heml.mta.ca/lace/
> 
> Best regards
> 
> Janusz
> 
> -- 
> Prof. dr hab. Janusz S. Bień -  Uniwersytet Warszawski (Katedra  
> Lingwistyki Formalnej)
> Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
> jsbien <at> uw.edu.pl, jsbien <at> mimuw.edu.pl, http://fleksem.klf.uw.edu.pl/~jsbien/
> 
> ------------------------------
> 
> Date:    Fri, 29 Aug 2014 07:29:11 -0400
> From:    Ben Brumfield <benwbrum <at> GMAIL.COM>
> Subject: Examples of edition + parallel translation?
> 
> I'm looking for a TEI example of a parallel text in which the original text was used to produce a TEI edition
as well as a modern translation of the text, with both the edition and the translation contained within the
same TEI file.  I'd like both texts to be linked, at least minimally, so that a page of translation can point
to the page of the original-language edition from which it was derived.
> 
> Background: Last October I drafted a TEI export feature for FromThePage, the wiki-like digital edition
tool I maintain.  In the last few days I've encountered a project which wants to use the tool to produce a
digital edition, but also needs a way to add an English translation as part of the edition.  That will
involve enhancing the software to add a parallel text feature, which I'm pretty excited about.  Of course
I'd like to capture that parallel translation in the programmatically-generated TEI-XML file
FromThePage exports, even if just as an archival format option.
> 
> Due to the nature of the tool and the limited scope of the enhancements we're considering, I won't be able to
support interlinear translation, but will be able to link translation and original at a page-by-page and
possibly paragraph-by-paragraph level.
> 
> Any examples of this sort of thing would be much appreciated.
> 
> Ben Brumfield
> http://manuscripttranscription.blogspot.com/
> 
> ------------------------------
> 
> Date:    Fri, 29 Aug 2014 14:33:18 +0200
> From:    Frederik Elwert <frederik.elwert <at> RUB.DE>
> Subject: Re: Examples of edition + parallel translation?
> 
> Dear Ben,
> 
> In our work-in-progress TEI representation of an edition of Ancient Egyptian texts, we store the
translation (at sentence and word level) in separate divs and link using @corresp:
> 
> <div type="text">
>  <ab>
>    <s xml:id="s2">
>      <lb n="1"/>
>      <w xml:id="w4" lemmaRef="tla:38530"><g ref="#ajin">ꜥ</g>n<g ref="#hArc">ḫ</g></w>
>    </s>
>  </ab>
> </div>
> <div type="translation" subtype="sentences" xml:lang="de">
>  <ab>
>    <s corresp="#s2">&lt;vor dem Gott&gt; ... König ewiglich und ewiglich;</s>
>  </ab>
> </div>
> <div type="translation" subtype="words" xml:lang="de">
>  <ab>
>    <w corresp="#w1">König</w>
>    <w corresp="#w2">ewig, ewiglich</w>
>  </ab>
> </div>
> 
> Suggestions for improvement are welcome.
> 
> Best,
> Frederik
> 
> 
> 
>> Am 29.08.2014 um 13:29 schrieb Ben Brumfield:
>> I'm looking for a TEI example of a parallel text in which the original text was used to produce a TEI edition
as well as a modern translation of the text, with both the edition and the translation contained within the
same TEI file.  I'd like both texts to be linked, at least minimally, so that a page of translation can point
to the page of the original-language edition from which it was derived.
>> 
>> Background: Last October I drafted a TEI export feature for FromThePage, the wiki-like digital edition
tool I maintain.  In the last few days I've encountered a project which wants to use the tool to produce a
digital edition, but also needs a way to add an English translation as part of the edition.  That will
involve enhancing the software to add a parallel text feature, which I'm pretty excited about.  Of course
I'd like to capture that parallel translation in the programmatically-generated TEI-XML file
FromThePage exports, even if just as an archival format option.
>> 
>> Due to the nature of the tool and the limited scope of the enhancements we're considering, I won't be able
to support interlinear translation, but will be able to link translation and original at a page-by-page
and possibly paragraph-by-paragraph level.
>> 
>> Any examples of this sort of thing would be much appreciated.
>> 
>> Ben Brumfield
>> http://manuscripttranscription.blogspot.com/
> 
> -- 
> Frederik Elwert M.A.
> 
> Research Associate
> Project Coordinator, SeNeReKo
> Centrum für Religionswissenschaftliche Studien
> Ruhr-Universität Bochum
> 
> Universitätsstr. 150
> D-44780 Bochum
> 
> Room FNO 01/180
> Tel. +49-(0)234 - 32 24794
> 
> ------------------------------
> 
> Date:    Fri, 29 Aug 2014 13:47:09 +0000
> From:    Laura Mandell <mandell <at> TAMU.EDU>
> Subject: Re: Solutions for dealing with a series of large 18th and 19th century book catalogues
> 
> Dear Kevin and List:
> 
> I just wanted to mention the Mellon-funded “Early Modern OCR” project, or eMOP:
http://emop.tamu.edu. We have been working on OCRing EEBO documents and have developed blackletter
training sets for Tesseract. We have worked with some computer scientists and with SEASR to automate
correction of the OCR, and to find problems with page images. We now have a computer scientist who will be
helping us improve the inner workings of Tesseract so that we don’t have to depend upon the text all being
on a uniform line. We are working very hard to improve the OCR output of early modern texts and will have
progress to report sometime in the Fall.  Stay tuned at http://emop.tamu.edu  
> 
> Thanks!
> Best, Laura Mandell
> 
>> On Aug 28, 2014, at 8:00 PM, Kevin Hawkins <kevin.s.hawkins <at> ULTRASLAVONIC.INFO> wrote:
>> 
>> Hello Siobhán,
>> 
>> If your OCR output for these documents isn't very accurate and you are already using good OCR software, I
don't think there's much a grant can do to improve the accuracy.  OCR is a longstanding research area, and I
don't think that your particular documents will lead to any particular advances in that area, or that a
grant to a team that isn't already engaged in this research will lead to improvements in your project.
>> 
>> Instead, I would focus on setting up an efficient workflow for either correcting the OCR or, more likely,
working with a vendor that transcribes the information you need from the original documents.
>> 
>> It sounds like you are interested in getting certain data out of the documents in a structured way
without, say, reference to which page of which catalog a certain piece of data came from.  (The
alternative, which most of this list generally assumes as a mode of operation, is to create a faithful
digital representation of the catalogs as primary sources themselves, even the inconsistent layout
used in different catalogs.)  So you'll just need help structuring the database to capture whatever
information you'd like and then selecting a vendor that can fill it out according to your specifications,
which should include a required accuracy rate (you'll need to do some spot-checking of the work) and
instructions on how to indicate that something is unclear.
>> 
>> People here on this list have experience writing specs for vendors that are creating faithful digital
representations (see, for example, the documents in the appendix of TEI Tite:
http://www.tei-c.org/release/doc/tei-p5-exemplars/html/tei_tite.doc.html#acknowledgments
), but you might be better off getting advice from someone who has had a vendor transcribe data into a
database.  The project that comes to mind is digitization of Irish censuses for the National Archives of
Ireland.  The vendor, in this case, was Library and Archives Canada.  See more at
http://www.census.nationalarchives.ie/about/index.html .
>> 
>> Hope that helps.
>> 
>> Kevin
>> 
>>> On 8/27/14 2:37 PM, Siobhan McElduff wrote:
>>> I was told that this list might have some suggestion for dealing with a
>>> digitization/data scraping issue I have. I am working with a large (some
>>> 1000 pages) set of book sales catalogues from an 18th and 19th century
>>> bookshop, the Temple of the Muses. (More information on the shop can be
>>> found here: http://www.templeofthemuses.org/. A copy of one of the
>>> catalogues can be found there; or if anyone is interested I can send you
>>> a portion. The layout is not entirely consistent across the catalogues,
>>> but it generally is in each particular catalogue and the information
>>> supplied is very simple in many ways: entry number, title, author, date
>>> of publication and price.
>>> 
>>> I would like to get the most amount of information possible off these
>>> catalogues, then clean the data up and insert it into a database (a
>>> sample database has been built and will be on line soon with limited
>>> functionality and a very limited hand-keyed dataset). The most important
>>> information to get off are the prices. Of course, with OCR I can get
>>> some data off the catalogues, but not enough and I am wondering what the
>>> solution is: improve the % of OCR and concentrate on that or find some
>>> other post-OCR solution or a combination of the two. I'm currently
>>> writing a grant for the project and this is where I am especially
>>> flailing. All suggestions would be welcome.
>>> 
>>> Siobhán
> 
> ------------------------------
> 
> Date:    Fri, 29 Aug 2014 14:04:02 +0000
> From:    "Kalvesmaki, Joel" <KalvesmakiJ <at> DOAKS.ORG>
> Subject: Re: Examples of edition + parallel translation?
> 
> Hi Ben,
> 
> I maintain a diagnostic list of alignment formats in a publicly available
> Google spreadsheet:
> https://docs.google.com/spreadsheets/d/14Ztd2wOdg7PZKAxeFRhSdi9MV0yFkFCHOcPeyhzA_Gc/edit?usp=sharing
> 
> The list includes both TEI and non-TEI approaches to the task of
> alignment, both segment-to-segment and token-to-token. Comments on the
> spreadsheet are welcome.
> 
> Best wishes,
> 
> jk
> --
> Joel Kalvesmaki
> Editor in Byzantine Studies
> Dumbarton Oaks
> 202 339 6435
> 
> 
>> On 8/29/14, 7:29 AM, "Ben Brumfield" <benwbrum <at> GMAIL.COM> wrote:
>> 
>> I'm looking for a TEI example of a parallel text in which the original
>> text was used to produce a TEI edition as well as a modern translation of
>> the text, with both the edition and the translation contained within the
>> same TEI file.  I'd like both texts to be linked, at least minimally, so
>> that a page of translation can point to the page of the original-language
>> edition from which it was derived.
>> 
>> Background: Last October I drafted a TEI export feature for FromThePage,
>> the wiki-like digital edition tool I maintain.  In the last few days I've
>> encountered a project which wants to use the tool to produce a digital
>> edition, but also needs a way to add an English translation as part of
>> the edition.  That will involve enhancing the software to add a parallel
>> text feature, which I'm pretty excited about.  Of course I'd like to
>> capture that parallel translation in the programmatically-generated
>> TEI-XML file FromThePage exports, even if just as an archival format
>> option.
>> 
>> Due to the nature of the tool and the limited scope of the enhancements
>> we're considering, I won't be able to support interlinear translation,
>> but will be able to link translation and original at a page-by-page and
>> possibly paragraph-by-paragraph level.
>> 
>> Any examples of this sort of thing would be much appreciated.
>> 
>> Ben Brumfield
>> http://manuscripttranscription.blogspot.com/
> 
> ------------------------------
> 
> Date:    Sat, 30 Aug 2014 06:16:37 +1000
> From:    Nick Thieberger <thien <at> UNIMELB.EDU.AU>
> Subject: Re: Examples of edition + parallel translation?
> 
> Taking a slightly different approach, I've developed a system for presenting
> interlinear glossed text, which is a kind of parallel text, here:
> http://www.eopas.org/transcripts/55. It is also linked to time-aligned
> media. It takes the output of field-linguistic analysis and presents it
> online, aiming to allow cross-corpus comparison of primary recordings.
> 
> The underlying format can be XML as follows.
> 
> All the best,
> 
> Nick Thieberger
> 
> 
> <?xml version='1.0' encoding='utf-8' ?>
> <eopas xmlns:dc='http://purl.org/dc/elements/1.1/'>
>   <header>
>     <meta name='dc:type' value='text/xml' />
>     <meta name='dc:source' value='toukelauMov' />
>     <meta name='dc:creator' value='Nick Thieberger' />
>     <meta name='dc:language' value='erk' />
>     <meta name='dc:date' value='2000-03-31 00:00:00 UTC' />
>   </header>
>   <interlinear>
>     <phrase endTime='5.485' id='o_102-001' startTime='2.795'>
>       <transcription>Malen amurin na katur rowat, </transcription>
>       <wordlist>
>         <word>
>           <text>Malen</text>
>           <morphemelist>
>             <morpheme>
>               <text kind='morpheme'>malnen</text>
>               <text kind='gloss'>as</text>
>             </morpheme>
>           </morphemelist>
>         </word>
>         <word>
>           <text>amurin</text>
>           <morphemelist>
>             <morpheme>
>               <text kind='morpheme'>a=</text>
>               <text kind='gloss'>1S.RS=</text>
>             </morpheme>
>             <morpheme>
>               <text kind='morpheme'>mur</text>
>               <text kind='gloss'>want</text>
>             </morpheme>
>             <morpheme>
>               <text kind='morpheme'>-i</text>
>               <text kind='gloss'>-TS</text>
>             </morpheme>
>             <morpheme>
>               <text kind='morpheme'>-n</text>
>               <text kind='gloss'>-3S.O</text>
>             </morpheme>
>           </morphemelist>
>         </word>
>         <word>
>           <text>na</text>
>           <morphemelist>
>             <morpheme>
>               <text kind='morpheme'>na</text>
>               <text kind='gloss'>COMP</text>
>             </morpheme>
>           </morphemelist>
>         </word>
>         <word>
>           <text>katur</text>
>           <morphemelist>
>             <morpheme>
>               <text kind='morpheme'>ka=</text>
>               <text kind='gloss'>1S.IRS=</text>
>             </morpheme>
>             <morpheme>
>               <text kind='morpheme'>tur</text>
>               <text kind='gloss'>sew</text>
>             </morpheme>
>           </morphemelist>
>         </word>
>         <word>
>           <text>rowat,</text>
>           <morphemelist>
>             <morpheme>
>               <text kind='morpheme'>rowat</text>
>               <text kind='gloss'>sago_palm</text>
>             </morpheme>
>           </morphemelist>
>         </word>
>       </wordlist>
>       <translation>When I want to sew thatch,</translation>
>     </phrase>
>     <phrase endTime='12.347' id='o_102-002' startTime='5.485'>
>       <transcription>go apo pan slat rowat, kafan slat rowat. </transcription>
>       <wordlist>
>         [....]
> 
> *************************
> Nick Thieberger MA, PhD
> ARC QEII Fellow
> School of Languages and Linguistics
> The University of Melbourne
> Parkville, VIC 3010, Australia
> +61 3 8344 8952
> http://languages-linguistics.unimelb.edu.au/thieberger/
> http://languages-linguistics.unimelb.edu.au/current-projects/great-things
> 
> The Oxford Handbook of Linguistic Fieldwork
> Edited by Nicholas Thieberger
> http://ukcatalogue.oup.com/product/9780199571888.do
> 
> Director, Pacific and Regional Archive for Digital Sources in Endangered
> Cultures (PARADISEC) http://paradisec.org.au
> Co-Director, Resource Network for Linguistic Diversity http://www.rnld.org
> Editor, Language Documentation & Conservation Journal
> http://www.nflrc.hawaii.edu/ldc/
> Secretary, Australian Linguistic Society
> Secretary, Australasian Association for Digital Humanities
> CI in the Centre of Excellence in the Dynamics of Language
> http://orcid.org/0000-0001-8797-1018
> 
> 
> 
>> On 30 August 2014 00:04, Kalvesmaki, Joel <KalvesmakiJ <at> doaks.org> wrote:
>> 
>> Hi Ben,
>> 
>> I maintain a diagnostic list of alignment formats in a publicly available
>> Google spreadsheet:
>> https://docs.google.com/spreadsheets/d/14Ztd2wOdg7PZKAxeFRhSdi9MV0yFkFCHOcPeyhzA_Gc/edit?usp=sharing
>> 
>> The list includes both TEI and non-TEI approaches to the task of
>> alignment, both segment-to-segment and token-to-token. Comments on the
>> spreadsheet are welcome.
>> 
>> Best wishes,
>> 
>> jk
>> --
>> Joel Kalvesmaki
>> Editor in Byzantine Studies
>> Dumbarton Oaks
>> 202 339 6435
>> 
>> 
>>>> On 8/29/14, 7:29 AM, "Ben Brumfield" <benwbrum <at> GMAIL.COM> wrote:
>>> 
>>> I'm looking for a TEI example of a parallel text in which the original
>>> text was used to produce a TEI edition as well as a modern translation of
>>> the text, with both the edition and the translation contained within the
>>> same TEI file.  I'd like both texts to be linked, at least minimally, so
>>> that a page of translation can point to the page of the original-language
>>> edition from which it was derived.
>>> 
>>> Background: Last October I drafted a TEI export feature for FromThePage,
>>> the wiki-like digital edition tool I maintain.  In the last few days I've
>>> encountered a project which wants to use the tool to produce a digital
>>> edition, but also needs a way to add an English translation as part of
>>> the edition.  That will involve enhancing the software to add a parallel
>>> text feature, which I'm pretty excited about.  Of course I'd like to
>>> capture that parallel translation in the programmatically-generated
>>> TEI-XML file FromThePage exports, even if just as an archival format
>>> option.
>>> 
>>> Due to the nature of the tool and the limited scope of the enhancements
>>> we're considering, I won't be able to support interlinear translation,
>>> but will be able to link translation and original at a page-by-page and
>>> possibly paragraph-by-paragraph level.
>>> 
>>> Any examples of this sort of thing would be much appreciated.
>>> 
>>> Ben Brumfield
>>> http://manuscripttranscription.blogspot.com/
> 
> ------------------------------
> 
> End of TEI-L Digest - 28 Aug 2014 to 29 Aug 2014 (#2014-188)
> ************************************************************

Ben Brumfield | 29 Aug 13:29 2014

Examples of edition + parallel translation?

I'm looking for a TEI example of a parallel text in which the original text was used to produce a TEI edition as
well as a modern translation of the text, with both the edition and the translation contained within the
same TEI file.  I'd like both texts to be linked, at least minimally, so that a page of translation can point
to the page of the original-language edition from which it was derived.

Background: Last October I drafted a TEI export feature for FromThePage, the wiki-like digital edition
tool I maintain.  In the last few days I've encountered a project which wants to use the tool to produce a
digital edition, but also needs a way to add an English translation as part of the edition.  That will
involve enhancing the software to add a parallel text feature, which I'm pretty excited about.  Of course
I'd like to capture that parallel translation in the programmatically-generated TEI-XML file
FromThePage exports, even if just as an archival format option.

Due to the nature of the tool and the limited scope of the enhancements we're considering, I won't be able to
support interlinear translation, but will be able to link translation and original at a page-by-page and
possibly paragraph-by-paragraph level.
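
To make the page-level case concrete, the kind of linkage I have in mind would look roughly like this
(just a sketch; the element choices are exactly what I'm asking about):

   <div type="transcription" xml:lang="la">
      <pb xml:id="orig-p3" n="3"/>
      <p><!-- transcribed text of page 3 --></p>
   </div>
   <div type="translation" xml:lang="en">
      <pb corresp="#orig-p3" n="3"/>
      <p><!-- English translation of page 3 --></p>
   </div>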

Any examples of this sort of thing would be much appreciated.

Ben Brumfield
http://manuscripttranscription.blogspot.com/

Siobhan McElduff | 27 Aug 21:37 2014

Solutions for dealing with a series of large 18th and 19th century book catalogues

I was told that this list might have some suggestions for dealing with a digitization/data-scraping issue I have. I am working with a large (some 1,000 pages) set of book sales catalogues from an 18th- and 19th-century bookshop, the Temple of the Muses. (More information on the shop can be found here: http://www.templeofthemuses.org/. A copy of one of the catalogues can be found there, or if anyone is interested I can send you a portion.) The layout is not entirely consistent across the catalogues, but it generally is within each particular catalogue, and the information supplied is very simple in many ways: entry number, title, author, date of publication and price.

I would like to extract as much information as possible from these catalogues, then clean the data up and insert it into a database (a sample database has been built and will be online soon with limited functionality and a very limited hand-keyed dataset). The most important information to extract is the prices. Of course, with OCR I can get some data off the catalogues, but not enough, and I am wondering what the solution is: improve the OCR accuracy and concentrate on that, find some other post-OCR solution, or a combination of the two. I'm currently writing a grant for the project and this is where I am especially flailing. All suggestions would be welcome.

Siobhán

Martin Mueller | 27 Aug 18:30 2014

Is there an XML based collaborative curation environment?

I have sent this post simultaneously to the eXist, baseX, and TEI lists in
the hope that I will get some useful advice  on how to cobble together an
XML-based collaborative curation platform. If there is something "out
there" that meets some of my requirements I would love to hear about it.

Phil Burns, Craig Berry, and I have been engaged in an informal project
that I call Shakespeare His Contemporaries (SHC). The source texts for SHC
are the TCP transcriptions, transformed into TEI P5 with Abbot and
linguistically annotated with MorphAdorner. The goal is to produce "good
enough" texts (in original and standardized spellings) through collaborative
curation, enlisting the help of educated amateurs as well as professional
scholars. A group of Northwestern undergraduates on summer internships
fixed about 70% of some 45,000 known errors in almost 500 Early Modern
non-Shakespearean plays and demonstrated that you don't need to have a
Ph.D. to make valuable contributions to scholarship.

My curation framework has been a mixture of XML and SQL. MorphAdorner can
spit out its data as a TEI file or in a tabular
format (http://morphadorner.northwestern.edu). The latter is easily turned
into a MySQL database.  AnnoLex, our curation tool
(http://annolex.at.northwestern.edu), was built by Craig Berry. It is a
Django web site that talks to an underlying MySQL database and lets
registered users make emendations that are kept in a separate table for
subsequent review and integration into the XML files. MorphAdorner
includes scripts for updating XML files in the light of approved
emendations. It also keeps track of changes made.

This system works well in environments where curators operate in a
controlled, task-oriented way, cycling through textual phenomena
selected by criteria likely to return incomplete or incorrect text.
It works well because most of the errors in the TCP transcriptions are of
a very simple kind that lends itself to an 'atomic', word-by-word
treatment. The combination of Django and a relational database makes it
very easy to keep track of who did what where and when, and who approved
or rejected an emendation. These are non-trivial virtues, and there are
many TCP errors that can be fixed by this method before you run up against
its limits.

On the other hand, this method does not support what I call "curation en
passant," readers reading "with a pencil" or its digital equivalent,
stopping here or there to question a word or passage and offering a
diagnosis or emendation as a marginal gloss. I would like a
curation environment that looks like a page and supports reading of the
common garden variety, but that, on a click on a word, shows a pop-up window
listing the properties of a token and offering a template for structured
and free-form annotation.

The current model also fails when it comes to making changes in the XML
structure. In terms of the immediate needs of a project that focuses on
EEBO-TCP drama texts, the most common errors involve just the correction
of a tag name: turning <stage> into <l> or the other way round. There are
ways in which these errors could be corrected by extending AnnoLex, but it
would be clumsy, and it would not support more substantial changes in the
XML structure. So you want a curation environment that lets you manipulate
the XML structure as well as the words inside elements.  For instance,
half the plays in the EEBO-TCP collections are not divided into acts and
scenes because such a division is lacking in the printed source.  Adding
scene divisions (with an appropriate notation that they are supplied by an
editor) is an important step towards making the texts more "machine
actionable," and a proper digital corpus should be both human readable and
machine actionable.
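
A supplied division might be marked along the following lines (a sketch
only; the exact convention would be a project decision):

   <div type="scene" n="2" resp="#ed_MM">
      <!-- scene boundary supplied by the editor; not present in the
           printed source -->
      ...
   </div>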

The best curation environment would be XML based, but I wonder about
scale. As I understand it, 'scale' in an XML application is a function of
the number of documents, their size, and the complexity of encoding. The
SHC corpus has about 500 documents and could grow to 1,000 if one
interpreted "contemporaries" more generously or decided to include
Shakespeare's Restoration heirs. Most plays fall in a range between 15,000
and 25,000 words. In a linguistically annotated corpus the typical leaf
node is a <w> or <pc> element with a set of attributes, such as

<w xml:id="A04632_0500750" lemma="Cleopatra" ana="#npg1">Cleopatraes</w>

The modal XPath of such a leaf node goes like

TEI/text/body/div[@type='act']/div[@type='scene']/sp/l/w

There are about 12 million XPaths in the SHC corpus, and it is important
for readers to be able to move quickly from one to the other within and
across plays so as to support what I call "cascading curation," where the
diagnosis of an error in one play leads to the question whether there are
similar errors elsewhere.

So much for scale, and I don't know whether it is a big problem, a small
problem, or no longer a problem at all. If you want to extend a curation
model from 500 plays to the eventual 70,000 texts in the EEBO-TCP corpus,
there might be a scale problem. On the other hand, if you think of
collaborative curation in a humanities environment and you think of
scholarly data communities that organize themselves around some body of
texts, it may be that something like 1,000 texts or 50 million words
is a good enough upper limit: breaking down the big city of a large corpus
into "corpus villages" (perhaps with some overlap) may be technically and
socially preferable.

I don't have a very clear idea how a permissions regime would work in an
XML environment. It is clear to me that a change of any kind should take
the form of an annotation and that the integration of such an annotation
into the source text would be the work of an editor with special
privileges. There needs to be a clear trail of the 'who, what, when, and
where' of any textual change, and as much as possible of this detail should
be logged automatically. The scholarly communities I'm familiar with are
unlikely to accept texts as curated unless there are familiar seals of
good housekeeping that they recognize as the equivalent of the assurances
that come with a reputable printed text.
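
One conceivable shape for such a trail, sketched in the TEI's own
revisionDesc idiom (identifiers, dates, and status values are
illustrative):

   <revisionDesc>
      <change who="#curator_23" when="2014-07-02" status="proposed">Emended
         "Cleopatrxes" to "Cleopatraes" in the w element with xml:id
         A04632_0500750.</change>
      <change who="#editor_1" when="2014-07-05" status="approved">Reviewed
         and integrated the emendation above.</change>
   </revisionDesc>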

I could go on in considerable detail about the needs of our project (and
similar ones), but I hope I have outlined the requirements in
sufficient detail.

Thanks in advance for any advice.

Martin Mueller
Professor emeritus of English and Classics
Northwestern University

DHQ Call for Reviewers

With apologies for cross-posting:


Digital Humanities Quarterly (http://www.digitalhumanities.org/dhq/), an open-access, peer-reviewed, digital journal covering all aspects of digital media in the humanities, is currently inviting scholars from all disciplines to join our team of regular peer reviewers. Our peer reviewers perform two important tasks: first, they ensure that the materials accepted for publication are of the highest quality, and second, they provide feedback that will guide the authors as they revise their articles. Peer reviewers thus act as the first audience for a submitted article and help the author to gauge whether the argument is clear, interesting, and well-crafted.


Our goal is to cultivate a thriving community and improve the overall quality of writing, not to punish people or drive them away. If you are interested in reviewing for DHQ, please register through Open Journal Systems and include your review interests when you register. Interested researchers and practitioners from all areas of DH are encouraged to sign up!


INSTRUCTIONS FOR REGISTERING AS A REVIEWER:


(1) Go to: http://openjournals.neu.edu/ojs/dhq/user/register


(2) Register your account by completing the required fields.


(3) Please be sure when you register to check the "Reviewer" box at the end of the form and include your areas of interest for future reviews.


(4) When you click "Register" at the end of the form, you will be taken back to the home page.



For questions, please contact dhq <at> neu.edu


Elizabeth Hopwood

Managing Editor, DHQ

Northeastern University

el.hopwood <at> neu.edu

Kun JIN | 27 Aug 15:40 2014

Question about transcription of speech with using <shift/>

Hello Everyone,

I don't understand why the example from 8.3.1 Utterances uses two empty (self-closing) <shift/> elements, as below:

<u who="#a">Listen to this <shift new="reading"/>The government is
confident, he said, that the current economic problems will be
completely overcome by June<shift/> what nonsense</u>

instead of an opening and closing <shift> ... </shift> pair:

<u who="#a">Listen to this <shift new="reading">The government is
confident, he said, that the current economic problems will be
completely overcome by June</shift> what nonsense</u>

Maybe this is a stupid question, but I really want to know. Could someone explain?

Thanks in advance.

Best Regards

Kun
Laurent Romary | 27 Aug 12:12 2014

TEI used for linguistic annotation

Dear all,
I am currently writing a little note on the TEI and linguistic annotation, and I cannot find a good
resource listing projects that actually use the TEI mechanisms (<w>, <span>, <fs>, etc.) for
linguistic purposes. Could some of you point to your best examples?
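
For concreteness, the kind of markup I mean looks roughly like this (a
minimal sketch; the values are purely illustrative):

   <s>
      <w xml:id="w1" lemma="run" ana="#fs1">runs</w>
   </s>
   <fs xml:id="fs1">
      <f name="pos"><symbol value="verb"/></f>
      <f name="tense"><symbol value="present"/></f>
   </fs>
   <span from="#w1">finite verb of the main clause</span>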
Thanks in advance,
Laurent

Magdalena Turska | 26 Aug 12:01 2014

Digital Scholarly Editions Infrastructure Survey

Technology, Standards, Software 
Digital Scholarly Editions Survey 

Dear all,

The requirements study for a publication architecture targeting multiple media is one of my research priorities for the DiXiT Network.

As part of this study, I have prepared a survey that will hopefully help to assess the software and technologies used for creating and publishing digital scholarly editions. The survey is available at https://www.surveymonkey.com/s/publishing_digital_editions.

We are inviting scholars, young researchers, teachers and students involved in any part of the process of creating and publishing editions to participate in this study. If you are in doubt whether your project is an edition or an archive, please complete the survey anyway. If you feel you lack the technical expertise to answer all the questions, answer as best you can and consider asking someone else on your team to complete the survey as well.

We are primarily interested in the tools and workflows associated with the creation and publishing of digital scholarly resources, and especially in descriptions of the bespoke tools and pipelines employed in your project, so we'd appreciate your answering the open questions as fully as possible. The results of the study will be used to help define requirements and develop tools for digital scholarly editions in particular and digital humanities in general.

Your responses to the survey are strictly anonymous and your participation is entirely voluntary.
It should take approximately 20 minutes to complete.

If you have any questions or require more information about this study, please contact me using the following details:

Magdalena Turska – University of Oxford
Researcher for Digital Scholarly Editions
magdalena.turska <at> it.ox.ac.uk

This study is funded by the European Commission through the DiXiT Marie Curie Actions research programme.

Kind regards,

Magdalena Turska  
Researcher for Digital Scholarly Editions
University of Oxford



