Christian Schwaderer | 22 Jul 09:02 2014

Which TEI element for text paraphrasis/re-use?

Hi all,

I'm planning to edit a text whose author rewrote older texts in his
own words (a process known as 'réécriture'). I want to tag this
paraphrasis/re-use in the XML document, and I'm wondering whether I
should use my own tags (within my own namespace) or a TEI element for
this purpose.

For each source used within each paragraph, I need three pieces of
information to be encoded:
- which source text was used? (Being an ID)
- how much of the source text was used? (Being an integer between 0 and 100)
- how much of the source text was omitted? (Being an integer between 0  
and 100)

So, the first solution (my own tags) would look like this:

<tei:div>
  <tei:p>(A single text paragraph of about two sentences)</tei:p>
  <own_namespace_prefix:dependenceGrp>
    <own_namespace_prefix:dependence source="text1" range="70" omitted="30"/>
    <own_namespace_prefix:dependence source="text2" range="30"/>
  </own_namespace_prefix:dependenceGrp>
</tei:div>

But if there is a TEI element capable of meeting these needs, I would prefer that.
However, I don't know which TEI element to use. Can you recommend one?
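For illustration, here is a minimal Python sketch of how a processing script might read the three values back out of the own-namespace proposal above, using the standard-library `xml.etree.ElementTree`. The example as posted leaves its prefixes undeclared, so the sketch binds `tei` to the standard TEI namespace and a hypothetical `dep` prefix to a placeholder URI (`http://example.org/ns/dependence`); the range check only restates the 0 to 100 constraint from the list above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DEP_NS = "http://example.org/ns/dependence"  # hypothetical placeholder URI

# The example from the message, with its prefixes declared.
SAMPLE = f"""<tei:div xmlns:tei="{TEI_NS}" xmlns:dep="{DEP_NS}">
  <tei:p>(A single text paragraph of about two sentences)</tei:p>
  <dep:dependenceGrp>
    <dep:dependence source="text1" range="70" omitted="30"/>
    <dep:dependence source="text2" range="30"/>
  </dep:dependenceGrp>
</tei:div>"""


def percent(value, name):
    """Parse an optional attribute that must be an integer from 0 to 100."""
    if value is None:
        return None
    number = int(value)
    if not 0 <= number <= 100:
        raise ValueError(f"{name}={number} is outside 0..100")
    return number


def read_dependences(xml_text):
    """Return (source, range, omitted) for each dependence, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (
            dep.get("source"),
            percent(dep.get("range"), "range"),
            percent(dep.get("omitted"), "omitted"),
        )
        for dep in root.iter(f"{{{DEP_NS}}}dependence")
    ]


if __name__ == "__main__":
    for row in read_dependences(SAMPLE):
        print(row)
```

An absent `omitted` attribute (as on `text2`) comes back as `None` rather than 0, so a later step can still distinguish "not recorded" from "nothing omitted".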

Andreas Trianta | 19 Jul 18:08 2014

Guidelines in TEI format -- is there a single file I could download?

I'm sure it is right in front of my eyes, but they fail me.

Thanks

Susan Schreibman | 18 Jul 11:05 2014

Announcing the publication of 'Digital Critical Editions'

The TEI community will no doubt be interested in this newly published volume in which the TEI figures substantially.

Topics in the Digital Humanities, University of Illinois Press, is delighted to announce the publication of 'Digital Critical Editions'.

This volume, edited by Daniel Apollon, Claire Bélisle, and Philippe Régnier, consists of nine chapters that explore the interweaving of traditional and digital textual scholarship.

Digital Critical Editions examines how transitioning from print to a digital milieu deeply affects how scholars deal with the work of editing critical texts. It explores how changing technology and evolving reader expectations lead to the development of specific editorial products, while threatening traditional forms of knowledge and methods of textual scholarship.

Digital Critical Editions provides digital editors, researchers, readers, and technological actors with insights for addressing disruptions that arise from the clash of traditional and digital cultures, while also offering a practical roadmap for processing traditional texts and collections with today's state-of-the-art editing and research techniques, thus addressing readers' newly emerging reading habits.

For information on how to order the book, see http://www.press.uillinois.edu/books/catalog/92mby4hz9780252038402.html

--
Susan Schreibman
Professor of Digital Humanities
Director of An Foras Feasa
Iontas Building
National University of Ireland Maynooth
Maynooth, Co. Kildare
email: susan.schreibman <at> nuim.ie
phone: +353 1 708 3451
fax: +353 1 708 4797

Charles Muller | 18 Jul 04:55 2014

Fwd: [tlug] High-quality pan-CJK typeface from Google+Adobe

The following may be of interest to some TEI-ers. -- Chuck

-------- Original Message --------
Subject: [tlug] High-quality pan-CJK typeface from Google+Adobe
Date: Wed, 16 Jul 2014 08:50:13 +0900
From: Travis Cardwell <travis.cardwell <at> extellisys.com>
Reply-To: Tokyo Linux Users Group <tlug <at> tlug.jp>
To: Tokyo Linux Users Group <tlug <at> tlug.jp>

I am quite excited about a new font!  It is released under the Apache 2.0
license, has unified coverage of Japanese, Korean, Simplified Chinese, and
Traditional Chinese characters, and it has seven weights!

You can read more about it here:
* http://googledevelopers.blogspot.jp/2014/07/noto-cjk-font-that-is-complete.html
* http://blog.typekit.com/2014/07/15/introducing-source-han-sans/

You can download the full font here:
* http://www.google.com/get/noto/

Cheers,

Travis


Piotr Bański | 16 Jul 01:20 2014

Re: Using <fsdLink> to use external document(s) with <fsdDecl>

Dear Jack,

On 15/07/14 23:50, Jack Bowers wrote:
> Piotr,
> 
> Thanks for your help!
> 
> As far as FSD/ISO vs. FSR/ISO goes, I prefer FSD, as that structure seems
> to make a bit more sense to me.

FSD is the schema language for FSR (feature structures), and as such it
implies the use of FSR. You can use the latter without the former, but
not the other way round, I would say. You might want to include some of
your TEI here, to illustrate more precisely what you are after.

> I would be interested in seeing an example of how others have done this
> before; I haven't found anything in my previous attempts, though.

The example I can provide has been deemed non-kosher, but you can at
least see what can be done with FSRs when you compare the following:

1.
https://sourceforge.net/p/freedict/code/HEAD/tree/trunk/eng-pol/eng-pol.tei

(the trunk of an English-Polish dictionary; it XIncludes the individual
letter documents)

2.
https://sourceforge.net/p/freedict/code/HEAD/tree/trunk/shared/FreeDict_ontology.xml

the interface from the grammatical descriptions of Freedict dictionaries
to various existing ontologies (since then, GOLD has been anchored in
ISOcat, so the GOLD references are spurious)

3.
https://sourceforge.net/p/freedict/code/HEAD/tree/trunk/shared/freedict-P5.xml

Freedict ODD (potentially outdated) with some Schematron that links bits
of dictionary XML to the ontology interface.

I'm sorry about there being no external documentation for that.
Sourceforge has wiped the old Freedict wiki with years of edits inside
it (I think it got backed up and archived, but still: it's not out there).

> I checked out the FSD validator on GitHub, and even though it says it was
> last updated fairly recently (Oct 2013), it is based on TEI P3,
> and thus I assume it would be necessary to make a bunch of changes
> (which I'm currently not capable of doing).

Last year, it was made available by its author (Gary Simons) under an
open-source license, at my request -- I have uploaded it to github and
announced its availability during the last TEI Meeting. I had plans for
more but our intern decided to thwart them :-)

> This all makes me wonder why the TEI includes such features as
> <fsdDecl> and <fsdLink>, which seem to be a nice portable method for
> linguistic annotation, if there is no sufficient mechanism to actually
> make them work. (I should state that this isn't meant to insult anyone.)

The TEI is not an application. Many applications exist that make use of
the descriptions that the TEI makes available. In this case,
unfortunately, such tools are rather scarce and simple, and one usually
ends up weaving their own.
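As an illustration of the kind of home-grown checking described here, the sketch below does in Python roughly what a custom Schematron assertion might do. This is a swapped-in technique, not the actual Freedict rules: it reports every `<f name>` inside an `<fs type>` whose name is not declared by an `<fDecl>` within the `<fsDecl>` of the same type. It only reads an inline `<fsdDecl>`, does not follow `<fsdLink>`, and does not check values against `<vRange>`; the sample document is a trimmed, non-schema-valid fragment.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Trimmed, non-schema-valid fragment: one declared feature, two used.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><encodingDesc><fsdDecl>
    <fsDecl type="LexicalFeatures">
      <fDecl name="pos"><vRange><vAlt>
        <symbol value="noun"/><symbol value="verb"/>
      </vAlt></vRange></fDecl>
    </fsDecl>
  </fsdDecl></encodingDesc></teiHeader>
  <text><body>
    <fs type="LexicalFeatures">
      <f name="pos"><symbol value="noun"/></f>
      <f name="gender"><symbol value="fem"/></f>
    </fs>
  </body></text>
</TEI>"""


def declared_features(root):
    """Map each fsDecl/@type to the set of fDecl/@name values under it."""
    declared = {}
    for fs_decl in root.iter(TEI + "fsDecl"):
        names = {f.get("name") for f in fs_decl.iter(TEI + "fDecl")}
        declared.setdefault(fs_decl.get("type"), set()).update(names)
    return declared


def undeclared_features(root):
    """List (fs type, feature name) pairs used in <fs> but never declared."""
    declared = declared_features(root)
    problems = []
    for fs in root.iter(TEI + "fs"):
        allowed = declared.get(fs.get("type"), set())
        for f in fs.findall(TEI + "f"):  # direct <f> children only
            if f.get("name") not in allowed:
                problems.append((fs.get("type"), f.get("name")))
    return problems


if __name__ == "__main__":
    print(undeclared_features(ET.fromstring(SAMPLE)))
```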

> What about using XInclude to specify/include just the <fsdDecl>
> portion of the features in the separate doc(s)?

Once again, we don't know what you mean by this, especially when you say
you use FSD but not FSR. XInclude is used all over the place within the
TEI itself and within systems that are based on the TEI, and it might be
handy for you as well.
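To see what including just the `<fsdDecl>` portion via XInclude would produce, Python's standard-library `xml.etree.ElementInclude` can expand `xi:include` elements. The file names `dictionary.xml` and `features.xml` are hypothetical, and both documents are held in memory through a custom loader only so the sketch is self-contained; with real files, omitting `loader=` makes ElementInclude's default loader open each `href` as a local file. Whether a particular editor or validator expands XInclude before validating is a separate question this sketch does not answer.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical documents, held in memory so the sketch is self-contained.
DOCUMENTS = {
    "dictionary.xml": """<encodingDesc xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="features.xml"/>
</encodingDesc>""",
    "features.xml": """<fsdDecl xmlns="http://www.tei-c.org/ns/1.0">
  <fsDecl type="LexicalFeatures">
    <fDecl name="pos"><vRange><symbol value="noun"/></vRange></fDecl>
  </fsDecl>
</fsdDecl>""",
}


def memory_loader(href, parse, encoding=None):
    """Loader for ElementInclude: parsed element for parse="xml", else text."""
    text = DOCUMENTS[href]
    return ET.fromstring(text) if parse == "xml" else text


def expand(name):
    """Parse one document and replace its xi:include elements in place."""
    root = ET.fromstring(DOCUMENTS[name])
    ElementInclude.include(root, loader=memory_loader)
    return root


if __name__ == "__main__":
    print(ET.tostring(expand("dictionary.xml"), encoding="unicode"))
```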

Hope this helps, good luck with your project,

  Piotr

> 
> On Tue, Jul 15, 2014 at 12:39 PM, Piotr Bański <bansp <at> o2.pl
> <mailto:bansp <at> o2.pl>> wrote:
> 
>     Dear Jack,
> 
>     You're doing nothing wrong, and the system isn't malfunctioning -- it's
>     just that "the TEI" understood as some derived customized schema (as
>     opposed to an abstract model) doesn't support feature structures out of
>     the box, unless I have missed some recent dramatic development.
> 
>     When I wanted feature structure (TEI/ISO FSR) support to work in a
>     dictionary system, I had to write my own Schematron assertions, which
>     wasn't all that tedious; I can look for them on the Web if you need an
>     example. Now, if you want fsdDecl to work (TEI/ISO FSD, as opposed to
>     TEI/ISO FSR), it's a bit worse: you need an external validator, and
>     except for one attempt that is up for grabs at GitHub [1], I know of no
>     such tool (though I'd be happy to learn about them, preferably from the
>     wiki [2] ;-) ).
> 
>     [1]: https://github.com/bansp/FSD_validator
>     [2]: http://wiki.tei-c.org/index.php/Category:Tools
> 
>     Regards,
> 
>       Piotr

Jack Bowers | 15 Jul 21:05 2014

Using <fsdLink> to use external document(s) with <fsdDecl>

Hi,

I'm trying to use <fsdLink> in a TEI dictionary document to connect to my project's feature structures which are in a separate (valid) document so I can use the feature tags defined therein.

I have read the TEI guidelines discussing feature structures numerous times and I still can't get it to work.

I am including it in the following environment as per the guidelines (assume this is in the context of the valid TEI header, etc.);

...
<encodingDesc>
  <fsdDecl>
    <fsdLink type="LexicalFeatures" target="feature-structures/MIX-Lexical_Feature_Inventory-TEI-fsdDecl.xml"/>
  </fsdDecl>
</encodingDesc>
...

I am using oXygen, and the path to the external document is relative to the current document (i.e., the TEI dictionary document).

Does something need to be declared somewhere above the header or separately in a schema to make this work?

Thanks very much in advance for your help,
Jack Bowers


Sorry if I am missing something basic here; I'm teaching myself at the same time as managing a fairly big project on my own, so my understanding of XML as well as TEI has some gaps ;-)

Fabio Ciotti | 15 Jul 20:29 2014

http://aiucd2014-unibo.it

********************
Third AIUCD Annual Conference - Humanities and Their Methods in the
Digital Ecosystem (AIUCD 2014)
Scuola di Lettere e Beni Culturali - University of Bologna, via
Zamboni 38, Bologna - Italy
September 18-19, 2014
********************
The Third AIUCD (Associazione per l’Informatica Umanistica e la
Cultura Digitale) Annual Conference is devoted to discussing the role
of Digital Humanities in the current research practices of the
traditional humanities disciplines. The program of the conference is
now available at http://aiucd2014.unibo.it.
Hope to meet you in Bologna.

Best
AIUCD Committee

Leigh Bonds | 15 Jul 17:04 2014

CFP Extended - Freedman Center for Digital Scholarship Colloquium

Freedman Center for Digital Scholarship Colloquium

Pedagogy & Practices

November 6-7, 2014

The Freedman Center for Digital Scholarship at Case Western Reserve University’s Kelvin Smith Library welcomes proposals for panels, papers, and presentations that address pedagogical approaches for using digital tools in humanities, science, and social science classrooms.  Submission topics may include (but are not limited to) instructional methodologies and strategies for:

  • introducing undergraduate and graduate students to digital tools and methodologies for research (visualization, data mining, scholarly editing, TEI encoding, mapping, analyzing text, managing data, curating data, building digital exhibits/collections)

  • incorporating digital projects into existing course syllabi

  • advising digital dissertations, theses, or capstone projects

  • training students to work on extracurricular projects

  • collaborating with libraries and/or digital scholarship centers

  • training faculty in digital research, project management, and data curation

Please submit 250-word abstracts and technology requirements to Amanda Koziura (amanda.koziura <at> case.edu) by *July 31, 2014*.  Presenters of accepted panels, papers, and presentations will be notified by August 15, 2014. All presenters will be responsible for their own registration and travel costs.

URL: http://library.case.edu/fccoll


Amanda Koziura
Digital Learning & Scholarship Librarian
Kelvin Smith Library
Case Western Reserve University
amanda.koziura <at> case.edu
216-368-3654




Kathryn_Tomasek | 14 Jul 19:21 2014

Re: TEI-L Digest - 10 Jul 2014 to 13 Jul 2014 (#2014-151)

Thank you all for this discussion.  Much of it supports my summer learning project, which has been an
introduction to XQuery/eXist courtesy of ODH and the libraries at Vanderbilt U.  Their experience comes
out of work on http://syriaca.org .  Cliff Anderson and David Michelson are the main contacts there, and
Wynona Salesky is their programmer.

Best, 

Kathryn Tomasek


> On Jul 14, 2014, at 12:00 AM, TEI-L automatic digest system <LISTSERV <at> LISTSERV.BROWN.EDU> wrote:
> 
> There are 17 messages totaling 1526 lines in this issue.
> 
> Topics of the day:
> 
>  1. Advice? Experience? TEI and SQL (16)
>  2. Mapping the SQL: Was: Re: Advice? Experience? TEI and SQL
> 
> ----------------------------------------------------------------------
> 
> Date:    Sun, 13 Jul 2014 10:29:09 +0300
> From:    Hayim Lapin <hlapin <at> UMD.EDU>
> Subject: Advice? Experience? TEI and SQL
> 
> Dear all,
> 
> My TEI/XML-based project (www.digitalmishnah.umd.edu) is increasingly
> working in collaboration with a European project that works in MySQL.
> Both projects involve transcriptions of manuscripts, automated alignment
> of variants followed by hand correction, and potentially additional data
> (morphological analysis, translation, names/places, etc.). (Currently, on
> the TEI side, addressing by @xml:id takes place at the <ab> level. In the
> next revision of my TEI schema, texts will be encoded with each word in
> a <w> element carrying an xml:id. On the MySQL side, too, words are the
> basic unit of data.)
> 
> Output requirements involve well-formatted text, as well as statistical
> calculations based on textual or orthographic variation.
> 
> At this point, I have no experience at all with MySQL. I wonder if
> people can point me to projects that yoke MySQL and TEI. I'd be
> interested in learning (on or off list) how the project is structured,
> and the costs and benefits of such an arrangement.
> 
> Many thanks,
> 
> HL
> 
> -- 
> Hayim Lapin
> Robert H. Smith Professor of Jewish Studies
> Professor of History
> University of Maryland
> College Park, MD 20742
> 301 405 4296
> www.digitalmishnah.org | dev.digitalmishnah.org
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 20:05:02 +1200
> From:    "Stuart A. Yeates" <syeates <at> GMAIL.COM>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> The classic computer science answer to this question is that SQL lacks
> recursion / nesting of data, whereas document-centric XML thrives on
> it.
> 
> Nested <div/>s are obvious things that are going to be very
> challenging to map. I would avoid, if at all possible, mapping the
> structure of your docs into the SQL world, except as a linear series
> of xml:ids in a stream of Unicode characters. Also make sure you
> configure your databases to use Unicode when you first set
> them up.
> 
> cheers
> stuart
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 11:11:53 +0200
> From:    Doug Reside <dougreside <at> GMAIL.COM>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> Hi Hayim,
> 
> 1) Take the unicode point Stuart makes very seriously.  I feel like
> encoding problems are almost always the biggest difficulty in
> XML->relational database mappings.
> 
> 2) For both technical and ideological reasons, DH projects often don't
> yoke XML+SQL, as far as I know.  You'll often hear Solr referenced for
> XML searching, which is really great for XML documents that tend to be
> a bit more regularly structured than big TEI texts (though it might
> work OK in your case, where the text is already highly structured in
> chapter/verse/etc.).  I've also heard good things about eXist, which
> you might want to look at if you haven't already.
> 
> 3) All that said, since you'll have a basic unit (a "w" tag with an
> id), it should be relatively simple to map all of the SQL records to
> your documents (less so to stick your XML into the SQL).  I wouldn't
> necessarily try to import the XML into SQL unless you have a use case
> that absolutely requires it.  For mostly static data sets (added to
> and edited only by project administrators), I've found
> document-centric models (XML, JSON, etc.) sometimes cause fewer
> headaches and are easier to port around.  So, the SQL points to the
> XML but the XML doesn't necessarily point back to the SQL.
> 
> Happy to discuss more if helpful.
> 
> Doug
> 
> ------------------------------
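Doug's third point, together with Stuart's suggestion to keep only a linear series of xml:ids on the SQL side, can be sketched as follows. This is a minimal Python illustration that uses SQLite as a stand-in for MySQL; the `word` table layout, the ids, and the two-word Hebrew sample are made up for the example. Each `<w>` becomes one row keyed by its `xml:id`, so the SQL points to the XML without the XML pointing back.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up two-word fragment; ids and words are illustrative only.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0" xml:id="ab1">
  <w xml:id="w1">מאימתי</w> <w xml:id="w2">קורין</w>
</ab>"""


def load_words(xml_text, db):
    """Store each <w> as one row: document order, its xml:id, and its text."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS word ("
        " seq INTEGER PRIMARY KEY, xml_id TEXT UNIQUE NOT NULL, form TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [(w.get(XML_ID), "".join(w.itertext())) for w in root.iter(TEI + "w")]
    db.executemany("INSERT INTO word (xml_id, form) VALUES (?, ?)", rows)
    db.commit()


db = sqlite3.connect(":memory:")
load_words(SAMPLE, db)
for row in db.execute("SELECT seq, xml_id, form FROM word ORDER BY seq"):
    print(row)
```

Python's `sqlite3` module passes Python strings to SQLite as Unicode text, which takes care of part of the encoding concern raised above; with MySQL, the connection and table character sets would still need to be configured explicitly (for example to `utf8mb4`).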
> 
> Date:    Sun, 13 Jul 2014 09:46:55 +0000
> From:    Sebastian Rahtz <sebastian.rahtz <at> IT.OX.AC.UK>
> Subject: Re: Advice? Experience? TEI and SQL
> 
>> On 13 Jul 2014, at 10:11, Doug Reside <dougreside <at> GMAIL.COM> wrote:
>> 
>> So, the SQL points to the
>> XML but the XML doesn't necessarily point back to the SQL.
> 
> amen to that. 
> 
> I’d use an SQL database to manage data derived from TEI transcriptions, or
> for ancillary structured data. In general I’d choose a relational database for one of three reasons:
> 
>    *) maturity of system. for playing with 500,000 records, you need performance and stability. 
> (yes, eXist or BaseX may be able to cope, but not as easily in my experience)
>    *) local support. if you need a web site built on top of the DB, and this is what your IT guys know,
>    it would be perverse to ignore it. 
>    *) it really _is_ data, mostly numbers and tokens, and the model fits
> 
> But comparing TEI and SQL is chalk and cheese, they do different things, at
> different points in your work lifecycle.
> 
> Of course, I also heartily +1 Stuart’s point about encoding. The bane of all our lives. Try adding
> Perl into the mix as well, and you’re half-way to Bedlam.
> --
> Sebastian Rahtz      
> Director (Research) of Academic IT
> University of Oxford IT Services
> 13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
> 
> I am nothing.
> I shall never be anything.
> I cannot wish to be anything.
> Apart from that, I have in me all the dreams of the world.
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 12:21:11 +0200
> From:    Doug Reside <dougreside <at> GMAIL.COM>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> I agree with everything Sebastian writes, with one small exception. While
> *xml* and sql may be "chalk and cheese", I do think it might be possible,
> advisable even, to create an SQL schema based on the TEI Guidelines. I
> suspect it could even improve the TEI.  This is, I expect, far out of scope
> for your project, though, Hayim.
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 10:35:01 +0000
> From:    Sebastian Rahtz <sebastian.rahtz <at> IT.OX.AC.UK>
> Subject: Re: Advice? Experience? TEI and SQL
> 
>> On 13 Jul 2014, at 11:21, Doug Reside <dougreside <at> gmail.com> wrote:
>> 
>> While *xml* and sql may be "chalk and cheese", I do think it might be possible, advisable even, to create an
>> sql schema based on the TEI guidelines. I suspect it could even improve the TEI.  
> 
> in the same way as one _could_ represent a complex TEI XML document in JSON. I am not sure 
> what it would gain us… apart from being huge fun :-}
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 14:20:31 +0300
> From:    Hayim Lapin <hlapin <at> UMD.EDU>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> Thanks for your comments!
> 
> The reasons to go with the DB implementation are that it has a developed
> content management system and was created for the kinds of philological
> stuff that we are interested in recording and working with.  In
> addition, part of the project involves taking the output of automated
> alignment (thanks to the developers of CollateX) and hand-correcting
> it, and this, just from what I have seen of my colleague's work, is
> relatively easily implemented in a DB, whereas I'd have to develop an
> app and content control to do this from scratch, from the TEI directly.
> Finally, I have the sense that queries like "find all the examples where
> witA agrees with witB against witC, but not against witD" are the kind
> of thing that relational databases were designed for.
> 
> I am gathering that (in my instance) stable word-level data could be
> stored and queried in SQL, but the structure of the text might continue
> to be represented in TEI. I can envision all sorts of problems, though,
> like damage or corrections, particularly when they are at the character
> level.
> 
> I am also gathering from Stuart's comments and Doug and Sebastian's 
> agreements, at least on that, that one should not use the DB to store a 
> kind of linear offset markup (div2 starts here, w1, w2, pb here, w3, ... 
> damage starts, w159, w160, damage ends, ... w300, div2 ends here etc.).
> 
> Again, many thanks for your advice,
> 
> HL
> 
> ------------------------------
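Hayim's example query maps fairly directly onto SQL self-joins. The sketch below uses Python with an in-memory SQLite database; the `reading` table (one row per aligned position and witness) and its data are hypothetical. It reads "witA agrees with witB against witC, but not against witD" as: A and B share a reading, C differs, and D shares the A/B reading. That interpretation is an assumption; if another sense is intended, only the last condition in the `WHERE` clause needs to change.

```python
import sqlite3

# Hypothetical table: one row per (aligned position, witness) reading.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reading (pos INTEGER, wit TEXT, form TEXT)")
db.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [
        (1, "A", "x"), (1, "B", "x"), (1, "C", "y"), (1, "D", "x"),  # D with A/B
        (2, "A", "x"), (2, "B", "x"), (2, "C", "y"), (2, "D", "y"),  # D with C
        (3, "A", "x"), (3, "B", "z"), (3, "C", "y"), (3, "D", "x"),  # A, B differ
    ],
)

# Assumed reading of the query: A = B, C differs from A, and D agrees with A.
QUERY = """
SELECT a.pos
FROM reading AS a
JOIN reading AS b ON b.pos = a.pos AND b.wit = 'B'
JOIN reading AS c ON c.pos = a.pos AND c.wit = 'C'
JOIN reading AS d ON d.pos = a.pos AND d.wit = 'D'
WHERE a.wit = 'A'
  AND a.form = b.form
  AND a.form <> c.form
  AND a.form = d.form
ORDER BY a.pos
"""

print([pos for (pos,) in db.execute(QUERY)])
```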
> 
> Date:    Sun, 13 Jul 2014 08:06:35 -0400
> From:    "Birnbaum, David J" <djbpitt <at> PITT.EDU>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> Dear Hayim (cc TEI-L),
> 
> If the materials are to be integrated into a project that relies on an
> architecture involving relational database technology, that requirement
> may preclude alternatives that might be attractive under other conditions.
> 
> Absent that constraint, though, I would try first to use an XML database,
> which can implement the same sorts of queries within a context that is
> always sensitive to the hierarchical structure of the data. That strategy
> would avoid the overhead of keeping the data in the XML coordinated with
> the data in the database, since the XML would then, itself, be the data in
> the database. I would expect the resulting model to be substantially less
> complex, and therefore easier to build, understand, and maintain. I have
> used this strategy successfully on a small scale, and it would be my first
> approach in something larger, but I would also be mindful of Sebastian's
> cautions concerning the maturity of the technology and the availability of
> expert support (should that be a requirement in your project), as
> relational databases have a long head-start over XML databases in those
> areas.
> 
> Best,
> 
> David
> djbpitt <at> gmail.com
> 
>> On 7/13/14, 7:20 AM, "Hayim Lapin" <hlapin <at> UMD.EDU> wrote:
>> 
>> Thanks for your comments!
>> 
>> The reasons to go with the DB implementation are that it has a developed
>> content management system, and was created for the kinds of philological
>> stuff that we are interested in recording and working with.  In
>> addition, part of the project involves taking the output of automated
>> alignment (thanks to the developers of CollateX), and hand correcting
>> them, and this, just from what I have seen from my colleague's work is
>> relatively easily implemented in a DB, whereas I'd have to develop an
>> app and content control to do this from scratch, from the TEI directly.
>> Finally, I have the sense that queries like "find all the examples where
>> witA agrees with witB against witC, but not against witD" are the kind
>> of thing that relational databases were designed for.
>> 
>> I am gathering that (in my instance) stable world-level data could be
>> stored and queried in SQL, but  the structure of the text might continue
>> to be represented in TEI. I can envision all sorts of problems though,
>> like damage or corrections, particularly when they are at the character
>> level.
>> 
>> I am also gathering from Stuart's comments and Doug and Sebastian's
>> agreements, at least on that, that one should not use the DB to store a
>> kind of linear offset markup (div2 starts here, w1, w2, pb here, w3, ...
>> damage starts, w159, w160, damage ends, ... w300, div2 ends here etc.).
>> 
>> Again, many thanks for your advice,
>> 
>> HL
>> 
>> Hayim Lapin
>> Robert H. Smith Professor of Jewish Studies
>> Professor of History
>> University of Maryland
>> College Park, MD 20742
>> 301 405 4296
>> www.digitalmishnah.org | dev.digitalmishnah.org
>> 
>>> On 7/13/2014 1:35 PM, Sebastian Rahtz wrote:
>>>> On 13 Jul 2014, at 11:21, Doug Reside <dougreside <at> gmail.com> wrote:
>>>> 
>>>>  While *xml* and sql may be "chalk and cheese", I do think it might
>>>> be possible, advisable even, to create an sql schema based on the TEI
>>>> guidelines. I suspect it could even improve the TEI.
>>> in the same way as one _could_ represent a complex TEI XML document in
>>> JSON. I am not sure
>>> what it would gain us... apart from being huge fun :-}
>>> --
>>> Sebastian Rahtz
>>> Director (Research) of Academic IT
>>> University of Oxford IT Services
>>> 13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
>>> 
>>> Não sou nada.
>>> Nunca serei nada.
>>> Não posso querer ser nada.
>>> À parte isso, tenho em mim todos os sonhos do mundo.
> 
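Hayim's example of a query that relational databases handle well, "find all the examples where witA agrees with witB against witC, but not against witD", can be sketched with Python's built-in sqlite3 module. The table layout (one row per aligned column and witness) and the toy readings are assumptions for illustration, not the schema of either project; "not against witD" is read here as "witD shares the A/B reading".

```python
import sqlite3

# Hypothetical schema: one row per (aligned column, witness) holding the
# token that witness has in that column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alignment (col INTEGER, witness TEXT, token TEXT)")

# Invented toy data: four witnesses aligned over three columns.
rows = [
    (1, "A", "rabbi"), (1, "B", "rabbi"), (1, "C", "rabbi"), (1, "D", "rabbi"),
    (2, "A", "said"),  (2, "B", "said"),  (2, "C", "says"),  (2, "D", "said"),
    (3, "A", "this"),  (3, "B", "this"),  (3, "C", "that"),  (3, "D", "that"),
]
conn.executemany("INSERT INTO alignment VALUES (?, ?, ?)", rows)

# Columns where A agrees with B against C, and D sides with A and B.
query = """
SELECT a.col, a.token, c.token
FROM alignment a
JOIN alignment b ON b.col = a.col AND b.witness = 'B'
JOIN alignment c ON c.col = a.col AND c.witness = 'C'
JOIN alignment d ON d.col = a.col AND d.witness = 'D'
WHERE a.witness = 'A'
  AND a.token = b.token
  AND a.token <> c.token
  AND a.token = d.token
ORDER BY a.col
"""
for col, reading, variant in conn.execute(query):
    print(f"column {col}: A/B/D read {reading!r}, C reads {variant!r}")
```

Each extra witness condition is just another self-join, which is why this style of question maps naturally onto SQL.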
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 14:55:57 +0200
> From:    Doug Reside <dougreside <at> GMAIL.COM>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> Hmmm...I wouldn't discount the offset approach right away.  It's one I
> like a lot, but the biggest drawback, of course, is edits.  You change
> one thing in a document and the whole structure after that change is
> messed up.  The Unicode problems we're all going on about are also an
> issue here.  What one programming language or encoding system counts as
> one offset, another (not properly configured) counts as two or three.  If
> your team can build a system to deal with this, though, it would be
> really fantastic.
> Doug
> 
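Doug's point about offsets being counted differently is easy to demonstrate. This minimal Python sketch measures the same string in Unicode code points (how Python 3 indexes `str`), UTF-16 code units (how Java and JavaScript index strings), and UTF-8 bytes; the sample text is arbitrary. It also shows that Unicode normalization alone can change an offset.

```python
import unicodedata


def offset_units(text: str) -> dict[str, int]:
    """Length of `text` under three common offset conventions."""
    return {
        "code_points": len(text),                          # Python 3 str indexing
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # Java / JavaScript strings
        "utf8_bytes": len(text.encode("utf-8")),            # byte-oriented tools
    }


# Hebrew word with vowel points, a space, and one character outside the BMP.
sample = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd \U0001D504"
print(offset_units(sample))
# {'code_points': 9, 'utf16_units': 10, 'utf8_bytes': 19}

# The same visible "é" is one code point in NFC and two (e + combining acute) in NFD.
e_acute = "\u00e9"
print(len(unicodedata.normalize("NFC", e_acute)), len(unicodedata.normalize("NFD", e_acute)))
# 1 2
```

Any offset-based scheme therefore has to fix both the counting unit and the normalization form, and every tool in the chain has to agree on them.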
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 08:30:15 -0700
> From:    Martin Holmes <mholmes <at> UVIC.CA>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> There are some types of data which, although easy to do in TEI, are also 
> friendly to relational databases, as Sebastian says:
> 
>> *) it really _is_ data, mostly numbers and tokens, and the model fits
> 
> Basic personographies, placeographies, GIS info, glossaries, link 
> groups, highly-structured bibliographies, orgographies, and lists of 
> events fall into this category. We occasionally collect data in this 
> fashion through a forms-based web interface before generating TEI XML 
> from the results.
> 
> Doug's point about offsets and editing is really key, too; if one minor 
> edit to correct an error can throw off a thousand offsets, you'll have 
> nothing but pain to look forward to. I'd second David's suggestion of 
> trying an XML database first (eXist or BaseX).
> 
> Cheers,
> Martin
> 
>> On 14-07-13 02:46 AM, Sebastian Rahtz wrote:
>>> On 13 Jul 2014, at 10:11, Doug Reside <dougreside <at> GMAIL.COM> wrote:
>>> 
>>>  So, the SQL points to the
>>> XML but the XML doesn't necessarily point back to the SQL.
>> 
>> amen to that.
>> 
>> I’d use an SQL database to manage data derived from TEI transcriptions, or
>> for ancillary structured data. In general I’d choose a relational database for one of three reasons:
>> 
>>    *) maturity of system. for playing with 500,000 records, you need performance and stability.
>> (yes, eXist or BaseX may be able to cope, but not as easily in my experience)
>>    *) local support. if you need a web site built on top of the DB, and this is what your IT guys know,
>>    it would be perverse to ignore it.
>>    *) it really _is_ data, mostly numbers and tokens, and the model fits
>> 
>> But comparing TEI and SQL is chalk and cheese, they do different things, at
>> different points in your work lifecycle.
>> 
>> Of course, I also heartily +1 Stuart’s point about encoding. The bane of all our lives. Try adding
>> Perl into the mix as well, and you’re half-way to Bedlam.
> 
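As a minimal sketch of the workflow Martin describes (collect structured records through a form, then generate TEI XML), the following Python uses the standard library's `xml.etree.ElementTree` to turn flat rows into a TEI `<listPerson>`. The field names and records are hypothetical; a real project would map whatever its form or database actually stores.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace


def tei(tag: str) -> str:
    """Clark-notation name for an element in the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def rows_to_list_person(rows: list[dict]) -> ET.Element:
    """Turn flat, form-collected person records into a TEI <listPerson>."""
    list_person = ET.Element(tei("listPerson"))
    for row in rows:
        person = ET.SubElement(list_person, tei("person"), {XML_ID: row["id"]})
        pers_name = ET.SubElement(person, tei("persName"))
        ET.SubElement(pers_name, tei("forename")).text = row["forename"]
        ET.SubElement(pers_name, tei("surname")).text = row["surname"]
        if row.get("birth"):  # optional field: only emit <birth> when filled in
            ET.SubElement(person, tei("birth"), {"when": row["birth"]})
    return list_person


# Invented records, standing in for rows from a web form or SQL table.
rows = [
    {"id": "pers_smith_j", "forename": "John", "surname": "Smith", "birth": "1590"},
    {"id": "pers_doe_m", "forename": "Mary", "surname": "Doe", "birth": None},
]
print(ET.tostring(rows_to_list_person(rows), encoding="unicode"))
```

The data lives comfortably in rows while it is being collected; the TEI is generated output, so the XML never has to point back at the database.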
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 17:38:00 +0200
> From:    Doug Reside <dougreside <at> GMAIL.COM>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> The thing is, though, that if anybody could actually build a really
> usable editor for offset mark-up, the shifting of text on corrections
> wouldn't matter.  We probably shouldn't be editing XML by hand anyway
> (either embedded or stand off), but we don't have fantastic tools for
> either yet (as far as I know).
> 
> Doug
> 
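Doug's claim that a good editor would make shifting offsets a non-issue amounts to updating every standoff annotation whenever the base text changes. The Python sketch below shows that bookkeeping for a single edit. The `Annotation` class, `apply_edit` function, and sample text are hypothetical; offsets are Python code-point indices, and an insertion exactly at an annotation's start is attached to that annotation (one of several possible policies).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Annotation:
    start: int   # inclusive character offset into the base text
    end: int     # exclusive character offset
    label: str   # e.g. a TEI element name


def shift(offset: int, pos: int, delete_len: int, insert_len: int) -> int:
    """Map an offset in the old text to the corresponding offset in the edited text."""
    if offset <= pos:
        return offset                            # before the edit: unchanged
    if offset >= pos + delete_len:
        return offset + insert_len - delete_len  # after the edit: shifted
    return pos                                   # inside deleted text: collapse


def apply_edit(text, annotations, pos, delete_len, insert_text):
    """Replace text[pos:pos+delete_len] with insert_text, updating every annotation."""
    new_text = text[:pos] + insert_text + text[pos + delete_len:]
    n = len(insert_text)
    new_annotations = [
        replace(a, start=shift(a.start, pos, delete_len, n),
                end=shift(a.end, pos, delete_len, n))
        for a in annotations
    ]
    return new_text, new_annotations


text = "In the beginnning God created"
annotations = [Annotation(7, 17, "w"), Annotation(18, 21, "persName")]

# Correct the typo by deleting one "n" at offset 13.
new_text, new_annotations = apply_edit(text, annotations, 13, 1, "")
for a in new_annotations:
    print(a.label, repr(new_text[a.start:a.end]))
# w 'beginning'
# persName 'God'
```

Without the update, the old `persName` offsets (18, 21) would select `'od '` in the corrected text, which is exactly the breakage Doug and Martin describe.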
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 08:49:08 -0700
> From:    Martin Holmes <mholmes <at> UVIC.CA>
> Subject: Re: Advice? Experience? TEI and SQL
> 
>> On 14-07-13 08:38 AM, Doug Reside wrote:
>> The thing is, though, that if anybody could actually build a really
>> usable editor for offset mark-up, the shifting of text on corrections
>> wouldn't matter.  We probably shouldn't be editing XML by hand anyway
>> (either embedded or stand off), but we don't have fantastic tools for
>> either yet (as far as I know).
> 
> I see the point made quite frequently that we shouldn't be editing XML 
> by hand, but I really don't see why. The arguments for it seem to be 
> that it's somehow difficult and error prone, but that's surely not the 
> case with modern schema-aided editors such as Oxygen; certainly we've 
> never had a novice who experienced any problems learning to edit TEI 
> files. When it gets very dense -- when lots of different types of 
> information are encoded simultaneously into the same text-stream -- then 
> I agree that it can be difficult for humans to keep track of, but even 
> with our more complex projects we rarely get to that level.
> 
> I don't think it's difficult in principle to create a good standoff 
> editor -- one can imagine an editor that keeps the text-stream as 
> read-only while allowing multiple encoding "campaigns" (to borrow a term 
> from genetic editing) focusing on different aspects of the text, and is 
> able to combine them and disentangle them for save and reload -- but the 
> fact that no such editor (AFAIK) exists yet suggests that there is not a 
> huge demand. In linguistic annotation, where a multi-layer approach is 
> essential, tools such as ELAN have emerged to support this.
> 
> Cheers,
> Martin
> 
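Martin's imagined editor, with a read-only text stream and separate encoding "campaigns", can be sketched as one immutable string plus independent layers of (start, end, label) spans. The Python below is a hypothetical illustration, not an existing tool: `combine()` cuts the text at every boundary in every layer, so overlapping layers (here, damage crossing a line boundary) coexist without a single hierarchy. The text and spans are invented.

```python
# Read-only base text; each encoding "campaign" is kept as its own layer of
# standoff spans (start, end, label). Spans in different layers may overlap.
BASE_TEXT = "Sing, goddess, the anger of Achilles"
campaigns = {
    "lines": [(0, 14, "l1"), (15, 36, "l2")],
    "names": [(28, 36, "persName")],
    "damage": [(10, 20, "damage")],  # deliberately crosses the line boundary
}


def combine(text: str, layers: dict[str, list[tuple[int, int, str]]]):
    """Cut the text at every span boundary in every layer and report, for each
    resulting segment, which annotations (layer:label) cover it."""
    cuts = {0, len(text)}
    for spans in layers.values():
        for start, end, _ in spans:
            cuts.update((start, end))
    points = sorted(cuts)
    segments = []
    for seg_start, seg_end in zip(points, points[1:]):
        covering = sorted(
            f"{layer}:{label}"
            for layer, spans in layers.items()
            for start, end, label in spans
            if start <= seg_start and seg_end <= end
        )
        segments.append((text[seg_start:seg_end], covering))
    return segments


for segment, covering in combine(BASE_TEXT, campaigns):
    print(repr(segment), covering)
```

Because each campaign stays a separate list, disentangling one for saving is simply a matter of writing that list back out.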
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 19:30:02 +0300
> From:    Andreas Trianta <andreas <at> TRIANTA.EU>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> Hi Sebastian,
> 
> Slightly off-topic, but apart from eXist and BaseX, could Zorba also be considered, especially for
> "playing with 500,000 records"?
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 21:58:15 +0300
> From:    Hayim Lapin <hlapin <at> UMD.EDU>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> Dear all,
> 
> Thanks for the really useful comments.
> For my project, we are talking about a text of something under 200,000 
> words, but multiple witnesses of various kinds, so several million 
> word-level records. More if one includes translations.
> 
> Here is a follow-up question from the point of view of the next steps in 
> my specific project:
> Let's say I *currently* have
> 
>    1. several dozen accurate and valid TEI/xml text transcriptions (I
>    am revising the schema, and the texts need more editing, but this is
>    at least within sight)
>    2. a process for automated alignment of variants with output in xml
>    or json
> 
> and I want to
> 
>    3. develop an online tool to visually check and, where necessary,
>    hand-correct the alignments (say, of as many as twenty or thirty
>    witnesses at a time)
>    4. allow editors to sign off on an alignment, so that the alignment
>    becomes part of the project data
>    5. allow subsequent users to create custom editions and also query
>    the results of the alignments
> 
> What XML-based database platforms are there that can be part of a 
> content management system, make it possible to build relatively 
> straightforward table-like apps (e.g., witnesses in rows, tokens in 
> columns), generate queryable data, and scale upward toward a much 
> larger finished product or be redeployed as the architecture for 
> another, possibly still larger, project?
> 
> I apologize if this is off topic for the list, but I'd appreciate 
> knowing about such platforms and projects that work on them.
> 
> Many thanks!
> 
> HL
> 
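For item 3, the table-like view Hayim describes (witnesses in rows, tokens in columns) is mostly a pivot of the alignment output. The Python sketch below assumes an invented, simplified JSON format with one object per aligned column; it is not CollateX's actual output schema, so the parsing step would need adapting. It also flags the columns where witnesses disagree, which is where hand correction and sign-off would concentrate.

```python
import json

# Invented, simplified alignment output: one object per aligned column mapping
# witness sigla to tokens, with null where a witness has no reading. This is
# an assumed format for illustration, not CollateX's actual JSON schema.
ALIGNMENT_JSON = """
{"witnesses": ["A", "B", "C"],
 "columns": [
   {"A": "from", "B": "from", "C": "from"},
   {"A": "when", "B": "when", "C": "what"},
   {"A": "do",   "B": null,   "C": "do"}
 ]}
"""


def to_rows(alignment: dict) -> dict[str, list[str]]:
    """Pivot the column-wise alignment into witnesses-in-rows, tokens-in-columns."""
    return {
        wit: [col.get(wit) or "om." for col in alignment["columns"]]
        for wit in alignment["witnesses"]
    }


def variant_columns(alignment: dict) -> list[int]:
    """Indices of columns where the witnesses do not all share one reading."""
    return [
        i
        for i, col in enumerate(alignment["columns"])
        if len({col.get(wit) for wit in alignment["witnesses"]}) > 1
    ]


alignment = json.loads(ALIGNMENT_JSON)
rows = to_rows(alignment)
width = max(len(token) for tokens in rows.values() for token in tokens)
for wit, tokens in rows.items():
    print(wit.ljust(3) + " ".join(token.ljust(width) for token in tokens))
print("variant columns:", variant_columns(alignment))
```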
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 21:13:04 +0200
> From:    Doug Reside <dougreside <at> GMAIL.COM>
> Subject: Re: Advice? Experience? TEI and SQL
> 
> I suspect the lack of such an editor is more because the TEI community
> hasn't really supported standoff markup in any meaningful way.  The
> guidelines assume single-hierarchy XML pretty consistently.
> 
> I think if someone would build a really good standoff editor and
> searchable database it could replace Oxygen pretty quickly (and radically
> change the TEI for the better).  It's not a terribly hard problem, but it
> is non-trivial.  And inertia is a powerful force.  Oxygen is what folks
> learn, and so that's all they know.
> 
> TEI tool gaps make me long for a way to explore alternate
> timelines...but my professional effort hierarchies can't overlap at
> present...
> 
> ------------------------------
> 
> Date:    Sun, 13 Jul 2014 21:43:57 +0000
> From:    Sebastian Rahtz <sebastian.rahtz <at> IT.OX.AC.UK>
> Subject: Re: Advice? Experience? TEI and SQL
> 
>> On 13 Jul 2014, at 17:30, Andreas Trianta <andreas <at> trianta.eu> wrote:
>> 
>> 
>> Slightly off-topic, but apart from eXist and BaseX, could Zorba also be considered, especially for
>> "playing with 500,000 records"?
> 
> I hadn’t heard of Zorba before. Interesting. We’re pretty well off for XQuery processors.
> 
> ------------------------------
> 
> Date:    Mon, 14 Jul 2014 10:07:11 +1200
> From:    "Stuart A. Yeates" <syeates <at> GMAIL.COM>
> Subject: Mapping the SQL: Was: Re: Advice? Experience? TEI and SQL
> 
>> On Sun, Jul 13, 2014 at 10:21 PM, Doug Reside <dougreside <at> gmail.com> wrote:
>> While *xml* and sql may be "chalk and cheese", I do think it might be
>> possible, advisable even, to create an sql schema based on the TEI
>> guidelines. I suspect it could even improve the TEI.  This is, I expect,
>> far out of scope for your project, though.
> 
> OK, I'll bite.
> 
> Is your proposed mapping a complete mapping? That is, do you propose
> mapping attribute datatypes to SQL?
> 
> Broadly speaking, what improvements do you see being forthcoming?
> 
> cheers
> stuart
> 
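To make Stuart's question about mapping attribute datatypes concrete, here is a hypothetical Python sketch that maps a few TEI P5 datatypes (using the current `teidata.*` names) to SQLite column definitions and generates a `CREATE TABLE` statement. The mapping and the attribute list are illustrative assumptions, not an agreed TEI-to-SQL mapping; note what is lost, since pointer integrity and closed value lists are not enforced.

```python
import sqlite3

# Illustrative mapping from a few TEI P5 attribute datatypes to SQLite column
# definitions. Facets a schema would enforce become CHECK constraints where
# easy; this is a sketch, not a complete or official mapping.
DATATYPE_TO_SQL = {
    "teidata.count": "INTEGER CHECK ({col} >= 0)",
    "teidata.numeric": "REAL",
    "teidata.truthValue": "INTEGER CHECK ({col} IN (0, 1))",
    "teidata.pointer": "TEXT",     # a URI; nothing checks that the target exists
    "teidata.enumerated": "TEXT",  # a closed value list would need its own CHECK
    "teidata.text": "TEXT",
}

# Hypothetical attribute-to-datatype assignments for a word table, chosen
# for illustration only; check the Guidelines for real element definitions.
WORD_ATTRIBUTES = {
    "lemma": "teidata.text",
    "position": "teidata.count",
    "checked": "teidata.truthValue",
    "source": "teidata.pointer",
}


def create_table(element: str, attributes: dict[str, str]) -> str:
    """Build a CREATE TABLE statement with one column per attribute."""
    columns = ['"xml_id" TEXT PRIMARY KEY']  # @xml:id becomes the key
    for name, datatype in attributes.items():
        quoted = '"' + name.replace(":", "_") + '"'
        columns.append(quoted + " " + DATATYPE_TO_SQL[datatype].format(col=quoted))
    return f'CREATE TABLE "{element}" (\n  ' + ",\n  ".join(columns) + "\n)"


print(create_table("w", WORD_ATTRIBUTES))
```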
> ------------------------------
> 
> End of TEI-L Digest - 10 Jul 2014 to 13 Jul 2014 (#2014-151)
> ************************************************************

Hayim Lapin | 13 Jul 09:29 2014

Advice? Experience? TEI and SQL

Dear all,

My TEI/XML-based project (www.digitalmishnah.umd.edu) is increasingly 
working in collaboration with a European project that works in MySQL. 
Both projects involve transcriptions of manuscripts, automated alignment 
of variants followed by hand correction, and potentially additional data 
(morphological analysis, translation, names/places, etc.). Currently, on 
the TEI side, addressing by @xml:id takes place at the <ab> level. In the 
next revision of my TEI schema, texts will be encoded with each word in 
a <w> element carrying an @xml:id. On the MySQL side, too, words are the 
basic unit of data.
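Once each word carries an @xml:id, word-level rows for a relational table can be derived from the TEI, with the SQL side pointing into the XML and not the other way round. The Python sketch below uses only the standard library (`xml.etree.ElementTree`, and `sqlite3` standing in for MySQL); the table layout, witness siglum, and sample fragment are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Tiny invented TEI fragment: one <ab> with each word in a <w> carrying @xml:id.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<ab xml:id="ms1.1.1"><w xml:id="ms1.1.1.w1">from</w> <w xml:id="ms1.1.1.w2">when</w></ab>
</body></text></TEI>"""


def words_to_sql(tei_xml: str, witness: str, conn: sqlite3.Connection) -> None:
    """Load one row per <w>; the rows point into the XML via xml:id, not vice versa."""
    root = ET.fromstring(tei_xml)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word ("
        " xml_id TEXT PRIMARY KEY, witness TEXT, ab_id TEXT, position INTEGER, form TEXT)"
    )
    for ab in root.iter(f"{TEI}ab"):
        for position, w in enumerate(ab.iter(f"{TEI}w"), start=1):
            conn.execute(
                "INSERT INTO word VALUES (?, ?, ?, ?, ?)",
                # itertext() also collects text nested inside <w> (e.g. <choice>)
                (w.get(XML_ID), witness, ab.get(XML_ID), position, "".join(w.itertext())),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
words_to_sql(SAMPLE, "ms1", conn)
for row in conn.execute("SELECT xml_id, position, form FROM word ORDER BY position"):
    print(row)
```

Because the TEI stays authoritative, the table can be dropped and regenerated after any correction to the transcription.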

Output requirements involve well-formatted text as well as statistical 
calculations based on textual or orthographic variation.

At this point, I have no experience at all with MySQL. I wonder if 
people can point me to projects that yoke MySQL and TEI. I'd be 
interested in learning (on or off list) how the project is structured, 
and the costs and benefits of such an arrangement.

Many thanks,

HL
