Schematizing XML: TEI and Project Constraints
XML can be checked against a schema so that only certain elements, attributes, and attribute values are permitted. A schema can also define the order in which elements are expected to appear within the document's existing "well-formed" hierarchy. Such validation abilities are what have brought us the TEI, and they are why most projects keep specific documentation on the elements, attributes, and document hierarchy of the project's XML files.
Key points:
- A schema provides the available vocabulary used to name elements, attributes, and attribute values.
- A schema provides the grammar for how the vocabulary is used: rules for nesting, sequencing, etc.
When we check our XML files against a set of schema rules, we are checking their validity. Validity checks help us make sure that we are spelling element names correctly, writing attribute names and values consistently, and nesting elements in a way that makes sense for the project and stays consistent across files.
Note: an XML document can be well formed without being valid, but it cannot be valid unless it is well formed.
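To make the distinction concrete, here is a small sketch of a file that is well formed but would not be valid against the TEI schema. The element name paragraph and the sample text are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well formed: one root element, every tag closed, elements properly nested. -->
<!-- Not valid against the TEI schema: "paragraph" is not in the TEI vocabulary
     (TEI uses "p"), and the required teiHeader element is missing. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <paragraph>Some text here.</paragraph>
    </body>
  </text>
</TEI>
```

By contrast, a file with overlapping or unclosed tags is not well formed at all, so a validity check against the schema cannot succeed for it.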
TEI stands for Text Encoding Initiative. In the most general sense, the TEI is an international and interdisciplinary standard that is widely used by libraries, museums, publishers, and individual scholars to represent all kinds of textual material for online research and teaching. The TEI Consortium is an international organization of scholars whose mission is to develop and maintain guidelines for the digital encoding of literary and linguistic texts. Those guidelines are referred to as the TEI Guidelines and they are used to structure the schema file which projects can reference from their XML files. The schema and guidelines define the “grammar” for how certain elements and attributes are to be used and provide the rules for nesting, sequencing, etc.
To begin, open a new TEI file in <oXygen/> by clicking the top-left icon that looks like a corner-folded piece of paper (or by going to File → New) and typing TEI in the search box. In the results, look for the TEI P5 options and choose the first one: All. Look at the TEI file and notice its characteristic two-part structure: the teiHeader and text elements are the children of the root element TEI.
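The two-part structure can be sketched as follows. This is a minimal skeleton, not necessarily the exact template <oXygen/> generates; the title and paragraph text are placeholders, and the schema lines that normally sit above the root element are left out here to keep the focus on the hierarchy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Part 1: metadata about the file and its source -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Placeholder title</title>
      </titleStmt>
      <publicationStmt>
        <p>Placeholder publication information</p>
      </publicationStmt>
      <sourceDesc>
        <p>Placeholder information about the source</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Part 2: the encoded text itself -->
  <text>
    <body>
      <p>Placeholder text.</p>
    </body>
  </text>
</TEI>
```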
To successfully validate your document against the TEI schema, you must have these extra "schema" lines at the top of your document in addition to the XML prolog:
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
Additionally, your root XML element must be <TEI> and must declare the TEI namespace as the value of its xmlns attribute. Your root element should look like this:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
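Put together, the top of a file prepared for TEI validation might look like the sketch below. The encoding shown in the first line is an assumption (your file may declare a different one), and the comment stands in for the teiHeader and text elements that a valid TEI file must contain.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader and text elements go here -->
</TEI>
```

The two schema lines point to the same file on purpose: the first asks for the RELAX NG rules, and the second asks for the Schematron rules embedded in that same file.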
If your document fails its schema validation, the square in the upper left corner turns red (just as it does for a well-formedness error), but you’ll see a different error message for failing a validity check.
To be sure you’re running a validity check, click the icon that looks like a red check mark on a piece of paper, and <oXygen/> will run a fresh validation check. (The keyboard shortcut for running a validation check is Ctrl+Shift+V on Windows and Cmd+Shift+V on a Mac.)
After a project decides to use the TEI, the recommended practice is to further constrain the available elements and attribute name-value pairs by customizing the TEI schema: by creating an ODD (One Document Does it all), by associating an additional home-brewed schema on top of the TEI schema, or both. For the purposes of this class, we are not going to delve into the process of creating such documents (feel free to follow the previous links if interested); however, it is important to know what our schema does and how to associate it with your XML.
Because the Lili Elbe Digital Archive has been constructed in increments, our ODD-constructed schema is always under construction. While the .rng schema file holds the rules that <oXygen/> uses to validate our project XML files, we have also created a more human-readable version of those rules, titled the LEDA Encoding Guidelines. Encoders: please note that we are no longer using the "MIW Encoding Guidelines (DRAFT)" Google Document.
To complete the TEI exercise, our engaged learners will need to correctly associate our project's schema with their group's designated Lili Elbe text. To do this, first download your assigned XML file and our LEDA_ODD.rng file: navigate to each document and click the button on the right side, near the top of the file's content, that gives you the "Raw" view of the file. From there you can copy and paste, or right-click and save. Then, while viewing your XML document in the <oXygen/> XML Editor, go to the menu bar and click Document -> Schema -> Associate Schema. From there, click the file folder icon to browse your computer and locate your schema file. A note on mindful file management: remember to save the LEDA_ODD.rng file somewhere on your computer where you can easily and consistently locate it. We recommend saving the schema and your XML file in the same folder so that the schema is visible and easy to find when you browse. To finalize the association, click OK, and <oXygen/> should insert a processing instruction near the top of your file that looks like this:
<?xml-model href="LEDA_ODD.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
The above schema line takes the place of the TEI schema lines shown earlier in the section How to Associate the TEI Schema; it should therefore appear after the XML prolog but before the root element: <TEI xmlns="http://www.tei-c.org/ns/1.0">. Once you have associated the schema, <oXygen/> should automatically run a validation check. If you wish to re-run the validation, follow the instructions above for Running a Validation Check in <oXygen/>. Remember to save your XML after the schema is associated.
If encoders are working from a cloned copy of this repository or of one of our project's GitLab repositories, please note that the value of the @href attribute (in the above schema line) will not be just the filename (LEDA_ODD.rng) but will instead include a preceding file path. For example, href="../lili-elbe-code/schema/out/LEDA_ODD.rng" would indicate that you have the lili-elbe-code GitLab repository saved outside the folder containing the XML file being associated. The ../ moves up one level, into the folder (or desktop) that holds both the folder containing your XML and the lili-elbe-code folder; the rest of the file path then points to the schema's location within the lili-elbe-code folder. Our primary schema development happens on GitLab, so you can find our most up-to-date schema there.
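As an illustration of how the relative path is read, the sketch below assumes a hypothetical folder layout. Only the path lili-elbe-code/schema/out/LEDA_ODD.rng comes from the example above; the names Desktop, my-xml-folder, and assigned-text.xml are invented, and the comment inside the root element stands in for your file's real content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Assumed layout (hypothetical names except the lili-elbe-code path):

  Desktop/
    my-xml-folder/
      assigned-text.xml      (this file)
    lili-elbe-code/
      schema/
        out/
          LEDA_ODD.rng

  Reading the href below from this file's location:
    ../                      moves up from my-xml-folder to Desktop
    lili-elbe-code/schema/out/LEDA_ODD.rng
                             moves down from Desktop to the schema file
-->
<?xml-model href="../lili-elbe-code/schema/out/LEDA_ODD.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader and text elements of your assigned file go here -->
</TEI>
```

If your XML file sits more than one folder deep relative to the shared parent folder, the path would need one ../ for each level it has to climb.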
The lessons and exercises constructed for this course incorporate materials from Dr. Elisa Beshero-Bondar's Digital Humanities courses, the Digital Mitford Coding School, the Text Encoding Initiative's learning resources, GitHub Guides, and the GitHub Help resources. This repository is public-facing; therefore, the lessons and exercises herein are licensed under a CC BY-NC-SA license.