Schematizing XML: TEI and Project Constraints
XML can be checked against a schema so that only certain elements, attributes, and attribute values are permitted. A schema can also define the order in which elements are expected to appear within the well-formed hierarchy of an XML document. These validation capabilities are what made the TEI possible. They are also why most projects keep specific documentation of the elements, attributes, and document hierarchy used in their XML files.
Key points:
- A schema provides the available vocabulary used to name elements, attributes, and attribute values.
- A schema provides the grammar for how the vocabulary is used: rules for nesting, sequencing, etc.
When we check our XML files against a set of schema rules, we are checking their validity. Validity checks help us make sure we spell element names correctly, write attribute names and values consistently, and nest elements in a way that makes sense for our project and stays consistent across files.
Note: an XML document can be well formed but not valid, whereas a document that is not well formed cannot be valid.
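To make the distinction concrete, here is a minimal Python sketch. It uses a tiny made-up vocabulary (poem, title, stanza, line), not the TEI schema. The names TOY_SCHEMA, is_well_formed, and is_valid are hypothetical helpers invented for this illustration. The sketch checks vocabulary and nesting only. Real schema languages such as RELAX NG also enforce sequencing, which this toy does not.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema (NOT the TEI): the keys are the allowed vocabulary,
# and each value is the set of child elements that element may contain.
TOY_SCHEMA = {
    "poem": {"title", "stanza"},
    "title": set(),
    "stanza": {"line"},
    "line": set(),
}

def is_well_formed(xml_text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def is_valid(xml_text, schema=TOY_SCHEMA, root_name="poem"):
    """Return True only if the text is well formed AND follows the toy rules."""
    if not is_well_formed(xml_text):
        return False  # a document that is not well formed can never be valid
    root = ET.fromstring(xml_text)
    if root.tag != root_name:
        return False
    for elem in root.iter():  # iter() visits the root and all descendants
        if elem.tag not in schema:  # vocabulary check
            return False
        for child in elem:
            if child.tag not in schema[elem.tag]:  # nesting check
                return False
    return True

good = "<poem><title>Ode</title><stanza><line>Hi</line></stanza></poem>"
misnested = "<poem><line>Hi</line></poem>"  # well formed, but line is outside a stanza
broken = "<poem><title>Ode</poem>"          # mismatched tags: not well formed
```

With these samples, misnested passes the well-formedness check but fails the validity check. The broken sample fails both.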
TEI stands for Text Encoding Initiative. In the most general sense, the TEI is an international, interdisciplinary standard widely used by libraries, museums, publishers, and individual scholars to represent all kinds of textual material for online research and teaching. The TEI Consortium is an international organization of scholars whose mission is to develop and maintain guidelines for the digital encoding of literary and linguistic texts. These guidelines are known as the TEI Guidelines, and they define the schema files that projects reference from their XML documents. Together, the Guidelines and the schema define the “grammar” for how elements and attributes are used, including the rules for nesting, sequencing, etc.
To begin, open a new TEI file in <oXygen/>. Click the top-left icon that looks like a piece of paper with a folded corner (or go to File → New), then type TEI in the search box. In the results, look for the TEI P5 options and choose the first one: All. Look at the new file and notice the characteristic two-part structure of a TEI document: the <teiHeader> and <text> elements are the children of the root element <TEI>.
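To validate your document against the TEI schema, you need these two extra "schema" lines at the top of your document, in addition to the XML prolog. The first line points to the RELAX NG grammar. The second applies the Schematron rules contained in the same schema file: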
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
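The schema lines above are processing instructions that sit before the root element. Their href and schematypens values are what tell an editor which schema to use and which kind of schema it is. The following sketch uses Python's standard xml.dom.minidom module to list those pairs. The function name schema_associations is hypothetical. Because a processing instruction's "attributes" are plain text, the sketch extracts them with a simple regular expression rather than a full parser.

```python
import re
from xml.dom import minidom

RELAXNG_NS = "http://relaxng.org/ns/structure/1.0"
SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"

def schema_associations(xml_text):
    """Return (href, schematypens) pairs for each xml-model line before the root element."""
    doc = minidom.parseString(xml_text)
    pairs = []
    # Processing instructions in the prolog are direct children of the document node.
    for node in doc.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE and node.target == "xml-model":
            # Pseudo-attributes inside a PI are plain text such as href="..." type="...".
            attrs = dict(re.findall(r'(\w+)="([^"]*)"', node.data))
            pairs.append((attrs.get("href"), attrs.get("schematypens")))
    return pairs
```

For a file that begins with the two TEI lines shown above, the sketch should return two pairs. Both pairs point to tei_all.rng: the first is tagged as RELAX NG and the second as Schematron.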
Additionally, your root element must be <TEI> and must declare the TEI namespace in its xmlns attribute. Your root element should look like this:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
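The following Python sketch shows why the namespace matters. Python's ElementTree reports a namespaced element as {namespace-URI}localname, so a <TEI> without the xmlns attribute is a different element from the TEI's <TEI>. The helper check_tei_root is hypothetical. It only checks the simple teiHeader-then-text layout of the oXygen template. The real TEI schema permits more than this, so the sketch is no substitute for validating against tei_all.rng.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # ElementTree's prefix for TEI-namespaced names

def check_tei_root(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != TEI + "TEI":
        problems.append(f"root is {root.tag!r}, expected TEI in the TEI namespace")
    children = [child.tag for child in root]
    # The header comes first in a TEI document...
    if not children or children[0] != TEI + "teiHeader":
        problems.append("first child of the root should be teiHeader")
    # ...and the template also has a text element as a child of the root.
    if TEI + "text" not in children:
        problems.append("root has no text child")
    return problems
```

A root written as plain <TEI>, without the namespace, is reported as a problem even though the file is perfectly well formed.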
If your document fails schema validation, the square in the upper left corner turns red, just as it does for a well-formedness error. The error message, however, will describe a validity problem instead.
To make sure a validity check runs, click the icon that looks like a red check mark on a piece of paper, and <oXygen/> will run a fresh validation. (The keyboard shortcut is Ctrl+Shift+V on Windows and Cmd+Shift+V on a Mac.)
Once a project decides to use the TEI, the recommended practice is to further constrain the available elements and attribute name-value pairs. A project can do this by customizing the TEI schema with an ODD ("One Document Does it all") file, by associating an additional home-brewed schema alongside the TEI schema, or both. In this class we will not go into how to create such documents (feel free to follow the previous links if you are interested); however, it is important to know what our schema does and how to associate it with your XML.
Because the Lili Elbe Digital Archive has been built in increments, our ODD-generated schema is still under construction. At present, it mainly checks that encoders use correct key attribute values on <persName> and <placeName> elements. Our primary way of sharing our encoding practices among the project's many encoders has been our MIW Encoding Guidelines in Google Docs. We also keep a spreadsheet listing the key IDs for the people and places mentioned in the Lili Elbe Archive's many texts.
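The following Python sketch illustrates the kind of rule our schema currently enforces. It is not the MIW schema itself. The key values must come from a fixed list, here a small hypothetical set standing in for the project spreadsheet. The function name find_bad_keys and the IDs in the example are invented for illustration. As an assumption of this sketch, elements without a key attribute are skipped rather than flagged.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def find_bad_keys(xml_text, known_ids):
    """Return (element name, key) pairs whose key is not in known_ids."""
    root = ET.fromstring(xml_text)
    bad = []
    for name in ("persName", "placeName"):
        # .//tei:name finds matching elements anywhere below the root.
        for elem in root.iterfind(f".//tei:{name}", NS):
            key = elem.get("key")
            if key is not None and key not in known_ids:  # missing keys skipped (assumption)
                bad.append((name, key))
    return bad

# Hypothetical IDs; the real ones live in the project's spreadsheet.
KNOWN_IDS = {"ElbeLili", "Dresden"}
```

In a file where a place key was mistyped as "Dresdn", the sketch would report ("placeName", "Dresdn") and leave correctly keyed elements alone.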
To complete the TEI exercise, engaged learners will need to associate our project's schema correctly with their group's assigned Lili Elbe text. First, download your assigned XML file and our MIWschema.rng file. Navigate to each document and click the "Raw" button near the top right of the file's content. From the Raw view you can either copy and paste the content or right-click and save the file. A note on mindful file management: save MIWschema.rng somewhere on your computer where you can easily and consistently find it. We recommend saving the schema and your XML file in the same folder so the schema is easy to locate when you browse for it.
Then, with your XML document open in the <oXygen/> XML Editor, go to the menu bar and choose Document → Schema → Associate Schema. Click the folder icon to browse your computer and select your schema file. To finalize the association, click OK, and <oXygen/> should insert a line like this one:
<?xml-model href="MIWschema.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
This line takes the place of the TEI schema lines shown above in the section How to Associate the TEI Schema. It should therefore appear after the XML prolog but before the root element, <TEI xmlns="http://www.tei-c.org/ns/1.0">. Once you have associated the schema, <oXygen/> should automatically run a validation check. To re-run validation, follow the instructions above for Running a Validation Check in <oXygen/>.
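For readers curious about what the association amounts to in the file itself, the following Python sketch reproduces the effect with the standard xml.dom.minidom module. It removes any existing xml-model lines (such as the tei_all.rng ones) and inserts a single line pointing at the given schema just before the root element. The function name associate_schema is hypothetical, and this is not how <oXygen/> implements the feature. For the exercise, use the Document → Schema → Associate Schema menu as described.

```python
from xml.dom import minidom

RELAXNG_NS = "http://relaxng.org/ns/structure/1.0"

def associate_schema(xml_text, href):
    """Replace any xml-model lines with one RELAX NG association placed before the root."""
    doc = minidom.parseString(xml_text)
    # Drop existing schema associations, e.g. the two tei_all.rng lines.
    old = [n for n in doc.childNodes
           if n.nodeType == n.PROCESSING_INSTRUCTION_NODE and n.target == "xml-model"]
    for node in old:
        doc.removeChild(node)
    data = f'href="{href}" type="application/xml" schematypens="{RELAXNG_NS}"'
    pi = doc.createProcessingInstruction("xml-model", data)
    # The association must come after the prolog but before the root element.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()
```

Calling associate_schema(text, "MIWschema.rng") on a file that starts with the TEI schema lines should return XML with exactly one xml-model line. That line points at MIWschema.rng and precedes <TEI xmlns="http://www.tei-c.org/ns/1.0">.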
The lessons and exercises constructed for this course incorporate materials from Dr. Elisa Beshero-Bondar's Digital Humanities courses, the Digital Mitford Coding School, the Text Encoding Initiative's learning resources, GitHub Guides, and the GitHub Help resources. Because this repository is public-facing, the lessons and exercises herein are licensed under a CC BY-NC-SA license.