Schematizing XML: TEI and Project Constraints
XML can be checked against a schema so that only certain elements, attributes, and attribute values are permitted. A schema can also define the order in which elements are expected to appear within the "well-formed" hierarchy of an XML document. This capacity for validation is what has brought us the TEI, and it is why most projects keep specific documentation of the elements, attributes, and document hierarchy of their XML files.
Key points:
- A schema provides the available vocabulary used to name elements, attributes, and attribute values.
- A schema provides the grammar for how the vocabulary is used: rules for nesting, sequencing, etc.
When we check our XML files against a set of schema rules, we are checking their validity. Validity checks help us make sure we’re spelling element names correctly, writing attribute names and values consistently, and nesting elements in a way that makes sense for the project and stays consistent across files.
Note: an XML document can be well formed without being valid, but it cannot be valid unless it is well formed.
TEI stands for Text Encoding Initiative. In the most general sense, the TEI is an international and interdisciplinary standard that is widely used by libraries, museums, publishers, and individual scholars to represent all kinds of textual material for online research and teaching. The TEI Consortium is an international organization of scholars whose mission is to develop and maintain guidelines for the digital encoding of literary and linguistic texts. Those guidelines are referred to as the TEI Guidelines and they are used to structure the schema file which projects can reference from their XML files. The schema and guidelines define the “grammar” for how certain elements and attributes are to be used and provide the rules for nesting, sequencing, etc.
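To make the difference between well formed and valid concrete, here is a small illustrative sketch. The sample sentence and the `paragraph` element are made up for this example. The file is well formed, but it would be expected to fail validation against the full TEI schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well formed: one root element, every tag closed, elements properly nested. -->
<!-- Not valid against the TEI schema: the teiHeader is missing, and
     "paragraph" is not a TEI element name (the TEI element is "p"). -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <paragraph>This sentence is hypothetical sample content.</paragraph>
    </body>
  </text>
</TEI>
```

The missing teiHeader is a grammar problem (a required element is absent), while `paragraph` is a vocabulary problem (the name is not in the schema).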
To begin, open a new TEI file in <oXygen/> by clicking on the top left icon that looks like a corner-folded piece of paper (or going to File → New) and typing TEI in the search box. In the results, look for the TEI P5 options and choose the first one: All. Look at the TEI file and notice its characteristic two-part structure: the teiHeader and text elements are the children of the root element TEI.
To successfully validate your document against the TEI schema, you must have these extra "schema" lines at the top of your document in addition to the XML prolog:
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
Additionally, your root XML element must be <TEI> and must declare the TEI namespace as the value of its xmlns attribute. Your root element should look like this:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
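Putting these pieces together, a minimal TEI document might look like the sketch below. It reuses the schema lines and root element given above. The title and paragraph text are hypothetical placeholders, and the overall shape is only assumed to resemble what a new TEI P5 All file contains; your template may include more.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Part one: the teiHeader holds metadata about the file -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample Title (placeholder)</title>
      </titleStmt>
      <publicationStmt>
        <p>Publication information (placeholder)</p>
      </publicationStmt>
      <sourceDesc>
        <p>Information about the source (placeholder)</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Part two: the text element holds the encoded document itself -->
  <text>
    <body>
      <p>Some text here (placeholder).</p>
    </body>
  </text>
</TEI>
```

Try deleting the sourceDesc element or renaming p to something else, then rerun validation to see how the schema responds.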
If your document fails its schema validation, the square in the upper left corner turns red (just as it does for a well-formedness error), but you’ll see a different error message for failing a validity check. To be sure you’re running a validity check, select the icon that looks like a red check-mark on a piece of paper, and <oXygen/> will run a fresh validation check. (The keyboard shortcut in Windows for running a validation check is Ctrl+Shift+v, and on a Mac the shortcut is Cmd+Shift+v.)
The lessons and exercises constructed for this course incorporate materials from Dr. Elisa Beshero-Bondar's Digital Humanities courses, the Digital Mitford Coding School, the Text Encoding Initiative's learning resources, GitHub Guides, and the GitHub Help resources. Because this repository is public-facing, the lessons and exercises herein are licensed under a CC BY-NC-SA license.