Schematizing XML: TEI and Project Constraints

Rebecca Parker edited this page Jan 28, 2019 · 17 revisions

XML can be validated against a schema so that only certain elements, attributes, and attribute values are permitted. A schema can also define the order in which elements are expected to appear within the "well-formed" hierarchy of an XML document. This ability to validate is what gave us the TEI, and it is why most projects keep specific documentation of the elements, attributes, and document hierarchy of their XML files.

What is a schema?

  1. Available vocabulary used to name elements, attributes, and attribute values.

  2. Grammar for how the vocabulary is used: rules for nesting, sequencing, etc.
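To make the two parts of a schema concrete, here is a minimal sketch in Python that models a vocabulary and a grammar as plain dictionaries and checks a document against them. Everything in it is an assumption for illustration: the element names (`letter`, `greeting`, `p`), the `type` attribute and its values, and the `check` function are hypothetical, and real projects would express these rules in a schema language such as Relax NG rather than in hand-written code.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: element names, the attributes each one permits,
# and the values each attribute may take.
VOCABULARY = {
    "letter": {},
    "greeting": {},
    "p": {"type": {"opening", "body", "closing"}},
}

# Hypothetical grammar: which children each element may contain (nesting),
# listed in the order they are expected to appear (sequencing).
GRAMMAR = {
    "letter": ["greeting", "p"],
    "greeting": [],
    "p": [],
}


def check(element, problems=None):
    """Return a list of rule violations found in element and its descendants."""
    if problems is None:
        problems = []
    name = element.tag
    if name not in VOCABULARY:
        problems.append(f"unknown element <{name}>")
        return problems
    # Vocabulary check: attribute names and attribute values.
    for attr, value in element.attrib.items():
        allowed_values = VOCABULARY[name].get(attr)
        if allowed_values is None:
            problems.append(f"unknown attribute @{attr} on <{name}>")
        elif value not in allowed_values:
            problems.append(f"bad value '{value}' for @{attr} on <{name}>")
    # Grammar check: nesting and sequencing of child elements.
    order = GRAMMAR[name]
    last_position = 0
    for child in element:
        if child.tag in VOCABULARY and child.tag not in order:
            problems.append(f"<{child.tag}> may not appear inside <{name}>")
        elif child.tag in order:
            position = order.index(child.tag)
            if position < last_position:
                problems.append(f"<{child.tag}> is out of sequence inside <{name}>")
            last_position = max(last_position, position)
        check(child, problems)
    return problems


good = '<letter><greeting>Dear reader</greeting><p type="body">Hello.</p></letter>'
bad = '<letter><p type="bodyy">Hello.</p><greeting>Hi</greeting><para/></letter>'

print(check(ET.fromstring(good)))
print(check(ET.fromstring(bad)))
```

The first document produces an empty list. The second breaks one rule of each kind: a misspelled attribute value, a `greeting` that appears after a `p`, and an element name that is not in the vocabulary at all.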

What is the TEI?

TEI stands for Text Encoding Initiative. In the most general sense, the TEI is an international and interdisciplinary standard that is widely used by libraries, museums, publishers, and individual scholars to represent all kinds of textual material for online research and teaching. The TEI Consortium is an international organization of scholars whose mission is to develop and maintain guidelines for the digital encoding of literary and linguistic texts. Those guidelines are referred to as the TEI Guidelines, and they are used to structure the schema file that projects can reference from their XML files. The schema and guidelines define the “grammar” for how certain elements and attributes are to be used and provide the rules for nesting, sequencing, etc.

Why Schematize?

When we check our XML files against a set of schema rules, we are checking their validity. We can use validity checks to make sure we’re spelling element names properly, writing attribute names and values consistently, and nesting elements in a way that makes sense to us and that we want to keep consistent.

Note: an XML document can be well formed without being valid, but it cannot be valid unless it is well formed.
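The relationship in this note can be sketched as a two-stage check in Python: parsing tests well-formedness, and only a document that parses can go on to be tested for validity. The `classify` function and the `ALLOWED_ELEMENTS` set are hypothetical stand-ins for a real schema, which would check far more than element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical project rule standing in for a full schema.
ALLOWED_ELEMENTS = {"letter", "greeting", "p"}


def classify(xml_text):
    """Report whether xml_text is not well formed, well formed only, or valid."""
    # Stage 1: well-formedness. The parser rejects broken XML outright.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well formed"
    # Stage 2: validity. Reached only if the document parsed.
    unknown = [el.tag for el in root.iter() if el.tag not in ALLOWED_ELEMENTS]
    if unknown:
        return "well formed but not valid"
    return "valid"


print(classify("<letter><p>Hi</p></letter>"))        # valid
print(classify("<letter><para>Hi</para></letter>"))  # well formed but not valid
print(classify("<letter><p>Hi</letter>"))            # not well formed
```

The third document never reaches the validity stage: its `p` element is not closed, so there is no hierarchy to compare against the rules.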