
Project summary

Gabriel Bodard edited this page Nov 20, 2015 · 3 revisions

In recent years the EpiDoc customization of TEI XML has clearly established itself as the most robust and widely supported format for encoding text editions of classical inscriptions, papyri and other ancient texts. Several large projects have published texts online based on underlying EpiDoc files, and many of them have also released their XML files under licenses that permit reuse and interoperability. Training workshops for scholars wishing to encode their own texts in EpiDoc XML have been run in many parts of the world, and tools such as SoSOL, Perseids and Oxygen Forms are available to assist with the editing process. What is missing from this ecosystem is support for individual authors or small projects that lack a full digital humanities center behind them but want to publish an online corpus from their EpiDoc XML files. EFES (the EpiDoc Front-End Services) aims to fill this gap by offering an application for publishing EpiDoc files that provides, as standard, many of the functions a philological project requires, and that needs minimal technical expertise or development to install and maintain.

EFES is a customisation of the existing Kiln software (http://kiln.readthedocs.org), a platform for transforming XML documents into various output formats. Kiln includes built-in search, display and RDF functionality, but it is a generic digital humanities tool and must be adapted fairly heavily to fit the particular needs of the EpiDoc community, its data and its practices. EFES will therefore incorporate indexing, search and export functionality from several existing EpiDoc projects, many of which have shared such code under open source licenses, along with core EpiDoc scripts such as the Example EpiDoc Stylesheets.

Work packages for the EFES project are listed below:

  1. Adopting the standard EpiDoc XSLT for display, and incorporating code from London EpiDoc projects to provide editorial, diplomatic, verse and other editions in parallel.

  2. Generating indices based on XML content for several important and common features often aggregated from ancient texts, such as names, places, lexical words, dates, find locations, object type, material, text category, type of writing, prosopography and geography. Basic vocabularies for these features will be provided, on the expectation that projects will want both to enhance them and potentially to link them to online typologies and ontologies. Indexing these features will also enable a faceted browse interface, a potentially more powerful way of accessing the data.

  3. Two (possibly three) optional variants of word indexing: the first following the common practice of recording dictionary headwords for all words in the XML, so that an index of lemmata can be generated; the second based on more sparsely encoded XML, with scripts to automatically tokenize and (more or less ambiguously) lemmatize words for the purpose of indexing or searching. A third option, to index only word forms as found in the text, might be offered for corpora in languages that are small, not inflected, or poorly understood.

  4. A user-friendly interface for customizing certain features, such as which indices to offer, the Leiden style and other parameters of the text edition, and language and internationalization options.

  5. RDF export of data conforming to the principal ontologies for classical materials, including LAWD, Pelagios and SNAP:DRGN.

  6. A textual search interface fully equipped with philological and semantic search features, such as lemmatised search, complex diacritic handling and restriction of results to ipsissima verba.
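The index generation in work package 2 can be sketched in Python using only the standard library. This is a minimal illustration, not EFES code: it assumes that names and places are tagged as TEI `<persName>` and `<placeName>` inside `<div type="edition">` and that any authority link sits in `@ref`. The sample inscription, document IDs and example.org URI are invented. The per-value document lists also give the counts a faceted browse interface would display.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# TEI namespace used by EpiDoc files, in ElementTree's {uri}tag notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample document; real EpiDoc files also carry a full teiHeader.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="edition">
      <ab><persName ref="#person1">Aurelius</persName> set this up at
        <placeName ref="https://example.org/places/1">Aphrodisias</placeName></ab>
    </div>
  </body></text>
</TEI>"""


def extract_entries(xml_text, tags=("persName", "placeName")):
    """Yield (tag, key) pairs for indexable elements in the edition division.

    Assumption: the index key is the element's @ref if present, otherwise
    its whitespace-normalized text content.
    """
    root = ET.fromstring(xml_text)
    for div in root.iter(f"{TEI}div"):
        if div.get("type") != "edition":
            continue
        for tag in tags:
            for el in div.iter(f"{TEI}{tag}"):
                text = " ".join("".join(el.itertext()).split())
                yield tag, el.get("ref") or text


def build_index(docs):
    """Map {doc_id: xml_text} to {tag: {key: sorted list of doc_ids}}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, xml_text in docs.items():
        for tag, key in extract_entries(xml_text):
            index[tag][key].add(doc_id)
    return {tag: {key: sorted(ids) for key, ids in keys.items()}
            for tag, keys in index.items()}


def facet_counts(index, tag):
    """Number of documents per value of one facet, as a browse sidebar might show."""
    return {key: len(ids) for key, ids in index.get(tag, {}).items()}


if __name__ == "__main__":
    idx = build_index({"ins001": SAMPLE, "ins002": SAMPLE})
    print(idx["persName"])  # {'#person1': ['ins001', 'ins002']}
    print(facet_counts(idx, "placeName"))
```

Keying on `@ref` rather than on the displayed text lets different spellings of the same person or place collapse into one index entry, which is also what makes linking to external typologies straightforward.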
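The word-indexing variants in work package 3 might look roughly as follows. This is a hypothetical sketch: it assumes headwords are encoded with the TEI `@lemma` attribute on `<w>` elements, and the three-entry Latin `LEXICON` is an invented stand-in for a real morphological lexicon. Ambiguous forms such as *manibus* (from *manes* or *manus*) are indexed under every candidate lemma, which is one simple way to handle the "more or less ambiguous" lemmatization of sparsely encoded texts.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented form -> candidate lemmata table; a real project would plug in a
# full lexicon or morphological analyser here.
LEXICON = {
    "dis": ["deus", "dis"],
    "manibus": ["manes", "manus"],
    "sacrum": ["sacer"],
}


def lemma_index_from_markup(xml_text):
    """Variant 1: collect headwords already recorded as <w lemma="...">."""
    index = defaultdict(list)
    for w in ET.fromstring(xml_text).iter(f"{TEI}w"):
        lemma = w.get("lemma")
        if lemma:
            index[lemma].append("".join(w.itertext()).strip())
    return dict(index)


def tokenize(text):
    """Split plain text into lower-cased word tokens (NFC keeps Greek letters whole)."""
    return [t.lower() for t in re.findall(r"\w+", unicodedata.normalize("NFC", text))]


def word_index(text, lexicon=None):
    """Variant 2 (with a lexicon) or variant 3 (word forms only).

    Returns {index key: sorted token positions}. Forms missing from the
    lexicon fall back to being indexed under themselves.
    """
    index = defaultdict(set)
    for position, token in enumerate(tokenize(text)):
        keys = lexicon.get(token, [token]) if lexicon else [token]
        for key in keys:
            index[key].add(position)
    return {key: sorted(pos) for key, pos in index.items()}


if __name__ == "__main__":
    print(word_index("Dis Manibus sacrum", LEXICON))
    # {'deus': [0], 'dis': [0], 'manes': [1], 'manus': [1], 'sacer': [2]}
    print(word_index("Dis Manibus sacrum"))
    # {'dis': [0], 'manibus': [1], 'sacrum': [2]}
```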
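For the RDF export in work package 5, a Pelagios-style record links an inscription to a gazetteer place through an Open Annotation (`oa:Annotation` with `oa:hasTarget` for the document and `oa:hasBody` for the place). The sketch below is an assumption about how such output could be produced, not a description of EFES: it writes N-Triples by hand, the `#annotation-N` URI scheme and all example.org URIs are hypothetical, and LAWD or SNAP:DRGN output would need their own vocabularies.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA = "http://www.w3.org/ns/oa#"

# Characters that may not appear inside an N-Triples IRI reference
# (control characters and space are rejected separately below).
_FORBIDDEN = set('<>"{}|^`\\')


def iri(value):
    """Wrap a URI for N-Triples, rejecting characters that would break the syntax."""
    if any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def place_annotations(document_uri, place_uris):
    """Return N-Triples annotating one document with each distinct place URI.

    Hypothetical URI scheme: annotations are minted as <document#annotation-N>.
    """
    lines = []
    for n, place in enumerate(sorted(set(place_uris)), start=1):
        ann = iri(f"{document_uri}#annotation-{n}")
        lines.append(f"{ann} {iri(RDF_TYPE)} {iri(OA + 'Annotation')} .")
        lines.append(f"{ann} {iri(OA + 'hasTarget')} {iri(document_uri)} .")
        lines.append(f"{ann} {iri(OA + 'hasBody')} {iri(place)} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(place_annotations("https://example.org/inscriptions/ins001",
                            ["https://example.org/places/1"]))
```

In practice the place URIs would come from the `@ref` values already collected for the place index, so the RDF export can reuse the same extraction step.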
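One common approach to the diacritic handling mentioned in work package 6 is to compare normalized forms: case-fold, decompose to Unicode NFD, drop combining marks and recompose. Kiln's search is built on Apache Solr, where this would normally be configured as an analyzer; the Python sketch below only illustrates the logic under that assumption. The `exact` flag is a hypothetical parameter showing a contrasting mode that matches only the form as written.

```python
import re
import unicodedata


def normalize(word):
    """Case-fold and strip diacritics: 'Θεοῖς' and 'θεοις' both become 'θεοισ'."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", base)


def search(text, query, exact=False):
    """Return token positions in text matching query.

    exact=False: diacritic- and case-insensitive match on normalized forms.
    exact=True: the token must match the query as written (after NFC).
    """
    tokens = re.findall(r"\w+", unicodedata.normalize("NFC", text))
    if exact:
        wanted = unicodedata.normalize("NFC", query)
        return [i for i, tok in enumerate(tokens) if tok == wanted]
    wanted = normalize(query)
    return [i for i, tok in enumerate(tokens) if normalize(tok) == wanted]


if __name__ == "__main__":
    text = "ἀνέθηκεν θεοῖς"
    print(search(text, "ΘΕΟΙΣ"))              # [1]
    print(search(text, "θεοις", exact=True))  # []
```

Case-folding also maps final sigma (ς) to σ, so a query typed in capitals or without accents still finds the word.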