Indices spec

Jamie Norrish edited this page Nov 28, 2017 · 5 revisions

Indices Specification

For each index, certain information is required for proper display to the user. This page details the requirements for each index and specifications deriving from those.

Requirements

Each index requires its items to have, for each instance:

  • A link to the inscription
  • The identifier of the inscription
  • The number of the text part containing the instance
  • The line number containing the instance
  • An indicator of whether it is partially or completely restored

Do specific indices have further requirements? IOSPE has tei:num indexed alongside whether it is a simple value, an at least value, or an at most value, but I don't see any use made of this on the front end.

Each item needs to have its language indexed.

In addition to the actual index, each index needs some or all of: title, introduction/preamble, notes, and index-specific table headings.

Minimal Solr footprint

IOSPE's Solr index is large and cumbersome to work with (in terms of time taken to index and the need for a special script), and EFES's approach is designed to avoid that. Rather than having a single document for each instance of an index term, every instance for an item is grouped as multiple values within a single doc. This requires encoding all of the information for an instance into a single value (easily doable for identifier, text part number, line number, and restoration state). It does however preclude faceting on these indices.
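
The grouping described above can be sketched in a few lines of Python. This is only an illustration, not the EFES code: the sample records, the function names, and the simplified "#"-joined encoding with a "true"/"false" restoration marker are all assumptions.

```python
from collections import defaultdict

# Hypothetical instance records:
# (index term, inscription identifier, text part, line, restored)
instances = [
    ("Apollo", "insc001", "1", "3", False),
    ("Apollo", "insc002", "2", "7", True),
    ("Zeus", "insc001", "1", "5", False),
]


def encode_instance(inscription, text_part, line, restored):
    """Pack one instance into a single string value (assumed encoding)."""
    return "#".join([inscription, text_part, line, "true" if restored else "false"])


def group_by_term(records):
    """Return one entry per index term, holding all of its encoded instances."""
    grouped = defaultdict(list)
    for term, inscription, text_part, line, restored in records:
        grouped[term].append(encode_instance(inscription, text_part, line, restored))
    return dict(grouped)


# Three instances collapse into two docs, one per term.
docs = group_by_term(instances)
```

With one doc per term instead of one per instance, the number of Solr documents grows with the size of the vocabulary and not with the number of occurrences, at the cost of no longer being able to facet on individual instances.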

This approach also requires operating on all of the inscriptions at once.

Facets

See https://github.com/EpiDoc/EFES/issues/32

As noted in the section above, faceting is not available, because all instances of the same term are grouped in a single Solr doc in order to keep the index small.

Implementation

Indices are specified in TEI XML files in content/xml/indices, one file per type of XML document to which the indices defined therein apply. For EpiDoc files (which live in content/xml/epidoc/), the indices are defined in content/xml/indices/epidoc.xml.

Each index is defined within the tei:body, in an IDed tei:div with a tei:head, an optional tei:div[@type='notes'], and an optional tei:div[@type='headings']. The head and notes are rendered into HTML in the display of the index. The headings, if specified (in a tei:list), provide the explicit headings for the index table.
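
To make the structure concrete, here is a minimal Python sketch that reads such a definition. The sample document is invented for illustration (the index id, wording, and headings are hypothetical), and it assumes the ID is carried in xml:id and that the headings list uses tei:item children.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical index definition; id, text, and headings are invented.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div xml:id="persons">
        <head>Persons</head>
        <div type="notes"><p>Names are given in their attested form.</p></div>
        <div type="headings">
          <list>
            <item>Name</item>
            <item>Inscriptions</item>
          </list>
        </div>
      </div>
    </body>
  </text>
</TEI>
"""


def read_index_definitions(xml_text):
    """Collect id, title, notes, and headings for each index div in tei:body."""
    ns = {"tei": TEI}
    root = ET.fromstring(xml_text)
    definitions = []
    for div in root.findall(".//tei:body/tei:div", ns):
        notes = div.find("tei:div[@type='notes']", ns)
        headings = div.findall("tei:div[@type='headings']/tei:list/tei:item", ns)
        definitions.append({
            "id": div.get(XML_ID),
            "title": div.findtext("tei:head", default="", namespaces=ns),
            # Notes and headings are optional, so both may be absent.
            "notes": "".join(notes.itertext()).strip() if notes is not None else None,
            "headings": [item.text for item in headings],
        })
    return definitions


definitions = read_index_definitions(SAMPLE)
```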

Solr indexing

Solr indexing is done through the usual Kiln process, with various map:match elements defined in sitemaps/solr.xmap that break the process down into useful pieces. solr.xmap#local-solr-add-indices handles a single index file, creating a document that XIncludes Cocoon URLs to solr.xmap#local-solr-add-index, which is responsible for creating the Solr doc for a specific index within an index file. This makes use of index-specific XSLT in stylesheets/solr. These stylesheets follow a common pattern: loop over groups of nodes that share an index term, and create a doc for each group.
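
The one-doc-per-group pattern is implemented in XSLT in EFES; the Python sketch below only mirrors its shape for readers less familiar with XSLT grouping. The function name and the field names index_name and index_item_name are hypothetical, and the sample location strings assume a "true"/"false" restoration marker.

```python
import xml.etree.ElementTree as ET


def build_solr_add(index_name, grouped):
    """Build a Solr <add> element with one <doc> per index term.

    `grouped` maps each term to its list of encoded instance locations.
    """
    add = ET.Element("add")
    for term, locations in sorted(grouped.items()):
        doc = ET.SubElement(add, "doc")
        # "index_name" and "index_item_name" are hypothetical field names.
        ET.SubElement(doc, "field", name="index_name").text = index_name
        ET.SubElement(doc, "field", name="index_item_name").text = term
        for location in locations:
            # Repeating the same field name supplies multiple values,
            # assuming the field is declared multi-valued in the schema.
            field = ET.SubElement(doc, "field", name="index_instance_location")
            field.text = location
    return add


add = build_solr_add("persons", {
    "Apollo": ["epidoc#insc001#1#3#false", "epidoc#insc002#2.1#7#true"],
    "Zeus": ["epidoc#insc001#1#5#false"],
})
xml_output = ET.tostring(add, encoding="unicode")
```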

The Solr field index_instance_location keeps track of the various pieces of information needed to render a title and link to a document containing the index item. It uses a string with multiple parts separated by "#" to do this. Those parts are:

  • The subdirectory of content/xml containing the document
  • The path to the document, relative to that subdirectory and without the file extension (which is assumed to be .xml)
  • The text part numbers in descending hierarchical sequence, separated by "."
  • The line number of the instance
  • A Boolean marker for whether the instance is restored or not

This string is then parsed on the display side to create a rendering that can mimic that used in IOSPE.
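
In EFES the parsing happens in XSLT on the display side; the following Python sketch shows the same split under stated assumptions. The class and function names are hypothetical, the sample value is invented, and the Boolean marker is assumed to be serialised as "true" or "false".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceLocation:
    subdirectory: str      # subdirectory of content/xml
    path: str              # document path without the .xml extension
    text_parts: List[str]  # text part numbers, outermost first
    line: str              # line number of the instance
    restored: bool         # whether the instance is restored


def parse_location(value: str) -> InstanceLocation:
    """Split a '#'-separated location string into its five parts."""
    subdirectory, path, text_parts, line, restored = value.split("#")
    return InstanceLocation(
        subdirectory=subdirectory,
        path=path,
        text_parts=text_parts.split(".") if text_parts else [],
        line=line,
        # Assumption: the marker is the literal string "true" or "false".
        restored=restored == "true",
    )


def document_file(location: InstanceLocation) -> str:
    """Rebuild the source file path under content/xml."""
    return f"content/xml/{location.subdirectory}/{location.path}.xml"


loc = parse_location("epidoc#insc001#2.1#7#true")
```

The line number is kept as a string here because nothing in the format guarantees that it is a plain integer.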

Templating

The templating for HTML display of an index is designed to allow for a lot of customisation of individual indices. Most index-specific templates will simply inherit from a more general one (e.g., index-epidoc.xml). The XSLT stylesheets/tei/indices-epidoc.xsl and stylesheets/tei/indices.xsl are responsible for displaying the index. Any custom field for an index requires two additions: an xsl:template matching on the Solr arr or str for the field, and an xsl:apply-templates selecting that field in the template matching on result/doc.