Manifest Destiny: Creating an Author-friendly Web Publication Spec

Motivations

As an EPUB author, why do I need to know the MIME type of a WOFF font? Why must I spend so much time building data structures whose only purpose is to allow reading systems to avoid parsing the content documents?

I’ve been thinking about authors vs developers, and how the web seems to be more and more oriented around web developers as its primary constituency.

I’ve been thinking about Hadrien and Readium2 and manifests. And how JSON is a really terrible format for humans to author. And I was wondering how we might avoid basing all of digital publishing on monolithic, complex JSON files.

And I was looking at the web packaging draft. The result of web packaging is a binary file (there’s a “diagnostic” JSON format for humans to use when discussing it, but the real thing is binary and inscrutable). And the format depends on having byte offsets to each component of the package, which a human could never author. The packaging is purely the result of an automated process.

And that was the “aha!” moment. Why are humans tinkering with information meant only for computers? Why am I responsible for EPUB spines and manifests? Why can I make a perfectly valid EPUB where the spine order is different than the nav order? And the answer is that I shouldn’t be doing any of these things, just as when I make a zip file I control the input, but I don’t actually write the zip header files.

EPUB Zero and authoring formats

What we need is an authoring format. EPUB Zero comes pretty close to what I’m imagining—a folder of web content, a nav file to tie it together, and some metadata (some interesting possibilities here). Pure web stuff. Give me that web stuff and a computer program can turn it into EPUB (or a successor) better than I can, constructing manifests and spines and so on from the content itself. And while we’re at it, build validation into that process, and write the results of validation directly into the output format.

If we separate the concerns of authors and implementors, we can better address the needs of both. Authors don’t care about MIME types and manifests. Have an authoring spec that covers their responsibilities, and then if the package needs byte offsets and hashes and signatures and manifests, they can be generated in a manner described by a packaging spec (or extension thereof). And the packaging mechanism could even do stuff like write rel=prev/next into the HTML files, taking implicit information from nav and making it explicit, when it would be too much labor for human authors to do the same.

So much of the complexity of EPUB is because it’s trying to solve the problems of implementors. Let’s just move the boundary between authoring and implementing, and we can make life easier for authors and more predictable for implementors.

I think packaging can be a force for good. Packaging is a way to make all the information implicit in the authored files explicit. The information content is the same.

What We'd Need

An authoring spec, which could be really simple, and entirely consistent with the existing OWP.
A packaging spec, or more accurately an extension to web packaging, describing how to process/map the authored content into a form optimized for browsers/servers/reading systems. This would probably involve constructing some form of manifest from nav + folder contents.
A validation tool, integrated with a packaging tool.
APIs

Does this address the issues?

Web packaging seems to address lots of security issues, as the package is signed by an origin
Packaging also provides the identification of a single URL for the publication
Packages are by definition portable
Marcos is already talking about integrating service workers and web app manifest into the package format, so offline seems reasonable
We avoid the duplication and complexity of EPUB
Integrating processing and validation into the workflow gives us more options to improve a11y
The core technologies involved are already of interest to Google etc

It seems like this model could work for both the ‘pure web; the book has its own UI’ situation and the ‘a reading system takes some content as input’ situation.

Having a processing step as a major part of the process opens up lots of options. For example, the input folder could allow for an ONIX file full of metadata, which could then be translated into schema.org in the final package. Or the output could optionally be EPUB 2.0.1 :)

How Would It Work?

The pieces

An authoring format, EPUB Zero, which is purely HTML with conventions
A packaging format, .wpk
A processing tool and specification, which takes authored content and transforms it into a package
A validation tool, part of the processing.
A web API, the Publication Object Model

EPUB Zero:

must have nav file with well-known name.

html title of nav is publication title
global metadata in head (RDFa, JSON-LD, whatevs)
- title
- creator
- lang
- identifier
or (optional) config file? YAML? CSV? TXT?
various optional metadata files? ONIX? Will probably be different for different markets (journals, magazines)

all content in a folder
tool does all the work:

rel=prev/next/nav added based on nav order if not already in content files
generate some sort of manifest based on folder contents (for SW, PWA, etc)
package
validate
add validation metadata to package (including a11y certs)
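
As a rough illustration of the first step in that list, here is a minimal Python sketch of how a tool might work out the rel=prev/next/nav targets for each content file from the nav order. The function name `link_relations` and the nav filename `nav.html` are assumptions made for the example; nothing here is defined by any spec.

```python
from typing import Dict, List, Optional


def link_relations(
    spine: List[str], nav: str = "nav.html"  # "nav.html" is an assumed well-known name
) -> Dict[str, Dict[str, Optional[str]]]:
    """Map each file in reading order to the prev/next/nav targets a tool could add."""
    relations: Dict[str, Dict[str, Optional[str]]] = {}
    for index, name in enumerate(spine):
        relations[name] = {
            # The first file has no predecessor and the last has no successor.
            "prev": spine[index - 1] if index > 0 else None,
            "next": spine[index + 1] if index + 1 < len(spine) else None,
            "nav": nav,
        }
    return relations
```

A real tool would then write these as link elements into each file, skipping any relation the author has already supplied.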

config.yaml? well-known name?

--- 
contributor: 
  name: "Dave Cramer"
  role: mrk
creator: 
  file-as: "Melville, Herman"
  name: "Herman Melville"
id: code.google.com.epub-samples.moby-dick-basic
language: en-US
metadata: /onix/978031600000X.xml
modified: 2017-06-22
publisher: "Harper & Brothers"
rights: "CC BY-SA 3.0"
title: Moby-Dick

how to deal with nested packages

parent
  stuff.html
  stuff.js
  config.yaml
  child 1 folder
    config.yaml
    stuff.html

FXL

Hell, no.

Processing Rules

manifest = all files in folder
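
A sketch of what "all files in folder" could mean in practice, with the tool (not the author) looking up media types. It uses Python's standard `mimetypes` module; the entry shape (`href`, `type`) and the fallback type are assumptions for illustration, not a proposed manifest format.

```python
import mimetypes
from pathlib import Path
from typing import Dict, List


def build_manifest(folder: str) -> List[Dict[str, str]]:
    """List every file under the folder, with a guessed media type for each."""
    root = Path(folder)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            media_type, _encoding = mimetypes.guess_type(path.name)
            entries.append({
                # Paths are stored relative to the publication folder.
                "href": path.relative_to(root).as_posix(),
                # Assumed fallback when the extension is not recognized.
                "type": media_type or "application/octet-stream",
            })
    return entries
```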

spine =

if nav exists, sequence of files in nav = spine
if some files not in nav, added at end of spine in folder order [NO. THIS IS LINEAR=NO, which is good]
if no nav, all html files in sort order = spine
if no nav, no html files then all image files in sort order
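
The spine rules above can be sketched in Python roughly as follows. This is one possible reading, not a definitive algorithm: it assumes the nav file sits at the folder root, that hrefs are relative to it, and that files missing from nav are simply left out of the spine (the "linear=no" reading). The names `build_spine` and `IMAGE_EXTENSIONS` are made up for the example.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urldefrag

# Assumed set of image extensions; the notes do not enumerate them.
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".svg")


class _NavLinks(HTMLParser):
    """Collect href values from <a> elements in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def build_spine(files: List[str], nav_html: Optional[str] = None) -> List[str]:
    """Derive reading order from nav if present, else from sorted file names."""
    if nav_html is not None:
        parser = _NavLinks()
        parser.feed(nav_html)
        spine: List[str] = []
        for href in parser.hrefs:
            target = urldefrag(href).url  # drop any #fragment
            # Keep the first mention of each file that exists in the folder.
            if target in files and target not in spine:
                spine.append(target)
        return spine
    html_files = sorted(f for f in files if f.lower().endswith((".html", ".xhtml")))
    if html_files:
        return html_files
    return sorted(f for f in files if f.lower().endswith(IMAGE_EXTENSIONS))
```

Because the spine is computed from nav, the two cannot disagree, which is the point of moving this work from the author to the tool.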

metadata =

read from config
if no config,

a. if nav
   - publication title = nav html title
   - use any other metadata found in nav head

b. if no nav
   - use title of first html doc
   - if no html docs
     - use folder name as publication title
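
The metadata fallback chain could look something like this in Python. It is a sketch under assumptions: the function name, its parameters, and the idea that config, nav title, and nav head metadata have already been parsed into plain values are all hypothetical.

```python
from typing import Dict, Optional


def resolve_metadata(
    folder_name: str,
    config: Optional[Dict[str, str]] = None,
    nav_title: Optional[str] = None,
    nav_head_metadata: Optional[Dict[str, str]] = None,
    first_html_title: Optional[str] = None,
) -> Dict[str, str]:
    """Pick publication metadata from the first available source."""
    # 1. A config file, when present, is the only source consulted.
    if config is not None:
        return dict(config)
    # 2. With a nav file, its title wins and other head metadata is carried along.
    if nav_title is not None:
        metadata = dict(nav_head_metadata or {})
        metadata["title"] = nav_title
        return metadata
    # 3. Without nav, fall back to the title of the first HTML document.
    if first_html_title is not None:
        return {"title": first_html_title}
    # 4. With no HTML documents at all, use the folder name.
    return {"title": folder_name}
```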
