Travis Codecov version downloads GPL-2.0 semantic-release Commitizen friendly experimental

CWRC-Writer-Base

The Canadian Writing Research Collaboratory (CWRC) is developing an in-browser text markup editor (CWRC-Writer) for use by collaborative scholarly editing projects. This package is the base code, built on the TinyMCE JavaScript editor, and is meant to be bundled (using Browserify) with two other packages that communicate with servers providing document storage and entity (people, places) lookup. A default version of the CWRC-Writer that uses GitHub for storage and VIAF for entity lookup is available for anyone's use:

http://208.75.74.217

Table of Contents

  1. Overview
  2. Storage and Entity Lookup
  3. Configuration
  4. API
  5. Managers
  6. Modules
  7. Demo
  8. Development

Overview

CWRCWriter is a WYSIWYG text editor for in-browser XML editing and stand-off RDF annotation.
The editor is a customization of the TinyMCE editor.

A 'CWRCWriter' installation is a bundling of the main CWRC-WriterBase (the code in this repository) with
a few other NPM packages that handle interaction (calls to the server from dialogs for user input) with server-side services for:

  • document storage
  • named entity lookup

The default implementation of the CWRC-Writer is the CWRC-GitWriter, which uses GitHub to store documents and uses VIAF, Wikidata, DBpedia, and Getty for named entity (people, places) lookup.

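The dialogs that interact with GitHub and with the entity lookup sources are in the NPM packages cwrc-git-dialogs and cwrc-public-entity-dialogs.

The cwrc-public-entity-dialogs package in turn uses a separate lookup for each of its sources (VIAF, Wikidata, Getty, DBpedia).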

The CWRC-GitWriter (the default CWRC-Writer) therefore bundles (using browserify) those NPM packages together with the CWRC-WriterBase package. You may substitute your own packages with dialogs that interact with your own backend storage and/or entity lookup.

The CWRC-WriterBase itself also provides built-in interaction with default server-side services for:

  • XML Validation
  • XML Schemas
  • documentation and help

CWRC provides a default XML validation HTTP end point that the CWRC-WriterBase is preconfigured to use.
You may substitute your own, but the CWRC-WriterBase expects validation and error messages in a specific format.
Similarly you can substitute your own documentation and help files.

Storage and Entity Lookup

If you choose not to use either the default CWRC GitHub storage or the CWRC named entity lookups then most of the work in setting up CWRCWriter for your project will be in implementing the dialogs to interact with your backend storage and/or named entity lookups. We have split these pieces off into their own packages in large part to make it easier to substitute your own dialogs and supporting services.

A good example to follow when creating a new CWRC-Writer project is our public implementation, CWRC-GitWriter. You might also choose to keep either the CWRC GitHub storage dialogs or the CWRC public entity lookups, both of which are used by the CWRC-GitWriter, and replace just one of the two. To help understand how we've developed the CWRC-Writer, you could also look at our development docs.

To replace either of the storage and entity dialogs, you'll need to create objects with the following APIs:

Storage Object API

load(writer)

save(writer)

where writer is the writer object (i.e., the object defined in the API section).

The storage object for GitHub is implemented here: cwrc-git-dialogs

Each method is invoked by the CWRC-WriterBase whenever the end user clicks the 'save' or 'load' button in the editor.

Each method spawns a dialog that prompts the user to load or save. Because load(writer) and save(writer) are passed an instance of the CWRC writer object, all of the methods defined below in API are available to them, so they can get and set the XML in the writer.

We also define an authenticate method on our cwrc-git-dialogs object to handle the OAuth authentication with GitHub. You may implement your authentication however you like. If you want to follow our approach, see CWRC-GitWriter, where we authenticate before instantiating the CWRC-WriterBase.
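As a rough illustration of the storage object API described above, the following sketch keeps the document in memory instead of on a server. The name createMemoryStorage and the boolean return values are assumptions made for this example only; the only parts taken from this README are the load(writer) and save(writer) method names and the writer methods getDocument() and loadDocumentXML().

```javascript
// Hypothetical in-memory storage object (not part of CWRC-WriterBase).
// A real implementation would open a dialog and talk to a server,
// as cwrc-git-dialogs does for GitHub.
function createMemoryStorage() {
  let savedDocument = null; // last document handed to save()

  return {
    // Invoked by the editor when the end user clicks 'load'.
    load(writer) {
      if (savedDocument === null) {
        return false; // nothing saved yet; return value is illustrative only
      }
      writer.loadDocumentXML(savedDocument);
      return true;
    },

    // Invoked by the editor when the end user clicks 'save'.
    save(writer) {
      savedDocument = writer.getDocument();
      return true;
    }
  };
}
```

An object like this could be passed as config.storageDialogs while prototyping, before real dialogs and a backend exist.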

Entity Lookup API

You have at least two choices here:

  1. You can entirely implement your own dialog for lookup, following the model in CWRCPublicEntityDialogs

  2. You can use CWRCPublicEntityDialogs and configure it with different sources. We provide four sources (viaf, wikidata, getty, DBpedia).

You can use any of these sources, and supplement them with your own sources. CWRCPublicEntityDialogs fully explains how to add your own sources.

API

Constructor

The CWRC-WriterBase exports a single constructor function that takes one argument, a configuration object.

See CWRC-GitWriter/src/js/config.js for an example of a base configuration file, and
CWRC-GitWriter/src/js/app.js to see the configuration file loaded, extended, and passed into the constructor.

Configuration Object

Options that can be set on the configuration object:

Required Options

  • config.container: String. The ID of the element that should contain the CWRC-Writer.
  • config.storageDialogs: Object. Storage dialogs, see cwrc-git-dialogs for example and API definition.
  • config.entityLookupDialogs: Object. Entity lookup, see cwrc-public-entity-dialogs for example and API definition.

Other Options

  • config.cwrcRootUrl: String. An absolute URL that should point to the root of the CWRC-Writer directory. If blank, the browser URL will be used.

  • config.modules: Object. The IDs of the modules to load, grouped by their locations relative to the CWRC-Writer.

    For example:

    config.modules = {
      west: ['structure','entities'],
      east: ['selection'],
      south: ['validation']
    }
    
  • config.annotator: Boolean. If true, the end user may only add annotations to the document.

  • config.readonly: Boolean. If true, the end user may not edit nor annotate the document.

  • config.mode: String. The mode in which to start the CWRC-Writer. Either xml or xmlrdf.

  • config.allowOverlap: Boolean. Should overlapping entities be allowed initially?

  • config.validationUrl: String. The URL to use for XML validation. If blank, will default to the validation service provided by CWRC.

  • config.schemas: Object. A map of schema objects that can be used in the CWRC-Writer. Each entry should contain the following:

    • name: String. The schema title.
    • url: String. A URL that links to the schema (RELAX NG) file.
    • cssUrl: String. A URL that links to the CSS associated with this schema.
    • schemaMappingsId: String. The directory name in the schema directory from which to load mapping and dialogs files for the schema.
    • entityTemplates: Object. Lists URLs for use by citation and note entity dialogs.
  • config.buttons1, config.buttons2, config.buttons3: String. A comma-separated list of plugins to set in the CWRC-Writer toolbars. Possible values: addperson, addplace, adddate, addorg, addcitation, addnote, addtitle, addcorrection, addkeyword, addlink, editTag, removeTag, addtriple, viewsource, editsource, validate, savebutton, loadbutton.
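The options above can be put together roughly as follows. This is only a sketch: the element ID, the placeholder dialog objects, and the helper missingRequiredOptions are invented for illustration, and the commented-out require assumes the package is published as cwrc-writer-base. See CWRC-GitWriter/src/js/config.js for a real configuration.

```javascript
// Placeholder dialog objects; real ones come from packages such as
// cwrc-git-dialogs and cwrc-public-entity-dialogs.
const storageDialogs = { load(writer) {}, save(writer) {} };
const entityLookupDialogs = {};

const config = {
  // Required options
  container: 'cwrcWriterContainer', // assumed ID of an element on the page
  storageDialogs,
  entityLookupDialogs,
  // Other options
  modules: {
    west: ['structure', 'entities'],
    east: ['selection'],
    south: ['validation']
  },
  mode: 'xmlrdf',
  allowOverlap: false,
  readonly: false,
  annotator: false,
  buttons1: 'addperson,addplace,adddate,validate,savebutton,loadbutton'
};

// Hypothetical helper: lists required options that have not been set.
function missingRequiredOptions(candidate) {
  const required = ['container', 'storageDialogs', 'entityLookupDialogs'];
  return required.filter((key) => candidate[key] === undefined || candidate[key] === null);
}

// In a browser bundle, the config would then be passed to the constructor,
// along these lines (package name assumed):
// const CWRCWriter = require('cwrc-writer-base');
// const writer = new CWRCWriter(config);
```

Checking the required options before calling the constructor is a design choice of this sketch, not something the CWRC-WriterBase is documented to require.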

Configuration within documents

Configuration information can be included in the XML documents themselves, to override project settings:

XML/RDF mode

Set the mode with a cw:mode setting in the RDF section:

<rdf:Description rdf:about="http://localhost:8080/editor/documents/null">
    <cw:mode>0</cw:mode>
</rdf:Description>

where allowable values for cw:mode are:

0 = XML & RDF (default - XML & RDF with no overlap)
1 = XML
2 = RDF

Annotation Overlap

Overlapping annotations, those that cross XML tags, are disallowed by default. Enable them with:

<rdf:Description rdf:about="http://localhost:8080/editor/documents/null">
    <cw:allowOverlap>true</cw:allowOverlap>
</rdf:Description>
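To make the two in-document settings concrete, here is a minimal sketch that reads them out of a document string. The function readDocumentSettings is hypothetical and is not how the CWRC-WriterBase itself parses documents; it uses regular expressions only so that it runs without a DOM, and code running in the browser would more likely use an XML parser. The defaults follow the text above: mode 0 and no overlap.

```javascript
// Labels for the cw:mode values listed above.
const MODE_LABELS = { 0: 'XML & RDF', 1: 'XML', 2: 'RDF' };

// Hypothetical helper: extracts cw:mode and cw:allowOverlap from an XML string.
function readDocumentSettings(xmlString) {
  const modeMatch = /<cw:mode>\s*(\d+)\s*<\/cw:mode>/.exec(xmlString);
  const overlapMatch = /<cw:allowOverlap>\s*(true|false)\s*<\/cw:allowOverlap>/.exec(xmlString);

  // Fall back to the documented defaults when a setting is absent.
  const mode = modeMatch ? Number(modeMatch[1]) : 0;
  return {
    mode,
    modeLabel: MODE_LABELS[mode] || 'unknown',
    allowOverlap: overlapMatch ? overlapMatch[1] === 'true' : false
  };
}
```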

Writer object

The object returned by the constructor is defined here: writer.js. The typical properties and methods you'd want to use when implementing your own storage and/or entity dialogs are:

Properties

isInitialized

boolean
Whether the editor has been initialized.

isReadOnly

boolean
Whether the editor is in readonly mode.

isAnnotator

boolean
Whether the editor is in annotate (entities) only mode.

Methods

loadDocumentURL(docUrl)

Loads an XML document from a URL into the editor

loadDocumentXML(docXml)

Loads an XML document (either an XML Document or a stringified version of such) into the editor

getDocument()

Returns the parsed XML document from the editor

getDocRawContent()

Returns the raw content (HTML) from the editor

showLoadDialog()

Convenience method to call the load() method of the object set in the storageDialogs property of the config object passed to the writer.

showSaveDialog()

Convenience method to call the save() method of the object set in the storageDialogs property of the config object passed to the writer.

validate(callback)

Validates the current document. callback(w, valid) is a function where w is the writer and valid is true/false. Fires a documentValidated event if validation is successful.
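If you prefer promises to callbacks in your own dialogs, validate(callback) can be wrapped as in the sketch below. The wrapper validateAsync is hypothetical and not part of the writer object; it relies only on the callback signature described above.

```javascript
// Hypothetical wrapper: resolves with true or false once validation completes.
function validateAsync(writer) {
  return new Promise((resolve) => {
    // The callback receives the writer and the validation result.
    writer.validate((w, valid) => resolve(valid));
  });
}
```

A save dialog could, for example, await validateAsync(writer) and warn the user before saving an invalid document.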

Managers

Tasks within CWRC-Writer are handled by specific managers.

Annotations manager

Handles conversion of entities to annotations and vice-versa.

Dialog manager

Handles the initialization and display of dialogs.

Entities manager

Handles the creation and modification of entities. Stores the list of entities in the current document.

Event manager

Handles the dissemination of events through the CWRC-Writer using a publish-subscribe pattern. See the code for the full list of events.

Schema manager

Handles schema loading and schema CSS processing. Stores the list of available schemas, as well as the current schema. Handles the creation of schema-appropriate entities, via the Mapper.
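For readers unfamiliar with the publish-subscribe pattern used for events, the following generic sketch shows the idea. It is not the CWRC-Writer event API: createEventBus, subscribe, and publish are names invented for this illustration, and only the event name documentValidated comes from this README. See the code for the real interface.

```javascript
// Generic publish-subscribe sketch (hypothetical, not the CWRC-Writer API).
function createEventBus() {
  const handlersByEvent = new Map(); // event name -> Set of handler functions

  return {
    // Registers a handler and returns a function that removes it again.
    subscribe(eventName, handler) {
      if (!handlersByEvent.has(eventName)) {
        handlersByEvent.set(eventName, new Set());
      }
      handlersByEvent.get(eventName).add(handler);
      return () => handlersByEvent.get(eventName).delete(handler);
    },

    // Calls every handler registered for the event; returns how many were called.
    publish(eventName, ...args) {
      const handlers = [...(handlersByEvent.get(eventName) || [])];
      handlers.forEach((handler) => handler(...args));
      return handlers.length;
    }
  };
}
```

The point of the pattern is that the publisher (for example, the code that finishes validation) does not need to know which modules are listening.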

Modules

Modules are self-contained components that add extra functionality to CWRC-Writer. These can be specified in the configuration object using the proper module ID.

Module ID: entities

Displays the list of entities in the current document. Allows for modifying, copying, and deleting of entities.

Module ID: imageViewer

Displays images linked from within the current document. Useful for OCR'd documents.

Module ID: relations

Displays the list of entity relationships (i.e. RDF triples) in the current document. Uses the triple dialog to add new relationships.

Module ID: selection

Displays the markup of the text that's selected in the current document.

Module ID: structure

Displays the markup of the current document in a tree/outline. Useful for navigating and modifying the document.

Module ID: validation

Requests and displays the results of document validation. See validate.

Demo

A running deployment of the CWRC-GitWriter, our default implementation, is available for anyone's use at:

http://208.75.74.217

This demo may well be all that you need as it allows loading and saving to arbitrary GitHub repositories.

Development

CWRC-Writer-Dev-Docs describes general development practices for CWRC-Writer GitHub repositories, including this one.

Testing

The code in this repository is intended to run in the browser, and so we use browser-run to run browserified tape tests directly in the browser.

We decorate tape with tape-promise to allow testing with promises and async methods.

Mocking

We use sinon.

Code Coverage

We generate code coverage by instrumenting our code with istanbul before browser-run runs the tests, then extract the coverage (which istanbul writes to the global object, i.e., the window in the browser), format it with istanbul, and finally report it to codecov.io (Travis actually does this for us).

Transpilation

We use babelify and babel-plugin-istanbul to compile our code, tests, and code coverage with babel.

Continuous Integration

We use Travis.

Note that to allow our tests to run in Electron on Travis, the following has been added to .travis.yml:

addons:
  apt:
    packages:
      - xvfb
install:
  - export DISPLAY=':99.0'
  - Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
  - npm install

Release

We follow SemVer, which Semantic Release makes easy.
Semantic Release also determines the next version number from our commit messages (written in the Commitizen format), publishes to NPM, and finally generates a changelog and a release (including a git tag) on GitHub.