Huan He edited this page Oct 4, 2021 · 21 revisions

Foreword

MedTator is a serverless web tool that focuses on the core steps related to corpus annotation.

What does "serverless" mean?

"Serverless" here is not just about whether you need to prepare a server for MedTator (you can host it on public services instead), and it does not mean that MedTator lacks any annotation functionality. It means that MedTator processes data entirely within your web browser and does not send any information to any server. MedTator does not move any data or annotations out of your local environment, and it does not record any user operations (e.g., mouse clicks or key presses).

Background

Natural language processing (NLP) and machine learning techniques have been widely applied in practice and research, and they usually rely on high-quality annotated datasets. Manual annotation is therefore required to collect additional information from documents, and a suitable tool is needed to reduce this labor-intensive work. To address this need, many text annotation tools have been developed for a variety of tasks, such as text classification, named-entity recognition, and sequence prediction.

However, while existing tools provide many powerful features to cover various needs in text annotation, it is still challenging for non-expert users or annotators to apply these tools to their own research tasks. Therefore, based on feedback from our domain experts and experienced annotators, we propose and implement MedTator to address these challenges.

System Architecture

MedTator is implemented in pure frontend JavaScript, with the annotation schema and files processed in the client’s web browser, which enables installation-free, cross-platform access for both administrators and annotators. Although MedTator is a pure frontend web application that requires no server components, its architecture still follows the Model-View-Controller (MVC) pattern and a refinement of it, the Model-View-ViewModel (MVVM) pattern. The MVVM pattern gives developers a blueprint for building frontend (client) applications with more responsive user interaction and feedback, while avoiding costly duplication of code (e.g., DOM manipulation and CSS updates) and effort across the overall architecture.
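To make the MVVM idea concrete, here is a minimal sketch in plain JavaScript. It is not MedTator's code and does not use Vue.js; the names `createViewModel`, `subscribe`, and `rendered` are hypothetical and exist only for illustration. The view-model wraps the data and notifies any bound view when a property changes, so the view never has to be updated by hand.

```javascript
// Minimal MVVM sketch (hypothetical names, not MedTator's implementation).
// The view-model wraps plain data in a Proxy and notifies subscribers on change.
function createViewModel(initialState) {
  const listeners = new Set();
  const state = new Proxy({ ...initialState }, {
    set(target, key, value) {
      target[key] = value;
      // Tell every bound view which property changed.
      listeners.forEach((fn) => fn(key, value));
      return true;
    },
  });
  return {
    state,
    subscribe(fn) {
      listeners.add(fn);
      // Return a function that removes this subscription.
      return () => listeners.delete(fn);
    },
  };
}

// Model: plain annotation data held by the view-model.
const vm = createViewModel({ tags: [] });

// View: in a browser this would update the DOM; here it records a string.
const rendered = [];
vm.subscribe((key, value) => {
  if (key === 'tags') rendered.push(`tags: ${value.length}`);
});

// Changing the view-model state triggers the view automatically.
vm.state.tags = [{ id: 'T1', text: 'fever' }];
vm.state.tags = [...vm.state.tags, { id: 'T2', text: 'cough' }];
```

A framework such as Vue.js provides this kind of binding out of the box, along with templating and efficient DOM updates, which is why a hand-written version like this one is rarely needed in practice.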

Due to the complexity of the annotation tasks, we designed four tabs, each focusing on a specific task to avoid cognitive overload for users. Although the task for each tab is different, the functions and data structures used by the tabs can be shared. Therefore, we leverage the features provided by Vue.js and other packages to implement MedTator’s architecture and the core functions needed for annotation tasks.

Figure: MedTator system architecture

As shown in the above figure, the architecture of MedTator includes four layers: the user interface layer, the core modules layer, the data persistence layer, and the open-source packages layer.

The user interface layer contains the four tabs for the core annotation tasks, which are built on Metro UI. It provides an experience similar to that of well-known desktop applications. In the core modules layer, we implement a Vue.js-based core app controller to route user requests to the core functions, such as importing schema and annotation files and calculating inter-annotator agreement (IAA). Because rendering tags and other visual effects is demanding, we also implement several modules related to visualization. For example, when showing relation tags, a polyline is drawn on the editor in SVG (Scalable Vector Graphics) format to indicate the entities to be linked. To get the correct coordinates of the polyline in different display modes (i.e., document mode and sentence mode), we developed modules that get the relative tag coordinates in the editor and map them to an SVG path in a different coordinate system. The data persistence layer handles requests to read and write files in various formats.
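The coordinate mapping described above can be sketched as follows. This is an illustrative assumption about how such a mapping might look, not MedTator's actual module: the helper names `toLocalPoint` and `relationPath`, the `lift` parameter, and the choice of anchoring at the top-center of each tag are all hypothetical. The rectangles are assumed to be in viewport coordinates, as a browser's `getBoundingClientRect()` would return them, and `origin` is the top-left corner of the SVG overlay in the same coordinates.

```javascript
// Hypothetical helpers: map tag rectangles (viewport coordinates) to an
// SVG path expressed in the overlay's own coordinate system.
function toLocalPoint(rect, origin) {
  // Anchor at the top-center of the tag, relative to the SVG overlay.
  return {
    x: rect.left + rect.width / 2 - origin.left,
    y: rect.top - origin.top,
  };
}

function relationPath(fromRect, toRect, origin, lift = 10) {
  const a = toLocalPoint(fromRect, origin);
  const b = toLocalPoint(toRect, origin);
  // Rise above the higher of the two tags so the line clears the text.
  const y = Math.min(a.y, b.y) - lift;
  // Polyline: up from the first tag, across, then down to the second tag.
  return `M ${a.x} ${a.y} L ${a.x} ${y} L ${b.x} ${y} L ${b.x} ${b.y}`;
}

// Example values standing in for measured DOM rectangles.
const origin = { left: 100, top: 50 };
const tagA = { left: 120, top: 80, width: 40, height: 16 };
const tagB = { left: 300, top: 80, width: 60, height: 16 };
const d = relationPath(tagA, tagB, origin);
// d === "M 40 30 L 40 20 L 230 20 L 230 30"
```

The resulting string could be assigned to the `d` attribute of an SVG `<path>` element. Because the tag rectangles are re-measured whenever the layout changes, the same function would work in either display mode under these assumptions.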

Open-source Packages

The functions and features of MedTator are built on many open-source packages, which are served from free public content delivery network (CDN) services, so users do not need to install any runtime environment on a server or client (i.e., no Java, Python, R, or other runtime). The open-source packages used and their details are listed below.

| Package Name | Version | Description |
| --- | --- | --- |
| Metro UI 4 | 4.3.2 | Metro 4 is an open-source toolkit for developing with HTML, CSS, and JS for quickly prototyping responsive web pages. |
| jQuery | 3.4.1 | jQuery is a fast, small, and feature-rich JavaScript library for HTML document traversal and manipulation, event handling, Ajax, etc. |
| jQuery UI | 1.12.0 | jQuery UI is a curated set of user interface interactions, effects, widgets, and themes built on top of the jQuery JavaScript library. |
| Vue.js | 2.6.11 | Vue.js is an open-source Model–View–ViewModel frontend JavaScript framework for building user interfaces. |
| jszip | 3.2.0 | JSZip is a JavaScript library for creating, reading, and editing .zip files with a simple API. |
| dayjs | 1.8.36 | Day.js is a minimalist JavaScript library that parses, validates, manipulates, and displays dates and times. |
| CodeMirror | 5.62.0 | CodeMirror is a versatile text editor implemented in JavaScript for editing code in the web browser. |
| PapaParse | 5.3.1 | Papa Parse is a fast in-browser CSV (or delimited text) parser for JavaScript that follows RFC 4180. |
| Shepherd | 8.3.1 | Shepherd is a JavaScript library for guiding users through the main features of a web application. |
| winkNLP | 1.8.0 | winkNLP is a JavaScript NLP library that supports stemming, lexicons, tokenization, lemmatization, etc. |
| Compromise | 13.11.4 | Compromise is a JavaScript NLP library that supports sentence splitting, token normalization, named-entity recognition, etc. |
| xml-formatter | 2.4.0 | xml-formatter is a JavaScript library for converting XML into a human-readable format while respecting the xml:space attribute. |

Run Your Own Copy

As MedTator is a serverless, pure frontend web application, you can fork your own copy on GitHub and run it under a domain name provided by GitHub.

  • First, go to the homepage of the MedTator repository: https://github.com/OHNLP/MedTator
  • Second, find the “Fork” button at the top right, next to the star button. Click it and follow the instructions to fork the MedTator repository to your own GitHub account.
  • Third, go to the settings of your forked repo and switch to the “Pages” section. Set the source to branch “main” and folder “docs”, then save.

GitHub will then assign a domain name to the forked MedTator. After a few minutes, you can access your own MedTator copy at that domain name. For example, if your GitHub account name is username123, your forked MedTator will be available at https://username123.github.io/MedTator by default.
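As a quick illustration of the default address pattern, the small helper below builds the URL from an account name. The function name `defaultPagesUrl` is hypothetical, and the sketch assumes the fork keeps the repository name MedTator and that no custom domain is configured.

```javascript
// Hypothetical helper: default GitHub Pages address for a forked project repo.
// Assumes the fork keeps its repository name and no custom domain is set.
function defaultPagesUrl(username, repo = 'MedTator') {
  return `https://${username}.github.io/${repo}`;
}

const pagesUrl = defaultPagesUrl('username123');
// pagesUrl === "https://username123.github.io/MedTator"
```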

In addition to the default configuration above, you can also specify a different branch or folder to serve as the MedTator homepage, depending on your own situation. More details about forking a repo on GitHub can be found at https://docs.github.com/en/get-started/quickstart/fork-a-repo and more details about GitHub Pages can be found at https://docs.github.com/articles/configuring-a-publishing-source-for-github-pages/ .
