Technical Concept
This paper describes the goals, architecture and implementation of iQvoc - the web-based open source vocabulary management framework.
iQvoc is a web-based vocabulary management framework which provides both an intuitive user interface and Semantic Web interoperability.
iQvoc supports vocabularies that are common to many knowledge organization systems, such as:
- Thesauri
- Taxonomies
- Classification schemes
- Subject heading systems
iQvoc provides comprehensive functionality for all aspects of managing such vocabularies:
- multilingual display and navigation in any web browser
- editorial control for approved versions
- publishing the vocabulary in the Semantic Web
- easy customization according to users' needs
- import of existing vocabularies from a SKOS representation
The goals for developing iQvoc were:
- Create an application that incorporates the features listed in chapter 1.
- Be able to use the complete application as an extendible framework.
This document targets both decision makers and engineers who want insight into the architectural decisions we made to achieve the goals listed above. It describes the current state of iQvoc and therefore acts as a part of the project's technical documentation.
The only hard constraints for the (continuous) development of iQvoc were:
- Runnable on the JVM
- Deployable in Apache Tomcat >= 6.0
- Using a relational Oracle database
These constraints largely originate from the projects iQvoc is being used in by a specific customer.
iQvoc is actively being developed by innoQ Deutschland GmbH and is being employed in a variety of diverse projects.
iQvoc is currently used in several projects; for example, the German Federal Environment Agency (Umweltbundesamt) employs it for the public thesaurus UMTHES.
This chapter describes the history of iQvoc and its iterative development into a generic framework.
When first building iQvoc, we had only a few technical constraints to respect; these are outlined in chapter 2. Based on our expertise with Ruby and Rails, we chose Ruby on Rails as the framework. JRuby allowed us to develop the application in our environment of choice while still meeting the requirements of the production environment (listed in chapter 2).
Version 1.0 was closely tailored to the requirements of UMTHES (Umweltthesaurus). In fact the UMTHES system was a single application strongly lacking generalization and modularization - at this point it was impossible to reuse generic logic and components for other vocabulary implementations in a practical manner.
This situation led us to a consolidation phase wherein we extracted and refactored the iQvoc core logic into a separate component. While establishing a clean split between core logic and customer extensions we made large parts of the core configurable. By providing a central configuration we were able to avoid hacks that otherwise would have been necessary to overload specific core functionality within a special customer extension. The result of these efforts was iQvoc 2.0.
Before open-sourcing iQvoc we introduced some major API changes and feature extensions which led us to a version bump to 3.0 for the initial public release. These changes mainly consisted of:
- Extraction of SKOS-XL support into a separate component
  SKOS-XL (SKOS Extension for Labels) support was tied to the iQvoc core from the beginning because UMTHES required it. Technically speaking, SKOS-XL elevates labels to first-class entities, alongside concepts. SKOS-XL labels can have their own relations between each other and therefore their own URIs. We decided that SKOS-XL should not be core functionality, which led to the extraction of iqvoc_skosxl into a separate library.
- SKOS importer
  iQvoc is able to import standard and valid SKOS data.
In this chapter we document the important architecture choices that enabled us to release and maintain iQvoc as a generic framework.
The core schema is closely tailored to the SKOS (Simple Knowledge Organization System) standard. Vocabulary items can be created as concepts. Concepts can be assigned different names by using so-called labels, i.e. they are labeled. Concepts can be assigned notes of different types, e.g. a definition of what the concept is about. Concepts (as well as collections) can be grouped in collections.
iQvoc employs SKOS in a relational model design. It is in the very nature of the Semantic Web to associate and connect concepts in a vast number of ways. In order to support this we developed configurable relation types.
Example: If one wants to extend iQvoc's standard SKOS concept relations, a new relation class inheriting from the base Concept::Relation class can be implemented and configured. The core configuration provides hooks for every existing relation.
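The following plain-Ruby sketch illustrates the idea of a configurable relation type without requiring Rails. Only the name `Concept::Relation` comes from the text above; the subclasses, the `relation_name` helper and the `RelationConfig` module are hypothetical stand-ins and do not reproduce iQvoc's actual classes or configuration API.

```ruby
# Plain-Ruby sketch; no Rails required. RelationConfig, relation_name and
# the subclasses are hypothetical and only illustrate the mechanism.
module Concept
  class Relation
    attr_reader :owner, :target

    def initialize(owner, target)
      @owner = owner
      @target = target
    end

    # Derive a short name from the class name, e.g. "Broader" -> "broader".
    def self.relation_name
      name.split("::").last.downcase
    end
  end

  class Relation::Broader < Relation; end
  class Relation::Narrower < Relation; end

  # A custom relation type contributed by an extension.
  class Relation::Causes < Relation; end
end

# Hypothetical central configuration: relation classes are stored by name
# and resolved on demand, so an extension can add or replace entries.
module RelationConfig
  class << self
    attr_accessor :relation_class_names

    def relation_classes
      relation_class_names.map { |class_name| Object.const_get(class_name) }
    end
  end

  self.relation_class_names = [
    "Concept::Relation::Broader",
    "Concept::Relation::Narrower"
  ]
end

# An extension registers its own relation type.
RelationConfig.relation_class_names << "Concept::Relation::Causes"

puts RelationConfig.relation_classes.map(&:relation_name).join(", ")
# prints "broader, narrower, causes"
```

Storing class names rather than classes keeps the configuration independent of load order, which is one plausible reason for a string-based configuration like the one shown later for the concept base class.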
In the beginning the primary goal whilst building iQvoc was to develop a thesaurus editor with certain features for one customer project - it is important to recall this as iQvoc is now much more.
The main goals at the beginning were:
- Publishing and editing of one specific thesaurus
  The application should let users collaborate in editing the managed thesaurus. An editing workflow should offer simplified versioning of thesaurus terms and collaboration.
- Deep integration into the Semantic Web
  Supporting SKOS entails supporting concept representations in different RDF (Resource Description Framework) formats. We wanted to be able to implement the various RDF views in a concise and DRY (Don't Repeat Yourself) way. With that came the requirement to support importing standard SKOS data in different RDF formats into an iQvoc instance.
While finishing the mentioned customer project and achieving the goals listed above, we received more requests to implement custom thesauri and vocabularies. This led us to the decision to generalize the architecture of iQvoc, remove any customer- or project-specific components, and restructure it as a hybrid of a stand-alone editing application and a classic framework:
- iQvoc as a framework
  We wanted to be able to reuse a specific amount of code over and over again - the typical case for a framework. Copying parts of the core logic, or all of it, was definitely not an option given the several thesaurus and vocabulary projects for customers. Vocabulary applications embedding iQvoc should still remain customizable. Because of the inherent complexity of abstraction and generalization that comes with creating a software framework, this goal was also the one with the biggest impact on our architectural design decisions.
- iQvoc as a stand-alone application
  Apart from the need to reuse iQvoc as a framework for applications employing vendor-specific customizations, we wanted the software to be usable as a stand-alone application for cases that do not require modifications or extensions of core functionality. This may also be convenient for quick production or demo setups as well as sample instances.
Because of the many typed relations between models the diagram only shows a simplified schema of iQvoc's model classes.
Explanatory notes:
- Matches are currently implemented as a 1..n association for concepts. A Match instance provides a string attribute that can point to another concept URI in the web.
- Collections can contain both concepts and other collections - thus concepts can be organized in a hierarchical way.
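The nesting of collections described in the notes above can be pictured with a small plain-Ruby sketch. The module `CollectionSketch` and its classes are hypothetical and are not iQvoc's model classes; they only show how collections that contain both concepts and collections form a hierarchy.

```ruby
# Plain-Ruby sketch of nested collections; all names are illustrative.
module CollectionSketch
  class Concept
    attr_reader :label

    def initialize(label)
      @label = label
    end
  end

  class Collection
    attr_reader :label, :members

    def initialize(label)
      @label = label
      @members = []
    end

    # A member may be a Concept or another Collection.
    def add(member)
      @members << member
      self
    end

    # All concepts reachable through this collection and its sub-collections.
    def concepts
      members.flat_map { |m| m.is_a?(Collection) ? m.concepts : [m] }
    end
  end
end

water = CollectionSketch::Collection.new("Water")
water.add(CollectionSketch::Concept.new("River"))
     .add(CollectionSketch::Concept.new("Lake"))

environment = CollectionSketch::Collection.new("Environment")
environment.add(water).add(CollectionSketch::Concept.new("Soil"))

puts environment.concepts.map(&:label).join(", ")
# prints "River, Lake, Soil"
```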
Comparison to iQvoc's model schema:
- There is no thesaurus class. An iQvoc installation is self-contained and represents a thesaurus instance. Multiple thesauri can be managed by installing multiple iQvoc instances. Connecting concepts can be done with matches.
- Concept groups and schemes can be implemented by using collections.
As a web application, iQvoc consists of two separate layers of infrastructure: the server side and the client side.
On the server-side iQvoc is based on Ruby on Rails 3 - it is compatible with a variety of SQL databases. Due to its relational database schema it is not compatible with NoSQL databases or key-value stores.
The model layer is implemented using Rails's ActiveRecord ORM. Many model classes make heavy use of Single Table Inheritance (STI).
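For readers unfamiliar with Single Table Inheritance, the sketch below imitates the principle in plain Ruby: all rows live in one table, and a `type` column names the subclass to instantiate. ActiveRecord does this automatically; the module `StiSketch`, its classes and the sample rows are hypothetical and not taken from iQvoc's schema.

```ruby
# Plain-Ruby imitation of the idea behind STI. ActiveRecord performs this
# lookup itself; names and data here are purely illustrative.
module StiSketch
  class Note
    attr_reader :value

    def initialize(value)
      @value = value
    end
  end

  class Definition < Note; end
  class ScopeNote < Note; end

  # Rows of one shared table; "type" holds the name of the subclass.
  ROWS = [
    { "type" => "StiSketch::Definition", "value" => "A natural flowing watercourse" },
    { "type" => "StiSketch::ScopeNote",  "value" => "Use for natural rivers only" }
  ].freeze

  # Instantiate every row as the class named in its "type" column.
  def self.load_all
    ROWS.map { |row| Object.const_get(row["type"]).new(row["value"]) }
  end
end

StiSketch.load_all.each { |note| puts "#{note.class.name}: #{note.value}" }
```

The benefit for a schema with many typed relations and notes is that a new type needs only a new subclass, not a new table.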
As an authentication library we chose Authlogic because of its unintrusive approach. Authorization is implemented using the CanCan library.
HTML rendering uses Rails's ERB template language. Additionally we implemented RDF rendering in both Turtle and RDF/XML using a DSL which was extracted into the open source project IqRdf.
As iQvoc provides RDF rendering in multiple formats it can be easily connected to triple stores like Virtuoso.
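To give an impression of what a Turtle representation of a concept looks like, here is a minimal hand-written renderer in plain Ruby. It deliberately does not use the IqRdf DSL, whose API is not shown in this document; the module `TurtleSketch`, the example URIs and the labels are assumptions made for illustration.

```ruby
# Minimal Turtle output for one SKOS concept. This is NOT the IqRdf DSL;
# it is a hand-rolled sketch with hypothetical names and example URIs.
module TurtleSketch
  SKOS = "http://www.w3.org/2004/02/skos/core#"

  # labels:  hash of language tag => preferred label
  # broader: list of URIs of broader concepts
  def self.render(uri, labels: {}, broader: [])
    statements = ["a skos:Concept"]
    labels.each do |lang, text|
      statements << %(skos:prefLabel "#{escape(text)}"@#{lang})
    end
    broader.each { |b| statements << "skos:broader <#{b}>" }

    "@prefix skos: <#{SKOS}> .\n\n<#{uri}> #{statements.join(" ;\n    ")} .\n"
  end

  # Only backslashes and double quotes are escaped in this sketch.
  def self.escape(text)
    text.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
  end
end

puts TurtleSketch.render(
  "http://example.org/river",
  labels: { en: "River", de: "Fluss" },
  broader: ["http://example.org/water"]
)
```

Output of this kind can be loaded into a triple store, which is the integration path the paragraph above refers to.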
iQvoc's user interface employs progressive enhancement by making use of jQuery, providing a variety of JavaScript widgets to simplify navigation and data entry:
- treeview provides a dynamic tree navigation of hierarchical constructs
- datepicker simplifies entry of dates
- autocomplete provides in-place suggestions when entering references
- jit-rgraph visualizes concept and label relations
Strong use of modularization enables us to develop and maintain a full-featured core software whilst being able to extend it easily with new features or replace core logic. This chapter outlines some of the important modularization techniques we used.
Rails 2.3 introduced a very powerful new feature called "Rails Engines". By using engines one can embed one Rails application in another. The entire iQvoc architecture is based on this technique. Rails 3 elevated the engine feature to a more advanced level and enabled complete "mountable apps".
There are two ways the iQvoc core system can be used:
- Stand-alone
  The iQvoc source code can be cloned in order to set up a vocabulary instance.
- As an engine
  If a vocabulary requires deeper customizations and/or extensions, iQvoc can be mounted into a separate vocabulary application. Every model, controller, view or route that the iQvoc core system provides is then available within that vocabulary application.
In order to be able to run iQvoc as both a stand-alone application and a mountable engine, we had to use a small hack:
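# lib/iqvoc.rb
# Load the engine definition only when iQvoc is not booted as a
# stand-alone Rails application (which defines Iqvoc::Application).
unless Iqvoc.const_defined?(:Application)
  require File.join(File.dirname(__FILE__), '../config/engine')
end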
This works because the constant Iqvoc::Application is only available when iQvoc is booted as a stand-alone Rails application.
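The check itself relies on nothing more than Ruby's `Module#const_defined?`. The sketch below reproduces the decision in isolation; the modules `BootSketch`, `HostedVocabulary` and `StandAloneVocabulary` are hypothetical stand-ins for the `Iqvoc` namespace in the two boot scenarios.

```ruby
# Isolated illustration of the const_defined? check; module names are
# hypothetical stand-ins for the Iqvoc namespace.
module BootSketch
  # Decide which boot path applies to the given namespace.
  def self.boot_mode(namespace)
    namespace.const_defined?(:Application) ? :stand_alone : :engine
  end
end

# Only the library is loaded, e.g. from within a host application.
module HostedVocabulary; end

# The namespace defines its own application class, as a stand-alone
# Rails application would.
module StandAloneVocabulary
  class Application; end
end

BootSketch.boot_mode(HostedVocabulary)     # => :engine
BootSketch.boot_mode(StandAloneVocabulary) # => :stand_alone
```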
The core configuration is implemented as a standard Ruby module. Independent of the given iQvoc setup (stand-alone or engine, see 5.3.1) we recommend leaving the standard configuration module lib/iqvoc.rb as is. Default configuration options are implemented as standard module attributes and can be overridden.
Example
# config/initializers/iqvoc.rb
require "iqvoc"
Iqvoc::Concept.base_class_name = "Concept::MyNamespace::Base"
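How such an overridable module attribute might work internally can be sketched in plain Ruby. The module `VocabConfigSketch` and its classes are hypothetical; the sketch only assumes, in line with the example above, that the configuration stores a class name as a string and resolves it when needed.

```ruby
# Hypothetical stand-in for a configuration module with overridable
# defaults; this is not iQvoc's actual implementation.
module VocabConfigSketch
  class DefaultConcept; end
  class CustomConcept < DefaultConcept; end

  module Concept
    class << self
      # The class name is kept as a string, so the class itself is only
      # looked up when it is actually needed.
      attr_accessor :base_class_name

      def base_class
        Object.const_get(base_class_name)
      end
    end

    # Default value shipped with the core.
    self.base_class_name = "VocabConfigSketch::DefaultConcept"
  end
end

# What an initializer in a host application might do:
VocabConfigSketch::Concept.base_class_name = "VocabConfigSketch::CustomConcept"

VocabConfigSketch::Concept.base_class # => VocabConfigSketch::CustomConcept
```

Because the default lives in the module itself, a host application only has to state the values it wants to change.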
Dependencies are managed with Bundler.
Git is used for revision control. The open source code is hosted on GitHub.
Every release has its own Git tag according to the project's versioning guidelines referenced in 5.9. Every release is pushed to RubyGems as well so iQvoc can be installed via the standard Ruby package management system: gem install iqvoc.
iQvoc uses Semantic Versioning in order to deliver a consistent and understandable versioning schema of the open source code.
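One practical consequence of Semantic Versioning is that a host application can pin iQvoc to a major version and still receive compatible updates. The snippet below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show how a pessimistic constraint behaves; the version numbers are illustrative and not a statement about actual iQvoc releases.

```ruby
require "rubygems"

# "~> 3.0" accepts every 3.x release but rejects the next major version.
requirement = Gem::Requirement.new("~> 3.0")

requirement.satisfied_by?(Gem::Version.new("3.0.0")) # => true
requirement.satisfied_by?(Gem::Version.new("3.4.2")) # => true
requirement.satisfied_by?(Gem::Version.new("4.0.0")) # => false
```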
iQvoc provides a test suite consisting of:
- Unit tests
  Ensure that business logic implemented in the model layer does what it is expected to do.
- Integration tests
  We make use of integration tests in which the site is browsed within a headless WebKit browser. This ensures full-stack testing from the outside in by replaying important user workflows in an automated way.
Resolving dependencies as well as the execution of the test suite and the migration of the database schema are part of a continuous integration process that runs on the open source distributed build platform Travis CI.
Because iQvoc is a generic framework, it can suffer from problems typical of frameworks:
- Convention or configuration?
- How many configuration options are provided?
The flexible configuration is definitely one of the most complex parts of the iQvoc core. Configurable model associations make class definition code rather complex and hard to read. Additionally, model references throughout the code become rather implicit.
These aspects pose a potential risk to code maintainability.
iQvoc features a full-fledged editing and publishing workflow for concepts based on roles and permissions. As described in chapter 5.2.1 we use CanCan to define permissions for roles. We chose a very pragmatic approach for roles: each user can have one role, and the role is stored directly in the user record. The permissions are defined with a DSL that CanCan provides in a so-called ability file.
- When viewing an individual concept or label, editors can choose to edit the respective entry by creating a new version - alternatively, new entries can be created via the user's dashboard
- Creating a new version creates a private revision and locks the entry, preventing concurrent edits by other users
- After editing an entry, the editor can propose the changes for publication
- Proposed changes appear in publishers' dashboard where they can be reviewed and publication can be approved
- Upon publication, the updated version replaces the public version
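The steps above can be read as a small state machine. The following plain-Ruby sketch models them under stated assumptions: the module `WorkflowSketch`, the state names and the rule that the lock is held until publication are hypothetical, and role checks (editor versus publisher) are left out for brevity. It is not iQvoc's implementation.

```ruby
# Plain-Ruby sketch of the editing workflow; names, states and the lock
# lifetime are assumptions, and role checks are omitted.
module WorkflowSketch
  class LockedError < StandardError; end

  class Entry
    attr_reader :published, :draft, :locked_by, :state

    def initialize(published)
      @published = published
      @state = :published
    end

    # Creating a new version makes a private revision and locks the entry.
    def create_version(editor)
      if @locked_by && @locked_by != editor
        raise LockedError, "entry is locked by #{@locked_by}"
      end
      @locked_by = editor
      @draft = @published.dup
      @state = :in_edit
    end

    def edit(editor, new_value)
      ensure_lock_owner!(editor)
      @draft = new_value
    end

    # The editor proposes the private revision for publication.
    def propose(editor)
      ensure_lock_owner!(editor)
      @state = :in_review
    end

    # A publisher approves; the revision replaces the public version.
    def publish
      raise "nothing to publish" unless @state == :in_review
      @published = @draft
      @draft = nil
      @locked_by = nil
      @state = :published
    end

    private

    def ensure_lock_owner!(editor)
      raise LockedError, "entry is not locked by #{editor}" unless @locked_by == editor
    end
  end
end

entry = WorkflowSketch::Entry.new("River")
entry.create_version("alice")
entry.edit("alice", "Stream")
entry.propose("alice")
entry.publish
puts entry.published # prints "Stream"
```

Until `publish` is called, `published` keeps returning the old value, which mirrors the separation between the private revision and the public version.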
Creating versioning logic and state machines is non-trivial and can lead to bugs. We tried to keep the number of bugs in the editing workflow as low as possible by extending the parts of the test suite that cover it.