A collection of GSQ's vocabularies, formulated using SKOS, serialised as RDF (turtle) files.

johnmckellar/vocabularies

GSQ Vocabularies

Introduction

The Geological Survey of Queensland (GSQ) publishes vocabularies - a way to describe things and the relationships between things.

A vocabulary is a set of agreed terms:

  • In GSQ, a vocabulary defines the terms used to describe and represent things in the domain of science and data management.
  • Vocabularies align information within a business area or across systems.
  • Vocabularies can be very complex (with thousands of terms) or very simple (describing one or two concepts only).

Read "Why Vocabularies?" and other topics in the Vocabularies Wiki.

Vocabulary - how it all hangs together

Fig. 1: Vocabulary context diagram

  1. We use tools such as VocBench or Excel to create the vocabulary using SKOS (Simple Knowledge Organization System). See also the SKOS Primer for the basics.
  2. The native format for a vocabulary is a TTL (Turtle) file. This file contains RDF triples: subject > predicate > object statements.
  3. We use GitHub (where you are now) to store and manage versions of vocabulary TTL files. GitHub also provides the workflow functionality used to approve vocabularies. Read the GitHub getting started guide.
  4. We import the TTL files into GraphDB, a triple store that lets us query the triples.
  5. VocPrez presents our vocabs on the web for people and computers to read. VocPrez pulls the triples from GraphDB to create a cache of the vocabularies.
  6. CKAN drop-down form fields pull their values from VocPrez. This ensures that the attributes used to describe a dataset come from the controlled vocabulary.
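
As an illustration of points 2 and 6, the following stdlib-only Python sketch holds a tiny hypothetical vocabulary (the example.org URIs are placeholders, not a GSQ vocab) as subject > predicate > object triples, writes them out as Turtle, and derives the label/URI pairs a drop-down field could offer. It is not how GraphDB, VocPrez or CKAN work internally; a real tool would normally use an RDF library such as rdflib.

```python
from typing import NamedTuple, Union

# Hypothetical two-concept vocabulary; the base URI is a placeholder.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/def/rock-type/"


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    lang: str = "en"


Term = Union[URI, Literal]

scheme = URI(BASE)
granite = URI(BASE + "granite")
basalt = URI(BASE + "basalt")

# Each entry is one subject > predicate > object statement.
triples = [
    (scheme, URI(RDF_TYPE), URI(SKOS + "ConceptScheme")),
    (scheme, URI(SKOS + "prefLabel"), Literal("Rock type")),
    (granite, URI(RDF_TYPE), URI(SKOS + "Concept")),
    (granite, URI(SKOS + "prefLabel"), Literal("granite")),
    (granite, URI(SKOS + "inScheme"), scheme),
    (basalt, URI(RDF_TYPE), URI(SKOS + "Concept")),
    (basalt, URI(SKOS + "prefLabel"), Literal("basalt")),
    (basalt, URI(SKOS + "inScheme"), scheme),
]


def term_to_turtle(term: Term) -> str:
    """Render one term in N-Triples syntax, which is a subset of Turtle."""
    if isinstance(term, Literal):
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{term.lang}'
    return f"<{term.value}>"


def to_turtle(statements) -> str:
    """Serialise triples one per line, each terminated by ' .'."""
    lines = (f"{term_to_turtle(s)} {term_to_turtle(p)} {term_to_turtle(o)} ."
             for s, p, o in statements)
    return "\n".join(lines) + "\n"


def dropdown_options(statements, scheme_uri: URI) -> list:
    """Return sorted (label, concept URI) pairs for concepts in a scheme."""
    in_scheme, pref_label = URI(SKOS + "inScheme"), URI(SKOS + "prefLabel")
    members = {s for s, p, o in statements if p == in_scheme and o == scheme_uri}
    return sorted((o.value, s.value) for s, p, o in statements
                  if s in members and p == pref_label)


print(to_turtle(triples))
print(dropdown_options(triples, scheme))
```

Emitting one statement per line keeps the output valid Turtle without needing prefixes; tools such as VocBench produce a more compact, prefixed layout of the same triples.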

How to create a vocabulary

Fig. 2: Vocabulary build and pull process

  1. Search for existing international, national, and industry standards. Use them directly where possible, augment and adapt them when needed, and create a new original vocabulary only as a final option (see below for links to existing vocabularies).
  2. Create the vocabulary using SKOS (Simple Knowledge Organization System).
  3. Allocate a URI to the vocab.
    • We use linked.data.gov.au for all GSQ vocabs.
    • Arrange URI allocation via the Contacts below.
  4. Export the vocabulary to a TTL file.
    • If using VocBench, it is easier to export the TTL from the Build repository in GraphDB. Follow the instructions here.
  5. Validate the TTL file.
    1. Use the online Skosify tool.
      • This tests for SKOS conformance.
      • Tick the checkbox "Keep skos:related relationships within the same hierarchy" and leave the other checkboxes unticked.
    2. Then use the GSQ vocab SHACL shapes files.
      • This tests for GSQ requirements over and above SKOS, such as particular metadata for the vocab.
      • The files are stored in this repo, in the shapes/ folder.
      • Use the pySHACL tool on your desktop to do the validation.
  6. Commit the TTL file to a development branch in GitHub. Name your branch dev-vocabularyName. See how-to instructions here.
  7. Submit a pull request to the vocabularies repository.
    • Create a branch for your vocab named review-vocabularyName.
    • Add your vocab to that branch.
    • Create a pull request from that review- branch to the master branch and nominate reviewers.
    • Once 2+ reviews have passed (usually a data management staff member and a science domain expert), the final reviewer will merge the review- branch into the master branch and delete the review- branch.
  8. Publication of the vocab to production VocPrez will be automated from here onwards.
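
For vocabularies drafted in a spreadsheet, steps 2 to 4 can be approximated with a short script. This is a minimal sketch, assuming a hypothetical CSV export with id, label, definition and broader columns and a placeholder base URI; it is not the GSQ Excel template or its export process. It mints one concept URI per row under the vocab URI, links child concepts with skos:broader, and rejects ids or parent references that would produce broken links.

```python
import csv
import io
import re

# Hypothetical column layout (id, label, definition, broader); the GSQ Excel
# templates may use different columns.
SAMPLE_CSV = """id,label,definition,broader
igneous,igneous rock,Rock formed by the solidification of magma or lava.,
granite,granite,Coarse-grained intrusive igneous rock rich in quartz and feldspar.,igneous
"""
# Placeholder base URI; in practice use the URI allocated on linked.data.gov.au.
BASE = "https://example.org/def/rock-type/"
SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")
PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


def lit(text: str) -> str:
    """Quote text as an English Turtle string literal (backslash escaped first)."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"), ("\r", "\\r")):
        text = text.replace(raw, esc)
    return f'"{text}"@en'


def rows_to_turtle(csv_text: str, base_uri: str, title: str) -> str:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    ids = {row["id"] for row in rows}
    blocks = [PREFIXES,
              f"<{base_uri}> a skos:ConceptScheme ;\n    dcterms:title {lit(title)} .\n"]
    for row in rows:
        # Reject ids that would produce an invalid or surprising concept URI.
        if not SAFE_ID.fullmatch(row["id"]):
            raise ValueError(f"unsafe concept id: {row['id']!r}")
        props = ["a skos:Concept",
                 f"skos:prefLabel {lit(row['label'])}",
                 f"skos:definition {lit(row['definition'])}",
                 f"skos:inScheme <{base_uri}>"]
        if row["broader"]:
            if row["broader"] not in ids:
                raise ValueError(f"unknown broader concept: {row['broader']!r}")
            props.append(f"skos:broader <{base_uri}{row['broader']}>")
        else:
            # Concepts without a parent become top concepts of the scheme.
            props.append(f"skos:topConceptOf <{base_uri}>")
        blocks.append(f"<{base_uri}{row['id']}> " + " ;\n    ".join(props) + " .\n")
    return "\n".join(blocks)


print(rows_to_turtle(SAMPLE_CSV, BASE, "Rock type"))
```

The generated file still needs to pass the Skosify and SHACL checks in step 5 before it goes into a branch.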

Creation Workflow

The steps outlined above are shown in workflow form on the Vocabulary Review Workflow wiki page.
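
The branch conventions in steps 6 and 7 of the creation process can be expressed as a small rule set. This is a hypothetical helper, not part of this repository or of GitHub's tooling: the characters allowed in vocabularyName are an assumption, and the two-approval threshold comes from the "2+ reviews" rule.

```python
import re

# Branch names follow dev-<vocabularyName> or review-<vocabularyName>.
# The allowed characters in the vocabulary name are an assumption.
BRANCH_RE = re.compile(r"(dev|review)-([A-Za-z0-9][A-Za-z0-9_-]*)")
MIN_APPROVALS = 2  # "2+ reviews" before merging into master


def parse_branch(name: str) -> tuple[str, str]:
    """Split a vocab branch name into (stage, vocabulary name)."""
    m = BRANCH_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"{name!r} is not dev-<vocab> or review-<vocab>")
    return m.group(1), m.group(2)


def next_step(branch: str, approvals: int = 0) -> str:
    """Describe the next workflow action for a branch with some approvals."""
    stage, vocab = parse_branch(branch)
    if stage == "dev":
        return f"create review-{vocab} and open a pull request to master"
    if approvals < MIN_APPROVALS:
        return f"waiting for reviews on review-{vocab} ({approvals}/{MIN_APPROVALS})"
    return f"merge review-{vocab} into master, then delete review-{vocab}"


for branch, approvals in [("dev-rockType", 0), ("review-rockType", 1), ("review-rockType", 2)]:
    print(branch, "->", next_step(branch, approvals))
```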

See also

Repository Contents

  • vocabularies/ - all GSQ's vocabularies, in RDF (Turtle) text files
  • shapes/ - SHACL graph shape files used to validate vocab files before publication
  • scripts/ - Python scripts to dump/load a GraphDB instance with these vocab files
  • templates/ - Excel and other tools to help with vocab creation
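
The scripts/ folder holds the repository's own GraphDB dump/load scripts. Independently of those, the sketch below shows one way to load a TTL file over HTTP, assuming GraphDB's RDF4J-compatible REST API, the default port 7200, a hypothetical repository named vocabs and no authentication. It only builds the request, so it runs without a server.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumptions: GraphDB on its default port 7200, a hypothetical repository
# named "vocabs", and no authentication. GraphDB implements the RDF4J REST API,
# where POST /repositories/<id>/statements adds the request body as RDF.


def build_load_request(ttl: bytes, server: str = "http://localhost:7200",
                       repo: str = "vocabs",
                       graph: Optional[str] = None) -> urllib.request.Request:
    url = f"{server.rstrip('/')}/repositories/{urllib.parse.quote(repo)}/statements"
    if graph:
        # RDF4J expects the context (named graph) as an N-Triples-style <IRI>.
        url += "?" + urllib.parse.urlencode({"context": f"<{graph}>"})
    return urllib.request.Request(url, data=ttl, method="POST",
                                  headers={"Content-Type": "text/turtle"})


ttl = b"<https://example.org/def/rock-type/> a <http://www.w3.org/2004/02/skos/core#ConceptScheme> .\n"
req = build_load_request(ttl, graph="https://example.org/def/rock-type/")
print(req.get_method(), req.full_url)
# With a running GraphDB instance, send it with:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

Loading each vocabulary into its own named graph (the context parameter) keeps vocabs separable, so one can later be cleared or replaced without affecting the rest.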

License

This code repository's content is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0), the deed of which is stored in this repository: LICENSE.

Contacts

Vocabularies owner:
Mark Gordon
Geological Survey of Queensland
Department of Natural Resources, Mines and Energy
Brisbane, QLD, Australia
[email protected]

Technical contact:
Vance Kelly
Geological Survey of Queensland
Department of Natural Resources, Mines and Energy
Brisbane, QLD, Australia
[email protected]

Author:
David Crosswell
Enterprise Architect
Cross-Lateral Enterprises
https://crosslateral.com.au
