Linked Data Authorities by LD4L-Labs

This document describes how to use external authorities directly, how to access the LD4L-Labs cache server, and how to set up a local cache of an external authority's linked data for query and use in an end-user application.

Installation

Using in rails app with QA already installed

If you have a Ruby app with QA installed (e.g. Samvera's Hyrax), all you need to do is copy the authority's config file from one of this repository's directories starting with qa_ (under its config/authorities/linked_data subdirectory) into the config/authorities/linked_data directory in your app and restart the rails server.
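
For example, a minimal sketch of the copy step (the directory and file names below are illustrative; use the authority you actually need):

# illustrative paths -- substitute the qa_ directory and config file for your authority
cp qa_loc/config/authorities/linked_data/loc_direct.json \
   /path/to/your_app/config/authorities/linked_data/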

Using in rails app without QA installed

See the README for Questioning Authority (QA) for instructions on installing QA in your rails app. After it is installed, all you need to do is copy the authority's config file from one of this repository's directories starting with qa_ (under its config/authorities/linked_data subdirectory) into the config/authorities/linked_data directory in your app and restart the rails server.
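
A minimal sketch of adding the gem, assuming Bundler is available (see the QA README for the full installation steps, including mounting the engine in your routes):

# add the qa gem to your Gemfile and install it
bundle add qa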

Using with a non-rails app or standalone alongside a rails app

LD4P QA Server provides a simple rails app that runs a standalone QA Server, which can be accessed by any app in any language using API calls via curl. This document describes the architecture we used to set up a QA Server along with a cache of the authority data. Additional details on setting up a QA Server can be found in the README of the LD4P QA Server code repository. You can explore a working QA Server with all the authorities described in this repository installed at http://lookup.ld4l.org/.
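
For example, a sketch of calling the QA Server API with curl (the authority name in the path is illustrative; the URL pattern follows the QA search endpoint shown in the Debugging section below):

# query the standalone QA Server over HTTP from any language or tool
curl "http://lookup.ld4l.org/qa/search/linked_data/locnames_ld4l_cache?q=twain&maxRecords=3"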

STATUS OF WORK

  • CONFIGS:
    • _ld4l_cache - configs ending in _ld4l_cache are configured to go against our cache of the authorities' data. You can experiment with these, but they are not recommended for production systems. We do not provide warning when we make modifications to this data or the systems supporting it.
    • _direct - configs ending in _direct are configured to go directly against an external authority (e.g. LOC, GeoNames). These have been tested with QA, and we use them in our production apps. You can copy these into your QA system to use them. You will want to test them in your apps to be sure they meet your production needs.
  • QA_SERVER:
    • ld4p/qa_server - The qa server engine has releases which are considered stable. You can launch your own QA server based on these releases for your production apps. See the installation instructions in the README. Open issues if you experience problems.
    • http://lookup.ld4l.org - This is our production QA Server app. You are welcome to experiment with this app, but it is not recommended that you use it in your production system. It is for use by the LD4P grant and partners. Notification will be given to grant partners when changes will be made to this system, but broader notification will not be provided.
  • CACHE: The cache work described in this documentation is primarily set up in support of the LD4P grant and its partners. You are welcome to try out any and all of the work described here. Notification will be given to grant partners when changes will be made to this system, but broader notification will not be provided.

Vocabularies

There are a number of authorities that have been tested with the system architecture described in this document.

See each of the sub-directories of this project starting with qa_. Each addresses a separate authority. Each vocabulary has its own README that addresses configuration and setup specific to that authority. General information about setup is in this README.

NOTE: You can define your own authority configurations. Information on defining configurations is available in the questioning authority (qa) gem README.

Architecture Overview

The system described here is designed to process linked data search results into a format that is easily digestible by end applications. It allows user interfaces to be designed to handle searching and presentation of linked data in a consistent way.

There are two servers that drive this system.

  1. linked data authority server - This server can be hosted by the authority provider, or you can use a locally cached version of the authority. It must be queryable via curl with the query or term URI passed as a parameter. Additionally, it must return a serialization of linked data (e.g. rdf-xml, json-ld, n-triples, turtle).

  2. questioning authority normalization server - This server provides access to linked data authorities and a process for converting linked data from the various ontologies to a normalized JSON format that is the same for all ontologies. There are two primary ways to set up this server.

    • For Ruby applications, include the questioning authority (qa) gem. Then copy the configurations and validations from this repo into the same directory in your app for each authority you want to support.

    • For non-ruby applications, or for ruby applications that want to share a single access point, you can clone or fork ld4p/qa_server, a small standalone application that includes the questioning authority gem (see the sketch after this list). The qa_server app includes all the configurations in this repo. If you do not want all of them, you can remove the unneeded authority configuration and validation files.
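
For the standalone option, a minimal sketch of getting the qa_server app (see its README for the remaining installation and configuration steps):

# clone the standalone QA Server app and install its dependencies
git clone https://github.com/LD4P/qa_server.git
cd qa_server
bundle install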

More information on setting up these servers follows in the Usage section.

More information on working with linked data through the QA gem is available in its README, in the section Linked Open Data (LOD) Authorities.

Usage

Set up a local linked data authority server (optional if accessing an external authority directly)

Basic Steps

These are the basic steps to follow for each authority. We provide a longer list of tasks under Our Workflow as an example of some of the practical considerations.

  • Set up a new triplestore for each authority to hold the cached linked data.
  • Download the triples to a local file.
  • Load the triples into the triplestore. If you are using multiple authorities, it is recommended that you load each authority's triples into a separate triplestore to avoid issues with scale and cross-contamination (see the sketch after this list).
  • It is recommended that you create an index that can be used for efficient queries.
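
A minimal sketch of the download and load steps, assuming an Apache Jena TDB triplestore loaded with tdbloader2 as in Our Workflow below (the download URL and paths are illustrative):

# download the authority's RDF dump (illustrative URL)
curl -L -o authority.nt.gz "https://authority.example.org/dumps/latest.nt.gz"
gunzip authority.nt.gz

# load into a fresh TDB location; keep one location per authority
tdbloader2 --loc=/data/tdb/authority authority.nt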

Our Setup

For reference, here is the server setup that we are using.

Hardware

  • 8-core, 64 GB, 3 GHz Mac Pro (late 2013), macOS Sierra (10.12.6)
  • 32 TB Pegasus-2 Thunderbolt RAID configured as RAID-5

Software

  • Apache Jena Fuseki 2.4.0 provides SPARQL endpoint
  • Apache Tomcat 9.0 runs custom web application(s)
  • Apache Lucene 3.6 provides search interface

Customizations

  • custom per-data-source JSP web application provides search/browse/download functionality
  • custom (generic) SPARQL Tag Library provides API for web apps (available at https://github.com/eichmann/lod-utilities)
  • custom (generic) Lucene Tag Library provides API for web apps (availability on GitHub TBA; Solr equivalent TBA)

We keep multiple triplestores on the same machine using the same RAID. Fuseki is happy talking to multiple underlying triplestores and lets you easily switch between them in its web interface.

Our Workflow

For reference, this is the full workflow we follow when adding a new authority's data.

  • download RDF
  • if necessary, convert to n-triples (required for GeoNames data, for instance; see the conversion sketch after this list)
  • use tdbloader2 to populate a new triplestore
  • configure Fuseki server(s) with triplestore details
  • create new JSP project in Eclipse
  • write one or more indexer programs that populate Lucene indices and run indexer(s)
  • write search/browse/download application logic using the SPARQL and Lucene tags
  • package project as war
  • deploy to Apache Tomcat server(s)
  • add new service to Apache HTTPD virtual host specification
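
A sketch of the conversion step, assuming Apache Jena's riot tool is on the PATH (file names are illustrative, and a dump such as GeoNames may need additional preprocessing first):

# convert downloaded RDF to n-triples before loading with tdbloader2
riot --output=ntriples authority.rdf > authority.nt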

Set up questioning authority normalization server

One time process

For your convenience, the QA Server app was created to serve as the questioning authority normalization server. See QA Server's README for installation instructions.

Per vocabulary process

  • Add a configuration file to /config/authorities/linked_data in the server app for each vocabulary you want to use.

See QA Server's Supported Authorities documentation for more information.

Debugging

Because there are a number of systems integrating with each other, you should test the generated links at each level.

Test that the linked data authority server returns the expected RDF serialization.

Copy the search:url:template and substitute an appropriate value for each variable mapping (e.g. {?query}, {?lang}). Paste this into a browser or access it with curl. Verify that a linked data serialization is returned. NOTE: You might not see any data if your query does not produce any results. The first step in debugging is to confirm that the query is expected to return results.
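
For example, a sketch with hypothetical values substituted into a template (the URL is illustrative; take the real template from the authority's config file):

# illustrative substituted search URL; request a linked data serialization
curl -H "Accept: application/ld+json" "https://authority.example.org/search?query=rice&lang=en"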

Sample json-ld results...

{"@context":
  {"skos":"http:\/\/www.w3.org\/2004\/02\/skos\/core#",
   "isothes":"http:\/\/purl.org\/iso25964\/skos-thes#",
   "onki":"http:\/\/schema.onki.fi\/onki#",
   "uri":"@id",
   "type":"@type",
   "results":
     {"@id":"onki:results",
      "@container":"@list"
     },
   "prefLabel":"skos:prefLabel",
   "altLabel":"skos:altLabel",
   "hiddenLabel":"skos:hiddenLabel",
   "@language":"en"
  },
 "uri":"",
 "results":
   [{"uri":"http:\/\/lod.nal.usda.gov\/nalt\/1293",
     "type":["skos:Concept"],
     "localname":"nalt\/1293","prefLabel":"16SrXI (Rice yellow dwarf group)",
     "lang":"en",
     "vocab":"nalt"
    },
    {"uri":"http:\/\/lod.nal.usda.gov\/nalt\/1892",
     "type":["skos:Concept"],
     "localname":"nalt\/1892",
     "prefLabel":"Acacia holosericea",
     "lang":"en",
     "vocab":"nalt"
    },
    {"uri":"http:\/\/lod.nal.usda.gov\/nalt\/1892",
    "type":["skos:Concept"],
    "localname":"nalt\/1892",
    "prefLabel":"Acacia holosericea",
    "lang":"en",
    "altLabel":"Acacia holosericea A. Cunn.",
    "vocab":"nalt"
   }
  ]
}    

Test that QA returns the expected normalized search results

With the QA server running on port 3002 on your local machine, this will be something like...

http://localhost:3002/qa/search/linked_data/_VOCAB_?q=test&maxRecords=3

Substitute the name of the configuration file you are testing for _VOCAB_.
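
For example (the configuration name below is hypothetical; use one of the config file names from your config/authorities/linked_data directory):

# hypothetical configuration name substituted for _VOCAB_
curl "http://localhost:3002/qa/search/linked_data/nalt_direct?q=test&maxRecords=3"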

Sample qa_server search results...

[
  {"uri":"http://lod.nal.usda.gov/nalt/5133",
   "id":"http://lod.nal.usda.gov/nalt/5133",
   "label":"agglutination tests"
  },
  {"uri":"http://lod.nal.usda.gov/nalt/7182",
   "id":"http://lod.nal.usda.gov/nalt/7182",
   "label":"animal use alternatives"
  },
  {"uri":"http://lod.nal.usda.gov/nalt/1841",
   "id":"http://lod.nal.usda.gov/nalt/1841",
   "label":"intestinal absorption"}]

Test that QA generates the expected linked data authority server URL

If you don't see any results, you can check that the correct URL for the linked data server is generated. In the QA server, search log/development.log for QA Linked Data search url:. Then copy/paste that URL into a browser, or use curl in the terminal, to verify that the generated URL actually returns data from the linked data authority server as expected.
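
If the entry is hard to find, a quick way to pull the most recently logged URL (run from the QA server's application root):

# show the most recently logged authority search URL
grep "QA Linked Data search url" log/development.log | tail -n 1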

Test that QA returns the expected normalized term fetch results

With the QA server running on port 3002 on your local machine, this will be something like...

http://localhost:3002/qa/show/linked_data/_VOCAB_/_URI_OR_ID_

Substitute the name of the configuration file you are testing for _VOCAB_ and the URI or ID of the term you are fetching for _URI_OR_ID_.
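
For example (the configuration name below is hypothetical; the ID comes from the sample results that follow):

# hypothetical configuration name substituted for _VOCAB_, with a FAST ID for _URI_OR_ID_
curl "http://localhost:3002/qa/show/linked_data/oclc_fast_direct/262937"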

Sample qa_server term fetch results...

{"uri":"http://id.worldcat.org/fast/262937",
"id":"262937",
"label":["Clemens, Olivia Langdon, 1845-1904"],
"altlabel":["Twain, Mark, Mrs., 1845-1904","Clemens, Samuel L., Mrs., 1845-1904","Clemens, Livy, 1845-1904","Langdon, Olivia, 1845-1904","Langdon, Livy, 1845-1904"],
"narrower":[""],
"broader":[""],
"sameas":["http://id.loc.gov/authorities/names/n89103996","https://viaf.org/viaf/3272855"],
"predicates":
  {"http://schema.org/name":["Twain, Mark, Mrs., 1845-1904","Clemens, Samuel L., Mrs., 1845-1904","Clemens, Livy, 1845-1904","Langdon, Olivia, 1845-1904","Langdon, Livy, 1845-1904","Clemens, Olivia Langdon, 1845-1904"],
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type":["http://schema.org/Person"],
  "http://www.w3.org/2004/02/skos/core#inScheme":["http://id.worldcat.org/fast/ontology/1.0/#fast","http://id.worldcat.org/fast/ontology/1.0/#facet-Personal"],
  "http://www.w3.org/2004/02/skos/core#prefLabel":["Clemens, Olivia Langdon, 1845-1904"],
  "http://purl.org/dc/terms/identifier":["262937"],
  "http://xmlns.com/foaf/0.1/focus":["http://en.wikipedia.org/wiki/Olivia_Langdon_Clemens"],
  "http://www.w3.org/2004/02/skos/core#altLabel":["Twain, Mark, Mrs., 1845-1904","Clemens, Samuel L., Mrs., 1845-1904","Clemens, Livy, 1845-1904","Langdon, Olivia, 1845-1904","Langdon, Livy, 1845-1904"],
  "http://schema.org/sameAs":["http://id.loc.gov/authorities/names/n89103996","https://viaf.org/viaf/3272855"]}}

Test that QA generates the expected linked data authority server URL for term fetch

If you don't see any results, you can check that the correct URL for the linked data server is generated. In the QA server, search log/development.log for QA Linked Data term url:. Then copy/paste that URL into a browser, or use curl in the terminal, to verify that the generated URL actually returns data from the linked data authority server as expected.

See qa_server's validation process for an interactive approach to testing connections and accuracy.