Knowledge Graph Construction Tools

Description of the tools for knowledge graph construction

Tool X (TEMPLATE):

  • Name of the tool:
  • Description:
  • Repository (link to the tool’s repository):
  • Website (if is different to the repository):
  • Open source? (If not open sourced, ideally provide an option to test it):
  • Year introduced:
  • Contact person (who is the main contact person?):
  • Purpose (what can one do with the tool?): Select one of these options: Processor (executes rules to generate a knowledge graph), editor (automatic or manual generation of mapping rules), other (e.g., pre-processing)
  • Mapping language: (which mapping language(s) is supported by the tool)
  • Supported data (formats, sizes):
  • Programming language:
  • Special features:
  • DOI:
  • License:
  • Test cases: (if any for the supported languages)
  • Related use cases: (specify use cases shared with the community group (if any) where the tool is used)
  • Related projects: (specify projects (if any) where the tool is used, ideally provide links to the projects descriptions)
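
For readers who want to process the entries below programmatically, the template can be modelled as a small record type. This is only a minimal Python sketch: the class name `ToolEntry`, its field names, and the `validate` helper are illustrative assumptions and not part of this catalogue, and only a subset of the template fields is covered.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Purpose values named in the template (lower-cased here for comparison).
ALLOWED_PURPOSES = {"processor", "editor", "other"}


@dataclass
class ToolEntry:
    """One catalogue entry; field names are assumed, mirroring the template."""
    name: str
    repository: Optional[str]
    open_source: bool
    year_introduced: int
    purpose: List[str]
    mapping_languages: List[str] = field(default_factory=list)
    supported_data: List[str] = field(default_factory=list)
    license_name: Optional[str] = None

    def validate(self) -> List[str]:
        """Return a list of human-readable problems (empty if none found)."""
        problems = []
        if not self.name.strip():
            problems.append("name is empty")
        unknown = [p for p in self.purpose if p.lower() not in ALLOWED_PURPOSES]
        if unknown:
            problems.append("unknown purpose: " + ", ".join(unknown))
        return problems
```

Calling `validate()` on an entry whose purpose is `["Processor"]` returns an empty list, while a purpose outside the three template options is reported as a problem.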

Tool 1:

Tool 2:

  • Name of the tool: RMLEditor
  • Description: The RMLEditor offers a graphical user interface to create rules to generate knowledge graphs based on heterogeneous data sources.
  • Repository (link to the tool’s repository): https://github.com/RMLio/rmleditor-ce
  • Website (if is different to the repository): https://app.rml.io/rmleditor/
  • Open source? (If not open sourced, ideally provide an option to test it): No
  • Year introduced: 2016
  • Contact person (who is the main contact person?): Pieter Heyvaert ([email protected])
  • Purpose (what can one do with the tool?): editor
  • Mapping language: [R2]RML
  • Supported data (formats, sizes): CSV, JSON, XML
  • Programming language: HTML/CSS/JS
  • Special features: Uses LOV to find relevant classes and properties. Uses MapVOWL to visualize rules.
  • DOI: N/A
  • License: Free community edition with limitations and paid edition without limitations.
  • Test cases: None
  • Related use cases: None
  • Related projects: DyVerSIFy, MOS2S, COMBUST

Tool 3:

  • Name of the tool: RMLMapper
  • Description: The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
  • Repository (link to the tool’s repository): https://github.com/RMLio/rmlmapper-java
  • Website (if is different to the repository): https://rml.io
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2014
  • Contact person (who is the main contact person?): Ben De Meester ([email protected])
  • Purpose (what can one do with the tool?): Processor
  • Mapping language: RML, R2RML
  • Supported data (formats, sizes): local and remote files (CSV using ql:CSV or CSVW, JSON using JSONPath, XML using XPath), databases (MySQL, PostgreSQL, SQLServer, OracleDB). The mapper is in-memory, so query result size should be less than the machine's memory
  • Programming language: JAVA
  • Special features: Extensible in terms of supported data formats, Configurable and extensible data transformations using https://FnO.io, interdatasource join. Reference implementation of RML.
  • DOI: N/A
  • License: MIT
  • Test cases: https://rml.io/test-cases/
  • Related use cases: betweenourworlds-anime, idlab-covid19, idlab-dbpedia, idlab-facebook, idlab-twitter, idlab-velopark
  • Related projects: EcoDaLo, ESSENCE, DAIQUIRI, DiSSeCt
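
To illustrate what a "Processor" does in general terms, the sketch below applies a subject template and a set of predicate-to-column rules to the rows of a CSV file and emits triples. It is not RMLMapper's API (RMLMapper is a Java tool driven by RML documents); the function names `expand_template` and `map_csv` and the rule format are hypothetical. Like the in-memory behaviour described above, it loads all rows before mapping them. It also skips the IRI-safe encoding of template values that a real processor would perform.

```python
import csv
import io
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def expand_template(template: str, row: Dict[str, str]) -> str:
    """Replace each {column} placeholder with the value from the row."""
    return re.sub(r"\{(\w+)\}", lambda m: row[m.group(1)], template)


def map_csv(text: str, subject_template: str,
            predicate_columns: Dict[str, str]) -> List[Triple]:
    """Generate (subject, predicate, object) triples from CSV text.

    All rows are read into memory first, so the input size is bounded
    by the available RAM.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    triples = []
    for row in rows:
        subject = expand_template(subject_template, row)
        for predicate, column in predicate_columns.items():
            triples.append((subject, predicate, row[column]))
    return triples
```

For example, mapping a two-row CSV with columns `id` and `name` using the template `http://example.org/person/{id}` yields one name triple per row.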

Tool 4:

Tool 5:

  • Name of the tool: Morph-GraphQL
  • Description: Morph-GraphQL is an open source system for generating GraphQL servers automatically from declarative mappings such as R2RML or RML. Currently, Morph-GraphQL can generate GraphQL servers in JavaScript over SQL databases. Experimental features include generating GraphQL servers in other languages (e.g. Java) and for other data models (e.g. MongoDB).
  • Repository: https://github.com/oeg-upm/morph-graphql
  • Website: https://morph.oeg.fi.upm.es/tool/morph-graphql
  • Open source: Yes
  • Year introduced: 2019
  • Contact person: David Chaves ([email protected])
  • Purpose: Processor
  • Mapping language: R2RML and RML
  • Supported data: SQL (tested with H2) and NoSQL (experimental, tested with MongoDB)
  • Programming language: JavaScript/Node.js
  • DOI: N/A
  • License: Apache-2.0
  • Related use cases: N/A
  • Related projects: N/A

Tool 6:

  • Name of the tool: Mapeathor
  • Description: Mapeathor is a simple spreadsheet parser able to generate mapping rules in three mapping languages: R2RML, RML (with an extension for FnO functions) and YARRRML. It takes the mapping rules expressed in a spreadsheet (designed to facilitate the mapping rule writing process) and transforms them into the desired language.
  • Repository (link to the tool’s repository): https://github.com/oeg-upm/Mapeathor
  • Website (if is different to the repository): https://morph.oeg.fi.upm.es/tool/mapeathor
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2019
  • Contact person (who is the main contact person?): Ana Iglesias ([email protected])
  • Purpose (what can one do with the tool?): Editor
  • Mapping language: R2RML, RML, YARRRML
  • Supported data (formats, sizes): Excel
  • Programming language: Python
  • License: Apache-2.0
  • Test cases: None
  • Related use cases: None
  • Related projects: Ciudades Abiertas
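
The idea of turning tabular mapping rules into a mapping document can be sketched as follows. This is not Mapeathor's code or its actual spreadsheet layout: the function name `rows_to_yarrrml`, the `predicate`/`object` row keys, and the single mapping called `main` are assumptions made for illustration. The output follows the basic YARRRML shape (`prefixes`, `mappings`, `sources`, `s`, `po`).

```python
from typing import Dict, List


def rows_to_yarrrml(prefixes: Dict[str, str], source: str, subject: str,
                    rows: List[Dict[str, str]]) -> str:
    """Render spreadsheet-like rule rows as a small YARRRML document.

    Each row is assumed to hold one predicate-object rule.
    """
    lines = ["prefixes:"]
    lines += [f'  {prefix}: "{iri}"' for prefix, iri in prefixes.items()]
    lines += [
        "mappings:",
        "  main:",
        "    sources:",
        f"      - ['{source}~csv']",
        f"    s: {subject}",
        "    po:",
    ]
    lines += [f"      - [{r['predicate']}, {r['object']}]" for r in rows]
    return "\n".join(lines) + "\n"
```

Keeping the rules as plain rows is what makes the spreadsheet approach convenient: the same rows could be rendered into R2RML or RML by swapping the rendering function.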

Tool 7:

Tool 8:

Tool 9:

  • Name of the tool: SPARQL micro-services

  • Description: The SPARQL Micro-Service architecture is meant to unlock data silos hidden behind proprietary Web APIs by equipping them with a lightweight SPARQL endpoint. The whole idea is about bringing Web APIs into the Web of Data and making it possible to integrate Linked Data and Web APIs within a simple federated SPARQL query.

    A SPARQL micro-service encapsulates a Web API and typically yields a small, resource-centric graph generated dynamically. It can be seen as a configurable SPARQL endpoint in that it expects parameters, e.g. a SPARQL micro-service to find photos from Snapchat may expect tags.

    An interesting use of SPARQL micro-services is to assign dereferenceable URIs to Web API resources that do not have URIs in the first place. For instance, https://sparql-micro-services.org/ld/flickr/photo/31173091626 is the dereferenceable URI of a photo in Flickr. The content is generated dynamically based on the photo identifier.

  • Repository (link to the tool’s repository): https://github.com/frmichel/sparql-micro-service

  • Website (if is different to the repository): example SPARQL micro-services: https://sparql-micro-services.org/

  • Open source? (If not open sourced, ideally provide an option to test it): yes

  • Year introduced: 2018

  • Contact person (who is the main contact person?): Franck Michel ([email protected])

  • Purpose (what can one do with the tool?): processor, other

  • Mapping language: SPARQL CONSTRUCT

  • Supported data (formats, sizes): mainly JSON-based Web APIs, XML-based Web APIs can be adapted too

  • Programming language: php

  • Special features:

    • Docker deployment ready
    • Assign dereferenceable URIs to Web API resources (bridge Web APIs and LOD)
    • Provide provenance information as part of the graph generated
    • Simple configuration with a config.ini file, or with a rich SPARQL Service Description and SHACL shapes graph
    • Dynamic generation of HTML documentation + test interface from the SPARQL micro-service Service Description (see example)
    • Automatic markup of HTML documentation as schema.org Dataset to allow web-scale discoverability of SPARQL micro-services, e.g. with Google Dataset Search
  • DOI: n/a

  • License: Apache 2.0

  • Test cases: n/a

  • Related use cases: https://github.com/kg-construct/use-cases/blob/master/inria-kg-vs-webapis.md

  • Related projects: Taxref-Web (private access only, comparison of 20+ Web APIs in the biodiversity domain). Multiple hands-on sessions experimented successfully with various Web APIs: Flickr, Youtube, Twitter, Spotify, Deezer, Musicbrainz...
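
The core idea described above, wrapping a Web API response into a small resource-centric graph with dereferenceable URIs, can be sketched as follows. The real tool is written in PHP and uses SPARQL CONSTRUCT for the mapping; this Python stand-in only illustrates the principle. The function name `photos_to_graph` and the JSON layout are hypothetical (this is not Flickr's actual response format), while the URI pattern is the one from the example in the description.

```python
import json
from typing import List, Tuple

Triple = Tuple[str, str, str]

# URI pattern taken from the description; the JSON layout below is made up.
BASE = "https://sparql-micro-services.org/ld/flickr/photo/"
SCHEMA = "http://schema.org/"


def photos_to_graph(api_response: str) -> List[Triple]:
    """Turn a (hypothetical) JSON photo listing into a list of triples."""
    triples = []
    for photo in json.loads(api_response)["photos"]:
        subject = BASE + str(photo["id"])  # mint a URI from the API identifier
        triples.append((subject, SCHEMA + "name", photo["title"]))
        for tag in photo.get("tags", []):
            triples.append((subject, SCHEMA + "keywords", tag))
    return triples
```

Because the subject URI is derived only from the photo identifier, the same function could serve both a query and a later lookup of that URI.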

Tool 10:

  • Name of the tool: WordLift Plugin
  • Description:

WordLift is a WordPress plugin that brings state-of-the-art semantic technologies to the hands of any blogger and publisher: without requiring any technical skills, it helps produce richer content and organize it by suggesting facts and information to provide readers with meaningful context and adding semantic markup to the text to help machines fully understand any website.

Features:

  • Text Analysis: WordLift analyzes content and identifies matching entities organized in 4 categories: Who, What, When and Where.
  • Tag Content: Editors can accept the suggested entities to add contextual info for the user, efficiently selecting internal links to existing content.
  • Create New Entities: Editors can create new entities, providing additional context and enriching the web site's Knowledge Graph. WordLift learns these entities and detects them the next time they appear.
  • Edit Entities: Editors can edit all entities to customize the Knowledge Graph around the web sites' audiences and build new relationships.
  • Images: WordLift suggests open license images and media from your own library, saving the time usually spent searching for visuals.
  • Geomaps: Locations in articles can quickly be mapped adding the Geomap widget.
  • Timelines: Events can be displayed chronologically adding the Timeline widget.
  • Chords: Visualize what relates to what in every article adding the Chord widget.
  • Navigator: Recommend relevant articles adding the Navigator widget.
  • Faceted Search: Suggest additional content related to the topics found in your article, letting readers dive into your archive with the Faceted Search widget.
  • Meaningful Navigation: WordLift automatically identifies topics in articles, using Wikipedia’s classification system. This makes it possible to create new entry points for content based on topics, events, people and places.
  • Publish Search Data: WordLift automatically adds schema.org markup to articles, allowing search engines to properly index and display content and intelligent agents such as Siri and Alexa to access it.
  • Publish Linked Data: WordLift publishes content’s metadata.

Tool 11:

  • Name of the tool: RocketRML
  • Description: An efficient RML mapper implementation in JavaScript for the RDF Mapping Language (RML).
  • Repository (link to the tool’s repository): https://github.com/semantifyit/RocketRML
  • Website (if is different to the repository): https://semantifyit.github.io/RocketRML/
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2019
  • Contact person (who is the main contact person?): Umutcan Simsek ([email protected])
  • Purpose (what can one do with the tool?): Processor
  • Mapping language: RML (in Turtle and YARRRML syntax)
  • Supported data (formats, sizes): CSV, JSON, XML. Tested with 500k triples (takes ~20s)
  • Programming language: JavaScript (Node.js)
  • Special features: It efficiently maps hierarchical sources by using caching mechanisms for iterators and JOIN results. Available as a tool with CLI and as an NPM package. A Dockerfile is also provided. Please see the GitHub repository.
  • DOI: n/a
  • License: CC-BY-SA-4.0
  • Test cases: n/a
  • Related use cases: TBD
  • Related projects: semantify.it, MindLab
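
The benefit of caching JOIN results can be illustrated with a generic hash-index join. This is not RocketRML's implementation (which is in JavaScript and whose internals are not described here); it is a minimal Python sketch of the general technique, with hypothetical function names. The parent source is indexed once, so every child row finds its matches by lookup instead of rescanning all parent rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def build_index(rows: Iterable[Row], key: str) -> Dict[str, List[Row]]:
    """Group rows by their join value so lookups avoid a full scan."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def join(children: List[Row], parents: List[Row],
         child_key: str, parent_key: str) -> List[Tuple[Row, Row]]:
    """Pair every child row with the parent rows sharing its join value."""
    index = build_index(parents, parent_key)  # built once, reused for all children
    return [(c, p) for c in children for p in index.get(c[child_key], [])]
```

With n child rows and m parent rows, this replaces roughly n × m comparisons with one pass over each source.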

Tool 12:

Tool 13:

Tool 14:

  • Name of the tool: CARML
  • Description: An extensible RML mapping engine with built-in support for JSON, CSV, and XML
  • Repository (link to the tool’s repository): https://github.com/carml/carml
  • Website (if is different to the repository):
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2017
  • Contact person (who is the main contact person?): Pano Maria ([email protected])
  • Purpose (what can one do with the tool?): Processor
  • Mapping language: RML
  • Supported data (formats, sizes): CSV, JSON, XML
  • Programming language: JAVA
  • Special features: Easily extensible for other formats. InputStream extension for easy programmatic binding of sources. XML document extension to be able to use namespace prefix mappings in XPath expressions. Support for FnO functions.
  • DOI: n/a
  • License: MIT
  • Test cases: https://rml.io/test-cases/
  • Related use cases: Kadaster Data Platform
  • Related projects: Kadaster Data Platform (PDOK), Zazuko XRM, DotWebStack Framework

Tool 15:

  • Name of the tool: Helio
  • Description: Helio is a framework that allows publishing RDF data from different heterogeneous sources as Linked Data
  • Repository (link to the tool’s repository): https://github.com/oeg-upm/Helio
  • Website (if is different to the repository): https://oeg-upm.github.io/Helio/
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2018
  • Contact person (who is the main contact person?): Andrea Cimmino ([email protected])
  • Purpose (what can one do with the tool?): Processor (executes rules to generate a knowledge graph), Publish Knowledge Graph.
  • Mapping language: RML, WoT-Mapping, and Helio mapping
  • Supported data (formats, sizes): CSV, XML, HTML, text, JSON, RDF
  • Programming language: Java
  • Special features: relies on a plugin system that does not require developers to download the core code, customizable HTML views, can integrate existing tools that generate RDF.
  • DOI:
  • License: APACHE 2.0
  • Test cases: (if any for the supported languages)
  • Related use cases: -
  • Related projects: VICINITY H2020, DELTA H2020, BIMER H2020

Tool 16:

  • Name of the tool: FunMap
  • Description: FunMap is an interpreter of RML+FnO that converts a data integration system defined using RML+FnO into an equivalent data integration system where RML mappings are function-free.
  • Repository (link to the tool’s repository): https://github.com/SDM-TIB/FunMap
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2020
  • Contact person (who is the main contact person?): Samaneh Jozashoori ([email protected])
  • Purpose (what can one do with the tool?): It can be applied when pre-processing steps are expressed as functions within the mapping rules, i.e. when data pre-processing is meant to be performed during the data model transformation (into RDF) and knowledge graph creation.
  • Mapping language: RML (current version)
  • Supported data (formats, sizes): CSV, RDB
  • Programming language: Python
  • Special features: FunMap empowers the knowledge graph creation process with optimization techniques to reduce execution time.
  • DOI: https://doi.org/10.5281/zenodo.3993657
  • License: Apache-2.0
  • Test cases: -
  • Related use cases: -
  • Related projects: CLARIFY, P4-LUCAT, Ciudades Abiertas
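
One way to picture a "function-free" rewriting is to evaluate each function ahead of time and store its result as an ordinary column, so the remaining mapping rules only reference plain data. The sketch below is a simplified illustration under that assumption, not FunMap's actual algorithm; the function name `materialise_function` and the row format are hypothetical. It evaluates the function once per distinct input value, which is where the time saving comes from when values repeat.

```python
from typing import Callable, Dict, List


def materialise_function(rows: List[Dict[str, str]], input_column: str,
                         output_column: str,
                         func: Callable[[str], str]) -> List[Dict[str, str]]:
    """Return copies of the rows with the function result stored as a column.

    func is called once per distinct input value, not once per row.
    """
    distinct_values = {row[input_column] for row in rows}
    cache = {value: func(value) for value in distinct_values}
    return [{**row, output_column: cache[row[input_column]]} for row in rows]
```

After this step, a mapping rule that previously called the function on `input_column` can simply reference `output_column`.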

Tool 17:

  • Name of the tool: Squerall
  • Description: An implementation of the so-called Semantic Data Lake, a query engine uniformly accessing original large and heterogeneous data sources using Semantic Web principles and technologies
  • Repository (link to the tool’s repository): https://github.com/EIS-Bonn/Squerall
  • Website (if is different to the repository): https://eis-bonn.github.io/Squerall/
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2017
  • Contact person (who is the main contact person?): Mohamed Nadjib Mami ([email protected])
  • Purpose (what can one do with the tool?): Processor (executes rules to generate a knowledge graph). Squerall is a virtual OBDA (Ontology-Based Data Access) engine, where the knowledge graph is only constructed on the fly at query time. However, with a slight development effort, it would be possible to physically materialize the knowledge graph (in RDF) following a scheme similar to property table partitioning.
  • Mapping language: RML
  • Supported data (formats, sizes): CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.), (beta) Elasticsearch. Squerall can be extended to support other sources
  • Programming language: Scala, Java
  • Special features: Uses SPARQL to query popular distributed data sources (e.g. files in Hadoop, NoSQL stores) on the fly, i.e. without requiring pre-processing or ingestion. Disparate data can be made joinable by declaratively altering some of its attributes, thanks to the use of the FnO ontology. State-of-the-art Big Data query engines, namely Apache Spark and Presto, are used for querying. Squerall can be extended programmatically to use other query engines (e.g. Drill or Dremio).
  • DOI: https://zenodo.org/record/2636436#.X3tOY_kzZPY
  • License: Apache-2.0

Tool 18:

  • Name of the tool: Chimera
  • Description: Chimera is a tool to build conversion pipelines leveraging Semantic Web technologies. It is built on top of Apache Camel to easily configure message-to-message mediators or batch converters using lifting/lowering procedures to/from a reference ontology. In principle, the aim is to avoid coding entirely by just configuring a pipeline from the various blocks provided.
  • Repository (link to the tool’s repository): https://github.com/cefriel/chimera
  • Open source?: Yes
  • Year introduced: 2019
  • Contact person (who is the main contact person?): Mario Scrocca ([email protected])
  • Purpose (what can one do with the tool?): A basic Chimera pipeline involves a lifting Processor (fork of the RMLMapper) Tool and a lowering Processor (rdf-lowerer built on Apache Velocity). Additional blocks, e.g., for pre-processing/enrichment of the knowledge graph, can be integrated in the pipeline.
  • Mapping language: RML for lifting, extended VTL (Velocity Template Language) for lowering
  • Supported data (formats, sizes): CSV, JSON, XML
  • Programming language: Java
  • Special features: High configurability of pipelines to satisfy different data integration requirements using Semantic Web Technologies. Easy to integrate with existing data sources and sinks thanks to Apache Camel components.
  • License: Apache-2.0
  • Related use cases: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
  • Related projects: http://sprint-transport.eu/
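
The lifting/lowering pattern mentioned above can be sketched in a few lines. This is not Chimera's code: Chimera is configured through Apache Camel and uses RML for lifting and Velocity templates for lowering. Here Python's `string.Template` stands in for the template engine, and the `EX` namespace, the record fields, and the function names are hypothetical.

```python
from string import Template
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"  # hypothetical reference vocabulary namespace


def lift(record: Dict[str, str]) -> List[Triple]:
    """Lifting: convert a source record into triples."""
    subject = EX + "stop/" + record["stop_id"]
    return [(subject, EX + "name", record["stop_name"])]


def lower(triples: List[Triple], template: Template) -> str:
    """Lowering: render triples into a target text format via a template."""
    return "\n".join(
        template.substitute(id=s.rsplit("/", 1)[-1], name=o)
        for s, p, o in triples
        if p == EX + "name"
    )
```

Chaining `lift` and `lower` converts one message format into another by way of the shared vocabulary, which is the role a pipeline of lifting and lowering blocks plays.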

Tool 19:

  • Name of the tool: Ontario
  • Description: A federated query processing engine that is able to access heterogeneous data sources in a Semantic Data Lake. Ontario leverages the concept of RDF Molecule Templates to effectively and efficiently decompose, plan and execute SPARQL queries over a federation of data sources. The given SPARQL queries are translated into the query languages of the data sources in a Semantic Data Lake using mapping rules expressed in the RML language.
  • Repository (link to the tool’s repository): https://github.com/SDM-TIB/Ontario
  • Website (if is different to the repository): https://labs.tib.eu/info/projekt/ontario/
  • Open source? (If not open sourced, ideally provide an option to test it): Yes
  • Year introduced: 2017
  • Contact person (who is the main contact person?): Kemele M. Endris ([email protected])
  • Purpose (what can one do with the tool?): Processor. Ontario can answer SPARQL SELECT queries over heterogeneous data sources: CSV, JSON, XML, RDBMS, Neo4j, MongoDB, RDF. Non-RDF data is transformed on the fly at query time. Ontario also supports SPARQL CONSTRUCT queries to transform data from a Semantic Data Lake to RDF.
  • Mapping language: RML
  • Supported data (formats, sizes): CSV, Parquet, MongoDB, JDBC (MySQL, Postgres), Neo4j, RDF.
  • Programming language: Python
  • DOI: http://doi.org/10.1007/978-3-030-27615-7_29
  • License: GNU/GPL v2

Tool 20:

  • Name of the tool: Gra.fo
  • Description: a visual, collaborative and real-time knowledge graph schema and mapping tool
  • Repository (link to the tool’s repository): N/A
  • Website (if is different to the repository): https://gra.fo/
  • Open source? (If not open sourced, ideally provide an option to test it): No. https://gra.fo/
  • Year introduced: 2019
  • Contact person (who is the main contact person?): Juan Sequeda ([email protected])
  • Purpose (what can one do with the tool?): Editor. Gra.fo in conjunction with data.world provides virtualization
  • Mapping language: R2RML
  • Supported data (formats, sizes): Any relational databases and CSV/XLS connected in data.world
  • Programming language: Commercial tool
  • Special features: Visual (drag and drop), Collaborative (Share document with different permissions, history, comments), Real-Time (multiple users at the same time collaborating)
  • DOI: N/A
  • License: https://gra.fo/terms-and-conditions/