Open linguistic data is a recently established and welcome trend that allows both researchers and developers in natural language processing to build their own applications on high-quality dictionaries, thesauri, corpora, and similar resources. However, the published open data are stored in different formats, which makes them difficult to use efficiently without vendor lock-in.
This work addresses the problem of representing popular lexical resources of the Russian language as Linked Open Data. It proposes an approach to converting popular Russian thesauri to the vocabularies that are essential parts of the Linguistic Linked Open Data Cloud. The approach has been implemented in open source software, and the resulting dataset has been made publicly available on NLPub in the Turtle format under the terms of a Creative Commons license.
The following three Russian thesauri have been converted to Linked Data and made available through this website:
- RuThes-lite (circa 1994, under CC BY-NC-SA 3.0),
- Universal Dictionary of Concepts (circa 1999, under CC BY-SA),
- Yet Another RussNet (circa 2013, under CC BY-SA 4.0).
The resulting ontologies use the following vocabularies: RDFS, OWL, Dublin Core, SKOS, lemon, and LexInfo. The results are available on NLPub:
Please respect the above-mentioned licenses.
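To illustrate how two of the vocabularies listed above can fit together, here is a minimal sketch that renders one thesaurus concept as a SKOS concept and its words as lemon lexical entries in Turtle. It is not the converter used for the published dataset: the `http://example.org/thesaurus/` namespace, the `ex:concept-N` and `ex:entry-N-M` identifiers, and the function names are assumptions made for illustration, and the actual modelling in the dataset may differ.

```python
# Illustrative sketch only: the base IRI and the identifier scheme below are
# hypothetical and do not come from the published dataset.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "lemon": "http://lemon-model.net/lemon#",
    "ex": "http://example.org/thesaurus/",
}


def escape(text):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def concept_to_turtle(concept_id, label, words, broader=(), lang="ru"):
    """Render one thesaurus concept and its words as a Turtle document."""
    concept = f"ex:concept-{concept_id}"
    out = [f"@prefix {name}: <{iri}> ." for name, iri in PREFIXES.items()]
    out.append("")

    # The concept itself: a SKOS concept with a label and hypernym links.
    properties = [f'skos:prefLabel "{escape(label)}"@{lang}']
    properties += [f"skos:broader ex:concept-{parent}" for parent in broader]
    out.append(f"{concept} a skos:Concept ;")
    out.append("    " + " ;\n    ".join(properties) + " .")

    # Each word becomes a lemon lexical entry whose sense refers to the concept.
    for index, word in enumerate(words, start=1):
        out.append("")
        out.append(f"ex:entry-{concept_id}-{index} a lemon:LexicalEntry ;")
        out.append(
            f'    lemon:canonicalForm [ lemon:writtenRep "{escape(word)}"@{lang} ] ;'
        )
        out.append(f"    lemon:sense [ lemon:reference {concept} ] .")

    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # A toy synset with two words and one hypernym (concept 1).
    print(concept_to_turtle(2, "собака", ["собака", "пёс"], broader=[1]))
```

The sketch builds the Turtle text with plain string formatting to stay dependency-free; for real data, an RDF library that handles IRI validation and serialization would be a safer choice.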
Please cite our resource as follows:
- Ustalov, D.: Russian Thesauri as Linked Open Data. In: Computational Linguistics and Intellectual Technologies: papers from the Annual conference “Dialogue”. Volume 1. RGGU, Moscow (2015) 616–625.
Copyright (c) 2015 Dmitry Ustalov. See LICENSE for details.