URI hierarchy
Every object in ConceptNet has a URI that is structured like a path, giving you a standard place to look it up. For example, the concept "common sense" in English has the URI /c/en/common_sense.
Technicality: The identifiers in ConceptNet are actually "IRIs", not "URIs", because they may contain Unicode characters. However, almost nobody knows what an IRI is, and there's a growing consensus that the acronym should not have been changed.
Most URIs in ConceptNet are intended to be meaningful: if you look at a URI, you can tell what object it is, and if you look at an object you can tell what its URI is.
The different kinds of objects are distinguished by the first element of the path.
- /a/: assertions, also known as edges (as of 5.5, these are the same thing)
- /c/: concepts, also known as terms (words and phrases in a particular language)
- /d/: datasets (broad sources of knowledge)
- /r/: language-independent relations, such as /r/IsA
- /s/: knowledge sources, which can be human contributors, Web sites, or automated processes
- /and/: conjunctions of sources that were used together to create an assertion
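As an illustration of the list above, here is a minimal sketch that reads the first path element of a URI and reports the kind of object it names. The table and the function name `uri_kind` are hypothetical and are not part of ConceptNet's own code.

```python
# Prefix-to-kind table copied from the list above.
# URI_KINDS and uri_kind are illustrative names, not ConceptNet's API.
URI_KINDS = {
    "a": "assertion",
    "c": "concept",
    "d": "dataset",
    "r": "relation",
    "s": "source",
    "and": "conjunction of sources",
}


def uri_kind(uri):
    """Return the kind of object a URI names, based on its first path element."""
    if not uri.startswith("/"):
        raise ValueError(f"not a ConceptNet URI: {uri!r}")
    # "/c/en/dog".split("/") gives ["", "c", "en", "dog"], so index 1 is the prefix.
    first = uri.split("/")[1]
    if first not in URI_KINDS:
        raise ValueError(f"unknown URI prefix in {uri!r}")
    return URI_KINDS[first]


print(uri_kind("/c/en/common_sense"))
print(uri_kind("/r/IsA"))
```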
Concept URIs contain the text of the concept, with spaces replaced by underscores. All non-ASCII text is in UTF-8, in Unicode's NFC normal form.
Each concept has at least three components: the initial /c to make it a concept, a part that indicates its language (using the BCP 47 language code for that language), and a part with the concept text. An optional fourth component gives the part of speech (as a single letter, following the convention of WordNet).
- /c/en/play_game is the English concept "play a game".
- /c/en/read/v is the English word "read", in all its senses that are verbs.
- /c/ja/紙 is the Japanese concept meaning "paper".
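The structure described above can be sketched in a few lines of Python. This is a simplified illustration, not the code in ConceptNet's uri.py: the function name `make_concept_uri` is hypothetical, and it only applies the two rules stated here (NFC normalization and replacing spaces with underscores). It assumes the text passed in is already the concept text, so any further normalization that ConceptNet applies is out of scope.

```python
import unicodedata


def make_concept_uri(lang, text, pos=None):
    """Build a concept URI from a language code, concept text, and optional part of speech.

    Hypothetical helper: it covers only the rules described in this section.
    """
    # Put the text in Unicode NFC normal form, then replace spaces with underscores.
    text = unicodedata.normalize("NFC", text.strip()).replace(" ", "_")
    parts = ["c", lang, text]
    if pos is not None:
        parts.append(pos)  # optional fourth component: a single-letter part of speech
    return "/" + "/".join(parts)


print(make_concept_uri("en", "common sense"))
print(make_concept_uri("en", "read", "v"))
print(make_concept_uri("ja", "紙"))
```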
The part of speech is a one-character WordNet code indicating the synset type, as in /c/en/accountably/r:
- /n: noun
- /v: verb
- /a: adjective
- /s: adjective satellite
- /r: adverb
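Going in the other direction, a concept URI can be taken apart into its components. The sketch below is illustrative only: `split_concept_uri` and `POS_NAMES` are hypothetical names, and the part-of-speech table is copied from the codes listed above.

```python
# One-letter part-of-speech codes, as listed above.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def split_concept_uri(uri):
    """Split a concept URI into its language, text, and optional part of speech."""
    parts = uri.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "c":
        raise ValueError(f"not a concept URI: {uri!r}")
    pos = parts[3] if len(parts) > 3 else None
    return {
        "language": parts[1],
        "text": parts[2],
        "pos": pos,
        # None when there is no fourth component or the code is not in the table.
        "pos_name": POS_NAMES.get(pos),
    }


print(split_concept_uri("/c/en/accountably/r"))
```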
Assertion URIs indicate the relation, start, and end of an edge. ("Assertions" and "edges" refer to the same thing, as of ConceptNet 5.5.)
The relation, start, and end are all represented in a bracketed list in the URI. The brackets allow assertion URIs to be nested within each other, in the case where you have assertions about assertions. These lists are surrounded by the components /[/ and /]/ and delimited by /,/. For example, the assertion "A dog is an animal" has the URI /a/[/r/IsA/,/c/en/dog/,/c/en/animal/].
The relation is a language-independent URI starting with /r/, and the start and end are language-specific terms (concepts) starting with /c/.
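To make the bracketed-list format concrete, here is a minimal sketch that builds an assertion URI and splits one back into its parts, including the nested case. The function names are hypothetical and this is not the implementation in ConceptNet's uri.py. It relies on the fact that [, ] and , each appear as a whole path component, so splitting on slashes yields them as separate tokens.

```python
def make_assertion_uri(rel, start, end):
    """Join a relation, start, and end into an assertion URI."""
    return "/a/[" + "/,".join([rel, start, end]) + "/]"


def split_compound_uri(uri):
    """Split a bracketed URI into its prefix and its top-level items.

    Nested bracketed URIs are kept intact as single items.
    """
    tokens = uri.split("/")[1:]  # drop the empty string before the leading slash
    if len(tokens) < 3 or tokens[1] != "[" or tokens[-1] != "]":
        raise ValueError(f"not a bracketed URI: {uri!r}")
    items, current, depth = [], [], 0
    for tok in tokens[2:-1]:
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
        if tok == "," and depth == 0:
            # A comma outside any nested brackets ends the current item.
            items.append("/" + "/".join(current))
            current = []
        else:
            current.append(tok)
    items.append("/" + "/".join(current))
    return tokens[0], items


uri = make_assertion_uri("/r/IsA", "/c/en/dog", "/c/en/animal")
print(uri)
print(split_compound_uri(uri))
```

Working on path components rather than raw characters means the delimiters are only recognized when they stand alone between slashes.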
A single source of knowledge has a URI that begins with /s. Sources are broken down into more types:
- /s/contributor: a human contributor to a crowd-sourced knowledge base.
- /s/activity: a knowledge-collection task that was being presented by a computer to collect crowd-sourced knowledge.
- /s/process: an automatic rule for extracting knowledge from a different form.
The sources for an assertion are often conjunctions (/and) or disjunctions (/or) of these individual sources. For example, any edge with a contributor probably has an activity as well, and those would be combined into an /and source, such as:
/and/[/s/contributor/omcs/havasi/,/s/activity/omcs1/]
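The conjunction format uses the same bracketed list as assertion URIs. As a rough sketch, assuming each input is an individual /s/ source, the example above could be assembled like this. The names `make_conjunction` and `source_type` are hypothetical, and the sketch does not attempt any ordering or deduplication of sources.

```python
def source_type(uri):
    """Return the type of an individual source: contributor, activity, or process."""
    parts = uri.split("/")  # "/s/activity/omcs1" gives ["", "s", "activity", "omcs1"]
    if len(parts) < 3 or parts[1] != "s":
        raise ValueError(f"not a source URI: {uri!r}")
    return parts[2]


def make_conjunction(sources):
    """Combine individual /s/ source URIs into an /and source URI."""
    for source in sources:
        source_type(source)  # raises ValueError if this is not a /s/ URI
    return "/and/[" + "/,".join(sources) + "/]"


sources = ["/s/contributor/omcs/havasi", "/s/activity/omcs1"]
print([source_type(s) for s in sources])
print(make_conjunction(sources))
```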
- /e/: In 5.4 and earlier, these represented 'edges' that were not yet combined into assertions.
- /l/: In 5.4 and earlier, these represented Creative Commons license terms. Now, we use the Creative Commons RDF namespace, so license terms look like cc:by-sa/4.0.
- /or/: Each assertion can come from multiple conjunctions of sources. In 5.3 and earlier, these conjunctions were combined into one big disjunction, with a URI in the /or/ namespace. Now they're just in a list called "sources".
Code for manipulating these URI representations can be found in: https://github.com/commonsense/conceptnet5/blob/master/conceptnet5/uri.py