Document design principles
This is a proposal for how to organize the content of the DMLex standard. The current working draft is organized according to this proposal.
Some bits of this text should probably be in the standard itself.
The standard defines the existence of:
- Object types such as `Entry`, `Sense`, `Translation`.
  - An object can have another object as its parent and can have other objects as its children: the standard defines which object types are allowed to be parents/children of which other object types and with what arities.
  - An object can have properties. The standard defines, for each object type, which properties it can have. More about properties below.
- Relation types such as `EntrySet`, `SensePair`, `SenseTuple`.
  - A relation is something which connects two or more objects in a way other than parent-child. The objects involved in a relation are called its participants: the standard defines, for each relation type, which object types are allowed to be participants, with what arities, and what role they play in the relation. For example, a `SubsenseRelation` is allowed to have exactly one `Sense` as a participant with the role “superordinate sense” and exactly one `Sense` as a participant with the role “subordinate sense”.
  - A relation can have properties. The standard defines, for each relation type, which properties it can have. More about properties below.
- Marker types such as `HeadwordMarker`, `PlaceholderMarker`.
  - A marker is an entity which adds inline markup to a property of an object. The standard defines which marker types are allowed to add inline markup to which properties of which object types.
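To make the marker idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a marker records the start offset and length of the span it marks, and that `HeadwordMarker` is allowed on the `text` property of `Example`; the standard itself decides both of those things.

```python
from dataclasses import dataclass, field

# Hypothetical excerpt: which marker types may annotate which property of which object type.
ALLOWED_MARKERS = {("Example", "text"): {"HeadwordMarker"}}

@dataclass
class Marker:
    marker_type: str
    start: int   # assumption: character offset where the marked span begins
    length: int  # assumption: number of characters in the marked span

@dataclass
class Example:
    text: str                                    # the `text` property
    markers: list = field(default_factory=list)  # markers adding inline markup to `text`

def marked_spans(example: Example) -> list:
    """Return (marker type, marked substring) pairs, rejecting markers not allowed here."""
    allowed = ALLOWED_MARKERS.get(("Example", "text"), set())
    spans = []
    for m in example.markers:
        if m.marker_type not in allowed:
            raise ValueError(f"{m.marker_type} may not mark Example.text")
        spans.append((m.marker_type, example.text[m.start : m.start + m.length]))
    return spans

ex = Example("She took a deep breath.", [Marker("HeadwordMarker", 11, 4)])
print(marked_spans(ex))  # [('HeadwordMarker', 'deep')]
```

The marker lives alongside the property it annotates rather than inside the string, which keeps the property itself atomic and literal.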
So “objects”, “relations” and “markers” are the vocabulary through which we express our standard. This vocabulary is implementation-independent: in XML they would probably be implemented as XML elements, while in a relational database they would probably be implemented as tables.
Each object can have at most one parent. The top-level type in DMLex is `LexicographicResource`, and all other objects are children or descendants of an instance of it. For each object type, DMLex prescribes the types of objects that can be its parent and its children.
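A minimal Python sketch of how an implementation might check these parent/child rules. The `ALLOWED_PARENTS` table is a hypothetical excerpt, not the standard's actual table, and arities are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical excerpt of the parent/child rules (child type -> allowed parent types).
# The actual rules, including arities, are whatever the DMLex standard prescribes.
ALLOWED_PARENTS = {
    "Entry": {"LexicographicResource"},
    "Sense": {"Entry"},
}

@dataclass(eq=False)
class Obj:
    obj_type: str
    parent: Optional["Obj"] = None  # at most one parent, so a single optional reference

def has_valid_parent(obj: Obj) -> bool:
    """The top-level type has no parent; every other type needs an allowed one."""
    if obj.obj_type == "LexicographicResource":
        return obj.parent is None
    allowed = ALLOWED_PARENTS.get(obj.obj_type, set())
    return obj.parent is not None and obj.parent.obj_type in allowed

def root(obj: Obj) -> Obj:
    """Walk up the parent chain; for valid data this ends at a LexicographicResource."""
    while obj.parent is not None:
        obj = obj.parent
    return obj

resource = Obj("LexicographicResource")
entry = Obj("Entry", resource)
sense = Obj("Sense", entry)
print(has_valid_parent(sense), has_valid_parent(Obj("Sense", resource)))  # True False
print(root(sense).obj_type)  # LexicographicResource
```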
Relations are allowed to have participants. Each participant in a relation is a reference to an object somewhere in the same `LexicographicResource` (internal links) or in another `LexicographicResource` (external links). For each participant of each relation, DMLex prescribes the object type and the arity.
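The participant rules can be checked mechanically. This sketch encodes the `SubsenseRelation` example from the list of relation types (exactly one `Sense` as “superordinate sense”, exactly one `Sense` as “subordinate sense”). The `Participant` class and its string `ref` field are assumptions, and internal and external links are not distinguished.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Participant:
    role: str
    object_type: str
    ref: str  # assumption: an identifier pointing into this or another LexicographicResource

# Rules for SubsenseRelation: role -> (required object type, minimum count, maximum count)
SUBSENSE_RELATION = {
    "superordinate sense": ("Sense", 1, 1),
    "subordinate sense": ("Sense", 1, 1),
}

def check_participants(participants, rules) -> list:
    """Return a list of problems; an empty list means the participants are valid."""
    problems = []
    for p in participants:
        if p.role not in rules:
            problems.append(f"unknown role {p.role!r}")
        elif p.object_type != rules[p.role][0]:
            problems.append(f"{p.role!r} must be a {rules[p.role][0]}, not {p.object_type}")
    counts = Counter(p.role for p in participants)
    for role, (_, low, high) in rules.items():
        if not low <= counts[role] <= high:
            problems.append(f"{role!r} occurs {counts[role]} times, expected {low}..{high}")
    return problems

ok = [Participant("superordinate sense", "Sense", "sense-1"),
      Participant("subordinate sense", "Sense", "sense-2")]
print(check_participants(ok, SUBSENSE_RELATION))      # []
print(check_participants(ok[:1], SUBSENSE_RELATION))  # ["'subordinate sense' occurs 0 times, expected 1..1"]
```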
Objects, relations and markers are allowed to have properties. A property is always something atomic and literal, typically a string of text (but never a reference to another object). For example, many objects have a property named `text`. An entity can never have more than one property with a given name.
The term “property” is another part of our abstract, implementation-independent vocabulary. In XML, properties would typically be implemented as XML attributes, while in a relational database they would typically be implemented as columns in tables.
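For illustration, here is how one `Example` object with its `text` property might come out in the two kinds of implementation mentioned above, using only the Python standard library. The `id` column is an assumed surrogate key for the table, not a DMLex property, and real serializations may well make different choices.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One Example object with a single property, `text`.
object_type = "Example"
properties = {"text": "She took a deep breath."}

# XML: the object becomes an element, each property an attribute.
element = ET.Element(object_type, attrib=properties)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)  # <Example text="She took a deep breath." />

# Relational: the object type becomes a table, each property a column,
# and the object itself a row. `id` is an assumed surrogate key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Example (id INTEGER PRIMARY KEY, text TEXT)")
conn.execute("INSERT INTO Example (text) VALUES (?)", (properties["text"],))
rows = conn.execute("SELECT id, text FROM Example").fetchall()
print(rows)  # [(1, 'She took a deep breath.')]
```

Because an entity can never have two properties with the same name, a property maps cleanly onto a single attribute or a single column.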
How do we decide that something is a property and not an object or a relation? There are two criteria.
- The criterion of atomicity. It is a property if its value is something atomic and literal, like a string or a number (including items from controlled vocabularies), but not a reference to an object.
- The criterion of arity. It is a property if the object can have at most one of it, never more.
Both criteria need to be met in order for something to be treated in our standard as a property of an object (and not as an object). And, conversely, if something meets both criteria, then our standard must treat it as a property of an object (and not as an object).
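The two criteria amount to a simple test, sketched here in Python. The function name and its parameters are hypothetical; the examples anticipate the headword discussion that follows.

```python
import math

def is_property(atomic: bool, max_per_owner: float) -> bool:
    """Both criteria must hold; and if both hold, it must be modelled as a property.

    atomic: the value is a literal (string, number, controlled-vocabulary item),
            not a reference to another object.
    max_per_owner: the most instances one owner may have (math.inf if unbounded).
    """
    return atomic and max_per_owner <= 1

# A headword under the current draft: a literal string, at most one per entry.
print(is_property(atomic=True, max_per_owner=1))          # True
# A headword if entries could have several: atomic, but fails the arity criterion.
print(is_property(atomic=True, max_per_owner=math.inf))   # False
# A sense: an object rather than a literal value, and an entry can have many.
print(is_property(atomic=False, max_per_owner=math.inf))  # False
```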
In most cases, the fact that we are treating something as a property (and not as an object) will be uncontroversial. The only occasion when it might be surprising is that we are treating headwords as properties of `Entry` objects (and not as objects).
This is because a headword meets both of our criteria for “propertyhood”: its value is a literal string, and an entry can never have more than one.
I think that we agreed at some point that we want to prohibit entries from having more than one headword. If I am not mistaken and we do indeed want to prohibit that, then treating headwords as properties of `Entry` is the way to do it. If, on the other hand, we want to make it possible for an entry to have more than one headword (either in the core of our standard or in a possible future extension), then we must create a `Headword` object type as a child of `Entry`. — @michmech
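A rough Python sketch of the two options. With `headword` as a property, the type itself rules out a second headword; with a hypothetical `Headword` child object type, any number is possible. The class names and the `text` property on `Headword` are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Current draft: the headword is a property, so an Entry holds exactly one string
# and a second headword simply cannot be represented.
@dataclass
class Entry:
    headword: str

# Alternative: a Headword object type as a child of Entry, allowing any number.
# The `text` property on Headword is an assumption for illustration.
@dataclass
class Headword:
    text: str

@dataclass
class EntryWithHeadwordChildren:
    headwords: list = field(default_factory=list)

current = Entry(headword="colour")
alternative = EntryWithHeadwordChildren([Headword("colour"), Headword("color")])
print(current.headword, [h.text for h in alternative.headwords])  # colour ['colour', 'color']
```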
The standard is broken down into a core plus several modules. Anybody who wants to claim that they have implemented DMLex has to implement at least the core. The modules are optional. Implementers can say, e.g., “we are implementing DMLex core plus this, this and this module”.
The core defines several entity types (“entities” is a general term for objects, relations and markers). Each module defines additional entity types or extends entity types defined in the core.
It may sometimes happen that a module (let’s call it Module A) extends an entity type defined in another module (let’s call it Module B). If an implementer has decided to implement Module A but not Module B, then obviously they cannot implement Module A’s extensions to the entity types of Module B. In that case, the implementer’s implementation of Module A is nonetheless valid.
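The rule about cross-module extensions can be sketched as a filter over the extensions a module contributes. Everything here is hypothetical: the `Extension` record, the pairing of the Module A/Module B names from the example above with specific entity types, and the type name `SomeTypeFromB`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Extension:
    provided_by: str  # the module that defines the extension
    extends: str      # the module (or "core") that defines the extended entity type
    entity_type: str

def usable_extensions(implemented: set, extensions: list) -> list:
    """Extensions an implementation can realise: the core is always implemented,
    and both the providing and the extended module must be implemented."""
    available = set(implemented) | {"core"}
    return [e for e in extensions
            if e.provided_by in available and e.extends in available]

extensions = [
    Extension("Module A", "core", "Entry"),
    Extension("Module A", "Module B", "SomeTypeFromB"),  # hypothetical type name
]
# Implementing Module A without Module B: only the extension of the core type applies,
# and the implementation of Module A is still valid.
print([e.entity_type for e in usable_extensions({"Module A"}, extensions)])  # ['Entry']
```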
The names of entities (i.e. objects, relations and markers) are unique across the entire DMLex standard. There is no need to qualify them with the name of the module they come from.
The names of properties are unique only inside the scope of the entity type they belong to. So, when talking about properties, it is always necessary to mention the entity they belong to, e.g. “the `text` property of the `Example` object”.
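One way to mirror these scoping rules in code is to key entity types by name alone and properties by the pair (entity type, property name). In this Python sketch, giving `Definition` a `text` property is an assumption, used only to show two entity types sharing a property name.

```python
# Entity names are unique across the standard, so a bare name is enough.
ENTITY_TYPES = {"Entry", "Sense", "Example", "Definition"}

# Property names are only unique within an entity type, so the key is a pair.
# The ("Definition", "text") entry is an assumption for illustration.
PROPERTIES = {
    ("Example", "text"): "string",
    ("Definition", "text"): "string",
}

def describe(entity_type: str, prop: str) -> str:
    """Name a property together with its entity type, failing if it does not exist."""
    if entity_type not in ENTITY_TYPES or (entity_type, prop) not in PROPERTIES:
        raise KeyError(f"{entity_type} has no property {prop!r}")
    return f"the {prop} property of the {entity_type} object"

def owners(prop: str) -> set:
    """All entity types that define a property with this name."""
    return {entity for entity, name in PROPERTIES if name == prop}

print(describe("Example", "text"))  # the text property of the Example object
print(sorted(owners("text")))       # ['Definition', 'Example']
```

A bare `text` is ambiguous as soon as two entity types define it, which is why the pair is needed.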
We use `PascalCase` for the names of entity types and `camelCase` for the names of properties.
Authors of implementations and/or serializations of DMLex do not have to use the same names as the names we are using in the standard, and they do not have to follow the same naming conventions. They just have to say “our x is an implementation of y from DMLex”.
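The naming conventions are easy to check mechanically. This Python sketch uses simplified regular expressions (they reject underscores, hyphens and leading digits), and the `IMPLEMENTS` mapping shows a hypothetical serialization with its own snake_case names declaring which DMLex name each one implements.

```python
import re

# Simplified patterns: PascalCase starts upper-case, camelCase starts lower-case,
# and neither allows underscores or hyphens.
PASCAL_CASE = re.compile(r"[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)*")
CAMEL_CASE = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]+)*")

def follows_conventions(entity_types, properties) -> bool:
    """True if every entity type name is PascalCase and every property name is camelCase."""
    return (all(PASCAL_CASE.fullmatch(n) for n in entity_types)
            and all(CAMEL_CASE.fullmatch(n) for n in properties))

# Hypothetical serialization: "our x is an implementation of y from DMLex".
IMPLEMENTS = {
    "lexicographic_resource": "LexicographicResource",
    "entry": "Entry",
    "headword_marker": "HeadwordMarker",
}

print(follows_conventions(["Entry", "HeadwordMarker"], ["text"]))  # True
print(follows_conventions(IMPLEMENTS, []))                         # False
print(follows_conventions(IMPLEMENTS.values(), []))                # True
```

The serialization's own names fail the DMLex conventions, which is fine: only the DMLex names it maps to have to follow them.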