-
Notifications
You must be signed in to change notification settings - Fork 261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create super-types for relationships #4724
Comments
To illustrate, this query on the {
"class": "PropertyMatchFindRequest",
"matchProperties": {
"class": "InstanceProperties",
"instanceProperties": {
"steward": {
"class": "PrimitivePropertyValue",
"instancePropertyCategory": "PRIMITIVE",
"primitiveDefCategory": "OM_PRIMITIVE_TYPE_STRING",
"primitiveValue": "someone"
}
}
},
"pageSize": 2,
"matchCriteria": "ALL"
} currently must turns into something like this query on the back-end: {
:find [e],
:where [(or [(text-search :relationshipProperties/Antonym.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/DataClassAssignment.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/ForeignKey.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/ISARelationship.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/KnownDuplicateLink.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/LibraryCategoryReference.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/LibraryTermReference.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/PreferredTerm.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/ReferenceValueAssignment.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/RelatedTerm.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/ReplacementTerm.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/SemanticAssignment.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/Synonym.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/TermHASARelationship.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/TermISATypeOFRelationship.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/TermTYPEDBYRelationship.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/Translation.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/UsedInContext.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/ValidValue.steward.value "/someone/") [[e _]]]
[(text-search :relationshipProperties/ValidValuesMapping.steward.value "/someone/") [[e _]]])
[e :type.category 8]],
:order-by [[e :asc]]
} ... when it could be as simple as this (if there were a supertype with these common properties): {
:find [e],
:where [(text-search :relationshipProperties/GlossaryTermRelationship.steward.value "/someone/") [[e _]]]
[e :type.category 8],
:order-by [[e :asc]]
}``` |
Actually I have similar questions / concerns, though maybe somewhat different, with regards to the That way we'd have greater consistency across the types, and a vastly reduced search space for the underlying repositories when someone is searching for something based just on its name. |
@cmgrote some thoughts on this. I can see that finds for relationship properties would be simpler if there were super types. Early on we decided to not have super types for relationships (back in the Altas days). Many repositories do not have support for top level relationships, requiring them to support inheritance as well would seem like a very high barrier to overcome. Another consideration is that an OMAS could expose inherited relationships. In fact subject area does this with the relationships extending Line. I don't think it exposes a find on a Line - you can raise this as an enhancement if you like. |
Discussed on 25/02 and we all seem to agree this would be a good idea, but we all expect that it will cause various unforeseen breakages -- so needs to be introduced early in a release cycle for thorough testing. Ideal approach to attempt initially:
Note: this is different from an OO-interpretation of "supertype" and "subtype" -- where a subtype would need to be capable of doing everything (and more) than its supertype, these constructions would make it such that the subtype will do less (be more restrictive than) the supertype. |
In the call we discussed setting up a feature branch to ease integration testing, to avoid breakage in master whilst we I propose to create a new feature branch 'egeria-feature-4724' but open to other name suggestions.. (4724 being linked to this id). Or we could do 'egeria-feature-superrel' or similar. Best not to be too long. cc: @mandy-chessell @davidradl @grahamwallis . I'll check the build pipelines are ok, and propose to do this tomorrow after the status call unless there is followup discussion here/keenness to get started sooner |
I think that should be fine -- we need to do the usual cut-over of the OpenMetadataTypesArchive class post-release for master first, anyway (see #4849). |
Ok - add a comment here when ready and I'll setup & check |
@planetf1 ok, I think we're ready for the new branch now -- I'd be fine with |
@cmgrote just convention really -- the original release started as egeria-release-xx and we took it from there. I started with egeria-feature-xx when we looked at OLS and we discussed/agreed about the feature proposal there as per . It's good to have a release or feature to help with wildcards in the github actions, but agreed egeria is redundant and is only there historically. A few formatting errors - but this was the documented process: https://github.com/odpi/egeria/blob/master/developer-resources/Feature-Branch.md I do think egeria-feature-relationship-supertypes is quite cumbersome. Technically I have little concerns about changing it - beyond ALSO updating the other cicd files (before the branch), which means a little more testing & also should gather more support for a proposal to change the policy & update the guidelines accordingly? |
Fine for me to gather more support and update the process accordingly... If we want to limit the length of the names, it's a bit odd to me to waste 7-15 characters in that limited length every time by forcing every name to begin with the same 7-15 character string 🤷 In fact, taking this a step further, if we use the first So rather than something like |
I suggest we discuss and agree with the team on the call tomorrow. |
To discuss and agree on 09/03/2021 call:
|
On the question of name vs displayName - in most cases I can think of, the distinction between name and display name is important and we make use of it. I would be cautious about loosing the distinction. Typically "name" is used for technical assets - you can see it occurring frequently in Area 0. The name is typically the technical name used in the digital landscape. It is extracted automatically and is not what someone has chosen to be a meaningful display name. In these entities the qualified name is manufactured by the metadata extraction process often based on the name. The displayable (human authored) names for these entities are added later using a GlossaryTerm linked with the SupplementaryProperties relationship. So for example, in the Asset Manager OMAS interface, Assets have both technical names and business names. The technical names go into the Asset entity and the business names go in the attached GlossaryTerm entity. Using two entities for this is important since the Asset entity is often read-only to the open metadata ecosystem. The displayName property typically appears in metadata elements that are human authored - such as glossary terms, governance zone definitions, ... The author may or may not control the qualified name, but they can choose the display name. It would be interesting to understand if we see inconsistencies in the use of these two property names in the types. |
I did a quick review of the types and while most use of name vs displayName is appropriate there are some notable exceptions:
|
I also just did some quick counts (rough numbers that may be off by a few each):
|
I like the distinction between technical and non-technical, but could this not simply be a characteristic of the instance type in which the property has a value? An area 0 type with a Taking the example where we have both (Asset Manager OMAS): the (Then we don't even have the risk of things becoming inconsistent per the examples above with technical things using |
On the branch naming point, agreed on 09/03 call to use |
Signed-off-by: Christopher Grote <[email protected]>
I think it is more helpful to maintain the different property names since the values have different expectations on their content. I like the idea of updating SchemaElement and Port to use name and deprecate displayName - and add name to ProcessVariables. |
What about introducing 2 new entity typedefs in-between that distinguish between these "technical" and "human-authored" types? For example (purely theoretical names for the types):
This would:
|
I think it works for the Technical Entity - but not for the human authored because we have human authored entities that do not have display name |
#4724 Changes feature branch naming guidance as agreed
The way the model is today is that there are these sorts of supertypes for each major grouping of entity types. Also there are entities that do not inherit from referenceable that have name or display name. |
Have added PR #4887 to update the cicd actions for egeria - sometimes the scripts have assumptions which can lead to clashes of docker images, snapshots etc since we have multiple branches with same pom version etc. fortunately we were ok with the current definitions. But they are needed to allow the branch checks to be done. (if we want those tests) |
I've also updated the release patterns (annoyingly we have a revert branch - so it's an addition) and I think the pipelines as above are working. Should be ok to use now as long as #4890 builds ok. The checks did result in master versions of docker and website being overwritten but the next master merge overwrote so should be good now. (those actions may benefit from additional verification, they currently have assumptions) |
Signed-off-by: Christopher Grote <[email protected]>
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions. |
At the moment none of the
RelationshipDef
s have a super type defined, and I was wondering if there was some rationale behind this: for example, would it mean that the ends of the relationship are fixed by the supertype (?)I raise it because there is significant duplication in the properties of certain relationships, particular in the area of
GlossaryTerm
s, where all of these relationships have almost exactly the same properties defined (description
,expression
,status
,steward
, andsource
):confidence
beyond those above)expression
compared to the others)expression
compared to the others)expression
)(There are ~10 more that have
steward
defined, and a few other overlaps.)Aside from the question of consistency (see minor inconsistencies above), there is also the question of the performance impact of this. Per #4647 we agreed that we should not have a single "namespace" for all properties; however, in most (if not all?) of the examples above, I would expect that these 5 properties all have the same meaning and representation.
In other words, that they'd be relationship subtypes of a more abstract relationship type like
GlossaryTermRelationship
.The impact of having so many identically-named properties across types is that when a consumer runs a search through our
find...
methods there is no mandatory qualification of the type against which they want results. For an underlying repository (both the graph repository and the Crux repository), this means that we need to do an OR-based query across ~20 different underlying properties (treating each type as a separate property) to find all that match -- combined with the fact that since many of these properties are string values, a consumer could be sending their request with an arbitrary regular expression -- and this therefore ends up being very expensive (slow).If we could use a supertype, at least the property would only be registered against a single type in the underlying repositories, and then we could do a single query for the property rather than what ends up being the equivalent of ~20 distinct queries whose results are all combined.
The text was updated successfully, but these errors were encountered: