Determine SNAP relationship output in taxonomy TEI and its associated column names #35

Open
wlpotter opened this issue Nov 22, 2021 · 22 comments
Labels
documentation Improvements or additions to documentation


@wlpotter
Owner

@dlschwartz Could you let me know how you want the SNAP relationships to appear in the TEI along with their crosswalk to Syriaca URIs?

Here is what you said about these columns in #6, for reference:

Column K is a bit of a relic. There was a time when SPEAR was using namespaced relationship types but we've switched over to Syriaca URIs instead. Column K still serves a purpose, however. It offers a crosswalk between Syriaca URIs and equivalent snap relationships. Whatever column heading works for that purpose would be fine.

I think we also need an additional column here. We have relationships that are more precise than SNAP relationships. For example, we get fairly fine-grained regarding clerical relationships: bishop-over-clergy, fellow-monastic, etc. In a LOD environment, we will want to serve up our more specific relationships to SNAP as their rather generic "professional relationship." I haven't worked out how to render this in TEI, but I should probably do that and create a column for that purpose.

@dlschwartz
Collaborator

@wlpotter I think this issue is coming together with srophe issue #930. I think what we want for this is for column K to be a skos:closeMatch or skos:exactMatch on a <relation> element. See issue 930 for discussion of attributes. In short, however, this should become just another crosswalk column like the current columns H and I.

That said, let me know what you think of this. I think all we want to assert for the crosswalk to LOC, for example, is a skos:closeMatch. In the case of SNAP we mostly want to assert skos:closeMatch [or maybe skos:exactMatch, I probably need to do a bit more work on this]. In the cases discussed in srophe issue #930, we would want to assert skos:broadMatch. What is your preference on the transform side of things:

  1. a column for SNAP concepts for which there is a skos:closeMatch and another column for SNAP concepts for which there is a skos:broadMatch?
  2. one column for the SNAP crosswalk that would have # separated values (like in the persons spreadsheet for PRLE) that would look something like: "skos:broadMatch#snap:professionalRelationship"?
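If option 2 were chosen, the transform would have to split each cell back into a match property and a SNAP concept. A minimal Python sketch, assuming the cell format shown in option 2; `parse_crosswalk_cell` is a hypothetical helper, not part of the existing script:

```python
# Hypothetical parser for option 2: a single SNAP crosswalk column whose cells
# look like "skos:broadMatch#snap:professionalRelationship".
SKOS_MATCH_TYPES = {"skos:closeMatch", "skos:exactMatch", "skos:broadMatch"}


def parse_crosswalk_cell(cell: str) -> tuple[str, str]:
    """Split a cell into (SKOS match property, SNAP concept)."""
    # partition() splits on the FIRST "#" only, so the target may itself
    # contain "#" (e.g. a full SNAP ontology URI).
    match_type, sep, target = cell.strip().partition("#")
    if not sep or not target:
        raise ValueError(f"expected 'matchType#target', got {cell!r}")
    if match_type not in SKOS_MATCH_TYPES:
        raise ValueError(f"unknown match type {match_type!r}")
    return match_type, target


print(parse_crosswalk_cell("skos:broadMatch#snap:professionalRelationship"))
```

One drawback of this format: "#" is also the URI fragment separator, so a match property written as a full URI (http://www.w3.org/2004/02/skos/core#broadMatch) would be split in the wrong place. That is one argument for option 1's separate columns.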

Let's discuss this. Thanks Will.

@wlpotter
Owner Author

@dlschwartz We can discuss this, but my first instinct is to say we should have a separate column for each relation 'type'. I could also see an argument for keeping the columns as they are and just renaming. The main benefit is we wouldn't have to do any reorganizing.

So, starting in column H you'd have (for LOC, DNB, ISO Lang Code, SNAP)

relation1.skosCloseMatch | relation2.skosCloseMatch | relation3.skosExactMatch | relation4.skosBroadMatch

The types of relations might change depending on what we decide they should be. You could keep a second row or a comment on these columns to remind encoders which URIs to put in which columns.

The only other change here would be to start the enumeration of columns AF-AK at relation5. (These relations may also change type based on decisions in srophe issue #930)
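With this naming scheme, the transform could recover both the relation number and the SKOS property from the column header alone. A Python sketch, assuming headers follow exactly the `relationN.skosXxx` pattern above; `parse_relation_header` is a hypothetical name:

```python
import re

# Assumed header pattern: "relation" + number + ".skos" + SKOS local name
# with its first letter capitalised, e.g. "relation4.skosBroadMatch".
HEADER_RE = re.compile(r"^relation(\d+)\.skos([A-Z][A-Za-z]*)$")


def parse_relation_header(header: str) -> tuple[int, str]:
    """Return (relation number, prefixed SKOS property) for a column header."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"not a relation column: {header!r}")
    number, local = m.groups()
    # "CloseMatch" -> "closeMatch": SKOS property names start lower-case.
    return int(number), "skos:" + local[0].lower() + local[1:]


for header in ["relation1.skosCloseMatch", "relation2.skosCloseMatch",
               "relation3.skosExactMatch", "relation4.skosBroadMatch"]:
    print(header, "->", parse_relation_header(header))
```

Because the property is read from the header, changing a column's relation type later (for example after the decisions in srophe issue #930) would mean renaming the header rather than editing the script.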

@wlpotter
Owner Author

Ah, sorry, I missed the point about needing both skos:closeMatch and skos:broadMatch for SNAP relations. I think we should have a column for each.

So instead of just relation4.skosBroadMatch we would have relation4.skosCloseMatch | relation5.skosBroadMatch. Both relations 4 and 5 could be used for SNAP relations as needed.

@dlschwartz
Collaborator

@wlpotter I think this sounds good but let's chat about it. Thanks.

@dlschwartz
Collaborator

@wlpotter I've had a chance to read up a bit more and now I'm following the W3C definitions and what you've written here a bit better. To summarize:

  • skos:closeMatch: we should use this when we have a concept in our taxonomy that is closely related to a concept in another taxonomy. [I don't think we want to claim exact matches, not at this point anyway.]
  • skos:broadMatch: we should use this when we have a concept in our taxonomy that is more precise than a concept in another taxonomy. An example of this is relationships we want to encode for that are subsets of the snap:professionalRelationship.
  • skos:broader: we should use this for describing the hierarchical structure in the taxonomy. This would go in each record and be used to record its "parent" or "parents."

I don't think we should use skos:narrower because I think it is easier to list the "parent/s" of the concepts in each record rather than to list all of the "children" concepts in the parent record. Moreover, according to the SKOS model these two properties are inverses of each other: https://www.w3.org/TR/skos-primer/#secrel. By encoding them in a tei:relation with @active and @passive attributes, we will be able to query in either direction.

Unfortunately, the nesting into something resembling a tree is not automatic, see https://www.w3.org/TR/skos-primer/#sectransitivebroader. Notice there that a "grandparent/grandchild" relationship can be inferred as a skos:broaderTransitive. In an RDF environment I think this should mean that we can query for things like all the descendants of a concept or all the children of a parent concept.
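Until a triple store performs this inference, the ancestors and descendants of a keyword can be computed directly from the skos:broader links recorded in each record. A minimal sketch, assuming each record lists only its parents as proposed here; the keyword names other than bishop-over-clergy are made up for illustration:

```python
from collections import deque

# Toy taxonomy: keyword -> parents asserted with skos:broader in its record.
# Keyword names (apart from bishop-over-clergy) are hypothetical.
BROADER = {
    "bishop-over-clergy": ["clerical-relationship"],
    "clerical-relationship": ["professional-relationship"],
    "professional-relationship": [],
}


def ancestors(concept: str, broader: dict[str, list[str]]) -> set[str]:
    """Everything reachable by following skos:broader upward (breadth-first)."""
    seen: set[str] = set()
    queue = deque(broader.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the seen-set also guards against cycles
            seen.add(parent)
            queue.extend(broader.get(parent, []))
    return seen


def descendants(concept: str, broader: dict[str, list[str]]) -> set[str]:
    """Every keyword that has `concept` among its ancestors."""
    return {c for c in broader if concept in ancestors(c, broader)}


print(sorted(ancestors("bishop-over-clergy", BROADER)))
# ['clerical-relationship', 'professional-relationship']
print(sorted(descendants("professional-relationship", BROADER)))
# ['bishop-over-clergy', 'clerical-relationship']
```

Since only parents are stored, `descendants` repeats the upward search for every keyword; that is the kind of query a triple store with transitive inference would answer more directly.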

As we work on developing the ontology, we might need to tweak this. At the moment though, I think this is where we should start. Any thoughts?

If we go with this approach, I believe that we would do the following in the spreadsheet:

  • Columns H-J would all be skos:closeMatch
  • Column K would get split into two columns, both for SNAP but one for skos:closeMatch and another for skos:broadMatch.
  • Columns AF-AK would all be skos:broader
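The proposed layout amounts to a lookup table from spreadsheet column to SKOS property. A sketch, assuming the column letters exactly as proposed here (H-J, a split K/L for SNAP, and AF-AK); if the split shifts later columns, only the table would need to change. `relations_for_row` is a hypothetical helper and the row values are placeholders:

```python
def column_index(letters: str) -> int:
    """Spreadsheet letters to a 1-based index: "A" -> 1, "AF" -> 32."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def column_letters(n: int) -> str:
    """Inverse of column_index: 32 -> "AF"."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


# Assumed column-to-property table following the proposal above.
COLUMN_PROPERTIES = {
    "H": "skos:closeMatch",  # LOC
    "I": "skos:closeMatch",  # DNB
    "J": "skos:closeMatch",  # ISO language code
    "K": "skos:closeMatch",  # SNAP (close match)
    "L": "skos:broadMatch",  # SNAP (broad match)
}
for i in range(column_index("AF"), column_index("AK") + 1):
    COLUMN_PROPERTIES[column_letters(i)] = "skos:broader"  # parent keywords


def relations_for_row(row: dict[str, str]) -> list[tuple[str, str]]:
    """(property, value) pairs for every non-empty mapped cell, in column order."""
    pairs = []
    for col in sorted(row, key=column_index):
        value = row[col].strip()
        if value and col in COLUMN_PROPERTIES:
            pairs.append((COLUMN_PROPERTIES[col], value))
    return pairs


row = {"H": "http://example.org/loc-heading", "K": "",
       "L": "snap:QualifierRelationship", "AF": "http://example.org/parent-keyword"}
print(relations_for_row(row))
```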

Btw, a lot of this comes out of srophe/syriaca-data#930 but I think the discussion belongs here.

@dlschwartz
Collaborator

@wlpotter I suppose this is the right place to deal with @ref vs. @name. My inclination is to do both but I'm not sure that's right. See: https://www.w3.org/TR/skos-reference/#broader.
<relation name="skos:broader" ref="http://www.w3.org/2004/02/skos/core#broader"/>
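Emitting both attributes is straightforward if the transform keeps a small prefix table and derives @ref from @name. A sketch, assuming a hypothetical `expand_prefixed_name` helper; the only mapping included is the SKOS core namespace URI from the example above:

```python
# Prefix table; only the SKOS core namespace is assumed here.
PREFIXES = {"skos": "http://www.w3.org/2004/02/skos/core#"}


def expand_prefixed_name(name: str) -> str:
    """Turn a prefixed name such as "skos:broader" into its full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {name!r}")
    return PREFIXES[prefix] + local


name = "skos:broader"
print(f'<relation name="{name}" ref="{expand_prefixed_name(name)}"/>')
```

Deriving @ref from @name keeps the two attributes consistent, so keeping both costs little.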

@wlpotter
Owner Author

wlpotter commented Dec 7, 2021

@dlschwartz This all sounds good.

I think using skos:broader and sticking to it makes sense as it and skos:narrower are inverses.

The lack of transitivity does pose some problems. We could use skos:broaderTransitive, and I think that means we would double up relations:

A skos:broader B; skos:broaderTransitive B. 
B skos:broader C; skos:broaderTransitive C.

This would allow a skos:broaderTransitive link between A and C to be inferred. For the spreadsheet, we could implement some way to flag if we want to include a skos:broaderTransitive relation -- maybe a relationN.isTransitive column with a boolean flag.
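The doubling could be driven by the proposed flag column. A sketch, assuming a hypothetical relationN.isTransitive column whose cells hold a boolean-like string; the helper names are illustrative only:

```python
def parse_flag(cell: str) -> bool:
    """Read a boolean-like spreadsheet cell; empty counts as False (assumption)."""
    return cell.strip().lower() in {"true", "yes", "y", "1"}


def broader_triples(child: str, parent: str, is_transitive: str) -> list[tuple[str, str, str]]:
    """Triples for one parent link, adding skos:broaderTransitive when flagged."""
    triples = [(child, "skos:broader", parent)]
    if parse_flag(is_transitive):
        triples.append((child, "skos:broaderTransitive", parent))
    return triples


for t in broader_triples("A", "B", "TRUE") + broader_triples("B", "C", "TRUE"):
    print(*t)
```

One thing worth checking before adding the column: the SKOS Reference declares skos:broader a sub-property of skos:broaderTransitive, which is itself transitive, so a store that applies sub-property and transitivity reasoning would derive A skos:broaderTransitive C from the plain skos:broader links alone.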

Maybe an alternative would be to explicitly declare skos:broader for each level of relationship, though depending on the depth of the tree this could be even more tedious.


I think the column changes sound good. I will make the adjustments to how the script outputs the relation elements.

For @ref vs @name I agree that we should do both (it's no trouble from a script perspective). The worst case is that one of the attributes is superfluous -- better than losing important data. I will make these script changes and try a few test outputs for you to review.

@wlpotter
Owner Author

wlpotter commented Dec 8, 2021

@dlschwartz changing the encoding of SNAP from idno[@type="SPEAR"] to skos:broadMatch or skos:closeMatch tei:relation elements raises a question for column G. Previously this column was included as an @ana attribute on the tei:idno. Should we include it as an @ana on the tei:relation element instead? Or could we use @type and/or @subtype for this?

Also, since we now have both closeMatch and broadMatch, we may need two columns for the "directed" or "mutual" designation.

@dlschwartz
Collaborator

@wlpotter actually, I'm not sure we need this at all. I think it's enough that we have a relationship between our concept and the SNAP concept.

But maybe we should discuss this further. From the perspective of a triple store and of an API sharing data with SNAP, maybe it's best to clearly mark when our concept relates to a SNAP concept. Let's discuss this when we meet this afternoon.

@wlpotter
Owner Author

wlpotter commented Dec 8, 2021

@dlschwartz sounds good, let's discuss just this issue to make sure we're on the same page. I believe it may be related to #37 as you mentioned in this comment that

the only use of [column G] information is in SPEAR. I have an xslt that transforms the taxonomy into an index. I use the data here to validate that some relationships get a @mutual attribute while others get @active/@passive.
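The check described in the quoted comment is done in an XSLT; as an illustration only, the same rule can be written in Python. A sketch, assuming the column G value is either "mutual" or "directed" and that `relation_problems` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def relation_problems(rel: ET.Element, relationship_type: str) -> list[str]:
    """Check @mutual vs @active/@passive against the column G relationship type."""
    problems = []
    has_mutual = rel.get("mutual") is not None
    has_active = rel.get("active") is not None
    has_passive = rel.get("passive") is not None
    if relationship_type == "mutual":
        if not has_mutual:
            problems.append("mutual relationship lacks @mutual")
        if has_active or has_passive:
            problems.append("mutual relationship should not use @active/@passive")
    elif relationship_type == "directed":
        if has_mutual:
            problems.append("directed relationship should not use @mutual")
        if not (has_active and has_passive):
            problems.append("directed relationship needs both @active and @passive")
    else:
        raise ValueError(f"unexpected relationship type {relationship_type!r}")
    return problems


ok = ET.fromstring('<relation mutual="#p1 #p2"/>')
bad = ET.fromstring('<relation active="#p1"/>')
print(relation_problems(ok, "mutual"))   # []
print(relation_problems(bad, "mutual"))
```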

@dlschwartz
Copy link
Collaborator

@wlpotter alright, I'm seeing now that I've got myself in a bind between "browse by" categories and the structured hierarchy of an ontology. I need to re-think some things. It might be easiest just to discuss this afternoon in our meeting.

@dlschwartz
Collaborator

It might be as simple as putting "browse by" categories as a @subtype on tei:entryFree and using tei:relation elements for the structured hierarchy of the ontology. But let's discuss.

@dlschwartz
Collaborator

@wlpotter I've been working on the taxonomy relationships. I've grouped them in rows 1049-1132 in the spreadsheet.

Columns K and L should not contain an accurate crosswalk with SNAP. Column K is used only for skos:closeMatch and column L contains skos:broadMatch when there is no skos:closeMatch, i.e. it indicates the narrowest concept in SNAP under which our concept falls. This should allow us to share data with SNAP even when we have relationships they don't have.
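The rule for the two SNAP columns (column K for a close match; column L for a broad match only when there is no close match) can be stated as a small function. A sketch with a hypothetical `snap_crosswalk` helper; treating a row with both columns filled as an error is our own assumption, chosen to catch data-entry slips:

```python
from typing import Optional


def snap_crosswalk(close_match: str, broad_match: str) -> Optional[tuple[str, str]]:
    """Pick the single SNAP relation for a keyword from columns K and L."""
    close_match, broad_match = close_match.strip(), broad_match.strip()
    if close_match and broad_match:
        # Column L is meant only for keywords with no close match.
        raise ValueError("fill column K or column L, not both")
    if close_match:
        return ("skos:closeMatch", close_match)
    if broad_match:
        # Narrowest SNAP concept under which our keyword falls.
        return ("skos:broadMatch", broad_match)
    return None  # no SNAP crosswalk for this keyword


print(snap_crosswalk("snap:AdoptedFamilyRelationship", ""))
print(snap_crosswalk("", "snap:QualifierRelationship"))
```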

Columns AG and AH contain one or more parent concepts for each relationship. They should accurately reflect this SNAP graph minus concepts for which we haven't created a keyword and with our concept keywords added in.

I have a question about the difference between "Link" and "Bond" which leaves me less than clear about where to put things like relationships between events or between persons and objects. I think these are a "Link" while relationships between persons are a "Bond" but I'm not sure about that. Let's not close this issue until I figure that out.

@dlschwartz
Collaborator

Correction: Columns K and L should NOW contain an accurate crosswalk with SNAP.

@wlpotter
Owner Author

wlpotter commented Dec 13, 2021

@dlschwartz these look great! I will have the script output them as follows:

<relation name="skos:closeMatch" ref="http://www.w3.org/2004/02/skos/core#closeMatch" active="http://syriaca.org/keyword/adopted-family-relationship" passive="snap:AdoptedFamilyRelationship"/>

or

<relation name="skos:broadMatch" ref="http://www.w3.org/2004/02/skos/core#broadMatch" active="http://syriaca.org/keyword/alleged-relationship" passive="snap:QualifierRelationship"/>
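For reference, elements like these can be built with Python's standard library. A sketch, assuming a hypothetical `snap_relation` helper and, for brevity, no TEI namespace on the element (a real transform would place it in the TEI namespace):

```python
import xml.etree.ElementTree as ET

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def snap_relation(match_property: str, keyword_uri: str, snap_concept: str) -> ET.Element:
    """Build a <relation> linking a Syriaca keyword to a SNAP concept."""
    if not match_property.startswith("skos:"):
        raise ValueError(f"expected a skos: property, got {match_property!r}")
    local_name = match_property[len("skos:"):]
    # Attribute insertion order is kept when serialising (Python 3.8+).
    return ET.Element("relation", {
        "name": match_property,
        "ref": SKOS_NS + local_name,
        "active": keyword_uri,
        "passive": snap_concept,
    })


el = snap_relation("skos:broadMatch",
                   "http://syriaca.org/keyword/alleged-relationship",
                   "snap:QualifierRelationship")
print(ET.tostring(el, encoding="unicode"))
```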

This raises one question: the @passive values contain the "snap" namespace prefix, but we don't have this prefix bound anywhere in our data. We could declare the SNAP namespace on the root TEI element, but I'm not sure if attribute values are within the scope of those declarations? Another option would be to find and replace "snap:" with the namespace URI (e.g., "http://data.snapdrgn.net/ontology/snap#AdoptedFamilyRelationship").

(Note that we run into a similar issue with //entryFree/@type, which is currently "skos:concept" for most keywords: can the skos prefix be dereferenced in an attribute value?)

@dlschwartz
Copy link
Collaborator

@wlpotter, thanks for the question. I think there are two separate issues here.

  • First, is the namespace issue. It's actually not an issue in this case because an attribute value is a simple string. We would only need to declare the namespace if we had it in an element or attribute name. When we serialize the data, we'll need to declare the namespace in that context, but here we should be good.
  • Second, what's best for LOD, should we be using the full URL? I'm going to tentatively vote for just using the more human readable version here and not the full URL. If we need the full URL in the RDF we can generate that in the serialization.
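The first point is easy to confirm with a namespace-aware XML parser: prefixes are resolved in element and attribute names, but an attribute value such as "snap:AdoptedFamilyRelationship" comes back as a plain string. A sketch using Python's ElementTree on an illustrative document that declares the SNAP namespace on the root:

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Illustrative document: xmlns:snap is declared, but "snap:" only appears in a value.
doc = f"""<TEI xmlns="{TEI_NS}"
     xmlns:snap="http://data.snapdrgn.net/ontology/snap#">
  <relation name="skos:closeMatch" passive="snap:AdoptedFamilyRelationship"/>
</TEI>"""

root = ET.fromstring(doc)
rel = root.find(f"{{{TEI_NS}}}relation")
print(rel.tag)             # element name is expanded to {namespace}local
print(rel.get("passive"))  # attribute value stays the literal "snap:..." string
```

ElementTree also does not keep the prefix declarations in the parsed tree, so a downstream consumer has no record of what "snap:" stood for; that is why the full URI would have to be supplied at serialization time.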

Does this all make sense?

@wlpotter
Owner Author

@dlschwartz yes, I think you're right that the two issues are separate, and I was mostly thinking about the second, LOD issue even though I was perhaps putting it in terms of namespaces.

My concern with declaring only the human-readable form is that, without some external reference table, these attribute values aren't really machine-readable (or at the very least wouldn't be useful as machine-actionable data). Perhaps that's not likely enough to warrant concern, though?

From a technical standpoint it would be simple to implement the conversion from snap:x to full URI at the transform level using a simple replace function.
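A sketch of that conversion, assuming the SNAP namespace URI given earlier in the thread and a hypothetical `expand_snap` helper. It checks for the prefix at the start of the value instead of running a blanket string replace, so "snap:" appearing elsewhere in a value is left alone:

```python
SNAP_NS = "http://data.snapdrgn.net/ontology/snap#"
SNAP_PREFIX = "snap:"


def expand_snap(value: str) -> str:
    """Rewrite "snap:X" as the full SNAP URI; leave other values unchanged."""
    if value.startswith(SNAP_PREFIX):
        return SNAP_NS + value[len(SNAP_PREFIX):]
    return value


print(expand_snap("snap:AdoptedFamilyRelationship"))
print(expand_snap("http://syriaca.org/keyword/adopted-family-relationship"))
```

The same function would work whether the expansion happens in the TEI transform or later, when the data is serialized as RDF.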

@dlschwartz
Collaborator

@wlpotter Let's talk through what makes most sense tomorrow. Thanks.

@wlpotter
Owner Author

We will leave the "snap:" in the @passive attribute.

Change "skos:concept" to the full URI (maybe open separate issue?)

@wlpotter
Owner Author

wlpotter commented Dec 15, 2021

For column G, add to TEI like this: <note type="relationshipType" subtype="mutual"/>
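A sketch of how the transform might turn a column G value into that note, assuming the column holds "mutual" or "directed" (the two designations mentioned earlier) and that a blank cell means no note; `relationship_type_note` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Optional

ALLOWED_TYPES = {"mutual", "directed"}


def relationship_type_note(column_g: str) -> Optional[ET.Element]:
    """Build <note type="relationshipType" subtype="..."/> from a column G cell."""
    value = column_g.strip().lower()
    if not value:
        return None  # assumption: blank cell -> no note
    if value not in ALLOWED_TYPES:
        raise ValueError(f"unexpected relationship type {column_g!r}")
    return ET.Element("note", {"type": "relationshipType", "subtype": value})


note = relationship_type_note("Mutual")
print(ET.tostring(note, encoding="unicode"))
```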

FYI, #42 is the issue for changing skos:Concept to the full URI

wlpotter added a commit that referenced this issue Dec 16, 2021
Adding a note for relationship type
@wlpotter
Owner Author

I have updated the tei:relation generation to match the comment above. I have also added the note for relationship types to the transform. I will run a new test output to double check, but then I believe this issue can be closed.

@wlpotter
Owner Author

wlpotter commented Jan 6, 2022

@dlschwartz when you get a chance, could you take a look at the files from this commit, especially the ones that are relationships with snap close/broad matches? They should be ready to go except for the schemas (on which see #44). The files are also here

@wlpotter wlpotter added documentation Improvements or additions to documentation and removed question Further information is requested priority labels Mar 16, 2022

2 participants