taxonomy transform should build the taxonomy index #37
@dlschwartz as we discussed, I will implement the XSLT as an XQuery module for the transform and have the script take over generating the index. I do wonder, though, whether we still want a separate index of subjects like those for persons, places, etc. (a flat index containing just the URIs that exist on the server). Does the index that the XSLT makes have all of the keywords in it? In other words, could we use it to check whether a subject already exists, or is it just a quick reference for hierarchical relations?
@wlpotter the current index includes a number of different ways of grouping the keywords, followed (starting here: https://github.com/srophe/srophe-app-data/blob/master/data/subjects/taxonomyIndex.xml#L358) by a list of all keywords, whether or not they appear above. I don't have a strong opinion about whether we maintain that practice or have two files, one containing groupings and another containing a simple list.
Ah, I suppose I should have looked closer at the data 😆 From the perspective of consistency and simplicity, I would prefer having the index of all URIs built by the srophe app in the same way that the persons, et al. are built. (This also allows all entity-types to be treated the same way in the transform for #3 ) But, if you use the |
@wlpotter that's a good point. I trust it's effortless to make two files? The grouped index (including the |
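For illustration only, the "simple list" file discussed above could be sketched as follows. This is a minimal Python sketch, not the project's XQuery: the `listURI`/`uri` element names, the `type` attribute, and both helper functions are assumptions made for the example, and only the two keyword URIs come from this thread.

```python
import xml.etree.ElementTree as ET

BASE = "http://syriaca.org/keyword/"

def build_flat_index(uris):
    """Build a flat list of every keyword URI (hypothetical element names)."""
    root = ET.Element("listURI", {"type": "all"})
    for uri in sorted(set(uris)):  # de-duplicate and keep a stable order
        ET.SubElement(root, "uri").text = uri
    return root

def subject_exists(index, uri):
    """Check whether a keyword URI is already present in the flat index."""
    return any(el.text == uri for el in index.iter("uri"))

flat = build_flat_index([
    BASE + "bond",
    BASE + "personal-relationships",
    BASE + "bond",  # duplicates collapse to one entry
])
print(subject_exists(flat, BASE + "bond"))
print(subject_exists(flat, BASE + "not-a-keyword"))
```

Keeping this list separate from the grouped index would let the "does this subject already exist?" check ignore the hierarchy entirely.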
@dlschwartz can we revisit this as I'm getting hung up on the XQuery given the changes we've made to the data model. For instance, https://github.com/srophe/srophe-app-data/blob/master/data/subjects/taxonomyIndex.xml#L335-L342 has the list of http://syriaca.org/keyword/personal-relationships URIs with the Do we want to keep these formatted this way rather than, e.g.,
We have the SNAP relationships now encoded as skos:closeMatch or skos:broadMatch within the records (rather than previously as tei:idno elements). I could have the script replicate the current index, but wanted to check in first to make sure there isn't a preferred format. |
@wlpotter I'm so sorry! I messed this up. Somehow I sent you to the index in the master branch when I'm currently validating against the index in the dev branch. The dev branch has full URIs for everything: https://github.com/srophe/srophe-app-data/blob/dev/data/subjects/taxonomyIndex.xml#L345. Also, in the example in your post above, it has the A better/current example is "religious-relationship," (formerly "religious-relationships"). This is listed in column AG:AL as a skosBroader for the following: clerical-relationship, monastic-relationship, commune-together, confessor-for, and commemorates. This should output the following: I'm guessing the script is correctly outputting this but perhaps the last time you ran the script was before I had made some fairly recent changes. The most up to date version is here: https://docs.google.com/spreadsheets/d/14jU8K-hjFH193zsqXzrdYPYfx2HFqttX0TPScdsbukA/edit#gid=959652535. Can you run that again and we can test the output? Thanks Will! |
Sorry, this is easier to look at
@wlpotter it looks like the latest commit has the correct data and I was just looking at an earlier commit. This looks correct: https://github.com/wlpotter/csv-to-srophe/tree/main/out/subjects/2022-01-06 Just a minute ago I was looking at the link from your email: https://github.com/wlpotter/csv-to-srophe/tree/main/test/out/csv-tests/subjects. This has the old relationships that we've dropped. |
Generating the index from this correct commit should produce the properly formatted index. But let's confirm. Thanks. |
@dlschwartz Ah, thank you for this clarification! This should really help streamline the way the index is generated as the version on dev shows more what I was expecting (a set of listUri elements that list the uris of keywords that have a skos:broader relation to the I can circle back to the index output and make sure it is producing what's on dev. Also, apologies for sharing the wrong link in the email...I forgot I had treated that as 'real' output rather than test output. So, yes, the data is at https://github.com/wlpotter/csv-to-srophe/tree/main/out/subjects/2022-01-06 |
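As a rough illustration of the grouping described here (one list per parent keyword, holding the URIs of keywords with a skos:broader relation to it), the following Python sketch uses the "religious-relationship" example from earlier in the thread. The input dictionary, the `ref` attribute, the `uri` child element, and the function name are all hypothetical; the actual index on dev may be shaped differently.

```python
import xml.etree.ElementTree as ET

BASE = "http://syriaca.org/keyword/"

# Hypothetical input: keyword URI -> URIs it names as skos:broader.
broader = {
    BASE + "clerical-relationship": [BASE + "religious-relationship"],
    BASE + "monastic-relationship": [BASE + "religious-relationship"],
    BASE + "commune-together": [BASE + "religious-relationship"],
    BASE + "confessor-for": [BASE + "religious-relationship"],
    BASE + "commemorates": [BASE + "religious-relationship"],
}

def build_list_uri(parent_uri, broader_map):
    """Collect every keyword whose broader term is parent_uri into one list."""
    list_el = ET.Element("listURI", {"ref": parent_uri})  # attribute name assumed
    for uri in sorted(broader_map):
        if parent_uri in broader_map[uri]:  # exact match against the list
            ET.SubElement(list_el, "uri").text = uri
    return list_el

religious = build_list_uri(BASE + "religious-relationship", broader)
print(ET.tostring(religious, encoding="unicode"))
```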
@wlpotter thanks! I'm pretty optimistic that everything (except schema) should be in good shape with little or no changes. Thanks. |
Currently only outputting to console; will implement saving to file once it's working correctly #37
@dlschwartz I think this is working, but I'm still not getting the relationships. I think I need to make them singular rather than plural? I've moved the taxonomy outline to a 'config' XML file, here. This file essentially recreates the outline of the index without filling in the matching data (that's what the script adds). If I understand correctly, URIs like the one on line 66, "religious-relationships", should be made singular? If you'd like, you can copy that file and send back how the taxonomy outline should look (or clone/fork the repo and create a pull request). Otherwise the taxonomy is outputting the way we want; I just need to have it save to a file rather than print to the console.
Hmm, so it doesn't look like it's just singular and plural issues. For instance, event-relationships appear to have become "related-event", which is nested now in "link" rather than "relationships". I think I'm still missing the most recent changes to the data? |
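The outline-plus-fill approach, and why a stale URI in the outline produces an empty group, can be sketched as below. This is a hedged Python illustration only: the `taxonomyOutline` wrapper, the `ref` attribute, and `fill_outline` are invented for the example and are not the real config format.

```python
import xml.etree.ElementTree as ET

BASE = "http://syriaca.org/keyword/"

# Hypothetical outline: the groups are listed, but no matching data is filled in.
OUTLINE = f"""<taxonomyOutline>
  <listURI ref="{BASE}religious-relationship"/>
  <listURI ref="{BASE}religious-relationships"/>
</taxonomyOutline>"""

# Hypothetical record data: keyword URI -> its skos:broader URIs.
broader = {
    BASE + "clerical-relationship": [BASE + "religious-relationship"],
    BASE + "monastic-relationship": [BASE + "religious-relationship"],
}

def fill_outline(outline_xml, broader_map):
    """Fill each outline group and report groups that matched nothing."""
    root = ET.fromstring(outline_xml)
    unmatched = []
    for list_el in list(root.iter("listURI")):
        parent = list_el.get("ref")
        children = sorted(u for u, ps in broader_map.items() if parent in ps)
        for uri in children:
            ET.SubElement(list_el, "uri").text = uri
        if not children:
            unmatched.append(parent)  # e.g. an outdated plural URI
    return root, unmatched

filled, unmatched = fill_outline(OUTLINE, broader)
print(unmatched)
```

Reporting the unmatched groups is one way to surface outline entries that no longer correspond to any URI in the data.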
@wlpotter so I've gone through the old index and figured out the categories I need in this index and how to generate this index from the new spreadsheet. Unfortunately, this is kind of clumsy. I'm very open to thinking about other ways to do this, including creating new columns. Hopefully this gets us started though. What I've got below relies on existing spreadsheet columns:
In the end, the index would look like this:
@dlschwartz I think I've got the script generating the index the way you specify in your most recent comment. Here is a sample output: https://raw.githubusercontent.com/wlpotter/csv-to-srophe/main/test/2022-03-03_test-taxonomy-index-output.xml. Let me know if that looks like it's working. If so, I can walk you through how the taxonomy config file works in case you need to update the selected categories, etc. Note to self: I still need to implement saving the index to a file; it is currently just output to the console for debugging purposes. (This should be as simple as adding an 'output path' variable to the config or config-taxonomy.)
Also, I know you mentioned that we need to re-run the transform to catch some new data changes. Once we get the index working satisfactorily I'll pull down the most recent spreadsheet data and generate a new batch of XML files and an updated index. |
@wlpotter Fantastic! This looks perfect. Thanks so much. Before writing back I thought briefly of trying to create keywords for place types and for religious confessions, but that's not going to happen. I think we can run this now. Will this be set up in a way that I can run this transform? Should I learn how to do that? Thanks Will!
Yes, it should be set up to where you can run it after adjusting the configuration settings (which are in an xml document). I will go back through the documentation to make sure it's up to date, then we can walk through how to run the transform. |
The files here should be up to date. The index is there as well under https://github.com/wlpotter/csv-to-srophe/blob/main/out/subjects/2022-03-10/index/taxonomyIndex.xml. I am opening an issue on the srophe app repository for Syriaca (srophe/syriaca#20) so we can test the new data there once we've moved it over. I believe, unless you notice any glaring issues with the most recent output, this issue can be closed? |
@wlpotter I'm getting around to testing this and I've found some problems. I didn't include http://syriaca.org/keyword/bond as one of the skos:Broader to grab for |
@dlschwartz I can re-run the index to include This made me realize that it would be worth having an additional, stand-alone script that just re-generates the taxonomy index as needed. The main transform script will keep the functionality as well, so you won't generally need to run two scripts. But you can have the option if you are only interested in the taxonomy index. I will write this script and add instructions to the documentation. |
And fixed typo in "enmity-for". See #37
@dlschwartz I updated the index and moved it to the main srophe-app-data repository (see this commit). Let me know if you notice anything else. |
@wlpotter That all sounds great. I think we might want to keep it in the app documentation. I'll move it over. Thank you! |
To fix the issue with qualifier-relationships showing up under
@dlschwartz I have added the 'include self' functionality to the index generation. Would you like me to upload the new version to the server or do you want to do one last round of spot-checking? |
@wlpotter I'll go ahead and move it over. I'd like to open both files in oxygen and compare them first just to be certain. Thanks! |
@dlschwartz just saw some errors (it's accidentally grabbing all the tei:idnos not just the keyword URI...) I will fix that, update the taxonomy index file, and send you a link. |
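The kind of filter at issue here (keeping only the keyword URI rather than every tei:idno in a record) might look like the following Python sketch. The record layout, the second idno value, and the function name are hypothetical; only the TEI namespace and the keyword URI prefix are taken as given.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
KEYWORD_PREFIX = "http://syriaca.org/keyword/"

# Hypothetical record: one keyword URI plus an unrelated identifier.
RECORD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><entryFree>
    <idno type="URI">http://syriaca.org/keyword/descendant-of</idno>
    <idno type="URI">http://example.org/other-vocabulary/descendant</idno>
  </entryFree></body></text>
</TEI>"""

def keyword_uris(record_xml):
    """Return only the idno values that are keyword URIs."""
    root = ET.fromstring(record_xml)
    return [
        idno.text.strip()
        for idno in root.iterfind(".//tei:idno", TEI_NS)
        if idno.text and idno.text.strip().startswith(KEYWORD_PREFIX)
    ]

print(keyword_uris(RECORD))
```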
It may have been a false alarm (I've been having to switch back and forth between app-data master and dev for manuscript work, and I think I was running the index generation on old data...) In any case, the file here should now be updated with the newest version (removed "bond" from the "relationships" listURI, and the "qualifier-relationships" shouldn't be there either) |
@wlpotter you've got it regarding "bond" and "qualifier-relationships"! The only odd thing I find is under the "relationships" listURI where there is a "descendent-of" [sic] with no |
@dlschwartz Ha! That was a typo in the taxonomy config file...I've updated it and re-run. The same link above should work still. The correct "descendant-of" is there with |
@wlpotter Looks great! Thanks. I'll move this over. |
Cf. https://github.com/srophe/srophe-app-data/blob/master/data/subjects/taxonomyIndex.xml for the desired output
Cf. https://github.com/srophe/srophe-app-data/blob/master/data/subjects/taxonomyIndex.xsl for the XSLT that currently builds this from the records
Priority as part of #6. Related to #3.