inject process metadata (provenance tracking) into produced turtle files #28
gave this some (a little) thought...
call optionally
It should be optional, and kept separate from the real data-flow, so a command line switch should be added to point to the provenance report to be generated.
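A minimal sketch of such an optional switch, using argparse from the standard library. The flag name `--provenance` and the program name are assumptions for illustration only; the issue does not settle on either.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser with an optional provenance switch."""
    parser = argparse.ArgumentParser(prog="pysubyt-sketch")  # hypothetical prog name
    # Hypothetical flag name: the issue only asks for a switch that points
    # to the provenance report to be generated.
    parser.add_argument(
        "--provenance",
        metavar="FILE",
        default=None,
        help="write a provenance report to FILE; omit to skip provenance tracking",
    )
    return parser


def provenance_enabled(args: argparse.Namespace) -> bool:
    """Provenance stays off unless the switch was given explicitly."""
    return args.provenance is not None
```

Because the default is `None`, a run without the switch never touches the provenance code path, which keeps it separate from the real data-flow.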
under control of template writer
It should only add provenance statements concerning selected nodes, controlled by the template designer. So the template designer should have a mechanism to "add" certain selected URIs to the provenance set. Maybe by wrapping that URI in a pass-through function like this in the template:
calling towards a new function that follows this general structure:
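One possible shape for that pass-through function, as a sketch only. The name `provit` is the one used later in this comment; the `ProvenanceTracker` class and its `start_event` method are hypothetical helpers, not existing pysubyt API.

```python
class ProvenanceTracker:
    """Collects URIs that the template designer chose to track (hypothetical helper)."""

    def __init__(self):
        self.events = []      # one entry per processed input item
        self._current = None  # event that is currently being filled

    def start_event(self, source, location):
        """Open a new event for one item of an input source."""
        self._current = {"source": source, "location": location, "produced": []}
        self.events.append(self._current)

    def provit(self, uri):
        """Register the URI with the current event and return it unchanged."""
        if self._current is not None and uri not in self._current["produced"]:
            self._current["produced"].append(uri)
        return uri  # pass-through: the template output stays identical
```

Since `provit` returns its argument untouched, wrapping a URI in it changes nothing in the produced turtle; the only effect is the side registration in the tracker.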
follow the template
We should eat our own dog food, so the prov.ttl should be produced by a pysubyt template itself: we should have a built-in prov-template.ttl file inside the py lib package that holds the template producing the output, based on an internal python-dict holding the prov info assembled during the run.
@laurianvm - if you agree with this approach, you might want to use this issue to draft / suggest the outlines of such a python-dict and an appropriate template (and thus useful vocabs) :)
first ideas:
prov = {
    'about': { 'code': '[email protected]', 'exects': '2021-11-23T21:15:52', ...} ,
    'context': { ... stuff from the context , like flags ... } ,
    'inputs': { ... describing the files making up the sources of sets and _ ...} ,
    'events': [
        { 'source': ref to input-source,
          'location': some ref to line and or item-number in the set,
          'produced': [ ... list of uri's that were registered through provit into this "event" ...] }
    ]
}
direction of link
I don't like the idea that we would add this kind of provenance info as properties to nodes we add, i.e. let us not reuse those as
Instead, I would prefer the prov-context to stand on its own feet, but rather link up to these registered nodes as
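To illustrate the link direction, here is a rough sketch that turns a `prov` dict of the shape drafted above into turtle, with the provenance event as subject and the registered nodes only as objects. It is plain Python standing in for the proposed built-in template, not that template itself. The PROV-O terms (`prov:Activity`, `prov:used`, `prov:generated`, `prov:endedAtTime`) and the `urn:example:event:` subjects are assumptions; the issue does not name a vocabulary.

```python
def render_prov(prov: dict) -> str:
    """Render provenance events as turtle; tracked nodes only appear as objects."""
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for i, event in enumerate(prov["events"]):
        if not event["produced"]:
            continue  # nothing was registered for this event, so nothing to report
        subject = f"<urn:example:event:{i}>"  # hypothetical identifier scheme
        produced = " , ".join(f"<{uri}>" for uri in event["produced"])
        lines.append(f"{subject} a prov:Activity ;")
        lines.append(f'    prov:endedAtTime "{prov["about"]["exects"]}"^^xsd:dateTime ;')
        lines.append(f"    prov:used <{event['source']}> ;")
        # The link points from the provenance context to the registered nodes.
        lines.append(f"    prov:generated {produced} .")
    return "\n".join(lines) + "\n"
```

The data nodes themselves get no extra properties, so the regular output and the provenance report can be produced and consumed independently.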
implementation thoughts
(draft) We should be able to trace a record back to its origin and track its versions,
e.g. a data point that is altered after QC.
To do so, a set of metadata triples should be produced (e.g. date, time, version of pysubyt, arguments, ...).
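As a sketch of how that run-level metadata could be gathered before it is turned into triples: the keys `code` and `exects` follow the draft dict in this thread, while the `args` key and the function name `collect_about` are assumptions.

```python
import sys
from datetime import datetime, timezone


def collect_about(code, argv=None, now=None):
    """Gather run-level metadata: tool identifier, execution timestamp, arguments."""
    if argv is None:
        argv = sys.argv[1:]  # arguments of the current run
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "code": code,                                  # e.g. tool name plus version
        "exects": now.strftime("%Y-%m-%dT%H:%M:%S"),   # execution timestamp
        "args": list(argv),                            # hypothetical key
    }
```

Passing `argv` and `now` explicitly keeps the function deterministic in tests, while a real run can rely on the defaults.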