Scripts should output metadata file about what they did #676

Open
mr-c opened this issue Dec 4, 2014 · 11 comments

Comments

@mr-c
Contributor

mr-c commented Dec 4, 2014

Although most of the file formats we work with don't support metadata, we aren't off the hook for recording and preserving such information.

Based upon the discussion captured here: https://groups.google.com/d/msg/common-workflow-language/wx8G2zvDUV4/lzZPUPtQEHwJ, we should store such information in a JSON file (like @kdmurray91's addition to load-into-counting.py).

@ctb
Member

ctb commented Apr 3, 2017

A few more thoughts --

salmon does this now!

also see #189, generating PROV-compliant output.

I always get hung up on what to do when we are running many things in a row. Do we just have a file that we append to? Or perhaps a directory that contains multiple little files, named with UUIDs, containing the provenance output?

Is there any workflow-supported standard emerging? Is PROV it?

@standage
Member

standage commented Apr 3, 2017

Brainstorming some considerations.

  • probably want to use a filename format similar to 2017-04-03T06:55:21-khmer.json (maybe s/khmer/oxli/)
    • fixed length
    • sorts correctly every time without any tricks
  • store a distinct file for each script invocation in some directory?
    • in situ with something like ./.oxli/ or ./oxli-meta/?
    • global with something like ~/.oxli?
  • ...or use a user-specified file to keep a running log of JSON entries?
    • if nothing is specified, default to something like ./oxli-meta.json
  • wait until the end of the script (to confirm successful exit) to print?
    • otherwise could end up with a lot of unhelpful/misleading files/entries
  • report relative paths (for portability) or absolute paths (for clearest provenance)?
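Several of the considerations above can be sketched together: a fixed-length, sortable filename, one file per invocation in an in situ directory, and a record that is written only once the script has finished. This is a rough illustration, not anything khmer/oxli currently does; the function names, the `.oxli` directory, and the record fields are all hypothetical. It sidesteps the relative-versus-absolute question by storing both forms.

```python
import json
import os
from datetime import datetime


def metadata_filename(now, tool="oxli"):
    """Build a fixed-length name that sorts chronologically as plain text."""
    # Hypothetical scheme modelled on the example filename in the list above.
    # Note: the colons in the timestamp are not valid in Windows filenames.
    return f"{now.strftime('%Y-%m-%dT%H:%M:%S')}-{tool}.json"


def describe_path(path):
    """Record both the portable (relative) and unambiguous (absolute) form."""
    return {"relative": os.path.relpath(path), "absolute": os.path.abspath(path)}


def write_metadata(meta_dir, argv, inputs, outputs, now=None):
    """Write one JSON record for a finished invocation and return its path.

    Intended to be called as the last step of a script, so that failed
    runs leave no misleading record behind.
    """
    now = now or datetime.now()
    record = {
        "timestamp": now.isoformat(timespec="seconds"),
        "argv": list(argv),
        "inputs": [describe_path(p) for p in inputs],
        "outputs": [describe_path(p) for p in outputs],
    }
    os.makedirs(meta_dir, exist_ok=True)
    path = os.path.join(meta_dir, metadata_filename(now))
    with open(path, "w") as fh:
        json.dump(record, fh, indent=2)
    return path
```

A script would call something like `write_metadata("./.oxli", sys.argv, [infile], [outfile])` just before exiting. Second-level timestamps can collide when two scripts finish in the same second, so a real implementation would need a tie-breaker such as a PID or UUID suffix.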

@standage
Member

standage commented Apr 3, 2017

> salmon does this now!

The most recent stable release, or current master? I just compiled the latest release from source and ran a few commands but didn't see any JSON metadata or any way to invoke it.

@ctb
Member

ctb commented Apr 3, 2017 via email

@standage
Member

standage commented Apr 3, 2017

👍 Got it. Also several .json files in the index directory.

@ctb
Member

ctb commented Apr 3, 2017 via email

@standage
Member

standage commented Apr 3, 2017

I like salmon's approach of attaching metadata to each artifact it creates (sequence index or quantification table). That could be an alternative to the single log file or directory approaches I discussed above.

That said, salmon's approach works really well for a decidedly NOT streaming approach. If we start stitching together 3 or 4 khmer/oxli commands via UNIX pipes, all of a sudden attaching metadata to output files doesn't make as much sense.

@mr-c
Contributor Author

mr-c commented Apr 3, 2017

The closest thing to a non-domain-specific standard is what we did in CWL: programs can hand off JSON files with key-value pairs: http://www.commonwl.org/v1.0/CommandLineTool.html#Output_binding (this is underdocumented and I am happy to explain more)
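For illustration, a minimal sketch of that hand-off from the tool's side. As I read the CWL v1.0 CommandLineTool specification, a file named `cwl.output.json` in the tool's output directory is loaded by the runner as the output object. The helper name and the example keys below are hypothetical, and how the keys must line up with the tool description's declared outputs is not covered here.

```python
import json
import os


def write_cwl_outputs(outdir, **values):
    """Write key-value pairs to cwl.output.json in the given output directory.

    Hypothetical helper: the keys are whatever the script wants to report
    and should correspond to outputs declared in the CWL tool description.
    """
    path = os.path.join(outdir, "cwl.output.json")
    with open(path, "w") as fh:
        json.dump(values, fh, indent=2)
    return path
```

A script might end with something like `write_cwl_outputs(".", ksize=32, n_unique_kmers=12345)`, where both key names are made up for the example.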

@ctb
Member

ctb commented Apr 4, 2017 via email

@betatim
Member

betatim commented Apr 6, 2017

Purely technical consideration: appending to a file when there are multiple concurrent writers is #hard, especially when you need to make it work across operating systems and weird file systems like NFS. It is worth letting someone else provide the file locking (e.g. sqlite). The drawback is that you need a tool to look at your data, which is tedious compared to opening it in vim.
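A minimal sketch of the sqlite option using Python's standard `sqlite3` module, so that the database handles the locking between concurrent writers. The table layout and function names are assumptions made for the example. One caveat: SQLite's own documentation warns that file locking can be unreliable on some network file systems, so this does not fully remove the NFS concern.

```python
import json
import sqlite3
import time


def log_invocation(db_path, script, record):
    """Append one provenance record; SQLite serialises concurrent writers."""
    # timeout: how long to wait for another writer's lock before giving up.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "CREATE TABLE IF NOT EXISTS runs ("
                "id INTEGER PRIMARY KEY, finished REAL, script TEXT, record TEXT)"
            )
            conn.execute(
                "INSERT INTO runs (finished, script, record) VALUES (?, ?, ?)",
                (time.time(), script, json.dumps(record)),
            )
    finally:
        conn.close()


def read_log(db_path):
    """Return all (script, record) pairs in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT script, record FROM runs ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
    return [(script, json.loads(rec)) for script, rec in rows]
```

The `read_log` helper is a partial answer to the "need a tool to look at your data" drawback: a small dump command could print the records back out as JSON for viewing in an editor.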

@ctb
Member

ctb commented Apr 6, 2017 via email
