Add dataset to store list of software that produced a file #578

Merged: 4 commits merged into dev from provenance-tracking on Sep 19, 2024

stephprince
Contributor

@stephprince stephprince commented Jun 27, 2024

Summary of changes

Fix #319.

Here is a draft proposal for basic provenance information. Given the discussion in #319, I think the goal of the first draft was an easily shareable, approachable representation where the NWB file has an optional field with the names of the software packages and versions used to generate the data in the file.

After following up offline with @rly, we concluded that this information should likely:

  1. be a dataset instead of an attribute, since attributes have size limits and these lists could be large depending on how much information the user wants to store.
  2. be a string dataset of shape (N, 2) instead of a compound data type, to allow more flexible modification in the future if we want to provide the option to save additional information. The downside of this approach is that users will not know the column labels without looking at the documentation.
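
To make the (N, 2) layout concrete, here is a minimal pure-Python sketch of the table as rows of two strings. The helper names (`to_rows`, `shape`), the column order, and the package names and versions are assumptions for illustration only; they are not part of the schema or of pynwb.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed column order; the labels live only in documentation, not in the data.
COLUMNS = ("name", "version")


def to_rows(entries: Iterable[Sequence[str]]) -> List[List[str]]:
    """Validate entries and return them as N rows of 2 strings each."""
    rows: List[List[str]] = []
    for index, entry in enumerate(entries):
        # A bare string such as "ab" would otherwise pass as a 2-item sequence.
        if isinstance(entry, str) or len(entry) != len(COLUMNS):
            raise ValueError(f"row {index} must have exactly {len(COLUMNS)} items")
        if not all(isinstance(item, str) for item in entry):
            raise TypeError(f"row {index} must contain only strings")
        rows.append(list(entry))
    return rows


def shape(rows: List[List[str]]) -> Tuple[int, int]:
    """Return the (N, 2) shape of the validated table."""
    return (len(rows), len(COLUMNS))


# Hypothetical package names and versions, for illustration only.
rows = to_rows([("example-acquisition", "1.2.0"), ("example-converter", "0.9.1")])
print(shape(rows))  # (2, 2)
```

Because the rows carry no labels, a reader has to know the column order in advance, which is the trade-off noted in item 2.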

Some further considerations might be:

  • It was mentioned that this optional field could be added to all objects (not just the nwb file object). If that is preferred, should it be an optional dataset for the Container data type in the hdmf-common-schema?
  • What should the name of this field be? I used was_generated_by based on the PROV naming, but something like software_versions might be easier to interpret.

Related pynwb changes here: NeurodataWithoutBorders/pynwb#1924
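
For Python-based tools, the (name, version) pairs described in this proposal could be gathered at write time from the running environment. The sketch below uses only the standard library (`importlib.metadata` and `platform`); the function name `collect_versions` and the choice to skip packages that are not installed are assumptions for illustration, not what pynwb does.

```python
import platform
from importlib import metadata
from typing import Iterable, List, Tuple


def collect_versions(packages: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (name, version) pairs for the interpreter and installed packages."""
    pairs = [("Python", platform.python_version())]
    for name in packages:
        try:
            pairs.append((name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            # Skip packages that are not installed in this environment.
            continue
    return pairs


# The package list is an example; pass whichever packages produced the data.
print(collect_versions(["pynwb", "hdmf", "numpy"]))
```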

Checklist

For all schema changes:

  • Add release notes for the PR to docs/format/source/format_release_notes.rst.
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.
  • Make sure that hdmf-common-schema points to the latest release and not the latest commit on the main branch.

If this is the first schema change after a schema release (i.e., the version string in core/nwb.namespace.yaml does not
end in "-alpha"), then:

  • Update the version string in core/nwb.namespace.yaml and core/nwb.file.yaml to the next major/minor/patch
    version with the suffix "-alpha". For example, if the current version is 2.4.0 and this is a minor change, then the
    new version string should be "2.5.0-alpha".
  • Update the value of the version variable in docs/format/source/conf.py to the next version without the
    suffix "-alpha", e.g., "2.5.0".
  • Update the value of the release variable in docs/format/source/conf.py to the next version with the suffix
    "-alpha", e.g., "2.5.0-alpha".
  • Add a new section in the release notes docs/format/source/format_release_notes.rst for the new version
    with the date "Upcoming" in parentheses.
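
The version-string rules in this checklist can be sketched as a small helper. The function names are hypothetical, and resetting the lower components to zero on a major or minor bump follows the usual semantic-versioning convention, which the checklist does not state explicitly. The input is assumed to be a released version such as "2.4.0", without the "-alpha" suffix.

```python
from typing import Tuple


def next_alpha_version(current: str, change: str) -> str:
    """Return the next schema version string with the "-alpha" suffix."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "major":
        major, minor, patch = major + 1, 0, 0
    elif change == "minor":
        minor, patch = minor + 1, 0
    elif change == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown change type: {change!r}")
    return f"{major}.{minor}.{patch}-alpha"


def docs_values(alpha_version: str) -> Tuple[str, str]:
    """Return the (version, release) pair for docs/format/source/conf.py."""
    suffix = "-alpha"
    if not alpha_version.endswith(suffix):
        raise ValueError("expected a version string ending in '-alpha'")
    return (alpha_version[: -len(suffix)], alpha_version)


print(next_alpha_version("2.4.0", "minor"))  # 2.5.0-alpha
print(docs_values("2.5.0-alpha"))  # ('2.5.0', '2.5.0-alpha')
```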

@t-b
Contributor

t-b commented Jun 27, 2024

Sounds like a good idea.

In MIES we have been adding our own version info for ages, à la

[image: MIES version info]

but having a builtin definition for that is much preferred.

@stephprince
Contributor Author

@t-b great, if I'm correctly interpreting the image you shared, I think all of the MIES version info could be mapped to the proposed (name, version) builtin definition?

e.g. something like:

[('Igor Pro 64bit', '9.0.6.1.56565'),
 ('MIES', 'Release_2.7_20230809-747-g005144'),
 ('Labnotebook', '23'),
 ('HDF5', '1.10.7'),
 ('Sweep Epoch', '9')]
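
As a rough sketch of how a reader might consume such a list, the pairs can be turned into a name-to-version mapping for lookups. The helper `to_mapping` and its rejection of duplicate names are assumptions for illustration; the proposal does not say whether names must be unique.

```python
from typing import Dict, Iterable, Tuple

# The (name, version) pairs from the example above.
software = [
    ("Igor Pro 64bit", "9.0.6.1.56565"),
    ("MIES", "Release_2.7_20230809-747-g005144"),
    ("Labnotebook", "23"),
    ("HDF5", "1.10.7"),
    ("Sweep Epoch", "9"),
]


def to_mapping(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a name -> version mapping, rejecting duplicate names."""
    mapping: Dict[str, str] = {}
    for name, version in pairs:
        if name in mapping:
            raise ValueError(f"duplicate software name: {name!r}")
        mapping[name] = version
    return mapping


versions = to_mapping(software)
print(versions["HDF5"])  # 1.10.7
```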

@t-b
Contributor

t-b commented Jun 27, 2024

> I think all of the MIES version info could be mapped to the proposed (name, version) builtin definition?

@stephprince Yes exactly.

@stephprince stephprince marked this pull request as ready for review September 19, 2024 17:02
@stephprince stephprince requested a review from rly September 19, 2024 17:08
@stephprince stephprince merged commit 67bcded into dev Sep 19, 2024
5 checks passed
@stephprince stephprince deleted the provenance-tracking branch September 19, 2024 21:45
Development

Successfully merging this pull request may close these issues.

An attribute (or dataset?) list software/library which produced that file/dataset etc
3 participants