Use DVC for tracking data and pipelines #126

Draft
wants to merge 1 commit into
base: main
Conversation

evetion (Member) commented Aug 13, 2024

See https://dvc.org/doc and https://www.gitops.tech/

In short, this is like git-lfs on steroids. DVC can define data pipelines and registers the required inputs and outputs of each stage in a storage platform (in our case dav). This should let us reproduce every build we publish, because the hash of the required data is stored next to the code in git.

flowchart TD
    id["Input datasets"] -- generate --> id2[Ribasim models] -- test --> id3[Test outputs]
    id2 -- stitch --> id4[Combined Model] -- test --> id5[Test output]
    id5 -- on git tag --> id6[Release model]
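
To make the "hash next to the code" idea concrete, here is a minimal Python sketch. It is not DVC's implementation: the pointer file name and JSON layout are hypothetical, chosen only for illustration. The point is that the small pointer file can be committed to git while the data itself lives in remote storage.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small pointer file next to the data (hypothetical format).

    Only this pointer would be committed to git; the data file itself
    would be pushed to the storage platform.
    """
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    record = {
        "path": data_path.name,
        "md5": file_md5(data_path),
        "size": data_path.stat().st_size,
    }
    pointer.write_text(json.dumps(record, indent=2))
    return pointer


def is_unchanged(data_path: Path, pointer: Path) -> bool:
    """Check whether the data still matches the hash recorded in the pointer."""
    record = json.loads(pointer.read_text())
    return record["md5"] == file_md5(data_path)
```

Checking out an old commit then gives the pointer for exactly the data that build used, which is what makes a published build reproducible.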

In the long term, DVC would replace the cloud.py codebase. This requires fully automating our code, so we're probably not there yet.

Huite commented Aug 14, 2024

We've mostly stuck with Snakemake for groundwater modeling: https://snakemake.readthedocs.io/en/stable/
We use Snakemake to manage workflows and DVC for data versioning. We're reasonably happy with Snakemake as a workflow manager.

I remember reading about DVC pipelines many years ago, when I was looking into data versioning and workflows. I think my conclusion then was that Snakemake provides a lot of features that we care about. But I'm not sure DVC had a DAG backing the pipeline at that point, which is what enables "incremental computation". It would be worthwhile to compare the two.
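
As a rough illustration of what a DAG-backed pipeline with incremental computation means, here is a small Python sketch. It reflects neither Snakemake's nor DVC's actual implementation. The stage names are borrowed from the flowchart in the PR description; the `STAGES` mapping, the `code_versions` strings, and the in-memory `cache` are assumptions made for the example.

```python
import hashlib
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage graph: stage name -> set of stages it depends on.
STAGES = {
    "generate": set(),
    "test_models": {"generate"},
    "stitch": {"generate"},
    "test_combined": {"stitch"},
}


def fingerprint(stage, code_versions, upstream):
    """Hash a stage's own version together with its upstream fingerprints."""
    h = hashlib.sha256(code_versions[stage].encode())
    for dep in sorted(STAGES[stage]):
        h.update(upstream[dep].encode())
    return h.hexdigest()


def plan(code_versions, cache):
    """Return the stages that need to rerun, in dependency order.

    A stage reruns when its fingerprint differs from the cached one,
    which happens if the stage itself or anything upstream changed.
    The cache is updated in place, as if the stages had been run.
    """
    fingerprints = {}
    to_run = []
    for stage in TopologicalSorter(STAGES).static_order():
        fingerprints[stage] = fingerprint(stage, code_versions, fingerprints)
        if cache.get(stage) != fingerprints[stage]:
            to_run.append(stage)
    cache.update(fingerprints)
    return to_run
```

With an empty cache every stage is planned; calling `plan` again without changes plans nothing; changing only the `stitch` version plans `stitch` and `test_combined` while leaving `generate` and `test_models` alone.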

evetion (Member, Author) commented Aug 22, 2024

We'll try to add the finished hoofdwaternetwerk (main water network) to the DVC pipeline.
