
duplicate data in directory structure #16

Open
rgreminger opened this issue Mar 3, 2020 · 2 comments

@rgreminger

The proposed directory structure stores the same data in multiple locations, since it gets duplicated into (possibly multiple) input folders. The input folders add clarity to the workflow, but an approach like this could use up a lot of disk space very quickly (unless symbolic links are used, though I doubt those work reliably across different platforms).
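For illustration, something like the following sketch (the folder names are placeholders, not the actual template) would try a symlink first and fall back to a plain copy on platforms where symlinks are not available, which is roughly the trade-off I have in mind:

```r
# Sketch: populate an input folder via a symlink where possible,
# falling back to a copy (i.e., duplication) otherwise.
link_or_copy <- function(from, to) {
  dir.create(dirname(to), recursive = TRUE, showWarnings = FALSE)
  ok <- suppressWarnings(file.symlink(from, to))
  if (!isTRUE(ok)) {
    file.copy(from, to, overwrite = TRUE)  # fallback: duplicate the file
  }
}

# placeholder paths for a prep -> analysis hand-off
link_or_copy("data-preparation/output/dataset.csv",
             "analysis/input/dataset.csv")
```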

@hannesdatta
Collaborator

The key is to keep different pipeline stages portable, i.e., you can work on the analysis while I have prepped the dataset. I know that for the main project you do end up with a lot of duplicate files. I'm kind of fine with that because disk space is cheap, but if you can find a better solution, let me know.

Another issue: for this minimal example, we could host a zip with the raw data on TilburgScienceHub, as we strictly want to avoid teaching that you can store your data on GitHub. Makes sense? Downloading the data via an R script is platform-independent...
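Something along these lines (the URL is just a placeholder, not where the zip would actually live) runs the same way on any OS:

```r
# Sketch: grab the raw-data zip and unpack it into data/raw.
url  <- "https://example.com/raw-data.zip"   # placeholder; would point to the zip on TilburgScienceHub
dest <- file.path(tempdir(), "raw-data.zip")

download.file(url, destfile = dest, mode = "wb")  # "wb" so Windows doesn't mangle the zip
unzip(dest, exdir = "data/raw")
```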

@rgreminger
Author

Portability is definitely a good point. I'll try to implement this in the example sometime soon, but one thing is a bit unclear to me from the site (though I might just have missed it): what is the best approach to keep the input folders up to date with upstream changes? Should this be done by the upstream stage, or by the downstream one?
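To make the question concrete: one option would be for the downstream stage to refresh its input folder from the upstream output at the start of its script, roughly like this (paths are placeholders, and I'm not sure this is the intended pattern):

```r
# Sketch: downstream stage pulls the latest upstream output into its own input/.
upstream_out <- "../data-preparation/output"   # placeholder path
local_in     <- "input"                        # placeholder path

dir.create(local_in, showWarnings = FALSE)
file.copy(list.files(upstream_out, full.names = TRUE),
          local_in, overwrite = TRUE)
```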

Good idea regarding the raw data. I'll try adding the zip to the page through a PR, and will update the example.
