[reproducibility] Automate downloading of resources #10

Open
cthoyt opened this issue Feb 8, 2021 · 0 comments

cthoyt commented Feb 8, 2021

Reproduction currently includes a step where users must manually download data and place it at a location relative to the code itself. Manual steps like this hurt reproducibility, so once the code is packaged as suggested in #9, it would be good to automate downloading these resources. pystow is a tool for just this (disclaimer: I wrote it, precisely to enable this kind of thing in a simple, approachable way that works well for scientists).

Example:

import pystow

url = 'https://drive.google.com/file/d/1D-iHmdRncTOImh68B54mEHkUvo5CHJVk/view'
module_name = 'pidginv3'
path = pystow.ensure(module_name, url=url, name='no_ortho.zip')
# path is now ~/.data/pidginv3/no_ortho.zip

Now you can use the path to open the file, etc., without having to think about how the user gets it or where it's stored. This can be called anywhere in the code: it downloads the file the first time it's needed and reuses the local copy afterwards. However, I'm not sure how well this works with Google Drive links, and have therefore suggested moving the resources to Zenodo/Figshare/equivalent in #11.
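To make the download-once behaviour concrete, here's a rough standard-library sketch of the idea. It is not pystow's actual implementation: `ensure_cached` and its `base` parameter are hypothetical names made up for illustration, and the `~/.data/<module>/<name>` layout is simply copied from the example above.

```python
from pathlib import Path
from typing import Optional
from urllib.request import urlretrieve


def ensure_cached(module_name: str, url: str, name: str, base: Optional[Path] = None) -> Path:
    """Return the local path for `name`, downloading from `url` only if it is missing.

    Hypothetical helper for illustration; not part of pystow.
    """
    # Assumption: mirror the ~/.data/<module>/<name> layout from the example above.
    base = Path.home() / ".data" if base is None else Path(base)
    directory = base / module_name
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    if not path.exists():
        # Download to a temporary name first so an interrupted download
        # is never mistaken for a complete cached file.
        partial = path.with_name(path.name + ".part")
        urlretrieve(url, partial)
        partial.replace(path)
    return path
```

A plain HTTP fetch like this would most likely store an HTML preview page if given a Google Drive `/view` link, which is one more reason to host the resources somewhere with direct download links, as suggested in #11.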

cthoyt changed the title from "Automate downloading of resources" to "[reproducibility] Automate downloading of resources" on Feb 8, 2021