Keep track of the versions hosted locally #3

Open
rlafuente opened this issue Aug 18, 2014 · 1 comment

@rlafuente
Contributor

The files at _output are re-generated at every run. While there are checks to see if the Git repository of the data package has changed, currently we have no way to know if the files at _output are stale or not.

This is a problem for big datasets, where copying the CSV files to the download dir takes some time.

The solution would be a cache file that registers the last commit from which a data package was generated and the checksum of each CSV data file, so we can determine whether the files are identical (and overwrite them if they are not).

@rlafuente
Contributor Author

The cache file could be simple JSON mapping file name -> md5. (The last commit is not ideal, since a file may be dirty, i.e. have uncommitted changes.)
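
A rough sketch of how that could look, not tied to the current code base: the function names (`md5sum`, `sync_csv_files`), the cache file location, and the flat file name -> md5 layout are all assumptions made for illustration. It copies a CSV file only when its checksum differs from the cached one or when the copy in the output dir is missing.

```python
import hashlib
import json
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sync_csv_files(source_dir, output_dir, cache_path):
    """Copy only the CSV files whose checksum differs from the cached one.

    Hypothetical helper: returns the names of the files that were copied.
    """
    source_dir = Path(source_dir)
    output_dir = Path(output_dir)
    cache_path = Path(cache_path)
    output_dir.mkdir(parents=True, exist_ok=True)

    # The cache is a plain JSON object: {"file name": "md5 hex digest"}.
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}

    copied = []
    for src in sorted(source_dir.glob("*.csv")):
        checksum = md5sum(src)
        dest = output_dir / src.name
        # Re-copy when the content changed or the output file is missing.
        if cache.get(src.name) != checksum or not dest.exists():
            shutil.copy2(src, dest)
            copied.append(src.name)
        cache[src.name] = checksum

    cache_path.write_text(json.dumps(cache, indent=2, sort_keys=True))
    return copied
```

Keying the cache by bare file name would collide if two data packages ship a CSV with the same name, so a real version would probably need one cache per package or a package-qualified key. Note also that the checksum still requires reading every source file, so the saving is in the copy step only.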

@rlafuente rlafuente self-assigned this Oct 9, 2014