Keep track of the versions hosted locally #3

rlafuente · 2014-08-18T17:03:43Z

The files in _output are regenerated on every run. While there are checks to see whether the data package's Git repository has changed, we currently have no way to know whether the files in _output are stale.

This is a problem for big datasets, where copying the CSV files to the download dir takes some time.

The solution would be a cache file that registers the ~~last commit from which a data package was generated~~ the checksum of each CSV data file, to determine whether the files are identical (and, if not, overwrite them).
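A minimal sketch of the checksum comparison, assuming Python and MD5 (the hash suggested later in this thread). The names `file_checksum` and `is_stale` are hypothetical and not part of any existing tool. Reading the file in chunks keeps memory use flat even for big CSV files.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks so that
    big CSVs are never loaded into memory all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_stale(src: Path, dest: Path, cached_checksum: Optional[str]) -> bool:
    """Hypothetical helper: the copy at dest is stale if it is missing, if no
    checksum was recorded, or if the source's current checksum differs from
    the one recorded when it was last copied."""
    if not dest.exists() or cached_checksum is None:
        return True
    return file_checksum(src) != cached_checksum
```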

rlafuente · 2014-10-09T23:08:32Z

The cache file could be a simple JSON object mapping file name -> MD5 checksum. (The last commit is not ideal, since the file may be dirty, i.e. have uncommitted changes.)
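A minimal sketch of such a JSON cache in Python, assuming the cache lives in the output directory as a hypothetical `.checksums.json` file and that the CSVs sit flat in a single data directory. `sync_csv_files` and the other names are illustrative only, not the project's actual code.

```python
import hashlib
import json
import shutil
from pathlib import Path

CACHE_NAME = ".checksums.json"  # hypothetical cache file name


def md5_of(path: Path) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(output_dir: Path) -> dict:
    """Load the file name -> MD5 mapping, or start empty if none exists."""
    cache_file = output_dir / CACHE_NAME
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    return {}


def sync_csv_files(data_dir: Path, output_dir: Path) -> list:
    """Copy each CSV from data_dir to output_dir only if its MD5 differs from
    the cached one or the copy is missing; return the names copied."""
    output_dir.mkdir(parents=True, exist_ok=True)
    cache = load_cache(output_dir)
    copied = []
    for src in sorted(data_dir.glob("*.csv")):
        checksum = md5_of(src)
        dest = output_dir / src.name
        if cache.get(src.name) != checksum or not dest.exists():
            shutil.copy2(src, dest)
            copied.append(src.name)
        cache[src.name] = checksum
    # Persist the updated mapping for the next run.
    (output_dir / CACHE_NAME).write_text(json.dumps(cache, indent=2, sort_keys=True))
    return copied
```

On a second run with unchanged data, `sync_csv_files(data_dir, Path("_output"))` copies nothing and returns an empty list; only files whose checksum changed (or whose copy was deleted) are rewritten.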

rlafuente self-assigned this Oct 9, 2014
