Datasets use atomic write when persisting to disk #307
Partially fixes PolicyEngine/policyengine-api#1954
Prior to this change, dataset.download() used a normal file write to persist downloaded data to disk.
This meant another process or thread could check for the file and attempt to read it before the full content was written.
This change writes the data to a temporary file and then renames it over the target, so the file is updated atomically.
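A minimal sketch of the temporary-file-plus-rename pattern, assuming a POSIX filesystem. The helper name atomic_write_bytes is hypothetical and is not the actual dataset.download() implementation. The temporary file is created in the same directory as the target because os.replace is only atomic when source and destination are on the same filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(path, data: bytes) -> None:
    """Hypothetical helper: write `data` so readers see the old file or the new one, never a partial file."""
    path = Path(path)
    # Create the temporary file next to the target so the final rename
    # does not cross filesystems (os.replace is only atomic within one).
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=f".{path.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the bytes to disk before the file becomes visible under its final name.
            os.fsync(tmp.fileno())
        # Swap the finished file into place in a single rename.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partially written temporary file, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A caller would download the full response body first and then call, for example, atomic_write_bytes(Path("some_dataset.h5"), content). Note that tempfile.mkstemp creates the file with owner-only permissions, so a real implementation may want to adjust the mode before the rename.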
If a process is already reading a file that the new version replaces, the previous file node is unlinked rather than overwritten, so the in-progress read still sees the complete old content.
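The following sketch illustrates this behaviour, assuming POSIX rename semantics. On Windows, replacing a file that is open for reading typically fails instead. The function and file names are hypothetical. A handle opened before the rename keeps reading the old inode, while a fresh open sees the new file.

```python
import os
import tempfile
from pathlib import Path


def demo_reader_survives_replace(directory: str) -> tuple:
    """Hypothetical demo: an open reader keeps the old content after os.replace (POSIX only)."""
    target = Path(directory) / "dataset.h5"
    target.write_bytes(b"old version")
    with open(target, "rb") as reader:
        # Write the new version to a separate file, then rename it over the target.
        new_file = Path(directory) / "dataset.h5.tmp"
        new_file.write_bytes(b"new version")
        os.replace(new_file, target)
        # The already-open handle still refers to the old, now-unlinked inode.
        old_seen = reader.read()
    # A new open resolves the name to the replacement file.
    new_seen = target.read_bytes()
    return old_seen, new_seen
```

The old inode's data is freed only once the last open descriptor referring to it is closed, which is why the in-progress read is unaffected.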
This will allow us to back out the optimistic pre-loading of dataset data before it is needed, which causes 404 errors when running tests on machines without the permissions required to download UK data.
make format && make documentation has been run.
New variable
What's changed
See the description in the commit above.
Bug fix
What this fixes and how it's fixed
See the description in the commit above.