improve data download feature #818
Hey, @smcalilly, great refactor! Once you respond to my super minor comments, let's get this up on staging for review.
@hancush I made that Python change, thanks for that. When I added the individual sources file to the pipeline, it was sometimes missing a column, which caused errors. Here is how I handled it in sfm-cms/data/processors/blank_columns.py (lines 17 to 32 in 5a944f6). It's not the cleanest, but it works.
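The linked lines are not reproduced here. The following is a minimal Python sketch of the general technique, not the contents of blank_columns.py. It copies a CSV while emptying the values of selected columns. If a requested column is missing from the header, it skips that column instead of raising. It also always writes the header row, even for files with no data rows. The function name `blank_columns` and the column names in the demo (`notes`, `missing_column`) are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO


def blank_columns(src: TextIO, dst: TextIO, columns: Iterable[str]) -> None:
    """Copy CSV from src to dst, emptying the values of the given columns.

    Requested columns missing from the input header are skipped rather than
    raising, and the header row is written even if there are no data rows.
    """
    reader = csv.DictReader(src)
    fieldnames = reader.fieldnames
    if not fieldnames:
        # Completely empty input: nothing to copy, not even a header.
        return
    # Only blank the requested columns that this particular file actually has.
    present = [column for column in columns if column in fieldnames]

    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()  # keeps the header for header-only files
    for row in reader:
        for column in present:
            row[column] = ""
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical columns: "notes" exists, "missing_column" does not.
    demo_in = io.StringIO("id,name,notes\n1,Alice,secret\n")
    demo_out = io.StringIO()
    blank_columns(demo_in, demo_out, ["notes", "missing_column"])
    print(demo_out.getvalue(), end="")
```

The function takes file objects, so it can also be called with `sys.stdin` and `sys.stdout` to sit in a shell pipeline between other CSV steps.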
Overview
For #816 and #807
Includes:
Notes
- The data archive is a flat directory. The recipes write to `<country_name>_<entity_type>.csv` instead of creating a directory for each country, because I couldn't figure out how to cleanly create a new directory from the dynamic country name values. Let me know if you see a way to do this.
- I had to write a custom processing script to blank out the values of two specific columns, based on a discussion in our Slack #computer-programming channel. For context: csvcut can only remove columns, not empty them. csvsql can change column values, but on a file with no data rows it also drops the header row, leaving a completely blank file, which we don't want.
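On the question of per-country directories, one possible approach is to build the path from the country value at write time. The directory can then be created with `Path.mkdir(parents=True, exist_ok=True)`, which is safe to call repeatedly. The sketch below assumes a small Python step does the writing instead of the current recipes. The function names, the `country` column, and the sample values are hypothetical.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List


def country_csv_path(root: Path, country: str, entity_type: str) -> Path:
    """Return root/<country>/<entity_type>.csv, creating the country directory."""
    country_dir = root / country
    # parents=True and exist_ok=True make this safe to call repeatedly.
    country_dir.mkdir(parents=True, exist_ok=True)
    return country_dir / f"{entity_type}.csv"


def write_by_country(
    root: Path,
    entity_type: str,
    rows: Iterable[Dict[str, str]],
    country_field: str = "country",  # hypothetical column name
) -> List[Path]:
    """Split rows by country and write one CSV per country directory."""
    materialized = list(rows)
    if not materialized:
        return []
    fieldnames = list(materialized[0].keys())

    # Group rows by their country value.
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in materialized:
        groups[row[country_field]].append(row)

    written: List[Path] = []
    for country, country_rows in sorted(groups.items()):
        path = country_csv_path(root, country, entity_type)
        with path.open("w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames, lineterminator="\n")
            writer.writeheader()
            writer.writerows(country_rows)
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical sample rows; the real archive's columns will differ.
    sample = [
        {"country": "nigeria", "name": "A"},
        {"country": "mexico", "name": "B"},
        {"country": "nigeria", "name": "C"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for path in write_by_country(root, "person", sample):
            print(path.relative_to(root).as_posix())
```

Real country names may contain spaces or other characters that need normalizing before they are used as directory names. If the recipes stay in the Makefile, the usual equivalent is running `mkdir -p $(@D)` in the recipe before writing the target. That only works if the per-country targets can be enumerated.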
Testing Instructions
1. Find the `wwic-data-archive-staging` token in LastPass.
2. Run `cp .env.s3.example .env` and add your AWS access tokens.
3. Run `docker-compose --env-file .env.s3 run --rm app make data_archive`.
4. Go to `localhost:8000/en/download/` to download the data.