
improve data download feature #818

Merged: 9 commits into master on Nov 3, 2022

Conversation

@smcalilly commented Oct 26, 2022

Overview

For #816 and #807

Includes:

  • additional make recipes for filtering data, based on the list in the #816 comments
  • updated the download template with the text from the #816 comments
  • removed the old download buttons
  • added nav link to download page
  • blank README for the demo (per Tom's instruction in an email)
  • new import docket with country name
  • updated the download data import to use country name

Notes

The data archive is a flat directory. The recipes write to <country_name>_<entity_type>.csv instead of creating a new directory with the country name, because I couldn't figure out how to cleanly create a directory from the dynamic country name values. Let me know if you see a way to do this.

I had to create a custom processing script to blank out the values of two specific columns, based on a discussion in our Slack #computer-programming channel. For context: csvgrep can only remove columns entirely. csvsql can change column values, but it completely removes the header row from a file that has no data rows, leaving a blank file, which isn't something we want.
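For reference, a minimal sketch of what such a processor can look like, in the spirit of data/processors/blank_columns.py (an illustration only, not the actual script: the stdin/stdout plumbing and the hard-coded column names are assumptions):

    # Sketch: blank the values of selected columns in a CSV stream while
    # always emitting the header row, even for zero-row inputs (the case
    # where csvsql would drop the header and produce an empty file).
    import csv
    import sys

    # Placeholder column names; the real script derives these per entity.
    COLUMNS_TO_BLANK = ['person:comments:admin', 'person:owner:admin']

    reader = csv.DictReader(sys.stdin)

    if reader.fieldnames:  # tolerate a completely empty input
        writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
        writer.writeheader()  # header-only files keep their header

        for row in reader:
            for column in COLUMNS_TO_BLANK:
                if column in row:
                    row[column] = ''
            writer.writerow(row)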

Testing Instructions

  • get the AWS S3 credentials from the wwic-data-archive-staging entry in LastPass
  • cp .env.s3.example .env.s3 and add your AWS access tokens
  • for testing purposes, you might need to remove the Myanmar row from the import docket, since that spreadsheet's naming convention will break the import (this issue has been flagged in an email and will be fixed)
  • to create an archive locally, run docker-compose --env-file .env.s3 run --rm app make data_archive
  • visit localhost:8000/en/download/ to download the data

@smcalilly changed the title from "[wip] improve data download feature" to "improve data download feature" on Oct 31, 2022
@smcalilly marked this pull request as ready for review on Oct 31, 2022
@hancush left a comment
Hey, @smcalilly, great refactor! Once you respond to my super minor comments, let's get this up on staging for review.

Review threads (resolved):
  • data/processors/blank_columns.py (3 comments, outdated)
  • docket.mk (2 comments)
@smcalilly commented Nov 3, 2022

@hancush I made that Python change, thanks.

When I added the individual sources file to the pipeline, it was sometimes missing a column, which caused the blank_columns.py script to error out. So I added some explicit checks for that:

for row in reader:
    # blank the admin-only comments column, if present
    comment_key = f'{args.entity}:comments:admin'
    if row.get(comment_key):
        row[comment_key] = ''

    # blank the admin-only owner column, if present
    owner_key = f'{args.entity}:owner:admin'
    if row.get(owner_key):
        row[owner_key] = ''

Not the cleanest but it works.
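For what it's worth, since row.get already tolerates a missing column, the two checks could also collapse into a single loop over the column suffixes; a hypothetical tightening with the same behavior, assuming the same reader and args.entity as above:

    # Equivalent, more compact form of the explicit checks above.
    for row in reader:
        for suffix in ('comments', 'owner'):
            key = f'{args.entity}:{suffix}:admin'
            if row.get(key):
                row[key] = ''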

@smcalilly merged commit b05e83a into master on Nov 3, 2022