Vaccination data is updated on a daily basis. For some countries, the update is done by means of an automated process, while others require some manual work. To keep track of the currently automated processes, check this table.
This directory contains the following files:
File name | Description |
---|---|
output/ | Temporary automated imports are placed here. |
src/vax/ | Scripts to automate country data imports. |
config.yaml | Data pipeline configuration. |
us_states/input/ | Data for US-state vaccination data updates. |
MANIFEST.in, setup.py, requirements.txt, requirements-flake.txt | Library development related files. |
automation_state.csv | Lists if country process is automated (TRUE) or not (FALSE). |
source_table.html | HTML table with country source URLs. Shown at OWID's website. |
vax_update.sh.template | Template to push vaccination update changes. |
*Only the most relevant files are listed.
Follow the steps below to correctly set up your virtual environment. Make sure you have a working environment with Python 3 installed. We use Python >= 3.7.
You can check this with:
python --version
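If you prefer to check the requirement from within Python, a minimal sketch could look like the following. The helper name `is_supported` is hypothetical and not part of the library; it only compares the major and minor version against the 3.7 minimum stated above.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 7)


def is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if (major, minor) is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


# Report on the interpreter that is running this snippet.
current = ".".join(str(part) for part in sys.version_info[:3])
status = "OK" if is_supported() else "too old, 3.7 or newer is needed"
print(f"Python {current}: {status}")
```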
In your environment (shell), install the library in development mode. That is, run:
$ pip install -e .
In addition to the owid-covid19-vaccination-dev package, this will install the command tool cowid-vax, which is required to run the data pipeline.
To correctly run the data pipeline, make sure to have a valid configuration file. We currently use config.yaml. This file contains data used throughout the different pipeline stages.
global:
  project_dir: !ENV ${OWID_COVID_PROJECT_DIR}
  credentials: !ENV ${OWID_COVID_VAX_CREDENTIALS_FILE}
pipeline:
  get-data:
    parallel: True
    countries:
    njobs: -2
    skip_countries:
      - Colombia
  process-data:
    skip_complete:
    skip_monotonic_check:
      Northern Ireland:
        - date: 2021-04-29
          metrics: people_vaccinated
    skip_anomaly_check:
      Bahrain:
        - date: 2021-03-06
          metrics: total_vaccinations
      Bolivia:
        - date: 2021-03-06
          metrics: people_vaccinated
      Brazil:
        - date: 2021-01-21
          metrics:
            - total_vaccinations
            - people_vaccinated
  generate-dataset:
Our current configuration requires the environment variables ${OWID_COVID_PROJECT_DIR} and ${OWID_COVID_VAX_CREDENTIALS_FILE} to be set beforehand. We recommend defining them in ~/.bashrc or ~/.bash_profile. For instance:
export OWID_COVID_PROJECT_DIR=/Users/username/projects/covid-19-data
export OWID_COVID_VAX_CREDENTIALS_FILE=${OWID_COVID_PROJECT_DIR}/scripts/scripts/vaccinations/vax_dataset_config.json
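To see early whether both variables are visible to Python, a small standard-library sketch such as the one below can help. It only illustrates how `${NAME}` placeholders like the ones in config.yaml could be expanded; it is not the loader used by the pipeline, and the function name `resolve_env` is hypothetical.

```python
import os
import re

# Matches ${NAME} placeholders such as ${OWID_COVID_PROJECT_DIR}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def resolve_env(value, environ=None):
    """Expand every ${NAME} in `value`; raise KeyError listing unset names."""
    environ = os.environ if environ is None else environ
    missing = [name for name in _PLACEHOLDER.findall(value) if name not in environ]
    if missing:
        raise KeyError("unset environment variables: " + ", ".join(missing))
    return _PLACEHOLDER.sub(lambda match: environ[match.group(1)], value)


# Check the two variables required by the configuration above.
for name in ("OWID_COVID_PROJECT_DIR", "OWID_COVID_VAX_CREDENTIALS_FILE"):
    try:
        print(name, "->", resolve_env("${" + name + "}"))
    except KeyError as error:
        print("Missing:", error)
```

Passing a plain dictionary as `environ` makes the helper easy to try without touching your shell configuration.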
The environment variable ${OWID_COVID_VAX_CREDENTIALS_FILE} corresponds to the path to the credentials file. This file is internal. Google-related fields require a valid OAuth JSON credentials file (see gsheets documentation). The file should have the following structure:
{
  "greece_api_token": "[GREECE_API_TOKEN]",
  "owid_cloud_table_post": "[OWID_CLOUD_TABLE_POST]",
  "google_credentials": "[CREDENTIALS_JSON_PATH]",
  "google_spreadsheet_vax_id": "[SHEET_ID]",
  "twitter_consumer_key": "[TWITTER_CONSUMER_KEY]",
  "twitter_consumer_secret": "[TWITTER_CONSUMER_SECRET]"
}
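As a quick sanity check before running the pipeline, the structure above can be verified with a few lines of Python. This is a sketch under the assumption that all six fields must be present and non-empty, which the pipeline may not actually require for every step; `missing_credential_keys` is a hypothetical helper, not a library function.

```python
import json

# Field names taken from the structure shown above.
EXPECTED_KEYS = (
    "greece_api_token",
    "owid_cloud_table_post",
    "google_credentials",
    "google_spreadsheet_vax_id",
    "twitter_consumer_key",
    "twitter_consumer_secret",
)


def missing_credential_keys(raw_json):
    """Return the expected keys that are absent or empty in the JSON text."""
    data = json.loads(raw_json)
    return [key for key in EXPECTED_KEYS if not data.get(key)]


# Example with an incomplete file content (placeholder values only).
example = '{"greece_api_token": "[GREECE_API_TOKEN]", "google_credentials": ""}'
print("Missing fields:", missing_credential_keys(example))
```

To check a real file, read it first, e.g. `missing_credential_keys(open(path).read())`, where `path` is the value of ${OWID_COVID_VAX_CREDENTIALS_FILE}.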
We use flake8 to check the style of our code. The configuration lives in the file tox.ini. To check the style, simply run:
$ tox
Note: This requires tox to be installed ($ pip install tox).
To update the data, prior to running the code, make sure to correctly set up the development environment.
Check for new updates and manually add them in the internal spreadsheet:
- See this repo's pull requests and issues.
- Look for new data based on previously-used source URLs.
Once all manual processes have been finished, it is time to leverage the tool cowid-vax. The automation step is further broken into 4 sub-steps, which we explain below. While these can all be run at once, we recommend running them one by one. Prior to running these, make sure you are correctly using your configuration file.
Note: you can use vax_update.sh.template as an example of how to run the data pipeline automated step.
To correctly use the configuration in your config.yaml, you can:
- Set the environment variable ${OWID_COVID_VAX_CONFIG_FILE} to the file's path.
- Save the configuration under ~/.config/cowid/config.yaml and run cowid-vax as usual.
- Run $ cowid-vax --config config.yaml, explicitly specifying the path to the config file.
If none of the above is possible, use arguments passed via the command call, i.e. --parallel, --countries, etc.
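The three options above amount to a lookup order for the config file. The sketch below shows one plausible precedence (explicit `--config` value first, then the environment variable, then the default location); the exact order inside cowid-vax is an assumption here, and `resolve_config_path` is a hypothetical name.

```python
import os
from pathlib import Path

ENV_VAR = "OWID_COVID_VAX_CONFIG_FILE"
DEFAULT_PATH = Path("~/.config/cowid/config.yaml")


def resolve_config_path(cli_value=None, environ=None):
    """Pick the config path: CLI value, then env variable, then default."""
    environ = os.environ if environ is None else environ
    if cli_value:
        # Assumed to win over everything else, as with most CLI tools.
        return Path(cli_value).expanduser()
    if environ.get(ENV_VAR):
        return Path(environ[ENV_VAR]).expanduser()
    return DEFAULT_PATH.expanduser()


print("Config file that would be used:", resolve_config_path())
```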
For more details run: cowid-vax --help
usage: cowid-vax [-h] [-c COUNTRIES] [-p] [-j NJOBS] [-s] [--config CONFIG] [--credentials CREDENTIALS] [--checkr]
                 {get-data,process-data,generate-dataset,export,all}

Execute COVID-19 vaccination data collection pipeline.

positional arguments:
  {get-data,process-data,generate-dataset,export,all}
                        Choose a step: i) `get-data` will run automated scripts, 2) `process-data` will get csvs generated in
                        1 and collect all data from spreadsheet, 3) `generate-dataset` generate the output files, 4) `export`
                        to generate all final files, 5) `all` will run all steps sequentially.

optional arguments:
  -h, --help            show this help message and exit
  -c COUNTRIES, --countries COUNTRIES
                        Run for a specific country. For a list of countries use commas to separate them (only in mode get-
                        data)E.g.: peru, norway. Special keywords: 'all' to run all countries, 'incremental' to run
                        incrementalupdates, 'batch' to run batch updates. Defaults to all countries. (default: all)
  -p, --parallel        Execution done in parallel (only in mode get-data). (default: False)
  -j NJOBS, --njobs NJOBS
                        Number of jobs for parallel processing. Check Parallel class in joblib library for more info (only in
                        mode get-data). (default: -2)
  -s, --show-config     Display configuration parameters at the beginning of the execution. (default: False)
  --config CONFIG       Path to config file (YAML). Will look for file in path given by environment variable
                        `$OWID_COVID_VAX_CONFIG_FILE`. If not set, will default to ~/.config/cowid/config.yaml (default:
                        /Users/lucasrodes/repos/covid-19-data/scripts/scripts/vaccinations/config.yaml)
  --credentials CREDENTIALS
                        Path to credentials file (JSON). If a config file is being used, the value ther will be prioritized.
                        (default: vax_dataset_config.json)
  --checkr              Compare results from generate-dataset with results obtained with former generate_dataset.R script.It
                        requires that the R script is previously run (without removing temporary files vax & metadata)!
                        (default: False)
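The default of -2 for `-j/--njobs` follows joblib's convention for negative values: for `n_jobs` below -1, `n_cpus + 1 + n_jobs` workers are used, so -2 means "all CPUs but one". The sketch below mirrors that documented rule in plain Python for illustration; `effective_workers` is a hypothetical helper and not joblib's own code.

```python
import os


def effective_workers(n_jobs, n_cpus=None):
    """Number of workers implied by a joblib-style n_jobs value."""
    n_cpus = n_cpus or os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on (never below 1).
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


print("Workers used with njobs=-2 on this machine:", effective_workers(-2))
```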
Run:
$ cowid-vax get
This step runs the scripts for batch and incremental updates. It then generates individual country files and saves them in output.
Note: This step might crash for some countries, as the automation scripts might stop working, temporarily or permanently (e.g. due to changes in the source). Try to keep the scripts up to date.
Run:
$ cowid-vax process
Collect manually updated data from the spreadsheet and the data generated in the previous step. Process this data, and generate public country data in country_data, as well as temporary files vaccinations.preliminary.csv and metadata.preliminary.csv.
Run:
$ cowid-vax generate
Generate pipeline output files.
Run:
$ cowid-vax export
Final pipeline step. This updates a few more output files. It also opens OWID's vaccination website so that the table references can be updated (the HTML is automatically copied to the clipboard).
Once the automation is successfully executed, the following files and directories are updated:
File name | Description |
---|---|
vaccinations.csv | Main output with vaccination data of all countries. |
vaccinations.json | Same as vaccinations.csv but in JSON format. |
vaccinations-by-manufacturer.csv | Secondary output with vaccination by manufacturer for a select number of countries. |
country_data/ | Individual country CSV files. |
locations.csv | Country-level metadata. |
source_table.html | HTML table with country source URLs. Shown at OWID's website. |
automation_state.csv | Lists if country process is automated (TRUE) or not (FALSE). |
COVID-19 - Vaccinations.csv | Internal file for OWID grapher on vaccinations. |
COVID-19 - Vaccinations by manufacturer.csv | Internal file for OWID grapher on vaccinations by manufacturer. |
You can find more information about these files here.
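As an illustration of how the main output can be consumed, the sketch below extracts the latest reported value per country from CSV text. It assumes columns named `location` and `date` (ISO format) next to the `total_vaccinations` metric; check the header of your vaccinations.csv before relying on it. The function name and the sample rows are made up for the example.

```python
import csv
import io


def latest_totals(csv_text, metric="total_vaccinations"):
    """Map each location to (date, value) of its most recent non-empty metric."""
    latest = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row[metric]:
            continue  # skip rows where the metric was not reported
        location, day = row["location"], row["date"]
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        if location not in latest or day > latest[location][0]:
            latest[location] = (day, float(row[metric]))
    return latest


# Made-up rows, only to show the expected shape of the input.
sample = (
    "location,date,total_vaccinations\n"
    "Country A,2021-01-01,10\n"
    "Country A,2021-01-02,25\n"
    "Country B,2021-01-01,5\n"
)
print(latest_totals(sample))
```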
You can run several steps at once, e.g.
$ cowid-vax get process
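If you want to script the recommended one-by-one execution, a small wrapper along these lines may be convenient. It is a sketch, not part of the library: the step names are the ones used in the commands above (adjust them if `cowid-vax --help` on your installation lists different ones), and `build_commands` and `run_pipeline` are hypothetical helpers. By default nothing is executed; the commands are only printed.

```python
import subprocess

# Step names as used in the commands of this README.
STEPS = ("get", "process", "generate", "export")


def build_commands(steps=STEPS, config=None):
    """Return one cowid-vax command (as an argument list) per step."""
    commands = []
    for step in steps:
        command = ["cowid-vax", step]
        if config:
            command += ["--config", config]
        commands.append(command)
    return commands


def run_pipeline(steps=STEPS, config=None, dry_run=True):
    """Run the steps in order, stopping at the first one that fails."""
    for command in build_commands(steps, config):
        print("$", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, check=True)


run_pipeline(dry_run=True)
```

Stopping at the first failure keeps a broken `get` step from feeding stale files into the later steps.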
It is extremely useful to get some insights on which data we are tracking (and which we are not). This can be done with the tool cowid-vax-track. Find below some use cases.
Note: Use the option --to-csv to export results as CSV files (a default filename is used).
Which countries are missing?
Run:
$ cowid-vax-track countries-missing
Countries are given from most to least populated.
Which countries have been updated infrequently?
Get the list of countries sorted by least frequently updated. The update frequency is defined as the ratio between the number of days with an update and the number of days of observation (i.e. days since first update).
$ cowid-vax-track countries-least-updatedfreq
Countries are given from least to most frequently updated.
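The update frequency defined above can be worked through with a short example. The sketch below assumes that the observation period counts both the first update day and the reference day (an inclusive count, chosen here so that the ratio never exceeds 1); the tool may count days differently. `update_frequency` is a hypothetical name.

```python
from datetime import date


def update_frequency(update_dates, as_of):
    """Ratio of distinct days with an update to days observed since the first update."""
    days_with_update = set(update_dates)
    if not days_with_update:
        raise ValueError("at least one update date is required")
    first_update = min(days_with_update)
    if as_of < max(days_with_update):
        raise ValueError("as_of must not precede the latest update")
    # Assumption: inclusive count, so a country first updated today has 1 day observed.
    days_observed = (as_of - first_update).days + 1
    return len(days_with_update) / days_observed


# Three update days within a ten-day observation window.
updates = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 5)]
print(update_frequency(updates, as_of=date(2021, 1, 10)))
```

With several countries, sorting a dictionary of such ratios in ascending order gives the same "least to most frequently updated" ordering described above.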
Which countries haven't been updated for some time?
Get the list of countries and their last update by running:
$ cowid-vax-track countries-last-updated
Countries are given from least to most recently updated.
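The same ordering is easy to reproduce on your own data, for example to flag countries whose last update is older than a given number of days. The following is a sketch with made-up country names and dates; `stale_countries` and its `min_days` parameter are hypothetical and do not correspond to options of cowid-vax-track.

```python
from datetime import date


def stale_countries(last_updates, as_of, min_days=0):
    """List (country, days since last update), least recently updated first."""
    rows = [(country, (as_of - day).days) for country, day in last_updates.items()]
    rows = [row for row in rows if row[1] >= min_days]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up last-update dates, only to show the ordering.
last_updates = {
    "Country A": date(2021, 5, 1),
    "Country B": date(2021, 4, 1),
    "Country C": date(2021, 5, 9),
}
for country, days in stale_countries(last_updates, as_of=date(2021, 5, 10), min_days=7):
    print(f"{country}: last updated {days} days ago")
```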
Which countries have been updated few times?
Get the list of countries with the fewest updates (in absolute counts):
$ cowid-vax-track countries-least-updated
Countries are given from least to most frequently updated.
Which vaccines are missing?
Get the list of countries with missing vaccines:
$ cowid-vax-track vaccines-missing
Countries are given from the one with the fewest to the one with the most untracked vaccines.
We welcome contributions! Read more in CONTRIBUTE.
Kindly open an issue. If you have a technical proposal, feel free to open a pull request.
If you detect that an automation is no longer working, and the process seems like it can't be fixed at the moment:
- Set its state to automated = FALSE in the LOCATIONS tab of the internal spreadsheet.
- Add a new tab in the spreadsheet to manually input the country data. Make sure to include the historical data from the output file.
- Delete the automation script and automated CSV output to avoid confusion.