This folder contains all the district-level Covid-19 case data reported at a provincial level. Currently, the provincial governments do not report this data in a standard way.
This Readme is intended to guide the DSFSI research group in capturing data, so that all province leads follow a common standard.
If the data structures are broken, third-party stakeholders who consume the files will be affected.
Each province should strive to have only one CSV file that captures both district and subdistrict level data.
In the data file, columns are separated by commas. Three fields must always be included: `date`, `YYYYMMDD` and `source`. An example data table is shown below:
date | YYYYMMDD | ... | source |
---|---|---|---|
e.g. 25-06-2020 | 20200625 | ... | e.g. web address |
The `...` represents the province-specific district column list.
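The layout above can be checked mechanically. The following is a minimal sketch, not part of the standard itself: it assumes that `date` and `YYYYMMDD` are the first two columns, that `source` is the last column, and that `date` uses the `DD-MM-YYYY` form shown in the example row. The function name `check_data_file` and the sample values are hypothetical.

```python
import csv
import io
from datetime import datetime

REQUIRED_FIRST = ["date", "YYYYMMDD"]  # assumed to be the first two columns
REQUIRED_LAST = "source"               # assumed to be the last column


def check_data_file(text):
    """Return a list of problems found in a province data file given as CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        return ["file is empty"]

    problems = []
    if header[:2] != REQUIRED_FIRST:
        problems.append("first two columns must be date, YYYYMMDD")
    if header[-1] != REQUIRED_LAST:
        problems.append("last column must be source")
    if problems:
        return problems

    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        try:
            parsed = datetime.strptime(row[0], "%d-%m-%Y")
        except ValueError:
            problems.append(f"line {line_no}: date {row[0]!r} is not DD-MM-YYYY")
            continue
        # The YYYYMMDD column must describe the same day as the date column.
        if parsed.strftime("%Y%m%d") != row[1]:
            problems.append(f"line {line_no}: YYYYMMDD {row[1]!r} does not match date {row[0]!r}")
    return problems
```

Calling `check_data_file(open(path, newline="").read())` on a province file would return an empty list when the header and dates follow the layout, and a list of messages otherwise.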
All existing columns should be kept and must not be renamed. To change how a column is named for display, use the combined district keys, defined in the next section, instead. Any new data columns should be added at the end of the column list but before the `source` column.
All the province-specific columns need to be defined in `combined_district_keys.csv`.
The `combined_district_keys.csv` file combines all the province district keys in a single file.
The purpose of this single file is to:
- Be the single source of truth for every province's district column list
- Provide an easy way to convert coded column names to friendly display names
- Avoid adding multiple key files that do not follow the same standard
- Highlight inconsistencies in the existing column names, to improve future column structures
The key file is deliberately not normalised; normalising it would create multiple files and add to the complexity of the standard.
The combined district keys file has the following columns:
The province code, e.g. `EC`, which relates to all the province column names in the `covid19za_provincial_cumulative_timeline...` data files.
District Level | Description | Example |
---|---|---|
0 | Province level data | Total cases for Gauteng |
1 | District level data | Total cases for Johannesburg |
2 | Subdistrict level data | Total cases for Johannesburg A |
The coded column name used in the province data file, e.g. `alfred_nzo`.
Choose a data type defined in the table below:
Data Type | Description |
---|---|
Case | The column captures case data |
Recovery | The column captures recovery data |
Death | The column captures death data |
Active | The column captures active case data |
Hospital | The column captures hospitalisation data |
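The two lookup tables above (district levels 0 to 2 and the five data types) lend themselves to a simple record check. This is a sketch only: the field names `district_level` and `data_type` are hypothetical stand-ins for the real headers of the key file, which should be substituted before use.

```python
VALID_LEVELS = {0, 1, 2}
VALID_DATA_TYPES = {"Case", "Recovery", "Death", "Active", "Hospital"}


def check_key_record(record):
    """Return a list of problems for one key-file record given as a dict.

    The field names used here are assumptions, not the key file's real headers.
    """
    problems = []

    # Values read from CSV arrive as strings, so convert before comparing.
    try:
        level = int(record["district_level"])
    except (KeyError, TypeError, ValueError):
        level = None
    if level not in VALID_LEVELS:
        problems.append("district level must be 0, 1 or 2")

    if record.get("data_type") not in VALID_DATA_TYPES:
        problems.append("data type must be one of: " + ", ".join(sorted(VALID_DATA_TYPES)))

    return problems
```

Records loaded with `csv.DictReader` can be passed in one at a time; an empty list means the record uses only the values defined in the tables.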
A friendly display name that stakeholders can use to represent the data in tables, plots or maps, e.g. `Alfred Nzo`.
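As an illustration of how a stakeholder might use the coded and friendly names together, the sketch below maps a data file header to display names. The record field names `province`, `key` and `friendly_name` are hypothetical; the real key file headers may differ.

```python
def friendly_header(columns, key_records, province):
    """Replace coded column names with friendly names for one province.

    Columns with no matching key record (such as date or source) are kept as-is.
    The record field names are assumptions made for this sketch.
    """
    lookup = {
        record["key"]: record["friendly_name"]
        for record in key_records
        if record["province"] == province
    }
    return [lookup.get(column, column) for column in columns]
```

For example, with a single record mapping `alfred_nzo` to `Alfred Nzo` for province `EC`, the header `["date", "YYYYMMDD", "alfred_nzo", "source"]` becomes `["date", "YYYYMMDD", "Alfred Nzo", "source"]`. The data file itself is left untouched, which is what allows display names to change without renaming columns.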
This column contains category codes. `CAT_B` and `DISTRICT` data is defined in LM_2018.csv, by the Municipal Demarcation Board.
The demarcation names were used as a starting point for most of the provincial data files and provincial governments report according to these demarcations in some form.
A subdistrict is category B and has a `CAT_B` code. Use the `DISTRICT` code for districts and cities.
By providing the category code, any stakeholder can use the key file to match it to their systems.
This column contains `DISTRICT` demarcation codes and corresponds to level 1 district `Cat` codes. In this way, subdistricts are linked with their corresponding districts, e.g. Polokwane (`LIM354`) is a subdistrict of Capricorn (`DC35`).
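The subdistrict-to-district link described above can be followed with a small grouping step. This is a sketch under assumptions: the record field names `district_level`, `cat` and `district` are hypothetical, and only the Polokwane and Capricorn codes come from this Readme.

```python
from collections import defaultdict


def subdistricts_by_district(key_records):
    """Group subdistrict category codes under their parent district code.

    Only level 2 (subdistrict) records are grouped. The field names are
    assumptions made for this sketch.
    """
    groups = defaultdict(list)
    for record in key_records:
        if int(record["district_level"]) != 2:
            continue
        codes = groups[record["district"]]
        # A subdistrict may appear once per data type, so skip repeats.
        if record["cat"] not in codes:
            codes.append(record["cat"])
    return dict(groups)
```

Given a level 2 record with category code `LIM354` and district code `DC35`, the result contains `{"DC35": ["LIM354"]}`, which a stakeholder can then join to the level 1 record whose category code is `DC35`.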
The order of the column in the province's district column list. Some stakeholders rely on the order of the columns not to change.
The data file that the column comes from. Some provinces have multiple data files.
Complete this field if more information on the district is required that is not represented in any of the fields in the key file.
The lead for each province is responsible for applying changes to the data structure correctly, by following the standard in this Readme.
For new columns:
- Add a new record to the combined key file, at the end of the respective section for that province.
- Remember to include the order number in the combined key file.
- Add the column at the end of the district column list, just before the `source` column.
- Most importantly, be consistent with the column names for the province.
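The steps for new columns can be sketched in code as follows. The helper names `add_column` and `next_order` are hypothetical, as are the record field names `province` and `order`; the sketch also assumes that order numbers are integers and that `source` is the last column in the header.

```python
def add_column(header, new_column):
    """Return a new header with new_column placed just before source."""
    if not header or header[-1] != "source":
        raise ValueError("expected source to be the last column")
    if new_column in header:
        raise ValueError(f"column {new_column!r} already exists")
    # Existing columns keep their position; only the tail changes.
    return header[:-1] + [new_column, "source"]


def next_order(key_records, province):
    """Return the order number to record for a province's next new column."""
    orders = [
        int(record["order"])
        for record in key_records
        if record["province"] == province
    ]
    return max(orders, default=0) + 1
```

Because existing columns never move, stakeholders who rely on column positions are unaffected by additions made this way.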
To rename a column:
- Do not rename an existing data column.
- Change the `Friendly Name` on the combined key file record.
Province lead: @dmackie
Province lead: @vukosi
Province lead: @shaze
The primary source is daily figures released by the Gauteng Department of Health.
Data file: provincial_gp_cumulative.csv
Within a column label, a TAB separates the name of a place (e.g. Tshwane) or pseudo place ("GP Unallocated") from a category (e.g. "Cases", "Recoveries", "Hospitalisations"). Spaces within a name should be treated as ordinary characters (e.g. West Rand Merafong City).
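A minimal sketch of splitting such a label is shown below. The function name `split_label` is hypothetical; the only rule it relies on is the one stated above, that a single TAB separates the place from the category and that spaces belong to the name.

```python
def split_label(label):
    """Split a Gauteng column label into (place, category) at the first TAB."""
    place, separator, category = label.partition("\t")
    if not separator:
        raise ValueError(f"label {label!r} contains no TAB separator")
    # Spaces are ordinary characters, so the place name is returned unchanged.
    return place, category
```

For instance, `split_label("West Rand Merafong City\tCases")` returns `("West Rand Merafong City", "Cases")`. Splitting on whitespace in general would wrongly break the place name apart.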
Province lead: None
Province lead: @JosephSefara
Province lead: @lizelgreyling
Province lead: @vukosi
Province lead: @mphomokoatle
Province lead: @naturofix
- Daily updates by Premier Alan Winde on Twitter.
NICD Epidemiology briefings are taken as a source. The week 17 number of tests is computed from the provincial per capita tests given in the report.
- Mid-year 2019 Estimates Stats SA