updates to UN SDG scripts (datacommonsorg#871)
* add UNGeoRegions to SDG scripts and update submodule

* fix test

* update submodule

* ADD NEW SCRIPTS

* delete some old files

* add some files to lfs

* some updates

* add footnotes to main script

* update geography

* update process

* update util

* readme

* tests

* tests

* test test

* test test

* test test

* tests

* more tests

* tests

* even more tests

* tests

* tests

* test

* clean
n-h-diaz authored Oct 4, 2023
1 parent 3612cf1 commit e3efabc
Showing 29 changed files with 926 additions and 2,298 deletions.
3 changes: 3 additions & 0 deletions .gitmodules
@@ -1,3 +1,6 @@
[submodule "scripts/un/sdg/sdg-dataset"]
path = scripts/un/sdg/sdg-dataset
url = https://code.officialstatistics.org/undata2/data-commons/sdg-dataset.git
+[submodule "scripts/un/sdg/sssom-mappings"]
+path = scripts/un/sdg/sssom-mappings
+url = https://code.officialstatistics.org/undata2/sssom-mappings.git
1 change: 1 addition & 0 deletions scripts/un/sdg/.gitattributes
@@ -1,3 +1,4 @@
csv/* filter=lfs diff=lfs merge=lfs -text
schema/* filter=lfs diff=lfs merge=lfs -text
dc_generated/* filter=lfs diff=lfs merge=lfs -text
+geography/* filter=lfs diff=lfs merge=lfs -text
21 changes: 16 additions & 5 deletions scripts/un/sdg/README.md
@@ -1,13 +1,19 @@
# UN Stats Sustainable Development Goals

-This import includes country, city, and select region-level data from the [UN SDG Global Database](https://unstats.un.org/sdgs/dataportal). Data is read from the submodule `sdg-dataset` which is managed by UN Stats.
+This import includes data from the [UN SDG Global Database](https://unstats.un.org/sdgs/dataportal). Data is read from the submodule `sdg-dataset` which is managed by UN Stats. Geography mappings are read from the submodule `sssom-mappings` which is also managed by UN Stats.


-To generate city dcids:
+To generate place mappings:
```
-python3 cities.py <DATACOMMONS_API_KEY>
+python3 geography.py
```
-(Note: many of these cities will require manual curation, so this script likely should not be rerun.)
+Produces:
+* geography/ folder:
+* un_places.mcf (place mcf)
+* un_containment.mcf (place containment triples)
+* place_mappings.csv (map of SDG code -> dcid)
+
+Note that the `place_mappings.csv` is required before running the `process.py` script.
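
The dependency described in the note above (the mappings file must exist before `process.py` runs) can be illustrated with a minimal sketch. This is not code from the commit: the function name `load_place_mappings` is hypothetical, and the file layout (a header row followed by two columns, SDG code then dcid) is an assumption based only on the README's description "map of SDG code -> dcid".

```python
import csv
from pathlib import Path


def load_place_mappings(path):
    """Return a dict mapping SDG geography code -> dcid.

    Hypothetical helper. Assumes a header row followed by rows whose
    first two columns are (SDG code, dcid); the real file may differ.
    """
    path = Path(path)
    if not path.is_file():
        # Fail early with a hint instead of processing without mappings.
        raise FileNotFoundError(
            f"{path} not found; run `python3 geography.py` first")
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        # Ignore blank or malformed rows with fewer than two columns.
        return {row[0]: row[1] for row in reader if len(row) >= 2}
```

A caller would invoke it as `load_place_mappings("geography/place_mappings.csv")` before processing any series, so that a missing file surfaces immediately rather than as unmapped places later on.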

To process data and generate artifacts:
```
@@ -23,9 +29,14 @@ Produces:
* unit.mcf
* csv/ folder:
* [CODE].csv
-(Note that the `schema/` folder is not included in the repository but can be regenerated by running the script.)
+(Note that these folders are not included in the repository but can be regenerated by running the script.)

When refreshing the data, the `geography`, `schema`, and `csv` folders might all get updated and will need to be resubmitted to g3. The corresponding TMCF file is `sdg.tmcf`.

+To run unit tests:
+```
+python3 -m unittest discover -v -s ../ -p "*_test.py"
+```
+
Notes:
* We currently drop certain series and variables (refer to `util.py` for the list) which have been identified by UN as potentially containing outliers.