---
title: "Botched attempts at combining LAU and NUTS"
description: |
  Why is this such a pain?
author:
  - name: Giorgio Comai
    url: https://giorgiocomai.eu
    affiliation: OBCT/EDJNet
    affiliation_url: https://www.europeandatajournalism.eu/
date: "`r Sys.Date()`"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library("tidyverse", quietly = TRUE)
library("sf", quietly = TRUE)
library("latlon2map")
library("RSQLite") # for caching
options(timeout = 60000) # generous download timeout (seconds) for large files
cache_folder <- fs::path(fs::path_home_r(), "R", "ll_data")
fs::dir_create(cache_folder)
ll_set_folder(path = cache_folder) # folder where latlon2map stores downloaded data
## set db
db <- DBI::dbConnect(
  drv = RSQLite::SQLite(),
  fs::path(cache_folder, "pop_weighted_centre.sqlite")
)
```
## Requirements
- a consistent dataset with a population-weighted centre for all LAUs
- all LAUs need to be paired to a NUTS region
- to the extent possible, the resulting dataset should not have unexpected missing data

This page outlines *early and unfinished* attempts to verify the full coverage of the concordance tables. Its only purpose is to illustrate that matching LAUs to NUTS is not straightforward, even though concordance tables exist.

## LAU 2018
Let's start from the 2018 LAU dataset [distributed by GISCO](https://gisco-services.ec.europa.eu/distribution/v2/lau/download/).
```{r}
# attribute table only (geometry dropped)
lau_2018_df <- ll_get_lau_eu(year = 2018) %>%
  sf::st_drop_geometry()
```
The dataset has some inconsistencies and incomplete data.
Looking only at mainland Europe (French overseas territories are cropped from the map for clarity), we already notice that Bosnia and Herzegovina and, to some extent (as we will see), Kosovo are not included in the dataset.
```{r}
# NUTS level-0 countries that have at least one LAU in the 2018 dataset
ll_get_nuts_eu(year = 2016, level = 0) %>%
  dplyr::filter(CNTR_CODE %in% unique(lau_2018_df$CNTR_CODE)) %>%
  ggplot() +
  geom_sf() +
  scale_x_continuous(limits = c(-30, 35)) +
  scale_y_continuous(limits = c(25, NA)) +
  theme_minimal()
```
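
The same gap can be spotted without a map by listing the country codes that are expected but absent from the LAU file. Below is a minimal base R sketch: `missing_countries()` is a hypothetical helper, and the two vectors are toy stand-ins for a reference list of expected country codes and for `unique(lau_2018_df$CNTR_CODE)`. The result depends on the reference list chosen: if it is taken from the NUTS file itself, countries that are not part of the NUTS classification cannot be flagged.

```{r}
# Hypothetical helper: country codes expected in a dataset but absent from it
missing_countries <- function(expected, present) {
  sort(setdiff(expected, present))
}

# Toy vectors for illustration only (not derived from GISCO data)
expected_codes <- c("AT", "BA", "CH", "ME", "XK")
lau_codes      <- c("CH", "AT", "ME", "CH")

missing_countries(expected_codes, lau_codes)
```
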
There are, however, other issues.

## Missing place names
For some reason, a considerable number of municipalities in the dataset have no name:
```{r}
# share of LAUs without a name, by country
lau_2018_df %>%
  dplyr::group_by(CNTR_CODE) %>%
  dplyr::add_count(name = "total_lau_per_country") %>%
  dplyr::filter(is.na(LAU_NAME)) %>%
  dplyr::group_by(CNTR_CODE, total_lau_per_country) %>%
  dplyr::count(name = "missing_lau_name_per_country") %>%
  dplyr::ungroup() %>%
  dplyr::mutate(missing_share = missing_lau_name_per_country / total_lau_per_country)
```
So LAU names are missing for all municipalities in Montenegro, Norway and Kosovo, for almost half of those in Slovenia, and for about two per cent of those in Switzerland.
The total number of LAUs in Montenegro is suspiciously low, so we will have to check that as well.
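
Because the filter on `is.na(LAU_NAME)` runs before the counting, the table above only lists countries with at least one missing name. If a complete per-country overview is preferred, a base R variant could keep every country, with a share of zero where no name is missing. This is a sketch: `missing_name_share()` is a hypothetical helper and `lau_toy` is a made-up table, not GISCO data.

```{r}
# Toy table for illustration only: NOT real GISCO data
lau_toy <- data.frame(
  CNTR_CODE = c("AT", "CH", "CH", "CH", "CH", "NO", "NO", "SI", "SI"),
  LAU_NAME  = c("a", "b", "c", "d", NA, NA, NA, "e", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-country totals, missing names and their share.
# Countries with no missing names are kept, with a share of 0.
missing_name_share <- function(df) {
  is_missing <- is.na(df$LAU_NAME)
  total   <- tapply(is_missing, df$CNTR_CODE, length)
  missing <- tapply(is_missing, df$CNTR_CODE, sum)
  out <- data.frame(
    CNTR_CODE    = names(total),
    total_lau    = as.integer(total),
    missing_name = as.integer(missing[names(total)]),
    stringsAsFactors = FALSE
  )
  out$missing_share <- out$missing_name / out$total_lau
  out
}

missing_name_share(lau_toy)
```

With dplyr, the same result comes from grouping by `CNTR_CODE` and summarising `dplyr::n()` and `sum(is.na(LAU_NAME))`, which avoids the filter step altogether.
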
### Switzerland: missing place names
```{r}
missing_ch_df <- lau_2018_df %>%
  dplyr::filter(CNTR_CODE == "CH") %>%
  dplyr::filter(is.na(LAU_NAME))
```
In Switzerland, `r nrow(missing_ch_df)` municipalities have no name. They appear to be overwhelmingly in mountain and/or border areas (in pink in the map below).
```{r}
# Swiss border with 2018 LAUs: named ones in green, unnamed ones in pink
ggplot() +
  geom_sf(data = ll_get_nuts_eu(year = 2016,
                                level = 0,
                                resolution = 1) %>%
            dplyr::filter(CNTR_CODE == "CH")) +
  geom_sf(data = ll_get_lau_eu(year = 2018) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(!is.na(LAU_NAME)),
          fill = "lightgreen") +
  geom_sf(data = ll_get_lau_eu(year = 2018) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(is.na(LAU_NAME)),
          fill = "pink") +
  theme_minimal()
```
The concordance tables for 2018 are of no help here: joining them on `GISCO_ID` does not recover the missing names.
```{r}
missing_ch_df %>%
  dplyr::left_join(y = ll_get_lau_nuts_concordance(lau_year = 2018) %>%
                     dplyr::filter(country == "CH") %>%
                     dplyr::rename(GISCO_ID = gisco_id),
                   by = "GISCO_ID")
```
```{r}
# Named LAUs from the 2019 dataset (green), unnamed LAUs from the 2018 dataset (pink)
ggplot() +
  geom_sf(data = ll_get_nuts_eu(year = 2016,
                                level = 0,
                                resolution = 1) %>%
            dplyr::filter(CNTR_CODE == "CH")) +
  geom_sf(data = ll_get_lau_eu(year = 2019) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(!is.na(LAU_NAME)),
          fill = "lightgreen") +
  geom_sf(data = ll_get_lau_eu(year = 2018) %>%
            dplyr::filter(CNTR_CODE == "CH") %>%
            dplyr::filter(is.na(LAU_NAME)),
          fill = "pink") +
  theme_minimal()
```
### Norway: missing place names
The 2018 dataset does not include the names of municipalities in Norway: we have their boundaries, but not their names. Unfortunately, Norwegian municipalities are also not included in the relevant [LAU/NUTS concordance tables for 2018](https://ec.europa.eu/eurostat/web/nuts/local-administrative-units).
```{r}
no_2018_df <- lau_2018_df %>%
  dplyr::filter(CNTR_CODE == "NO") %>%
  dplyr::select(GISCO_ID, CNTR_CODE, LAU_ID, LAU_NAME) %>%
  dplyr::arrange(GISCO_ID)
no_2018_df

# Norwegian rows in the 2018 concordance table
ll_get_lau_nuts_concordance(lau_year = 2018) %>%
  dplyr::filter(country == "NO")
```
The names are, however, included in the 2019 dataset, bar one municipality.
```{r}
lau_no_with_names <- lau_2018_df %>%
  dplyr::filter(CNTR_CODE == "NO") %>%
  dplyr::select(-LAU_NAME) %>%
  # take names from the 2019 dataset, matching on GISCO_ID
  dplyr::left_join(y = ll_get_lau_eu(year = 2019) %>%
                     sf::st_drop_geometry() %>%
                     dplyr::filter(CNTR_CODE == "NO") %>%
                     dplyr::select(GISCO_ID, LAU_NAME),
                   by = "GISCO_ID")

# municipalities still without a name
lau_no_with_names %>%
  dplyr::filter(is.na(LAU_NAME))
```
But we're lucky enough to find the name of that municipality in the 2017 dataset:
```{r}
# fill the remaining gap with the name from the 2017 dataset
lau_no_with_names$LAU_NAME[lau_no_with_names$GISCO_ID == "NO_1567"] <-
  ll_get_lau_eu(gisco_id = "NO_1567", year = 2017) %>%
  sf::st_drop_geometry() %>%
  dplyr::pull(LAU_NAME)
```
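
The two steps above (names from 2019, then the one remaining gap from 2017) generalise to a fallback chain: try the most relevant year first and use other years only for names that are still missing. A base R sketch follows; the helper `fill_names_from()` and the toy tables are hypothetical, and the approach assumes that `GISCO_ID` identifies the same municipality in all the years involved, which would need checking wherever municipalities were merged or split.

```{r}
# Hypothetical helper: fill NA values of LAU_NAME in `target` from a list of
# lookup tables, tried in order; names already present are never overwritten.
fill_names_from <- function(target, lookups) {
  for (lookup in lookups) {
    still_missing <- is.na(target$LAU_NAME)
    if (!any(still_missing)) break
    # position of each still-unnamed GISCO_ID in this lookup (NA if absent)
    idx <- match(target$GISCO_ID[still_missing], lookup$GISCO_ID)
    target$LAU_NAME[still_missing] <- lookup$LAU_NAME[idx]
  }
  target
}

# Toy tables for illustration only (made-up IDs and names)
target_toy <- data.frame(GISCO_ID = c("X_1", "X_2", "X_3"),
                         LAU_NAME = rep(NA_character_, 3),
                         stringsAsFactors = FALSE)
newer_toy <- data.frame(GISCO_ID = c("X_1", "X_2"),
                        LAU_NAME = c("alpha", "beta"),
                        stringsAsFactors = FALSE)
older_toy <- data.frame(GISCO_ID = c("X_2", "X_3"),
                        LAU_NAME = c("beta (old)", "gamma"),
                        stringsAsFactors = FALSE)

filled_toy <- fill_names_from(target_toy, list(newer_toy, older_toy))
filled_toy$LAU_NAME
```

Applied to this notebook, `target` would be the Norwegian 2018 LAUs and the lookups the 2019 and 2017 datasets, each passed through `sf::st_drop_geometry()` first.
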
Is there more?
...
## LAU 2019
Using the 2019 LAU as the point of reference has one advantage: [validated concordance tables](https://ec.europa.eu/eurostat/web/nuts/local-administrative-units) between LAU and NUTS are available for 2019, while as of this writing (October 2021) they are not yet available for 2020.
Let's start from the 2019 LAU dataset [distributed by GISCO](https://gisco-services.ec.europa.eu/distribution/v2/lau/download/).
```{r}
# attribute table only (geometry dropped)
lau_2019_df <- ll_get_lau_eu(year = 2019) %>%
  sf::st_drop_geometry()
```
Looking only at mainland Europe (French overseas territories are cropped from the map for clarity), we notice that part of the Western Balkans is missing, namely Bosnia and Herzegovina, Montenegro and Kosovo.
```{r}
# NUTS level-0 countries that have at least one LAU in the 2019 dataset
ll_get_nuts_eu(year = 2016, level = 0) %>%
  dplyr::filter(CNTR_CODE %in% unique(lau_2019_df$CNTR_CODE)) %>%
  ggplot() +
  geom_sf() +
  scale_x_continuous(limits = c(-30, 35)) +
  scale_y_continuous(limits = c(25, NA)) +
  theme_minimal()
```
```{r}
# 2019 LAUs without a matching, named row in the LAU 2019 / NUTS 2016 concordance
missing_df <- ll_get_lau_eu(year = 2019, silent = TRUE) %>%
  sf::st_drop_geometry() %>%
  dplyr::transmute(gisco_id = GISCO_ID) %>%
  dplyr::left_join(y = ll_get_lau_nuts_concordance(lau_year = 2019,
                                                   nuts_year = 2016) %>%
                     dplyr::rename(lau_name = lau_name_national),
                   by = "gisco_id") %>%
  dplyr::filter(is.na(lau_name))
```
It appears that `r scales::number(nrow(missing_df))` LAUs are missing from the official concordance tables for 2019.
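
The check above treats a LAU as missing whenever the joined `lau_name` is `NA`, which lumps together two different situations: a `gisco_id` that is absent from the concordance table, and one that is present but has no national name. A base R sketch that counts the two cases separately per country could look like this; `concordance_gaps()` and the toy tables are hypothetical, and the column names (`GISCO_ID` and `CNTR_CODE` on the LAU side, `gisco_id` and `lau_name_national` on the concordance side) are assumed from the chunks above.

```{r}
# Hypothetical helper: for each country, count LAUs whose GISCO_ID is absent
# from the concordance table, and LAUs that are present there but unnamed.
concordance_gaps <- function(lau, concordance) {
  pos     <- match(lau$GISCO_ID, concordance$gisco_id)
  absent  <- is.na(pos)
  no_name <- !absent & is.na(concordance$lau_name_national[pos])
  total   <- tapply(absent, lau$CNTR_CODE, length)
  data.frame(
    CNTR_CODE = names(total),
    total_lau = as.integer(total),
    absent    = as.integer(tapply(absent, lau$CNTR_CODE, sum)[names(total)]),
    no_name   = as.integer(tapply(no_name, lau$CNTR_CODE, sum)[names(total)]),
    stringsAsFactors = FALSE
  )
}

# Toy tables for illustration only (made-up IDs and names)
lau_ids_toy <- data.frame(GISCO_ID  = c("AA_1", "AA_2", "AA_3", "BB_1", "BB_2"),
                          CNTR_CODE = c("AA", "AA", "AA", "BB", "BB"),
                          stringsAsFactors = FALSE)
concordance_toy <- data.frame(gisco_id = c("AA_1", "AA_3", "BB_1", "BB_2"),
                              lau_name_national = c("one", NA, "three", "four"),
                              stringsAsFactors = FALSE)

concordance_gaps(lau_ids_toy, concordance_toy)
```
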
...