diff --git a/articles/convert.html b/articles/convert.html
index 658b037..381eaf0 100644
--- a/articles/convert.html
+++ b/articles/convert.html
@@ -218,7 +218,7 @@
 dates_1 <- c(start = "2020-02-17", end = "2020-02-19")
 db_2 <- spod_convert(type = "od", zones = "distr", dates = dates_1, overwrite = TRUE)
-In this case, any missing data that has now yet been downloaded will be automatically downloaded, while 2020-02-17 will not be redownloaded, as we already requsted it when creating db_1. Then the requested dates will be converted into DuckDB, overwriting the file with db_1. Once again, we save the path to the output DuckDB database file into db_2 variable.
+In this case, any missing data that has not yet been downloaded will be automatically downloaded, while 2020-02-17 will not be redownloaded, as we already requested it when creating db_1. Then the requested dates will be converted into DuckDB, overwriting the file referenced by db_1. Once again, we save the path to the output DuckDB database file into the db_2 variable.

6.2 Load the converted DuckDB diff --git a/articles/convert.qmd b/articles/convert.qmd index a302be2..dd4fccc 100644 --- a/articles/convert.qmd +++ b/articles/convert.qmd @@ -193,7 +193,7 @@ dates_1 <- c(start = "2020-02-17", end = "2020-02-19") db_2 <- spod_convert(type = "od", zones = "distr", dates = dates_1, overwrite = TRUE) ``` -In this case, any missing data that has now yet been downloaded will be automatically downloaded, while 2020-02-17 will not be redownloaded, as we already requsted it when creating `db_1`. Then the requested dates will be converted into `DuckDB`, overwriting the file with `db_1`. Once again, we save the path to the output `DuckDB` database file into `db_2` variable. +In this case, any missing data that has not yet been downloaded will be automatically downloaded, while 2020-02-17 will not be redownloaded, as we already requsted it when creating `db_1`. Then the requested dates will be converted into `DuckDB`, overwriting the file with `db_1`. Once again, we save the path to the output `DuckDB` database file into `db_2` variable. ## Load the converted `DuckDB` {#load-converted-duckdb} diff --git a/pkgdown.yml b/pkgdown.yml index 0dd843c..7973e30 100644 --- a/pkgdown.yml +++ b/pkgdown.yml @@ -8,7 +8,7 @@ articles: flowmaps-static: flowmaps-static.html v1-2020-2021-mitma-data-codebook: v1-2020-2021-mitma-data-codebook.html v2-2022-onwards-mitma-data-codebook: v2-2022-onwards-mitma-data-codebook.html -last_built: 2024-10-22T21:54Z +last_built: 2024-10-22T21:57Z urls: reference: https://rOpenSpain.github.io/spanishoddata/reference article: https://rOpenSpain.github.io/spanishoddata/articles diff --git a/search.json b/search.json index 7e73dcd..4b17694 100644 --- a/search.json +++ b/search.json @@ -1 +1 @@ -[{"path":"https://rOpenSpain.github.io/spanishoddata/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2024 spanishoddata authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"intro","dir":"Articles","previous_headings":"","what":"Introduction","title":"Download and convert mobility datasets","text":"TL;DR (long, didn’t read): analysing 1 week data, use spod_convert() convert data DuckDB spod_connect() connect analysis using dplyr. Skip section . main focus vignette show get long periods origin-destination data analysis. First, describe compare two ways get mobility data using origin-destination data example. package functions overall approaches working types data available package, number trips, overnight stays data. show get days origin-destination data spod_get(). Finally, show download convert multiple weeks, months even years origin-destination data analysis-ready formats. 
See description datasets Codebook cookbook v1 (2020-2021) Spanish mobility data Codebook cookbook v2 (2022 onwards) Spanish mobility data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"two-ways-to-get-the-data","dir":"Articles","previous_headings":"","what":"Two ways to get the data","title":"Download and convert mobility datasets","text":"two main ways import datasets: -memory object spod_get(); connection DuckDB Parquet files disk spod_convert() + spod_connect(). latter recommended large datasets (1 week), much faster memory efficient, demonstarte . spod_get() returns objects appropriate small datasets representing days national origin-destination flows. recommend converting data analysis-ready formats (DuckDB Parquet) using spod_convert() + spod_connect(). allow work much longer time periods (months years) consumer laptop (8-16 GB memory). See section details.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"analysing-large-datasets","dir":"Articles","previous_headings":"","what":"Analysing large datasets","title":"Download and convert mobility datasets","text":"mobility datasets available {spanishiddata} large. Particularly origin-destination data, contains millions rows. data sets may fit memory computer, especially plan run analysis multiple days, weeks, months, even years. work datasets, highly recommend using DuckDB Parquet. systems efficiently processing larger--memory datasets, user-firendly presenting data familiar data.frame/tibble object (almost). great intoroduction , recommend materials Danielle Navarro, Jonathan Keane, Stephanie Hazlitt: website, slides, video tutorial. can also find examples aggregating origin-destination data flows analysis visualisation vignettes static interactive flows visualisation. Learning use DuckDB Parquet easy anyone ever worked dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. However, since learning curve master new tools, provide helper functions novices get started easily open datasets DuckDB Parquet. Please read relevant sections , first show convert data, use . main considerations make choosing DuckDB Parquet (can get spod_convert() + spod_connect()), well CSV.gz (can get spod_get()) analysis speed, convenience data analysis, specific approach prefer getting data. discuss three . data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory. 
Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section. choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"duckdb-vs-parquet-csv","dir":"Articles","previous_headings":"","what":"How to choose between DuckDB, Parquet, and CSV","title":"Download and convert mobility datasets","text":"main considerations make choosing DuckDB Parquet (can get spod_convert() + spod_connect()), well CSV.gz (can get spod_get()) analysis speed, convenience data analysis, specific approach prefer getting data. discuss three . data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory. Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). 
point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section. choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"speed-comparison","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Analysis Speed","title":"Download and convert mobility datasets","text":"data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convenience-of-data-analysis","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Convenience of data analysis","title":"Download and convert mobility datasets","text":"Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . 
analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"scenarios-of-getting-the-data","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Scenarios of getting the data","title":"Download and convert mobility datasets","text":"choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Download and convert mobility datasets","text":"Make sure loaded package: Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"library(spanishoddata) Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Download and convert mobility datasets","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. 
Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"spod-get","dir":"Articles","previous_headings":"","what":"Getting a single day with spod_get()","title":"Download and convert mobility datasets","text":"might seen codebooks v1 v2 data, can get single day’s worth data -memory object spod_get(): output look like : Note lazily-evaluated -memory object (note :memory: database path). means data loaded memory call collect() . useful quick exploration data, recommended large datasets, demonstrated .","code":"dates <- c(\"2024-03-01\") d_1 <- spod_get(type = \"od\", zones = \"distr\", dates = dates) class(d_1) # Source: table [?? x 19] # Database: DuckDB v1.0.0 [... 6.5.0-45-generic:R 4.4.1/:memory:] date time_slot id_origin id_destination distance activity_origin 1 2024-03-01 19 01009_AM 01001 0.5-2 frequent_activity 2 2024-03-01 15 01002 01001 10-50 frequent_activity"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"duckdb","dir":"Articles","previous_headings":"","what":"Analysing the data using DuckDB database","title":"Download and convert mobility datasets","text":"Please make sure steps Setup section . can download convert data DuckDB database two steps. example, select dates, download data manually (note: use dates_2 refer fact using v2 data): , can convert downloaded data (including files might downloaded previosly running spod_get() spod_download() dates date intervals) DuckDB like (dates = \"cached_v2\" means use downloaded files): dates = \"cached_v2\" (can also dates = \"cached_v1\" v1 data) argument instructs function work already-downloaded files. default resulting DuckDB database v2 origin-destination data districts saved SPANISH_OD_DATA_DIR directory v2/tabular/duckdb/ filename od_distritos.duckdb (can change file path save_path argument). function returns full path database file, save db_2 variable. can also desired save location save_path argument spod_convert(). can also convert dates range dates list DuckDB: case, missing data now yet downloaded automatically downloaded, 2020-02-17 redownloaded, already requsted creating db_1. requested dates converted DuckDB, overwriting file db_1. , save path output DuckDB database file db_2 variable. can read introductory information connect DuckDB files , however simplify things created helper function. connect data stored path db_1 db_2 can following: Just like , spod_get() funciton used download raw CSV.gz files analyse without conversion, resulting object my_od_data_2 also tbl_duckdb_connection. , can treat regular data.frame tibble use dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_od_data_2 advise “disconnect” data using: useful free-memory neccessary like run spod_convert() save data location. 
Otherwise, also helpful avoid unnecessary possible warnings terminal garbage collected connections.","code":"dates_2 <- c(start = \"2023-02-14\", end = \"2023-02-17\") spod_download(type = \"od\", zones = \"distr\", dates = dates_2) db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = \"cached_v2\", save_format = \"duckdb\", overwrite = TRUE) db_2 # check the path to the saved `DuckDB` database dates_1 <- c(start = \"2020-02-17\", end = \"2020-02-19\") db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = dates_1, overwrite = TRUE) my_od_data_2 <- spod_connect(db_2) spod_disconnect(my_od_data_2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-to-duckdb","dir":"Articles","previous_headings":"","what":"Convert to DuckDB","title":"Download and convert mobility datasets","text":"can download convert data DuckDB database two steps. example, select dates, download data manually (note: use dates_2 refer fact using v2 data): , can convert downloaded data (including files might downloaded previosly running spod_get() spod_download() dates date intervals) DuckDB like (dates = \"cached_v2\" means use downloaded files): dates = \"cached_v2\" (can also dates = \"cached_v1\" v1 data) argument instructs function work already-downloaded files. default resulting DuckDB database v2 origin-destination data districts saved SPANISH_OD_DATA_DIR directory v2/tabular/duckdb/ filename od_distritos.duckdb (can change file path save_path argument). function returns full path database file, save db_2 variable. can also desired save location save_path argument spod_convert(). can also convert dates range dates list DuckDB: case, missing data now yet downloaded automatically downloaded, 2020-02-17 redownloaded, already requsted creating db_1. requested dates converted DuckDB, overwriting file db_1. , save path output DuckDB database file db_2 variable.","code":"dates_2 <- c(start = \"2023-02-14\", end = \"2023-02-17\") spod_download(type = \"od\", zones = \"distr\", dates = dates_2) db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = \"cached_v2\", save_format = \"duckdb\", overwrite = TRUE) db_2 # check the path to the saved `DuckDB` database dates_1 <- c(start = \"2020-02-17\", end = \"2020-02-19\") db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = dates_1, overwrite = TRUE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"load-converted-duckdb","dir":"Articles","previous_headings":"","what":"Load the converted DuckDB","title":"Download and convert mobility datasets","text":"can read introductory information connect DuckDB files , however simplify things created helper function. connect data stored path db_1 db_2 can following: Just like , spod_get() funciton used download raw CSV.gz files analyse without conversion, resulting object my_od_data_2 also tbl_duckdb_connection. , can treat regular data.frame tibble use dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_od_data_2 advise “disconnect” data using: useful free-memory neccessary like run spod_convert() save data location. 
Otherwise, also helpful avoid unnecessary possible warnings terminal garbage collected connections.","code":"my_od_data_2 <- spod_connect(db_2) spod_disconnect(my_od_data_2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"parquet","dir":"Articles","previous_headings":"","what":"Analysing the data using Parquet","title":"Download and convert mobility datasets","text":"Please make sure steps Setup section . process exactly DuckDB . difference data converted parquet format stored SPANISH_OD_DATA_DIR v1/clean_data/tabular/parquet/ directory v1 data (change save_path argument), subfolders hive-style format like year=2020/month=2/day=14 inside folders single parquet file placed containing data day. advantage format can “update” quickly. example, first downloaded data March April 2020, converted period parquet format, downloaded data May June 2020, run convertion function , convert data May June 2020 add existing parquet files. save time wait March April 2020 converted . Let us convert dates parquet format: now request additional dates overlap already converted data like specifiy argument overwrite = 'update' update existing parquet files new data: , 16 17 Feboruary converted . new data, converted (18 19 February) converted, added existing folder structure ofparquet files stored default save_path location, /clean_data/v1/tabular/parquet/od_distritos. Alternatively, can set save location setting save_path argument. Working parquet files exactly DuckDB Arrow files. Just like , can use helper function spod_connect() connect parquet files: Mind though, first converted data period 14 17 February 2020, converted data period 16 19 February 2020 save default location, od_parquet contains path data, therefore my_od_data_3 connect data. can check like : analysis, please refer recommended external tutorials vignettes Analysing large datasets section.","code":"type <- \"od\" zones <- \"distr\" dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\") dates <- c(start = \"2020-02-16\", end = \"2020-02-19\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\", overwrite = 'update') my_od_data_3 <- spod_connect(od_parquet) my_od_data_3 |> dplyr::distinct(date) |> dplyr::arrange(date)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-to-parquet","dir":"Articles","previous_headings":"","what":"Convert to Parquet","title":"Download and convert mobility datasets","text":"process exactly DuckDB . difference data converted parquet format stored SPANISH_OD_DATA_DIR v1/clean_data/tabular/parquet/ directory v1 data (change save_path argument), subfolders hive-style format like year=2020/month=2/day=14 inside folders single parquet file placed containing data day. advantage format can “update” quickly. example, first downloaded data March April 2020, converted period parquet format, downloaded data May June 2020, run convertion function , convert data May June 2020 add existing parquet files. save time wait March April 2020 converted . Let us convert dates parquet format: now request additional dates overlap already converted data like specifiy argument overwrite = 'update' update existing parquet files new data: , 16 17 Feboruary converted . 
new data, converted (18 19 February) converted, added existing folder structure ofparquet files stored default save_path location, /clean_data/v1/tabular/parquet/od_distritos. Alternatively, can set save location setting save_path argument.","code":"type <- \"od\" zones <- \"distr\" dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\") dates <- c(start = \"2020-02-16\", end = \"2020-02-19\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\", overwrite = 'update')"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"load-converted-parquet","dir":"Articles","previous_headings":"","what":"Load the converted Parquet","title":"Download and convert mobility datasets","text":"Working parquet files exactly DuckDB Arrow files. Just like , can use helper function spod_connect() connect parquet files: Mind though, first converted data period 14 17 February 2020, converted data period 16 19 February 2020 save default location, od_parquet contains path data, therefore my_od_data_3 connect data. can check like : analysis, please refer recommended external tutorials vignettes Analysing large datasets section.","code":"my_od_data_3 <- spod_connect(od_parquet) my_od_data_3 |> dplyr::distinct(date) |> dplyr::arrange(date)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"all-dates","dir":"Articles","previous_headings":"","what":"Download all available data","title":"Download and convert mobility datasets","text":"prepare origin-destination data v1 (2020-2021) analysis whole period data availability, please follow steps : Warning Due mobile network outages, data certain dates missing. Kindly keep mind calculating mean monthly weekly flows. Please check original data page currently known missing dates. time writing, following dates missing: 26, 27, 30, 31 October, 1, 2 3 November 2023 4, 18, 19 April 2024. can use spod_get_valid_dates() function get available dates. example origin-destination district level v1 data. can change type “number_of_trips” zones “municipalities” v1 data. v2 data, just use dates starting 2022-01-01 dates_v2 . Use function arguments v2 way shown v1, also consult v2 data codebook, many datasets addition “origin-destination” “number_of_trips”. convert downloaded data DuckDB format lightning fast analysis. can change save_format parquet want save data Parquet format. comparison overview two formats please see Converting data DuckDB/Parquet faster analysis. default, spod_convert_data() save converted data SPANISH_OD_DATA_DIR directory. can change save_path argument spod_convert_data() want save data different location. conversion, 4 GB operating memory enough, speed process depends number processor cores speed disk storage. SSD preferred. default, spod_convert_data() use except one processor cores computer. can adjust max_n_cpu argument spod_convert_data(). can also increase maximum amount memory used max_mem_gb argument, makes difference analysis stage. Finally, analysis_data_storage simply store path converted data. Either path DuckDB database file path folder Parquet files. reference, converting whole v1 origin-destination data DuckDB takes 20 minutes 4 GB memory 3 processor cores. final size DuckDB database 18 GB, Parquet format - 26 GB. raw CSV files gzip archives 20GB. v2 data much larger, origin-destination tables 2022 - mid-2024 taking 150+ GB raw CSV.gz format. 
can pass analysis_data_storage path spod_connect() function, whether DuckDB Parquet. function determine data type automatically give back tbl_duckdb_connection1. set max_mem_gb 16 GB. Generally, , feel free increase , also consult Figure 1 speed testing results Speed section. can try combinations max_mem_gb max_n_cpu arguments needs Compared conversion process, might want increase available memory analysis step. , better. can control max_mem_gb argument. can manipulate my_data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_data advise “disconnect” free memory:","code":"dates_v1 <- spod_get_valid_dates(ver = 1) dates_v2 <- spod_get_valid_dates(ver = 2) type <- \"origin-destination\" zones <- \"districts\" spod_download( type = type, zones = zones, dates = dates_v1, return_local_file_paths = FALSE, # to avoid getting all downloaded file paths printed to console max_download_size_gb = 50 # in Gb, this should be well over the actual download size for v1 data ) save_format <- \"duckdb\" analysis_data_storage <- spod_convert_data( type = type, zones = zones, dates = \"cached_v1\", # to just convert all data that was previously downloaded, no need to specify dates here save_format = save_format, overwrite = TRUE ) my_data <- spod_connect( data_path = analysis_data_storage, max_mem_gb = 16 ) spod_disconnect(my_data)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"download-all-data","dir":"Articles","previous_headings":"","what":"Download all data","title":"Download and convert mobility datasets","text":"example origin-destination district level v1 data. can change type “number_of_trips” zones “municipalities” v1 data. v2 data, just use dates starting 2022-01-01 dates_v2 . Use function arguments v2 way shown v1, also consult v2 data codebook, many datasets addition “origin-destination” “number_of_trips”.","code":"type <- \"origin-destination\" zones <- \"districts\" spod_download( type = type, zones = zones, dates = dates_v1, return_local_file_paths = FALSE, # to avoid getting all downloaded file paths printed to console max_download_size_gb = 50 # in Gb, this should be well over the actual download size for v1 data )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-all-data-into-analysis-ready-format","dir":"Articles","previous_headings":"","what":"Convert all data into analysis ready format","title":"Download and convert mobility datasets","text":"convert downloaded data DuckDB format lightning fast analysis. can change save_format parquet want save data Parquet format. comparison overview two formats please see Converting data DuckDB/Parquet faster analysis. default, spod_convert_data() save converted data SPANISH_OD_DATA_DIR directory. can change save_path argument spod_convert_data() want save data different location. conversion, 4 GB operating memory enough, speed process depends number processor cores speed disk storage. SSD preferred. default, spod_convert_data() use except one processor cores computer. can adjust max_n_cpu argument spod_convert_data(). can also increase maximum amount memory used max_mem_gb argument, makes difference analysis stage. Finally, analysis_data_storage simply store path converted data. 
Either path DuckDB database file path folder Parquet files.","code":"save_format <- \"duckdb\" analysis_data_storage <- spod_convert_data( type = type, zones = zones, dates = \"cached_v1\", # to just convert all data that was previously downloaded, no need to specify dates here save_format = save_format, overwrite = TRUE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"conversion-speed","dir":"Articles","previous_headings":"","what":"Conversion speed","title":"Download and convert mobility datasets","text":"reference, converting whole v1 origin-destination data DuckDB takes 20 minutes 4 GB memory 3 processor cores. final size DuckDB database 18 GB, Parquet format - 26 GB. raw CSV files gzip archives 20GB. v2 data much larger, origin-destination tables 2022 - mid-2024 taking 150+ GB raw CSV.gz format.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"connecting-to-and-analysing-the-converted-datasets","dir":"Articles","previous_headings":"","what":"Connecting to and analysing the converted datasets","title":"Download and convert mobility datasets","text":"can pass analysis_data_storage path spod_connect() function, whether DuckDB Parquet. function determine data type automatically give back tbl_duckdb_connection1. set max_mem_gb 16 GB. Generally, , feel free increase , also consult Figure 1 speed testing results Speed section. can try combinations max_mem_gb max_n_cpu arguments needs Compared conversion process, might want increase available memory analysis step. , better. can control max_mem_gb argument. can manipulate my_data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_data advise “disconnect” free memory:","code":"my_data <- spod_connect( data_path = analysis_data_storage, max_mem_gb = 16 ) spod_disconnect(my_data)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"OD data disaggregation","text":"vignette demonstrates origin-destination (OD) data disaggregation using {odjitter} package. package implementation method described paper “Jittering: Computationally Efficient Method Generating Realistic Route Networks Origin-Destination Data” (Lovelace, Félix, Carlino 2022) adding value OD data disaggregating desire lines. 
can especially useful transport planning purposes high levels geographic resolution required (see also od2net direct network generation OD data).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"data-preparation","dir":"Articles","previous_headings":"","what":"Data preparation","title":"OD data disaggregation","text":"’ll start loading week’s worth origin-destination data city Salamanca, building example README (note: chunks evaluated):","code":"od_db <- spod_get( type = \"od\", zones = \"distritos\", dates = c(start = \"2024-03-01\", end = \"2024-03-07\") ) distritos <- spod_get_zones(\"distritos\", ver = 2) distritos_wgs84 <- distritos |> sf::st_simplify(dTolerance = 200) |> sf::st_transform(4326) od_national_aggregated <- od_db |> group_by(id_origin, id_destination) |> summarise(Trips = sum(n_trips), .groups = \"drop\") |> filter(Trips > 500) |> collect() |> arrange(desc(Trips)) od_national_aggregated od_national_interzonal <- od_national_aggregated |> filter(id_origin != id_destination) salamanca_zones <- zonebuilder::zb_zone(\"Salamanca\") distritos_salamanca <- distritos_wgs84[salamanca_zones, ] ids_salamanca <- distritos_salamanca$id od_salamanca <- od_national_interzonal |> filter(id_origin %in% ids_salamanca) |> filter(id_destination %in% ids_salamanca) |> arrange(Trips) od_salamanca_sf <- od::od_to_sf( od_salamanca, z = distritos_salamanca )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"disaggregating-desire-lines","dir":"Articles","previous_headings":"","what":"Disaggregating desire lines","title":"OD data disaggregation","text":"’ll need additional dependencies: ’ll get road network OSM: can use road network disaggregate desire lines: Let’s plot disaggregated desire lines: results show can add value OD data disaggregating desire lines {odjitter} package. can useful understanding spatial distribution trips within zone transport planning. plotted disaggregated desire lines top major road network Salamanca. next step routing help prioritise infrastructure improvements.","code":"remotes::install_github(\"dabreegster/odjitter\", subdir = \"r\") remotes::install_github(\"nptscot/osmactive\") salamanca_boundary <- sf::st_union(distritos_salamanca) osm_full <- osmactive::get_travel_network(salamanca_boundary) osm <- osm_full[salamanca_boundary, ] drive_net <- osmactive::get_driving_network(osm) drive_net_major <- osmactive::get_driving_network_major(osm) cycle_net <- osmactive::get_cycling_network(osm) cycle_net <- osmactive::distance_to_road(cycle_net, drive_net_major) cycle_net <- osmactive::classify_cycle_infrastructure(cycle_net) map_net <- osmactive::plot_osm_tmap(cycle_net) map_net od_jittered <- odjitter::jitter( od_salamanca_sf, zones = distritos_salamanca, subpoints = drive_net, disaggregation_threshold = 1000, disaggregation_key = \"Trips\" ) od_jittered |> arrange(Trips) |> ggplot() + geom_sf(aes(colour = Trips), size = 1) + scale_colour_viridis_c() + geom_sf(data = drive_net_major, colour = \"black\") + theme_void()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Making interactive flow maps","text":"basemap final visualisation need free Mapbox access token. can get one account.mapbox.com/access-tokens/ (need Mapbox account, free). may skip step, case interative flowmap basemap, flows just flow solid colour background. 
got access token, can set MAPBOX_TOKEN environment variable like : Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(MAPBOX_TOKEN = \"YOUR_MAPBOX_ACCESS_TOKEN\") library(spanishoddata) library(flowmapblue) library(tidyverse) library(sf) Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Making interactive flow maps","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"simple-example","dir":"Articles","previous_headings":"","what":"Simple example - plot flows data as it is","title":"Making interactive flow maps","text":"Let us get flows districts tipycal working day 2021-04-07: also get district zones polygons mathch flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook). visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips . need coordinates origin destination. can use centroids districts_v1 polygons . Remember, map basemap, need setup Mapbox access token setup section vignette. Create interactive flowmap flowmapblue function. example use darkMode clustering, disable animation. recommend disabling clustering plotting flows hundreds thousands locations, reduce redability map. Video Video demonstrating standard interactive flowmap can play around arguments flowmapblue function. 
example, can turn animation mode: Video Video demonstrating animated interactive flowmap Screenshot demonstrating animated interactive flowmap","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 7 districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429… str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ... od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 
4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61 districts_v1_centroids <- districts_v1 |> st_transform(4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = districts_v1$id) |> rename(lon = X, lat = Y) head(districts_v1_centroids) lon lat id 1 -5.5551053 42.59849 2408910 2 0.3260681 42.17266 22117_AM 3 -3.8136448 37.74344 2305009 4 2.8542636 39.80672 07058_AM 5 -3.8229513 37.77294 2305006 6 -3.8151096 37.86309 2305005 flowmap <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap flowmap_anim <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = TRUE, clustering = TRUE ) flowmap_anim"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get data","title":"Making interactive flow maps","text":"Let us get flows districts tipycal working day 2021-04-07: also get district zones polygons mathch flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 
2021 4 7 districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Flows","title":"Making interactive flow maps","text":"Let us get flows districts tipycal working day 2021-04-07:","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 7"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zones","title":"Making interactive flow maps","text":"also get district zones polygons mathch flows. 
use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-data-for-visualization","dir":"Articles","previous_headings":"","what":"Prepare data for visualization","title":"Making interactive flow maps","text":"visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips .","code":"str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ... od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"expected-data-format","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Expected data format","title":"Making interactive flow maps","text":"visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips .","code":"str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... 
$ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ..."},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"aggregate-data---count-total-flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Aggregate data - count total flows","title":"Making interactive flow maps","text":"","code":"od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"create-locations-table","dir":"Articles","previous_headings":"","what":"Create locations table with coordinates","title":"Making interactive flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"districts_v1_centroids <- districts_v1 |> st_transform(4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = districts_v1$id) |> rename(lon = X, lat = Y) head(districts_v1_centroids) lon lat id 1 -5.5551053 42.59849 2408910 2 0.3260681 42.17266 22117_AM 3 -3.8136448 37.74344 2305009 4 2.8542636 39.80672 07058_AM 5 -3.8229513 37.77294 2305006 6 -3.8151096 37.86309 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"create-the-plot","dir":"Articles","previous_headings":"","what":"Create the plot","title":"Making interactive flow maps","text":"Remember, map basemap, need setup Mapbox access token setup section vignette. Create interactive flowmap flowmapblue function. example use darkMode clustering, disable animation. recommend disabling clustering plotting flows hundreds thousands locations, reduce redability map. Video Video demonstrating standard interactive flowmap can play around arguments flowmapblue function. example, can turn animation mode: Video Video demonstrating animated interactive flowmap Screenshot demonstrating animated interactive flowmap","code":"flowmap <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap flowmap_anim <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = TRUE, clustering = TRUE ) flowmap_anim"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"advanced-example","dir":"Articles","previous_headings":"","what":"Advanced example - time filter","title":"Making interactive flow maps","text":"following simple example, let us now add time filter flows. use flowmapblue function plot flows districts_v1_centroids typical working day 2021-04-07. Just like , aggregate data rename columns. time keep combine date time_slot (corresponds hour day) procude timestamps, flows can interactively filtered time day. 
now using flows hour day, 24 times rows data, simple example. Therefore take longer generate plot resulting visualisation may work slower. create manageable example, let us filter data Barcelona surrounding areas. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. District zone boundaries Barcelona nearby areas Now prepare table coordinates flowmap: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. Video Video demonstrating time filtering flowmap Screenshot demonstrating time filtering flowmap","code":"od_20210407_time <- od_20210407 |> mutate(time = as.POSIXct(paste0(date, \"T\", time_slot, \":00:00\"))) |> group_by(origin = id_origin, dest = id_destination, time) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_time) # A tibble: 6 × 4 origin dest time count 1 08054 0818401 2021-04-07 01:00:00 43.7 2 08054 0818401 2021-04-07 17:00:00 87.1 3 08054 0818402 2021-04-07 16:00:00 62.6 4 08054 0818403 2021-04-07 05:00:00 26.8 5 08054 0818403 2021-04-07 07:00:00 44.9 6 08054 0818403 2021-04-07 02:00:00 7.11 zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906 od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id) flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-data-for-visualization-1","dir":"Articles","previous_headings":"","what":"Prepare data for visualization","title":"Making interactive flow maps","text":"Just like , aggregate data rename columns. time keep combine date time_slot (corresponds hour day) produce timestamps, flows can interactively filtered time day. now using flows hour day, 24 times rows data, simple example. Therefore take longer generate plot resulting visualisation may work slower. create manageable example, let us filter data Barcelona surrounding areas. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. District zone boundaries Barcelona nearby areas Now prepare table coordinates flowmap: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. 
Video Video demonstrating time filtering flowmap Screenshot demonstrating time filtering flowmap","code":"od_20210407_time <- od_20210407 |> mutate(time = as.POSIXct(paste0(date, \"T\", time_slot, \":00:00\"))) |> group_by(origin = id_origin, dest = id_destination, time) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_time) # A tibble: 6 × 4 origin dest time count 1 08054 0818401 2021-04-07 01:00:00 43.7 2 08054 0818401 2021-04-07 17:00:00 87.1 3 08054 0818402 2021-04-07 16:00:00 62.6 4 08054 0818403 2021-04-07 05:00:00 26.8 5 08054 0818403 2021-04-07 07:00:00 44.9 6 08054 0818403 2021-04-07 02:00:00 7.11 zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906 od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id) flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"filter-the-zones","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Filter the zones","title":"Making interactive flow maps","text":"now using flows hour day, 24 times rows data, simple example. Therefore take longer generate plot resulting visualisation may work slower. create manageable example, let us filter data Barcelona surrounding areas. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. 
District zone boundaries Barcelona nearby areas Now prepare table coordinates flowmap:","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-the-flows","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Prepare the flows","title":"Making interactive flow maps","text":"Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around .","code":"od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"visualise-the-flows-for-barcelona-and-surrounding-areas","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Visualise the flows for Barcelona and surrounding areas","title":"Making interactive flow maps","text":"Now, can create new plot data. Video Video demonstrating time filtering flowmap Screenshot demonstrating time filtering flowmap","code":"flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Making static flow maps","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"library(spanishoddata) library(flowmapper) library(tidyverse) library(sf) Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Making static flow maps","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. 
following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"simple-example","dir":"Articles","previous_headings":"","what":"Simple example - plot flows data as it is","title":"Making static flow maps","text":"Let us get flows districts typical working day 2021-04-07: also get district zones polygons match flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook). flowmapper package developed visualise origin-destination ‘flow’ data (Mast 2024). package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coordinates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; previous code chunk created od_20210407_total column names expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons . Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). k_node argument add_flowmap function can used reduce business. Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 
2021 4 # ℹ abbreviated names: ¹​residence_province_ine_code, ²​residence_province_name # ℹ 1 more variable: day districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429… od_20210407_total <- od_20210407 |> group_by(o = id_origin, d = id_destination) |> summarise(value = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(o, d, value) head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75 districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005 # create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real 
value labels ) flows_plot_all_districts zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get data","title":"Making static flow maps","text":"Let us get flows districts typical working day 2021-04-07: also get district zones polygons match flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 
2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 # ℹ abbreviated names: ¹​residence_province_ine_code, ²​residence_province_name # ℹ 1 more variable: day districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Flows","title":"Making static flow maps","text":"Let us get flows districts typical working day 2021-04-07:","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 # ℹ abbreviated names: ¹​residence_province_ine_code, ²​residence_province_name # ℹ 1 more variable: day "},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zones","title":"Making static flow maps","text":"also get district zones polygons match flows. 
use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"aggregate-data---count-total-flows","dir":"Articles","previous_headings":"","what":"Aggregate data - count total flows","title":"Making static flow maps","text":"","code":"od_20210407_total <- od_20210407 |> group_by(o = id_origin, d = id_destination) |> summarise(value = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(o, d, value)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"reshape-flows-for-visualization","dir":"Articles","previous_headings":"","what":"Reshape flows for visualization","title":"Making static flow maps","text":"flowmapper package developed visualise origin-destination ‘flow’ data (Mast 2024). package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coordinates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; previous code chunk created od_20210407_total column names expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons .","code":"head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75 districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows-table","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the flows table","title":"Making static flow maps","text":"previous code chunk created od_20210407_total column names expected flowmapper.","code":"head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 
2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-nodes-table-with-coordinates","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the nodes table with coordinates","title":"Making static flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-flows","dir":"Articles","previous_headings":"","what":"Plot the flows","title":"Making static flow maps","text":"Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). k_node argument add_flowmap function can used reduce business. Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. 
Feel free tweak see results change.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, 
\"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-entire-country","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Plot the entire country","title":"Making static flow maps","text":"Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). k_node argument add_flowmap function can used reduce business.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"zoom-in-to-the-city-level","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zoom in to the city level","title":"Making static flow maps","text":"Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . 
Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"filter-the-zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Filter the zones","title":"Making static flow maps","text":"Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. 
also prepare nodes add_flowmap function:","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the flows","title":"Making static flow maps","text":"Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around .","code":"od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"visualise-the-flows-for-barcelona-and-surrounding-areas","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Visualise the flows for Barcelona and surrounding areas","title":"Making static flow maps","text":"Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) 
flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"advanced-example","dir":"Articles","previous_headings":"","what":"Advanced example - aggregate flows for {spanishoddata} logo","title":"Making static flow maps","text":"advanced example need two additional packages: mapSpain (Hernangómez 2024) hexSticker (Yu 2020). Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook). Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata. Let us count total number trips made locations selected day 2022-04-06: Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names. can now add ids total flows districts id pairs calculate total flows autonomous communities: going use flowmapper (Mast 2024) package plot flows. package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coordinates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; data right now flows_by_ca already correct format expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons . Now data structure match flowmapper’s expected data format: image may look bit bleak, put sticker, look great. make sticker using hexSticker (Yu 2020) package.","code":"# two new packages library(mapSpain) library(hexSticker) # load these too, if you have not already library(spanishoddata) library(flowmapper) library(tidyverse) library(sf) od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2) spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE) flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. 
# ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria # create base ggplot with boundaries removing any extra elements base_plot <- ggplot() + geom_sf(data = spain_for_vis, fill=NA, col = \"grey30\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent'), legend.box.background = element_rect(fill='transparent') ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot <- base_plot|> add_flowmap( od = flows_by_ca, nodes = spain_for_vis_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", k_node = 10 # play around with this parameter to aggregate nodes and flows ) # customise colours and remove legend, as we need a clean image for the logo flows_plot <- flows_plot + guides(fill=\"none\") + scale_fill_gradient(low=\"#FABB29\", high = \"#AB061F\") flows_plot sticker(flows_plot, # package name package= \"spanishoddata\", p_size=4, p_y = 1.6, p_color = \"gray25\", p_family=\"Roboto\", # ggplot image size and position s_x=1.02, s_y=1.19, s_width=2.6, s_height=2.72, # white hex h_fill=\"#ffffff\", 
h_color=\"grey\", h_size=1.3, # url url = \"github.com/rOpenSpain/spanishoddata\", u_color= \"gray25\", u_family = \"Roboto\", u_size = 1.2, # save output name and resolution filename=\"./man/figures/logo.png\", dpi=300 # )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"get-data-1","dir":"Articles","previous_headings":"","what":"Get data","title":"Making static flow maps","text":"Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook). Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata.","code":"od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2) spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Flows","title":"Making static flow maps","text":"Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook).","code":"od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"map-of-spain","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Map of Spain","title":"Making static flow maps","text":"Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata.","code":"spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows-aggregation","dir":"Articles","previous_headings":"","what":"Flows aggregation","title":"Making static flow maps","text":"Let us count total number trips made locations selected day 2022-04-06: Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names. 
can now add ids total flows districts id pairs calculate total flows autonomous communities:","code":"flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. # ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"aggregate-raw-origin-destination-data-by-original-ids","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Aggregate raw origin destination data by original ids","title":"Making static flow maps","text":"Let us count total number trips made locations selected day 2022-04-06:","code":"flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. # ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"match-ids-of-districts-with-autonomous-communities","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Match ids of districts with autonomous communities","title":"Making static flow maps","text":"Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. 
used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names.","code":"district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"count-flows-between-pairs-of-autonomous-communities","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Count flows between pairs of autonomous communities","title":"Making static flow maps","text":"can now add ids total flows districts id pairs calculate total flows autonomous communities:","code":"flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"reshape-flows-for-visualization-1","dir":"Articles","previous_headings":"","what":"Reshape flows for visualization","title":"Making static flow maps","text":"going use flowmapper (Mast 2024) package plot flows. package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coordinates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; data right now flows_by_ca already correct format expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons .","code":"head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 
spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows-table-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Prepare the flows table","title":"Making static flow maps","text":"data right now flows_by_ca already correct format expected flowmapper.","code":"head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899."},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-nodes-table-with-coordinates-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Prepare the nodes table with coordinates","title":"Making static flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-flows-1","dir":"Articles","previous_headings":"","what":"Plot the flows","title":"Making static flow maps","text":"Now data structure match flowmapper’s expected data format: image may look bit bleak, put sticker, look great.","code":"# create base ggplot with boundaries removing any extra elements base_plot <- ggplot() + geom_sf(data = spain_for_vis, fill=NA, col = \"grey30\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent'), legend.box.background = element_rect(fill='transparent') ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot <- base_plot|> add_flowmap( od = flows_by_ca, nodes = spain_for_vis_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", k_node = 10 # play around with this parameter to aggregate nodes and flows ) # customise colours and remove legend, as we need a clean image for the logo flows_plot <- flows_plot + guides(fill=\"none\") + scale_fill_gradient(low=\"#FABB29\", high = \"#AB061F\") 
flows_plot"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"make-the-sticker","dir":"Articles","previous_headings":"","what":"Make the sticker","title":"Making static flow maps","text":"make sticker using hexSticker (Yu 2020) package.","code":"sticker(flows_plot, # package name package= \"spanishoddata\", p_size=4, p_y = 1.6, p_color = \"gray25\", p_family=\"Roboto\", # ggplot image size and position s_x=1.02, s_y=1.19, s_width=2.6, s_height=2.72, # white hex h_fill=\"#ffffff\", h_color=\"grey\", h_size=1.3, # url url = \"github.com/rOpenSpain/spanishoddata\", u_color= \"gray25\", u_family = \"Roboto\", u_size = 1.2, # save output name and resolution filename=\"./man/figures/logo.png\", dpi=300 # )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"install-package","dir":"Articles","previous_headings":"","what":"Install the package","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"package yet available CRAN. can install latest version package rOpenSpain R universe: Alternative way install package GitHub: Developers load package locally, clone navigate root package terminal, e.g. following: run following command R console: Load follows: Using instructions , set data folder package download files . may need 30 GB download data another 30 GB like convert downloaded data analysis ready format (DuckDB database file, folder parquet files). can find info conversion Download convert OD datasets vignette.","code":"install.packages(\"spanishoddata\", repos = c(\"https://ropenspain.r-universe.dev\", \"https://cloud.r-project.org\")) if (!require(\"remotes\")) install.packages(\"remotes\") remotes::install_github(\"rOpenSpain/spanishoddata\", force = TRUE, dependencies = TRUE) gh repo clone rOpenSpain/spanishoddata code spanishoddata # with rstudio: rstudio spanishoddata/spanishoddata.Rproj devtools::load_all() library(spanishoddata)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"overall-approach-to-accessing-the-data","dir":"Articles","previous_headings":"","what":"Overall approach to accessing the data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"want analyse data days, can use spod_get() function. download raw data CSV format let analyse -memory. cover steps page. 
need longer periods (several months years), use spod_convert() spod_connect() functions, convert data special format much faster analysis, see Download convert OD datasets vignette. spod_get_zones() give spatial data zones can matched origin-destination flows functions using zones ’id’s. Please see simple example , also consult vignettes detailed data description instructions package vignettes spod_codebook(ver = 1) spod_codebook(ver = 2), simply visit package website https://ropenspain.github.io/spanishoddata/. Figure 1 presents overall approach accessing data spanishoddata package. Figure 1: overview use package functions get data","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"spatial-data-with-zoning-boundaries","dir":"Articles","previous_headings":"","what":"1. Spatial data with zoning boundaries","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"boundary data provided two geographic levels: Districts Municipalities. ’s important note always align official Spanish census districts municipalities. comply data protection regulations, certain aggregations made districts municipalities”. Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 2850 ‘districts’ compared 10494 official census districts based. access : districts_v1 object class sf consisting polygons. Data structure: Municipalities made official municipalities certain size; however, also aggregated cases lower population density. result, 2,205 municipalities compared 8,125 official municipalities based. access : resulting municipalities_v1 object type sf consisting polygons. Data structure: spatial data get via spanishoddata package downloaded directly source, geometries polygons automatically fixed invalid geometries. zone identifiers stored id column. Apart id column, original zones files metadata. However, seen , using spanishoddata package get many additional columns provide semantic connection official statistical zones used Spanish government zones can get v2 data (2022 onward).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) municipalities_v1 <- spod_get_zones(\"muni\", ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"districts","dir":"Articles","previous_headings":"","what":"1.1 Districts","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 2850 ‘districts’ compared 10494 official census districts based. access : districts_v1 object class sf consisting polygons. Data structure:","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"municipalities","dir":"Articles","previous_headings":"","what":"1.2 Municipalities","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"Municipalities made official municipalities certain size; however, also aggregated cases lower population density. 
result, 2,205 municipalities compared 8,125 official municipalities based. access : resulting municipalities_v1 object type sf consisting polygons. Data structure: spatial data get via spanishoddata package downloaded directly source, geometries polygons automatically fixed invalid geometries. zone identifiers stored id column. Apart id column, original zones files metadata. However, seen , using spanishoddata package get many additional columns provide semantic connection official statistical zones used Spanish government zones can get v2 data (2022 onward).","code":"municipalities_v1 <- spod_get_zones(\"muni\", ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"mobility-data","dir":"Articles","previous_headings":"","what":"2. Mobility data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"mobility data referenced via id_origin, id_destination, location identifiers (mostly labelled id) two sets zones described . origin-destination data contain number trips districts municipalities Spain every hour every day 2020-02-14 2021-05-09. flow also attributes trip purpose (composed type activity (home/work_or_study/) origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. Figure 2 shows example total flows province Barcelona Feb 14th, 2020. Figure 2: Origin destination flows Barcelona 2020-02-14 variables can find district municipality level origin-destination data: original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection1. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , calculated mean hourly flows 4 days requested period. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset full 18 month regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. 
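For periods longer than a few days, the convert-then-connect workflow referenced just above might look roughly like this. It is a minimal sketch: the dates are illustrative and the summary only assumes the id_origin and n_trips columns of the origin-destination data:

# convert a longer period to DuckDB once, then connect to it for analysis
dates <- c(start = "2020-03-01", end = "2020-04-30")
db_path <- spod_convert(type = "od", zones = "distr", dates = dates)
od_db <- spod_connect(db_path)
od_db |>
  dplyr::group_by(id_origin) |>
  dplyr::summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = "drop") |>
  dplyr::collect()

The conversion only has to be done once; later sessions can start straight from the spod_connect() call.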
Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance. “number trips” data shows number individuals district municipality made trips categorised number trips. original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access use spod_get() type set “number_of_trips”, just “nt”. can also set dates maximum possible date range 2020-02-14 2021-05-09 get data, data relatively small (200 Mb). data small, can actually load completely memory:","code":"dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_hourly_trips_over_the_4_days <- od_dist |> group_by(time_slot) |> summarise( mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_hourly_trips_over_the_4_days # A tibble: 24 × 2 time_slot mean_hourly_trips 1 18 21.4 2 10 19.3 3 2 14.8 4 15 19.8 5 11 19.9 6 16 19.6 7 22 20.9 8 0 18.6 9 13 21.1 10 19 22.5 # ℹ 14 more rows # ℹ Use `print(n = ...)` to see more rows dates <- c(start = \"2020-02-14\", end = \"2021-05-09\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"od-data","dir":"Articles","previous_headings":"","what":"2.1. Origin-destination data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"origin-destination data contain number trips districts municipalities Spain every hour every day 2020-02-14 2021-05-09. flow also attributes trip purpose (composed type activity (home/work_or_study/) origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. Figure 2 shows example total flows province Barcelona Feb 14th, 2020. Figure 2: Origin destination flows Barcelona 2020-02-14 variables can find district municipality level origin-destination data: original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. 
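To illustrate the note above about filtering by year, month and day, the two styles compare roughly like this. In this hedged sketch the calendar column is written as date, although depending on the data version it may appear as full_date; year, month and day always mirror the folder names:

# slower: the date stored inside every CSV file has to be scanned
od_dist |> dplyr::filter(date == "2020-02-14")

# faster while the data is still accessed through spod_get(): only the folder names
# (year=2020/month=2/day=14) need checking, so files for other days are skipped
od_dist |> dplyr::filter(year == 2020, month == 2, day == 14)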
Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection1. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , calculated mean hourly flows 4 days requested period. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset full 18 month regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance.","code":"dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_hourly_trips_over_the_4_days <- od_dist |> group_by(time_slot) |> summarise( mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_hourly_trips_over_the_4_days # A tibble: 24 × 2 time_slot mean_hourly_trips 1 18 21.4 2 10 19.3 3 2 14.8 4 15 19.8 5 11 19.9 6 16 19.6 7 22 20.9 8 0 18.6 9 13 21.1 10 19 22.5 # ℹ 14 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"nt-data","dir":"Articles","previous_headings":"","what":"2.2. Number of trips data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"“number trips” data shows number individuals district municipality made trips categorised number trips. original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. 
Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access use spod_get() type set “number_of_trips”, just “nt”. can also set dates maximum possible date range 2020-02-14 2021-05-09 get data, data relatively small (200 Mb). data small, can actually load completely memory:","code":"dates <- c(start = \"2020-02-14\", end = \"2021-05-09\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"advanced-use","dir":"Articles","previous_headings":"","what":"Advanced use","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"advanced use, especially analysing longer periods (months even years), please see Download convert mobility datasets.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"install-package","dir":"Articles","previous_headings":"","what":"Install the package","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"package yet available CRAN. can install latest version package rOpenSpain R universe: Alternative way install package GitHub: Developers load package locally, clone navigate root package terminal, e.g. following: run following command R console: Load follows: Using instructions , set data folder package download files . may need 400 GB download data another 400 GB like convert downloaded data analysis ready format (DuckDB database file, folder parquet files). can find info conversion Download convert OD datasets vignette.","code":"install.packages(\"spanishoddata\", repos = c(\"https://ropenspain.r-universe.dev\", \"https://cloud.r-project.org\")) if (!require(\"remotes\")) install.packages(\"remotes\") remotes::install_github(\"rOpenSpain/spanishoddata\", force = TRUE, dependencies = TRUE) gh repo clone rOpenSpain/spanishoddata code spanishoddata # with rstudio: rstudio spanishoddata/spanishoddata.Rproj devtools::load_all() library(spanishoddata)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. 
Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"overall-approach-to-accessing-the-data","dir":"Articles","previous_headings":"","what":"Overall approach to accessing the data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"want analyse data days, can use spod_get() function. download raw data CSV format let analyse -memory. cover steps page. need longer periods (several months years), use spod_convert() spod_connect() functions, convert data special format much faster analysis, see Download convert OD datasets vignette. spod_get_zones() give spatial data zones can matched origin-destination flows functions using zones ’id’s. Please see simple example , also consult vignettes detailed data description instructions package vignettes spod_codebook(ver = 1) spod_codebook(ver = 2), simply visit package website https://ropenspain.github.io/spanishoddata/. Figure 1 presents overall approach accessing data spanishoddata package. Figure 1: overview use pacakge functions get data","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"spatial-data-with-zoning-boundaries","dir":"Articles","previous_headings":"","what":"1. Spatial data with zoning boundaries","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"boundary data provided three geographic levels: Distrtics, Municipalities, Large Urban Areas. ’s important note always align official Spanish census districts municipalities. comply data protection regulations, certain aggregations made districts municipalities”. Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 3792 ‘districts’ compared 10494 official census districts based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 3909 zones Districts dataset. districts_v2 object class sf consisting polygons. Data structure: Municipalities made official municipalities certain size; however, also aggregated cases lower population density. result, 2618 municipalities compared 8,125 official municipalities based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2735 zones Districts dataset. resulting municipalities_v2 object type sf consisting polygons. Data structure: Large Urban Areas (LUAs) essentially spatial units Municipalities, aggregated. Therefore, 2086 locations LUAs dataset. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2203 zones LUAs dataset. resulting luas_v2 object type sf consisting polygons. 
Data structure:","code":"districts_v2 <- spod_get_zones(\"dist\", ver = 2) municipalities_v2 <- spod_get_zones(\"muni\", ver = 2) luas_v2 <- spod_get_zones(\"lua\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"districts","dir":"Articles","previous_headings":"","what":"1.1 Districts","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 3792 ‘districts’ compared 10494 official census districts based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 3909 zones Districts dataset. districts_v2 object class sf consisting polygons. Data structure:","code":"districts_v2 <- spod_get_zones(\"dist\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"municipalities","dir":"Articles","previous_headings":"","what":"1.2 Municipalities","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Municipalities made official municipalities certain size; however, also aggregated cases lower population density. result, 2618 municipalities compared 8,125 official municipalities based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2735 zones Districts dataset. resulting municipalities_v2 object type sf consisting polygons. Data structure:","code":"municipalities_v2 <- spod_get_zones(\"muni\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"luas","dir":"Articles","previous_headings":"","what":"1.3 LUAs (Large Urban Areas)","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Large Urban Areas (LUAs) essentially spatial units Municipalities, aggregated. Therefore, 2086 locations LUAs dataset. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2203 zones LUAs dataset. resulting luas_v2 object type sf consisting polygons. Data structure:","code":"luas_v2 <- spod_get_zones(\"lua\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"mobility-data","dir":"Articles","previous_headings":"","what":"2. Mobility data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"mobility data referenced via id_origin, id_destination, location identifiers (mostly labelled id) two sets zones described . origin-destination data contain number trips districts, municipalities, large urban areas (LUAs) Spain every hour every day 2022-02-01 whichever currently available latest data (2024-06-30 time writing). flow also attributes trip purpose (composed type activity (home/work_or_study/frequent_activity/infrequent_activity) origin destination, also age, sex, income group individuals traveling origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. variables can find district, municipality large urban area level data: Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. 
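Because the flows are matched to the zones through their ids, attaching aggregated trips to the polygons is a simple join. A minimal sketch, assuming an od_dist table from spod_get() and the districts_v2 object described above (the aggregation itself is illustrative):

library(dplyr)
trips_per_origin <- od_dist |>
  group_by(id_origin) |>
  summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = "drop") |>
  collect()

districts_v2_trips <- districts_v2 |>
  left_join(trips_per_origin, by = c("id" = "id_origin"))
plot(districts_v2_trips["n_trips"])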
Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection5. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , becaus data hourly intervals within day, first summed number trips day age, sex, income groups. grouped data dropping day variable calculated mean number trips per day age, sex, income groups. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset multiple years worth data regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance. location, “number trips” data provides number individuals spent night , breakdown number trips made, age, sex. Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory: dataset provides number people spend night location, also identifying place residence census district level according INE encoding. variables can find district, municipality large urban area level data: Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory:","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_trips_by_ses_over_the_4_days <- od_dist |> group_by(date, age, sex, income) |> summarise( n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> group_by(age, sex, income) |> summarise( daily_mean_n_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_trips_by_ses_over_the_4_days # A tibble: 39 × 4 age sex income daily_mean_n_trips 1 NA NA <10 7002485. 2 NA NA 10-15 16551405. 3 NA NA >15 2651481. 4 0-25 NA <10 539060. 5 0-25 NA 10-15 1950892. 6 0-25 NA >15 401557. 7 0-25 female <10 1484989. 8 0-25 female 10-15 5357785. 9 0-25 female >15 1764454. 10 0-25 male <10 1558461. 
# ℹ 29 more rows # ℹ Use `print(n = ...)` to see more rows dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect() dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") os_dist <- spod_get(type = \"overnight_stays\", zones = \"dist\", dates = dates) os_dist_tbl <- os_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"od-data","dir":"Articles","previous_headings":"","what":"2.1. Origin-destination data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"origin-destination data contain number trips districts, municipalities, large urban areas (LUAs) Spain every hour every day 2022-02-01 whichever currently available latest data (2024-06-30 time writing). flow also attributes trip purpose (composed type activity (home/work_or_study/frequent_activity/infrequent_activity) origin destination, also age, sex, income group individuals traveling origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. variables can find district, municipality large urban area level data: Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection5. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , becaus data hourly intervals within day, first summed number trips day age, sex, income groups. grouped data dropping day variable calculated mean number trips per day age, sex, income groups. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset multiple years worth data regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . 
convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance.","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_trips_by_ses_over_the_4_days <- od_dist |> group_by(date, age, sex, income) |> summarise( n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> group_by(age, sex, income) |> summarise( daily_mean_n_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_trips_by_ses_over_the_4_days # A tibble: 39 × 4 age sex income daily_mean_n_trips 1 NA NA <10 7002485. 2 NA NA 10-15 16551405. 3 NA NA >15 2651481. 4 0-25 NA <10 539060. 5 0-25 NA 10-15 1950892. 6 0-25 NA >15 401557. 7 0-25 female <10 1484989. 8 0-25 female 10-15 5357785. 9 0-25 female >15 1764454. 10 0-25 male <10 1558461. # ℹ 29 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"nt-data","dir":"Articles","previous_headings":"","what":"2.2. Number of trips data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"location, “number trips” data provides number individuals spent night , breakdown number trips made, age, sex. Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory:","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"os-data","dir":"Articles","previous_headings":"","what":"2.3. Overnight stays","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"dataset provides number people spend night location, also identifying place residence census district level according INE encoding. variables can find district, municipality large urban area level data: Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory:","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") os_dist <- spod_get(type = \"overnight_stays\", zones = \"dist\", dates = dates) os_dist_tbl <- os_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Egor Kotov. Author, maintainer. Robin Lovelace. Author. Eugeni Vidal-Tortosa. Contributor.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Kotov E, Lovelace R, Vidal-Tortosa E (2024). spanishoddata. 
doi:10.32614/CRAN.package.spanishoddata, https://github.com/rOpenSpain/spanishoddata.","code":"@Manual{spanishoddata, title = {spanishoddata}, author = {Egor Kotov and Robin Lovelace and Eugeni Vidal-Tortosa}, year = {2024}, url = {https://github.com/rOpenSpain/spanishoddata}, doi = {10.32614/CRAN.package.spanishoddata}, }"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"spanishoddata-get-spanish-origin-destination-data-","dir":"","previous_headings":"","what":"Get Spanish Origin-Destination Data","title":"Get Spanish Origin-Destination Data","text":"spanishoddata R package provides functions downloading formatting Spanish open mobility data released Ministry Transport Sustainable mobility Spain (Secretaría de Estado de Transportes y Movilidad Sostenible 2024). supports two versions Spanish mobility data consists origin-destination matrices additional data sets. first version covers data 2020 2021, including period COVID-19 pandemic. second version contains data January 2022 onwards updated monthly fifteenth month. versions data primarily consist mobile phone positioning data, include matrices overnight stays, individual movements, trips Spanish residents different geographical levels. See package website vignettes v1 v2 data details. spanishoddata designed save people time providing data analysis-ready formats. Automating process downloading, cleaning, importing data can also reduce risk errors laborious process data preparation. also reduces computational resources using computationally efficient packages behind scenes. effectively work multiple data files, ’s recommended set data directory package can search data download files already present.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"examples-of-available-data","dir":"","previous_headings":"","what":"Examples of available data","title":"Get Spanish Origin-Destination Data","text":"Figure 1: Example data available package: daily flows Barcelona create static maps like see vignette . Figure 2: Example data available package: interactive daily flows Spain Figure 3: Example data available package: interactive daily flows Barcelona time filter create interactive maps see vignette .","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"install-the-package","dir":"","previous_headings":"","what":"Install the package","title":"Get Spanish Origin-Destination Data","text":"package yet available CRAN. can install latest version package rOpenSpain R universe: Alternative way install package GitHub: Developers load package locally, clone navigate root package terminal, e.g. following: run following command R console: Load follows:","code":"install.packages(\"spanishoddata\", repos = c(\"https://ropenspain.r-universe.dev\", \"https://cloud.r-project.org\")) if (!require(\"remotes\")) install.packages(\"remotes\") remotes::install_github(\"rOpenSpain/spanishoddata\", force = TRUE, dependencies = TRUE) gh repo clone rOpenSpain/spanishoddata code spanishoddata # with rstudio: rstudio spanishoddata/spanishoddata.Rproj devtools::load_all() library(spanishoddata)"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"set-the-data-directory","dir":"","previous_headings":"","what":"Set the data directory","title":"Get Spanish Origin-Destination Data","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. 
permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"overall-approach-to-accessing-the-data","dir":"","previous_headings":"","what":"Overall approach to accessing the data","title":"Get Spanish Origin-Destination Data","text":"want analyse data days, can use spod_get() function. download raw data CSV format let analyse -memory. cover steps page. need longer periods (several months years), use spod_convert() spod_connect() functions, convert data special format much faster analysis, see Download convert OD datasets vignette. spod_get_zones() give spatial data zones can matched origin-destination flows functions using zones ’id’s. Please see simple example , also consult vignettes detailed data description instructions package vignettes spod_codebook(ver = 1) spod_codebook(ver = 2), simply visit package website https://ropenspain.github.io/spanishoddata/. Figure 4 presents overall approach accessing data spanishoddata package. Figure 4: overview use pacakge functions get data","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"showcase","dir":"","previous_headings":"","what":"Showcase","title":"Get Spanish Origin-Destination Data","text":"run code README use following setup: Get metadata datasets follows (using version 2 data covering years 2022 onwards):","code":"library(tidyverse) theme_set(theme_minimal()) sf::sf_use_s2(FALSE) metadata <- spod_available_data(ver = 2) # for version 2 of the data metadata # A tibble: 9,442 × 6 target_url pub_ts file_extension data_ym data_ymd 1 https://movilidad-o… 2024-07-30 10:54:08 gz NA 2022-10-23 2 https://movilidad-o… 2024-07-30 10:51:07 gz NA 2022-10-22 3 https://movilidad-o… 2024-07-30 10:47:52 gz NA 2022-10-20 4 https://movilidad-o… 2024-07-30 10:14:55 gz NA 2022-10-18 5 https://movilidad-o… 2024-07-30 10:11:58 gz NA 2022-10-17 6 https://movilidad-o… 2024-07-30 10:09:03 gz NA 2022-10-12 7 https://movilidad-o… 2024-07-30 10:05:57 gz NA 2022-10-07 8 https://movilidad-o… 2024-07-30 10:02:12 gz NA 2022-08-07 9 https://movilidad-o… 2024-07-30 09:58:34 gz NA 2022-08-06 10 https://movilidad-o… 2024-07-30 09:54:30 gz NA 2022-08-05 # ℹ 9,432 more rows # ℹ 1 more variable: local_path "},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"zones","dir":"","previous_headings":"","what":"Zones","title":"Get Spanish Origin-Destination Data","text":"Zones can downloaded follows:","code":"distritos <- spod_get_zones(\"distritos\", ver = 2) distritos_wgs84 <- distritos |> sf::st_simplify(dTolerance = 200) |> sf::st_transform(4326) plot(sf::st_geometry(distritos_wgs84))"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"od-data","dir":"","previous_headings":"","what":"OD data","title":"Get Spanish Origin-Destination Data","text":"result R database interface object (tbl_dbi) can used dplyr functions SQL queries ‘lazily’, meaning data loaded memory needed. 
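To make the lazy evaluation above concrete, dplyr can print the SQL it would send to DuckDB without reading any data. A small sketch using the od_db object from this example:

od_db |>
  dplyr::count(id_origin, wt = n_trips, name = "trips") |>
  dplyr::show_query() # prints the translated SQL; nothing is loaded until collect()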
Let’s aggregation find total number trips per hour 7 days: figure summarises 925,874,012 trips 7 days associated 135,866,524 records.","code":"od_db <- spod_get( type = \"origin-destination\", zones = \"districts\", dates = c(start = \"2024-03-01\", end = \"2024-03-07\") ) class(od_db) [1] \"tbl_duckdb_connection\" \"tbl_dbi\" \"tbl_sql\" [4] \"tbl_lazy\" \"tbl\" colnames(od_db) [1] \"full_date\" \"time_slot\" [3] \"id_origin\" \"id_destination\" [5] \"distance\" \"activity_origin\" [7] \"activity_destination\" \"study_possible_origin\" [9] \"study_possible_destination\" \"residence_province_ine_code\" [11] \"residence_province\" \"income\" [13] \"age\" \"sex\" [15] \"n_trips\" \"trips_total_length_km\" [17] \"year\" \"month\" [19] \"day\" n_per_hour <- od_db |> group_by(date, time_slot) |> summarise(n = n(), Trips = sum(n_trips)) |> collect() |> mutate(Time = lubridate::ymd_h(paste0(date, time_slot, sep = \" \"))) |> mutate(Day = lubridate::wday(Time, label = TRUE)) n_per_hour |> ggplot(aes(x = Time, y = Trips)) + geom_line(aes(colour = Day)) + labs(title = \"Number of trips per hour over 7 days\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"spanishoddata-advantage-over-accessing-the-data-yourself","dir":"","previous_headings":"","what":"spanishoddata advantage over accessing the data yourself","title":"Get Spanish Origin-Destination Data","text":"demonstrated , can perform quick analysis using just lines code. highlight benefits package, manually: download xml file download links parse xml extract download links write script download files locate disk logical manner figure data structure downloaded files, read codebook translate data (columns values) English, familiar Spanish write script load data database figure way claculate summaries multiple files much … present simple functions get straight data one line code, ready run analysis .","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"desire-lines","dir":"","previous_headings":"","what":"Desire lines","title":"Get Spanish Origin-Destination Data","text":"’ll use input data pick-important flows Spain, focus longer trips visualisation: results show largest flows intra-zonal. Let’s keep inter-zonal flows: can convert geographic data {od} package (Lovelace Morgan 2024): Let’s focus trips around particular area (Salamanca): use information subset rows, capture movement within study area: Let’s plot results:","code":"od_national_aggregated <- od_db |> group_by(id_origin, id_destination) |> summarise(Trips = sum(n_trips), .groups = \"drop\") |> filter(Trips > 500) |> collect() |> arrange(desc(Trips)) od_national_aggregated # A tibble: 96,404 × 3 id_origin id_destination Trips 1 2807908 2807908 2441404. 2 0801910 0801910 2112188. 3 0801902 0801902 2013618. 4 2807916 2807916 1821504. 5 2807911 2807911 1785981. 6 04902 04902 1690606. 7 2807913 2807913 1504484. 8 2807910 2807910 1299586. 9 0704004 0704004 1287122. 10 28106 28106 1286058. 
# ℹ 96,394 more rows od_national_interzonal <- od_national_aggregated |> filter(id_origin != id_destination) od_national_sf <- od::od_to_sf( od_national_interzonal, z = distritos_wgs84 ) distritos_wgs84 |> ggplot() + geom_sf(aes(fill = population)) + geom_sf(data = spData::world, fill = NA, colour = \"black\") + geom_sf(aes(size = Trips), colour = \"blue\", data = od_national_sf) + coord_sf(xlim = c(-10, 5), ylim = c(35, 45)) + theme_void() salamanca_zones <- zonebuilder::zb_zone(\"Salamanca\") distritos_salamanca <- distritos_wgs84[salamanca_zones, ] plot(distritos_salamanca) ids_salamanca <- distritos_salamanca$id od_salamanca <- od_national_sf |> filter(id_origin %in% ids_salamanca) |> filter(id_destination %in% ids_salamanca) |> arrange(Trips) od_salamanca_sf <- od::od_to_sf( od_salamanca, z = distritos_salamanca ) ggplot() + geom_sf(fill = \"grey\", data = distritos_salamanca) + geom_sf(aes(colour = Trips), size = 1, data = od_salamanca_sf) + scale_colour_viridis_c() + theme_void()"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"further-information","dir":"","previous_headings":"","what":"Further information","title":"Get Spanish Origin-Destination Data","text":"information package, see: Information functions v1 data (2020-2021) codebook v2 data (2022 onwards) codebook (work progress) Download convert data OD disaggregation vignette showcases flows disaggregation Making static flowmaps vignette shows create flowmaps using data acquired spanishoddata Making interactive flowmaps shows create interactive flowmap using data acquired spanishoddata","code":""},{"path":[]},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/global_quiet_param.html","id":null,"dir":"Reference","previous_headings":"","what":"Global Quiet Parameter — global_quiet_param","title":"Global Quiet Parameter — global_quiet_param","text":"Documentation quiet parameter, used globally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/global_quiet_param.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Global Quiet Parameter — global_quiet_param","text":"","code":"global_quiet_param(quiet = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/global_quiet_param.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Global Quiet Parameter — global_quiet_param","text":"quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Get available data list — spod_available_data","title":"Get available data list — spod_available_data","text":"Get table links available data files specified data version. 
Optionally check (see arguments) certain files already downloaded cache directory specified SPANISH_OD_DATA_DIR environment variable custom path specified data_dir argument.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get available data list — spod_available_data","text":"","code":"spod_available_data( ver = 2, check_local_files = FALSE, quiet = FALSE, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get available data list — spod_available_data","text":"ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. check_local_files Whether check local files exist. Defaults FALSE. quiet logical value indicating whether suppress messages. Default FALSE. data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get available data list — spod_available_data","text":"tibble links, release dates files data, dates data coverage, local paths files, download status. target_url character. URL link data file. pub_ts POSIXct. timestamp file published. file_extension character. file extension data file (e.g., 'tar', 'gz'). data_ym Date. year month data coverage, available. data_ymd Date. specific date data coverage, available. local_path character. local file path data stored. downloaded logical. Indicator whether data file downloaded locally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the available v1 data list — spod_available_data_v1","title":"Get the available v1 data list — spod_available_data_v1","text":"function provides table available data list MITMA v1 (2020-2021), remote local.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the available v1 data list — spod_available_data_v1","text":"","code":"spod_available_data_v1( data_dir = spod_get_data_dir(), check_local_files = FALSE, quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the available v1 data list — spod_available_data_v1","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). check_local_files Whether check local files exist. Defaults FALSE. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the available v1 data list — spod_available_data_v1","text":"tibble links, release dates files data, dates data coverage, local paths files, download status. target_url character. URL link data file. pub_ts POSIXct. timestamp file published. file_extension character. file extension data file (e.g., 'tar', 'gz'). data_ym Date. year month data coverage, available. data_ymd Date. 
specific date data coverage, available. local_path character. local file path data stored. downloaded logical. Indicator whether data file downloaded locally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the available v1 data list — spod_available_data_v1","text":"","code":"# Get the available v1 data list for the default data directory if (FALSE) { metadata <- spod_available_data_v1() names(metadata) head(metadata) }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the data dictionary — spod_available_data_v2","title":"Get the data dictionary — spod_available_data_v2","text":"function retrieves data dictionary specified data directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the data dictionary — spod_available_data_v2","text":"","code":"spod_available_data_v2( data_dir = spod_get_data_dir(), check_local_files = FALSE, quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the data dictionary — spod_available_data_v2","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). check_local_files Whether check local files exist. Defaults FALSE. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the data dictionary — spod_available_data_v2","text":"tibble links, release dates files data, dates data coverage, local paths files, download status. target_url character. URL link data file. pub_ts POSIXct. timestamp file published. file_extension character. file extension data file (e.g., 'tar', 'gz'). data_ym Date. year month data coverage, available. data_ymd Date. specific date data coverage, available. local_path character. local file path data stored. downloaded logical. 
Indicator whether data file downloaded locally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the data dictionary — spod_available_data_v2","text":"","code":"# Get the data dictionary for the default data directory if (FALSE) { metadata <- spod_available_data_v2() names(metadata) head(metadata) }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_ram.html","id":null,"dir":"Reference","previous_headings":"","what":"Get available RAM — spod_available_ram","title":"Get available RAM — spod_available_ram","text":"Get available RAM","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_ram.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get available RAM — spod_available_ram","text":"","code":"spod_available_ram()"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_ram.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get available RAM — spod_available_ram","text":"numeric amount available RAM GB.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"function fixes invalid geometries zones data renames \"ID\" column \"id\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"","code":"spod_clean_zones_v1(zones_path, zones)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"zones_path path zones spatial data file. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"spatial object containing cleaned zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":null,"dir":"Reference","previous_headings":"","what":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"function fixes invalid geometries zones data renames \"ID\" column \"id\". 
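A small sketch of spod_available_ram(), whose return value (available RAM in GB) feeds the max_mem_gb defaults shown in the usage blocks further below; the variable names are illustrative only.

```r
library(spanishoddata)

# a sketch: reproduce the documented default memory budget max(4, spod_available_ram() - 4)
ram_gb <- spod_available_ram()       # available RAM in GB
mem_budget_gb <- max(4, ram_gb - 4)  # conservative budget to pass as max_mem_gb
mem_budget_gb
```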
also attaches population counts zone names provided csv files supplied original data provider.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"","code":"spod_clean_zones_v2(zones_path)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"zones_path path zones spatial data file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"spatial object containing cleaned zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":null,"dir":"Reference","previous_headings":"","what":"View codebooks for v1 and v2 open mobility data — spod_codebook","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"Opens relevant vignette.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"","code":"spod_codebook(ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"ver integer numeric value. version data. Defaults 1. Can 1 v1 (2020-2021) data 2 v2 (2022 onwards) data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"Nothing, calls relevant vignette.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":null,"dir":"Reference","previous_headings":"","what":"Connect to data converted to DuckDB — spod_connect","title":"Connect to data converted to DuckDB — spod_connect","text":"function allows user quickly connect data converted DuckDB spod_convert_to_duckdb() function. function simplification connection process. uses","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Connect to data converted to DuckDB — spod_connect","text":"","code":"spod_connect( data_path, target_table_name = NULL, quiet = FALSE, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1, temp_path = spod_get_temp_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Connect to data converted to DuckDB — spod_connect","text":"data_path path DuckDB database file '.duckdb' extension, path folder parquet files. Either one created spod_convert() function. target_table_name Default NULL. 
connecting folder parquet files, argument ignored. connecting DuckDB database, character vector length 1 table name open database file. specified, guessed data_path argument table names available database. manually interfered database, guessed automatically need specify . quiet logical value indicating whether suppress messages. Default FALSE. max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1. temp_path path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query. default set temp directory data folder defined SPANISH_OD_DATA_DIR environment variable. Otherwise, queries folders CSV files parquet files, temporary path set current R working directory, probably undesirable, current working directory can slow storage, storage may limited space, compared data folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Connect to data converted to DuckDB — spod_connect","text":"DuckDB table connection object.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert data from plain text to duckdb or parquet format — spod_convert","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"Converts data faster analysis either DuckDB file parquet files hive-style directory structure. Running analysis files sometimes 100x times faster working raw CSV files, espetially gzip archives. connect converted data, please use mydata <- spod_connect() passing path data saved. connected mydata can analysed using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. -depth usage data, please refer DuckDB documentation examples https://duckdb.org/docs/api/r#dbplyr . useful examples can found https://arrow-user2022.netlify.app/data-wrangling#combining-arrow--duckdb . may also use arrow package work parquet files https://arrow.apache.org/docs/r/.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"","code":"spod_convert( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\"), dates = NULL, save_format = \"duckdb\", save_path = NULL, overwrite = FALSE, data_dir = spod_get_data_dir(), quiet = FALSE, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1, max_download_size_gb = 1 )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"type type data download. 
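A hedged sketch of spod_connect() as documented above: db_path is a hypothetical path to a DuckDB file previously produced by spod_convert(), and the query uses the cleaned column names (date, n_trips) described further below in this reference.

```r
library(spanishoddata)
library(dplyr)

# a sketch: `db_path` is an assumed path previously returned by spod_convert()
db_path <- "~/spanish_od_data/clean_data/v1/tabular/duckdb/od_distr.duckdb"
od <- spod_connect(data_path = db_path)

# lazy dplyr query; collect() pulls the aggregated result into memory
od |>
  count(date, wt = n_trips, name = "trips") |>
  collect()

spod_disconnect(od)
```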
Can \"origin-destination\" (ust \"od\"), \"number_of_trips\" (just \"nt\") v1 data. v2 data \"overnight_stays\" (just \"os\") also available. data types supported future. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range eigher character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020. save_format character vector length 1 values \"duckdb\" \"parquet\". Defaults \"duckdb\". NULL automatically inferred save_path argument. save_format provided, save_path set default location set SPANISH_OD_DATA_DIR environment variable using Sys.setenv(SPANISH_OD_DATA_DIR = 'path///cache/dir')). v1 data path /clean_data/v1/tabular/duckdb/ /clean_data/v1/tabular/parquet/. can also set save_path. ends \".duckdb\", save DuckDB database format, save_path end \".duckdb\", save parquet format treat save_path path folder, file, create necessary hive-style subdirectories folder. Hive style looks like year=2020/month=2/day=14 inside directory data_0.parquet file contains data day. save_path character vector length 1. full (relative) path DuckDB database file parquet folder. save_path ends .duckdb, saved DuckDB database file. format argument automatically set save_format='duckdb'. save_path ends folder name (e.g. /data_dir/clean_data/v1/tabular/parquet/od_distr origin-destination data district level), data saved collection parquet files hive-style directory structure. subfolders od_distr year=2020/month=2/day=14 inside folders single parquet file placed containing data day. NULL, uses default location data_dir (set SPANISH_OD_DATA_DIR environment variable using Sys.setenv(SPANISH_OD_DATA_DIR = 'path///cache/dir')). Therefore, default relative path DuckDB /clean_data/v1/tabular/duckdb/_.duckdb parquet files /clean_data/v1/tabular/parquet/_/, type type data (e.g. 'od', 'os', 'nt', correspoind 'origin-destination', 'overnight-stays', 'number--trips', etc.) 
zones name geographic zones (e.g. 'distr', 'muni', etc.). See details function arguments description. overwrite logical character vector length 1. TRUE, overwrites existing DuckDB or parquet files. Defaults to FALSE. parquet files can also set 'update', parquet files created dates yet converted. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE. max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1. max_download_size_gb maximum download size gigabytes. Defaults 1.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"Path saved DuckDB file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"function processes date arguments provided various functions package. can handle single dates arbitrary sequences (vectors) dates ISO (YYYY-MM-DD) YYYYMMDD format. can also handle date ranges format 'YYYY-MM-DD_YYYY-MM-DD' ('YYYYMMDD_YYYYMMDD'), date ranges named vectors regular expressions match dates format YYYYMMDD.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"","code":"spod_dates_argument_to_dates_seq(dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. 
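An end-to-end sketch of spod_convert() with the arguments documented above; the date range and download size limit are illustrative, and the returned path can then be passed to spod_connect() as sketched earlier.

```r
library(spanishoddata)

# a sketch: convert one week of district-level origin-destination data to DuckDB
db_path <- spod_convert(
  type = "od",
  zones = "distr",
  dates = c(start = "2020-02-17", end = "2020-02-23"),  # illustrative date range
  save_format = "duckdb",
  max_download_size_gb = 5
)
db_path  # path to the saved DuckDB file
```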
character Date object. Can non-consecutive sequence dates. date range either character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"character vector dates ISO format (YYYY-MM-DD).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":null,"dir":"Reference","previous_headings":"","what":"Safely disconnect from data and free memory — spod_disconnect","title":"Safely disconnect from data and free memory — spod_disconnect","text":"function ensure DuckDB connections CSV.gz files (created via spod_get()), well DuckDB files folders parquet files (created via spod_convert()) closed properly prevent conflicting connections. Essentially just wrapper around DBI::dbDisconnect() reaches .$src$con object tbl_duckdb_connection connection object returned user via spod_get() spod_connect(). disconnecting database, also frees memory running gc().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Safely disconnect from data and free memory — spod_disconnect","text":"","code":"spod_disconnect(tbl_con, free_mem = TRUE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Safely disconnect from data and free memory — spod_disconnect","text":"tbl_con tbl_duckdb_connection connection object get either spod_get() spod_connect(). free_mem logical. Whether free memory running gc(). 
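The accepted dates formats described above can be illustrated with a few equivalent specifications; this is a sketch, and all four select the same three days.

```r
# a sketch: equivalent ways to express 15-17 February 2020 for the `dates` argument
dates_as_range  <- c(start = "2020-02-15", end = "2020-02-17")
dates_as_string <- "2020-02-15_2020-02-17"
dates_as_vector <- c("2020-02-15", "2020-02-16", "2020-02-17")
dates_as_regex  <- "2020021[5-7]"  # regular expression over dates in YYYYMMDD format
```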
Defaults TRUE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Safely disconnect from data and free memory — spod_disconnect","text":"","code":"if (FALSE) { # \\dontrun{ od_distr <- spod_get(\"od\", zones = \"distr\", dates <- c(\"2020-01-01\", \"2020-01-02\")) spod_disconnect(od_distr) } # }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":null,"dir":"Reference","previous_headings":"","what":"Download the data files of specified type, zones, and dates — spod_download","title":"Download the data files of specified type, zones, and dates — spod_download","text":"function downloads data files specified type, zones, dates data version.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download the data files of specified type, zones, and dates — spod_download","text":"","code":"spod_download( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), dates = NULL, max_download_size_gb = 1, data_dir = spod_get_data_dir(), quiet = FALSE, return_local_file_paths = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download the data files of specified type, zones, and dates — spod_download","text":"type type data download. Can \"origin-destination\" (ust \"od\"), \"number_of_trips\" (just \"nt\") v1 data. v2 data \"overnight_stays\" (just \"os\") also available. data types supported future. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range eigher character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. 
c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020. max_download_size_gb maximum download size gigabytes. Defaults 1. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE. return_local_file_paths Logical. TRUE, function returns character vector paths downloaded files. FALSE, function returns NULL.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Download the data files of specified type, zones, and dates — spod_download","text":"Nothing. return_local_file_paths = TRUE, character vector paths downloaded files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download the data files of specified type, zones, and dates — spod_download","text":"","code":"if (FALSE) { # \\dontrun{ # Download the origin-destination on district level for the a date range in March 2020 spod_download( type = \"od\", zones = \"districts\", dates = c(start = \"2020-03-20\", end = \"2020-03-24\") ) # Download the origin-destination on district level for select dates in 2020 and 2021 spod_download( type = \"od\", zones = \"dist\", dates = c(\"2020-03-20\", \"2020-03-24\", \"2021-03-20\", \"2021-03-24\") ) # Download the origin-destination on municipality level using regex for a date range in March 2020 # (the regex will capture the dates 2020-03-20 to 2020-03-24) spod_download( type = \"od\", zones = \"municip\", dates = \"2020032[0-4]\" ) } # }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"function ensures necessary v1 raw data zones files downloaded extracted specified data directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"","code":"spod_download_zones_v1( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\"), data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\"). data_dir directory data stored. 
quiet Boolean flag control display messages.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"path downloaded extracted file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_filter_by_dates.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter a duckdb conenction by dates — spod_duckdb_filter_by_dates","title":"Filter a duckdb conenction by dates — spod_duckdb_filter_by_dates","text":"IMPORTANT: function assumes table view filtered separate year, month day columns integer values. done filtering faster CSV files stored folder structure hive-style /year=2020/month=2/day=14/.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_filter_by_dates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter a duckdb conenction by dates — spod_duckdb_filter_by_dates","text":"","code":"spod_duckdb_filter_by_dates(con, source_view_name, new_view_name, dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_filter_by_dates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter a duckdb conenction by dates — spod_duckdb_filter_by_dates","text":"con duckdb connection source_view_name name source duckdb \"view\" (virtual table, context current package likely connected folder CSV files) new_view_name name new duckdb \"view\" (virtual table, context current package likely connected folder CSV files). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range eigher character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. 
example, ^202002 match dates February 2020.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":null,"dir":"Reference","previous_headings":"","what":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"Set maximum memory number threads DuckDB connection","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"","code":"spod_duckdb_limit_resources( con, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1 )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"con duckdb connection max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"duckdb connection.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"function creates duckdb connection number trips data stored folder CSV.gz files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"","code":"spod_duckdb_number_of_trips( con = DBI::dbConnect(duckdb::duckdb(), dbdir = \":memory:\", read_only = FALSE), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"con duckdb connection object. specified, new -memory connection created. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. 
Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"duckdb connection 2 views.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"function creates duckdb connection origin-destination data stored CSV.gz files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"","code":"spod_duckdb_od( con = DBI::dbConnect(duckdb::duckdb(), dbdir = \":memory:\", read_only = FALSE), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"con duckdb connection object. specified, new -memory connection created. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"duckdb connection object 2 views: od_csv_raw - raw table view cached CSV files origin-destination data previously cached $SPANISH_OD_DATA_DIR od_csv_clean - cleaned-table view od_csv_raw column names values translated mapped English. still includes cached data. structure cleaned-views od_csv_clean follows: date Date. full date trip, including year, month, day. id_origin factor. identifier origin location trip, formatted code (e.g., '01001_AM'). id_destination factor. identifier destination location trip, formatted code (e.g., '01001_AM'). activity_origin factor. type activity origin location (e.g., 'home', 'work'). Note: available district level data. activity_destination factor. type activity destination location (e.g., 'home', ''). Note: available district level data. residence_province_ine_code factor. province residence group individual making trip, encoded according INE classification. 
Note: available district level data. residence_province_name factor. province residence group individuals making trip (e.g., 'Cuenca', 'Girona'). Note: available district level data. time_slot integer. time slot (hour day) trip started, represented integer (e.g., 0, 1, 2). distance factor. distance category trip, represented code (e.g., '002-005' 2-5 km). n_trips double. number trips taken within specified time slot distance. trips_total_length_km double. total length trips kilometers specified time slot distance. year double. year trip. month double. month trip. day double. day trip. structure original data od_csv_raw follows: fecha Date. date trip, including year, month, day. origen character. identifier origin location trip, formatted character string (e.g., '01001_AM'). destino character. identifier destination location trip, formatted character string (e.g., '01001_AM'). actividad_origen character. type activity origin location (e.g., 'casa', 'trabajo'). actividad_destino character. type activity destination location (e.g., 'otros', 'trabajo'). residencia character. code representing residence individual making trip (e.g., '01') according official INE classification. edad character. age individual making trip. data actaully filled 'NA' values, column removed cleaned-translated view described . periodo integer. time period trip started, represented integer (e.g., 0, 1, 2). distancia character. distance category trip, represented character string (e.g., '002-005' 2-5 km). viajes double. number trips taken within specified time period distance. viajes_km double. total length trips kilometers specified time period distance. day double. day trip. month double. month trip. year double. year trip.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"function creates duckdb connection overnight stays data stored folder CSV.gz files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"","code":"spod_duckdb_overnight_stays( con = DBI::dbConnect(duckdb::duckdb(), dbdir = \":memory:\", read_only = FALSE), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"con duckdb connection object. specified, new -memory connection created. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. 
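The cleaned origin-destination columns listed above (date, id_origin, id_destination, n_trips, and so on) are what dplyr queries operate on; a hedged sketch of a typical aggregation over a connection returned by spod_get(), with an illustrative single date.

```r
library(spanishoddata)
library(dplyr)

# a sketch: total daily flows between zones for one illustrative day
od <- spod_get(type = "od", zones = "distr", dates = "2020-02-14")

daily_flows <- od |>
  group_by(date, id_origin, id_destination) |>
  summarise(trips = sum(n_trips, na.rm = TRUE), .groups = "drop") |>
  collect()

spod_disconnect(od)
```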
Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"duckdb connection 2 views.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":null,"dir":"Reference","previous_headings":"","what":"Set temp file for DuckDB connection — spod_duckdb_set_temp","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"Set temp file DuckDB connection","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"","code":"spod_duckdb_set_temp(con, temp_path = spod_get_temp_dir())"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"con duckdb connection temp_path path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query. default set temp directory data folder defined SPANISH_OD_DATA_DIR environment variable. Otherwise, queries folders CSV files parquet files, temporary path set current R working directory, probably undesirable, current working directory can slow storage, storage may limited space, compared data folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"duckdb connection.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to expand dates from a regex — spod_expand_dates_from_regex","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"function generates sequence dates regular expression pattern. 
based provided regular expression.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"","code":"spod_expand_dates_from_regex(date_regex)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"date_regex regular expression match dates format yyyymmdd.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"character vector dates matching regex.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_files_sizes.html","id":null,"dir":"Reference","previous_headings":"","what":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","title":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","text":"Get files sizes remote files v1 v2 data save csv.gz file inst/extdata folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_files_sizes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","text":"","code":"spod_files_sizes(ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_files_sizes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","text":"ver version data (1 2). Can . Defaults 2, v1 data updated since 2021.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":null,"dir":"Reference","previous_headings":"","what":"Get tabular data — spod_get","title":"Get tabular data — spod_get","text":"function creates DuckDB lazy table connection object specified type zones. checks missing data downloads necessary. connnection made raw CSV files gzip archives, analysing data connection may slow select days. can manipulate object using {dplyr} functions select, filter, mutate, group_by, summarise, etc. end sequence commands need add collect execute whole chain data manipulations load results memory R data.frame/tibble. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). 
want analyse longer periods time (especiially several months even whole data several years), consider using spod_convert spod_connect.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get tabular data — spod_get","text":"","code":"spod_get( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), dates = NULL, data_dir = spod_get_data_dir(), quiet = FALSE, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1, max_download_size_gb = 1, duckdb_target = \":memory:\", temp_path = spod_get_temp_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get tabular data — spod_get","text":"type type data download. Can \"origin-destination\" (ust \"od\"), \"number_of_trips\" (just \"nt\") v1 data. v2 data \"overnight_stays\" (just \"os\") also available. data types supported future. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range eigher character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE. max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. 
data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1. max_download_size_gb maximum download size gigabytes. Defaults 1. duckdb_target (Optional) path duckdb file save data, conversion CSV requested spod_convert function. specified, set \":memory:\" data stored memory. temp_path path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query. default set temp directory data folder defined SPANISH_OD_DATA_DIR environment variable. Otherwise, queries folders CSV files parquet files, temporary path set current R working directory, probably undesirable, current working directory can slow storage, storage may limited space, compared data folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get tabular data — spod_get","text":"DuckDB lazy table connection object class tbl_duckdb_connection.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get tabular data — spod_get","text":"","code":"if (FALSE) { # \\dontrun{ # create a connection to the v1 data Sys.setenv(SPANISH_OD_DATA_DIR = \"~/path/to/your/cache/dir\") dates <- c(\"2020-02-14\", \"2020-03-14\", \"2021-02-14\", \"2021-02-14\", \"2021-02-15\") od_dist <- spod_get(type = \"od\", zones = \"distr\", dates = dates) # od dist is a table view filtered to the specified dates # access the source connection with all dates # list tables DBI::dbListTables(od_dist$src$con) } # }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the data directory — spod_get_data_dir","title":"Get the data directory — spod_get_data_dir","text":"function retrieves data directory environment variable SPANISH_OD_DATA_DIR. environment variable set, returns temporary directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the data directory — spod_get_data_dir","text":"","code":"spod_get_data_dir(quiet = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the data directory — spod_get_data_dir","text":"quiet logical value indicating whether suppress messages. 
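A brief sketch of the SPANISH_OD_DATA_DIR behaviour described above for spod_get_data_dir(); the path is an assumption.

```r
library(spanishoddata)

# a sketch: point the package to a persistent cache directory, then confirm it
Sys.setenv(SPANISH_OD_DATA_DIR = "~/spanish_od_data")  # assumed location
spod_get_data_dir()
# if SPANISH_OD_DATA_DIR were unset, a temporary directory would be returned instead
```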
Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the data directory — spod_get_data_dir","text":"data directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":null,"dir":"Reference","previous_headings":"","what":"Get file size from URL — spod_get_file_size_from_url","title":"Get file size from URL — spod_get_file_size_from_url","text":"Get file size URL","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get file size from URL — spod_get_file_size_from_url","text":"","code":"spod_get_file_size_from_url(x_url)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get file size from URL — spod_get_file_size_from_url","text":"x_url URL","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get file size from URL — spod_get_file_size_from_url","text":"File size MB","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":null,"dir":"Reference","previous_headings":"","what":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"Get latest file list XML MITMA open mobility data v1 (2020-2021)","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"","code":"spod_get_latest_v1_file_list( data_dir = spod_get_data_dir(), xml_url = \"https://opendata-movilidad.mitma.es/RSS.xml\" )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). xml_url URL XML file download. 
Defaults \"https://opendata-movilidad.mitma.es/RSS.xml\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"path downloaded XML file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"","code":"if (FALSE) { spod_get_latest_v1_file_list() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":null,"dir":"Reference","previous_headings":"","what":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"Get latest file list XML MITMA open mobility data v2 (2022 onwards)","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"","code":"spod_get_latest_v2_file_list( data_dir = spod_get_data_dir(), xml_url = \"https://movilidad-opendata.mitma.es/RSS.xml\" )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). xml_url URL XML file download. 
Defaults \"https://movilidad-opendata.mitma.es/RSS.xml\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"path downloaded XML file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"","code":"if (FALSE) { spod_get_latest_v2_file_list() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"Get path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"","code":"spod_get_temp_dir(data_dir = spod_get_data_dir())"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"path temp folder DuckDB intermediate spilling.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":null,"dir":"Reference","previous_headings":"","what":"Get valid dates for the specified data version — spod_get_valid_dates","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"Get valid dates specified data version","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"","code":"spod_get_valid_dates(ver = NULL)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"ver Integer. Can 1 2. version data use. 
v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"vector type Date possible valid dates specified data version (v1 2020-2021 v2 2020 onwards).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":null,"dir":"Reference","previous_headings":"","what":"Get zones — spod_get_zones","title":"Get zones — spod_get_zones","text":"Get spatial zones specified data version. Supports v1 (2020-2021) v2 (2022 onwards) data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get zones — spod_get_zones","text":"","code":"spod_get_zones( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get zones — spod_get_zones","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get zones — spod_get_zones","text":"sf object (Simple Feature collection). columns v1 (2020-2021) data include: id character vector containing unique identifier district, assigned data provider. id matches id_origin, id_destination, id district-level origin-destination number trips data. census_districts string semicolon-separated identifiers census districts classified Spanish Statistical Office (INE) spatially bound within polygons id. municipalities_mitma string semicolon-separated municipality identifiers (assigned data provider) corresponding district id. municipalities string semicolon-separated municipality identifiers classified Spanish Statistical Office (INE) corresponding id. district_names_in_v2/municipality_names_in_v2 string semicolon-separated district names (v2 version data) corresponding district id v1. district_ids_in_v2/municipality_ids_in_v2 string semicolon-separated district identifiers (v2 version data) corresponding district id v1. geometry MULTIPOLYGON column containing spatial geometry district, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions. columns v2 (2022 onwards) data include: id character vector containing unique identifier zone, assigned data provider. name character vector name district. 
population numeric vector representing population district (2022). census_sections string semicolon-separated identifiers census sections corresponding district. census_districts string semicolon-separated identifiers census districts classified Spanish Statistical Office (INE) corresponding district. municipalities string semicolon-separated identifiers municipalities classified Spanish Statistical Office (INE) corresponding district. municipalities_mitma string semicolon-separated identifiers municipalities, assigned data provider, correspond district. luas_mitma string semicolon-separated identifiers LUAs (Local Urban Areas) provider, associated district. district_ids_in_v1/municipality_ids_in_v1 string semicolon-separated district identifiers v1 data corresponding district v2. match exists, marked NA. geometry MULTIPOLYGON column containing spatial geometry district, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieves the zones for v1 data — spod_get_zones_v1","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"function retrieves zones data specified data directory. can retrieve either \"distritos\" \"municipios\" zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"","code":"spod_get_zones_v1( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\"), data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\"). data_dir directory data stored. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"sf object (Simple Feature collection) 2 fields: id character vector containing unique identifier zone, matched identifiers tabular data. geometry MULTIPOLYGON column containing spatial geometry zone, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"","code":"if (FALSE) { zones <- spod_get_zones_v1() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":null,"dir":"Reference","previous_headings":"","what":"retrieves the zones data — spod_get_zones_v2","title":"retrieves the zones data — spod_get_zones_v2","text":"function retrieves zones data specified data directory. 
can retrieve either \"distritos\" \"municipios\" zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"retrieves the zones data — spod_get_zones_v2","text":"","code":"spod_get_zones_v2( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"retrieves the zones data — spod_get_zones_v2","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\"). data_dir directory data stored. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"retrieves the zones data — spod_get_zones_v2","text":"sf object (Simple Feature collection) 4 fields: id character vector containing unique identifier zone, matched identifiers tabular data. name character vector name zone. population numeric vector representing population zone (2022). geometry MULTIPOLYGON column containing spatial geometry zone, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"retrieves the zones data — spod_get_zones_v2","text":"","code":"if (FALSE) { zones <- spod_get_zones_v2() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if specified dates span both data versions — spod_is_data_version_overlaps","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"function checks specified dates date ranges span v1 v2 data versions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"","code":"spod_is_data_version_overlaps(dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"dates Dates vector dates check.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"TRUE dates span data versions, FALSE otherwise.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type.html","id":null,"dir":"Reference","previous_headings":"","what":"Match data types for 
normalisation — spod_match_data_type","title":"Match data types for normalisation — spod_match_data_type","text":"Match data types normalisation","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Match data types for normalisation — spod_match_data_type","text":"","code":"spod_match_data_type( type = c(\"od\", \"origin-destination\", \"viajes\", \"os\", \"overnight_stays\", \"pernoctaciones\", \"nt\", \"number_of_trips\", \"personas\") )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Match data types for normalisation — spod_match_data_type","text":"type type data match. Can \"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type_for_local_folders.html","id":null,"dir":"Reference","previous_headings":"","what":"Match data types to folders — spod_match_data_type_for_local_folders","title":"Match data types to folders — spod_match_data_type_for_local_folders","text":"Match data types folders","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type_for_local_folders.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Match data types to folders — spod_match_data_type_for_local_folders","text":"","code":"spod_match_data_type_for_local_folders( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), ver = c(1, 2) )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type_for_local_folders.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Match data types to folders — spod_match_data_type_for_local_folders","text":"ver Integer. Can 1 2. version data use. 
v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":null,"dir":"Reference","previous_headings":"","what":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"Load SQL query specified file package installation directory, glue::collapse , glue::glue case variables need replaced, dplyr::sql additional safety.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"","code":"spod_read_sql(sql_file_name)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"sql_file_name name SQL file load package installation directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"Text SQL query class sql/character.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"Generate part SQL query sequence dates","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"","code":"spod_sql_where_dates(dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"dates Dates vector dates process.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"character vector SQL query.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Get clean data subfolder name — spod_subfolder_clean_data_cache","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"Change subfolder name code function clean data cache apply globally, functions package use function get clean data cache path.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"","code":"spod_subfolder_clean_data_cache(ver = 
1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"Character string subfolder name clean data cache.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_metadata_cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","title":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","text":"Change subfolder name code function metadata cache apply globally, functions package use function get metadata cache path.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_metadata_cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","text":"","code":"spod_subfolder_metadata_cache()"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_metadata_cache.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","text":"Character string subfolder name raw data cache.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"Change subfolder name code function raw data cache apply globally, functions package use function get raw data cache path.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"","code":"spod_subfolder_raw_data_cache(ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"ver Integer. Can 1 2. version data use. 
v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"Character string subfolder name raw data cache.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":null,"dir":"Reference","previous_headings":"","what":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"Remove duplicate IDs semicolon-separated string selected column data frame","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"","code":"spod_unique_separated_ids(column)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"column character vector column data frame remove duplicates .","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"character vector semicolon-separated unique IDs.","code":""}] +[{"path":"https://rOpenSpain.github.io/spanishoddata/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2024 spanishoddata authors Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"intro","dir":"Articles","previous_headings":"","what":"Introduction","title":"Download and convert mobility datasets","text":"TL;DR (long, didn’t read): analysing 1 week data, use spod_convert() convert data DuckDB spod_connect() connect analysis using dplyr. Skip section . main focus vignette show get long periods origin-destination data analysis. First, describe compare two ways get mobility data using origin-destination data example. package functions overall approaches working types data available package, number trips, overnight stays data. show get days origin-destination data spod_get(). Finally, show download convert multiple weeks, months even years origin-destination data analysis-ready formats. 
See description datasets Codebook cookbook v1 (2020-2021) Spanish mobility data Codebook cookbook v2 (2022 onwards) Spanish mobility data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"two-ways-to-get-the-data","dir":"Articles","previous_headings":"","what":"Two ways to get the data","title":"Download and convert mobility datasets","text":"two main ways import datasets: -memory object spod_get(); connection DuckDB Parquet files disk spod_convert() + spod_connect(). latter recommended large datasets (1 week), much faster memory efficient, demonstarte . spod_get() returns objects appropriate small datasets representing days national origin-destination flows. recommend converting data analysis-ready formats (DuckDB Parquet) using spod_convert() + spod_connect(). allow work much longer time periods (months years) consumer laptop (8-16 GB memory). See section details.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"analysing-large-datasets","dir":"Articles","previous_headings":"","what":"Analysing large datasets","title":"Download and convert mobility datasets","text":"mobility datasets available {spanishiddata} large. Particularly origin-destination data, contains millions rows. data sets may fit memory computer, especially plan run analysis multiple days, weeks, months, even years. work datasets, highly recommend using DuckDB Parquet. systems efficiently processing larger--memory datasets, user-firendly presenting data familiar data.frame/tibble object (almost). great intoroduction , recommend materials Danielle Navarro, Jonathan Keane, Stephanie Hazlitt: website, slides, video tutorial. can also find examples aggregating origin-destination data flows analysis visualisation vignettes static interactive flows visualisation. Learning use DuckDB Parquet easy anyone ever worked dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. However, since learning curve master new tools, provide helper functions novices get started easily open datasets DuckDB Parquet. Please read relevant sections , first show convert data, use . main considerations make choosing DuckDB Parquet (can get spod_convert() + spod_connect()), well CSV.gz (can get spod_get()) analysis speed, convenience data analysis, specific approach prefer getting data. discuss three . data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory. 
Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section. choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"duckdb-vs-parquet-csv","dir":"Articles","previous_headings":"","what":"How to choose between DuckDB, Parquet, and CSV","title":"Download and convert mobility datasets","text":"main considerations make choosing DuckDB Parquet (can get spod_convert() + spod_connect()), well CSV.gz (can get spod_get()) analysis speed, convenience data analysis, specific approach prefer getting data. discuss three . data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory. Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). 
point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section. choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"speed-comparison","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Analysis Speed","title":"Download and convert mobility datasets","text":"data format choose may dramatically impact speed analysis (e.g. filtering dates, calculating number trips per hour, per week, per month, per origin-destination pair, data aggregation manipulation). tests (see Figure 1), found conducting analysis using DuckDB database provided significant speed advantage using Parquet , importantly, raw CSV.gz files. Specifically, comparing query determine mean hourly trips 18 months zone pair, observed using DuckDB database 5 times faster using Parquet files 8 times faster using CSV.gz files. Figure 1: Data processing speed comparison: DuckDB engine running CSV.gz files vs DuckDB database vs folder Parquet files reference, simple query used speed comparison Figure 1: Figure 1 also shows DuckDB format give best performance even low-end systems limited memory number processor cores, conditional fast SSD storage. Also note, choose work long time periods using CSV.gz files via spod_get(), need balance amount memory processor cores via max_n_cpu max_mem_gb arguments, otherwise analysis may fail (see grey area figure), many parallel processes running time limited memory.","code":"# data represents either CSV files acquired from `spod_get()`, a `DuckDB` database or a folder of Parquet files connceted with `spod_connect()` data |> group_by(id_origin, id_destination, time_slot) |> summarise(mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convenience-of-data-analysis","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Convenience of data analysis","title":"Download and convert mobility datasets","text":"Regardless data format (DuckDB, Parquet, CSV.gz), functions need data manipulation analysis . 
analysis actually performed DuckDB (Mühleisen Raasveldt 2024) engine, presents data regular data.frame/tibble object R (almost). point view, difference data formats. can manipulate data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. provide examples following sections. Please refer recommended external tutorials vignettes Analysing large datasets section.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"scenarios-of-getting-the-data","dir":"Articles","previous_headings":"3 Analysing large datasets","what":"Scenarios of getting the data","title":"Download and convert mobility datasets","text":"choice converting DuckDB Parquet also made based plan work data. Specifically whether want just download long periods even available data, want get data gradually, progress analysis. plan work long time periods, recommend DuckDB, one big file easier update completely. example may working 2020 data. Later decide add 2021 data. case better delete database create scratch. want certain dates, analyse add additional dates later, Parquet may better, day saved separate file, just like original CSV files. Therefore updating folder Parquet files easy just creating new file missing date. work individual days, may notice advantages DuckDB Parquet formats. case, can keep using CSV.gz format analysis using spod_get() function. also useful quick tutorials, need one two days data demonstration purposes.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Download and convert mobility datasets","text":"Make sure loaded package: Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"library(spanishoddata) Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Download and convert mobility datasets","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. 
Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"spod-get","dir":"Articles","previous_headings":"","what":"Getting a single day with spod_get()","title":"Download and convert mobility datasets","text":"might seen codebooks v1 v2 data, can get single day’s worth data -memory object spod_get(): output look like : Note lazily-evaluated -memory object (note :memory: database path). means data loaded memory call collect() . useful quick exploration data, recommended large datasets, demonstrated .","code":"dates <- c(\"2024-03-01\") d_1 <- spod_get(type = \"od\", zones = \"distr\", dates = dates) class(d_1) # Source: table [?? x 19] # Database: DuckDB v1.0.0 [... 6.5.0-45-generic:R 4.4.1/:memory:] date time_slot id_origin id_destination distance activity_origin 1 2024-03-01 19 01009_AM 01001 0.5-2 frequent_activity 2 2024-03-01 15 01002 01001 10-50 frequent_activity"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"duckdb","dir":"Articles","previous_headings":"","what":"Analysing the data using DuckDB database","title":"Download and convert mobility datasets","text":"Please make sure steps Setup section . can download convert data DuckDB database two steps. example, select dates, download data manually (note: use dates_2 refer fact using v2 data): , can convert downloaded data (including files might downloaded previosly running spod_get() spod_download() dates date intervals) DuckDB like (dates = \"cached_v2\" means use downloaded files): dates = \"cached_v2\" (can also dates = \"cached_v1\" v1 data) argument instructs function work already-downloaded files. default resulting DuckDB database v2 origin-destination data districts saved SPANISH_OD_DATA_DIR directory v2/tabular/duckdb/ filename od_distritos.duckdb (can change file path save_path argument). function returns full path database file, save db_2 variable. can also desired save location save_path argument spod_convert(). can also convert dates range dates list DuckDB: case, missing data yet downloaded automatically downloaded, 2020-02-17 redownloaded, already requsted creating db_1. requested dates converted DuckDB, overwriting file db_1. , save path output DuckDB database file db_2 variable. can read introductory information connect DuckDB files , however simplify things created helper function. connect data stored path db_1 db_2 can following: Just like , spod_get() funciton used download raw CSV.gz files analyse without conversion, resulting object my_od_data_2 also tbl_duckdb_connection. , can treat regular data.frame tibble use dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_od_data_2 advise “disconnect” data using: useful free-memory neccessary like run spod_convert() save data location. 
Otherwise, also helpful avoid unnecessary possible warnings terminal garbage collected connections.","code":"dates_2 <- c(start = \"2023-02-14\", end = \"2023-02-17\") spod_download(type = \"od\", zones = \"distr\", dates = dates_2) db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = \"cached_v2\", save_format = \"duckdb\", overwrite = TRUE) db_2 # check the path to the saved `DuckDB` database dates_1 <- c(start = \"2020-02-17\", end = \"2020-02-19\") db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = dates_1, overwrite = TRUE) my_od_data_2 <- spod_connect(db_2) spod_disconnect(my_od_data_2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-to-duckdb","dir":"Articles","previous_headings":"","what":"Convert to DuckDB","title":"Download and convert mobility datasets","text":"can download convert data DuckDB database two steps. example, select dates, download data manually (note: use dates_2 refer fact using v2 data): , can convert downloaded data (including files might downloaded previosly running spod_get() spod_download() dates date intervals) DuckDB like (dates = \"cached_v2\" means use downloaded files): dates = \"cached_v2\" (can also dates = \"cached_v1\" v1 data) argument instructs function work already-downloaded files. default resulting DuckDB database v2 origin-destination data districts saved SPANISH_OD_DATA_DIR directory v2/tabular/duckdb/ filename od_distritos.duckdb (can change file path save_path argument). function returns full path database file, save db_2 variable. can also desired save location save_path argument spod_convert(). can also convert dates range dates list DuckDB: case, missing data yet downloaded automatically downloaded, 2020-02-17 redownloaded, already requsted creating db_1. requested dates converted DuckDB, overwriting file db_1. , save path output DuckDB database file db_2 variable.","code":"dates_2 <- c(start = \"2023-02-14\", end = \"2023-02-17\") spod_download(type = \"od\", zones = \"distr\", dates = dates_2) db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = \"cached_v2\", save_format = \"duckdb\", overwrite = TRUE) db_2 # check the path to the saved `DuckDB` database dates_1 <- c(start = \"2020-02-17\", end = \"2020-02-19\") db_2 <- spod_convert(type = \"od\", zones = \"distr\", dates = dates_1, overwrite = TRUE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"load-converted-duckdb","dir":"Articles","previous_headings":"","what":"Load the converted DuckDB","title":"Download and convert mobility datasets","text":"can read introductory information connect DuckDB files , however simplify things created helper function. connect data stored path db_1 db_2 can following: Just like , spod_get() funciton used download raw CSV.gz files analyse without conversion, resulting object my_od_data_2 also tbl_duckdb_connection. , can treat regular data.frame tibble use dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_od_data_2 advise “disconnect” data using: useful free-memory neccessary like run spod_convert() save data location. 
Otherwise, also helpful avoid unnecessary possible warnings terminal garbage collected connections.","code":"my_od_data_2 <- spod_connect(db_2) spod_disconnect(my_od_data_2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"parquet","dir":"Articles","previous_headings":"","what":"Analysing the data using Parquet","title":"Download and convert mobility datasets","text":"Please make sure steps Setup section . process exactly DuckDB . difference data converted parquet format stored SPANISH_OD_DATA_DIR v1/clean_data/tabular/parquet/ directory v1 data (change save_path argument), subfolders hive-style format like year=2020/month=2/day=14 inside folders single parquet file placed containing data day. advantage format can “update” quickly. example, first downloaded data March April 2020, converted period parquet format, downloaded data May June 2020, run convertion function , convert data May June 2020 add existing parquet files. save time wait March April 2020 converted . Let us convert dates parquet format: now request additional dates overlap already converted data like specifiy argument overwrite = 'update' update existing parquet files new data: , 16 17 Feboruary converted . new data, converted (18 19 February) converted, added existing folder structure ofparquet files stored default save_path location, /clean_data/v1/tabular/parquet/od_distritos. Alternatively, can set save location setting save_path argument. Working parquet files exactly DuckDB Arrow files. Just like , can use helper function spod_connect() connect parquet files: Mind though, first converted data period 14 17 February 2020, converted data period 16 19 February 2020 save default location, od_parquet contains path data, therefore my_od_data_3 connect data. can check like : analysis, please refer recommended external tutorials vignettes Analysing large datasets section.","code":"type <- \"od\" zones <- \"distr\" dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\") dates <- c(start = \"2020-02-16\", end = \"2020-02-19\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\", overwrite = 'update') my_od_data_3 <- spod_connect(od_parquet) my_od_data_3 |> dplyr::distinct(date) |> dplyr::arrange(date)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-to-parquet","dir":"Articles","previous_headings":"","what":"Convert to Parquet","title":"Download and convert mobility datasets","text":"process exactly DuckDB . difference data converted parquet format stored SPANISH_OD_DATA_DIR v1/clean_data/tabular/parquet/ directory v1 data (change save_path argument), subfolders hive-style format like year=2020/month=2/day=14 inside folders single parquet file placed containing data day. advantage format can “update” quickly. example, first downloaded data March April 2020, converted period parquet format, downloaded data May June 2020, run convertion function , convert data May June 2020 add existing parquet files. save time wait March April 2020 converted . Let us convert dates parquet format: now request additional dates overlap already converted data like specifiy argument overwrite = 'update' update existing parquet files new data: , 16 17 Feboruary converted . 
new data, converted (18 19 February) converted, added existing folder structure ofparquet files stored default save_path location, /clean_data/v1/tabular/parquet/od_distritos. Alternatively, can set save location setting save_path argument.","code":"type <- \"od\" zones <- \"distr\" dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\") dates <- c(start = \"2020-02-16\", end = \"2020-02-19\") od_parquet <- spod_convert(type = type, zones = zones, dates = dates, save_format = \"parquet\", overwrite = 'update')"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"load-converted-parquet","dir":"Articles","previous_headings":"","what":"Load the converted Parquet","title":"Download and convert mobility datasets","text":"Working parquet files exactly DuckDB Arrow files. Just like , can use helper function spod_connect() connect parquet files: Mind though, first converted data period 14 17 February 2020, converted data period 16 19 February 2020 save default location, od_parquet contains path data, therefore my_od_data_3 connect data. can check like : analysis, please refer recommended external tutorials vignettes Analysing large datasets section.","code":"my_od_data_3 <- spod_connect(od_parquet) my_od_data_3 |> dplyr::distinct(date) |> dplyr::arrange(date)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"all-dates","dir":"Articles","previous_headings":"","what":"Download all available data","title":"Download and convert mobility datasets","text":"prepare origin-destination data v1 (2020-2021) analysis whole period data availability, please follow steps : Warning Due mobile network outages, data certain dates missing. Kindly keep mind calculating mean monthly weekly flows. Please check original data page currently known missing dates. time writing, following dates missing: 26, 27, 30, 31 October, 1, 2 3 November 2023 4, 18, 19 April 2024. can use spod_get_valid_dates() function get available dates. example origin-destination district level v1 data. can change type “number_of_trips” zones “municipalities” v1 data. v2 data, just use dates starting 2022-01-01 dates_v2 . Use function arguments v2 way shown v1, also consult v2 data codebook, many datasets addition “origin-destination” “number_of_trips”. convert downloaded data DuckDB format lightning fast analysis. can change save_format parquet want save data Parquet format. comparison overview two formats please see Converting data DuckDB/Parquet faster analysis. default, spod_convert_data() save converted data SPANISH_OD_DATA_DIR directory. can change save_path argument spod_convert_data() want save data different location. conversion, 4 GB operating memory enough, speed process depends number processor cores speed disk storage. SSD preferred. default, spod_convert_data() use except one processor cores computer. can adjust max_n_cpu argument spod_convert_data(). can also increase maximum amount memory used max_mem_gb argument, makes difference analysis stage. Finally, analysis_data_storage simply store path converted data. Either path DuckDB database file path folder Parquet files. reference, converting whole v1 origin-destination data DuckDB takes 20 minutes 4 GB memory 3 processor cores. final size DuckDB database 18 GB, Parquet format - 26 GB. raw CSV files gzip archives 20GB. v2 data much larger, origin-destination tables 2022 - mid-2024 taking 150+ GB raw CSV.gz format. 
can pass analysis_data_storage path spod_connect() function, whether DuckDB Parquet. function determine data type automatically give back tbl_duckdb_connection1. set max_mem_gb 16 GB. Generally, , feel free increase , also consult Figure 1 speed testing results Speed section. can try combinations max_mem_gb max_n_cpu arguments needs Compared conversion process, might want increase available memory analysis step. , better. can control max_mem_gb argument. can manipulate my_data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_data advise “disconnect” free memory:","code":"dates_v1 <- spod_get_valid_dates(ver = 1) dates_v2 <- spod_get_valid_dates(ver = 2) type <- \"origin-destination\" zones <- \"districts\" spod_download( type = type, zones = zones, dates = dates_v1, return_local_file_paths = FALSE, # to avoid getting all downloaded file paths printed to console max_download_size_gb = 50 # in Gb, this should be well over the actual download size for v1 data ) save_format <- \"duckdb\" analysis_data_storage <- spod_convert_data( type = type, zones = zones, dates = \"cached_v1\", # to just convert all data that was previously downloaded, no need to specify dates here save_format = save_format, overwrite = TRUE ) my_data <- spod_connect( data_path = analysis_data_storage, max_mem_gb = 16 ) spod_disconnect(my_data)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"download-all-data","dir":"Articles","previous_headings":"","what":"Download all data","title":"Download and convert mobility datasets","text":"example origin-destination district level v1 data. can change type “number_of_trips” zones “municipalities” v1 data. v2 data, just use dates starting 2022-01-01 dates_v2 . Use function arguments v2 way shown v1, also consult v2 data codebook, many datasets addition “origin-destination” “number_of_trips”.","code":"type <- \"origin-destination\" zones <- \"districts\" spod_download( type = type, zones = zones, dates = dates_v1, return_local_file_paths = FALSE, # to avoid getting all downloaded file paths printed to console max_download_size_gb = 50 # in Gb, this should be well over the actual download size for v1 data )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"convert-all-data-into-analysis-ready-format","dir":"Articles","previous_headings":"","what":"Convert all data into analysis ready format","title":"Download and convert mobility datasets","text":"convert downloaded data DuckDB format lightning fast analysis. can change save_format parquet want save data Parquet format. comparison overview two formats please see Converting data DuckDB/Parquet faster analysis. default, spod_convert_data() save converted data SPANISH_OD_DATA_DIR directory. can change save_path argument spod_convert_data() want save data different location. conversion, 4 GB operating memory enough, speed process depends number processor cores speed disk storage. SSD preferred. default, spod_convert_data() use except one processor cores computer. can adjust max_n_cpu argument spod_convert_data(). can also increase maximum amount memory used max_mem_gb argument, makes difference analysis stage. Finally, analysis_data_storage simply store path converted data. 
Either path DuckDB database file path folder Parquet files.","code":"save_format <- \"duckdb\" analysis_data_storage <- spod_convert_data( type = type, zones = zones, dates = \"cached_v1\", # to just convert all data that was previously downloaded, no need to specify dates here save_format = save_format, overwrite = TRUE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"conversion-speed","dir":"Articles","previous_headings":"","what":"Conversion speed","title":"Download and convert mobility datasets","text":"reference, converting whole v1 origin-destination data DuckDB takes 20 minutes 4 GB memory 3 processor cores. final size DuckDB database 18 GB, Parquet format - 26 GB. raw CSV files gzip archives 20GB. v2 data much larger, origin-destination tables 2022 - mid-2024 taking 150+ GB raw CSV.gz format.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/convert.html","id":"connecting-to-and-analysing-the-converted-datasets","dir":"Articles","previous_headings":"","what":"Connecting to and analysing the converted datasets","title":"Download and convert mobility datasets","text":"can pass analysis_data_storage path spod_connect() function, whether DuckDB Parquet. function determine data type automatically give back tbl_duckdb_connection1. set max_mem_gb 16 GB. Generally, , feel free increase , also consult Figure 1 speed testing results Speed section. can try combinations max_mem_gb max_n_cpu arguments needs Compared conversion process, might want increase available memory analysis step. , better. can control max_mem_gb argument. can manipulate my_data using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. analysis, please refer recommended external tutorials vignettes Analysing large datasets section. finishing working my_data advise “disconnect” free memory:","code":"my_data <- spod_connect( data_path = analysis_data_storage, max_mem_gb = 16 ) spod_disconnect(my_data)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"introduction","dir":"Articles","previous_headings":"","what":"Introduction","title":"OD data disaggregation","text":"vignette demonstrates origin-destination (OD) data disaggregation using {odjitter} package. package implementation method described paper “Jittering: Computationally Efficient Method Generating Realistic Route Networks Origin-Destination Data” (Lovelace, Félix, Carlino 2022) adding value OD data disaggregating desire lines. 
can especially useful transport planning purposes high levels geographic resolution required (see also od2net direct network generation OD data).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"data-preparation","dir":"Articles","previous_headings":"","what":"Data preparation","title":"OD data disaggregation","text":"’ll start loading week’s worth origin-destination data city Salamanca, building example README (note: chunks evaluated):","code":"od_db <- spod_get( type = \"od\", zones = \"distritos\", dates = c(start = \"2024-03-01\", end = \"2024-03-07\") ) distritos <- spod_get_zones(\"distritos\", ver = 2) distritos_wgs84 <- distritos |> sf::st_simplify(dTolerance = 200) |> sf::st_transform(4326) od_national_aggregated <- od_db |> group_by(id_origin, id_destination) |> summarise(Trips = sum(n_trips), .groups = \"drop\") |> filter(Trips > 500) |> collect() |> arrange(desc(Trips)) od_national_aggregated od_national_interzonal <- od_national_aggregated |> filter(id_origin != id_destination) salamanca_zones <- zonebuilder::zb_zone(\"Salamanca\") distritos_salamanca <- distritos_wgs84[salamanca_zones, ] ids_salamanca <- distritos_salamanca$id od_salamanca <- od_national_interzonal |> filter(id_origin %in% ids_salamanca) |> filter(id_destination %in% ids_salamanca) |> arrange(Trips) od_salamanca_sf <- od::od_to_sf( od_salamanca, z = distritos_salamanca )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/disaggregation.html","id":"disaggregating-desire-lines","dir":"Articles","previous_headings":"","what":"Disaggregating desire lines","title":"OD data disaggregation","text":"’ll need additional dependencies: ’ll get road network OSM: can use road network disaggregate desire lines: Let’s plot disaggregated desire lines: results show can add value OD data disaggregating desire lines {odjitter} package. can useful understanding spatial distribution trips within zone transport planning. plotted disaggregated desire lines top major road network Salamanca. next step routing help prioritise infrastructure improvements.","code":"remotes::install_github(\"dabreegster/odjitter\", subdir = \"r\") remotes::install_github(\"nptscot/osmactive\") salamanca_boundary <- sf::st_union(distritos_salamanca) osm_full <- osmactive::get_travel_network(salamanca_boundary) osm <- osm_full[salamanca_boundary, ] drive_net <- osmactive::get_driving_network(osm) drive_net_major <- osmactive::get_driving_network_major(osm) cycle_net <- osmactive::get_cycling_network(osm) cycle_net <- osmactive::distance_to_road(cycle_net, drive_net_major) cycle_net <- osmactive::classify_cycle_infrastructure(cycle_net) map_net <- osmactive::plot_osm_tmap(cycle_net) map_net od_jittered <- odjitter::jitter( od_salamanca_sf, zones = distritos_salamanca, subpoints = drive_net, disaggregation_threshold = 1000, disaggregation_key = \"Trips\" ) od_jittered |> arrange(Trips) |> ggplot() + geom_sf(aes(colour = Trips), size = 1) + scale_colour_viridis_c() + geom_sf(data = drive_net_major, colour = \"black\") + theme_void()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Making interactive flow maps","text":"basemap final visualisation need free Mapbox access token. can get one account.mapbox.com/access-tokens/ (need Mapbox account, free). may skip step, case interative flowmap basemap, flows just flow solid colour background. 
got access token, can set MAPBOX_TOKEN environment variable like : Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(MAPBOX_TOKEN = \"YOUR_MAPBOX_ACCESS_TOKEN\") library(spanishoddata) library(flowmapblue) library(tidyverse) library(sf) Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Making interactive flow maps","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"simple-example","dir":"Articles","previous_headings":"","what":"Simple example - plot flows data as it is","title":"Making interactive flow maps","text":"Let us get flows districts typical working day 2021-04-07: also get district zones polygons match flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook). visualise flows, flowmapblue expects two data.frames following format (use package’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips . need coordinates origin destination. can use centroids districts_v1 polygons . Remember, map basemap, need setup Mapbox access token setup section vignette. Create interactive flowmap flowmapblue function. example use darkMode clustering, disable animation. recommend disabling clustering plotting flows hundreds thousands locations, reduce readability map. Video Video demonstrating standard interactive flowmap can play around arguments flowmapblue function. 
example, can turn animation mode: Video Video demonstrating animated interactive flowmap Screenshot demonstrating animated interactive flowmap","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 7 districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429… str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ... od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 
4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61 districts_v1_centroids <- districts_v1 |> st_transform(4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = districts_v1$id) |> rename(lon = X, lat = Y) head(districts_v1_centroids) lon lat id 1 -5.5551053 42.59849 2408910 2 0.3260681 42.17266 22117_AM 3 -3.8136448 37.74344 2305009 4 2.8542636 39.80672 07058_AM 5 -3.8229513 37.77294 2305006 6 -3.8151096 37.86309 2305005 flowmap <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap flowmap_anim <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = TRUE, clustering = TRUE ) flowmap_anim"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get data","title":"Making interactive flow maps","text":"Let us get flows districts typical working day 2021-04-07: also get district zones polygons match flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 
2021 4 7 districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Flows","title":"Making interactive flow maps","text":"Let us get flows districts tipycal working day 2021-04-07:","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_ine_code residence_province_name time_slot distance n_trips trips_total_length_km year month day 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 7 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 7 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 7 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 7 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 7 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 7"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zones","title":"Making interactive flow maps","text":"also get district zones polygons mathch flows. 
use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; 22117; 22164; 22187; 22214; 22102; 22103; 22115; 22117; 22164; 22187; 222… Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 07025; 07034; 07058; 07019; 070… Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-data-for-visualization","dir":"Articles","previous_headings":"","what":"Prepare data for visualization","title":"Making interactive flow maps","text":"visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips .","code":"str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... $ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ... od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"expected-data-format","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Expected data format","title":"Making interactive flow maps","text":"visualise flows, flowmapblue expects two data.frames following format (use packages’s built-data Switzerland illustration): Locations data.frame id, optional name, well lat lon coordinates locations WGS84 (EPSG: 4326) coordinate reference system. Flows data.frame origin, dest, count flows locations, origin dest must match id’s locations data.frame , count number trips .","code":"str(flowmapblue::ch_locations) 'data.frame': 26 obs. of 4 variables: $ id : chr \"ZH\" \"LU\" \"UR\" \"SZ\" ... 
$ name: chr \"Zürich\" \"Luzern\" \"Uri\" \"Schwyz\" ... $ lat : num 47.4 47.1 46.8 47.1 46.9 ... $ lon : num 8.65 8.11 8.63 8.76 8.24 ... str(flowmapblue::ch_flows) str(flowmapblue::ch_flows) 'data.frame': 676 obs. of 3 variables: $ origin: chr \"ZH\" \"ZH\" \"ZH\" \"ZH\" ... $ dest : chr \"ZH\" \"BE\" \"LU\" \"UR\" ... $ count : int 66855 1673 1017 84 1704 70 94 250 1246 173 ..."},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"aggregate-data---count-total-flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Aggregate data - count total flows","title":"Making interactive flow maps","text":"","code":"od_20210407_total <- od_20210407 |> group_by(origin = id_origin, dest = id_destination) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_total) # A tibble: 6 × 3 origin dest count 1 01001_AM 01036 39.8 2 01001_AM 01051 2508. 3 01001_AM 0105903 1644. 4 01001_AM 09363_AM 3.96 5 01001_AM 09907_AM 32.6 6 01001_AM 17033 9.61"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"create-locations-table","dir":"Articles","previous_headings":"","what":"Create locations table with coordinates","title":"Making interactive flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"districts_v1_centroids <- districts_v1 |> st_transform(4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = districts_v1$id) |> rename(lon = X, lat = Y) head(districts_v1_centroids) lon lat id 1 -5.5551053 42.59849 2408910 2 0.3260681 42.17266 22117_AM 3 -3.8136448 37.74344 2305009 4 2.8542636 39.80672 07058_AM 5 -3.8229513 37.77294 2305006 6 -3.8151096 37.86309 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"create-the-plot","dir":"Articles","previous_headings":"","what":"Create the plot","title":"Making interactive flow maps","text":"Remember, map basemap, need setup Mapbox access token setup section vignette. Create interactive flowmap flowmapblue function. example use darkMode clustering, disable animation. recommend disabling clustering plotting flows hundreds thousands locations, reduce readability map. Video Video demonstrating standard interactive flowmap can play around arguments flowmapblue function. example, can turn animation mode: Video Video demonstrating animated interactive flowmap Screenshot demonstrating animated interactive flowmap","code":"flowmap <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap flowmap_anim <- flowmapblue( locations = districts_v1_centroids, flows = od_20210407_total, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = TRUE, clustering = TRUE ) flowmap_anim"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"advanced-example","dir":"Articles","previous_headings":"","what":"Advanced example - time filter","title":"Making interactive flow maps","text":"following simple example, let us now add time filter flows. use flowmapblue function plot flows districts_v1_centroids typical working day 2021-04-07. Just like , aggregate data rename columns. time keep combine date time_slot (corresponds hour day) produce timestamps, flows can interactively filtered time day. 
now using flows hour day, 24 times rows data, simple example. Therefore take longer generate plot resulting visualisation may work slower. create manageable example, let us filter data Barcelona surrounding areas. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. District zone boundaries Barcelona nearby areas Now prepare table coordinates flowmap: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. Video Video demonstrating time filtering flowmap Screenshot demonstrating time filtering flowmap","code":"od_20210407_time <- od_20210407 |> mutate(time = as.POSIXct(paste0(date, \"T\", time_slot, \":00:00\"))) |> group_by(origin = id_origin, dest = id_destination, time) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_time) # A tibble: 6 × 4 origin dest time count 1 08054 0818401 2021-04-07 01:00:00 43.7 2 08054 0818401 2021-04-07 17:00:00 87.1 3 08054 0818402 2021-04-07 16:00:00 62.6 4 08054 0818403 2021-04-07 05:00:00 26.8 5 08054 0818403 2021-04-07 07:00:00 44.9 6 08054 0818403 2021-04-07 02:00:00 7.11 zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906 od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id) flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-data-for-visualization-1","dir":"Articles","previous_headings":"","what":"Prepare data for visualization","title":"Making interactive flow maps","text":"Just like , aggregate data rename columns. time keep combine date time_slot (corresponds hour day) produce timestamps, flows can interactively filtered time day. now using flows hour day, 24 times rows data, simple example. Therefore take longer generate plot resulting visualisation may work slower. create manageable example, let us filter data Barcelona surrounding areas. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. District zone boundaries Barcelona nearby areas Now prepare table coordinates flowmap: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. 
Video Video demonstrating time filtering flowmap Screenshot demonstrating time filtering flowmap","code":"od_20210407_time <- od_20210407 |> mutate(time = as.POSIXct(paste0(date, \"T\", time_slot, \":00:00\"))) |> group_by(origin = id_origin, dest = id_destination, time) |> summarise(count = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() head(od_20210407_time) # A tibble: 6 × 4 origin dest time count 1 08054 0818401 2021-04-07 01:00:00 43.7 2 08054 0818401 2021-04-07 17:00:00 87.1 3 08054 0818402 2021-04-07 16:00:00 62.6 4 08054 0818403 2021-04-07 05:00:00 26.8 5 08054 0818403 2021-04-07 07:00:00 44.9 6 08054 0818403 2021-04-07 02:00:00 7.11 zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906 od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id) flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"filter-the-zones","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Filter the zones","title":"Making interactive flow maps","text":"now using flows hour day, 24 times rows data, simple example. Therefore take longer generate plot resulting visualisation may work slower. create manageable example, let us filter data Barcelona surrounding areas. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. 
District zone boundaries Barcelona nearby areas Now prepare table coordinates flowmap:","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_transform(crs = 4326) |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(id = zones_barcelona_fua$id) |> rename(lon = X, lat = Y) head(zones_barcelona_fua_coords) lon lat id 1 2.154317 41.49969 08180 2 1.968438 41.48274 08054 3 2.106401 41.41265 0801905 4 2.118221 41.38697 0801904 5 2.150536 41.42915 0801907 6 2.152419 41.41014 0801906"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"prepare-the-flows","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Prepare the flows","title":"Making interactive flow maps","text":"Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around .","code":"od_20210407_time_barcelona <- od_20210407_time |> filter(origin %in% zones_barcelona_fua$id & dest %in% zones_barcelona_fua$id)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-interactive.html","id":"visualise-the-flows-for-barcelona-and-surrounding-areas","dir":"Articles","previous_headings":"3 Advanced example - time filter","what":"Visualise the flows for Barcelona and surrounding areas","title":"Making interactive flow maps","text":"Now, can create new plot data. Video Video demonstrating time filtering flowmap Screnshot demonstrating time filtering flowmap","code":"flowmap_time <- flowmapblue( locations = zones_barcelona_fua_coords, flows = od_20210407_time_barcelona, mapboxAccessToken = Sys.getenv(\"MAPBOX_TOKEN\"), darkMode = TRUE, animation = FALSE, clustering = TRUE ) flowmap_time"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Making static flow maps","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"library(spanishoddata) library(flowmapper) library(tidyverse) library(sf) Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Making static flow maps","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. 
following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"simple-example","dir":"Articles","previous_headings":"","what":"Simple example - plot flows data as it is","title":"Making static flow maps","text":"Let us get flows districts typical working day 2021-04-07: also get district zones polygons match flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook). flowmapper package developed visualise origin-destination ‘flow’ data (Mast 2024). package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coorindates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; previous code chunk created od_20210407_total column names expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons . Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). k_node argument add_flowmap function can used reduce business. Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 
2021 4 # ℹ abbreviated names: ¹​residence_province_ine_code, ²​residence_province_name # ℹ 1 more variable: day districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429… od_20210407_total <- od_20210407 |> group_by(o = id_origin, d = id_destination) |> summarise(value = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(o, d, value) head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75 districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005 # create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real 
value labels ) flows_plot_all_districts zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"get-data","dir":"Articles","previous_headings":"","what":"Get data","title":"Making static flow maps","text":"Let us get flows districts typical working day 2021-04-07: also get district zones polygons match flows. use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 
2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 # ℹ abbreviated names: ¹​residence_province_ine_code, ²​residence_province_name # ℹ 1 more variable: day districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Flows","title":"Making static flow maps","text":"Let us get flows districts typical working day 2021-04-07:","code":"od_20210407 <- spod_get(\"od\", zones = \"distr\", dates = \"2021-04-07\") head(od_20210407) # Source: SQL [6 x 14] # Database: DuckDB v1.0.0 [root@Darwin 23.6.0:R 4.4.1/:memory:] date id_origin id_destination activity_origin activity_destination residence_province_in…¹ residence_province_n…² time_slot distance n_trips trips_total_length_km year month 1 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 005-010 10.5 68.9 2021 4 2 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 0 010-050 12.6 127. 2021 4 3 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 1 010-050 12.6 232. 2021 4 4 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 2 005-010 10.8 102. 2021 4 5 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 5 005-010 18.9 156. 2021 4 6 2021-04-07 01001_AM 01001_AM home other 01 Araba/Álava 6 010-050 10.8 119. 2021 4 # ℹ abbreviated names: ¹​residence_province_ine_code, ²​residence_province_name # ℹ 1 more variable: day "},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zones","title":"Making static flow maps","text":"also get district zones polygons match flows. 
use version 1 polygons, selected date 2021, corresponds v1 data (see relevant codebook).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) head(districts_v1) Simple feature collection with 6 features and 6 fields Geometry type: MULTIPOLYGON Dimension: XY Bounding box: xmin: 289502.8 ymin: 4173922 xmax: 1010926 ymax: 4720817 Projected CRS: ETRS89 / UTM zone 30N (N-E) # A tibble: 6 × 7 id census_districts municipalities_mitma municipalities district_names_in_v2 district_ids_in_v2 geom 1 2408910 2408910 24089 24089 León distrito 10 2408910 (((290940.1 4719080, 290… 2 22117_AM 2210201; 2210301; 2211501; 2211701; 2216401; 2218701; 2221401 22117_AM 22102; 22103; 22115; … Graus agregacion de… 22117_AM (((774184.4 4662153, 774… 3 2305009 2305009 23050 23050 Jaén distrito 09 2305009 (((429745 4179977, 42971… 4 07058_AM 0701901; 0702501; 0703401; 0705801; 0705802 07058_AM 07019; 07025; 07034; … Selva agregacion de… 07058_AM (((1000859 4415059, 1000… 5 2305006 2305006 23050 23050 Jaén distrito 06 2305006 (((429795.1 4180957, 429… 6 2305005 2305005 23050 23050 Jaén distrito 05 2305005 (((430022.7 4181101, 429…"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"aggregate-data---count-total-flows","dir":"Articles","previous_headings":"","what":"Aggregate data - count total flows","title":"Making static flow maps","text":"","code":"od_20210407_total <- od_20210407 |> group_by(o = id_origin, d = id_destination) |> summarise(value = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(o, d, value)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"reshape-flows-for-visualization","dir":"Articles","previous_headings":"","what":"Reshape flows for visualization","title":"Making static flow maps","text":"flowmapper package developed visualise origin-destination ‘flow’ data (Mast 2024). package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coorindates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; previous code chunk created od_20210407_total column names expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons .","code":"head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75 districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows-table","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the flows table","title":"Making static flow maps","text":"previous code chunk created od_20210407_total column names expected flowmapper.","code":"head(od_20210407_total) # A tibble: 6 × 3 o d value 1 2408910 2408910 1889. 
2 2408910 24154_AM 11.0 3 2408910 5029703 12.8 4 2408910 24181_AM 22.3 5 2408910 4802004 9.45 6 2408910 4718608 4.75"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-nodes-table-with-coordinates","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the nodes table with coordinates","title":"Making static flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"districts_v1_coords <- districts_v1 |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = districts_v1$id) |> rename(x = X, y = Y) head(districts_v1_coords) x y name 1 290380.7 4719394 2408910 2 774727.2 4674304 22117_AM 3 428315.4 4177662 2305009 4 1001283.0 4422732 07058_AM 5 427524.2 4180942 2305006 6 428302.1 4190937 2305005"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-flows","dir":"Articles","previous_headings":"","what":"Plot the flows","title":"Making static flow maps","text":"Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). k_node argument add_flowmap function can used reduce business. Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. 
Feel free tweak see results change.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, 
\"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-entire-country","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Plot the entire country","title":"Making static flow maps","text":"Now data structure match flowmapper‘s expected data format can plot sample data (plot containing flows ’busy’ world resemble haystack!). k_node argument add_flowmap function can used reduce business.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_districts <- ggplot() + geom_sf(data = districts_v1, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_all_districts <- base_plot_districts |> add_flowmap( od = od_20210407_total, nodes = districts_v1_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 20 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_all_districts <- flows_plot_all_districts + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_all_districts"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"zoom-in-to-the-city-level","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Zoom in to the city level","title":"Making static flow maps","text":"Let us filter flows zones data just specific functional urban area take closer look flows. Let us select districts correspond Barcelona 10 km radius around . 
Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. also prepare nodes add_flowmap function: Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around . Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906 od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id) # create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"filter-the-zones","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Filter the zones","title":"Making static flow maps","text":"Let us select districts correspond Barcelona 10 km radius around . Thanks district_names_in_v2 column zones data, can easily select districts correspond Barcelona apply spatial join select districts around polygons correspond Barcelona. 
also prepare nodes add_flowmap function:","code":"zones_barcelona <- districts_v1 |> filter(grepl(\"Barcelona\", district_names_in_v2, ignore.case = TRUE)) zones_barcelona_fua <- districts_v1[ st_buffer(zones_barcelona, dist = 10000) , ] zones_barcelona_fua_plot <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.3) + theme_minimal() zones_barcelona_fua_plot zones_barcelona_fua_coords <- zones_barcelona_fua |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = zones_barcelona_fua$id) |> rename(x = X, y = Y) head(zones_barcelona_fua_coords) x y name 1 930267.0 4607072 08180 2 914854.0 4604279 08054 3 926837.9 4597166 0801905 4 927995.1 4594372 0801904 5 930418.9 4599218 0801907 6 930702.3 4597116 0801906"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Prepare the flows","title":"Making static flow maps","text":"Now can use zone ids zones_barcelona_fua data select flows correspond Barcelona 10 km radius around .","code":"od_20210407_total_barcelona <- od_20210407_total |> filter(o %in% zones_barcelona_fua$id & d %in% zones_barcelona_fua$id)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"visualise-the-flows-for-barcelona-and-surrounding-areas","dir":"Articles","previous_headings":"2 Simple example - plot flows data as it is","what":"Visualise the flows for Barcelona and surrounding areas","title":"Making static flow maps","text":"Now, can create new plot data. , need k_node argument tweak aggregation nodes flows. Feel free tweak see results change.","code":"# create base ggplot with boundaries removing various visual clutter base_plot_barcelona <- ggplot() + geom_sf(data = zones_barcelona_fua, fill=NA, col = \"grey60\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent', colour = NA), legend.box.background = element_rect(fill='transparent', colour = NA), legend.key = element_blank(), # Remove legend key border legend.title = element_text(size = 12), # Adjust title size legend.text = element_text(size = 10), # Adjust text size legend.key.height = unit(1, \"cm\"), # Increase the height of legend keys legend.margin = margin(t = 0, r = 0, b = 0, l = 0, unit = \"cm\") # Remove margin ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot_barcelona <- base_plot_barcelona |> add_flowmap( od = od_20210407_total_barcelona, nodes = zones_barcelona_fua_coords, node_radius_factor = 1, edge_width_factor = 0.6, arrow_point_angle = 45, node_buffer_factor = 1.5, outline_col = \"grey80\", add_legend = \"bottom\", legend_col = \"gray20\", k_node = 30 # play around with this parameter to aggregate nodes and flows ) # customise colours for the fill flows_plot_barcelona <- flows_plot_barcelona + scale_fill_gradient( low = \"#FABB29\", high = \"#AB061F\", labels = scales::comma_format() # Real value labels ) 
flows_plot_barcelona"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"advanced-example","dir":"Articles","previous_headings":"","what":"Advanced example - aggregate flows for {spanishoddata} logo","title":"Making static flow maps","text":"advanced example need two additional packages: mapSpain (Hernangómez 2024) hexSticker (R-hexSticker?). Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook). Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata. Let us count total number trips made locations selected day 2022-04-06: Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names. can now add ids total flows districts id pairs calculate total flows autonomous communities: going use flowmapper (Mast 2024) package plot flows. package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coorindates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; data right now flows_by_ca already correct format expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons . Now data structure match flowmapper’s expected data format: image may look bit bleak, put sticker, look great. make sticker using hexSticker (Yu 2020) package.","code":"# two new packages library(mapSpain) library(hexSticker) # load these too, if you have not already library(spanishoddata) library(flowmapper) library(tidyverse) library(sf) od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2) spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE) flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. 
# ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria # create base ggplot with boundaries removing any extra elements base_plot <- ggplot() + geom_sf(data = spain_for_vis, fill=NA, col = \"grey30\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent'), legend.box.background = element_rect(fill='transparent') ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot <- base_plot|> add_flowmap( od = flows_by_ca, nodes = spain_for_vis_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", k_node = 10 # play around with this parameter to aggregate nodes and flows ) # customise colours and remove legend, as we need a clean image for the logo flows_plot <- flows_plot + guides(fill=\"none\") + scale_fill_gradient(low=\"#FABB29\", high = \"#AB061F\") flows_plot sticker(flows_plot, # package name package= \"spanishoddata\", p_size=4, p_y = 1.6, p_color = \"gray25\", p_family=\"Roboto\", # ggplot image size and position s_x=1.02, s_y=1.19, s_width=2.6, s_height=2.72, # white hex h_fill=\"#ffffff\", 
h_color=\"grey\", h_size=1.3, # url url = \"github.com/rOpenSpain/spanishoddata\", u_color= \"gray25\", u_family = \"Roboto\", u_size = 1.2, # save output name and resolution filename=\"./man/figures/logo.png\", dpi=300 # )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"get-data-1","dir":"Articles","previous_headings":"","what":"Get data","title":"Making static flow maps","text":"Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook). Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata.","code":"od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2) spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Flows","title":"Making static flow maps","text":"Just like simple example , need flows visualise. Let us get origin-destination flows districts typical working day 2022-04-06: Also get spatial data zones. using version 2 zones, data got 2022 onwards, corresponds v2 data (see relevant codebook).","code":"od <- spod_get(\"od\", zones = \"distr\", dates = \"2022-04-06\") districts <- spod_get_zones(\"distr\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"map-of-spain","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Map of Spain","title":"Making static flow maps","text":"Ultimately, like plot flows map Spain, aggregate flows visualisation avoid visual clutter. therefore also need nice map Spain, get using mapSpain (Hernangómez 2024) package: getting two sets boundaries. First one Canary Islands moved closer mainland Spain, nicer visualisation. Second one original location islands, can spatially join zones districts data got spanishoddata.","code":"spain_for_vis <- esp_get_ccaa() spain_for_join <- esp_get_ccaa(moveCAN = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"flows-aggregation","dir":"Articles","previous_headings":"","what":"Flows aggregation","title":"Making static flow maps","text":"Let us count total number trips made locations selected day 2022-04-06: Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names. 
can now add ids total flows districts id pairs calculate total flows autonomous communities:","code":"flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. # ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"aggregate-raw-origin-destination-data-by-original-ids","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Aggregate raw origin destination data by original ids","title":"Making static flow maps","text":"Let us count total number trips made locations selected day 2022-04-06:","code":"flows_by_district <- od |> group_by(id_origin, id_destination) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() |> arrange(desc(id_origin), id_destination, n_trips) flows_by_district # A tibble: 402,711 × 3 id_origin id_destination n_trips 1 31260_AM 01017_AM 7.15 2 31260_AM 01043 13.7 3 31260_AM 0105902 16.1 4 31260_AM 2512005 12.2 5 31260_AM 26002_AM 8 6 31260_AM 26026_AM 4 7 31260_AM 26036 38.3 8 31260_AM 26061_AM 10.6 9 31260_AM 26084 5.5 10 31260_AM 2608902 109. # ℹ 402,701 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"match-ids-of-districts-with-autonomous-communities","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Match ids of districts with autonomous communities","title":"Making static flow maps","text":"Now need spatial join districts spain_for_join find districts fall within autonomous community. use spain_for_join. 
used spain_for_vis, districts Canary Islands match boundaries islands. way get table districts ids corresponding autonomous community names.","code":"district_centroids <- districts |> st_centroid() |> st_transform(crs = st_crs(spain_for_join)) ca_distr <- district_centroids |> st_join(spain_for_join) |> st_drop_geometry() |> filter(!is.na(ccaa.shortname.en)) |> select(id, ca_name = ccaa.shortname.en) ca_distr # A tibble: 3,784 × 2 id ca_name 1 01001 Basque Country 2 01002 Basque Country 3 01004_AM Basque Country 4 01009_AM Basque Country 5 01010 Basque Country 6 01017_AM Basque Country 7 01028_AM Basque Country 8 01036 Basque Country 9 01043 Basque Country 10 01047_AM Basque Country # ℹ 3,774 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"count-flows-between-pairs-of-autonomous-communities","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Count flows between pairs of autonomous communities","title":"Making static flow maps","text":"can now add ids total flows districts id pairs calculate total flows autonomous communities:","code":"flows_by_ca <- flows_by_district |> left_join(ca_distr |> rename(id_orig = ca_name), by = c(\"id_origin\" = \"id\") ) |> left_join(ca_distr |> rename(id_dest = ca_name), by = c(\"id_destination\" = \"id\") ) |> group_by(id_orig, id_dest) |> summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> rename(o = id_orig, d = id_dest, value = n_trips) flows_by_ca # A tibble: 358 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 7 Andalusia Cantabria 153. 8 Andalusia Castile and León 3114. 9 Andalusia Castile-La Mancha 13655. 10 Andalusia Catalonia 5453. # ℹ 348 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"reshape-flows-for-visualization-1","dir":"Articles","previous_headings":"","what":"Reshape flows for visualization","title":"Making static flow maps","text":"going use flowmapper (Mast 2024) package plot flows. package expects data following format: data.frame origin-destination pairs flow counts following columns: o: unique id origin node d: unique id destination node value: intensity flow origin destination Another data.frame node ids names coorindates. coordinate reference system match whichever data planning use plot. name: unique id name node, must match o d flows data.frame ; x: x coordinate node; y: y coordinate node; data right now flows_by_ca already correct format expected flowmapper. need coordinates origin destination. can use centroids districts_v1 polygons .","code":"head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899. 
spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-flows-table-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Prepare the flows table","title":"Making static flow maps","text":"data right now flows_by_ca already correct format expected flowmapper.","code":"head(flows_by_ca) # A tibble: 6 × 3 o d value 1 Andalusia Andalusia 23681858. 2 Andalusia Aragon 643. 3 Andalusia Asturias 373. 4 Andalusia Balearic Islands 931. 5 Andalusia Basque Country 769. 6 Andalusia Canary Islands 1899."},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"prepare-the-nodes-table-with-coordinates-1","dir":"Articles","previous_headings":"3 Advanced example - aggregate flows for {spanishoddata} logo","what":"Prepare the nodes table with coordinates","title":"Making static flow maps","text":"need coordinates origin destination. can use centroids districts_v1 polygons .","code":"spain_for_vis_coords <- spain_for_vis |> st_centroid() |> st_coordinates() |> as.data.frame() |> mutate(name = spain_for_vis$ccaa.shortname.en) |> rename(x = X, y = Y) head(spain_for_vis_coords) x y name 1 -4.5777846 37.46782 Andalusia 2 -0.6648791 41.51335 Aragon 3 -5.9936312 43.29377 Asturias 4 2.9065933 39.57481 Balearic Islands 5 -10.7324736 35.36091 Canary Islands 6 -4.0300438 43.19772 Cantabria"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"plot-the-flows-1","dir":"Articles","previous_headings":"","what":"Plot the flows","title":"Making static flow maps","text":"Now data structure match flowmapper’s expected data format: image may look bit bleak, put sticker, look great.","code":"# create base ggplot with boundaries removing any extra elements base_plot <- ggplot() + geom_sf(data = spain_for_vis, fill=NA, col = \"grey30\", linewidth = 0.05)+ theme_classic(base_size = 20) + labs(title = \"\", subtitle = \"\", fill = \"\", caption = \"\") + theme( axis.line = element_blank(), axis.text = element_blank(), axis.ticks = element_blank(), axis.title = element_blank(), panel.background = element_rect(fill='transparent'), plot.background = element_rect(fill='transparent', color=NA), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill='transparent'), legend.box.background = element_rect(fill='transparent') ) # flows_by_ca_twoway_coords |> arrange(desc(flow_ab)) # add the flows flows_plot <- base_plot|> add_flowmap( od = flows_by_ca, nodes = spain_for_vis_coords, node_radius_factor = 1, edge_width_factor = 1, arrow_point_angle = 35, node_buffer_factor = 1.5, outline_col = \"grey80\", k_node = 10 # play around with this parameter to aggregate nodes and flows ) # customise colours and remove legend, as we need a clean image for the logo flows_plot <- flows_plot + guides(fill=\"none\") + scale_fill_gradient(low=\"#FABB29\", high = \"#AB061F\") 
flows_plot"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/flowmaps-static.html","id":"make-the-sticker","dir":"Articles","previous_headings":"","what":"Make the sticker","title":"Making static flow maps","text":"make sticker using hexSticker (Yu 2020) package.","code":"sticker(flows_plot, # package name package= \"spanishoddata\", p_size=4, p_y = 1.6, p_color = \"gray25\", p_family=\"Roboto\", # ggplot image size and position s_x=1.02, s_y=1.19, s_width=2.6, s_height=2.72, # white hex h_fill=\"#ffffff\", h_color=\"grey\", h_size=1.3, # url url = \"github.com/rOpenSpain/spanishoddata\", u_color= \"gray25\", u_family = \"Roboto\", u_size = 1.2, # save output name and resolution filename=\"./man/figures/logo.png\", dpi=300 # )"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"install-package","dir":"Articles","previous_headings":"","what":"Install the package","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"package yet available CRAN. can install latest version package rOpenSpain R universe: Alternative way install package GitHub: Developers load package locally, clone navigate root package terminal, e.g. following: run following command R console: Load follows: Using instructions , set data folder package download files . may need 30 GB download data another 30 GB like convert downloaded data analysis ready format (DuckDB database file, folder parquet files). can find info conversion Download convert OD datasets vignette.","code":"install.packages(\"spanishoddata\", repos = c(\"https://ropenspain.r-universe.dev\", \"https://cloud.r-project.org\")) if (!require(\"remotes\")) install.packages(\"remotes\") remotes::install_github(\"rOpenSpain/spanishoddata\", force = TRUE, dependencies = TRUE) gh repo clone rOpenSpain/spanishoddata code spanishoddata # with rstudio: rstudio spanishoddata/spanishoddata.Rproj devtools::load_all() library(spanishoddata)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"overall-approach-to-accessing-the-data","dir":"Articles","previous_headings":"","what":"Overall approach to accessing the data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"want analyse data days, can use spod_get() function. download raw data CSV format let analyse -memory. cover steps page. 
need longer periods (several months years), use spod_convert() spod_connect() functions, convert data special format much faster analysis, see Download convert OD datasets vignette. spod_get_zones() give spatial data zones can matched origin-destination flows functions using zones ’id’s. Please see simple example , also consult vignettes detailed data description instructions package vignettes spod_codebook(ver = 1) spod_codebook(ver = 2), simply visit package website https://ropenspain.github.io/spanishoddata/. Figure 1 presents overall approach accessing data spanishoddata package. Figure 1: overview use pacakge functions get data","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"spatial-data-with-zoning-boundaries","dir":"Articles","previous_headings":"","what":"1. Spatial data with zoning boundaries","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"boundary data provided two geographic levels: Distrtics Municipalities. ’s important note always align official Spanish census districts municipalities. comply data protection regulations, certain aggregations made districts municipalities”. Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 2850 ‘districts’ compared 10494 official census districts based. access : districts_v1 object class sf consisting polygons. Data structure: Municipalities made official municipalities certain size; however, also aggregated cases lower population density. result, 2,205 municipalities compared 8,125 official municipalities based. access : resulting municipalities_v1 object type sf consisting polygons. Data structure: spatial data get via spanishoddata package downloaded directly source, geometries polygons automatically fixed invalid geometries. zone identifiers stored id column. Apart id column, original zones files metadata. However, seen , using spanishoddata package get many additional columns provide semantic connection official statistical zones used Spanish government zones can get v2 data (2022 onward).","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1) municipalities_v1 <- spod_get_zones(\"muni\", ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"districts","dir":"Articles","previous_headings":"","what":"1.1 Districts","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 2850 ‘districts’ compared 10494 official census districts based. access : districts_v1 object class sf consisting polygons. Data structure:","code":"districts_v1 <- spod_get_zones(\"dist\", ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"municipalities","dir":"Articles","previous_headings":"","what":"1.2 Municipalities","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"Municipalities made official municipalities certain size; however, also aggregated cases lower population density. 
result, 2,205 municipalities compared 8,125 official municipalities based. access : resulting municipalities_v1 object type sf consisting polygons. Data structure: spatial data get via spanishoddata package downloaded directly source, geometries polygons automatically fixed invalid geometries. zone identifiers stored id column. Apart id column, original zones files metadata. However, seen , using spanishoddata package get many additional columns provide semantic connection official statistical zones used Spanish government zones can get v2 data (2022 onward).","code":"municipalities_v1 <- spod_get_zones(\"muni\", ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"mobility-data","dir":"Articles","previous_headings":"","what":"2. Mobility data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"mobility data referenced via id_origin, id_destination, location identifiers (mostly labelled id) two sets zones described . origin-destination data contain number trips districts municipalities Spain every hour every day 2020-02-14 2021-05-09. flow also attributes trip purpose (composed type activity (home/work_or_study/) origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. Figure 2 shows example total flows province Barcelona Feb 14th, 2020. Figure 2: Origin destination flows Barcelona 2020-02-14 variables can find district municipality level origin-destination data: original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection1. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , calculated mean hourly flows 4 days requested period. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset full 18 month regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. 
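As a pointer to that conversion workflow, here is a minimal sketch of the two-step `spod_convert()` / `spod_connect()` pattern referenced above; the argument values are illustrative and the details are in the conversion vignette:

```r
# convert a longer period once into an analysis-ready DuckDB database;
# spod_convert() returns the path to the resulting database file
dates <- c(start = "2020-02-14", end = "2020-03-14")
db_path <- spod_convert(type = "od", zones = "distr", dates = dates)

# connect to the converted data and keep using dplyr verbs lazily, as with spod_get()
od_converted <- spod_connect(db_path)
```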
Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance. “number trips” data shows number individuals district municipality made trips categorised number trips. original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access use spod_get() type set “number_of_trips”, just “nt”. can also set dates maximum possible date range 2020-02-14 2021-05-09 get data, data relatively small (200 Mb). data small, can actually load completely memory:","code":"dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_hourly_trips_over_the_4_days <- od_dist |> group_by(time_slot) |> summarise( mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_hourly_trips_over_the_4_days # A tibble: 24 × 2 time_slot mean_hourly_trips 1 18 21.4 2 10 19.3 3 2 14.8 4 15 19.8 5 11 19.9 6 16 19.6 7 22 20.9 8 0 18.6 9 13 21.1 10 19 22.5 # ℹ 14 more rows # ℹ Use `print(n = ...)` to see more rows dates <- c(start = \"2020-02-14\", end = \"2021-05-09\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"od-data","dir":"Articles","previous_headings":"","what":"2.1. Origin-destination data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"origin-destination data contain number trips districts municipalities Spain every hour every day 2020-02-14 2021-05-09. flow also attributes trip purpose (composed type activity (home/work_or_study/) origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. Figure 2 shows example total flows province Barcelona Feb 14th, 2020. Figure 2: Origin destination flows Barcelona 2020-02-14 variables can find district municipality level origin-destination data: original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. 
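To make the note above about filtering concrete, a minimal sketch, assuming the `od_dist` connection created with `spod_get()` in the examples above (the date column is shown here as `date`; the exact name can differ between data versions, e.g. `full_date`):

```r
library(dplyr)

# faster while working on the raw CSV files: DuckDB can skip whole files
# using the hive-style folder names year=2020/month=2/day=14
od_one_day_fast <- od_dist |>
  filter(year == 2020, month == 2, day == 14)

# slower: the date value has to be read from inside every CSV file
od_one_day_slow <- od_dist |>
  filter(date == "2020-02-14")
```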
Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection1. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , calculated mean hourly flows 4 days requested period. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset full 18 month regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance.","code":"dates <- c(start = \"2020-02-14\", end = \"2020-02-17\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_hourly_trips_over_the_4_days <- od_dist |> group_by(time_slot) |> summarise( mean_hourly_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_hourly_trips_over_the_4_days # A tibble: 24 × 2 time_slot mean_hourly_trips 1 18 21.4 2 10 19.3 3 2 14.8 4 15 19.8 5 11 19.9 6 16 19.6 7 22 20.9 8 0 18.6 9 13 21.1 10 19 22.5 # ℹ 14 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"nt-data","dir":"Articles","previous_headings":"","what":"2.2. Number of trips data","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"“number trips” data shows number individuals district municipality made trips categorised number trips. original data stored maestra-2 folder suffixes distritos (district zoning) municipios (municipality zoning). use district level data several data issues municipality data documented , also distric level data contains columns useful origin-destination flow characteristics. result, get district level data municipality level data columns. 
Municipality level data simply re-aggregation district level data using official relations file district identifiers mapped municipality identifiers (orginal file relaciones_distrito_mitma.csv). Getting data access use spod_get() type set “number_of_trips”, just “nt”. can also set dates maximum possible date range 2020-02-14 2021-05-09 get data, data relatively small (200 Mb). data small, can actually load completely memory:","code":"dates <- c(start = \"2020-02-14\", end = \"2021-05-09\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v1-2020-2021-mitma-data-codebook.html","id":"advanced-use","dir":"Articles","previous_headings":"","what":"Advanced use","title":"Codebook and cookbook for v1 (2020-2021) Spanish mobility data","text":"advanced use, especially analysing longer periods (months even years), please see Download convert mobility datasets.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"install-package","dir":"Articles","previous_headings":"","what":"Install the package","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"package yet available CRAN. can install latest version package rOpenSpain R universe: Alternative way install package GitHub: Developers load package locally, clone navigate root package terminal, e.g. following: run following command R console: Load follows: Using instructions , set data folder package download files . may need 400 GB download data another 400 GB like convert downloaded data analysis ready format (DuckDB database file, folder parquet files). can find info conversion Download convert OD datasets vignette.","code":"install.packages(\"spanishoddata\", repos = c(\"https://ropenspain.r-universe.dev\", \"https://cloud.r-project.org\")) if (!require(\"remotes\")) install.packages(\"remotes\") remotes::install_github(\"rOpenSpain/spanishoddata\", force = TRUE, dependencies = TRUE) gh repo clone rOpenSpain/spanishoddata code spanishoddata # with rstudio: rstudio spanishoddata/spanishoddata.Rproj devtools::load_all() library(spanishoddata)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"set-data-folder","dir":"Articles","previous_headings":"","what":"Set the data directory","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. 
Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"overall-approach-to-accessing-the-data","dir":"Articles","previous_headings":"","what":"Overall approach to accessing the data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"want analyse data days, can use spod_get() function. download raw data CSV format let analyse -memory. cover steps page. need longer periods (several months years), use spod_convert() spod_connect() functions, convert data special format much faster analysis, see Download convert OD datasets vignette. spod_get_zones() give spatial data zones can matched origin-destination flows functions using zones ’id’s. Please see simple example , also consult vignettes detailed data description instructions package vignettes spod_codebook(ver = 1) spod_codebook(ver = 2), simply visit package website https://ropenspain.github.io/spanishoddata/. Figure 1 presents overall approach accessing data spanishoddata package. Figure 1: overview use pacakge functions get data","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"spatial-data-with-zoning-boundaries","dir":"Articles","previous_headings":"","what":"1. Spatial data with zoning boundaries","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"boundary data provided three geographic levels: Distrtics, Municipalities, Large Urban Areas. ’s important note always align official Spanish census districts municipalities. comply data protection regulations, certain aggregations made districts municipalities”. Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 3792 ‘districts’ compared 10494 official census districts based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 3909 zones Districts dataset. districts_v2 object class sf consisting polygons. Data structure: Municipalities made official municipalities certain size; however, also aggregated cases lower population density. result, 2618 municipalities compared 8,125 official municipalities based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2735 zones Districts dataset. resulting municipalities_v2 object type sf consisting polygons. Data structure: Large Urban Areas (LUAs) essentially spatial units Municipalities, aggregated. Therefore, 2086 locations LUAs dataset. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2203 zones LUAs dataset. resulting luas_v2 object type sf consisting polygons. 
Data structure:","code":"districts_v2 <- spod_get_zones(\"dist\", ver = 2) municipalities_v2 <- spod_get_zones(\"muni\", ver = 2) luas_v2 <- spod_get_zones(\"lua\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"districts","dir":"Articles","previous_headings":"","what":"1.1 Districts","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Districts correspond official census districts cities; however, lower population density, grouped together. rural areas, one district often equal municipality, municipalities low population combined larger units preserve privacy individuals dataset. Therefore, 3792 ‘districts’ compared 10494 official census districts based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 3909 zones Districts dataset. districts_v2 object class sf consisting polygons. Data structure:","code":"districts_v2 <- spod_get_zones(\"dist\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"municipalities","dir":"Articles","previous_headings":"","what":"1.2 Municipalities","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Municipalities made official municipalities certain size; however, also aggregated cases lower population density. result, 2618 municipalities compared 8,125 official municipalities based. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2735 zones Districts dataset. resulting municipalities_v2 object type sf consisting polygons. Data structure:","code":"municipalities_v2 <- spod_get_zones(\"muni\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"luas","dir":"Articles","previous_headings":"","what":"1.3 LUAs (Large Urban Areas)","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"Large Urban Areas (LUAs) essentially spatial units Municipalities, aggregated. Therefore, 2086 locations LUAs dataset. also NUTS3 statistical regions covering France (94 units) Portugal (23 units). Therefore total 2203 zones LUAs dataset. resulting luas_v2 object type sf consisting polygons. Data structure:","code":"luas_v2 <- spod_get_zones(\"lua\", ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"mobility-data","dir":"Articles","previous_headings":"","what":"2. Mobility data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"mobility data referenced via id_origin, id_destination, location identifiers (mostly labelled id) two sets zones described . origin-destination data contain number trips districts, municipalities, large urban areas (LUAs) Spain every hour every day 2022-02-01 whichever currently available latest data (2024-06-30 time writing). flow also attributes trip purpose (composed type activity (home/work_or_study/frequent_activity/infrequent_activity) origin destination, also age, sex, income group individuals traveling origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. variables can find district, municipality large urban area level data: Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. 
Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection5. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , becaus data hourly intervals within day, first summed number trips day age, sex, income groups. grouped data dropping day variable calculated mean number trips per day age, sex, income groups. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset multiple years worth data regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance. location, “number trips” data provides number individuals spent night , breakdown number trips made, age, sex. Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory: dataset provides number people spend night location, also identifying place residence census district level according INE encoding. variables can find district, municipality large urban area level data: Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory:","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_trips_by_ses_over_the_4_days <- od_dist |> group_by(date, age, sex, income) |> summarise( n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> group_by(age, sex, income) |> summarise( daily_mean_n_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_trips_by_ses_over_the_4_days # A tibble: 39 × 4 age sex income daily_mean_n_trips 1 NA NA <10 7002485. 2 NA NA 10-15 16551405. 3 NA NA >15 2651481. 4 0-25 NA <10 539060. 5 0-25 NA 10-15 1950892. 6 0-25 NA >15 401557. 7 0-25 female <10 1484989. 8 0-25 female 10-15 5357785. 9 0-25 female >15 1764454. 10 0-25 male <10 1558461. 
# ℹ 29 more rows # ℹ Use `print(n = ...)` to see more rows dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect() dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") os_dist <- spod_get(type = \"overnight_stays\", zones = \"dist\", dates = dates) os_dist_tbl <- os_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"od-data","dir":"Articles","previous_headings":"","what":"2.1. Origin-destination data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"origin-destination data contain number trips districts, municipalities, large urban areas (LUAs) Spain every hour every day 2022-02-01 whichever currently available latest data (2024-06-30 time writing). flow also attributes trip purpose (composed type activity (home/work_or_study/frequent_activity/infrequent_activity) origin destination, also age, sex, income group individuals traveling origin destination), province residence individuals making trip, distance covered making trip. See detailed attributes table. variables can find district, municipality large urban area level data: Getting data access data, use spod_get() function. example use short interval dates: data specified dates automatically downloaded cached SPANISH_OD_DATA_DIR directory. Existing files re-downloaded. Working data resulting objects od_dist od_muni class tbl_duckdb_connection5. Basically, can treat regular data.frames tibbles. One important difference data actually loaded memory, requested dates, e.g. whole month year, data likely fit computer’s memory. tbl_duckdb_connection mapped downloaded CSV files cached disk data loaded small chunks needed time computation. can manipulate od_dist od_muni using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble like : example , becaus data hourly intervals within day, first summed number trips day age, sex, income groups. grouped data dropping day variable calculated mean number trips per day age, sex, income groups. full data 4 days probably never loaded memory . Rather available memory computer used maximum limit make calculation happen, without ever exceeding available memory limit. opearation 100 even days, work way possible even limited memory. done transparantly user help DuckDB (specifically, {duckdb} R package Mühleisen Raasveldt (2024)). summary operation provided example can done entire dataset multiple years worth data regular laptop 8-16 GB memory. take bit time complete, done. speed things , please also see vignette converting data formats increase analsysis performance. Note long use table connection object created spod_get() function, much quicker filter dates year, month day variables, rather date variable. data day separate CSV file located folders look like year=2020/month=2/day=14. filtering date field, R scan CSV files comparing specified date stored inside CSV file. However, query year, month day variables, R needs check path CSV file, much quicker. caveat relevant long use spod_get() . 
convert (see relevant vignette) downloaded data format optimized quick analysis, can use whichever field want, affect performance.","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") od_dist <- spod_get(type = \"od\", zones = \"dist\", dates = dates) od_muni <- spod_get(type = \"od\", zones = \"muni\", dates = dates) library(dplyr) od_mean_trips_by_ses_over_the_4_days <- od_dist |> group_by(date, age, sex, income) |> summarise( n_trips = sum(n_trips, na.rm = TRUE), .groups = \"drop\") |> group_by(age, sex, income) |> summarise( daily_mean_n_trips = mean(n_trips, na.rm = TRUE), .groups = \"drop\") |> collect() od_mean_trips_by_ses_over_the_4_days # A tibble: 39 × 4 age sex income daily_mean_n_trips 1 NA NA <10 7002485. 2 NA NA 10-15 16551405. 3 NA NA >15 2651481. 4 0-25 NA <10 539060. 5 0-25 NA 10-15 1950892. 6 0-25 NA >15 401557. 7 0-25 female <10 1484989. 8 0-25 female 10-15 5357785. 9 0-25 female >15 1764454. 10 0-25 male <10 1558461. # ℹ 29 more rows # ℹ Use `print(n = ...)` to see more rows"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"nt-data","dir":"Articles","previous_headings":"","what":"2.2. Number of trips data","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"location, “number trips” data provides number individuals spent night , breakdown number trips made, age, sex. Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory:","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") nt_dist <- spod_get(type = \"number_of_trips\", zones = \"dist\", dates = dates) nt_dist_tbl <- nt_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/articles/v2-2022-onwards-mitma-data-codebook.html","id":"os-data","dir":"Articles","previous_headings":"","what":"2.3. Overnight stays","title":"Codebook and cookbook for v2 (2022 onwards) Spanish mobility data","text":"dataset provides number people spend night location, also identifying place residence census district level according INE encoding. variables can find district, municipality large urban area level data: Getting data access use spod_get() type set “number_of_trips”, just “nt”. data small, can actually load completely memory:","code":"dates <- c(start = \"2022-01-01\", end = \"2022-01-04\") os_dist <- spod_get(type = \"overnight_stays\", zones = \"dist\", dates = dates) os_dist_tbl <- os_dist |> dplyr::collect()"},{"path":"https://rOpenSpain.github.io/spanishoddata/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Egor Kotov. Author, maintainer. Robin Lovelace. Author. Eugeni Vidal-Tortosa. Contributor.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Kotov E, Lovelace R, Vidal-Tortosa E (2024). spanishoddata. 
doi:10.32614/CRAN.package.spanishoddata, https://github.com/rOpenSpain/spanishoddata.","code":"@Manual{spanishoddata, title = {spanishoddata}, author = {Egor Kotov and Robin Lovelace and Eugeni Vidal-Tortosa}, year = {2024}, url = {https://github.com/rOpenSpain/spanishoddata}, doi = {10.32614/CRAN.package.spanishoddata}, }"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"spanishoddata-get-spanish-origin-destination-data-","dir":"","previous_headings":"","what":"Get Spanish Origin-Destination Data","title":"Get Spanish Origin-Destination Data","text":"spanishoddata R package provides functions downloading formatting Spanish open mobility data released Ministry Transport Sustainable mobility Spain (Secretaría de Estado de Transportes y Movilidad Sostenible 2024). supports two versions Spanish mobility data consists origin-destination matrices additional data sets. first version covers data 2020 2021, including period COVID-19 pandemic. second version contains data January 2022 onwards updated monthly fifteenth month. versions data primarily consist mobile phone positioning data, include matrices overnight stays, individual movements, trips Spanish residents different geographical levels. See package website vignettes v1 v2 data details. spanishoddata designed save people time providing data analysis-ready formats. Automating process downloading, cleaning, importing data can also reduce risk errors laborious process data preparation. also reduces computational resources using computationally efficient packages behind scenes. effectively work multiple data files, ’s recommended set data directory package can search data download files already present.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"examples-of-available-data","dir":"","previous_headings":"","what":"Examples of available data","title":"Get Spanish Origin-Destination Data","text":"Figure 1: Example data available package: daily flows Barcelona create static maps like see vignette . Figure 2: Example data available package: interactive daily flows Spain Figure 3: Example data available package: interactive daily flows Barcelona time filter create interactive maps see vignette .","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"install-the-package","dir":"","previous_headings":"","what":"Install the package","title":"Get Spanish Origin-Destination Data","text":"package yet available CRAN. can install latest version package rOpenSpain R universe: Alternative way install package GitHub: Developers load package locally, clone navigate root package terminal, e.g. following: run following command R console: Load follows:","code":"install.packages(\"spanishoddata\", repos = c(\"https://ropenspain.r-universe.dev\", \"https://cloud.r-project.org\")) if (!require(\"remotes\")) install.packages(\"remotes\") remotes::install_github(\"rOpenSpain/spanishoddata\", force = TRUE, dependencies = TRUE) gh repo clone rOpenSpain/spanishoddata code spanishoddata # with rstudio: rstudio spanishoddata/spanishoddata.Rproj devtools::load_all() library(spanishoddata)"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"set-the-data-directory","dir":"","previous_headings":"","what":"Set the data directory","title":"Get Spanish Origin-Destination Data","text":"Choose spanishoddata download (convert) data setting SPANISH_OD_DATA_DIR environment variable following command: package create directory exist first run function downloads data. 
permanently set directory projects, can specify data directory globally setting SPANISH_OD_DATA_DIR environment variable, e.g. following command: can also set data directory locally, just current project. Set ‘envar’ working directory editing .Renviron file root project:","code":"Sys.setenv(SPANISH_OD_DATA_DIR = \"~/spanish_od_data\") usethis::edit_r_environ() # Then set the data directory globally, by typing this line in the file: SPANISH_OD_DATA_DIR = \"~/spanish_od_data\" file.edit(\".Renviron\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"overall-approach-to-accessing-the-data","dir":"","previous_headings":"","what":"Overall approach to accessing the data","title":"Get Spanish Origin-Destination Data","text":"want analyse data days, can use spod_get() function. download raw data CSV format let analyse -memory. cover steps page. need longer periods (several months years), use spod_convert() spod_connect() functions, convert data special format much faster analysis, see Download convert OD datasets vignette. spod_get_zones() give spatial data zones can matched origin-destination flows functions using zones ’id’s. Please see simple example , also consult vignettes detailed data description instructions package vignettes spod_codebook(ver = 1) spod_codebook(ver = 2), simply visit package website https://ropenspain.github.io/spanishoddata/. Figure 4 presents overall approach accessing data spanishoddata package. Figure 4: overview use pacakge functions get data","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"showcase","dir":"","previous_headings":"","what":"Showcase","title":"Get Spanish Origin-Destination Data","text":"run code README use following setup: Get metadata datasets follows (using version 2 data covering years 2022 onwards):","code":"library(tidyverse) theme_set(theme_minimal()) sf::sf_use_s2(FALSE) metadata <- spod_available_data(ver = 2) # for version 2 of the data metadata # A tibble: 9,442 × 6 target_url pub_ts file_extension data_ym data_ymd 1 https://movilidad-o… 2024-07-30 10:54:08 gz NA 2022-10-23 2 https://movilidad-o… 2024-07-30 10:51:07 gz NA 2022-10-22 3 https://movilidad-o… 2024-07-30 10:47:52 gz NA 2022-10-20 4 https://movilidad-o… 2024-07-30 10:14:55 gz NA 2022-10-18 5 https://movilidad-o… 2024-07-30 10:11:58 gz NA 2022-10-17 6 https://movilidad-o… 2024-07-30 10:09:03 gz NA 2022-10-12 7 https://movilidad-o… 2024-07-30 10:05:57 gz NA 2022-10-07 8 https://movilidad-o… 2024-07-30 10:02:12 gz NA 2022-08-07 9 https://movilidad-o… 2024-07-30 09:58:34 gz NA 2022-08-06 10 https://movilidad-o… 2024-07-30 09:54:30 gz NA 2022-08-05 # ℹ 9,432 more rows # ℹ 1 more variable: local_path "},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"zones","dir":"","previous_headings":"","what":"Zones","title":"Get Spanish Origin-Destination Data","text":"Zones can downloaded follows:","code":"distritos <- spod_get_zones(\"distritos\", ver = 2) distritos_wgs84 <- distritos |> sf::st_simplify(dTolerance = 200) |> sf::st_transform(4326) plot(sf::st_geometry(distritos_wgs84))"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"od-data","dir":"","previous_headings":"","what":"OD data","title":"Get Spanish Origin-Destination Data","text":"result R database interface object (tbl_dbi) can used dplyr functions SQL queries ‘lazily’, meaning data loaded memory needed. 
Let’s aggregation find total number trips per hour 7 days: figure summarises 925,874,012 trips 7 days associated 135,866,524 records.","code":"od_db <- spod_get( type = \"origin-destination\", zones = \"districts\", dates = c(start = \"2024-03-01\", end = \"2024-03-07\") ) class(od_db) [1] \"tbl_duckdb_connection\" \"tbl_dbi\" \"tbl_sql\" [4] \"tbl_lazy\" \"tbl\" colnames(od_db) [1] \"full_date\" \"time_slot\" [3] \"id_origin\" \"id_destination\" [5] \"distance\" \"activity_origin\" [7] \"activity_destination\" \"study_possible_origin\" [9] \"study_possible_destination\" \"residence_province_ine_code\" [11] \"residence_province\" \"income\" [13] \"age\" \"sex\" [15] \"n_trips\" \"trips_total_length_km\" [17] \"year\" \"month\" [19] \"day\" n_per_hour <- od_db |> group_by(date, time_slot) |> summarise(n = n(), Trips = sum(n_trips)) |> collect() |> mutate(Time = lubridate::ymd_h(paste0(date, time_slot, sep = \" \"))) |> mutate(Day = lubridate::wday(Time, label = TRUE)) n_per_hour |> ggplot(aes(x = Time, y = Trips)) + geom_line(aes(colour = Day)) + labs(title = \"Number of trips per hour over 7 days\")"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"spanishoddata-advantage-over-accessing-the-data-yourself","dir":"","previous_headings":"","what":"spanishoddata advantage over accessing the data yourself","title":"Get Spanish Origin-Destination Data","text":"demonstrated , can perform quick analysis using just lines code. highlight benefits package, manually: download xml file download links parse xml extract download links write script download files locate disk logical manner figure data structure downloaded files, read codebook translate data (columns values) English, familiar Spanish write script load data database figure way claculate summaries multiple files much … present simple functions get straight data one line code, ready run analysis .","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"desire-lines","dir":"","previous_headings":"","what":"Desire lines","title":"Get Spanish Origin-Destination Data","text":"’ll use input data pick-important flows Spain, focus longer trips visualisation: results show largest flows intra-zonal. Let’s keep inter-zonal flows: can convert geographic data {od} package (Lovelace Morgan 2024): Let’s focus trips around particular area (Salamanca): use information subset rows, capture movement within study area: Let’s plot results:","code":"od_national_aggregated <- od_db |> group_by(id_origin, id_destination) |> summarise(Trips = sum(n_trips), .groups = \"drop\") |> filter(Trips > 500) |> collect() |> arrange(desc(Trips)) od_national_aggregated # A tibble: 96,404 × 3 id_origin id_destination Trips 1 2807908 2807908 2441404. 2 0801910 0801910 2112188. 3 0801902 0801902 2013618. 4 2807916 2807916 1821504. 5 2807911 2807911 1785981. 6 04902 04902 1690606. 7 2807913 2807913 1504484. 8 2807910 2807910 1299586. 9 0704004 0704004 1287122. 10 28106 28106 1286058. 
# ℹ 96,394 more rows od_national_interzonal <- od_national_aggregated |> filter(id_origin != id_destination) od_national_sf <- od::od_to_sf( od_national_interzonal, z = distritos_wgs84 ) distritos_wgs84 |> ggplot() + geom_sf(aes(fill = population)) + geom_sf(data = spData::world, fill = NA, colour = \"black\") + geom_sf(aes(size = Trips), colour = \"blue\", data = od_national_sf) + coord_sf(xlim = c(-10, 5), ylim = c(35, 45)) + theme_void() salamanca_zones <- zonebuilder::zb_zone(\"Salamanca\") distritos_salamanca <- distritos_wgs84[salamanca_zones, ] plot(distritos_salamanca) ids_salamanca <- distritos_salamanca$id od_salamanca <- od_national_sf |> filter(id_origin %in% ids_salamanca) |> filter(id_destination %in% ids_salamanca) |> arrange(Trips) od_salamanca_sf <- od::od_to_sf( od_salamanca, z = distritos_salamanca ) ggplot() + geom_sf(fill = \"grey\", data = distritos_salamanca) + geom_sf(aes(colour = Trips), size = 1, data = od_salamanca_sf) + scale_colour_viridis_c() + theme_void()"},{"path":"https://rOpenSpain.github.io/spanishoddata/index.html","id":"further-information","dir":"","previous_headings":"","what":"Further information","title":"Get Spanish Origin-Destination Data","text":"information package, see: Information functions v1 data (2020-2021) codebook v2 data (2022 onwards) codebook (work progress) Download convert data OD disaggregation vignette showcases flows disaggregation Making static flowmaps vignette shows create flowmaps using data acquired spanishoddata Making interactive flowmaps shows create interactive flowmap using data acquired spanishoddata","code":""},{"path":[]},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/global_quiet_param.html","id":null,"dir":"Reference","previous_headings":"","what":"Global Quiet Parameter — global_quiet_param","title":"Global Quiet Parameter — global_quiet_param","text":"Documentation quiet parameter, used globally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/global_quiet_param.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Global Quiet Parameter — global_quiet_param","text":"","code":"global_quiet_param(quiet = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/global_quiet_param.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Global Quiet Parameter — global_quiet_param","text":"quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":null,"dir":"Reference","previous_headings":"","what":"Get available data list — spod_available_data","title":"Get available data list — spod_available_data","text":"Get table links available data files specified data version. 
Optionally check (see arguments) certain files already downloaded cache directory specified SPANISH_OD_DATA_DIR environment variable custom path specified data_dir argument.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get available data list — spod_available_data","text":"","code":"spod_available_data( ver = 2, check_local_files = FALSE, quiet = FALSE, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get available data list — spod_available_data","text":"ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. check_local_files Whether check local files exist. Defaults FALSE. quiet logical value indicating whether suppress messages. Default FALSE. data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get available data list — spod_available_data","text":"tibble links, release dates files data, dates data coverage, local paths files, download status. target_url character. URL link data file. pub_ts POSIXct. timestamp file published. file_extension character. file extension data file (e.g., 'tar', 'gz'). data_ym Date. year month data coverage, available. data_ymd Date. specific date data coverage, available. local_path character. local file path data stored. downloaded logical. Indicator whether data file downloaded locally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the available v1 data list — spod_available_data_v1","title":"Get the available v1 data list — spod_available_data_v1","text":"function provides table available data list MITMA v1 (2020-2021), remote local.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the available v1 data list — spod_available_data_v1","text":"","code":"spod_available_data_v1( data_dir = spod_get_data_dir(), check_local_files = FALSE, quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the available v1 data list — spod_available_data_v1","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). check_local_files Whether check local files exist. Defaults FALSE. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the available v1 data list — spod_available_data_v1","text":"tibble links, release dates files data, dates data coverage, local paths files, download status. target_url character. URL link data file. pub_ts POSIXct. timestamp file published. file_extension character. file extension data file (e.g., 'tar', 'gz'). data_ym Date. year month data coverage, available. data_ymd Date. 
specific date data coverage, available. local_path character. local file path data stored. downloaded logical. Indicator whether data file downloaded locally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v1.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the available v1 data list — spod_available_data_v1","text":"","code":"# Get the available v1 data list for the default data directory if (FALSE) { metadata <- spod_available_data_v1() names(metadata) head(metadata) }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the data dictionary — spod_available_data_v2","title":"Get the data dictionary — spod_available_data_v2","text":"function retrieves data dictionary specified data directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the data dictionary — spod_available_data_v2","text":"","code":"spod_available_data_v2( data_dir = spod_get_data_dir(), check_local_files = FALSE, quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the data dictionary — spod_available_data_v2","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). check_local_files Whether check local files exist. Defaults FALSE. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the data dictionary — spod_available_data_v2","text":"tibble links, release dates files data, dates data coverage, local paths files, download status. target_url character. URL link data file. pub_ts POSIXct. timestamp file published. file_extension character. file extension data file (e.g., 'tar', 'gz'). data_ym Date. year month data coverage, available. data_ymd Date. specific date data coverage, available. local_path character. local file path data stored. downloaded logical. 
Indicator whether data file downloaded locally.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_data_v2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the data dictionary — spod_available_data_v2","text":"","code":"# Get the data dictionary for the default data directory if (FALSE) { metadata <- spod_available_data_v2() names(metadata) head(metadata) }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_ram.html","id":null,"dir":"Reference","previous_headings":"","what":"Get available RAM — spod_available_ram","title":"Get available RAM — spod_available_ram","text":"Get available RAM","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_ram.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get available RAM — spod_available_ram","text":"","code":"spod_available_ram()"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_available_ram.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get available RAM — spod_available_ram","text":"numeric amount available RAM GB.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"function fixes invalid geometries zones data renames \"ID\" column \"id\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"","code":"spod_clean_zones_v1(zones_path, zones)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"zones_path path zones spatial data file. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v1","text":"spatial object containing cleaned zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":null,"dir":"Reference","previous_headings":"","what":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"function fixes invalid geometries zones data renames \"ID\" column \"id\". 
also attaches population counts zone names provided csv files supplied original data provider.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"","code":"spod_clean_zones_v2(zones_path)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"zones_path path zones spatial data file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_clean_zones_v2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Fixes common issues in the zones data and cleans up variable names — spod_clean_zones_v2","text":"spatial object containing cleaned zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":null,"dir":"Reference","previous_headings":"","what":"View codebooks for v1 and v2 open mobility data — spod_codebook","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"Opens relevant vignette.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"","code":"spod_codebook(ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"ver integer numeric value. version data. Defaults 1. Can 1 v1 (2020-2021) data 2 v2 (2022 onwards) data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_codebook.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"View codebooks for v1 and v2 open mobility data — spod_codebook","text":"Nothing, calls relevant vignette.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":null,"dir":"Reference","previous_headings":"","what":"Connect to data converted to DuckDB — spod_connect","title":"Connect to data converted to DuckDB — spod_connect","text":"function allows user quickly connect data converted DuckDB spod_convert_to_duckdb() function. function simplification connection process. uses","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Connect to data converted to DuckDB — spod_connect","text":"","code":"spod_connect( data_path, target_table_name = NULL, quiet = FALSE, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1, temp_path = spod_get_temp_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Connect to data converted to DuckDB — spod_connect","text":"data_path path DuckDB database file '.duckdb' extension, path folder parquet files. Either one created spod_convert() function. target_table_name Default NULL. 
connecting folder parquet files, argument ignored. connecting DuckDB database, character vector length 1 table name open database file. specified, guessed data_path argument table names available database. manually interfered database, guessed automatically need specify . quiet logical value indicating whether suppress messages. Default FALSE. max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1. temp_path path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query. default set temp directory data folder defined SPANISH_OD_DATA_DIR environment variable. Otherwise, queries folders CSV files parquet files, temporary path set current R working directory, probably undesirable, current working directory can slow storage, storage may limited space, compared data folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_connect.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Connect to data converted to DuckDB — spod_connect","text":"DuckDB table connection object.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert data from plain text to duckdb or parquet format — spod_convert","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"Converts data faster analysis either DuckDB file parquet files hive-style directory structure. Running analysis files sometimes 100x times faster working raw CSV files, espetially gzip archives. connect converted data, please use mydata <- spod_connect() passing path data saved. connected mydata can analysed using dplyr functions select(), filter(), mutate(), group_by(), summarise(), etc. end sequence commands need add collect() execute whole chain data manipulations load results memory R data.frame/tibble. -depth usage data, please refer DuckDB documentation examples https://duckdb.org/docs/api/r#dbplyr . useful examples can found https://arrow-user2022.netlify.app/data-wrangling#combining-arrow--duckdb . may also use arrow package work parquet files https://arrow.apache.org/docs/r/.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"","code":"spod_convert( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\"), dates = NULL, save_format = \"duckdb\", save_path = NULL, overwrite = FALSE, data_dir = spod_get_data_dir(), quiet = FALSE, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1, max_download_size_gb = 1 )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"type type data download. 
Can \"origin-destination\" (ust \"od\"), \"number_of_trips\" (just \"nt\") v1 data. v2 data \"overnight_stays\" (just \"os\") also available. data types supported future. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range eigher character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020. save_format character vector length 1 values \"duckdb\" \"parquet\". Defaults \"duckdb\". NULL automatically inferred save_path argument. save_format provided, save_path set default location set SPANISH_OD_DATA_DIR environment variable using Sys.setenv(SPANISH_OD_DATA_DIR = 'path///cache/dir')). v1 data path /clean_data/v1/tabular/duckdb/ /clean_data/v1/tabular/parquet/. can also set save_path. ends \".duckdb\", save DuckDB database format, save_path end \".duckdb\", save parquet format treat save_path path folder, file, create necessary hive-style subdirectories folder. Hive style looks like year=2020/month=2/day=14 inside directory data_0.parquet file contains data day. save_path character vector length 1. full (relative) path DuckDB database file parquet folder. save_path ends .duckdb, saved DuckDB database file. format argument automatically set save_format='duckdb'. save_path ends folder name (e.g. /data_dir/clean_data/v1/tabular/parquet/od_distr origin-destination data district level), data saved collection parquet files hive-style directory structure. subfolders od_distr year=2020/month=2/day=14 inside folders single parquet file placed containing data day. NULL, uses default location data_dir (set SPANISH_OD_DATA_DIR environment variable using Sys.setenv(SPANISH_OD_DATA_DIR = 'path///cache/dir')). Therefore, default relative path DuckDB /clean_data/v1/tabular/duckdb/_.duckdb parquet files /clean_data/v1/tabular/parquet/_/, type type data (e.g. 'od', 'os', 'nt', correspoind 'origin-destination', 'overnight-stays', 'number--trips', etc.) 
zones name geographic zones (e.g. 'distr', 'muni', etc.). See details function arguments description. overwrite logical character vector length 1. TRUE, overwrites existing DuckDB parquet files. Defaults FALSE. parquet files can also set 'update', parquet files created dates yet converted. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE. max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1. max_download_size_gb maximum download size gigabytes. Defaults 1.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_convert.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert data from plain text to duckdb or parquet format — spod_convert","text":"Path saved DuckDB file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":null,"dir":"Reference","previous_headings":"","what":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"function processes date arguments provided various functions package. can handle single dates arbitrary sequences (vectors) dates ISO (YYYY-MM-DD) YYYYMMDD format. can also handle date ranges format 'YYYY-MM-DD_YYYY-MM-DD' ('YYYYMMDD_YYYYMMDD'), date ranges named vec regular expressions match dates format YYYYMMDD.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"","code":"spod_dates_argument_to_dates_seq(dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. 
character Date object. Can non-consecutive sequence dates. date range either character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_dates_argument_to_dates_seq.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Convert multiple formats of date arguments to a sequence of dates — spod_dates_argument_to_dates_seq","text":"character vector dates ISO format (YYYY-MM-DD).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":null,"dir":"Reference","previous_headings":"","what":"Safely disconnect from data and free memory — spod_disconnect","title":"Safely disconnect from data and free memory — spod_disconnect","text":"function ensure DuckDB connections CSV.gz files (created via spod_get()), well DuckDB files folders parquet files (created via spod_convert()) closed properly prevent conflicting connections. Essentially just wrapper around DBI::dbDisconnect() reaches .$src$con object tbl_duckdb_connection connection object returned user via spod_get() spod_connect(). disconnecting database, also frees memory running gc().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Safely disconnect from data and free memory — spod_disconnect","text":"","code":"spod_disconnect(tbl_con, free_mem = TRUE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Safely disconnect from data and free memory — spod_disconnect","text":"tbl_con tbl_duckdb_connection connection object get either spod_get() spod_connect(). free_mem logical. Whether free memory running gc(). 
Defaults TRUE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_disconnect.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Safely disconnect from data and free memory — spod_disconnect","text":"","code":"if (FALSE) { # \\dontrun{ od_distr <- spod_get(\"od\", zones = \"distr\", dates <- c(\"2020-01-01\", \"2020-01-02\")) spod_disconnect(od_distr) } # }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":null,"dir":"Reference","previous_headings":"","what":"Download the data files of specified type, zones, and dates — spod_download","title":"Download the data files of specified type, zones, and dates — spod_download","text":"function downloads data files specified type, zones, dates data version.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download the data files of specified type, zones, and dates — spod_download","text":"","code":"spod_download( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), dates = NULL, max_download_size_gb = 1, data_dir = spod_get_data_dir(), quiet = FALSE, return_local_file_paths = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download the data files of specified type, zones, and dates — spod_download","text":"type type data download. Can \"origin-destination\" (just \"od\"), \"number_of_trips\" (just \"nt\") v1 data. v2 data \"overnight_stays\" (just \"os\") also available. data types supported future. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionally, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range either character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. 
c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020. max_download_size_gb maximum download size gigabytes. Defaults 1. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE. return_local_file_paths Logical. TRUE, function returns character vector paths downloaded files. FALSE, function returns NULL.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Download the data files of specified type, zones, and dates — spod_download","text":"Nothing. return_local_file_paths = TRUE, character vector paths downloaded files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download the data files of specified type, zones, and dates — spod_download","text":"","code":"if (FALSE) { # \\dontrun{ # Download the origin-destination on district level for the a date range in March 2020 spod_download( type = \"od\", zones = \"districts\", dates = c(start = \"2020-03-20\", end = \"2020-03-24\") ) # Download the origin-destination on district level for select dates in 2020 and 2021 spod_download( type = \"od\", zones = \"dist\", dates = c(\"2020-03-20\", \"2020-03-24\", \"2021-03-20\", \"2021-03-24\") ) # Download the origin-destination on municipality level using regex for a date range in March 2020 # (the regex will capture the dates 2020-03-20 to 2020-03-24) spod_download( type = \"od\", zones = \"municip\", dates = \"2020032[0-4]\" ) } # }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"function ensures necessary v1 raw data zones files downloaded extracted specified data directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"","code":"spod_download_zones_v1( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\"), data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\"). data_dir directory data stored. 
quiet Boolean flag control display messages.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_download_zones_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Downloads and extracts the raw v1 zones data — spod_download_zones_v1","text":"path downloaded extracted file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_filter_by_dates.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter a duckdb connection by dates — spod_duckdb_filter_by_dates","title":"Filter a duckdb connection by dates — spod_duckdb_filter_by_dates","text":"IMPORTANT: function assumes table view filtered separate year, month day columns integer values. done filtering faster CSV files stored folder structure hive-style /year=2020/month=2/day=14/.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_filter_by_dates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter a duckdb connection by dates — spod_duckdb_filter_by_dates","text":"","code":"spod_duckdb_filter_by_dates(con, source_view_name, new_view_name, dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_filter_by_dates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter a duckdb connection by dates — spod_duckdb_filter_by_dates","text":"con duckdb connection source_view_name name source duckdb \"view\" (virtual table, context current package likely connected folder CSV files) new_view_name name new duckdb \"view\" (virtual table, context current package likely connected folder CSV files). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range either character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. 
example, ^202002 match dates February 2020.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":null,"dir":"Reference","previous_headings":"","what":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"Set maximum memory number threads DuckDB connection","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"","code":"spod_duckdb_limit_resources( con, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1 )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"con duckdb connection max_mem_gb maximum memory use GB. conservative default 3 GB, enough resaving data DuckDB form folder CSV.gz files small enough fit memory even old computers. data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_limit_resources.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Set maximum memory and number of threads for a DuckDB connection — spod_duckdb_limit_resources","text":"duckdb connection.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"function creates duckdb connection number trips data stored folder CSV.gz files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"","code":"spod_duckdb_number_of_trips( con = DBI::dbConnect(duckdb::duckdb(), dbdir = \":memory:\", read_only = FALSE), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"con duckdb connection object. specified, new -memory connection created. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. 
Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_number_of_trips.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a duckdb number of trips table — spod_duckdb_number_of_trips","text":"duckdb connection 2 views.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"function creates duckdb connection origin-destination data stored CSV.gz files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"","code":"spod_duckdb_od( con = DBI::dbConnect(duckdb::duckdb(), dbdir = \":memory:\", read_only = FALSE), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"con duckdb connection object. specified, new -memory connection created. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionaly, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_od.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates a duckdb connection to origin-destination data — spod_duckdb_od","text":"duckdb connection object 2 views: od_csv_raw - raw table view cached CSV files origin-destination data previously cached $SPANISH_OD_DATA_DIR od_csv_clean - cleaned-table view od_csv_raw column names values translated mapped English. still includes cached data. structure cleaned-views od_csv_clean follows: date Date. full date trip, including year, month, day. id_origin factor. identifier origin location trip, formatted code (e.g., '01001_AM'). id_destination factor. identifier destination location trip, formatted code (e.g., '01001_AM'). activity_origin factor. type activity origin location (e.g., 'home', 'work'). Note: available district level data. activity_destination factor. type activity destination location (e.g., 'home', ''). Note: available district level data. residence_province_ine_code factor. province residence group individual making trip, encoded according INE classification. 
Note: available district level data. residence_province_name factor. province residence group individuals making trip (e.g., 'Cuenca', 'Girona'). Note: available district level data. time_slot integer. time slot (hour day) trip started, represented integer (e.g., 0, 1, 2). distance factor. distance category trip, represented code (e.g., '002-005' 2-5 km). n_trips double. number trips taken within specified time slot distance. trips_total_length_km double. total length trips kilometers specified time slot distance. year double. year trip. month double. month trip. day double. day trip. structure original data od_csv_raw follows: fecha Date. date trip, including year, month, day. origen character. identifier origin location trip, formatted character string (e.g., '01001_AM'). destino character. identifier destination location trip, formatted character string (e.g., '01001_AM'). actividad_origen character. type activity origin location (e.g., 'casa', 'trabajo'). actividad_destino character. type activity destination location (e.g., 'otros', 'trabajo'). residencia character. code representing residence individual making trip (e.g., '01') according official INE classification. edad character. age individual making trip. data actually filled 'NA' values, column removed cleaned-translated view described . periodo integer. time period trip started, represented integer (e.g., 0, 1, 2). distancia character. distance category trip, represented character string (e.g., '002-005' 2-5 km). viajes double. number trips taken within specified time period distance. viajes_km double. total length trips kilometers specified time period distance. day double. day trip. month double. month trip. year double. year trip.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"function creates duckdb connection overnight stays data stored folder CSV.gz files.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"","code":"spod_duckdb_overnight_stays( con = DBI::dbConnect(duckdb::duckdb(), dbdir = \":memory:\", read_only = FALSE), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"con duckdb connection object. specified, new -memory connection created. zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionally, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. 
Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_overnight_stays.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a duckdb overnight stays table — spod_duckdb_overnight_stays","text":"duckdb connection 2 views.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":null,"dir":"Reference","previous_headings":"","what":"Set temp file for DuckDB connection — spod_duckdb_set_temp","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"Set temp file DuckDB connection","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"","code":"spod_duckdb_set_temp(con, temp_path = spod_get_temp_dir())"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"con duckdb connection temp_path path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query. default set temp directory data folder defined SPANISH_OD_DATA_DIR environment variable. Otherwise, queries folders CSV files parquet files, temporary path set current R working directory, probably undesirable, current working directory can slow storage, storage may limited space, compared data folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_duckdb_set_temp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Set temp file for DuckDB connection — spod_duckdb_set_temp","text":"duckdb connection.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":null,"dir":"Reference","previous_headings":"","what":"Function to expand dates from a regex — spod_expand_dates_from_regex","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"function generates sequence dates regular expression pattern. 
based provided regular expression.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"","code":"spod_expand_dates_from_regex(date_regex)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"date_regex regular expression match dates format yyyymmdd.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_expand_dates_from_regex.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Function to expand dates from a regex — spod_expand_dates_from_regex","text":"character vector dates matching regex.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_files_sizes.html","id":null,"dir":"Reference","previous_headings":"","what":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","title":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","text":"Get files sizes remote files v1 v2 data save csv.gz file inst/extdata folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_files_sizes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","text":"","code":"spod_files_sizes(ver = 2)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_files_sizes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get files sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder. — spod_files_sizes","text":"ver version data (1 2). Can . Defaults 2, v1 data updated since 2021.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":null,"dir":"Reference","previous_headings":"","what":"Get tabular data — spod_get","title":"Get tabular data — spod_get","text":"function creates DuckDB lazy table connection object specified type zones. checks missing data downloads necessary. connection made raw CSV files gzip archives, analysing data connection may slow select days. can manipulate object using {dplyr} functions select, filter, mutate, group_by, summarise, etc. end sequence commands need add collect execute whole chain data manipulations load results memory R data.frame/tibble. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). 
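The lazy-table workflow described in the spod_get reference above can be illustrated with a minimal sketch. This is not taken from the package documentation: the column names id_origin, id_destination and n_trips follow the v1 codebook, and the date range reuses the example format shown for the dates argument.

```r
library(spanishoddata)
library(dplyr)

# point the package at a local cache directory (path is a placeholder)
Sys.setenv(SPANISH_OD_DATA_DIR = "~/path/to/your/cache/dir")

# lazy DuckDB table over the raw CSV.gz files for a few days
od_distr <- spod_get(
  type = "od",
  zones = "distr",
  dates = c(start = "2020-02-15", end = "2020-02-17")
)

# build the query with dplyr verbs, then collect() to pull the result into R
flows <- od_distr |>
  group_by(id_origin, id_destination) |>
  summarise(n_trips = sum(n_trips, na.rm = TRUE), .groups = "drop") |>
  collect()
```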
want analyse longer periods time (especially several months even whole data several years), consider using spod_convert spod_connect.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get tabular data — spod_get","text":"","code":"spod_get( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), dates = NULL, data_dir = spod_get_data_dir(), quiet = FALSE, max_mem_gb = max(4, spod_available_ram() - 4), max_n_cpu = parallelly::availableCores() - 1, max_download_size_gb = 1, duckdb_target = \":memory:\", temp_path = spod_get_temp_dir() )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get tabular data — spod_get","text":"type type data download. Can \"origin-destination\" (just \"od\"), \"number_of_trips\" (just \"nt\") v1 data. v2 data \"overnight_stays\" (just \"os\") also available. data types supported future. See codebooks v1 v2 data vignettes spod_codebook(1) spod_codebook(2) (spod_codebook). zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionally, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). dates character Date vector dates process. Kindly keep mind v1 v2 data follow different data collection methodologies may directly comparable. Therefore, try request data versions date range. need compare data versions, please refer respective codebooks methodology documents. v1 data covers period 2020-02-14 2021-05-09, v2 data covers period 2022-01-01 present notice. true dates range checked available data version every function run. possible values can following: spod_get() spod_convert() functions, dates can set \"cached_v1\" \"cached_v2\" request data cached (already previously downloaded) v1 (2020-2021) v2 (2022 onwards) data. case, function identify use data files downloaded cached locally, (e.g. using explicit run spod_download(), data requests made using spod_get() spod_convert() functions). single date ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. vector dates ISO (YYYY-MM-DD) YYYYMMDD format. character Date object. Can non-consecutive sequence dates. date range either character Date object length 2 clearly named elements start end ISO (YYYY-MM-DD) YYYYMMDD format. E.g. c(start = \"2020-02-15\", end = \"2020-02-17\"); character object form YYYY-MM-DD_YYYY-MM-DD YYYYMMDD_YYYYMMDD. example, 2020-02-15_2020-02-17 20200215_20200217. regular expression match dates format YYYYMMDD. character object. example, ^202002 match dates February 2020. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE. max_mem_gb maximum memory use GB. conservative default 3 GB, enough re-saving data DuckDB form folder CSV.gz files small enough fit memory even old computers. 
data analysis using already converted data (DuckDB Parquet format) raw CSV.gz data, recommended increase according available resources. max_n_cpu maximum number threads use. Defaults number available cores minus 1. max_download_size_gb maximum download size gigabytes. Defaults 1. duckdb_target (Optional) path duckdb file save data , conversion CSV requested spod_convert function. specified, set \":memory:\" data stored memory. temp_path path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query. default set temp directory data folder defined SPANISH_OD_DATA_DIR environment variable. Otherwise, queries folders CSV files parquet files, temporary path set current R working directory, probably undesirable, current working directory can slow storage, storage may limited space, compared data folder.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get tabular data — spod_get","text":"DuckDB lazy table connection object class tbl_duckdb_connection.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get tabular data — spod_get","text":"","code":"if (FALSE) { # \\dontrun{ # create a connection to the v1 data Sys.setenv(SPANISH_OD_DATA_DIR = \"~/path/to/your/cache/dir\") dates <- c(\"2020-02-14\", \"2020-03-14\", \"2021-02-14\", \"2021-02-14\", \"2021-02-15\") od_dist <- spod_get(type = \"od\", zones = \"distr\", dates = dates) # od dist is a table view filtered to the specified dates # access the source connection with all dates # list tables DBI::dbListTables(od_dist$src$con) } # }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the data directory — spod_get_data_dir","title":"Get the data directory — spod_get_data_dir","text":"function retrieves data directory environment variable SPANISH_OD_DATA_DIR. environment variable set, returns temporary directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the data directory — spod_get_data_dir","text":"","code":"spod_get_data_dir(quiet = FALSE)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the data directory — spod_get_data_dir","text":"quiet logical value indicating whether suppress messages. 
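As a quick illustration of the spod_get_data_dir() behaviour described above (the printed paths are indicative only, not actual package output):

```r
# with the environment variable set, the cache directory is taken from it
Sys.setenv(SPANISH_OD_DATA_DIR = "~/path/to/your/cache/dir")
spod_get_data_dir()
#> something like "~/path/to/your/cache/dir"

# without it, the function falls back to a temporary directory
Sys.unsetenv("SPANISH_OD_DATA_DIR")
spod_get_data_dir()
#> something like tempdir()
```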
Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_data_dir.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the data directory — spod_get_data_dir","text":"data directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":null,"dir":"Reference","previous_headings":"","what":"Get file size from URL — spod_get_file_size_from_url","title":"Get file size from URL — spod_get_file_size_from_url","text":"Get file size URL","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get file size from URL — spod_get_file_size_from_url","text":"","code":"spod_get_file_size_from_url(x_url)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get file size from URL — spod_get_file_size_from_url","text":"x_url URL","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_file_size_from_url.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get file size from URL — spod_get_file_size_from_url","text":"File size MB","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":null,"dir":"Reference","previous_headings":"","what":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"Get latest file list XML MITMA open mobility data v1 (2020-2021)","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"","code":"spod_get_latest_v1_file_list( data_dir = spod_get_data_dir(), xml_url = \"https://opendata-movilidad.mitma.es/RSS.xml\" )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). xml_url URL XML file download. 
Defaults \"https://opendata-movilidad.mitma.es/RSS.xml\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"path downloaded XML file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v1_file_list.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get latest file list from the XML for MITMA open mobility data v1 (2020-2021) — spod_get_latest_v1_file_list","text":"","code":"if (FALSE) { spod_get_latest_v1_file_list() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":null,"dir":"Reference","previous_headings":"","what":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"Get latest file list XML MITMA open mobility data v2 (2022 onwards)","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"","code":"spod_get_latest_v2_file_list( data_dir = spod_get_data_dir(), xml_url = \"https://movilidad-opendata.mitma.es/RSS.xml\" )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir(). xml_url URL XML file download. 
Defaults \"https://movilidad-opendata.mitma.es/RSS.xml\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"path downloaded XML file.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_latest_v2_file_list.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards) — spod_get_latest_v2_file_list","text":"","code":"if (FALSE) { spod_get_latest_v2_file_list() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":null,"dir":"Reference","previous_headings":"","what":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"Get path temp folder DuckDB intermediate spilling case set memory limit /physical memory computer low perform query.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"","code":"spod_get_temp_dir(data_dir = spod_get_data_dir())"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"data_dir directory data stored. Defaults value returned spod_get_data_dir().","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_temp_dir.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get temporary directory for DuckDB intermediate spilling — spod_get_temp_dir","text":"path temp folder DuckDB intermediate spilling.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":null,"dir":"Reference","previous_headings":"","what":"Get valid dates for the specified data version — spod_get_valid_dates","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"Get valid dates specified data version","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"","code":"spod_get_valid_dates(ver = NULL)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"ver Integer. Can 1 2. version data use. 
v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_valid_dates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get valid dates for the specified data version — spod_get_valid_dates","text":"vector type Date possible valid dates specified data version (v1 2020-2021 v2 2022 onwards).","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":null,"dir":"Reference","previous_headings":"","what":"Get zones — spod_get_zones","title":"Get zones — spod_get_zones","text":"Get spatial zones specified data version. Supports v1 (2020-2021) v2 (2022 onwards) data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get zones — spod_get_zones","text":"","code":"spod_get_zones( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), ver = NULL, data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get zones — spod_get_zones","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\") data versions. Additionally, can \"large_urban_areas\" (\"lua\", original Spanish \"grandes_areas_urbanas\", \"gau\") v2 data (2022 onwards). ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards. data_dir directory data stored. Defaults value returned spod_get_data_dir() returns value environment variable SPANISH_OD_DATA_DIR temporary directory variable set. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get zones — spod_get_zones","text":"sf object (Simple Feature collection). columns v1 (2020-2021) data include: id character vector containing unique identifier district, assigned data provider. id matches id_origin, id_destination, id district-level origin-destination number trips data. census_districts string semicolon-separated identifiers census districts classified Spanish Statistical Office (INE) spatially bound within polygons id. municipalities_mitma string semicolon-separated municipality identifiers (assigned data provider) corresponding district id. municipalities string semicolon-separated municipality identifiers classified Spanish Statistical Office (INE) corresponding id. district_names_in_v2/municipality_names_in_v2 string semicolon-separated district names (v2 version data) corresponding district id v1. district_ids_in_v2/municipality_ids_in_v2 string semicolon-separated district identifiers (v2 version data) corresponding district id v1. geometry MULTIPOLYGON column containing spatial geometry district, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions. columns v2 (2022 onwards) data include: id character vector containing unique identifier zone, assigned data provider. name character vector name district. 
population numeric vector representing population district (2022). census_sections string semicolon-separated identifiers census sections corresponding district. census_districts string semicolon-separated identifiers census districts classified Spanish Statistical Office (INE) corresponding district. municipalities string semicolon-separated identifiers municipalities classified Spanish Statistical Office (INE) corresponding district. municipalities_mitma string semicolon-separated identifiers municipalities, assigned data provider, correspond district. luas_mitma string semicolon-separated identifiers LUAs (Local Urban Areas) provider, associated district. district_ids_in_v1/municipality_ids_in_v1 string semicolon-separated district identifiers v1 data corresponding district v2. match exists, marked NA. geometry MULTIPOLYGON column containing spatial geometry district, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":null,"dir":"Reference","previous_headings":"","what":"Retrieves the zones for v1 data — spod_get_zones_v1","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"function retrieves zones data specified data directory. can retrieve either \"distritos\" \"municipios\" zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"","code":"spod_get_zones_v1( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\"), data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\"). data_dir directory data stored. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"sf object (Simple Feature collection) 2 fields: id character vector containing unique identifier zone, matched identifiers tabular data. geometry MULTIPOLYGON column containing spatial geometry zone, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v1.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Retrieves the zones for v1 data — spod_get_zones_v1","text":"","code":"if (FALSE) { zones <- spod_get_zones_v1() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":null,"dir":"Reference","previous_headings":"","what":"retrieves the zones data — spod_get_zones_v2","title":"retrieves the zones data — spod_get_zones_v2","text":"function retrieves zones data specified data directory. 
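A short, hedged example of the spod_get_zones() usage documented above; the zones and ver values are illustrative, and any of the documented aliases work equally well:

```r
# v1 (2020-2021) districts and v2 (2022 onwards) municipalities as sf objects
zones_distr_v1 <- spod_get_zones(zones = "distr", ver = 1)
zones_muni_v2  <- spod_get_zones(zones = "muni", ver = 2)

# the id column of the returned sf object matches id_origin / id_destination
# in the tabular origin-destination data, so the two can be joined for mapping
```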
can retrieve either \"distritos\" \"municipios\" zones data.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"retrieves the zones data — spod_get_zones_v2","text":"","code":"spod_get_zones_v2( zones = c(\"districts\", \"dist\", \"distr\", \"distritos\", \"municipalities\", \"muni\", \"municip\", \"municipios\", \"lua\", \"large_urban_areas\", \"gau\", \"grandes_areas_urbanas\"), data_dir = spod_get_data_dir(), quiet = FALSE )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"retrieves the zones data — spod_get_zones_v2","text":"zones zones download data. Can \"districts\" (\"dist\", \"distr\", original Spanish \"distritos\") \"municipalities\" (\"muni\", \"municip\", original Spanish \"municipios\"). data_dir directory data stored. quiet logical value indicating whether suppress messages. Default FALSE.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"retrieves the zones data — spod_get_zones_v2","text":"sf object (Simple Feature collection) 4 fields: id character vector containing unique identifier zone, matched identifiers tabular data. name character vector name zone. population numeric vector representing population zone (2022). geometry MULTIPOLYGON column containing spatial geometry zone, stored sf object. geometry projected ETRS89 / UTM zone 30N coordinate reference system (CRS), XY dimensions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_get_zones_v2.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"retrieves the zones data — spod_get_zones_v2","text":"","code":"if (FALSE) { zones <- spod_get_zones_v2() }"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":null,"dir":"Reference","previous_headings":"","what":"Check if specified dates span both data versions — spod_is_data_version_overlaps","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"function checks specified dates date ranges span v1 v2 data versions.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"","code":"spod_is_data_version_overlaps(dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"dates Dates vector dates check.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_is_data_version_overlaps.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Check if specified dates span both data versions — spod_is_data_version_overlaps","text":"TRUE dates span data versions, FALSE otherwise.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type.html","id":null,"dir":"Reference","previous_headings":"","what":"Match data types for 
normalisation — spod_match_data_type","title":"Match data types for normalisation — spod_match_data_type","text":"Match data types normalisation","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Match data types for normalisation — spod_match_data_type","text":"","code":"spod_match_data_type( type = c(\"od\", \"origin-destination\", \"viajes\", \"os\", \"overnight_stays\", \"pernoctaciones\", \"nt\", \"number_of_trips\", \"personas\") )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Match data types for normalisation — spod_match_data_type","text":"type type data match. Can \"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\".","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type_for_local_folders.html","id":null,"dir":"Reference","previous_headings":"","what":"Match data types to folders — spod_match_data_type_for_local_folders","title":"Match data types to folders — spod_match_data_type_for_local_folders","text":"Match data types folders","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type_for_local_folders.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Match data types to folders — spod_match_data_type_for_local_folders","text":"","code":"spod_match_data_type_for_local_folders( type = c(\"od\", \"origin-destination\", \"os\", \"overnight_stays\", \"nt\", \"number_of_trips\"), ver = c(1, 2) )"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_match_data_type_for_local_folders.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Match data types to folders — spod_match_data_type_for_local_folders","text":"ver Integer. Can 1 2. version data use. 
v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":null,"dir":"Reference","previous_headings":"","what":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"Load SQL query specified file package installation directory, glue::collapse , glue::glue case variables need replaced, dplyr::sql additional safety.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"","code":"spod_read_sql(sql_file_name)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"sql_file_name name SQL file load package installation directory.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_read_sql.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Load an SQL query, glue it, dplyr::sql it — spod_read_sql","text":"Text SQL query class sql/character.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":null,"dir":"Reference","previous_headings":"","what":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"Generate part SQL query sequence dates","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"","code":"spod_sql_where_dates(dates)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"dates Dates vector dates process.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_sql_where_dates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generate a WHERE part of an SQL query from a sequence of dates — spod_sql_where_dates","text":"character vector SQL query.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Get clean data subfolder name — spod_subfolder_clean_data_cache","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"Change subfolder name code function clean data cache apply globally, functions package use function get clean data cache path.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"","code":"spod_subfolder_clean_data_cache(ver = 
1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"ver Integer. Can 1 2. version data use. v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_clean_data_cache.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get clean data subfolder name — spod_subfolder_clean_data_cache","text":"Character string subfolder name clean data cache.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_metadata_cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","title":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","text":"Change subfolder name code function metadata cache apply globally, functions package use function get metadata cache path.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_metadata_cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","text":"","code":"spod_subfolder_metadata_cache()"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_metadata_cache.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get metadata cache subfolder name — spod_subfolder_metadata_cache","text":"Character string subfolder name raw data cache.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"Change subfolder name code function raw data cache apply globally, functions package use function get raw data cache path.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"","code":"spod_subfolder_raw_data_cache(ver = 1)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"ver Integer. Can 1 2. version data use. 
v1 spans 2020-2021, v2 covers 2022 onwards.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_subfolder_raw_data_cache.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get raw data cache subfolder name — spod_subfolder_raw_data_cache","text":"Character string subfolder name raw data cache.","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":null,"dir":"Reference","previous_headings":"","what":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"Remove duplicate IDs semicolon-separated string selected column data frame","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"","code":"spod_unique_separated_ids(column)"},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"column character vector column data frame remove duplicates .","code":""},{"path":"https://rOpenSpain.github.io/spanishoddata/reference/spod_unique_separated_ids.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Remove duplicate values in a semicolon-separated string — spod_unique_separated_ids","text":"character vector semicolon-separated unique IDs.","code":""}]
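Finally, for periods longer than a few days the reference entries above repeatedly point to spod_convert() and spod_connect() instead of querying the raw CSV.gz files through spod_get(). A minimal sketch, assuming spod_convert() returns the path to the converted database and spod_connect() accepts that path (the "cached_v1" dates value and the column names are taken from the reference text above):

```r
library(spanishoddata)

# convert everything already downloaded for v1 into DuckDB, then connect to it
db_path <- spod_convert(type = "od", zones = "distr", dates = "cached_v1")
od_db <- spod_connect(db_path)

# the connection behaves like any other lazy table: dplyr verbs, then collect()
od_db |>
  dplyr::group_by(id_origin, id_destination) |>
  dplyr::summarise(n_trips = sum(n_trips), .groups = "drop") |>
  dplyr::collect()
```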