---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# phsopendata
<!-- badges: start -->
[![GitHub release (latest by date)](https://img.shields.io/github/v/release/Public-Health-Scotland/phsopendata)](https://github.com/Public-Health-Scotland/phsopendata/releases/latest)
[![R-CMD-check](https://github.com/Public-Health-Scotland/phsopendata/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Public-Health-Scotland/phsopendata/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/Public-Health-Scotland/phsopendata/branch/master/graph/badge.svg)](https://app.codecov.io/gh/Public-Health-Scotland/phsopendata?branch=master)
<!-- badges: end -->
`phsopendata` contains functions to interact with open data from the [Scottish Health and Social Care Open Data platform](https://www.opendata.nhs.scot/) via the CKAN API.

- `get_resource()` extracts a single resource from an open dataset by resource id
- `get_latest_resource()` extracts the most recent resource from applicable datasets, by dataset name
- `get_dataset()` extracts multiple resources from an open dataset by dataset name
- `list_datasets()` returns the names of all available datasets
- `list_resources()` returns information on all resources within an open dataset by dataset name

`phsopendata` can be used on both Posit Workbench and desktop versions of RStudio.
## Installation
`phsopendata` is installed from GitHub, which requires a package such as `remotes` or `devtools`.
With `remotes`, run the following to install the package:
```{r gh-installation, eval = FALSE}
remotes::install_github("Public-Health-Scotland/phsopendata",
  upgrade = "never"
)
```
## Examples
### Downloading a data table with `get_resource()`
To extract a specific resource, you need its unique identifier, the resource id. This can be found in the dataset metadata, in the URL of the resource's page on https://www.opendata.nhs.scot/, or by calling `list_resources()`.
```{r example resource, eval = FALSE}
library(phsopendata)
# define a resource ID
res_id <- "a794d603-95ab-4309-8c92-b48970478c14"
# download the data from the CKAN database
data <- get_resource(res_id = res_id)
```
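A resource id can also be looked up from R instead of being copied from the website. The following is a minimal sketch, assuming `list_datasets()` takes no arguments and `list_resources()` accepts the dataset name as its first argument; `gp-practice-populations` is used only as an example dataset name, and the exact columns returned should be checked with `?list_resources`.

```{r example list, eval = FALSE}
library(phsopendata)

# names of all datasets available on the platform
dataset_names <- list_datasets()

# information on every resource in one dataset, including the resource ids
# (assumption: the dataset name is passed as the first argument)
resources <- list_resources("gp-practice-populations")
resources
```

An id taken from this output can then be passed to `get_resource()` as `res_id`.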
### Querying/filtering data with `get_resource()`
You can define a row limit with the `rows` argument to get the first *N* rows of a table.
```{r example row, eval = FALSE}
# get first 100 rows
get_resource(
  res_id = "a794d603-95ab-4309-8c92-b48970478c14",
  rows = 100
)
```
You can use `col_select` and `row_filters` to query the data server-side (i.e., the data is filtered before it is downloaded to your machine).
```{r example query, eval = FALSE}
# keep two columns, and only the rows where HB and Dispensing
# match the given values
get_resource(
  res_id = "a794d603-95ab-4309-8c92-b48970478c14",
  col_select = c("GPPracticeName", "TelephoneNumber"),
  row_filters = list(
    HB = "S08000017",
    Dispensing = "Y"
  )
)
```
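To make the meaning of the query above concrete, the sketch below applies the same selection locally in base R to a small made-up data frame. It assumes that the entries of `row_filters` are equality matches combined with AND, as is the case for the `filters` parameter of CKAN's `datastore_search`; the rows are invented for illustration and are not real open data.

```{r example local, eval = FALSE}
# made-up rows, for illustration only (not real open data)
practices <- data.frame(
  GPPracticeName = c("Practice A", "Practice B", "Practice C"),
  TelephoneNumber = c("0000 000001", "0000 000002", "0000 000003"),
  HB = c("S08000017", "S08000017", "S08000015"),
  Dispensing = c("Y", "N", "Y")
)

# row_filters: keep rows where every named column equals the given value
keep <- practices$HB == "S08000017" & practices$Dispensing == "Y"

# col_select: keep only the named columns
local_result <- practices[keep, c("GPPracticeName", "TelephoneNumber")]
local_result
```

The difference with `get_resource()` is where the work happens: here the full table is already in memory, whereas the server-side query only transfers the matching rows and columns.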
### Downloading multiple tables with `get_dataset()`
To extract all resources from a dataset, you need the *dataset name*. Note that this differs from the *dataset title* displayed on the website. The dataset name can be found using `list_datasets()`, or taken from the dataset URL.

In this example, we download GP Practice Population Demographics from: [opendata.nhs.scot/dataset/*gp-practice-populations*](https://www.opendata.nhs.scot/dataset/gp-practice-populations), so the dataset name is `gp-practice-populations`.
```{r example dataset, eval = FALSE}
# if max_resources is not set, all resources will be returned by default.
# Here we pull 10 rows from the first 2 resources only
get_dataset("gp-practice-populations", max_resources = 2, rows = 10)
```
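When only the most recent table is needed rather than several, `get_latest_resource()` takes the same kind of dataset name. This is a minimal sketch, assuming `gp-practice-populations` is one of the applicable datasets and that the dataset name is the first argument; see `?get_latest_resource` for the datasets it supports.

```{r example latest, eval = FALSE}
# assumption: "gp-practice-populations" is one of the applicable datasets
latest <- get_latest_resource("gp-practice-populations")
```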
## Contributing to phsopendata
At present, this package is maintained by [Csilla Scharle](https://github.com/csillasch).
If you have requests or suggestions for additional functionality, please contact the package maintainer and/or the [PHS Open Data team]([email protected]).
If you would like to share examples of how you work with open data, you can also do so in the [Open Data repository](https://github.com/Public-Health-Scotland/Open-Data), where example scripts and resources are collated.