-
Notifications
You must be signed in to change notification settings - Fork 0
/
IncomingStudents.Rmd
146 lines (111 loc) · 4.09 KB
/
IncomingStudents.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
---
title: "Incoming Degree Students"
author: "Martine Jansen"
date: "30 januari 2019"
output:
word_document:
reference_docx: template.docx
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
library(tidyverse)
library(knitr)
```
```{r readData}
# read file IncomingInternationalDegreeStudents.csv
location_of_data <- file.path("data",
"IncomingInternationalDegreeStudents.csv")
dStudents <- read_csv2(location_of_data) %>%
# the data is not in tidy format,
# reshape wide to long using tidyr::gather
# the first column (Program) is ok, no gathering necessary
gather(key = "Year", value = "n", - Program, convert = TRUE)
```
```{r calculations}
# The first year in the data
min_year <- min(dStudents$Year)
# the most recent year in the data
max_year <- max(dStudents$Year)
diff_years <- max_year - min_year
# totals per year, only for the min and the max year
dTotalPerYear <- dStudents %>%
group_by(Year) %>%
summarise(Total = sum(n)) %>%
filter(Year %in% c(min_year, max_year))
# base R for calculating the total in max_year
total_max_year <- dTotalPerYear$Total[dTotalPerYear$Year == max_year] %>%
# the counts were numeric, we want an integer
as.integer()
# another way for the min_year
total_min_year <- dTotalPerYear %>%
filter(Year == min_year) %>%
select(Total) %>%
# now still a dataset, we want only the value
as.integer()
# the growth
growth_between <- (total_max_year - total_min_year) / total_min_year
# and now formatted as percentage
growth_between <- sprintf("%.0f%%", 100* growth_between)
```
# Introduction
In `r min_year`, `r total_min_year` international students were enrolled in a higher education institution in the Netherlands. In `r max_year` there were `r total_max_year`, a growth of `r growth_between` over a period of `r diff_years` years.
## A figure
```{r plotperprogram, fig.height = 2, fig.width = 6}
dStudents %>%
ggplot(aes(x = Year, y = n, color = Program)) +
geom_line(size = 1) +
# better values on the x axis:
scale_x_continuous(breaks = seq(from = min_year, to = max_year, by = 2)) +
# add titles and better labels
labs(y = "students",
title = "Incoming international students",
# caption in ggplot is used for source & credits information
caption = "(Data: Nuffic | Adaptation; Author)") +
# another theme
theme_minimal() +
# caption on the left, label x axis on the right
# and set some font sizes
theme(plot.caption = element_text(hjust = 0, size = 8),
plot.title = element_text(size = 10),
axis.title = element_text(size = 9),
axis.title.x = element_text(hjust = 1),
legend.title = element_text(size = 9),
legend.text = element_text(size = 8))
```
## A table
```{r}
dStudents %>%
filter(Year %in% c(min_year, max_year)) %>%
group_by(Year) %>%
mutate(Total_per_year = sum(n),
part = n / Total_per_year,
part = paste0(round(100*part,0), "%")) %>%
select(Program, Year, part) %>%
spread(Year, part) %>%
# caption in kable means title
kable(caption = "Choice of Program, per year",
align = "lrr")
```
## Extra
```{r, fig.width= 8}
dStudents %>%
group_by(Year) %>%
mutate(Total_per_year = sum(n),
part = n / Total_per_year,
part_label = paste0(round(100*part,0), "%")) %>%
ggplot(aes(x = as.factor(Year), y = part, fill = Program, label = part_label)) +
geom_bar(stat = "identity") +
geom_text(position = position_stack(vjust = 0.5),
size = 4,
color = "white",
fontface = "bold") +
theme_minimal() +
theme(plot.caption = element_text(hjust = 0, size = 8),
axis.title.y=element_blank(),
axis.text.y=element_blank(),
axis.ticks.y=element_blank()) +
labs(title = "Relative more students in NL for Research University",
subtitle = "(2006 - 2017)",
x = "Year",
caption = "(Data: Nuffic | Adaptation; Author)")
```