generated from jtr13/EDAVtemplate
-
Notifications
You must be signed in to change notification settings - Fork 0
/
results.Rmd
563 lines (434 loc) · 29.1 KB
/
results.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
# Results
## Monthly Unemployment Rate
In this section, we will visualize how U.S. unemployment rates changed each month from January 2012 to November 2022. We will visualize this time-series data using a line chart. Moreover, we will explore whether unemployment rates have a cyclical pattern for every month during the past eleven years.
### Visualizing Time-series Data: Unemployment Rate
```{r}
# environment set up
library(tidyverse)
library(readxl)
library(tidyr)
library(dplyr)
library(ggplot2)
library(lubridate)
library(forcats)
library("viridis")
```
```{r}
# read in data
unemployed <- read_excel("data/unemployed.xlsx")
unemployrate <- read_excel("data/unemployedrate.xlsx")
```
```{r}
# data preprocess
unemployed_pop <-
unemployed %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "Unemployed")
unemployed_pop$month <- as.integer(factor(unemployed_pop$Month, levels = month.abb))
unemployed_pop <-
unemployed_pop %>%
mutate(date = make_date(Year,month))
```
```{r}
unemployed_rate <-
unemployrate %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "UnemploymentRate")
unemployed_rate$month <- as.integer(factor(unemployed_rate$Month, levels = month.abb))
unemployed_rate <-
unemployed_rate %>%
mutate(date = make_date(Year,month))
```
```{r}
unemployment <-
unemployed_pop %>%
inner_join(unemployed_rate, by = c("Year","Month","date","month"))
```
```{r}
unemployment_p <-
unemployment %>%
pivot_longer(cols = c(Unemployed, UnemploymentRate), names_to = "Type", values_to = "value")
```
```{r}
# plot: line
unemployment <- na.omit(unemployment)
ggplot(unemployment, aes(date, UnemploymentRate)) +
geom_line() +
ggtitle("Unemployment Rates From Jan 2012 to Nov 2022")+
geom_point(data = unemployment, aes(date, UnemploymentRate), color = "deeppink") +
scale_color_viridis_c() +
xlab("Date") +
ylab("Unemployment Rate") +
theme(plot.title = element_text(size=14, vjust=TRUE, hjust=0.5))
```
From the plot above, we can see that the unemployment rate has been decreasing since 2012, which is very impressive. However, as you can see from the plot, the unemployment rate suddenly soared at the beginning of 2020, and it reached its highest point in April 2020 (14.7). After April 2020, it has been decreasing gradually. Till 2022, it almost went back to the unemployment level before Covid-19.
### Exploring Cyclic Pattern of Unemployment Rate
```{r}
unemployment$Month_f = factor(unemployment$Month,
levels=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
ggplot(unemployment, aes(x = date, y = UnemploymentRate)) +
geom_line() +
facet_wrap(~Month_f) +
ggtitle("Unemployment Rate in Different Months") +
labs(x="Date", y = "Unemployment Rate")+
theme(plot.title = element_text(size=14, vjust=TRUE, hjust=0.5)) +
scale_color_viridis_c()
```
As you can see from the plots, each month's unemployment rate trend is very similar. The unemployment rate has been decreasing since 2012 until the beginning of 2020. The unemployment rate suddenly soared, and it reached its highest point in February 2020. After February 2020, it has been decreasing gradually. And we can notice that the unemployment rate in April and May has changed the most dramatically.
## Unemployment Rate by State
In this section, we will visualize the unemployment rates of each state in the U.S. for the last 36 months. We have divided the past 36 months into three periods, listed as Oct 2019 - Sep 2020, Oct 2020 - Sep 2021, and Oct 2021 - Sep 2022. We will explore the distribution of each state’s unemployment rate using a histogram. Also, the average unemployment rate is calculated for each of these three periods for each state, so as to compare each state’s unemployment rate using a Cleveland Dot plot. Moreover, we explore the geographical distribution of unemployment rates using a map.
```{r}
library(rvest)
library(dplyr)
library(tidyverse)
library(ggplot2)
library(ggmap)
library(gridExtra)
library(cdparcoord)
library("readxl")
# get tables from html and merge 12 tables
generate_table <- function(year_1, year_2){
filename = ""
month <- c("10", "11", "12", "01", "02", "03", "04", "05", "06", "07", "08", "09")
cnt <- 2
df <- html_table(read_html(paste("data/", year_1, "10.html", sep="")))[3]
repeat {
if (cnt <= 3) {
filename <- paste("data/", year_1, month[cnt], ".html", sep="")
}
if (cnt >3 && cnt <= 12) {
filename <- paste("data/", year_2, month[cnt], ".html", sep="")
}
if (cnt > 12) {
break
}
df1 <- html_table(read_html(filename))[3]
df <- merge(df, df1)
cnt <- cnt + 1
}
# replace "-" with NA and transform char columns into double columns
df <- df %>% mutate(across(where(is.character) & !c(State), ~as.numeric(na_if(., "-"))))
df <- data.frame(df)
return(df)
}
```
```{r}
# get unemployment rates from October 2019 to September 2020
df_2020 <- generate_table("2019", "2020")
# calculate the mean by state of this year
df_2020$Mean <- rowMeans(df_2020[,2:13])
# get unemployment rates from October 2020 to September 2021
df_2021 <- generate_table("2020", "2021")
# calculate the mean by state of this year
df_2021$Mean <- rowMeans(df_2021[,2:13])
# get unemployment rates from October 2021 to September 2022
df_2022 <- generate_table("2021", "2022")
# calculate the mean by state of this year
df_2022$Mean <- rowMeans(df_2022[,2:13])
# merge the mean over these three years
df <- df_2020["State"]
df["mean_2020"] <- df_2020["Mean"]
df["mean_2021"] <- df_2021["Mean"]
df["mean_2022"] <- df_2022["Mean"]
# considering Puerto Rico contains NA values, remove it
df <- df %>% filter(State != "Puerto Rico")
# for convenience, rename the columns of the dataframe
names(df)[names(df) == "mean_2020"] <- "Oct 2019 ~ Sep 2020"
names(df)[names(df) == "mean_2021"] <- "Oct 2020 ~ Sep 2021"
names(df)[names(df) == "mean_2022"] <- "Oct 2021 ~ Sep 2022"
# transform to tidy data
df_tidy <- df %>% pivot_longer(cols = !State, names_to = "Year",values_to = "mean")
```
### Unemployment Rate Distribution among Each State
```{r}
ggplot(df_tidy, aes(x = `mean`)) +
geom_histogram(alpha=0.9, binwidth = 1, fill="lightblue") +
facet_wrap(~ Year, ncol=1) +
xlab("Average Unemployment Rate (%)") +
ylab("Frequency") +
ggtitle("Unemployment Rate Distribution of Each State in Recent Three Years") +
theme(axis.text.x = element_text(size = 8, vjust=0.5)) +
theme(axis.text.y = element_text(size = 6)) +
theme(plot.title = element_text(size=12, vjust=TRUE, hjust=0.5)) +
scale_color_viridis_c()
```
These three histogram graphs present the distribution of average unemployment rates of each state in the U.S. from October 2019 to September 2020, from October 2020 to September 2021, and from October 2021 to September 2022. The x-axis represents the average unemployment rate (%), and the y-axis represents how many states have such an unemployment rate. From October 2019 to September 2020, each state's unemployment rate ranged from 3% to 12.5%, and most states had more than 5% unemployment rates. There was also a state with an average unemployment rate of over 11%. From October 2020 to September 2021, the average unemployment rate of most states ranged from 3.5% to 7.5%. It is obvious that the median of each state's unemployment rate shifted left, compared to the previous 12 months, which indicates that the unemployed situation was alleviated to some extent. As for the period of October 2021 to September 2022, the median of each state's unemployment rate shifted left further. Most states had decreased their unemployment rate below 5%, verifying that job markets recovered after the Covid-19 pandemic.
### Visualizing Unemployment Rates of Each State by Cleveland Dot Plot
```{r}
# plot the Cleveland plot
ggplot(df, aes(y=reorder(State, `Oct 2019 ~ Sep 2020`))) +
geom_point(aes(x=`Oct 2019 ~ Sep 2020`,color="Oct 2019 - Sep 2020")) +
geom_point(aes(x=`Oct 2020 ~ Sep 2021`,color="Oct 2020 - Sep 2021")) +
geom_point(aes(x=`Oct 2021 ~ Sep 2022`,color="Oct 2021 - Sep 2022")) +
xlab("Unemployment Rate(%)") +
ylab("State")+
ggtitle("Unemploymemt Rates for Recent Three Years of Each State in the U.S.") +
scale_color_viridis_d() +
theme(axis.text.x = element_text(size = 8, vjust=0.5)) +
theme(axis.text.y = element_text(size = 6)) +
theme(plot.title = element_text(size=12, vjust=TRUE, hjust=0.5)) +
scale_color_viridis_d()
```
This Cleveland dot plot exhibits the unemployment rates of each state in the United States during three different periods of recent 36 months. The purple, green, and yellow dots represent the average monthly unemployment rates from October 2019 to September 2020, from October 2020 to September 2021, and from October 2021 to September 2022. The states on the y-axis are ordered by the average monthly unemployment rates between October 2019 and September 2020. Due to the missing data on Puerto Rico, this graph does not include the unemployment rate data of this state. As this graph shows, in the last 36 months, all the states had the highest unemployment rates between October 2019 and September 2020, and approximately ten states reached more than a 7.5% unemployment rate during this period. These extremely high unemployment rates resulted from the burst of the Covid-19 pandemic, which hugely impacted various industries and caused many people to lose their jobs. During this period, Nevada had the highest unemployment rate of over 12.5% among all the states, while South Dakota had the lowest, which is around 4%. However, this difficult unemployed situation was gradually alleviated in the following 24 months. For most U.S. states, the average unemployment rates for the recent 24-12 months and the recent 12 months gradually decreased from October 2020 to September 2022, except for Connecticut. The average unemployment rate in Connecticut between October 2020 and September 2021 was even higher than that between October 2019 and September 2020. Until September 2022, most states had lowered their average unemployment rate for the recent 12 months below 5%.
### Geographical Distribution of Unemployment Rates
```{r}
# merge our dataframe and geographical data
us_states <- map_data("state")
df_tidy$region <- tolower(df_tidy$State)
df_us <- left_join(us_states, df_tidy, by="region")
```
```{r}
# plot the map
ggplot(data = df_us, mapping = aes(x = long, y = lat,
group = group, fill = mean)) +
facet_wrap(~ Year, ncol=1) +
geom_polygon(color = "gray90", size = 0.1) +
coord_map(projection = "albers", lat0 = 39, lat1 = 45) +
scale_fill_gradient2(low = "white", high = "#00A9FF") +
labs(fill = "Unemployment Rate (%)") +
xlab("Longitude") +
ylab("Latitude")+
ggtitle("Unemployment Rates by Each State in Recent Three Years") +
theme(axis.text.x = element_text(size = 8, vjust=0.5)) +
theme(axis.text.y = element_text(size = 6)) +
theme(plot.title = element_text(size=12, vjust=TRUE, hjust=0.5))
```
The above maps show the geographical distribution of average monthly unemployment rates in the U.S. from October 2019 to September 2020, from October 2020 to September 2021, and from October 2021 to September 2022. The shade of color reflects the average unemployment rates during different periods. Deeper blue represents a higher unemployment rate, while lighter blue represents a lower one. We can see the most serious unemployed situations happened from October 2019 to September 2020 from the deeper blue across the U.S. because of the burst of Covid-19. During this period, Nevada had the highest average unemployment rate of over 10% since this state is colored with the deepest blue. In the following 24 months, the unemployment rates in most states gradually dropped as the color on the map gradually becomes lighter. The average unemployment rate from October 2020 to September 2021 in each state was lower than that from October 2019 to October 2020. Also, the average unemployment rate from October 2021 to September 2022 in each state was lower than that from October 2020 to October 2021. These three maps all reveal a pattern that in a specific period, western, southern, and northeastern states suffered more severe unemployed than states in the north and middle, as western, southern, and northeastern states are colored with deeper blue on the map than other states.
## Unemployment Rate by Race
In this section, we will explore how unemployment rates differ among White, Asian, and Black Americans from September 2021 to September 2022. We will visualize these data using a grouped bar chart.
```{r}
# read in data
white_uemp <- read_excel("data/white_unemployed.xlsx")
white_uemr <- read_excel("data/white_unemployment_rate.xlsx")
black_uemp <- read_excel("data/black_unemployed.xlsx")
black_uemr <- read_excel("data/black_unemployment_rate.xlsx")
asian_uemp <- read_excel("data/asian_unemployed.xlsx")
asian_uemr <- read_excel("data/asian_unemployment_rate.xlsx")
```
```{r}
white_uemp <-
white_uemp %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "Unemployed")
white_uemp$month <- as.integer(factor(white_uemp$Month, levels = month.abb))
white_uemp <-
white_uemp %>%
mutate(date = make_date(Year,month))
```
```{r}
white_uemr <-
white_uemr %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "UnemploymentRate")
white_uemr$month <- as.integer(factor(white_uemr$Month, levels = month.abb))
white_uemr <-
white_uemr %>%
mutate(date = make_date(Year,month))
white_uemr$race = "White"
```
```{r}
black_uemp <-
black_uemp %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "Unemployed")
black_uemp$month <- as.integer(factor(black_uemp$Month, levels = month.abb))
black_uemp <-
black_uemp %>%
mutate(date = make_date(Year,month))
```
```{r}
black_uemr <-
black_uemr %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "UnemploymentRate")
black_uemr$month <- as.integer(factor(black_uemr$Month, levels = month.abb))
black_uemr <-
black_uemr %>%
mutate(date = make_date(Year,month))
black_uemr$race = "Black"
```
```{r}
asian_uemp <-
asian_uemp %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "Unemployed")
asian_uemp$month <- as.integer(factor(asian_uemp$Month, levels = month.abb))
asian_uemp <-
asian_uemp %>%
mutate(date = make_date(Year,month))
```
```{r}
asian_uemr <-
asian_uemr %>%
pivot_longer(cols = !Year, names_to = "Month",values_to = "UnemploymentRate")
asian_uemr$month <- as.integer(factor(asian_uemr$Month, levels = month.abb))
asian_uemr <-
asian_uemr %>%
mutate(date = make_date(Year,month))
asian_uemr$race = "Asian"
```
```{r}
race_uemr1 <- rbind(white_uemr,black_uemr)
race_uemr <- rbind(race_uemr1, asian_uemr)
```
```{r}
# plot: grouped bar
race_plot <-
race_uemr %>%
filter(date>= '2021-09-01' & date <= '2022-09-01')
```
```{r}
race_plot$Month_f = factor(race_plot$Month,
levels=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
```
```{r}
ggplot(race_plot, aes(x = date, y = UnemploymentRate, fill = race)) +
geom_col(position = "dodge")+
labs(x="Date", y="Unemployment Rate") +
scale_fill_viridis_d() +
ggtitle("Unemployment Rates for Different Races from September 2021 to September 2022") +
theme(plot.title = element_text(size=12, vjust=TRUE, hjust=0.5))
#scale_x_date(limits = c(as.Date("2021-09-01"), as.Date("2022-09-01")), date_labels = "%b %Y")
```
From the grouped bar chart above, we first notice that the general unemployment rate trend from Sept 2021 to Sept 2022 was slightly decreasing. Among the three races we investigated, Black Americans showed the highest unemployment rate during this period, which was twice as much as the unemployment rate of either White Americans or Asian Americans. In addition, we can notice that before January 2022. the unemployment rate of Asian Americans was higher than White Americans, while White Americans displayed a higher unemployment rate than Asian Americans after January 2022.
## Unemployment Situation by Age and Sex
In this section, we will explore how unemployment situations are affected by different ages and sex over various periods. First, we will investigate the unemployed population and unemployment rates in September 2019, 2020, 2021, and 2022 with stacked bar charts, grouped by sex and age. Second, we will visualize the unemployment rates from September 2021 to September 2022 by various ages using Heatmaps. Then, we categorize the unemployed population into three classes, and we will explore the relationships among sex, age, and unemployed population from September 2021 to September 2022 using mosaic plots.
```{r}
# library(tidyverse)
library(ggplot2)
library(magrittr) # needs to be run every time you start R and want to use %>%
library(ggplot2)
library(dplyr)
library(tidyr)
library(grid)
library(vcd)
library(vcdExtra)
library(dplyr)
library(redav)
```
```{r}
population = read.csv("data/upop.csv")
rate = read.csv("data/urate.csv")
rate_monthly_raw = read.csv("data/urate_monthly.csv")
population_monthly_raw = read.csv("data/upop_monthly.csv")
rate_monthly = read.csv("data/urate_monthly.csv")
population_monthly = read.csv("data/upop_monthly.csv")
colnames(population) <- c('Age','Sex','2019Q3','2020Q3','2021Q3','2022Q3','2019Sep','2020Sep','2021Sep','2022Sep')
colnames(rate) <- c('Age','Sex','2019Q3','2020Q3','2021Q3','2022Q3','2019Sep','2020Sep','2021Sep','2022Sep')
quarter_ppl <- population[,c(1, 2, 3, 4, 5, 6)]
sep_ppl <- population[,c(1, 2, 7, 8, 9, 10)]
quarter_rate <- rate[,c(1, 2, 3, 4, 5, 6)]
sep_rate <- rate[,c(1, 2, 7, 8, 9, 10)]
colnames(rate_monthly) <- c('Age','Sex','Sep21','Oct21','Nov21','Dec21',
'Jan22','Feb22','Mar22','Apr22','May22','Jun22',
'Jul22','Aug22','Sep22')
colnames(population_monthly) <- c('Age','Sex','Sep21','Oct21','Nov21','Dec21',
'Jan22','Feb22','Mar22','Apr22','May22','Jun22',
'Jul22','Aug22','Sep22',"mean_ppl")
# The following data is for Stacked Bar Charts.
quarter_ppl <- quarter_ppl %>%
pivot_longer(!Sex & !Age, names_to = "Year", values_to = "Unemployment_ppl")
quarter_ppl <- quarter_ppl %>% filter(quarter_ppl$Sex != "Total")
sep_ppl <- sep_ppl %>%
pivot_longer(!Sex & !Age, names_to = "Year", values_to = "Unemployment_population")
sep_ppl <- sep_ppl %>% filter(sep_ppl$Sex != "Total")
sep_ppl$Sex <- factor(sep_ppl$Sex, levels = c('Women','Men'))
quarter_rate <- quarter_rate %>%
pivot_longer(!Sex & !Age, names_to = "Year", values_to = "Unemployment_rate")
quarter_rate <- quarter_rate %>% filter(quarter_rate$Sex != "Total")
sep_rate <- sep_rate %>%
pivot_longer(!Sex & !Age, names_to = "Year", values_to = "Unemployment_rate")
sep_rate <- sep_rate %>% filter(sep_rate$Sex != "Total")
sep_rate$Sex <- factor(sep_rate$Sex, levels = c('Women','Men'))
# The following data is for Heatmap.
rate_monthly <- rate_monthly %>%
pivot_longer(!Sex & !Age, names_to = "Month", values_to = "Unemployment_rate")
rate_monthly <- rate_monthly %>% filter(rate_monthly$Sex == "Total")
population_monthly <- population_monthly[,c(1, 2,3,4,5,6, 7, 8, 9, 10,11,12,13,14,15)]
population_monthly <- population_monthly %>%
pivot_longer(!Sex & !Age, names_to = "Month", values_to = "Unemployment_ppl")
population_monthly <- population_monthly %>% filter(population_monthly$Sex != "Total")
a <- 7000
b <- 15000
population_monthly$age_category[population_monthly$Age == "35 to 44 years"]<- '>= 35 years old'
population_monthly$age_category[population_monthly$Age == "45 to 54 years"]<- '>= 35 years old'
population_monthly$age_category[population_monthly$Age == "55 years and over"]<- '>= 35 years old'
population_monthly$age_category[is.na(population_monthly$age_category)]<- '< 35 years old'
population_monthly$category[population_monthly$Unemployment_ppl <= a] <- '<= 7000'
population_monthly$category[population_monthly$Unemployment_ppl <= b & population_monthly$Unemployment_ppl> a] <- '> 7000 and <= 15000'
population_monthly$category[population_monthly$Unemployment_ppl>= b] <- '> 15000'
rate_monthly$Month <- factor(rate_monthly$Month, levels = c('Sep21','Oct21','Nov21','Dec21',
'Jan22','Feb22','Mar22','Apr22','May22','Jun22',
'Jul22','Aug22','Sep22'))
counts3 <- population_monthly %>%
group_by(Sex, age_category, category) %>%
summarize(Freq = n()) %>%
ungroup() %>%
complete(Sex, age_category, category, fill = list(Freq = 0))
counts3$category[counts3$category == '<= 7000' ] <- 'low'
counts3$category[counts3$category == '> 7000 and <= 15000' ] <- 'medium'
counts3$category[counts3$category == '> 15000' ] <- 'high'
counts3$category <- factor(counts3$category, levels = c('low','medium','high'))
colnames(counts3) <- c('Sex','Age','Unemployment_Population','Freq')
```
### Visualizing Unemployment by Stacked Bar Charts
```{r}
ggplot(sep_ppl, aes(fill=Sex, y=Unemployment_population, x=Age)) +
geom_bar(position="stack", stat="identity")+
facet_wrap(~Year) +
coord_flip()+ggtitle("Unemployment Population Since Covid-19 ") + theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_viridis_d()
```
```{r}
ggplot(sep_rate, aes(fill=Sex, y=Unemployment_rate, x=Age)) +
geom_bar(position="stack", stat="identity")+
facet_wrap(~Year) +
coord_flip()+ggtitle("Unemployment Rate Since Covid-19 ") + theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_viridis_d()
```
The above two groups of stacked bar charts show the unemployment population and unemployment rates during the September of each year since 2019. These data are stacked by gender and grouped by age intervals. We can see several trends from these stacked bar charts.
First, as we can see from the stacked bar charts of the unemployment population, the number of unemployed people is small between the age of 16 and 24, compared to that of people over 24. But the unemployment rate is much higher among people between 16 and 24 than people over 24. It is reasonable that the labor force of young people (16 to 24) is relatively smaller because many people at this age are still in school.
Second, we can see from the unemployment rate charts that the ratio between male unemployment and female unemployment has been close to 1:1 since 2019. It implies that unemployment after Covid-19 does not affect each gender more or less. This phenomenon could be explained by the fact that sex discrimination in the labor market has been paid more attention to by more companies, and employment does not give preference to each gender.
Third, we observe that during 2020, the unemployment rate increased dramatically compared to previous years due to Covid-19. Such a situation improved in 2021 when the unemployment rate dropped to a lower level. With the economic recovery, the unemployment rate for employers in most age intervals reached a very low level. However, the unemployment rate for young people aged from 16 to 24 is still high. It is because the economy has just recovered, and many companies need some time to return to the economic level before Covid. Thus, they may not have enough money and efforts to hire and train young employees.
### Visualizing Unemployment Rates among Different Ages by Heatmap
```{r}
ggplot(rate_monthly, aes(x = Month, y = Age)) +
geom_tile(aes(fill = Unemployment_rate)) +
scale_fill_gradient2(low = "black", mid = "white",
high = "red", midpoint = 5) +
coord_fixed() + xlab("Month") + ylab("Age") +ggtitle("Unemployment Rate From Sep 2021 to Sep 2022") + theme(plot.title = element_text(hjust = 0.5)) +
scale_fill_viridis_c() +
theme(axis.text.x = element_text(angle = 60, size = 8, vjust=0.5))
```
The heatmap plots the monthly unemployment rate separated by age. Each box within the heatmap corresponds to an age interval and a month between September 2021 and September 2022, which gives a value for the unemployment rate. We can get two primary observations from this heatmap.
First, it can be seen vertically that the lower part is more yellowish than the higher part. The more yellowish the color is in the heatmap, the higher the corresponding unemployment rate is. Also, the lower part corresponds to a younger age interval. Based on this information, we can conclude that within each month from September 2021 to September 2022, there exists a higher unemployment rate among younger workers than among more experienced workers.
Second, it can be seen horizontally that the colors on the left-hand side are more yellowish on the bottom and more whitish on the top, while the colors on the right-hand side are less yellowish on the bottom and darker on the top. It shows a trend that the unemployment rates among all age intervals have decreased constantly since September 2021. It implies the recovery of the overall labor market after Covid-19.
### Visualizing Unemployed Population, Age, and Sex by Mosaic Plot
```{r}
vcd::mosaic(Unemployment_Population ~ Age + Sex, counts3,
direction = c("v", "v", "h"))
```
The mosaic plot shows the unemployment condition by depicting the proportion of low, medium, and high unemployment population groups based on both genders and two age intervals (less than 35 years old / more than 35 years old). The data is a monthly unemployment population data from September 2021 to September 2022. The plot shows several trends.
First, as the proportion of high unemployment population groups is higher among people over 35 years old, workers less than 35 years old might face a less difficult unemployment situation than those over 35 years old. There can be multiple explanations for this phenomenon. For example, the cardinality of working people over 35 years old could be larger than those under 35 years old.
Second, for both age intervals, the proportion of highly unemployed groups is more significant in males than females. It might be explained by various reasons, such as the higher attention paid to gender equality in the workplace.
## Relationships among Unemployed Population, GDP, and CPI
As we all know, unemployment has a strong connection with the development of an economy. GDP and CPI are two essential indicators in economics. GDP serves as a vital indicator to capture a country's economic output, while CPI reflects price changes and the customers' purchasing ability. In this section, we will attempt to find what factors will influence the unemployed population. We will explore the relationships between the unemployed population, GDP, and CPI.
```{r}
# read the CPI data and process
cpi_data <- read_excel("data/CPI.xlsx")
cpi_data <- cpi_data %>% pivot_longer(cols = !Year, names_to = "Month",values_to = "CPI_Value")
cpi_data <- cpi_data %>% filter(!(`Year` == 2022 & `Month` == "Nov")) %>% filter(!(Year == 2022 & Month == "Dec"))
# read the unemployed data and process
unemploy_data <- read_excel("data/unemploy.xlsx")
unemploy_data <- unemploy_data %>% pivot_longer(cols = !Year, names_to = "Month",values_to = "Unemployment Population")
unemploy_data <- unemploy_data %>% filter(!(`Year` == 2022 & `Month` == "Nov")) %>% filter(!(Year == 2022 & Month == "Dec"))
# read the GDP data
gdp_data <- read.csv(file = 'data/GDP.csv')
df_corr <- data.frame(Year = as.vector(cpi_data $Year),
CPI = as.vector(cpi_data $CPI_Value),
`Unemployment_Population` = as.vector(unemploy_data$`Unemployment Population`),
GDP = as.vector(gdp_data$GDP))
# mark different periods
df_corr <- df_corr %>% mutate(Period = case_when(Year <= 2019 ~ 'Pre-Covid',
TRUE ~ 'After-Covid'))
```
```{r}
# plot the parallel coordinate plot
ggparcoord(df_corr, columns=2:4, groupColumn = 5, alpha=0.5, scale='uniminmax',
title = "Parallel Coordinate Plot for Unemployed Population, CPI & GDP from 2012 to 2022") +theme(plot.title = element_text(size=12, vjust=TRUE, hjust=0.5)) +
scale_color_viridis_d()
```
This parallel coordinate plot exhibits the relationship between the unemployed population, GDP, and CPI from January 2012 to October 2022. Each line represents a piece of monthly data on the unemployed population, GDP, and CPI. The data before 2020 is considered as the Pre-Covid period and labeled with yellow, and the data after 2020 (inclusive) is considered as the After-Covid period and labeled with purple. All the unemployed population, GDP, and CPI data are standardized. Both before and after Covid-19, CPI negatively corresponded to the unemployed population. In other words, a higher CPI corresponds to a lower unemployed population and vice versa. However, during the past ten years, the several high CPI and unemployed population data were all from the After-Covid period. We can infer that the Covid-19 pandemic resulted in severe inflation and job loss, significantly impacting people's lives. Moreover, the unemployed population also negatively corresponded to GDP in both Pre-Covid and After-Covid periods. That is, a higher unemployed population corresponds to a lower GDP. In the past ten years, most GDP values after Covid-19 were still higher than before Covid-19, indicating that generally speaking, GDP in the U.S. had grown gradually in the past ten years. However, due to the enormous impact of the Covid-19 pandemic, several pieces of GDP data were lower than some Pre-Covid GDP values and connected with high unemployed population data.