-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
597 lines (498 loc) · 29 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
---
title: "Jersey City Demolitions 2018-2022: Part II"
output: github_document
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```
Sarah Ligon ([\@ligonish](https://github.com/ligonish))\
November 6, 2022
In the middle of 2022 --- the same year Jersey City reported [demolishing more housing units than any municipality in New Jersey](https://www.nj.gov/dca/divisions/codes/reporter/2021yearly/DEMO_21.pdf) --- the *New York Times* declared it "the [most expensive city in which to rent a home in the United States](https://www.nytimes.com/2022/07/28/realestate/which-city-is-most-expensive-for-renters-you-might-be-surprised.html)".
This second installment in a three-part series, originally written for Lucy Block & Oksana Miranova's "Spatial Analysis & Data Visualization" grad course at NYU, investigates patterns in the housing demolitions Jersey City administrators have approved in the five years since New Jersey began digitizing municipal construction permit records.
Full write-up of this analysis appears in ["Whose House is Jersey City Tearing Down? Part 2 of 3".](https://medium.com/@srligon/whose-house-is-jersey-city-tearing-down-part-ii-3212b1514d6f)
See also [Part 1](https://medium.com/@srligon/whose-house-is-jersey-city-tearing-down-c1092cdbbc43) & [Part 3](https://medium.com/@srligon/whose-house-is-jersey-city-tearing-down-903cdeaa6c6a).
### Data Sources
- [NJOIT Open Data Center, NJ Construction Permit Data](https://data.nj.gov/Reference-Data/NJ-Construction-Permit-Data/w9se-dmra)
- [Rutgers University + NJ Dep't of Community Affairs, N.J. MOD-IV Historical Database](http://modiv.rutgers.edu) (See "documentation" folder for detailed variable descriptions)
- [2018 International Building Code, New Jersey Edition](https://codes.iccsafe.org/content/NJBC2018P2/chapter-3-occupancy-classification-and-use#NJBC2018P2_Ch03_Sec310)
- [Zillow Home Value Index (ZHVI) Condo/Co-op Timeseries](https://www.zillow.com/research/data/), by ZIP
- [OpenStreetMap](https://www.openstreetmap.org) geolocation via [tidygeocoder](https://cran.r-project.org/web/packages/tidygeocoder/index.html)
### Dependent Libraries
```{r, message = FALSE}
library(tidyverse) # data cleaning & manipulation
library(janitor) # data cleaning & easy table percentages
library(lubridate) # date variable normalization
library(viridis) # plot color schemes
library(hrbrthemes) # more plot aesthetics
library(gghighlight) # coloring specific features of line graphs
library(gganimate) # Fun With Headaches
```
### Loading Jersey City Demolitions Data
NJ.gov's Open Data portal updates its construction dataset biweekly. The latest numbers are accessible via API, but require a unique user token to access. Here's the code I'm using to grab updates directly ---
```{r, eval=FALSE, output=FALSE}
library(RSocrata)
rm(url <- "https://data.nj.gov/resource/w9se-dmra.json?TreasuryCode=0906&Permit Type=13"
raw_jc_demos <- read.socrata(url = url, app_token = token)
```
--- but for replicability, I've included the same data (current as of November 06, 2022) as a more directly accessible .csv:
```{r, message = FALSE}
jc_raw <- read_csv("data/jc_raw_2022_11_06.csv") %>%
clean_names
```
I cleaned this up by selecting the variables we'll use in our analysis and recoding the residential use descriptors to align with NJ's 2018 International Building Code (rather than leaving them labelled merely "International residential use code"). See this useful breakdown of finer technical distinctions between R-3 and R-5 in NJ; it's useful to note that buildings with 1-2 residential units that are more than 3 stories typically aren't enormous brownstones, but rather smaller apartments located over a street-level shop.
I also cleaned up the block_lot record identifier to allow easier geocoding down the line; in the raw data, lots were often stored with decimal-value suffixes, which we separate out here to more closely pinpoint merges with other parcel-based datasets.
To enable demolition record counts by year, I added a variable whose value is the most recent year each record received its latest green-light from the city (whether that came in the form of a permit or a certificate of approval).
```{r}
jc_demos <- jc_raw %>%
select(recordid,
block,
lot,
permitstatusdesc,
permitdate,
certdate,
permittypedesc,
certtypedesc,
certcount,
buildfee,
plumbfee,
electfee,
firefee,
dcafee,
certfee,
elevfee,
otherfee,
totalfee,
constcost,
salegained,
rentgained,
usegroup,
usegroupdesc,
processdate) %>%
mutate(
lot_ext = str_extract(lot, "(?<=\\.)[:digit:]+"),
lot = case_when(
is.na(lot_ext) ~ lot,
lot_ext > 0 ~ str_extract(lot, "[:digit:]+(?=\\.?)")),
block_lot = paste0(block, "_", lot),
ibcnj_use_desc = case_when(
usegroup == "R-1" ~ "Hotels, motels, boarding houses etc",
usegroup == "R-2" ~ "3 or more units",
usegroup == "R-3" ~ "1-2 units, >3 stories",
usegroup == "R-5" ~ "1-2 units, <= 3 stories"
),
permitdate = date(permitdate),
certdate = date(certdate),
last_pc_year = case_when(
certdate > permitdate ~ year(certdate), # signoff yr
permitdate >= certdate | is.na(certdate) ~ year(permitdate)
)) %>%
relocate(lot_ext, block_lot, .after = lot)
```
That's about as far as we got in Part I -- we're just seeing it above in R format rather than relying on Google Sheets.
### Bring In Tax Data
Now that we're using R, however, we can add a crucial next step: the street address of each demolished parcel, and the actual number of dwelling units it was assessed as containing in tax year just before our 2017 - present set begins. I chose 2016 as a close-but-not-overlapping index property tax year. These records are not available through the Hudson County Tax Assessor website, but can be downloaded with a free login from the State of New Jersey and Rutgers University's property tax record archive. I'll import a pre-downloaded .csv of all Jersey City parcel records from the 2016 tax year.
```{r, message = FALSE}
mod_iv_2016 <- read_csv("data/mod_iv_data_2016_jc.csv") %>% #61,595 obs
clean_names()
```
Cleaning the tax data similarly to the process followed for demolition data, since we'll be cross-referencing the two sets:
```{r}
mod_iv_2016 <- mod_iv_2016 %>%
select(
property_id_blk,
property_id_lot,
property_id_qualifier,
qualification_code_name,
property_class,
property_class_code_name,
property_location,
building_description,
land_description,
zoning,
owner_name,
number_of_owners,
deed_date_mmddyy,
sale_price,
number_of_dwellings
) %>%
mutate(property_id_lot = as.character(property_id_lot),
lot_ext = str_extract(property_id_lot, "(?<=\\.)[:digit:]+"),
property_id_lot = case_when(
is.na(lot_ext) ~ property_id_lot,
lot_ext > 0 ~ str_extract(property_id_lot, "[:digit:]+(?=\\.?)")),
block_lot = paste0(property_id_blk, "_", property_id_lot),
number_of_dwellings = as.integer(number_of_dwellings)
) %>%
relocate(lot_ext, block_lot, .after = property_id_lot)
```
I ran a quick check for duplicates or odd records before merging address, building description, and dwelling unit counts into each demolition record:
```{r}
merge_test <- anti_join(jc_demos, mod_iv_2016, by="block_lot")
# none of these are residential properties
# 13002_4 was entered incorrectly; tax data shows it was a vacant lot since at least the 90s.
```
Merge tax data into demo data to capture street address, building descriptions, basic owner entity information, and residential unit counts for each parcel:
```{r}
tax_demo_merge <- jc_demos %>%
left_join(mod_iv_2016, by = "block_lot") %>%
arrange(recordid) %>%
filter(recordid!=lag(recordid) | is.na(lag(recordid)))
```
Check for duplicates
```{r}
merge_dupes <- get_dupes(tax_demo_merge, block_lot) # all good!
```
Isolate residential demos, first by pulling all records that Jersey City demolition reports labelled as residential usegroups:
```{r}
jc_res_demos <- tax_demo_merge %>%
filter(str_starts(usegroup, 'R'))
```
But since these are messy sets, I went a step farther and checked for possibly mis-recorded residential demos (where a given parcel's use is labelled non-residential on demolition reports, but tax records for the same parcel label it as containing some non-zero count of dwelling units.)
```{r}
poss_res_demos <- tax_demo_merge %>%
filter(! str_starts(usegroup, 'R')) %>%
filter(! is.na(number_of_dwellings)) %>%
filter(property_class_code_name == "Residential")
# 19 aren't listed as res demo, but tax set says they contain housing units
```
I added these into our main working residential demolition set.
Time to make some unit estimates. Jersey City demolition applications only record the number of for-sale units projected lost and the number of rental units projected lost; in most cases, the same value is also double-entered in both columns, making it impossible to sum either variable (in other words, a two-bedroom house whose tax records list it as a two-bedroom house is usually entered by Jersey City as a demolition resulting in -2 sales units *and* -2 rental units, which would wrongly give a total of 4 units lost, even though only 2 units total were demolished.)
To counter this, I wrote a series of if/else statements telling R to make two demolished-unit estimates: the first, a conservative count of demo-reported units (using the demo records' reported count if sales and rental units report different values; if values were identical, I used only one of them. Buildings reporting zeroes for all residential units were left with zeroes.)
For a separate, slightly more accurate unit estimate, in cases where demolition records claimed a parcel contained zero dwelling units of any kind, but tax records listed the parcel with a non-zero number of dwelling units, I imputed the number of tax-assessed dwelling units in place of the zero value.
```{r}
jc_res_demos <- jc_res_demos %>%
rows_append(poss_res_demos) %>%
mutate(described_units = as.integer(str_extract(building_description, pattern = "[:digit:]+(?=U)"))) %>%
mutate(salegained = abs(salegained),
rentgained = abs(rentgained)) %>%
mutate(declared_loss= case_when(
salegained != rentgained ~ salegained + rentgained,
salegained == rentgained & salegained != 0 ~ salegained,
salegained + rentgained == 0 ~ 0)) %>%
mutate(unit_estimate = ifelse(declared_loss == 0, described_units, declared_loss))
```
Here's a summary of unit count estimates based on from demo records, tax assessor building descriptions, and the conservative estimate using the former but imputing the latter where zero units were claimed demolished:
```{r}
unit_counts <- jc_res_demos %>%
group_by(last_pc_year) %>%
filter(last_pc_year > 2017) %>%
summarise(sale = sum(salegained, na.rm = T),
rent = sum(rentgained, na.rm = T),
declared = sum(declared_loss, na.rm = T),
described = sum(described_units, na.rm = T),
estimate = sum(unit_estimate, na.rm = T)) %>%
adorn_totals()
```
### Where Are All These Parcels?
Now we have street addresses, but no information that would let us make some broad-strokes initial forays into how residential demolitions were clustered by neighborhood. The NJ tax dataset we merged provides parcels with a street and building number, but does not include ZIP codes. To get those, along with latitude and longitude pinpoints for mapping in Part 3, I reverse-geolocated each parcel's street address to test out the relatively new tidygeocoder() package. This was a s l o o o w process, so I'll upload the location data as a .csv below; code included here for replication purposes.
```{r, eval=FALSE, output=FALSE}
library(tidygeocoder)
city <- "Jersey City"
state <- "NJ"
testset <- jc_res_demos %>%
select(block_lot, property_location) %>%
mutate(city = city,
state = state)
clean_test <- testset %>%
mutate(norm_address = normal_address(property_location),
norm_city = normal_city(city),
norm_state = normal_state(state),
.keep = "unused")%>%
geocode(street = norm_address,
city = norm_city,
state = norm_state,
method = "osm")
rev_test <- clean_test %>%
reverse_geocode(lat = lat,
long = long,
address = norm_address,
method = "osm",
full_results = TRUE)
location_data <- rev_test %>% select(block_lot, postcode, lat, long)
write_csv(location_data, "jc_res_demos_coords_zips_11_07_2022.csv")
```
Instead, we'll just upload that .csv directly:
```{r, message = FALSE}
location_data <- read_csv("data/jc_res_demos_coords_zips_11_07_2022.csv")
```
And stick the location data onto our parcel demolition records, removing nine duplicates that resulted from the merge by telling R to keep only distinct record IDs:
```{r}
jc_res_demos <- jc_res_demos %>%
left_join(location_data, by = "block_lot")%>%
relocate(property_location, postcode, lat, long, .after = block_lot) %>%
arrange(recordid) %>%
filter(recordid!=lag(recordid) | is.na(lag(recordid))) %>%
mutate(postcode = replace(postcode, postcode == "08016", "07304"))
```
### Visualizations
Now that we have tax-assessed dwelling unit counts, we can highlight some significant possible discrepancies (possible, significant discrepancies?) between the housing-unit counts listed in each demolition record and those recorded on the same parcel's property tax records.
#### Plot 1
```{r, fig.width = 9}
unit_counts %>%
select(last_pc_year, declared, estimate) %>%
pivot_longer(2:3, names_to = "count_type", values_to = "units") %>%
mutate(count_type = factor(count_type),
last_pc_year = factor(last_pc_year)) %>%
ggplot(aes(fill = count_type,
y = units,
x = last_pc_year)) +
geom_bar(position = "dodge",
stat = "identity",
show.legend = T)+
labs (x = NULL, y = "City-Approved Residential Unit Demolitions",
title = "When Jersey City's 2018-2022 Demolition Records Say a Building \nHad No Living Units, Earlier Property Tax Records May Disagree.",
subtitle = "Jersey City requires demolition applicants to report how many dwelling units the building contains. \nHudson County Tax Assessor unit counts for those same properties are often significantly higher.",
caption = "Source: NJ Construction Permit Data retrieved 11/06/2022 via NJOIT Open Data Center; \n2016 Jersey City property tax records retrieved 11/06/2022 via N.J. MOD-IV Historical Database.",
fill = NULL)+
geom_text(
aes(label = units),
color = "white",
size = 4,
position = position_dodge(.9),
hjust = 1.25,
family = "Roboto Condensed"
)+
scale_fill_manual(values = c("azure4", "#21918c"),
labels=c('Housing Units Reported on Demolition Records', 'Housing Units Reported by Tax Assessor'))+
scale_x_discrete(limits = rev)+
coord_flip()+
theme_ipsum_rc(grid = FALSE)+
theme(axis.text.x=element_blank(),
legend.position = "top",
)
```
#### PLOT 1.b
The other thing we weren't able to do last time was check the proportion of all JC-approved demolitions that were residential -- since we were using Google Sheets as a primary tool, there was no way I could load a potentially huge set of observations into a single tab. Now that we're using R, we can look at *all* Jersey City demolitions and examine how people were using the buildings before they came down.
I started by cleaning up the initial demo set a bit to accommodate visualization (those are some very, very long labels)
```{r}
res_spread <- jc_demos %>%
mutate(usegroup = factor(usegroup)) %>%
mutate(ibcnj_use_desc = factor(case_when(
usegroup == "R-1" ~ "Hotels, motels, boarding houses etc",
usegroup == "R-2" ~ "Housing: 3 or more units",
usegroup == "R-3" ~ "Housing: 1-2 units (over 3 stories)",
usegroup == "R-5" ~ "Housing: 1-2 units (3 stories or less)",
usegroup == "U" ~ "Accessory bldgs & misc. structures",
usegroup == "A-3" ~ "Lecture halls, galleries, churches, etc.",
usegroup == "A-4" ~ "Indoor sporting venues/arenas/pools",
usegroup == "I-4" ~ "Inst. adult/child day care, 6+ occ.",
TRUE ~ usegroupdesc)
)) %>%
group_by(ibcnj_use_desc) %>%
summarise(demos = n_distinct(recordid))
```
and then put them into a lollipop chart that is very cool/horrifying at wide resolution, but needs retooling to be visible in smaller formats. Another day! Meanwhile, yes: not only does JC consistently report the most demolitions of any municipality in New Jersey, but it also appears to be tearing down residential-use parcels almost exclusively.
```{r, fig.width = 9}
res_spread %>%
mutate(ibcnj_use_desc = fct_reorder(ibcnj_use_desc, demos)) %>%
ggplot(aes(x = ibcnj_use_desc, y = demos)) +
geom_segment(aes(xend = ibcnj_use_desc,
yend = 0),
color = "dimgrey") +
geom_point(size = 4, color = c("dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "#21918c", "#21918c", "#21918c", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey")) +
labs (x = NULL,
y = "Total JC-approved parcel demolitions, 2018-2022",
title = "How Had People Been Using the Buildings Jersey City \nApproved for Demolition in 2018-2022?",
subtitle = "Jersey City has the highest building teardown rate in NJ. Most were housing.",
caption = "Source: NJ Construction Permit Data retrieved 11/06/2022 via NJOIT Open Data Center.")+
coord_flip() +
geom_text(aes(label = demos),
size = 4,
nudge_y = 10,
family = "Roboto Condensed",
fontface = c("plain", "plain", "plain", "plain", "plain", "plain", "bold", "bold", "bold", "plain", "plain", "plain", "plain", "plain", "plain"),
color = c("dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "#21918c", "#21918c", "#21918c", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey")
)+
theme_ipsum_rc(grid = F,
axis = "y",
axis_title_just = "l")+
theme(axis.text.x = element_blank(),
axis.text.y = element_text(
color = c("dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "dimgrey", "#21918c", "#21918c", "#21918c")
))+
expand_limits(x = 0, y = 400)
```
#### PLOT 2
Jersey City ZIP codes are often used as shorthand for neighborhoods and communities, in a way similar to NYC's borough or subway-line identities. Here I'm looking at total number of parcels in each ZIP district the City approved each year (since I'm guessing there will be some strong patterns in specific neighborhoods).
First I check how many records weren't properly reverse-geolocated: 35 of 664. That's small enough to add in by hand, or more likely, handle during the geolocation section of the semester.
```{r}
zip_missing <- jc_res_demos %>%
summarise(missing = sum(is.na(postcode))) # 35 of 664 missing
```
You could make this plot using our main dataframe, but I'm starting with a summary table of both parcel and unit demolitions, by ZIP and year. The table lists both annual and cumulative totals over the five-year period; that last variable is just for easier ggplot2 labelling.
```{r}
plot_2 <- jc_res_demos %>%
filter(last_pc_year > 2017) %>%
group_by(last_pc_year, postcode) %>%
summarise(demos = n_distinct(recordid),
units = sum(unit_estimate, na.rm = T)) %>%
ungroup() %>%
group_by(postcode) %>%
arrange(last_pc_year, postcode) %>%
mutate(cumulative_demos = cumsum(demos),
plot_2_labs = case_when(last_pc_year == 2022 ~ paste0(cumulative_demos, " in ", postcode, " since 2018.")))
```
Cumulative unit demolition totals, by ZIP, 2018-2022:
```{r, fig.width = 9}
plot_2 %>%
filter(postcode != "NA",
postcode != "07310",
postcode != "07087") %>%
ggplot(aes(x = last_pc_year,
y = cumulative_demos,
group = postcode,
color = postcode))+
geom_line(size = 1.5,
alpha = .6)+
labs (x = NULL, y = "Total City-Approved Residential Parcel Demolitions",
title = "How Many Housing Teardowns Has Jersey City Approved \nin Each ZIP Code Since 2018?",
subtitle = "Cumulative total residential buildings whose demolitions Jersey City accepted and \nreported to the NJ Dep't of Community Affairs, Jan. 2018 - Sept. 2022.",
caption = "Source: NJ Construction Permit Data retrieved 11/06/2022 via NJOIT Open Data Center; Jersey City \nproperty tax records retrieved 11/06/2022 via N.J. MOD-IV Historical Database; ZIP data via Open Street Map.")+
geom_text(aes(label = plot_2_labs,
y = cumulative_demos,
color = postcode),
hjust=-.07,
family = "Roboto Condensed",
fontface = "bold",
size = 4,
nudge_y = c(-7, 20, 0, 0, -5)) +
scale_color_viridis(discrete = TRUE, end = .8, direction = -1) +
expand_limits(x = c(2018, 2024))+
guides(color = "none")+
theme_ipsum_rc(grid = F, subtitle_size = 12)+
theme(plot.caption = element_text(hjust = 0),
axis.text.x = element_text(color = c("black", "black", "black", "white")))
```
I wanted to see what this looked like using discrete annual totals rather than a rolling cumulative sum, but since we already tried one version of this graph, I tired out animating the yearly-total version.
```{r}
plot_2 %>%
filter(postcode != "NA",
postcode != "07310",
postcode != "07087") %>%
ggplot(aes(x = last_pc_year,
y = demos,
group = postcode,
color = postcode))+
geom_line(size = 1.5,
alpha = .7)+
geom_point(aes(group = postcode)) +
labs (x = NULL, y = "City-Approved Residential Parcel Demolitions",
title = "How Many Yearly Housing Teardowns Has Jersey City \nApproved in Each ZIP Code Since 2018?",
subtitle = "Annual totals, by ZIP code, of residential buildings whose demolitions Jersey City \naccepted and reported to NJ Dep't of Community Affairs, 2018 - 2022.",
caption = "Source: NJ Construction Permit Data retrieved 11/06/2022 via NJOIT Open Data Center; Jersey City \nproperty tax records retrieved 11/06/2022 via N.J. MOD-IV Historical Database; ZIP data via Open Street Map.")+
geom_text(aes(label = postcode,
y = demos,
color = postcode),
hjust=-.1,
family = "Roboto Condensed",
fontface = "bold",
size = 6,
nudge_y = c(2, 2, 0, 0, -2)) +
theme(plot.caption = element_text(hjust = 0))+
expand_limits(x = c(2018, 2022.5))+
scale_color_viridis(discrete = TRUE, end = .85, direction = -1) +
guides(colour = "none") +
theme_ipsum_rc(grid = "XY")+
transition_reveal(last_pc_year) +
ease_aes('cubic-in-out')
anim_save("plots/plot_2_annual_animated.gif")
```
#### Plot 3
Here's a summary-table df setting up total parcels demolished in each ZIP code.
```{r}
plot_3 <- jc_res_demos %>%
filter(last_pc_year > 2017) %>%
mutate(postcode = factor(postcode)) %>%
group_by(postcode) %>%
summarise(demos = n_distinct(recordid),
units = sum(unit_estimate, na.rm = T))
```
Plot 3
```{r, fig.width = 9}
plot_3 %>%
filter(postcode != "NA",
postcode != "07087") %>%
mutate(postcode = fct_reorder(postcode, units)) %>%
ggplot(aes(x=postcode, y=units)) +
geom_segment(aes(xend=postcode, yend=0)) +
geom_point(size = 6, color = "#35b779") +
geom_text(
aes(label = units),
color = "black",
size = 4,
nudge_y = 15,
family = "Roboto Condensed",
fontface = "bold"
)+
labs (x = "ZIP Code",
y = "Estimated residential units approved for demolition by City of Jersey City, 2018-2022",
title = "The Heights Lost the Most Housing Units to City-Approved Demolitions.",
subtitle = "Journal Square and the West Side weren't far behind. Waterfront ZIPs lost the fewest units over the last five years.",
caption = "Source: NJ Construction Permit Data retrieved 11/06/2022 via NJOIT Open Data Center; Jersey City \nproperty tax records retrieved 11/06/2022 via N.J. MOD-IV Historical Database.")+
coord_flip() +
theme_ipsum_rc(axis = "x", grid = F)
```
So we're at 481 lines of code, and it's time to write this up rather than bringing in even more ... lines of code? -- but our work here and on Google Street View histories of these buildings, along with anecdotal trends in the greater NYC area, suggest a big proportion of condo conversions. I'm going to explore filling in stronger connections between demolitions, condo conversions or other lot flips (including City-sponsored 60s-esque "urban renewal", which tragically is still very much in full swing here) later in this project. (Though making that connection would really require some robust statistical modelling beyond our scope in this class.) Meanwhile, after seeing the breakdown of demolished units in specific ZIP codes (especially 07307) above, I'm curious about how the housing market might have reacted in those ZIPs during that same time period.
#### In Which We Don't Need Another Plot, But Do We Want One?
To get a sense of whether there's something there, I downloaded Zillow's "typical condo price" estimates by ZIP code and isolated them to Jersey City's. These are problematic data on many levels, but they're something to go on for now.
```{r}
# Isolate JC-only ZIP codes, including 07087, which overlaps w/ Union City & thus wasn't included under city name in Zillow set
zillow_jc <- read_csv("data/City_zhvi_uc_condo_tier_0.33_0.67_sm_sa_month.csv") %>%
clean_names() %>%
filter(region_name == "Jersey City") %>%
select(region_name, x2018_01_31:x2022_09_30)
zillow_zips <- read_csv("data/Zip_zhvi_uc_condo_tier_0.33_0.67_sm_sa_month.csv") %>%
clean_names() %>%
filter(city == "Jersey City" | region_name == "07087") %>%
select(region_name, x2018_01_31:x2022_09_30) %>%
rows_append(zillow_jc)
```
Reshaped and lubridated. I'm also adding a change-since-baseline delta variable showing the percentage increase or decrease in each ZIP code's typical condo prices over the last 5 years.
```{r}
zillow_zips <- zillow_zips %>%
mutate(jan_2018_baseline = x2018_01_31) %>%
pivot_longer(2:58,
names_to = "date",
values_to = "zillow_typical_condo_value") %>%
rename(zip = region_name) %>%
mutate(date = str_replace(date, "x", ""),
date = ymd(date),
zip = replace(zip, zip == "Jersey City", "Citywide"),
#pct_change_since_jan_2018 = round((((zillow_typical_condo_value - jan_2018_baseline)/jan_2018_baseline)*100),2),
pct_change_since_jan_2018 = (((zillow_typical_condo_value - jan_2018_baseline)/jan_2018_baseline)),
zip_labels = case_when(date == "2022-09-30" & zip != "Jersey City" ~ zip,
TRUE ~ NA_character_)
) %>%
relocate(jan_2018_baseline, .before = pct_change_since_jan_2018)
```
And now we can look at it! Since there are so many lines, I'm using the gghighlight package to keep this from looking like a colorful plate of spaghetti. Interestingly, the non-Waterfront ZIP codes' condo prices are the ones hurtling up the fastest since 2018 (likely because available Waterfront housing stock was bought up and flipped by Australian real estate/superfund megaforce Dixon Advisory c. 2014-15, before demolition records became available in NJ.) Causality could flow either way, here, but the demolition rate and condo price spikes do suggest some sort of correlation.
```{r, fig.width = 9}
zillow_zips %>%
filter(zip != "07087") %>%
ggplot(aes(x = date,
y = pct_change_since_jan_2018,
group = zip,
color = zip)
)+
geom_line(size = 1.5, alpha = .5)+
labs (x = NULL, y = "% change in price of typical condo since Jan. 2018",
title = "How Have Typical Jersey City Condo Prices Changed Since 2018?",
subtitle = "The Heights (07307) experienced a 23% increase in typical condo prices since 2018: more than twice the change in \ncondo prices citywide, and over 20 times that of Downtown.",
caption = "Source: Zillow Home Value Index (ZHVI) for condos and co-ops by ZIP code retrieved from Zillow; https://www.zillow.com/research/data/, November 10, 2022.")+
geom_text(aes(label = zip_labels,
x = date + 0.6,
y = pct_change_since_jan_2018,
color = zip),
hjust=-.1,
family = "Roboto Condensed",
fontface = "bold",
size = 4) +
gghighlight(zip == "07307" | zip == "07306" | zip == "Citywide",
use_direct_label = FALSE) +
scale_color_viridis_d()+
guides(color = "none")+
scale_y_continuous(labels = scales::percent)+
scale_x_date(date_labels = "%Y", limit=c(as.Date("2018-01-01"),as.Date("2022-12-31")))+
theme_ipsum_rc(grid = "XY")+
theme(plot.caption = element_text(hjust = 0))
```