Note: income data is inflation-adjusted
NOTE: Labels are OUTDATED, since they are not consistent across years of ACS data. Please see data.py to see which labels are actually being used.
- S0802 - transportation
- DP05 - demographic
- DP03 - economic
- DP04 - housing
- DP02 - social
Create a new column in ACS and PLUTO data called ITZ_GEOID, combining the borough and the tract number. (ex. "BX1" for Bronx census tract 1.)
If a PLUTO year has 2000 tracts instead of 2010, map to 2010 using
2000-to-2010-census-blocks-tracts.txt
Create a dictionary of census tracts to the lots they contain.
Filter by whether the lot is a residential lot using LandUse. Create lot_df which tells us whether it's been upzoned, and the number of years since.
- MaxAllwFAR/ResidFAR
- LotArea
- ResUnits
- LtdHeight
- LandUse
Create one dataframe for each of the start/end years.
Find tract area with GEOID10 in ny_2010_census_tracts.json. Divide these by tract area (ALAND10).
- pop_density: DP05_0001E
- resid_unit_density: DP04_0001E
- % multi_family_units: 100 - (DP04_0007PE+DP04_0008PE) (subtracting one unit attached/detached houses)
- percent_car_commuters: (S0802_C02_001E + S0802_C03_001E) / S0802_C01_001E
- percent_public_transport_commuters: S0802_C04_001E / S0802_C01_001E
- DP04_0088E: median_home_value
- Not sure exactly what this is measuring. Description: 'Estimate!!VALUE!!Median (dollars)' - from ACS housing data
- This actually measures the median home value in the census tract. - Jam
- S0802_C04_090E: percent_public_transport_trips_under_45_min
- S0802_C03_090E: percent_car_trips_under_45_min
- DP05_0032PE: percent_non_hispanic_or_latino_white_alone
- DP05_0033PE: percent_non_hispanic_black_alone
- DP05_0066PE: percent_hispanic_any_race
- DP05_0039PE: percent_non_hispanic_asian_alone
- DP03_0088E: per_capita_income
- DP05_0017E: median_age
- DP02_0013PE: percent_households_with_people_under_18
- DP04_0002PE: percent_occupied_housing_units
- DP04_0132E: median_gross_rent
- DP02_0080PE: percent_of_households_in_same_house_year_ago
- DP02_0067PE: percent_bachelor_degree_or_higher
- percent_resid: % of lots in tract for which ResUnits > 0
- percent_ltd_height: % of lots in tract for which LtdHeight is not just spaces
Create a new dataframe (assuming previous work was done with each year having a separate dataframe) Convert each of these data columns into "d_"-prefixed columns as change measures.
Add a column for the 2011 values for these (prefix with orig_).
per_capita_income, pop_density, percent_non_hispanic_or_latino_white_alone, percent_non_Hispanic_black_alone, percent_hispanic_any_race, percent_non_hispanic_asian_alone, percent_occupied_housing_units, median_age, percent_households_with_people_undr_18, percent_of_households_in_same_house_year_ago, percent_bachelor_degree_or_higher
For each tract: calculate the percentage of lots in the tract for which MaxAllwFAR/ResidFAR * LotArea has increased by more than 10% between the start and end years.
Add as a column called percent_upzoned
Captures the homogeneity/heterogeneity of the census tract - 0 represents a tract where every lot has the same land use, while 1 represents a tract where land uses are evenly split between lots.
Calculated as -1 * sum of p_j * ln(p_j) over all j, where j is the number of land uses and p_j is the proportion of lots with land use j.
For each lot in tract: Check if in subsidized housing records in subsidized_properties.csv using BBL number. (i.e. if lot is subsidized and before 3/1/2011) Create variable orig_percent_subsidized_properties.
Average for each of the lots in the tract, set to 0 if no upzoning occurred.