Home
OFM processing scripts require the following Python libraries: pandas, pandas.api.types, numpy, urllib, requests, pyodbc, sqlalchemy, os, functools.
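Before running the scripts, it can help to confirm that these libraries are available. The following is a minimal sketch, not part of the repository: the helper name find_missing is hypothetical, and it only checks top-level module names (pandas.api.types ships with pandas, so it is not checked separately).

```python
import importlib.util

# Top-level modules named in the requirements list above.
REQUIRED_MODULES = [
    "pandas", "numpy", "urllib", "requests",
    "pyodbc", "sqlalchemy", "os", "functools",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are available.")
```

Modules such as urllib, os, and functools are part of the standard library, so in practice only the third-party packages should ever show up as missing.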
Run the following scripts in this order:
- download_ofm_apr.py: This script downloads the April 1 Excel files for intercensal and postcensal population and postcensal housing units into housing-metrics/data.
- elmer_staging.py: This script executes the respective tidying scripts and pushes the results to Elmer's stg schema. The tables will be named ofm_apr_intercensal, ofm_apr_postcensal, and ofm_apr_postcensal_housing.
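The run order above can also be scripted. This is a sketch under assumptions: the helper run_in_order and its runner parameter are hypothetical and not part of the repository, and it assumes both scripts sit in the current working directory.

```python
import subprocess
import sys

# The two OFM scripts, in the order they must run.
SCRIPTS = ["download_ofm_apr.py", "elmer_staging.py"]


def run_in_order(scripts, runner=subprocess.run):
    """Run each script with the current Python interpreter, one after another.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so staging is never attempted after a failed download.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)
```

Calling run_in_order(SCRIPTS) from the directory that contains the scripts would run the download first and the staging step second. The runner parameter exists only so the function can be exercised without launching real processes.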
This Python script imports CHAS data into Elmer. It downloads the CHAS data from the HUD website, unzips it, and loads it into the Elmer staging database. The data website is https://www.huduser.gov/portal/datasets/cp.html. Background information is at http://aws-linux/mediawiki/index.php/Comprehensive_Housing_Affordability_Strategy_(CHAS)
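The download-and-unzip step can be sketched as follows. This is an illustration under assumptions, not the code in CHAS_ETL.py: the function names download_bytes and unzip_bytes are hypothetical, the sketch uses only the standard library, and the download URL is left as a parameter because the exact file URL is not given here.

```python
import io
import zipfile
from pathlib import Path
from urllib.request import urlopen


def download_bytes(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()


def unzip_bytes(zip_bytes, dest_dir):
    """Extract an in-memory zip archive into dest_dir.

    Returns the paths of the extracted files (directory entries are skipped).
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
        return [dest / name for name in archive.namelist() if not name.endswith("/")]
```

The extracted CSV files could then be read with pandas.read_csv before being sent to staging.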
The script requires numpy, pandas, urllib, pyodbc, pathlib, requests, zipfile, and sqlalchemy.
To run the script, execute CHAS_ETL.py. If you want to use a new dataset, you will need to change the data file name around line 61: data_file_name = '2012thru2016-140-csv.zip', and the data dictionary name around line 73: data_dict_name = 'CHAS data dictionary 12-16.xlsx'
You may also want to specify a new name for the table in staging at line 129: df_to_staging(table_9_data_long, 'chas_tbl_9_2016')
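The three values that change with each new dataset follow a visible pattern in the examples above, so they could be derived from the year range instead of edited by hand. This is a hypothetical sketch: the helper names are not in CHAS_ETL.py, and the naming pattern is an assumption inferred only from the 2012 to 2016 examples shown here, so check it against the actual HUD file names before relying on it.

```python
import re


def parse_year_range(data_file_name):
    """Pull the start and end years out of a name like '2012thru2016-140-csv.zip'."""
    match = re.match(r"(\d{4})thru(\d{4})", data_file_name)
    if match is None:
        raise ValueError(f"No year range found in {data_file_name!r}")
    return match.group(1), match.group(2)


def data_dict_name_for(data_file_name):
    """Build the data dictionary file name, assuming the 'YY-YY' pattern holds."""
    start, end = parse_year_range(data_file_name)
    return f"CHAS data dictionary {start[-2:]}-{end[-2:]}.xlsx"


def staging_table_name_for(data_file_name, table_number=9):
    """Build the staging table name, assuming it ends with the last data year."""
    _, end = parse_year_range(data_file_name)
    return f"chas_tbl_{table_number}_{end}"
```

With the current file name, these helpers reproduce the values quoted above: 'CHAS data dictionary 12-16.xlsx' and 'chas_tbl_9_2016'.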
After the data has been put in the staging database, the code in https://github.com/psrc/housing-metrics/tree/main/process/CHAS is used to put the data into facts and dimensions and add geographic information.