Suzanne Childress edited this page Jul 23, 2021 · 13 revisions

OFM April 1 data

The OFM processing scripts require the following Python libraries: pandas, pandas.api.types, numpy, urllib, requests, pyodbc, sqlalchemy, os, functools.

Run the following scripts in this order:

  1. download_ofm_apr.py: This script downloads the April 1 Excel files for intercensal and postcensal population and postcensal housing units into housing-metrics/data.
  2. elmer_staging.py: This script executes the respective tidying scripts and pushes the results to Elmer's stg schema. The tables are named ofm_apr_intercensal, ofm_apr_postcensal, and ofm_apr_postcensal_housing.
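
The run order above could be scripted so that the second step only starts if the first succeeds. This is a minimal sketch, not part of the repository: the helper names build_commands and run_in_order are hypothetical, and it assumes both scripts sit in the current working directory and take no arguments.

```python
import subprocess
import sys

# Script names are taken from the list above, in the required order.
OFM_SCRIPTS = ["download_ofm_apr.py", "elmer_staging.py"]


def build_commands(scripts, python=sys.executable):
    """Return one command per script, preserving the order given."""
    return [[python, script] for script in scripts]


def run_in_order(scripts):
    """Run each script in turn; stop at the first one that fails."""
    for command in build_commands(scripts):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so elmer_staging.py never runs against a failed download.
        subprocess.run(command, check=True)

# Usage (assumption: run from the directory that holds the scripts):
#     run_in_order(OFM_SCRIPTS)
```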

CHAS data import

A Python script imports CHAS data into Elmer. The script reads CHAS data from the HUD website, downloads it, unzips it, and loads it into the Elmer staging database. The data are available at https://www.huduser.gov/portal/datasets/cp.html. Background information is at http://aws-linux/mediawiki/index.php/Comprehensive_Housing_Affordability_Strategy_(CHAS)
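
The download and unzip steps described above might look roughly like the following. This is an illustrative sketch using only the standard library, not the code in CHAS_ETL.py: the function names are hypothetical, and the name of the CSV member inside the HUD zip file is something the caller has to supply.

```python
import csv
import io
import urllib.request
import zipfile


def download_bytes(url):
    """Fetch a URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def read_csv_from_zip(zip_bytes, member_name):
    """Return the rows of one CSV member of a zip archive as a list of dicts.

    The archive is read from memory, so nothing is written to disk.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

The rows returned this way would still need to be reshaped and written to the staging database, which the actual script handles with pandas and sqlalchemy.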

The script requires numpy, pandas, urllib, pyodbc, pathlib, requests, zipfile, and sqlalchemy.

Run CHAS_ETL.py to execute the import. To use a new dataset, change data_file_name around line 61 (data_file_name = '2012thru2016-140-csv.zip') and the data dictionary name around line 73 (data_dict_name = 'CHAS data dictionary 12-16.xlsx').
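
If the file names follow the pattern of the 2012-2016 example, both values could be derived from the year range instead of being edited by hand. The helper below is a hypothetical sketch, and the naming pattern is inferred from that single example only; check the actual file names on the HUD site before relying on it.

```python
def chas_file_names(start_year, end_year, summary_level="140"):
    """Build the zip and data dictionary names for a CHAS release.

    Assumption: other releases follow the same naming pattern as the
    2012-2016 files referenced in CHAS_ETL.py.
    """
    data_file_name = f"{start_year}thru{end_year}-{summary_level}-csv.zip"
    # The data dictionary name uses two-digit years, e.g. "12-16".
    data_dict_name = (
        f"CHAS data dictionary {start_year % 100:02d}-{end_year % 100:02d}.xlsx"
    )
    return data_file_name, data_dict_name


data_file_name, data_dict_name = chas_file_names(2012, 2016)
```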

You may also want to specify a new name for the staging table at line 129: df_to_staging(table_9_data_long, 'chas_tbl_9_2016')

After the data have been loaded into the staging database, the code in https://github.com/psrc/housing-metrics/tree/main/process/CHAS puts the data into facts and dimensions and adds geographic information.
