
smoke-signals-model

This repository contains code and documentation for generating scores that indicate whether the residents of a census block group are at high risk of not having smoke alarms. You can read an overview of the analysis here. This analysis is made possible by mapping common variables in the American Housing Survey and the American Community Survey. You can see details on how these mappings are done in this repository.

Getting Started.

Installation

First clone the repository and navigate to the project's root directory:

git clone https://github.com/enigma-io/smoke-alarm-risk.git
cd smoke-alarm-risk

This project is written in R and depends on the following packages:

  • bit64
  • plyr
  • ggplot2
  • data.table
  • knitr
  • reshape2
  • scales
  • bigrf
  • pROC

You can install these packages by running the following command in the project's root directory:

$ make init

Get the data

This project also requires six CSV files (two of which, the ACS and the AHS, are generated by this project). You can grab these files from the web by running the following command:

$ make fetch_data

WARNING: This may take a while. The ACS file is ~ 2 GB.

Once this is finished, you should see six files in data/:

  • acs-bg-at-risk-population.csv - percent of population under the age of 5 or over the age of 65 per block group.
  • acs-bg-population.csv - total population per block group.
  • acs-bg-pop-density.csv - population density per block group.
  • msa80-bg.csv - A lookup of 1980 MSA IDs to 2010 Block Group IDs.
  • acs.csv - an export of the ACS with variables mapped to the AHS. (see this repo)
  • ahs.csv - an export of the AHS with variables mapped to the ACS. (see this repo)
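
Since the download is large and can be interrupted, it may help to confirm that all six files arrived before running the model. The following is a minimal sketch, not part of this repository: the function name check_data_files is hypothetical, and the file names are taken from the list above.

```shell
#!/bin/sh
# Sketch: report which of the six expected CSV files are missing.
# check_data_files is a hypothetical helper, not a target in this project's Makefile.
check_data_files() {
    dir="${1:-data}"   # directory to inspect; defaults to data
    missing=0
    for f in acs-bg-at-risk-population.csv acs-bg-population.csv \
             acs-bg-pop-density.csv msa80-bg.csv acs.csv ahs.csv; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -gt 0 ]; then
        return 1
    fi
    echo "all six data files present in $dir/"
}

# Usage, from the project's root directory:
#   check_data_files data
```

The function returns a non-zero status when any file is missing, so it can be chained in front of the modeling step with `&&`.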

Once you've got these files, you should be all set to generate risk scores.

Generate the risk scores.

First, open up index.Rmd and change this line to your working directory:

WD <- '/path/to/this/directory'

Execute the model using this command:

$ make model

Under the hood, this command executes index.Rmd, which is an R Markdown file. It contains notes on each step of our process and generates plots that visualize our results. You can see the finalized output of the modeling process by typing this command:

$ make view

If you open a web browser and navigate to http://localhost:8000/ you should see the report on the modeling process.

Get the output.

When the modeling script has finished executing, the risk scores per block group will be output to data/smoke-alarm-risk-scores.csv. These also include total population and at-risk population (< 5 years old, > 65 years old) per block group.
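
For a quick look at the result without opening R, something like the following sketch could be used. It assumes the output file starts with a header row and has no line breaks inside fields. Neither assumption is confirmed by this README, and summarize_csv is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the header row and the number of data rows of a CSV file.
# Assumes the first line is a header and no field contains an embedded newline.
summarize_csv() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "not found: $file" >&2
        return 1
    fi
    echo "columns: $(head -n 1 "$file")"
    # Skip the header line, then count what remains.
    echo "rows: $(tail -n +2 "$file" | awk 'END { print NR }')"
}

# Usage, from the project's root directory:
#   summarize_csv data/smoke-alarm-risk-scores.csv
```

If the assumptions hold, the row count equals the number of block groups that received a score.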

Known Issues

bigrf seems to have a memory leak when executed within RStudio. You can avoid it by running the model with the make model command instead. See aloysius-lim/bigrf#16.

About

The Machine Learning Algorithms that power Smoke Signals.
