Accident_Traffic_Duration

Overview

The main feature of this repo is a model trained on the US Accidents Dataset that predicts how long an accident will impact traffic, using only features that are known immediately (weather conditions, day/time, nearby road features). Individual predictions can be made and evaluated through the Flask app (app.py), and the web app includes further discussion of the analysis.

To read more about this project, check out my Medium article here!

Usage

Libraries

numpy
pandas
xgboost
joblib
json (part of the Python standard library)
plotly
flask
matplotlib
seaborn
sklearn (installed as scikit-learn)

For any usage, install the required packages first. Requirements files are included for both the pip and conda package managers. If using pip, create a new environment through your environment manager first so that the install does not change your system package versions!

cd into the top-level directory of this repo
For pip: pip install -r pipreqs.txt. For conda: conda create --name <env> --file requirements.txt
- Note: on a non-Windows system, comment out the pywin32 & pywinpty lines first; otherwise the install will error
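The note above can also be handled with a small script instead of editing the file by hand. This is a minimal sketch, not part of the repo: it assumes the Windows-only lines are named pywin32 and pywinpty (adjust WINDOWS_ONLY to match your file), and the output filename pipreqs_filtered.txt is hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumption: these are the Windows-only package names to skip.
WINDOWS_ONLY = ("pywin32", "pywinpty")


def package_name(line):
    """Extract the bare package name from a pip- or conda-style requirement line."""
    return re.split(r"[=<>!~\s;\[]", line.strip(), maxsplit=1)[0].lower()


def filter_requirements(lines, skip=WINDOWS_ONLY):
    """Return the lines with Windows-only packages commented out."""
    return ["# " + line if package_name(line) in skip else line for line in lines]


if __name__ == "__main__" and not sys.platform.startswith("win"):
    source = Path("pipreqs.txt")
    if source.exists():
        filtered = filter_requirements(source.read_text().splitlines())
        # Hypothetical output file; install with: pip install -r pipreqs_filtered.txt
        Path("pipreqs_filtered.txt").write_text("\n".join(filtered) + "\n")
```

The name is matched exactly rather than by prefix, so an unrelated package whose name merely starts with "pywin32" is left alone.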

For the web app:

Clone this repo
cd into the top-level directory
flask run
Visit the address shown in the CLI (by default 127.0.0.1:5000)
Fill out the fields on the webpage to generate a prediction for the given input features

For data_processing.ipynb:

Download the US Accidents Dataset and extract to the top-level directory of this repo
Run jupyter lab from the CLI, or open the notebook in your preferred interface for Python notebooks
Work through the cells top-down

For model_exploration.ipynb:

Generate cleaned_data.csv by running data_processing.ipynb
Work through the cells! The GridSearchCV cell takes a long time to run, so I recommend skipping it unless you are modifying the search to develop a stronger model.
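To get a feel for why the grid search is slow, it helps to count how many models it trains. The sketch below is illustrative only: the grid values and fold count are assumptions, not the ones used in the notebook. Only the best values reported in Results (0.1, 18, 200) come from this README.

```python
from itertools import product

# Hypothetical search grid; the real grid lives in model_exploration.ipynb.
param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "max_depth": [6, 12, 18],
    "n_estimators": [100, 200],
}
CV_FOLDS = 3  # assumed number of cross-validation folds


def count_fits(grid, folds):
    """Number of model fits an exhaustive grid search performs during the search."""
    combinations = list(product(*grid.values()))
    return len(combinations) * folds


print(count_fits(param_grid, CV_FOLDS))  # 3 * 3 * 2 combinations * 3 folds = 54
```

Every fit trains a full boosted model, and GridSearchCV adds one more fit on the whole training set when refit is left at its default, so even a small grid adds up quickly.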

Files

📁templates
- index.html # main page of web app. Contains write-up about the data analysis & model.
📁static
- contains static images generated by the notebooks for use on the webpage & Medium article.
app.py # Python script to run webpage using Flask
data_processing.ipynb # Jupyter notebook used to process initial csv into cleaned_data.csv, and generate plots found in static folder
model_exploration.ipynb # Jupyter notebook used to train models from cleaned_data.csv
grid_search_results.csv # CSV file that contains results of the grid_search training, with relevant parameters
states_ordered.csv # Contains U.S. states ordered by mean accident duration, calculated from the dataset. Used in web app for form submission.
classifier.pkl # XGBoost softprob classifier object generated by model_exploration.ipynb and used in app.py
requirements.txt # requirements file to use for conda environments
pipreqs.txt # requirements file to use for pip environments
README.md # This file!

Results

Using XGBoost's softprob classifier with GridSearchCV for hyperparameter tuning, we achieve an accuracy score of 0.63 (0.64 on the test set). The best-performing parameters were learning_rate: 0.1, max_depth: 18, n_estimators: 200.
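A softprob objective (multi:softprob in XGBoost) returns one probability per class for each sample rather than a single label, so an accuracy score needs a step that picks the most likely class. The sketch below shows that step in plain Python. The probabilities, labels, and the three classes are made up for illustration and are not results from this project.

```python
def predict_classes(probabilities):
    """Pick the index of the highest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up softprob output for four samples and three hypothetical duration classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [0, 2, 0, 1]

preds = predict_classes(probs)   # [0, 2, 1, 1]
print(accuracy(preds, labels))   # 0.75
```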

Acknowledgements

U.S. Accidents Dataset
- Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. "A Countrywide Traffic Accident Dataset." 2019.
- Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, Radu Teodorescu, and Rajiv Ramnath. "Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights." In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, ACM, 2019.
