In this Capstone overview, we describe how traffic volume can be modeled utilizing Principal Component Analysis (PCA), additionally how easily and effectively it can be visualized to assist in the detection of traffic patterns and volume for further analysis. In addition, our work experiments with identifying different traffic behaviors exhibited throughout the week utilizing the KMeans++ clustering algorithm. A cursory examination into determining the factors that contribute to traffic volume is explored using Elastic Net Regression. The purpose of this work is to inform business decision makers and the general public of traffic patterns that exist in California, with the hope being that additional insight can guide solutions to address high traffic volume.
Group Members
- John Gill
- Kevin Dyer
- Conway Wong
Advisor:
- Yoav Freund
Location | Description |
---|---|
final/doc/FinalReport.pdf | Final Report |
final/doc/FinalPresentation.pdf | Final Presentation |
Location | Description |
---|---|
s3://dse-team2-2014/dse_traffic | Directory containing original downloaded traffic files |
s3://dse-team2-2014/pivot_output_#{year} | Directory containing Pivot Output Files from parsing downloaded traffic files |
s3://dse-team2-2014/regression | Directory containing files used for Elastic Net Regression |
Location | Description |
---|---|
ml/scala/traffic | Maven Project for Machine Learning, Pivoting Scala Code |
traffic/vis/stations | Traffic GIS Map Visualization - HTML/Javascript Source Code |
final/eigenvector_analysis.py | Eigenvector Analysis Visualiation - Bokeh Python Source Code |
traffic/src/scraper.py | Web Scraper Python Source Code |
traffic/jsg_test | Directory Containing CloverETL graphs, transformers for Postgres DB ETL |
traffic/traffic_etl_spark.py | PySpark Job to extract traffic data from S3 into Postrgres DB |