- Report Link: https://www.overleaf.com/read/mqkbfcvsznrm

Language: Python 3.8.3

Packages / Libraries:
- pandas
- sklearn
- seaborn
- folium
- numpy
- math
- geopandas
- bokeh
- matplotlib

Datasets:
- NYC TLC trip record data: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
- External dataset 1 (Police Precincts): https://data.cityofnewyork.us/Public-Safety/Police-Precincts/78dh-3ptz
- External dataset 2 (Motor Vehicle Collisions - Crashes): https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95
- External dataset 3 (NYPD Shooting Incident Data, Historic): https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8
- External dataset 4 (Citywide Crime Statistics): https://data.cityofnewyork.us/Public-Safety/Citywide-Crime-Statistics/c5dk-m6ea
- External dataset 5 (median incomes, CCC New York): https://data.cccnewyork.org/data/map/66/median-incomes#66/39/6/107/62/a/a
- External dataset 6 (NYC Planning open data): https://www1.nyc.gov/site/planning/data-maps/open-data.page
- External dataset 7 (U.S. Census QuickFacts for the five boroughs and New York City): https://www.census.gov/quickfacts/fact/table/newyorkcountymanhattanboroughnewyork,bronxcountybronxboroughnewyork,queenscountyqueensboroughnewyork,kingscountybrooklynboroughnewyork,richmondcountystatenislandboroughnewyork,newyorkcitynewyork/HSG010219
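
Notebook 0 downloads the raw TLC trip data. As a hedged sketch of that step, the code below builds one URL per month of yellow-taxi data and fetches the files into `raw_data`. The URL prefix and the Parquet file format are assumptions based on where the TLC currently hosts these files; the original notebook may have used an older CSV location. `yellow_trip_urls` and `download_all` are hypothetical helper names, not functions from the notebooks.

```python
from pathlib import Path
from typing import Iterable, List
from urllib.request import urlretrieve

# ASSUMPTION: current TLC hosting prefix for monthly trip files (Parquet).
# The original Download_data.ipynb may have used a different location/format.
TLC_BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def yellow_trip_urls(year: int, months: Iterable[int] = range(1, 13)) -> List[str]:
    """Build one URL per month, e.g. .../yellow_tripdata_2019-01.parquet."""
    return [
        f"{TLC_BASE_URL}/yellow_tripdata_{year}-{month:02d}.parquet"
        for month in months
    ]


def download_all(urls: Iterable[str], dest_dir: str = "raw_data") -> None:
    """Download each URL into dest_dir, skipping files that already exist."""
    out = Path(dest_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = out / url.rsplit("/", 1)[-1]  # keep the remote file name
        if not target.exists():
            urlretrieve(url, str(target))
```

For example, `download_all(yellow_trip_urls(2019) + yellow_trip_urls(2020))` would fetch both years. Reading Parquet files with `pandas.read_parquet` additionally needs `pyarrow` or `fastparquet`, which are not in the package list above.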
raw_data
: Original datasets (yellow taxi trip data only).

data
: All preprocessed files, plus the small external datasets supporting the analysis.

plots
: All plots and Map.html, both for data exploration and for report writing.

code
: - Notebook 0 for "Download_data.ipynb".
- Notebook 1 for "1. Preprocessing_2020_whole_year.ipynb".
- Notebook 2 for "1. Preprocessing-2019.ipynb".
- Notebook 3 for "1. Preprocessing-2020.ipynb".
- Notebook 4 for "2. Visual and Exploratory analysis part1.ipynb".
- Notebook 5 for "2.Visual and Exploratory analysis part2.ipynb".
- Notebook 6 for "3. Statistical Modelling.ipynb".
- Run the notebooks in the order listed above, starting with Notebook 0, which downloads the raw data.
- The file paths in the notebooks point to the author's local machine. To run the notebooks in a different environment, you must change every file path in them; usually only the first disk (drive) name needs to change, and the rest of each path stays the same.
- Some plots are not saved automatically by the notebooks; those were saved manually as screenshots. All plots are in the plots folder, so check there if in doubt.
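
If you prefer to run the whole pipeline non-interactively, the sketch below executes the notebooks in the order listed above with `jupyter nbconvert`. It assumes the notebooks sit in the `code` folder and that Jupyter with nbconvert is installed; this script is not part of the original project, and `nbconvert_commands` and `run_all` are hypothetical helper names. File paths inside the notebooks still need updating first.

```python
import subprocess
from pathlib import Path
from typing import List

# Notebook file names exactly as listed in this README, in the required run order.
NOTEBOOK_ORDER = [
    "Download_data.ipynb",
    "1. Preprocessing_2020_whole_year.ipynb",
    "1. Preprocessing-2019.ipynb",
    "1. Preprocessing-2020.ipynb",
    "2. Visual and Exploratory analysis part1.ipynb",
    "2.Visual and Exploratory analysis part2.ipynb",
    "3. Statistical Modelling.ipynb",
]


def nbconvert_commands(code_dir: str = "code") -> List[List[str]]:
    """Build one `jupyter nbconvert --execute` command per notebook, in run order."""
    return [
        [
            "jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
            # Disable the per-cell timeout: preprocessing a full year can be slow.
            "--ExecutePreprocessor.timeout=-1",
            str(Path(code_dir) / name),
        ]
        for name in NOTEBOOK_ORDER
    ]


def run_all(code_dir: str = "code") -> None:
    """Execute the notebooks one by one; raise on the first failing notebook."""
    for cmd in nbconvert_commands(code_dir):
        subprocess.run(cmd, check=True)
```

Passing each command as a list (rather than one shell string) avoids quoting problems with the spaces in the notebook names.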
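
Because usually only the drive name differs between machines, the path update can be scripted rather than done by hand. A minimal sketch, assuming Windows-style paths such as `D:/...`; `change_drive` is a hypothetical helper, not something the notebooks define.

```python
from pathlib import PureWindowsPath


def change_drive(path: str, new_drive: str) -> str:
    """Return `path` with its drive (e.g. 'D:') replaced by `new_drive` (e.g. 'E:')."""
    p = PureWindowsPath(path)
    if not p.drive:
        raise ValueError(f"no drive letter in {path!r}")
    # p.parts[0] is the anchor, e.g. 'D:\\'; everything after it is kept unchanged.
    return str(PureWindowsPath(new_drive + "\\", *p.parts[1:]))
```

For instance, `change_drive("D:/project/data/x.csv", "E:")` gives `E:\project\data\x.csv`. Defining a single project root at the top of each notebook and building every other path from it would achieve the same with one edit per notebook.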