nyc-taxi-trip-time-prediction

Build a model that predicts the total ride duration of taxi trips in New York City.

Exploring Solutions:

With a clearer understanding of the problem we are trying to solve, the users' needs and frustrations, and the goals for reaching the best possible solution for both the business and the user, I began by listing the possible solutions that came out of the research.

Steps involved:

The full code for this article can be found here. It is implemented in Python and uses several machine learning algorithms. Below is a brief description of the general approach I followed:

Data Loading and general checks: We loaded the data from the provided CSV files using a function from the pandas library, then checked the general information about the data.
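The loading and checking step might look like the sketch below. The file name `train.csv` and the columns `pickup_datetime` and `trip_duration` are assumptions (the text above names only `id`, `passenger_count`, `store_and_fwd_flag` and the pickup and drop-off coordinates), and a small inline sample with made-up values stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny inline sample standing in for the real CSV file (hypothetical values).
SAMPLE_CSV = io.StringIO(
    "id,pickup_datetime,passenger_count,pickup_longitude,pickup_latitude,"
    "dropoff_longitude,dropoff_latitude,store_and_fwd_flag,trip_duration\n"
    "id0000001,2016-03-14 17:24:55,1,-73.9822,40.7679,-73.9646,40.7656,N,455\n"
    "id0000002,2016-06-12 00:43:35,1,-73.9804,40.7386,-73.9995,40.7312,N,663\n"
    "id0000003,2016-01-19 11:35:24,2,-73.9790,40.7639,-74.0053,40.7101,Y,2124\n"
)


def load_trips(source):
    """Read trip records and parse the pickup timestamp as a datetime."""
    return pd.read_csv(source, parse_dates=["pickup_datetime"])


# With the real data this would be load_trips("train.csv") (assumed file name).
df = load_trips(SAMPLE_CSV)

df.info()                       # column dtypes and non-null counts
print(df.describe())            # summary statistics for the numeric columns
print(df.isnull().sum())        # missing values per column
print(df.duplicated().sum())    # number of fully duplicated rows
```

Parsing the timestamp at load time is a convenience choice here; it makes later date-based features easier to derive, but the notebooks may handle it differently.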

Exploratory Data Analysis:

We removed the id variable, as it adds little to interpretation. We then calculated the trip distance from the pickup and drop-off latitude and longitude using the haversine formula. A box plot of this variable showed many outliers, so we segmented it and found that most trips are within 10 km, some are within 50 km, and very few exceed 50 km. We therefore eliminated trips with a distance of 0 and trips longer than 50 km.

We then checked the categorical variables store_and_fwd_flag and passenger_count. The store_and_fwd_flag variable is dominated by a single category, so we dropped this feature. The passenger_count variable has entries from 0 to 9. Since a trip cannot have zero passengers, those records are either mis-entries or trips where the driver forgot to enter the passenger count. Also, a taxi is allowed to carry at most six people, including minors. So we eliminated the records with 0 and with 7 to 9 passengers from our dataset.

Linear Regression: Linear regression is a regression of a dependent variable on one or more independent variables. It is a linear model that assumes a linear relationship between the dependent variable (y) and the independent variables (x).
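The cleaning steps and the linear model described above can be sketched together as follows. This is an illustration under stated assumptions, not the notebook code: the coordinate column names, the `trip_duration` target, the helper names `haversine_km` and `clean_trips`, and the sample rows are all hypothetical, and the linear fit uses NumPy least squares on distance alone rather than whichever library and feature set the notebooks use.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def clean_trips(df):
    """Drop unused columns, add distance_km, and apply the filters from the text."""
    out = df.drop(columns=["id", "store_and_fwd_flag"])
    out["distance_km"] = haversine_km(
        out["pickup_latitude"], out["pickup_longitude"],
        out["dropoff_latitude"], out["dropoff_longitude"],
    )
    keep = (
        (out["distance_km"] > 0)                   # no zero-distance trips
        & (out["distance_km"] <= 50)               # no trips above 50 km
        & out["passenger_count"].between(1, 6)     # 1 to 6 passengers only
    )
    return out[keep].reset_index(drop=True)


# Hypothetical rows: a, e, f are ordinary trips; b has zero distance,
# c has 0 passengers, d is far longer than 50 km.
trips = pd.DataFrame({
    "id": ["a", "b", "c", "d", "e", "f"],
    "store_and_fwd_flag": ["N"] * 6,
    "passenger_count": [1, 2, 0, 1, 3, 1],
    "pickup_latitude": [40.7679, 40.7500, 40.7400, 40.7000, 40.7639, 40.7386],
    "pickup_longitude": [-73.9822, -73.9900, -73.9800, -74.0000, -73.9790, -73.9804],
    "dropoff_latitude": [40.7656, 40.7500, 40.7300, 41.5000, 40.7101, 40.7312],
    "dropoff_longitude": [-73.9646, -73.9900, -73.9900, -74.0000, -74.0053, -73.9995],
    "trip_duration": [455, 30, 600, 5400, 2124, 663],
})
clean = clean_trips(trips)

# Linear baseline: trip_duration ~ intercept + slope * distance_km,
# fitted by ordinary least squares.
X = np.column_stack([np.ones(len(clean)), clean["distance_km"].to_numpy()])
y = clean["trip_duration"].to_numpy(dtype=float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
print(clean[["distance_km", "trip_duration"]])
print(f"intercept={intercept:.1f} s, slope={slope:.1f} s per km")
```

Filtering after computing the distance keeps all three rules in one boolean mask, which makes it easy to count how many rows each rule removes on the real data.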

XGBoost: XGBoost is a boosting method; the name stands for Extreme Gradient Boosting. A gradient boosting machine (GBM) first fits a model on X and y and obtains its predictions. It then fits the next model to the residuals of the previous one, so each new model concentrates on the errors that the earlier models left behind.
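The residual-fitting idea can be illustrated with a few lines of NumPy. The sketch below is plain gradient boosting with squared-error loss and one-split trees (stumps); it is not XGBoost itself, which adds regularisation, second-order gradient information and many engineering optimisations. The function names and the synthetic distance and duration data are hypothetical.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the single threshold on x whose two-leaf fit has the lowest squared error."""
    best = None
    # Skip the largest value: splitting there would leave the right leaf empty.
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_value = residual[left].mean()
        right_value = residual[~left].mean()
        sse = (((residual[left] - left_value) ** 2).sum()
               + ((residual[~left] - right_value) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def predict_stump(stump, x):
    """Return the leaf value for each element of x."""
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = y.mean()
    prediction = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residual is the negative gradient of the loss
        # (up to a constant factor).
        residual = y - prediction
        stump = fit_stump(x, residual)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction


# Synthetic data (made up): duration grows roughly linearly with distance.
rng = np.random.default_rng(0)
distance_km = rng.uniform(0.5, 20.0, size=200)
duration_s = 180 + 150 * distance_km + rng.normal(0, 60, size=200)

base, stumps, fitted = boost(distance_km, duration_s)
mse_start = np.mean((duration_s - base) ** 2)
mse_end = np.mean((duration_s - fitted) ** 2)
print(f"training MSE: {mse_start:.0f} -> {mse_end:.0f}")
```

The learning rate shrinks each stump's contribution, so no single round can overcorrect; the training error falls gradually as rounds accumulate.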

rutujahingankar/nyc-taxi-trip-time-prediction

Folders and files

Latest commit

History

Repository files navigation

nyc-taxi-trip-time-prediction

Exploring Solutions:

Steps involved:

Exploratory Data Analysis:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages