We at GTRI would like to give you the opportunity to show your best potential by completing a data exploration and analysis exercise. This exercise is not meant as a brain teaser, and there are no 'gotcha' instances within the data. We would simply like to get a glimpse of how you review, explore, and summarize data. Please download the following datasets to complete the exercise questions below. You will need a Kaggle account to access the datasets.
You will explore the airline-delay-and-cancellation-data-2009-2018 dataset and look for trends based on weather:
- https://www.kaggle.com/yuanyuwendymu/airline-delay-and-cancellation-data-2009-2018
- https://www.kaggle.com/selfishgene/historical-hourly-weather-data
Please fork this repository and push your code to your own personal repository for review. Be sure to add comments within your code to explain your thought process and the steps taken. Also, please do not commit the datasets to your repo, as they are too large.
Each exercise should have a function dedicated to it with the expected arguments as input:
- Given a city and airport code, provide the average delay time on days where there is any type of rain.
- Given a city and airport code, determine that airport's worst days to travel.
- Provide the top 3 airlines by average difference between expected arrival and actual arrival, along with the standard deviation of that difference.
- Given a city and airport code, create a function that provides the probability of a flight getting cancelled.
- Given an airport code, city, and weather description, create a function that predicts the probability of a departure flight delay from the airport.
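
To illustrate the "one function per exercise, with the expected arguments as input" convention, here is a minimal sketch of what a function for the first exercise could look like. It runs on a few made-up in-memory records rather than the Kaggle files. The field names (`date`, `origin`, `dep_delay`, `city`, `description`), the toy values, and the function name `average_delay_on_rainy_days` are all assumptions for illustration, not the real column names or a required interface, and treating "delay" as departure delay is only one possible reading.

```python
from statistics import mean

# Toy stand-ins for the two datasets. Field names and values are
# hypothetical and do NOT reflect the real Kaggle column names.
FLIGHTS = [
    {"date": "2015-06-01", "origin": "ATL", "dep_delay": 30.0},
    {"date": "2015-06-01", "origin": "ATL", "dep_delay": 10.0},
    {"date": "2015-06-02", "origin": "ATL", "dep_delay": 5.0},
    {"date": "2015-06-01", "origin": "BOS", "dep_delay": 50.0},
]
WEATHER = [
    {"date": "2015-06-01", "city": "Atlanta", "description": "light rain"},
    {"date": "2015-06-02", "city": "Atlanta", "description": "sky is clear"},
    {"date": "2015-06-01", "city": "Boston", "description": "heavy intensity rain"},
]


def average_delay_on_rainy_days(city, airport_code, flights=FLIGHTS, weather=WEATHER):
    """Mean departure delay (minutes) at `airport_code` on days when any
    weather description for `city` mentions rain; None if no flights match."""
    # Collect the dates on which the city saw any kind of rain.
    rainy_days = {
        w["date"]
        for w in weather
        if w["city"] == city and "rain" in w["description"].lower()
    }
    # Keep only delays for flights leaving this airport on those dates.
    delays = [
        f["dep_delay"]
        for f in flights
        if f["origin"] == airport_code and f["date"] in rainy_days
    ]
    return mean(delays) if delays else None


print(average_delay_on_rainy_days("Atlanta", "ATL"))
```

The city and airport code are passed as explicit arguments, and the data is injected through keyword parameters so the same function can be pointed at the full datasets once they are loaded. Your own solution is free to use a different structure or library.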