Multiclass classification problem: Build a classifier to predict the number of passengers in flights

Project Description

The aim of this project is to investigate various data preprocessing techniques and machine learning algorithms and evaluate their performance on a dataset related to flights. In particular, a set of data consisting of several thousand flights to America is provided, where each flight is described by a set of variables such as departure and arrival airports. Additionally, each flight is associated with a variable representing the number of passengers onboard, categorized into ranges. While this variable is known for some flights, it is missing for others. This project aims to predict this variable for flights where the information was unavailable.

The project was implemented in the context of the course "Data Science & Web Mining" taught by Prof. Vazirgiannis Michalis at the Department of Informatics, AUEB. Additionally, it was part of the Kaggle competition: INF131: The case of flight passengers prediction, where we secured the 3rd place in the private leaderboard.

Dataset Description

The files we examined are the following:

train.csv — 8,899 labeled flights, with each row representing one flight along with its corresponding label.
test.csv — 2,229 flights, with each flight represented in a separate row.

The file named "train.csv" contains the training data of the problem, while the file "test.csv" contains the test set of the problem. Each entry in these files represents a flight, featuring the following variables:

DateOfDeparture: Departure date of the flight (object)
Departure: Code of the departure airport (object)
CityDeparture: Name of the departure airport (object)
LongitudeDeparture: Longitude of the departure airport (float64)
LatitudeDeparture: Latitude of the departure airport (float64)
Arrival: Code of the arrival airport (object)
CityArrival: Name of the arrival airport (object)
LongitudeArrival: Longitude of the arrival airport (float64)
LatitudeArrival: Latitude of the arrival airport (float64)
WeeksToDeparture: What was the average number of weeks before departure that passengers booked their tickets (float64)
std_wtd: Standard deviation for the "WeeksToDeparture" variable (float64)

The training set contains an additional variable called PAX (int64), which represents the number of passengers on each flight. This variable has eight distinct values ranging from 0 to 7, resulting in a total of 8 categories. Each category signifies the range of passengers boarding the flight, with the actual numbers within the categories being randomly assigned. Notably, the variable PAX is not included in the test set, as the task revolves around predicting it.

Implementation Process

Data Preprocessing Techniques:

Feature engineering
Min-max normalization
One-hot-encoding for object variables
Principal components analysis for feature extraction

Machine Learning Algorithms:

Decision tree classifier
K-nearest neighbors algorithm (KNN)
Gradient Boosting classifier
Artificial Neural Networks (ANNs)

Hyperparameters Optimization:

Number of estimators, max_depth, and learning rate for Gradient Boosting classifier
Number of neighbors and the algorithm employed to compute the nearest neighbors for the KNN classifier.
Hidden layer size, activation functions, optimizers, weight initializers, number of epochs, dropout regularization, and learning rate for ANNs

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.idea		.idea
LICENSE		LICENSE
Passengers.ipynb		Passengers.ipynb
README.md		README.md
mlp.png		mlp.png
test.csv		test.csv
train.csv		train.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multiclass classification problem: Build a classifier to predict the number of passengers in flights

Contents

Project Description

Dataset Description

Implementation Process

Data Preprocessing Techniques:

Machine Learning Algorithms:

Hyperparameters Optimization:

Proposed Model:

Team

About

Releases

Packages

Languages

License

ChryssaNab/Flight-Passengers-Prediction

Folders and files

Latest commit

History

Repository files navigation

Multiclass classification problem: Build a classifier to predict the number of passengers in flights

Contents

Project Description

Dataset Description

Implementation Process

Data Preprocessing Techniques:

Machine Learning Algorithms:

Hyperparameters Optimization:

Proposed Model:

Team

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages