Data-Science-Projects

A repository showcasing some of my data science projects, covering data preprocessing and supervised learning.

The projects are organised into two folders: Preprocessing and Supervised_learning.

In the preprocessing folder we have:

Create_shuffled_train_val_split_for_image_datasets - Combines several image datasets and shuffles them into training and validation splits.
Data_preparation_feature_analysis_star_ratings_sklearn - Uses big-data tooling (such as Dask and C++) to process a large amount of data as efficiently as possible, clean it, combine it, and save it in a suitable format.
Energy_price_data_preprocessing - Gathers data from several Excel files and applies a series of preprocessing steps to produce a single cleaned DataFrame.
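
As a rough illustration of the first item, the sketch below pools image paths from several dataset folders and shuffles them into training and validation lists. It is a minimal example under stated assumptions, not the code in this repo: the function names `collect_images` and `shuffled_split`, the 20% validation fraction, and the list of image extensions are all hypothetical.

```python
import random
from pathlib import Path

# Assumed image file types; adjust to match the actual datasets.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def collect_images(dataset_dirs):
    """Return a sorted list of image paths found under each directory."""
    paths = []
    for directory in dataset_dirs:
        paths.extend(
            p for p in Path(directory).rglob("*")
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    # Sorting first makes the later seeded shuffle reproducible.
    return sorted(paths)


def shuffled_split(items, val_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

A call such as `shuffled_split(collect_images(["dataset_a", "dataset_b"]))` would then return the two lists; the folder names here are placeholders.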

In the supervised learning folder we have:

Fraud_classification_sklearn - Compares classification algorithms for predicting whether an individual is a person of interest in the Enron fraud case.
Employee_leave_logistic_regression_sklearn - Uses logistic regression in sklearn to predict, for scheduling purposes, whether an employee will take a large amount of time away from the office for a given absence reason.
Road_safety_regression_prediction_ensemble_sklearn - Compares regression models and ensemble methods for predicting the road safety score (1 to 5 stars) of a road from a set of road features.
Energy_price_model_selection_optimisation_sklearn_optuna - Loads data, automatically compares several regression models for predicting energy prices, selects the best one, applies automated hyperparameter tuning, then visualises the results of the tuned model on a test set.
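
The model-comparison step in the last item can be sketched as follows. This is a hedged example rather than the project's actual code: it uses synthetic data from `make_regression` in place of the energy price data, the candidate models and the `pick_best_model` helper are assumptions, and the Optuna tuning and visualisation steps are left out.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def pick_best_model(models, X, y, cv=5):
    """Return (best_name, scores) ranked by mean cross-validated R^2."""
    scores = {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical candidate set; the project may compare different models.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

best_name, scores = pick_best_model(candidates, X, y)
print(best_name, scores)
```

The selected model would then be the one passed on to hyperparameter tuning, with a held-out test set reserved for the final evaluation.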
