This repo showcases some of the data science projects I have done, covering preprocessing and supervised learning.
It is organised into two folders: preprocessing and supervised learning.
In the preprocessing folder we have:
- Create_shuffled_train_val_split_for_image_datasets - Shuffles and combines several image datasets into training and validation splits.
- Data_preparation_feature_analysis_star_ratings_sklearn - Uses big-data tools (such as Dask and C++) to process a large amount of data as efficiently as possible, clean it, combine it and then save it in a suitable format.
- Energy_price_data_preprocessing - Gathers data from several Excel files and applies a number of preprocessing steps to get the data into a single cleaned dataframe.
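
As a rough illustration of the shuffled train/validation split idea, here is a minimal sketch using only the Python standard library. It is not the code from the project folder: the function name `shuffled_train_val_split`, its parameters, and the list of image extensions are all assumptions made for this example.

```python
import random
from pathlib import Path


def shuffled_train_val_split(dataset_dirs, val_fraction=0.2, seed=42,
                             extensions=(".jpg", ".jpeg", ".png")):
    """Pool image paths from several folders, shuffle them, and split.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from the project code.
    """
    paths = []
    for d in dataset_dirs:
        # Sort first so the result depends only on the seed,
        # not on the file system's listing order.
        paths.extend(
            p for p in sorted(Path(d).rglob("*"))
            if p.is_file() and p.suffix.lower() in extensions
        )

    # A local Random instance keeps the shuffle reproducible without
    # touching the global random state.
    rng = random.Random(seed)
    rng.shuffle(paths)

    n_val = int(len(paths) * val_fraction)
    # Returns (train_paths, val_paths); the two lists never overlap.
    return paths[n_val:], paths[:n_val]
```

Calling `shuffled_train_val_split(["dataset_a", "dataset_b"])` (placeholder folder names) returns two lists of paths. Fixing the seed means the same files land in the same split on every run.
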
In the supervised learning folder we have:
- Fraud_classification_sklearn - Compares different classification algorithms for predicting whether an individual is a person of interest in the Enron fraud case.
- Employee_leave_logistic_regression_sklearn - A logistic regression using sklearn to predict, for scheduling purposes, whether an employee will take a large amount of time away from the office for a given absence reason.
- Road_safety_regression_prediction_ensemble_sklearn - Compares different regression models and ensemble methods for predicting the road safety score (1 star to 5 stars) of a road given a set of road features.
- Energy_price_model_selection_optimisation_sklearn_optuna - Loads data, automatically compares a number of different regression models for predicting energy prices, chooses the best model, applies automated hyperparameter tuning, then visualises the results of the tuned model on a test set.
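
Several of the projects above compare models before picking one. The sketch below shows one common way to do that with sklearn's cross-validation. It is an assumption-laden illustration rather than the projects' actual code: the helper `compare_regressors`, the candidate models, and the synthetic data are all made up for the example, and the hyperparameter tuning step is not covered.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score


def compare_regressors(models, X, y, cv=5):
    """Return {name: mean cross-validated R^2} for each candidate model.

    Hypothetical helper for illustration only.
    """
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data; the projects use their own datasets.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Candidate models chosen arbitrarily for the example.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = compare_regressors(candidates, X, y)
best_name = max(scores, key=scores.get)  # model with the highest mean R^2
print(f"Best model: {best_name}")
```

Scoring every candidate with the same cross-validation folds keeps the comparison fair. The selected model would then typically be tuned and evaluated once on a held-out test set.
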