Our study uses financial-ratio data for Polish companies to predict company default with machine-learning techniques.
The dataset used in this study is obtained from the UCI Machine Learning Repository. It contains 64 financial ratios and a corresponding class label that indicates bankruptcy status after 2 years. Of the 9792 companies analyzed, 515 (5.26%) went bankrupt, whereas 9277 (94.74%) survived.
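The class shares quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this README. The majority-class baseline and the "balanced" class weights (`n_samples / (n_classes * class_count)`) are added here for illustration of how imbalanced the labels are; they are not confirmed as part of the project's pipeline.

```python
# Counts as reported in this README.
n_total = 9792
n_bankrupt = 515
n_survived = 9277

# The two classes should account for every company.
assert n_bankrupt + n_survived == n_total

bankrupt_share = n_bankrupt / n_total
survived_share = n_survived / n_total

# Accuracy of a trivial model that always predicts "survived".
# Any useful classifier has to be judged against this baseline.
majority_baseline_accuracy = survived_share

# Illustrative "balanced" class weights (an assumption, not the
# project's documented setting): n_samples / (n_classes * class_count).
n_classes = 2
balanced_weights = {
    "bankrupt": n_total / (n_classes * n_bankrupt),
    "survived": n_total / (n_classes * n_survived),
}

print(f"bankrupt: {bankrupt_share:.2%}, survived: {survived_share:.2%}")
print(f"majority baseline accuracy: {majority_baseline_accuracy:.2%}")
print(balanced_weights)
```

Because roughly 95% of firms survive, plain accuracy is a weak yardstick for this data; metrics that focus on the bankrupt class are usually more informative.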
Because the dataset contains only financial ratios rather than raw financial figures, we successfully reverse engineered the raw financial figures from the ratios.
We find that ensemble techniques such as random forest provide the best results. Furthermore, we applied the SHAP (SHapley Additive exPlanations) technique to explain the model's output.
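SHAP attributions are built on Shapley values, which split the difference between a prediction and a baseline prediction across the input features. The following is a minimal sketch of exact Shapley values computed by brute force over feature orderings. It does not use the `shap` library or the project's model: `toy_risk_score`, its coefficients, and the three ratio values are hypothetical and exist only to make the idea concrete.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values of `predict` at `instance` relative to `baseline`.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from their baseline value to the instance
    value. Cost grows factorially, so this is only viable for a few features.
    """
    n_features = len(instance)
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [total / len(orderings) for total in totals]


def toy_risk_score(ratios):
    """Hypothetical scoring function, NOT the project's random forest."""
    profitability, leverage, liquidity = ratios
    interaction = 0.5 * leverage * max(0.0, 1.0 - liquidity)
    return 0.6 * leverage - 0.8 * profitability - 0.1 * liquidity + interaction


# Hypothetical company and a hypothetical "typical" reference company.
instance = (-0.05, 0.9, 0.7)
baseline = (0.05, 0.5, 1.5)

attributions = shapley_values(toy_risk_score, instance, baseline)
for name, value in zip(("profitability", "leverage", "liquidity"), attributions):
    print(f"{name}: {value:+.4f}")
```

The attributions always sum to the gap between the instance score and the baseline score, which is the property that makes per-feature explanations add up to the model output.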
The best way to step through our work is to view the notebooks.
The source data is obtained from the UCI Machine Learning Repository.
We created a data dictionary that maps the given column names to financial ratios.
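As an illustration of how such a mapping can be used, the sketch below holds a small excerpt in a Python dict and translates column names into readable ratio definitions. The column names of the form `Attr1` and the `class` label column are assumptions, and the four definitions follow the UCI attribute description as best recalled; check them against the project's data dictionary before relying on them.

```python
# Hypothetical excerpt of a data dictionary. Column names ("Attr1", ...)
# are assumed; verify the definitions against the project's own dictionary.
DATA_DICTIONARY = {
    "Attr1": "net profit / total assets",
    "Attr2": "total liabilities / total assets",
    "Attr3": "working capital / total assets",
    "Attr4": "current assets / short-term liabilities",
}


def describe_columns(columns, dictionary=DATA_DICTIONARY):
    """Return readable names, leaving unmapped columns unchanged."""
    return [dictionary.get(name, name) for name in columns]


# "class" is a hypothetical label column and is passed through untouched.
print(describe_columns(["Attr1", "Attr2", "class"]))
```

Leaving unmapped names unchanged keeps the helper safe to call on any column list, including ones that contain the label.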
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To set up the conda environment, run:
conda env create -f environment.yml
conda activate company_default
If any additional packages are required, add them to the yml file and run:
conda env update -f environment.yml --prune
To create the kernel for jupyter notebooks, run:
conda activate company_default
python -m ipykernel install --user --name company_default --display-name "Python (company_default)"
To set up pre-commit (which is used to run black before committing to Git), run:
pre-commit install
Add the following as the first line of your notebook to apply black formatting to that notebook:
%load_ext nb_black
To set up input and output, create a data folder in the root directory with the following structure:
- data/
    - input/
        - train.csv
        - test.csv
    - output/
The input folder contains train.csv and test.csv, while the output folder will have the pipeline output.
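The folders can also be created from Python. This is a minimal sketch that assumes it is run from the project root; the helper name `create_data_dirs` is hypothetical and not part of the project. It only creates the directories, so train.csv and test.csv still need to be copied into data/input by hand.

```python
from pathlib import Path


def create_data_dirs(project_root="."):
    """Create data/input and data/output under `project_root`.

    Existing folders are left untouched, so the call is safe to repeat.
    """
    data_dir = Path(project_root) / "data"
    created = []
    for name in ("input", "output"):
        folder = data_dir / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_data_dirs():
        print(folder)
```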
File | Description |
---|---|
train.csv | Labelled dataset used to train the model |
test.csv | Unlabelled dataset to make predictions for |
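A quick sanity check on the two input files is to compare their headers: since train.csv is labelled and test.csv is not, the columns found only in train.csv should be the label. The sketch below assumes both files are comma-separated with a header row, which this README does not state; `label_columns` is a hypothetical helper.

```python
import csv


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle))


def label_columns(train_path, test_path):
    """Columns present in the training file but absent from the test file."""
    test_columns = set(read_header(test_path))
    return [name for name in read_header(train_path) if name not in test_columns]
```

For example, `label_columns("data/input/train.csv", "data/input/test.csv")` would be expected to return a single column name if the layout matches the table above.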