Finding Hidden Features Responsible for Machine Learning Failures
This repository contains the scripts used to create subsets of the CheXpert dataset, a large dataset of chest X-rays. The subsets are used to train and test the model in askovdal/fhfrmlf-model, which is forked from jfhealthcare/Chexpert and refactored to train only on predicting pneumothorax.
The subsets folder contains the different CSV-files created using the scripts below.
increment-notes.md contains a summary of our progress throughout our thesis.
Run a script with Python 3 from the terminal, e.g. python3 model.py.
Contains the logic for creating subsets of the main CSV-file of the whole dataset (train.csv), both as new CSV-files and as directories containing the actual image files.
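A minimal sketch of the subset logic, using only the standard library. It assumes that train.csv has a "Path" column holding each image's path relative to the dataset root and a "Pneumothorax" column where positive cases are stored as "1.0". The function name create_subset and all of its parameters are hypothetical and are not taken from the repository's script.

```python
import csv
import shutil
from pathlib import Path


def create_subset(src_csv, dst_csv, label="Pneumothorax", value="1.0",
                  limit=None, image_root=None, dst_dir=None):
    """Write the rows of src_csv whose `label` column equals `value` to dst_csv.

    Hypothetical helper. If image_root and dst_dir are both given, the image
    referenced by each kept row's "Path" column (assumed relative to
    image_root) is copied into dst_dir, preserving its relative path.
    Returns the number of rows written.
    """
    with open(src_csv, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        # Keep only rows whose label column has the requested value
        rows = [row for row in reader if row[label] == value]
    if limit is not None:
        rows = rows[:limit]  # cap the size of the subset

    with open(dst_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Optionally mirror the selected images into a separate directory
    if image_root is not None and dst_dir is not None:
        for row in rows:
            src = Path(image_root) / row["Path"]
            dst = Path(dst_dir) / row["Path"]
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return len(rows)
```

Copying images while preserving their relative paths means the new CSV-file can point into the new directory with the same "Path" values.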
Creates a CSV-file from a directory of files, optionally filtering the files by a filename suffix and limiting the number of rows in the output CSV-file.
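A sketch of building a CSV-file from a directory, assuming one file path per row under a single "Path" header. The function name csv_from_dir, its parameters, the header name, and the choice to store paths relative to the scanned directory are all assumptions for illustration.

```python
import csv
from pathlib import Path


def csv_from_dir(directory, out_csv, suffix=None, max_rows=None):
    """List files under `directory` (recursively) into a one-column CSV-file.

    Hypothetical helper. When `suffix` is given, only files whose name ends
    with it are kept; when `max_rows` is given, at most that many rows are
    written. Returns the list of relative paths written, in sorted order.
    """
    root = Path(directory)
    paths = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and (suffix is None or p.name.endswith(suffix))
    )
    if max_rows is not None:
        paths = paths[:max_rows]  # control the length of the output
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Path"])  # hypothetical header name
        writer.writerows([p] for p in paths)
    return paths
```

For example, a suffix such as "_frontal.jpg" would keep only frontal views if the files follow CheXpert's view naming; sorting makes the truncation by max_rows deterministic.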
Takes 2 CSV-files as input and concatenates them into a single CSV-file. Optionally, a fraction of each CSV-file can be used instead of the whole file.
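A sketch of concatenating two CSV-files with an optional fraction of each. It assumes both files share the same header and that "a fraction" means the leading int(n * frac) data rows; the original script may instead sample rows at random. The name concat_csvs and its parameters are hypothetical.

```python
import csv


def concat_csvs(csv_a, csv_b, out_csv, frac_a=1.0, frac_b=1.0):
    """Concatenate two CSV-files that share the same header.

    Hypothetical helper. Only the first int(len(rows) * frac) data rows of
    each file are kept. Returns the number of data rows written.
    """
    def read(path, frac):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = list(reader)
        return header, rows[: int(len(rows) * frac)]

    header_a, rows_a = read(csv_a, frac_a)
    header_b, rows_b = read(csv_b, frac_b)
    # Refuse to merge files whose columns do not line up
    if header_a != header_b:
        raise ValueError("CSV-files have different headers")
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header_a)
        writer.writerows(rows_a + rows_b)
    return len(rows_a) + len(rows_b)
```

Because the rows of the first file precede those of the second in the output, the result is usually shuffled before being split into train and test sets.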
Shuffles the rows of a CSV-file and overwrites the original file.
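A sketch of shuffling a CSV-file in place, keeping the header as the first line. The name shuffle_csv and the optional seed parameter (added here for reproducibility) are assumptions.

```python
import csv
import random


def shuffle_csv(path, seed=None):
    """Shuffle the data rows of a CSV-file and overwrite the original file.

    Hypothetical helper. The header stays on the first line; pass a seed to
    get the same order on every run. Returns the rows in their new order.
    """
    # Read everything first: opening the file with "w" truncates it
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    random.Random(seed).shuffle(rows)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return rows
```

Using a dedicated random.Random instance keeps the shuffle reproducible without touching Python's global random state.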
Takes a CSV-file and splits it into two new files: a train subset and a test subset.
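A sketch of a train/test split. It assumes the input rows are already shuffled and assigns the last round(n * test_frac) rows to the test file; the name split_csv, the test_frac parameter, and its default of 0.2 are hypothetical.

```python
import csv


def split_csv(path, train_csv, test_csv, test_frac=0.2):
    """Split the data rows of `path` into a train and a test CSV-file.

    Hypothetical helper. The last round(len(rows) * test_frac) rows go to the
    test file and the rest to the train file; both keep the original header.
    Returns (number of train rows, number of test rows).
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    n_test = round(len(rows) * test_frac)
    split = len(rows) - n_test
    for out_path, part in ((train_csv, rows[:split]), (test_csv, rows[split:])):
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return split, n_test
```

Since the split is purely positional, an unshuffled input (for example, the output of a concatenation) would put all rows from one source file into the test set.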