Finding Hidden Features Responsible for Machine Learning Failures

This repository contains the scripts used for creating subsets of the CheXpert dataset, a large dataset of chest X-rays, used for training and testing the model in askovdal/fhfrmlf-model, which is forked from jfhealthcare/Chexpert, and refactored to train only on predicting pneumothorax cases.

The subsets folder contains the different CSV-files created using the scripts below.

increment-notes.md contains a summary of our progress throughout our process of our thesis.

Script overview

Run a script using python 3 in the terminal, e.g. python3 model.py.

model.py

Contains the logic for creating subsets of the main CSV-file of the whole dataset (train.csv), both as new CSV-files and as directories containing the actual image files.

create-csv-from-dir.py

Creates a CSV-file from a directory of files, optionally filtering the files using a suffix, and controlling the length of the outputted CSV-file.

concat-csvs.py

Takes 2 CSV-files as input and concatenates them into a single CSV-file. Optionally, a fraction of each CSV-file can be used instead of the whole file.

shuffle-csv.py

Shuffles the rows of a CSV-file and overwrites the original file.

create-train-test-split.py

Takes a CSV-file and creates two new files containing a test and train subset.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Finding Hidden Features Responsible for Machine Learning Failures

Script overview

model.py

create-csv-from-dir.py

concat-csvs.py

shuffle-csv.py

create-train-test-split.py

Files

README.md

Latest commit

History

README.md

File metadata and controls

Finding Hidden Features Responsible for Machine Learning Failures

Script overview

model.py

create-csv-from-dir.py

concat-csvs.py

shuffle-csv.py

create-train-test-split.py