Finding Hidden Features Responsible for Machine Learning Failures

This repository contains the scripts used to create subsets of CheXpert, a large dataset of chest X-rays. The subsets are used to train and test the model in askovdal/fhfrmlf-model, a fork of jfhealthcare/Chexpert that has been refactored to train only on predicting pneumothorax cases.

The subsets folder contains the different CSV-files created using the scripts below.

increment-notes.md contains a summary of our progress over the course of our thesis.

Script overview

Run a script with Python 3 from the terminal, e.g. python3 model.py.

model.py

Contains the logic for creating subsets of the main CSV-file of the whole dataset (train.csv), both as new CSV-files and as directories containing the actual image files.
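As an illustration of what creating a subset can look like, here is a minimal sketch, not the actual contents of model.py. It assumes that each row of train.csv has a `Path` column with a path relative to an image root and a label column such as `Pneumothorax`; the function name `create_subset` and all of its parameters are hypothetical.

```python
import csv
import shutil
from pathlib import Path


def create_subset(csv_path, out_csv, column, value, image_root=None, out_dir=None):
    """Keep the rows where `column` equals `value`; return the number of rows kept.

    Hypothetical helper: the real selection criteria in model.py may differ.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = [row for row in reader if row[column] == value]

    # Write the subset as a new CSV-file with the same header.
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Optionally copy the matching image files, preserving their relative paths.
    # The "Path" column name is an assumption about the CSV layout.
    if image_root is not None and out_dir is not None:
        for row in rows:
            src = Path(image_root) / row["Path"]
            dst = Path(out_dir) / row["Path"]
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return len(rows)
```

Values are compared as strings, so a label stored as `1.0` in the CSV-file has to be passed as `"1.0"`.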

create-csv-from-dir.py

Creates a CSV-file from a directory of files, optionally filtering the files by suffix and limiting the number of rows in the output CSV-file.
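The behaviour described above could be sketched roughly as follows. This is not the script itself: the function name `csv_from_dir`, its parameters, and the single `Path` header are assumptions made for illustration.

```python
import csv
from pathlib import Path


def csv_from_dir(directory, out_path, suffix=None, limit=None):
    """Write one row per file in `directory` to `out_path`; return the row count."""
    # Sort so the output is deterministic across runs and platforms.
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    if suffix is not None:
        files = [p for p in files if p.name.endswith(suffix)]
    if limit is not None:
        files = files[:limit]

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Path"])  # hypothetical header name
        for p in files:
            writer.writerow([p.as_posix()])
    return len(files)
```

For example, `csv_from_dir("images", "images.csv", suffix=".jpg", limit=100)` would list at most 100 `.jpg` files.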

concat-csvs.py

Takes two CSV-files as input and concatenates them into a single CSV-file. Optionally, only a fraction of each CSV-file is used instead of the whole file.
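A minimal sketch of this kind of concatenation is shown below. It assumes that both files share the same header row and that a fraction means the leading rows of a file; the script may select rows differently. The name `concat_csvs` and its parameters are hypothetical.

```python
import csv


def concat_csvs(path_a, path_b, out_path, frac_a=1.0, frac_b=1.0):
    """Concatenate two CSV-files, keeping the leading fraction of each; return the row count."""

    def read_fraction(path, frac):
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        header, body = rows[0], rows[1:]
        # Truncate towards zero, so a fraction of 0.5 keeps 2 of 5 rows.
        return header, body[: int(len(body) * frac)]

    header_a, rows_a = read_fraction(path_a, frac_a)
    header_b, rows_b = read_fraction(path_b, frac_b)
    if header_a != header_b:
        raise ValueError("CSV headers differ")

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header_a)
        writer.writerows(rows_a + rows_b)
    return len(rows_a) + len(rows_b)
```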

shuffle-csv.py

Shuffles the rows of a CSV-file and overwrites the original file.
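The idea can be sketched as below, assuming the first row is a header that should stay in place. The function name `shuffle_csv` and the optional `seed` parameter are hypothetical additions for reproducibility, not part of the script as described.

```python
import csv
import random


def shuffle_csv(path, seed=None):
    """Shuffle the data rows of the CSV-file at `path` in place, keeping the header first."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]

    # A dedicated Random instance avoids touching the global random state.
    random.Random(seed).shuffle(body)

    # Overwrite the original file with the shuffled rows.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(body)
```

Because the original file is overwritten, keep a copy if the original row order matters.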

create-train-test-split.py

Takes a CSV-file and creates two new files, one containing the train subset and one containing the test subset.
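A random split of this kind might look like the sketch below. The split ratio, the shuffling before the split, and the names `train_test_split_csv`, `test_fraction`, and `seed` are all assumptions; the script may split differently (for example by patient rather than by row).

```python
import csv
import random


def train_test_split_csv(path, train_path, test_path, test_fraction=0.2, seed=None):
    """Split the rows of a CSV-file into train and test files; return (n_train, n_test)."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, body = rows[0], rows[1:]

    # Shuffle first so the test set is a random sample, not the first rows.
    random.Random(seed).shuffle(body)
    n_test = int(len(body) * test_fraction)
    test_rows, train_rows = body[:n_test], body[n_test:]

    # Both output files get the original header.
    for out_path, out_rows in ((train_path, train_rows), (test_path, test_rows)):
        with open(out_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(out_rows)
    return len(train_rows), len(test_rows)
```

Note that a row-level split can place images from the same patient in both subsets; whether that matters depends on the experiment.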
