FairMedFM

Fairness Benchmarking for Medical Imaging Foundation Models

Abstract

The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmarks, standardized pipelines, and easily adaptable libraries to evaluate and understand the fairness performance of FMs in medical imaging. This makes it considerably harder to formulate and implement solutions that ensure equitable outcomes across diverse patient populations. To fill this gap, we introduce FairMedFM, a fairness benchmark for FM research in medical imaging. FairMedFM integrates 17 popular medical imaging datasets, encompassing different modalities, dimensionalities, and sensitive attributes. It explores 20 widely used FMs with various usages, such as zero-shot learning, linear probing, parameter-efficient fine-tuning, and prompting, in two downstream tasks: classification and segmentation. Our exhaustive analysis evaluates fairness performance over different evaluation metrics from multiple perspectives. It reveals the existence of bias, varied utility-fairness trade-offs across FMs, consistent disparities on the same datasets regardless of the FM used, and the limited effectiveness of existing unfairness mitigation methods.

Structure

FairMedFM consists of the following modules for benchmarking the fairness of foundation models in medical image analysis.

Dataloader: provides a consistent interface for loading and processing imaging data across various modalities and dimensions, supporting both classification and segmentation tasks.
Model: a one-stop library that includes implementations of the most popular pre-trained foundation models for medical image analysis.
Usage Wrapper: encapsulates foundation models for various use cases and tasks, including linear probe, zero-shot inference, PEFT, promptable segmentation, etc.
Trainer: offers a unified workflow for fine-tuning and testing wrapped models, and includes state-of-the-art unfairness mitigation algorithms.
Evaluation: includes a set of metrics and tools to visualize and analyze fairness across different tasks.
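The exact metrics in the Evaluation module are not listed here. As a minimal sketch of one common style of group-fairness measure, the snippet below computes accuracy separately within each subgroup of a sensitive attribute (e.g., sex) and reports the gap between the best- and worst-performing subgroups. The function names subgroup_accuracy and utility_gap are hypothetical and are not part of the FairMedFM API; the benchmark's own metrics may be defined differently.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def subgroup_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                      groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Accuracy computed separately within each sensitive-attribute subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def utility_gap(per_group: Dict[Hashable, float]) -> float:
    """Difference between the best and worst subgroup (0 means parity)."""
    values = list(per_group.values())
    return max(values) - min(values)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
    sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
    acc = subgroup_accuracy(y_true, y_pred, sex)
    print(acc)               # {'F': 0.75, 'M': 0.5}
    print(utility_gap(acc))  # 0.25
```

The same pattern applies to any per-sample or per-group utility score (AUC, Dice, etc.): compute it per subgroup, then summarize the disparity.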

Tasks	Supported Usages	Supported Models	Supported Datasets
Image Classification	Linear probe, zero-shot, CLIP adaptation, PEFT	CLIP, BLIP, BLIP2, MedCLIP, BiomedCLIP, PubMedCLIP, DINOv2, C2L, LVM-Med, MedMAE, MoCo-CXR	CheXpert, MIMIC-CXR, HAM10000, FairVLMed10k, GF3300, PAPILA, BRSET, COVID-CT-MD, ADNI-1.5T
Image Segmentation	Interactive segmentation prompted with boxes and points	SAM, MobileSAM, TinySAM, MedSAM, SAM-Med2D, FT-SAM, SAM-Med3D, FastSAM3D, SegVol	HAM10000, TUSC, FairSeg, Montgomery County X-ray, KiTS, CANDI, IRCADb, SPIDER
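Zero-shot usage of CLIP-family models (CLIP, MedCLIP, BiomedCLIP, etc.) generally works by comparing an image embedding with the embeddings of one text prompt per class and picking the most similar class. The sketch below shows only that scoring step, assuming the embeddings have already been produced by the model's image and text encoders (not shown). The function name zero_shot_predict is hypothetical, and the scale of 100 is a typical CLIP logit scale, not a value taken from FairMedFM.

```python
import numpy as np


def zero_shot_predict(image_emb: np.ndarray, text_emb: np.ndarray,
                      temperature: float = 100.0) -> np.ndarray:
    """CLIP-style zero-shot classification.

    image_emb: (N, D) image embeddings.
    text_emb:  (C, D) embeddings of one text prompt per class.
    Returns an (N, C) array of class probabilities.
    """
    # L2-normalize so that the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = temperature * img @ txt.T
    # Numerically stable softmax over the class dimension.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    # Toy 2-D embeddings; real ones come from the model's encoders.
    text_emb = np.array([[1.0, 0.0],   # e.g. "a chest X-ray with no finding"
                         [0.0, 1.0]])  # e.g. "a chest X-ray with pneumonia"
    image_emb = np.array([[2.0, 0.1], [0.1, 3.0]])
    print(zero_shot_predict(image_emb, text_emb).argmax(axis=1))  # [0 1]
```

Linear probing, by contrast, keeps the encoder frozen and trains only a linear classifier on the image embeddings, so it needs labeled training data but no text prompts.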

Schedule

Release the classification tasks.
Release the segmentation tasks.
- 2D dataset + 2D SAMs
- 3D dataset + 2D SAMs
- 3D dataset + 3D SAMs
Release more models
Release the preprocessed datasets.
Integration of the classic strategies.
Release examples and tutorials.

Installation

The installation requires three steps.

Clone from GitHub

git clone https://github.com/FairMedFM/FairMedFM.git
cd FairMedFM

Create the conda environment

conda env create -f environment.yml
conda activate fairmedfm

Download Pretrained FMs

wget https://object-arbutus.cloud.computecanada.ca:443/rjin/pretrained.zip
unzip pretrained.zip
rm -f pretrained.zip

Our notebook tutorials also show how to set up the environment in Colab.

Data

You can either download our pre-processed data directly (see Use Our Pre-processed Data) or pre-process customized data yourself. However, not all datasets we use permit us to redistribute the data (e.g., MIMIC and ADNI require users to complete a data usage application first). For such datasets, we cannot provide a download link to our preprocessed data, but we do provide the links to the original datasets and release our preprocessing scripts.

Preprocess data on your own

We provide data preprocessing scripts for each dataset in the pre-processing folder. Data preprocessing consists of three steps:

(Optional) preprocess imaging data.
Preprocess metadata and sensitive attributes.
Split dataset into training set and test set with balanced subgroups (for classification only).
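The README does not define "balanced subgroups" precisely. One plausible reading is a stratified split in which every (sensitive attribute, label) combination keeps roughly the same share in the training and test sets. The sketch below implements that reading with the standard library only; the function name stratified_split and the choice of strata are assumptions for illustration, not FairMedFM's actual preprocessing code.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(keys: Sequence[Hashable], test_frac: float = 0.2,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so that each stratum (e.g. (sex, label))
    contributes about test_frac of its samples to the test set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for idx, key in enumerate(keys):
        strata[key].append(idx)
    train, test = [], []
    for key in sorted(strata, key=repr):  # deterministic stratum order
        idxs = strata[key]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    sex = ["F"] * 10 + ["M"] * 10
    label = [0, 1] * 10
    train, test = stratified_split(list(zip(sex, label)), test_frac=0.2)
    print(len(train), len(test))  # 16 4
```

An alternative reading is to subsample the test set so that each subgroup has the same number of samples; check the released preprocessing scripts for the rule actually used.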

We downloaded the data from the following links.

Classification Dataset

Dataset	Link
CheXpert	Original data, Demographic data
MIMIC-CXR	MIMIC-CXR
PAPILA	PAPILA
HAM10000	HAM10000
OCT	OCT
OL3I	OL3I
COVID-CT-MD	COVID-CT-MD
ADNI	ADNI-1.5T

Segmentation Dataset

Dataset	Link
HAM10000	HAM10000
TUSC	TUSC
FairSeg	FairSeg
Montgomery County X-ray	Montgomery County X-ray
KiTS2023	KiTS2023
IRCADb	IRCADb
CANDI	CANDI
SPIDER	SPIDER

Use Our Pre-processed Data

We offer downloads of our pre-processed data through S3 links. This feature is still under construction.

Classification Dataset

Dataset	Link
CheXpert	Requires an application to the original data provider.
MIMIC-CXR	Requires an application to the original data provider.
PAPILA	PAPILA
HAM10000	HAM10000
OCT	TODO
OL3I	TODO
COVID-CT-MD	TODO
ADNI	Requires an application to the original data provider.

Notebook Tutorial

We provide notebooks with examples of how to use our package.

Feature	Notebook
Linear Probing
CLIP Zero-shot and Adaptor
Segmentation

Running Experiments

Classification

We provide an example of running a linear-probe (classification) experiment with the CLIP model on the CheXpert dataset (--dataset CXP) to evaluate fairness with respect to sex. Please refer to parse_args.py for more details.

python main.py --task cls --usage lp --dataset CXP --sensitive_name Sex --method erm --total_epochs 100 --warmup_epochs 5 --blr 2.5e-4 --batch_size 128 --optimizer adamw --min_lr 1e-5 --weight_decay 0.05
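The flags --blr, --min_lr and --warmup_epochs follow the naming used in the MAE codebase, where the absolute learning rate is blr × batch_size / 256 and the schedule is a linear warmup followed by a half-cycle cosine decay down to min_lr. Assuming FairMedFM follows the same convention (check parse_args.py and the trainer to confirm), the sketch below computes the learning rate the command above would use. The function names absolute_lr and lr_at_epoch are hypothetical.

```python
import math


def absolute_lr(blr: float, batch_size: int, base_batch: int = 256) -> float:
    """Linear scaling rule: blr is the learning rate per 256 samples."""
    return blr * batch_size / base_batch


def lr_at_epoch(epoch: float, peak_lr: float, min_lr: float,
                warmup_epochs: int, total_epochs: int) -> float:
    """Linear warmup to peak_lr, then half-cycle cosine decay to min_lr."""
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Values from the example command: --blr 2.5e-4 --batch_size 128
    # --min_lr 1e-5 --warmup_epochs 5 --total_epochs 100
    peak = absolute_lr(blr=2.5e-4, batch_size=128)  # 1.25e-4
    for ep in (0, 5, 52.5, 100):
        print(ep, lr_at_epoch(ep, peak, min_lr=1e-5,
                              warmup_epochs=5, total_epochs=100))
```

Under this assumption, doubling --batch_size without changing --blr also doubles the peak learning rate.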

Segmentation (2D SAMs)

We also provide an example of using SAM with a center-point prompt on the TUSC dataset to evaluate fairness with respect to sex. Please refer to parse_args.py for more details.

python main.py --task seg --usage seg2d --dataset TUSC --sensitive_name Sex --method erm --batch_size 1 --pos_class 255 --model SAM --sam_ckpt_path ./weights/SAM.pth --img_size 1024 --prompt center
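In the command above, --pos_class 255 indicates which label value marks the foreground, and --prompt center prompts SAM with a point at the center of that structure. The README does not say how the center is computed. The sketch below shows one plausible approach, assuming the center is the foreground centroid snapped to the nearest foreground pixel (the raw centroid can fall outside a non-convex mask). It also derives a bounding-box prompt. Both outputs use SAM's (x, y) coordinate convention. The function name mask_to_prompts is hypothetical.

```python
import numpy as np


def mask_to_prompts(label: np.ndarray, pos_class: int = 255):
    """Derive a box prompt and a center-point prompt from a 2D label map.

    Returns (box, point) in pixel coordinates:
    box = [x_min, y_min, x_max, y_max], point = [x, y].
    """
    ys, xs = np.nonzero(label == pos_class)
    if xs.size == 0:
        raise ValueError("no foreground pixels equal to pos_class")
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])
    # Centroid of the foreground, snapped to the closest foreground pixel
    # so that the point prompt always lies inside the target structure.
    cx, cy = xs.mean(), ys.mean()
    nearest = np.argmin((xs - cx) ** 2 + (ys - cy) ** 2)
    point = np.array([xs[nearest], ys[nearest]])
    return box, point


if __name__ == "__main__":
    lab = np.zeros((10, 10), dtype=np.uint8)
    lab[2:5, 3:8] = 255  # rows 2-4, columns 3-7
    box, point = mask_to_prompts(lab)
    print(box.tolist(), point.tolist())  # [3, 2, 7, 4] [5, 3]
```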

Acknowledgement

We thank MEDFAIR for their pioneering work on benchmarking fairness in medical image analysis, and Slide-SAM for the SAM inference framework.

Guidelines for Responsible Use of the Benchmark

1. Understanding the Scope and Limitations

Comprehend the Benchmark's Design: Before utilizing the benchmark, ensure a thorough understanding of its design, including the datasets, models, tasks, and metrics involved. Be aware of its intended scope, as well as any limitations or biases inherent in the datasets and models.
Acknowledge: The benchmark uses publicly available data, some of which requires users to obtain a license before downloading. We acknowledge that these data belong to their original owners, and FairMedFM uses them under the constraints set by these owners.

2. Ethical Considerations

Fairness and Bias Awareness: The benchmark includes fairness metrics to assess the performance of AI models. Users should carefully consider these metrics to avoid perpetuating or amplifying biases in AI systems, especially in sensitive domains like healthcare.

3. Reproducibility

Use the Provided Codebase: The benchmark comes with a codebase designed to facilitate reproducibility. Users are encouraged to utilize and contribute to this codebase to maintain a consistent standard of experimentation.
Document Any Changes: If modifications are made to the benchmark, such as adjusting datasets or metrics, these changes should be well-documented and justified. This ensures that results can be accurately interpreted and compared.

4. Collaborative and Open Science

Engage with the Community: Users are encouraged to engage with the research community by sharing findings, discussing potential improvements, and collaborating on extensions of the benchmark.
Attribution and Acknowledgment: When using the benchmark in research or applications, proper attribution should be given to the creators of the benchmark and the original datasets. Acknowledge the sources of any third-party data or models used.

5. Continuous Improvement

Feedback and Contributions: Provide feedback to the benchmark’s maintainers regarding any issues or potential improvements. Contributions in the form of new datasets, metrics, or models are encouraged to enhance the benchmark's utility and relevance.
Stay Updated: Keep abreast of updates to the benchmark or related research. This ensures that you are using the most current and validated version, which may include important improvements or corrections.

By following these guidelines, users can ensure that they are utilizing the benchmark responsibly, contributing to ethical AI development, and fostering a collaborative research environment.

License

This project is released under the CC BY 4.0 license. Please see the LICENSE file for more information.

Citation

@article{jin2024fairmedfm,
  title={FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models},
  author={Jin, Ruinan and Xu, Zikang and Zhong, Yuan and Yao, Qiongsong and Dou, Qi and Zhou, S Kevin and Li, Xiaoxiao},
  journal={arXiv preprint arXiv:2407.00983},
  year={2024}
}
