junzhin/DGM-VLC

Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning

This is the implementation of Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning (accepted at the AIM-FM Workshop of NeurIPS 2024 🔥🔥🔥), which explores how deep generative models can reveal and demonstrate patterns in medical images through vision-language conditioning.

🌟 Abstract

Deep generative models have significantly advanced medical imaging analysis by enhancing dataset size and quality. Beyond mere data augmentation, our research in this paper highlights an additional, significant capacity of deep generative models: their ability to reveal and demonstrate patterns in medical images. We employ a generative structure with hybrid conditions, combining clinical data and segmentation masks to guide the image synthesis process. Furthermore, we innovatively transform the tabular clinical data into textual descriptions. This approach simplifies the handling of missing values and also enables us to leverage large pre-trained vision-language models that investigate the relations between independent clinical entries and comprehend general terms, such as gender and smoking status. Our approach differs from and presents a more challenging task than traditional medical report-guided synthesis due to the weaker visual correlation of our clinical information with the images. To overcome this, we introduce a text-visual embedding mechanism that strengthens the conditions, ensuring the network effectively utilizes the provided information. Our pipeline is generalizable to both GAN-based and diffusion models. Experiments on chest CT, particularly focusing on smoking status, demonstrated a consistent intensity shift in the lungs, which is in agreement with clinical observations, indicating the effectiveness of our method in capturing and visualizing the impact of specific attributes on medical image patterns. Our methods offer a new avenue for the early detection and precise visualization of complex clinical conditions with deep generative models.
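The tabular-to-text conversion described above can be sketched as follows. This is a minimal illustration only; the field names and the prompt template are hypothetical and not the repository's exact format (see generate_prompts.ipynb for the actual prompt generation):

```python
def row_to_prompt(row: dict) -> str:
    """Convert one clinical record (column -> value) into a textual
    description, skipping missing values instead of imputing them."""
    parts = []
    for key, value in row.items():
        if value is None or value == "":
            continue  # missing entries are simply omitted from the prompt
        parts.append(f"{key}: {value}")
    return ", ".join(parts)

# A record with a missing field yields a prompt without that field.
print(row_to_prompt({"Age": 68, "Smoker": "No", "Gender": None}))
# Age: 68, Smoker: No
```

Omitting missing fields, rather than imputing them, is what makes the textual representation robust to incomplete clinical tables.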

🚀 Model Pipelines and Fusion Graphical Illustration:

Overview of Method Pipeline1 Overview of Method Pipeline2 Overview of Method Comparison

🖼️ Illustrative Cases Demonstrating the Impact of Altered Prompt Content on Prediction Outcomes.

  • Case 1 (default prompt: Age 68, Smoker: No): input mask, the default generation, generations for Age: 24 and for Smoker: Yes, and voxel-difference maps for age and for smoking status (No vs Yes).
  • Case 2 (default prompt: Age 86, Smoker: Yes): input mask, the default generation, generations for Age: 24 and for Smoker: No, and voxel-difference maps for age and for smoking status (No vs Yes).

🎉 More Examples

  • Ex1 (default prompt: Patient Status: Alive 😊): the default generation, a generation for Patient Status: Dead 💀, the input mask, and the Dead-vs-Alive voxel-difference map.
  • Ex2 (default prompt: Patient Status: Dead 💀): the default generation, a generation for Patient Status: Alive 😊, and the Dead-vs-Alive voxel-difference map.
  • Ex3–Ex5 (default prompt: DIAGNOSIS CODE: CTD-ILD 🏥): the default generation and generations for DIAGNOSIS CODE: IPF 🩺 and DIAGNOSIS CODE: UILD, with voxel-difference maps for CTD-ILD vs IPF, CTD-ILD vs UILD, and IPF vs UILD.
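The voxel-difference maps shown above amount to subtracting two volumes synthesized from the same mask under different prompts (see observe_difference.ipynb for the actual analysis). A minimal NumPy sketch, with toy arrays standing in for volumes that would normally be loaded from NIfTI files via nibabel:

```python
import numpy as np

def voxel_difference(vol_a, vol_b):
    """Voxel-wise difference between two aligned volumes of equal shape,
    e.g. images generated from the same mask with different prompts."""
    assert vol_a.shape == vol_b.shape, "volumes must be spatially aligned"
    return vol_a - vol_b

# Toy volumes standing in for 'Smoker: Yes' vs 'Smoker: No' generations.
rng = np.random.default_rng(1)
no = rng.standard_normal((4, 4, 4))
yes = no + 0.1  # a uniform intensity shift, as reported for smoking status
diff = voxel_difference(yes, no)
print(diff.mean())  # ≈ 0.1
```

Because both volumes are generated from the same segmentation mask, no registration step is needed before subtracting.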

💡 Highlights

  • Conversion of Tabular Data into Text 😊: This method efficiently addresses missing data issues and capitalizes on the capabilities of pre-trained vision-language models to decode clinical information.
  • Advanced Text Fusion Techniques 🧠: We introduce techniques, including a cross-attention module and an affine-transformation fusion unit, to refine the conditioning process in cases where the clinical information does not directly correspond to visual cues in the images.
  • General Implementation for GAN and Diffusion Models 🔄: Our pipeline is adaptable to both GAN-based and diffusion-based generative models.
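As a rough illustration of the affine-transformation fusion unit, here is a FiLM-style sketch in NumPy: the text embedding predicts a per-channel scale and shift applied to the image features. The dimensions and projection matrices are assumptions for illustration, not the repository's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
text_dim, channels = 512, 64

# Hypothetical learned projections mapping the text embedding to
# per-channel scale (gamma) and shift (beta) parameters.
W_gamma = rng.standard_normal((text_dim, channels)) * 0.01
W_beta = rng.standard_normal((text_dim, channels)) * 0.01

def affine_fusion(feat, text_emb):
    """feat: (channels, H, W) image features; text_emb: (text_dim,)."""
    gamma = 1.0 + text_emb @ W_gamma   # scale, initialized near identity
    beta = text_emb @ W_beta           # shift
    return gamma[:, None, None] * feat + beta[:, None, None]

feat = rng.standard_normal((channels, 8, 8))
text_emb = rng.standard_normal(text_dim)
fused = affine_fusion(feat, text_emb)
print(fused.shape)  # (64, 8, 8)
```

Initializing the scale near identity means an uninformative text embedding leaves the image features essentially unchanged, which keeps the conditioning from overwhelming the spatial signal from the segmentation mask.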

🗂️ Code Structure

The code structure is organized as follows:

├── metrics_computation # Scripts for calculating evaluation metrics
├── models # Model definitions
├── options # Configuration and command line options
├── scr_README # README and documentation-related assets
├── utils # Utility functions and scripts 
├── generate_prompts.ipynb # Notebook for generating prompts
├── observe_difference.ipynb # Notebook for observing voxel differences 
├── inference_patch.py # Inference script (patch-level)
├── inference_whole.py # Inference script (whole-level)
└── train.py # Training script

📋 Requirements

Ensure you have the following dependencies installed:

apex==0.9.10dev
dominate==2.9.1
matplotlib==3.8.2
MedCLIP==0.0.3
monai==1.3.0
nibabel==5.2.1
numpy==1.26.4
pandas==2.2.1 
Pillow==10.2.0
pytorch_msssim==1.0.0
scikit_learn==1.4.0
scipy==1.12.0
SimpleITK==2.3.1
tensorflow==2.15.0.post1
torch==2.1.2
torchmetrics==1.3.1
torchvision==0.16.2
tqdm==4.65.0

You can install all dependencies by running:

pip install -r requirements.txt

🔗 Citation

If you find our work interesting and useful, please consider citing:

@article{xing2024deep,
  title={Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning},
  author={Xing, Xiaodan and Ning, Junzhi and Nan, Yang and Yang, Guang},
  journal={arXiv preprint arXiv:2410.13823},
  year={2024}
}

📢 License

This project is licensed under the MIT License.

Credits

This repository is based on:

pix2pixHD: High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code and paper).
