junzhin/DGM-VLC

Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning

This is the implementation of Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning (accepted at the AIM-FM Workshop of NeurIPS 2024 🔥🔥🔥), which explores how deep generative models can reveal and demonstrate patterns in medical images through vision-language conditioning.

🌟 Abstract

Deep generative models have significantly advanced medical imaging analysis by enhancing dataset size and quality. Beyond mere data augmentation, our research in this paper highlights an additional, significant capacity of deep generative models: their ability to reveal and demonstrate patterns in medical images. We employ a generative structure with hybrid conditions, combining clinical data and segmentation masks to guide the image synthesis process. Furthermore, we innovatively transform the tabular clinical data into textual descriptions. This approach simplifies the handling of missing values and also enables us to leverage large pre-trained vision-language models that investigate the relations between independent clinical entries and comprehend general terms, such as gender and smoking status. Our approach differs from and presents a more challenging task than traditional medical report-guided synthesis due to the weaker visual correlation of our clinical information with the images. To overcome this, we introduce a text-visual embedding mechanism that strengthens the conditions, ensuring the network effectively utilizes the provided information. Our pipeline is generalizable to both GAN-based and diffusion models. Experiments on chest CT, particularly focusing on smoking status, demonstrated a consistent intensity shift in the lungs, which is in agreement with clinical observations, indicating the effectiveness of our method in capturing and visualizing the impact of specific attributes on medical image patterns. Our methods offer a new avenue for the early detection and precise visualization of complex clinical conditions with deep generative models.
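The tabular-to-text conversion described above can be sketched as follows. This is a minimal illustration only; the field names and the prompt template are hypothetical and not the repository's exact format (see generate_prompts.ipynb for the actual prompt generation):

```python
def row_to_prompt(row: dict) -> str:
    """Convert one clinical record (column -> value) into a textual
    description, skipping missing values instead of imputing them."""
    parts = []
    for key, value in row.items():
        if value is None or value == "":
            continue  # missing entries are simply omitted from the prompt
        parts.append(f"{key}: {value}")
    return ", ".join(parts)

# A record with a missing field yields a prompt without that field.
print(row_to_prompt({"Age": 68, "Smoker": "No", "Gender": None}))
# Age: 68, Smoker: No
```

Omitting missing fields, rather than imputing them, is what makes the textual representation robust to incomplete clinical tables.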

🚀 Model Pipelines and Fusion Graphical Illustration:

Overview of Method Pipeline1 Overview of Method Pipeline2 Overview of Method Comparison

🖼️ Illustrative Cases Demonstrating the Impact of Altered Prompt Content on Prediction Outcomes.

  • Case 1 (default prompt: Age 68, Smoker: No): input mask, the default generation, generations for Age: 24 and for Smoker: Yes, and voxel-difference maps for age and for smoking status (No vs Yes).
  • Case 2 (default prompt: Age 86, Smoker: Yes): input mask, the default generation, generations for Age: 24 and for Smoker: No, and voxel-difference maps for age and for smoking status (No vs Yes).

🎉 More Examples

  • Ex1 (default prompt: Patient Status: Alive 😊): the default generation, a generation for Patient Status: Dead 💀, the input mask, and the Dead-vs-Alive voxel-difference map.
  • Ex2 (default prompt: Patient Status: Dead 💀): the default generation, a generation for Patient Status: Alive 😊, and the Dead-vs-Alive voxel-difference map.
  • Ex3–Ex5 (default prompt: DIAGNOSIS CODE: CTD-ILD 🏥): the default generation and generations for DIAGNOSIS CODE: IPF 🩺 and DIAGNOSIS CODE: UILD, with voxel-difference maps for CTD-ILD vs IPF, CTD-ILD vs UILD, and IPF vs UILD.
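The voxel-difference maps shown above amount to subtracting two volumes synthesized from the same mask under different prompts (see observe_difference.ipynb for the actual analysis). A minimal NumPy sketch, with toy arrays standing in for volumes that would normally be loaded from NIfTI files via nibabel:

```python
import numpy as np

def voxel_difference(vol_a, vol_b):
    """Voxel-wise difference between two aligned volumes of equal shape,
    e.g. images generated from the same mask with different prompts."""
    assert vol_a.shape == vol_b.shape, "volumes must be spatially aligned"
    return vol_a - vol_b

# Toy volumes standing in for 'Smoker: Yes' vs 'Smoker: No' generations.
rng = np.random.default_rng(1)
no = rng.standard_normal((4, 4, 4))
yes = no + 0.1  # a uniform intensity shift, as reported for smoking status
diff = voxel_difference(yes, no)
print(diff.mean())  # ≈ 0.1
```

Because both volumes are generated from the same segmentation mask, no registration step is needed before subtracting.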

💡 Highlights

  • Conversion of Tabular Data into Text 😊: This method efficiently addresses missing data issues and capitalizes on the capabilities of pre-trained vision-language models to decode clinical information.
  • Advanced Text Fusion Techniques 🧠: We introduce techniques, including a cross-attention module and an affine-transformation fusion unit, to refine the conditioning process in cases where the clinical information does not directly correspond to visual cues in the images.
  • General Implementation for GAN and Diffusion Models 🔄: Our pipeline is adaptable to both GAN-based and diffusion-based generative models.
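As a rough illustration of the affine-transformation fusion unit, here is a FiLM-style sketch in NumPy: the text embedding predicts a per-channel scale and shift applied to the image features. The dimensions and projection matrices are assumptions for illustration, not the repository's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
text_dim, channels = 512, 64

# Hypothetical learned projections mapping the text embedding to
# per-channel scale (gamma) and shift (beta) parameters.
W_gamma = rng.standard_normal((text_dim, channels)) * 0.01
W_beta = rng.standard_normal((text_dim, channels)) * 0.01

def affine_fusion(feat, text_emb):
    """feat: (channels, H, W) image features; text_emb: (text_dim,)."""
    gamma = 1.0 + text_emb @ W_gamma   # scale, initialized near identity
    beta = text_emb @ W_beta           # shift
    return gamma[:, None, None] * feat + beta[:, None, None]

feat = rng.standard_normal((channels, 8, 8))
text_emb = rng.standard_normal(text_dim)
fused = affine_fusion(feat, text_emb)
print(fused.shape)  # (64, 8, 8)
```

Initializing the scale near identity means an uninformative text embedding leaves the image features essentially unchanged, which keeps the conditioning from overwhelming the spatial signal from the segmentation mask.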

🗂️ Code Structure

The code structure is organized as follows:

├── metrics_computation # Scripts for calculating evaluation metrics
├── models # Model definitions
├── options # Configuration and command line options
├── scr_README # README and documentation-related assets
├── utils # Utility functions and scripts 
├── generate_prompts.ipynb # Notebook for generating prompts
├── observe_difference.ipynb # Notebook for observing voxel differences 
├── inference_patch.py # Inference script (patch-level)
├── inference_whole.py # Inference script (whole-level)
└── train.py # Training script

📋 Requirements

Ensure you have the following dependencies installed:

apex==0.9.10dev
dominate==2.9.1
matplotlib==3.8.2
MedCLIP==0.0.3
monai==1.3.0
nibabel==5.2.1
numpy==1.26.4
pandas==2.2.1 
Pillow==10.2.0
pytorch_msssim==1.0.0
scikit_learn==1.4.0
scipy==1.12.0
SimpleITK==2.3.1
tensorflow==2.15.0.post1
torch==2.1.2
torchmetrics==1.3.1
torchvision==0.16.2
tqdm==4.65.0

You can install all dependencies by running:

pip install -r requirements.txt

🔗 Citation

If you find our work interesting and useful, please consider citing:

@article{xing2024deep,
  title={Deep Generative Models Unveil Patterns in Medical Images Through Vision-Language Conditioning},
  author={Xing, Xiaodan and Ning, Junzhi and Nan, Yang and Yang, Guang},
  journal={arXiv preprint arXiv:2410.13823},
  year={2024}
}

📢 License

This project is licensed under the MIT License.

Credits

This repository is based on:

pix2pixHD: High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (code and paper).
