Medical Report Generation And VQA (Adapting XrayGPT to Any Modality)
abdur75648 committed Jun 13, 2024
1 parent 14945a5 commit c01bdfa
Showing 19 changed files with 387 additions and 514 deletions.
Binary file added .DS_Store
Binary file not shown.
6 changes: 5 additions & 1 deletion .gitignore
@@ -137,4 +137,8 @@ venv.bak/
dmypy.json

# Pyre type checker
.pyre/
.pyre/

*.pth
OpenI-5-Samples/
train_logs.txt
37 changes: 0 additions & 37 deletions README-DATASET.md

This file was deleted.

185 changes: 63 additions & 122 deletions README.md
@@ -1,95 +1,70 @@
# XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
![](https://i.imgur.com/waxVImv.png)

[Omkar Thawakar](https://omkarthawakar.github.io/)* , [Abdelrahman Shaker](https://amshaker.github.io/)* , [Sahal Shaji Mullappilly](https://scholar.google.com/citations?user=LJWxVpUAAAAJ&hl=en)* , [Hisham Cholakkal](https://scholar.google.com/citations?hl=en&user=bZ3YBRcAAAAJ), [Rao Muhammad Anwer](https://scholar.google.com/citations?hl=en&authuser=1&user=_KlvMVoAAAAJ), [Salman Khan](https://salman-h-khan.github.io/), [Jorma Laaksonen](https://scholar.google.com/citations?user=qQP6WXIAAAAJ&hl=en), and [Fahad Shahbaz Khan](https://scholar.google.es/citations?user=zvaeYnUAAAAJ&hl=en).

*Equal Contribution

**Mohamed bin Zayed University of Artificial Intelligence, UAE**

<a href='#'><img src='https://img.shields.io/badge/Project-Page-Green'></a> [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://youtu.be/-zzq7bzbUuY)


## :rocket: News
<hr>

+ Jun-14 : Our technical report is released [here](https://arxiv.org/abs/2306.07971). :fire::fire:
+ May-25 : Our technical report will be released very soon. Stay tuned!
+ May-19 : Our code, models, and pre-processed report summaries are released.


## Online Demo
You can try our demo using the provided examples or by uploading your own X-ray here: [Link-1](https://e764abfa8fdc8ad0c8.gradio.live) | [Link-2](https://61adec76d380025b25.gradio.live) | [Link-3](https://c1a70c1631c7cc54cd.gradio.live).

# Medical Report Generation (& VQA) using a VLM (XrayGPT-Based).

## About XrayGPT
<hr>

+ XrayGPT aims to stimulate research around the automated analysis of chest radiographs.
+ The LLM (Vicuna) is fine-tuned on medical data (100k real conversations between patients and doctors) and ~30k radiology conversations to acquire domain-specific and relevant features.
+ We generate interactive and clean summaries (~217k) from free-text radiology reports of two datasets ([MIMIC-CXR](https://physionet.org/content/mimic-cxr-jpg/2.0.0/) and [OpenI](https://openi.nlm.nih.gov/faq#collection)). These summaries serve to enhance the performance of LLMs through fine-tuning the linear transformation layer on high-quality data. For more details regarding our high-quality summaries, please check [Dataset Creation](README-DATASET.md).
+ We align a frozen medical visual encoder (MedCLIP) with a fine-tuned LLM (Vicuna) using a simple linear transformation.
XrayGPT is a state-of-the-art model for chest radiology report generation using large medical vision-language models. Built on top of BLIP-2 and MedCLIP, XrayGPT aligns a frozen visual encoder with a frozen large language model (LLM), Vicuna, using a Q-former. This repository extends XrayGPT for general-purpose medical report generation and Visual Question Answering (VQA).

![overview](images/Overall_architecture_V3.gif)
- [XrayGPT Paper](https://arxiv.org/abs/2306.07971)
- [XrayGPT Repository](https://github.com/mbzuai-oryx/XrayGPT)
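
The alignment idea above is compact: the frozen visual encoder's output tokens are mapped into the LLM's embedding space by a single trainable linear layer, and only that layer is trained during alignment. The sketch below is illustrative only; the class name and feature dimensions are assumptions, not the actual XrayGPT code.

```python
# Minimal sketch of the visual-to-LLM alignment described above.
# Names and dimensions (VisualToLLMProjector, 768, 4096) are illustrative.
import torch
import torch.nn as nn

class VisualToLLMProjector(nn.Module):
    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # The only trainable component: a linear map from frozen encoder
        # features (e.g., MedCLIP / Q-Former outputs) to the LLM hidden size.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_tokens, vision_dim) from the frozen encoder
        return self.proj(visual_feats)  # (batch, num_tokens, llm_dim)

# The projected tokens are prepended to the prompt's text embeddings before
# being fed to the (frozen) Vicuna model; only `proj` receives gradients.
projector = VisualToLLMProjector()
dummy_feats = torch.randn(1, 32, 768)      # e.g., 32 query tokens per image
llm_ready = projector(dummy_feats)         # shape: (1, 32, 4096)
```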

## Using This Repository

## Getting Started
### Installation

**1. Prepare the code and the environment**
Due to inconsistencies and incompatibilities among various libraries in the original codebase, a new environment is created to run the code in a Runpod container. The environment is based on Python 3.10, PyTorch 2.0.0, and CUDA 11.8.

Clone the repository and create an Anaconda environment:
[Runpod Website](https://runpod.io/)

Use the Runpod Template `pytorch:2.1.0-py3.10-cuda11.8.0` and run the following commands to install the required libraries:

```bash
git clone https://github.com/mbzuai-oryx/XrayGPT.git
cd XrayGPT
conda env create -f env.yml
conda activate xraygpt
```
OR
```bash
git clone https://github.com/mbzuai-oryx/XrayGPT.git
cd XrayGPT
conda create -n xraygpt python=3.9
conda activate xraygpt
pip install -r xraygpt_requirements.txt
apt-get update -y && apt-get install zip unzip vim -y
python -m pip install --upgrade pip
pip install gdown
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r hard_requirements.txt --no-deps
pip install pydantic==1.10.7
pip install hyperframe==5.2.0
pip install gradio==3.23.0
pip install safetensors==0.4.3
```
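
As a quick sanity check (a suggestion, not part of the original instructions), you can confirm that the intended PyTorch/CUDA combination is active before moving on:

```python
# Optional environment sanity check for the setup above.
import torch

print("PyTorch version:", torch.__version__)         # expected: 2.0.0
print("CUDA available:", torch.cuda.is_available())  # expected: True on a GPU pod
print("CUDA build:", torch.version.cuda)              # expected: 11.8
```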

### Setup
Below is a brief overview of the steps for fine-tuning the model. Detailed instructions for training from scratch are provided in the original repository.

**1. Prepare the Datasets for training**
#### 1. Prepare the Datasets for Training
Publicly available datasets for medical report generation predominantly focus on chest X-ray reports, often derived from sources like MIMIC-CXR/OpenI. While these datasets are valuable, they lack diversity in terms of medical imaging modalities. To address this limitation and enhance the model's capabilities for multi-modality report generation and Visual Question Answering (VQA), we curated a unique dataset by integrating two distinct datasets: OpenI and ROCO.

Refer to [dataset_creation](README-DATASET.md) for more details.
**OpenI Dataset**: [OpenI](https://openi.nlm.nih.gov/faq) is a well-known resource provided by the Indiana University School of Medicine, comprising chest X-ray images paired with corresponding radiology reports.

- **Kaggle Download:** [Link](https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university)
- **Description:** Radiology reports and chest X-ray images
- **Samples:** 4,000
- **Usage:** Report generation (Chest X-ray)

Download the preprocessed annotations: [mimic](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EZ6500itBIVMnD7sUztdMQMBVWVe7fuF7ta4FV78hpGSwg?e=wyL7Z7) & [openi](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EVYGprPyzdhOjFlQ2aNJbykBj49SwTGBYmC1uJ7TMswaVQ?e=qdqS8U).
The respective image folders contain the images from the dataset.
**ROCO Dataset**: [ROCO](https://github.com/razorx89/roco-dataset) (Radiology Objects in COntext) is a multimodal medical image dataset enriched with descriptive captions, offering a broader spectrum of medical imaging scenarios.

The final dataset folder structure will be as follows:
- **Description:** Multimodal images with detailed descriptive captions
- **Dataset Size:** 8,000 samples (validation split used)
- **Usage:** Enables the model to generalize across various medical imaging modalities beyond chest X-rays.

By combining the OpenI and ROCO datasets and processing them using the OpenAI API, we created a comprehensive dataset suitable for training our model. The data integration resulted in a structured dataset stored in the `dataset` folder, facilitating efficient training and evaluation processes.
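
The report-to-summary preprocessing with the OpenAI API could look roughly like the sketch below; the model name, prompt wording, and function name are assumptions for illustration, since the exact preprocessing script is not part of this README.

```python
# Hedged sketch of turning a free-text radiology report into a concise summary
# with the OpenAI API. The model choice and prompt are assumptions; adapt them
# to the actual preprocessing used for the curated dataset.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def summarize_report(report_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "Summarize this radiology report into a clean, "
                        "concise summary of the key findings."},
            {"role": "user", "content": report_text},
        ],
    )
    return response.choices[0].message.content

# Example:
# print(summarize_report("The cardiac silhouette is mildly enlarged. "
#                        "No focal consolidation or pleural effusion."))
```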

The final structure of the dataset folder is as follows:

```
dataset
├── mimic
| ├── image
| | ├──abea5eb9-b7c32823-3a14c5ca-77868030-69c83139.jpg
| | ├──427446c1-881f5cce-85191ce1-91a58ba9-0a57d3f5.jpg
| | .....
| ├──filter_cap.json
├── openi
| ├── image
| | ├──1.jpg
| | ├──2.jpg
| | .....
| ├──filter_cap.json
...
├── image
| ├──1.jpg
| ├──2.jpg
| ├──3.jpg
| .....
├── filter_cap.json
```
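
For reference, `filter_cap.json` in MiniGPT-4-style alignment datasets stores image-caption pairs under an `annotations` key. A hedged sketch of assembling it for the merged data is shown below; verify the exact schema against the dataloaders in `xraygpt/datasets` before relying on it.

```python
# Hedged sketch of building filter_cap.json for a merged OpenI + ROCO split.
# The {"annotations": [{"image_id": ..., "caption": ...}]} schema is assumed
# from MiniGPT-4-style datasets; check this repository's dataloaders.
import json
from pathlib import Path

def build_filter_cap(pairs, out_dir: str) -> None:
    """pairs: iterable of (image_id, caption); images live in out_dir/image/."""
    annotations = [{"image_id": image_id, "caption": caption}
                   for image_id, caption in pairs]
    out_dir_path = Path(out_dir)
    out_dir_path.mkdir(parents=True, exist_ok=True)
    (out_dir_path / "filter_cap.json").write_text(
        json.dumps({"annotations": annotations}, indent=2))

# Example with two hypothetical samples:
build_filter_cap(
    [("1", "Chest X-ray showing clear lungs."),
     ("2", "Axial CT slice with a small hepatic lesion.")],
    out_dir="dataset")
```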

**3. Prepare the pretrained Vicuna weights**
#### 2. Prepare the Pretrained Vicuna Weights

We built XrayGPT on the v1 version of Vicuna-7B.
We finetuned Vicuna using curated radiology report samples.
Download the Vicuna weights from [vicuna_weights](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EWoMYn3x7sdEnM2CdJRwWZgBCkMpLM03bk4GR5W0b3KIQQ?e=q6hEBz)
The final weights will be in a single folder with a structure similar to the following:
Download the finetuned version of `Vicuna-7B` from the [original XrayGPT link](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EWoMYn3x7sdEnM2CdJRwWZgBCkMpLM03bk4GR5W0b3KIQQ?e=q6hEBz). The final weights should be in a single folder with a structure similar to the following:

```
vicuna_weights
@@ -100,76 +75,42 @@ vicuna_weights
...
```

Then, set the path to the vicuna weight in the model config file "xraygpt/configs/models/xraygpt.yaml" at Line 16.
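
As a convenience (not part of the original workflow), you can confirm the configured path by loading the YAML and printing the relevant field; the `model.llama_model` key name below is an assumption based on MiniGPT-4-style configs, so check Line 16 of the file for the actual field.

```python
# Hedged check of the Vicuna weights path in the model config.
# The "model" / "llama_model" keys are assumed; verify against the file itself.
import yaml

with open("xraygpt/configs/models/xraygpt.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg.get("model", {}).get("llama_model", "<not set>"))
```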

To finetune Vicuna on radiology samples, please download our curated [radiology](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EXsChX3eN_lJgcrV2fLUU0QBQalFkDtp-mlHNixta_hc4w) and [medical_healthcare](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/Ecm7-uxj045DhHqZTSBsZi4B2Ld77tE-uB7SvvmLNmCW1Q?e=t5YLgi) conversational samples and refer to the original Vicuna repository for fine-tuning: [Vicuna_Finetune](https://github.com/lm-sys/FastChat#fine-tuning).
#### 3. Download the MiniGPT-4 Checkpoint

**4. Download the pretrained Minigpt-4 checkpoint**
Download the Minigpt-4 checkpoint from the [trained XrayGPT model](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EbGJZmueJkFAstU965buWs8B7T8tLcks7N-P79gsExRH0Q?e=mVASdV).

Download the pretrained minigpt-4 checkpoints. [ckpt](https://drive.google.com/file/d/1RY9jV0dyqLX-o38LrumkKRh6Jtaop58R/view?pli=1)
### Model Training
Here we fine-tuned a pretrained XrayGPT model on the dataset created above. The model was initially trained on the MIMIC and OpenI datasets in a two-stage training process.


## 5. Training of XrayGPT

**A. First mimic pretraining stage**

In the first pretraining stage, the model is trained using image-text pairs from the preprocessed MIMIC dataset.

To launch the first stage training, run the following command. In our experiments, we use 4 AMD MI250X GPUs.
Run the following command:

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/xraygpt_mimic_pretrain.yaml
python3 train.py --cfg-path train_configs/xraygpt_openi_finetune.yaml
```

**2. Second openi finetuning stage**
### Launching the Demo

In the second stage, we use a small, high-quality image-text OpenI dataset that we preprocessed.
Download the pretrained XrayGPT checkpoints from the [link](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EbGJZmueJkFAstU965buWs8B7T8tLcks7N-P79gsExRH0Q?e=mVASdV) and add this checkpoint in `eval_configs/xraygpt_eval.yaml`.

Run the following command. In our experiments, we use an AMD MI250X GPU.
Run the following command to launch the demo:

```bash
torchrun --nproc-per-node NUM_GPU train.py --cfg-path train_configs/xraygpt_openi_finetune.yaml
```

### Launching the Demo on a Local Machine


Download the pretrained xraygpt checkpoints. [link](https://mbzuaiac-my.sharepoint.com/:u:/g/personal/omkar_thawakar_mbzuai_ac_ae/EbGJZmueJkFAstU965buWs8B7T8tLcks7N-P79gsExRH0Q?e=mVASdV)

Add this ckpt in "eval_configs/xraygpt_eval.yaml".

Try the Gradio [demo.py](demo.py) on your local machine with the following command:

```
python demo.py --cfg-path eval_configs/xraygpt_eval.yaml --gpu-id 0
```
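
Beyond the Gradio UI, visual question answering can also be scripted against the same checkpoint. The sketch below only mirrors the structure of MiniGPT-4-style demo code; the module path, class names, and method signatures (`Chat`, `CONV_VISION`, `upload_img`, `ask`, `answer`) are assumptions and should be checked against [demo.py](demo.py) before use.

```python
# Hedged sketch of scripted VQA on a single image, modeled on MiniGPT-4-style
# demos. Module paths, class names, and signatures are assumptions; confirm
# them against demo.py in this repository.
from xraygpt.conversation.conversation import Chat, CONV_VISION  # assumed path

def ask_about_image(chat: Chat, image_path: str, question: str) -> str:
    chat_state = CONV_VISION.copy()   # fresh conversation template
    img_list = []
    chat.upload_img(image_path, chat_state, img_list)  # encode the image
    chat.ask(question, chat_state)                     # append the question
    answer = chat.answer(conv=chat_state, img_list=img_list,
                         max_new_tokens=300)[0]
    return answer

# Usage (assuming `chat` is constructed the same way demo.py builds it):
# print(ask_about_image(chat, "dataset/openi/image/1.jpg",
#                       "Is there any evidence of cardiomegaly?"))
```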

## Examples
| | |
:-------------------------:|:-------------------------:
![example 1](images/image1.jpg) | ![example 2](images/image2.jpg)
![example 3](images/image3.jpg) | ![example 4](images/image4.jpg)
### Important Note

Due to computational constraints, the model was fine-tuned on a very small subset of the newly curated dataset (only 100 samples total). The model is not fully trained, and the results are not accurate. The purpose was to set up the environment and run the code. For better results, please train the model on the complete datasets.

## Acknowledgement
<hr>
## Citation

+ [MiniGPT-4](https://minigpt-4.github.io) Enhancing Vision-language Understanding with Advanced Large Language Models. We built our model on top of MiniGPT-4.
+ [MedCLIP](https://github.com/RyanWangZf/MedCLIP) Contrastive Learning from Unpaired Medical Images and Texts. We used the medical-aware image encoder from MedCLIP.
+ [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2) The model architecture of XrayGPT follows BLIP-2.
+ [Lavis](https://github.com/salesforce/LAVIS) This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat) The fantastic language ability of Vicuna is just amazing. And it is open-source!
If you use this work, please cite the following paper:

## Citation
If you're using XrayGPT in your research or applications, please cite using this BibTeX:
```bibtex
@article{Omkar2023XrayGPT,
title={XrayGPT: Chest Radiographs Summarization using Large Medical Vision-Language Models},
author={Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen and Fahad Shahbaz Khan},
journal={arXiv: 2306.07971},
year={2023}
}
```

## License
This repository is licensed under CC BY-NC-SA. Please refer to the license terms [here](https://creativecommons.org/licenses/by-nc-sa/4.0/).
Binary file added Screenshot WebApp.png
Binary file added Thumbs.db
Binary file not shown.
