Awesome-Multimodal-Applications-In-Medical-Imaging

This repository includes resources on several applications of multi-modal learning in medical imaging.

Overview

Survey

  • [arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
  • [arXiv 2022] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf]
  • [arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]

Medical Report Generation

  • [NeurIPS 2021 Datasets and Benchmarks Track (Round 2)] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
  • [arXiv 2020] Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [pdf]
  • [CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
  • [EMNLP 2018] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
  • [ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
  • [ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
  • [NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]
  • [CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
  • [MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
  • [NAACL-HLT 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
  • [MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
  • [arXiv 2022] Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation [pdf]
  • [EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
  • [ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]
  • [MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
  • [MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
  • [MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
  • [Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
  • [MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
  • [ACL-IJCNLP 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]
  • [MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
  • [MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
  • [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
  • [arXiv 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
  • [IEEE TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
  • [Medical Image Analysis 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
  • [arXiv 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
  • [arXiv 2022] Self adaptive global-local feature enhancement for radiology report generation [pdf]
  • [arXiv 2022] On the Importance of Image Encoding in Automated Chest X-Ray Report Generation [pdf] [code]
  • [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
  • [arXiv 2022] DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis [pdf] [code]
  • [arXiv 2023] Unified Chest X-ray and Radiology Report Generation Model with Multi-view Chest X-rays [pdf] [code]
  • [ECCV 2022] Cross-modal Prototype Driven Network for Radiology Report Generation [pdf] [code]
  • [WWW 2023] Auxiliary signal-guided knowledge encoder-decoder for medical report generation [pdf]
  • [CVPR 2023] Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [pdf] [code]

Medical Visual Question Answering

  • [arXiv 2021] MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [pdf]
  • [MICCAI 2022] Consistency-preserving Visual Question Answering in Medical Imaging [pdf] [code]
  • [TMI 2020] A Question-Centric Model for Visual Question Answering in Medical Imaging [pdf] [code]
  • [arXiv 2021] Medical Visual Question Answering: A Survey [pdf]
  • [arXiv 2022] Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer [pdf] [code]
  • [CLEF 2020 Working Notes] HCP-MIC at VQA-Med 2020: Effective visual representation for medical visual question answering [pdf] [code]
  • [CLEF 2021 Working Notes] TeamS at VQA-Med 2021: BBN-Orchestra for long-tailed medical visual question answering [pdf] [code]
  • [Nature Scientific Reports 2021] MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain [pdf]
  • [ECCV 2022] Distilled Dual-Encoder Model for Vision-Language Understanding [pdf] [code]
  • [arXiv 2022] A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering [pdf] [code]
  • [arXiv 2022] MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering [pdf]
  • [arXiv 2022] Self-supervised vision-language pretraining for Medical visual question answering [pdf] [code]
  • [arXiv 2022] UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering [pdf]
  • [arXiv 2023] Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [pdf]
  • [arXiv 2023] Medical visual question answering using joint self-supervised learning [pdf]
  • [arXiv 2023] RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [pdf] [code]

Vision-Language Prompt

  • [arXiv 2022] Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video [pdf]
  • [CVPR 2022] LAVT: Language-Aware Vision Transformer for Referring Image Segmentation [pdf] [code]
  • [CVPR 2022] Image Segmentation Using Text and Image Prompts [pdf] [code]
  • [IJCV 2022] Learning to Prompt for Vision-Language Models [pdf] [code]
  • [CVPR 2022] Conditional Prompt Learning for Vision-Language Models [pdf] [code]
  • [arXiv 2022] Neural Prompt Search [pdf] [code]
  • [CVPR 2022] Prompt-RSVQA: Prompting Visual Context to a Language Model for Remote Sensing Visual Question Answering [pdf]
  • [arXiv 2022] Prompt-to-Prompt Image Editing with Cross Attention Control [pdf]
  • [arXiv 2022] Prompt Tuning for Generative Multimodal Pretrained Models [pdf] [code]
  • [arXiv 2022] P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting [pdf] [code]
  • [arXiv 2022] MILAN: Masked Image Pretraining on Language Assisted Representation [pdf] [code]
  • [arXiv 2022] Class-Aware Visual Prompt Tuning for Vision-Language Pre-Trained Model [pdf]
  • [arXiv 2022] Prompt Vision Transformer for Domain Generalization [pdf]
  • [arXiv 2022] Prompt-Matched Semantic Segmentation [pdf]
  • [ECCV 2022] Learning from Unlabeled 3D Environments for Vision-and-Language Navigation [pdf] [code]
  • [arXiv 2022] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task [pdf] [code]
  • [arXiv 2022] Prompt Tuning with Soft Context Sharing for Vision-Language Models [pdf]
  • [ICLR 2023] Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [pdf] [code]
  • [NeurIPS 2022] Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models [pdf] [code]
  • [NeurIPS 2022] M^4I: Multi-modal Models Membership Inference [pdf] [code]
  • [IJCAI 2022] Declaration-based Prompt Tuning for Visual Question Answering [pdf] [code]
  • [MICCAI 2022] Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training [pdf] [code]
  • [NeurIPS 2022] GLIPv2: Unifying Localization and VL Understanding [pdf] [code]
  • [arXiv 2022] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf]
  • [arXiv 2022] Language-Aware Soft Prompting for Vision & Language Foundation Models [pdf]
  • [ICLR 2023] LPT: Long-tailed Prompt Tuning for Image Classification [pdf]
  • [arXiv 2022] Prompt Learning with Optimal Transport for Vision-Language Models [pdf]
  • [arXiv 2022] Variational prompt tuning improves generalization of vision-language models [pdf]
  • [arXiv 2022] MaPLe: Multi-modal Prompt Learning [pdf] [code]
  • [arXiv 2022] Learning to Decompose Visual Features with Latent Textual Prompts [pdf]
  • [arXiv 2022] Visual Prompting for Adversarial Robustness [pdf]
  • [arXiv 2022] Unified Vision and Language Prompt Learning [pdf] [code]
  • [arXiv 2022] CPL: Counterfactual Prompt Learning for Vision and Language Models [pdf]
  • [arXiv 2022] Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models [pdf]
  • [arXiv 2022] Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models [pdf] [code]
  • [arXiv 2022] Multitask Vision-Language Prompt Tuning [pdf] [code]
  • [arXiv 2022] ProSFDA: Prompt Learning based Source-free Domain Adaptation for Medical Image Segmentation [pdf] [code]
  • [arXiv 2022] PromptCap: Prompt-Guided Task-Aware Image Captioning [pdf]
  • [arXiv 2022] Prompt Tuning for Parameter-efficient Medical Image Segmentation [pdf] [code]
  • [arXiv 2022] Task Residual for Tuning Vision-Language Models [pdf] [code]
  • [arXiv 2022] Texts as Images in Prompt Tuning for Multi-Label Image Recognition [pdf] [code]
  • [AAAI 2023] Controllable Image Captioning via Prompting [pdf]
  • [arXiv 2022] See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning [pdf]
  • [arXiv 2022] From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models [pdf]
  • [CVPR 2023] Position-guided Text Prompt for Vision-Language Pre-training [pdf] [code]
  • [arXiv 2022] Doubly Right Object Recognition: A Why Prompt for Visual Rationales [pdf]
  • [arXiv 2022] PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery [pdf]
  • [AAAI 2023] Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation [pdf]
  • [NeurIPS 2022] DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations [pdf] [code]
  • [ICLR 2023] Visual Classification via Description from Large Language Models [pdf]
  • [CVPR 2022] Prompt Distribution Learning [pdf]
  • [arXiv 2023] StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization [pdf]
  • [ICLR 2023] Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection [pdf] [code]
  • [arXiv 2022] Prompt Vision Transformer for Domain Generalization [pdf] [code]
  • [CVPR 2023] Multimodal Prompting with Missing Modalities for Visual Recognition [pdf] [code]
  • [CVPR 2023] Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering [pdf] [code]
  • [CVPR 2023] Turning a CLIP Model into a Scene Text Detector [pdf] [code]

Medical Vision-Language Model

  • [ICLR 2023] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf] [code]
  • [EMNLP 2022] MedCLIP: Contrastive Learning from Unpaired Medical Images and Text [pdf] [code]
  • [arXiv 2023] CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection [pdf] [code]
  • [arXiv 2023] Towards General Purpose Medical AI: Continual Learning Medical Foundation Model [pdf]
  • [NeurIPS Workshop 2022] Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [pdf]
  • [ACL 2022] ViLMedic: a framework for research at the intersection of vision and language in medical AI [pdf] [code]
  • [MICCAI 2022] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training [pdf] [code]
  • [IEEE Journal of Biomedical and Health Informatics 2022] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training [pdf] [code]
  • [AAAI 2022] Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation [pdf]
  • [arXiv 2022] LViT: Language meets Vision Transformer in Medical Image Segmentation [pdf] [code]
  • [arXiv 2023] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [pdf] [code]
  • [IEEE Journal of Biomedical and Health Informatics 2022] Vision-language transformer for interpretable pathology visual question answering [link]
  • [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
  • [ECCV 2022] Making the most of text semantics to improve biomedical vision–language processing [pdf]
  • [arXiv 2023] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing [pdf] [code]
  • [MICCAI 2022] BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis [pdf]
  • [ICLR 2023] Advancing Radiograph Representation Learning with Masked Record Modeling [pdf] [code]
  • [arXiv 2023] ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax [pdf]
  • [arXiv 2023] PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents [pdf]
  • [arXiv 2023] Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models [pdf]
  • [arXiv 2023] ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [pdf]
  • [arXiv 2023] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training [pdf] [project]
  • [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
  • [CVPR 2023] Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [pdf]
  • [NeurIPS 2022] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning [pdf] [code]