This section summarizes representative state-of-the-art (SOTA) algorithms for several Visual-Linguistic Reasoning tasks, such as visual question answering (VQA) and medical report generation. All algorithms are implemented in PyTorch. The CausalVLR library will be continuously updated in the coming years.
- v0.0.2 was released on 8/19/2023
  - Support CaCo-CoT for Faithful Reasoning task in LLMs
- v0.0.1 was released on 6/30/2023
  - Support VLCI for Medical Report Generation task
  - Support CAMDA (T-PAMI 2023) for Causality-Aware Medical Diagnosis task
  - Support CMCIR (T-PAMI 2023) for Event-Level Visual Question Answering task
  - Support VCSR (ACM MM 2023) for Video Question Answering task
  - Support Robust Fine-tuning (CVPR 2023) for Model Generalization and Robustness

Task | Model | Benchmarks |
---|---|---|
Medical Report Generation | VLCI | IU-Xray, MIMIC-CXR |
Event-Level VQA | CMCIR | SUTD-TrafficQA, TGIF-QA, MSVD-QA, MSRVTT-QA |
Visual Causal Scene Discovery | VCSR | NExT-QA, Causal-VidQA, MSRVTT-QA |
Model Generalization and Robustness | Robust Fine-tuning | ImageNet-V2, ImageNet-R, ImageNet-Sketch, ObjectNet, ImageNet-A |
Causality-Aware Medical Diagnosis | CAMDA | MuZhi, DingXiang |
Faithful Reasoning in LLMs | CaCo-CoT | ScienceQA, Com2Sense, BoolQ |