Papers, Datasets, and Code about Multimodality
- VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts. Wenhui Wang, Hangbo Bao, Li Dong, Furu Wei. [pdf]
- Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts. Yan Zeng, Xinsong Zhang, Hang Li. [pdf]
- Masked Autoencoders Are Scalable Vision Learners. Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick. [pdf]
- Multi-Modal Open-Domain Dialogue. Kurt Shuster, Eric Michael Smith, Da Ju, Jason Weston. [pdf]
- Multimodal Dialogue Response Generation. Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang. [pdf]
- Reason first, then respond: Modular Generation for Knowledge-infused Dialogue. Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston. [pdf]
- CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models. Yuan Yao, Ao Zhang, Zhengyan Zhang, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun. [pdf]
- Multimodal Few-Shot Learning with Frozen Language Models. Maria Tsimpoukelli, Jacob Menick, Serkan Cabi, S. M. Ali Eslami, Oriol Vinyals, Felix Hill. [pdf]