- AVA ActiveSpeaker Dataset
- MVVA-Database (originally built for salient-face work, but its talking-face annotations also make it usable for ASD)
- Talkies
- Columbia
audio-visual-active-speaker-detection-on-ava
- ASD-Transformer: Efficient Active Speaker Detection Using Self and Multimodal Transformers
- End-to-End Active Speaker Detection
- Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement
- Extended UniCon
- UniCon: Unified Context Network for Robust Active Speaker Detection
- Learning Spatial-Temporal Graphs for Active Speaker Detection
- Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection (ACM MM 2021; code available)
- How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild (2021; code available)
- MAAS: Multi-modal Assignation for Active Speaker Detection (IEEE/CVF International Conference on Computer Vision (ICCV) 2021; code available)
- Multi-Task Learning for Audio-Visual Active Speaker Detection
- Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion