[NeurIPS 2023] Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

ldkong1205/TranSVAE



Unsupervised Video Domain Adaptation for Action Recognition:
A Disentanglement Perspective

Pengfei Wei¹, Lingdong Kong¹,², Xinghua Qu¹, Yi Ren¹, Zhiqiang Xu³, Jing Jiang⁴, Xiang Yin¹
¹ByteDance AI Lab, ²National University of Singapore, ³MBZUAI, ⁴University of Technology Sydney

NeurIPS 2023

About

TranSVAE is a disentanglement framework for unsupervised video domain adaptation. It aims to disentangle domain information from the data during adaptation. We model cross-domain videos as generated from two sets of latent factors: one encoding static, domain-related information and the other encoding temporal, semantic-related information. Objectives are imposed on these latent factors to achieve domain disentanglement and transfer.
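The two sets of latent factors described above can be pictured with a small data-structure sketch. This is not the TranSVAE implementation: the class name `VideoLatents`, the helper `sample_latents`, the dimensions, and the standard-normal sampling are all hypothetical, chosen only to show one static vector shared by every frame alongside one dynamic vector per frame.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class VideoLatents:
    """Hypothetical container for the two sets of latent factors."""
    z_d: List[float]        # static, domain-related factor (one per video)
    z_t: List[List[float]]  # dynamic, semantic-related factors (one per frame)

    def decoder_inputs(self) -> List[List[float]]:
        # Each frame is generated from the shared static factor
        # concatenated with that frame's own dynamic factor.
        return [self.z_d + frame for frame in self.z_t]


def sample_latents(num_frames: int, static_dim: int, dynamic_dim: int,
                   rng: random.Random) -> VideoLatents:
    """Draw placeholder latents from a standard normal (illustration only)."""
    z_d = [rng.gauss(0.0, 1.0) for _ in range(static_dim)]
    z_t = [[rng.gauss(0.0, 1.0) for _ in range(dynamic_dim)]
           for _ in range(num_frames)]
    return VideoLatents(z_d=z_d, z_t=z_t)


latents = sample_latents(num_frames=8, static_dim=4, dynamic_dim=6,
                         rng=random.Random(0))
frames = latents.decoder_inputs()
print(len(frames), len(frames[0]))  # 8 frames, each of size 4 + 6 = 10
```

Keeping `z_d` as a single vector per video, rather than one per frame, is what makes it a natural place for information that does not change over time, such as appearance or domain style.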



Col1: Original sequences ("Human" $\mathcal{D}=\mathbf{P}_1$ and "Alien" $\mathcal{D}=\mathbf{P}_2$); Col2: Sequence reconstructions; Col3: Reconstructed sequences using $z_1^{\mathcal{D}},...,z_T^{\mathcal{D}}$; Col4: Domain transferred sequences with exchanged $z_d^{\mathcal{D}}$.


Visit our project page to explore more details. 🐾

Updates

  • [2023.10] - We provide our extracted I3D features; please refer to this page for more details.
  • [2023.09] - TranSVAE was accepted to NeurIPS 2023! 🎉
  • [2022.08] - TranSVAE achieves 1st place on the UDA leaderboards of UCF-HMDB, Jester, and Epic-Kitchens on Papers with Code.
  • [2022.08] - Try a Gradio demo for domain disentanglement in TranSVAE at Hugging Face Spaces! 🤗
  • [2022.08] - Our paper is available on arXiv; click here to check it out!

Outline

Highlights

Conceptual Comparison
Graphical Model
Framework Overview

Installation

Please refer to INSTALL.md for the installation details.

Data Preparation

Please refer to DATA_PREPARE.md for details on preparing the UCF101, HMDB51, Jester, Epic-Kitchens, and Sprites datasets.

Getting Started

Please refer to GET_STARTED.md to learn more about using this codebase.

Main Results

UCF101 - HMDB51

| Method | Backbone | U101 → H51 | H51 → U101 | Average |
| :-- | :-: | :-: | :-: | :-: |
| DANN (JMLR'16) | ResNet-101 | 75.28 | 76.36 | 75.82 |
| JAN (ICML'17) | ResNet-101 | 74.72 | 76.69 | 75.71 |
| AdaBN (PR'18) | ResNet-101 | 72.22 | 77.41 | 74.82 |
| MCD (CVPR'18) | ResNet-101 | 73.89 | 79.34 | 76.62 |
| TA3N (ICCV'19) | ResNet-101 | 78.33 | 81.79 | 80.06 |
| ABG (MM'20) | ResNet-101 | 79.17 | 85.11 | 82.14 |
| TCoN (AAAI'20) | ResNet-101 | 87.22 | 89.14 | 88.18 |
| MA2L-TD (WACV'22) | ResNet-101 | 85.00 | 86.59 | 85.80 |
| Source-only | I3D | 80.27 | 88.79 | 84.53 |
| DANN (JMLR'16) | I3D | 80.83 | 88.09 | 84.46 |
| ADDA (CVPR'17) | I3D | 79.17 | 88.44 | 83.81 |
| TA3N (ICCV'19) | I3D | 81.38 | 90.54 | 85.96 |
| SAVA (ECCV'20) | I3D | 82.22 | 91.24 | 86.73 |
| CoMix (NeurIPS'21) | I3D | 86.66 | 93.87 | 90.22 |
| CO2A (WACV'22) | I3D | 87.78 | 95.79 | 91.79 |
| TranSVAE (Ours) | I3D | 87.78 | 98.95 | 93.37 |
| Oracle | I3D | 95.00 | 96.85 | 95.93 |
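As a quick sanity check on the Average column, the sketch below recomputes it for three of the I3D rows. It assumes the column is the arithmetic mean of the two transfer directions, rounded half-up to two decimals; that rounding rule is an assumption rather than something stated in this README, and the helper name `average` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def average(u2h: str, h2u: str) -> Decimal:
    """Mean of the two directions, rounded half-up to two decimals."""
    mean = (Decimal(u2h) + Decimal(h2u)) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Accuracies copied from the table: (U101 -> H51, H51 -> U101).
rows = {
    "Source-only (I3D)": ("80.27", "88.79"),
    "TranSVAE (I3D)": ("87.78", "98.95"),
    "Oracle (I3D)": ("95.00", "96.85"),
}

averages = {name: average(*accs) for name, accs in rows.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

`Decimal` is used instead of `float` so that values ending in exactly 5 at the third decimal, such as 93.365, round the same way as in the table.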

Jester

| Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| JS → JT | 51.5 | 55.4 | 52.3 | 55.5 | 64.7 | 66.1 | 95.6 |

Epic-Kitchens

| Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| D1 → D2 | 32.8 | 37.7 | 35.4 | 34.2 | 42.9 | 50.5 | 64.0 |
| D1 → D3 | 34.1 | 36.6 | 34.9 | 37.4 | 40.9 | 50.3 | 63.7 |
| D2 → D1 | 35.4 | 38.3 | 36.3 | 40.9 | 38.6 | 50.3 | 57.0 |
| D2 → D3 | 39.1 | 41.9 | 40.8 | 42.8 | 45.2 | 58.6 | 63.7 |
| D3 → D1 | 34.6 | 38.8 | 36.1 | 39.9 | 42.3 | 48.0 | 57.0 |
| D3 → D2 | 35.8 | 42.1 | 41.4 | 44.2 | 49.2 | 58.0 | 64.0 |
| Average | 35.3 | 39.2 | 37.4 | 39.9 | 43.2 | 52.6 | 61.5 |
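To read the Epic-Kitchens numbers per task, the following sketch computes TranSVAE's gain over CoMix and its remaining gap to the Oracle, using only values from the table above. The variable names are hypothetical, and the task keys use ASCII arrows purely for convenience.

```python
from decimal import Decimal

tasks = ["D1->D2", "D1->D3", "D2->D1", "D2->D3", "D3->D1", "D3->D2"]

# Per-task accuracies copied from the table, in the order of `tasks`.
comix = ["42.9", "40.9", "38.6", "45.2", "42.3", "49.2"]
transvae = ["50.5", "50.3", "50.3", "58.6", "48.0", "58.0"]
oracle = ["64.0", "63.7", "57.0", "63.7", "57.0", "64.0"]

# Gain of TranSVAE over CoMix, and the gap still left to the Oracle.
gain = {t: Decimal(a) - Decimal(b) for t, a, b in zip(tasks, transvae, comix)}
gap = {t: Decimal(o) - Decimal(a) for t, o, a in zip(tasks, oracle, transvae)}

for t in tasks:
    print(f"{t}: gain over CoMix = {gain[t]}, gap to Oracle = {gap[t]}")

print("largest gain:", max(gain, key=gain.get))
print("smallest gain:", min(gain, key=gain.get))
```

On these numbers, the gain over CoMix ranges from 5.7 points (D3 → D1) to 13.4 points (D2 → D3).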

Ablation Study

UCF101 → HMDB51

HMDB51 → UCF101

Domain Transfer Example

| Source (Original) | Target (Original) | Source (Original) | Target (Original) |
| :-: | :-: | :-: | :-: |
| src_original | tar_original | src_original | tar_original |
| Reconstruct ($\mathbf{z}_d^{\mathcal{S}}$ + $\mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}}$ + $\mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}}$ + $\mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}}$ + $\mathbf{z}_t^{\mathcal{T}}$) |
| src_recon | tar_recon | src_recon | tar_recon |
| Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$) |
| recon_srcZf | recon_tarZf | recon_srcZf | recon_tarZf |
| Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$) |
| recon_srcZt | recon_tarZt | recon_srcZt | recon_tarZt |
| Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$) |
| recon_srcZf_tarZt | recon_tarZf_srcZt | recon_srcZf_tarZt | recon_tarZf_srcZt |
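The rows above differ only in which latent factors are passed to the decoder. The sketch below makes that explicit with toy vectors; it is an illustration under assumptions, not the repository's code. The helper `decoder_inputs`, the toy values, and the tiny dimensions are hypothetical, the trained decoder itself is omitted, and the dictionary keys simply reuse the image names from the table.

```python
from typing import List

Vector = List[float]


def zeros(n: int) -> Vector:
    """Zero vector used to blank out one set of factors."""
    return [0.0] * n


def decoder_inputs(z_d: Vector, z_t: List[Vector]) -> List[Vector]:
    """Per-frame decoder input: static factor concatenated with each dynamic factor."""
    return [z_d + frame for frame in z_t]


# Toy latents: a 2-d static factor and three 1-d dynamic factors per video.
src_zd, src_zt = [1.0, 1.0], [[0.1], [0.2], [0.3]]
tar_zd, tar_zt = [-1.0, -1.0], [[0.7], [0.8], [0.9]]

rows = {
    "src_recon": decoder_inputs(src_zd, src_zt),                         # z_d^S + z_t^S
    "recon_srcZf": decoder_inputs(src_zd, [zeros(1) for _ in src_zt]),  # z_d^S + 0
    "recon_srcZt": decoder_inputs(zeros(2), src_zt),                    # 0 + z_t^S
    "recon_srcZf_tarZt": decoder_inputs(src_zd, tar_zt),                # z_d^S + z_t^T
    "recon_tarZf_srcZt": decoder_inputs(tar_zd, src_zt),                # z_d^T + z_t^S
}

for name, inputs in rows.items():
    print(name, inputs)
```

In the last two rows the static factor comes from one domain and the dynamic factors from the other, which is the exchange that produces the domain-transferred sequences.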

TODO List

  • Initial release. 🚀
  • Add license. See here for more details.
  • Add demo at Hugging Face Spaces.
  • Add installation details.
  • Add data preparation details.
  • Add evaluation details.
  • Add training details.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Acknowledgement

We acknowledge the use of the following public resources during the course of this work: UCF101, HMDB51, Jester, Epic-Kitchens, Sprites, I3D, and TRN.

Citation

If you find this work helpful, please kindly consider citing our paper:

@inproceedings{wei2023transvae,
  title = {Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective},
  author = {Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Ren, Yi and Xu, Zhiqiang and Jiang, Jing and Yin, Xiang},
  booktitle = {Advances in Neural Information Processing Systems}, 
  year = {2023},
}
