Skip to content

key-cc/master_thesis

Repository files navigation

Master Thesis: Action Recognition in Video

This repo will serve as a summary of the code in my master thesis. The video action recognition model includes ConvLSTM, P3D, ARTNet, Res3D, Res21D. I will mainly use the UCF-101 dataset, HMDB51 dataset and HockeyFight dataset.

master_thesis
├── dataset.py
├── log.py
├── models.py
├── test.py
├── train.py
├── test_on_video.py
├── code_HockeyFight
├── code_UCF101_HMDB51

Dataset UCF101

@article{Soomro2012UCF101AD,
  title={UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild},
  author={K. Soomro and A. Zamir and M. Shah},
  journal={ArXiv},
  year={2012},
  volume={abs/1212.0402}
}

Dataset HMDB51

@article{Kuehne2011HMDBAL,
  title={HMDB: A large video database for human motion recognition},
  author={Hilde Kuehne and Hueihan Jhuang and E. Garrote and T. Poggio and Thomas Serre},
  journal={2011 International Conference on Computer Vision},
  year={2011},
  pages={2556-2563}
}

Dataset HockeyFights

@inproceedings{Nievas2011ViolenceDI,
  title={Violence Detection in Video Using Computer Vision Techniques},
  author={Enrique Bermejo Nievas and Oscar D{\'e}niz-Su{\'a}rez and Gloria Bueno Garc{\'i}a and Rahul Sukthankar},
  booktitle={CAIP},
  year={2011}
}

Setup

cd data/              
bash download_ucf101.sh     # Downloads the UCF-101 dataset (~7.2 GB)
unrar x UCF101.rar          # Unrars dataset
unzip UCF101TrainTestSplits-RecognitionTask.zip  # Unzip train / test split
python3 extract_frames.py   # Extracts frames from the video (~26.2 GB, go grab a coffee for this)

Test on Video

$ python3 test_on_video.py  --video_path data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi \
                            --checkpoint_model model_checkpoints/ConvLSTM_150.pth

Results

UCF101

Model Parameters acc
C3D (pretrained) 78.00M 77.58
biLSTM + Attention (with dropout) 74M 74.46
VTN (pretrained) 25.54M 86.09
Divided Space-Time Attention (T+S) 121.34M 93.11
Joint Space-Time Attention (ST) 85.88M 91.83
Space Attention Attention (S) 85.88M 91.36
Swin-T 49.59M 92.60

An example of action recognition on UCF101

HMDB51

Model Parameters acc
C3D (pretrained) 78.20M 67.60
biLSTM + Attention 73.95M 62.46
VTN (pretrained) 28.67M 60.25
Divided Space-Time Attention (T+S) 121.3M 66.08
Joint Space-Time Attention (ST) 85.84M 64.25
Space Attention Attention (S) 85.84M 65.49
Swin-T 49.55M 66.25

An example of action recognition on HMDB51

HockeyFights

Model Parameters acc
C3D 78M 93.50
biLSTM + Attention 26.41M 95.50
ARTNet 20.15M 98.00
Divided Space-Time Attention (T+S) 121.27M 93.50
Joint Space-Time Attention (ST) 85.81M 92.00
Space Attention Attention (S) 85.80M 91.50

An example of action recognition on HockeyFights

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published