- The final directory structure should look like:
kinetics400_30fps_frames/
├── train/
│ ├── abseiling/
│ │ ├──0347ZoDXyP0_000095_000105
│ │ │ ├──frame_00001.jpg
│ │ │ ├──...
│ │ ├──...
│ ├──...
├── val/
│ ├── abseiling/
│ │ ├──0wR5jVB-WPk
│ │ │ ├──frame_00001.jpg
│ │ │ ├──...
│ │ ├──...
│ ├──...
├── train.txt
├── val.txt
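
The layout above can be sanity-checked before training. The following is a minimal sketch, not part of the repository's tooling: the function name `check_kinetics_layout` is hypothetical, and the only assumptions are the ones visible in the tree (a `train/` and `val/` directory, a `train.txt` and `val.txt` file, and frames named `frame_*.jpg` inside `<split>/<class>/<video>/`).

```python
from pathlib import Path
from typing import List


def check_kinetics_layout(root: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    problems: List[str] = []
    for split in ("train", "val"):
        # Each split needs its list file next to the split directory.
        if not (root / f"{split}.txt").is_file():
            problems.append(f"missing list file: {split}.txt")

        split_dir = root / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split}/")
            continue

        # Every <class>/<video> directory should contain at least one frame.
        for video_dir in sorted(split_dir.glob("*/*")):
            if video_dir.is_dir() and not any(video_dir.glob("frame_*.jpg")):
                problems.append(f"no frames in: {video_dir.relative_to(root)}")
    return problems


if __name__ == "__main__":
    root = Path.home() / "data" / "kinetics400_30fps_frames"
    for problem in check_kinetics_layout(root):
        print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to see everything that is missing in a single pass.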
- step 1: Download the raw videos from either Academic Torrents or CVDF. Suppose the videos are stored at ~/data/kinetics400.
- step 2: Extract frames from the raw videos.
We fix each video to exactly 300 frames using three rules:
- if the video has exactly 300 frames: keep it as is
- if the video has more than 300 frames: drop every frame after the 300th
- if the video has fewer than 300 frames: duplicate the last frame until there are 300 frames
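
The three rules can be sketched as a single function over a list of frame paths. This is an illustration only, not the code in `extract_frames.py`; the name `fix_frame_count` and the choice to represent frames as a list of strings are assumptions made for the example.

```python
from typing import List

TARGET_FRAMES = 300


def fix_frame_count(frames: List[str], target: int = TARGET_FRAMES) -> List[str]:
    """Truncate or pad a list of frames so that it has exactly `target` entries."""
    if not frames:
        # There is no last frame to duplicate, so padding is impossible.
        raise ValueError("cannot pad an empty frame list")
    if len(frames) >= target:
        # Exactly `target` frames: unchanged. More: drop everything after the target-th.
        return frames[:target]
    # Fewer than `target` frames: repeat the last frame until the list is full.
    return frames + [frames[-1]] * (target - len(frames))
```

For example, a video with 250 frames ends up with its last frame repeated 50 more times, while a video with 320 frames loses its final 20.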
To extract the frames of each video, we use ffmpeg:

    sudo apt-get install ffmpeg

Run the commands below to extract frames, for example:

    python ./tools/data/k400/extract_frames.py --source_dir ~/data/kinetics400/train_256 --target_dir ~/data/kinetics400_30fps_frames/train
    python ./tools/data/k400/extract_frames.py --source_dir ~/data/kinetics400/val_256 --target_dir ~/data/kinetics400_30fps_frames/val

- (optional) To extract frames faster in parallel, install Parallel Python and use the parallel script:

    wget https://www.parallelpython.com/downloads/pp/pp-1.6.4.4.zip
    unzip pp-1.6.4.4.zip && cd pp-1.6.4.4
    python setup.py install && cd ..
    python ./tools/data/k400/extract_frames_parallel.py --source_dir ~/data/kinetics400/train_256 --target_dir ~/data/kinetics400_30fps_frames/train
    python ./tools/data/k400/extract_frames_parallel.py --source_dir ~/data/kinetics400/val_256 --target_dir ~/data/kinetics400_30fps_frames/val

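
For readers who want to see the idea behind parallel extraction without the `pp` dependency, here is a rough sketch using only the standard library. It is not the repository's `extract_frames_parallel.py` and does not apply the 300-frame rules. The function names, the `.mp4` extension, the `<class>/<video>.mp4` source layout, and the JPEG quality setting are all assumptions; the output pattern `frame_%05d.jpg` and the 30 fps rate follow the directory tree shown earlier.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(video: Path, out_dir: Path, fps: int = 30) -> List[str]:
    """Build an ffmpeg command that writes frames as frame_00001.jpg, frame_00002.jpg, ..."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", str(video),
        "-r", str(fps),      # resample the output to a fixed frame rate
        "-q:v", "2",         # JPEG quality (assumed value, lower is better)
        str(out_dir / "frame_%05d.jpg"),
    ]


def extract_one(video: Path, source_dir: Path, target_dir: Path) -> int:
    """Extract one video into target_dir/<class>/<video name>/ and return ffmpeg's exit code."""
    out_dir = target_dir / video.relative_to(source_dir).with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)
    return subprocess.run(build_ffmpeg_cmd(video, out_dir), check=False).returncode


def extract_all(source_dir: Path, target_dir: Path, workers: int = 8) -> List[Path]:
    """Run ffmpeg on every .mp4 under source_dir in parallel; return the videos that failed."""
    videos = sorted(source_dir.rglob("*.mp4"))
    # Threads are enough here because the heavy work happens in the ffmpeg subprocesses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda v: extract_one(v, source_dir, target_dir), videos))
    return [video for video, code in zip(videos, codes) if code != 0]


if __name__ == "__main__":
    data = Path.home() / "data"
    failed = extract_all(data / "kinetics400" / "train_256",
                         data / "kinetics400_30fps_frames" / "train")
    print(f"{len(failed)} videos failed")
```

Collecting the failed videos rather than stopping at the first error lets a long extraction run finish and be retried only where needed.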
- step 3: Download the train/val split files:

    cd ~/data/kinetics400_30fps_frames
    wget https://yzaws-data-log.s3.amazonaws.com/data/Kinetics/k400_train.txt
    wget https://yzaws-data-log.s3.amazonaws.com/data/Kinetics/k400_val.txt
    mv k400_train.txt train.txt && mv k400_val.txt val.txt

Please refer to the gluoncv tutorials to prepare this dataset. Suppose the videos are stored at ~/data/ucf101.
- The final directory structure should look like:
ucf101/
├── rawframes/
│ ├── ApplyEyeMakeup/
│ │ ├──v_ApplyEyeMakeup_g01_c01/
│ │ │ ├──img_00001.jpg
│ │ │ ├──...
│ │ ├──...
│ ├──...
├── annotations/
│ ├── classInd.txt
│ ├── testlist01.txt
│ ├── testlist02.txt
│ ├── testlist03.txt
│ ├── trainlist01.txt
│ ├── trainlist02.txt
│ ├── trainlist03.txt
├── ucf101_train_split_1_rawframes.txt
├── ucf101_train_split_2_rawframes.txt
├── ucf101_train_split_3_rawframes.txt
├── ucf101_val_split_1_rawframes.txt
├── ucf101_val_split_2_rawframes.txt
├── ucf101_val_split_3_rawframes.txt
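
The split list files follow a regular naming pattern, so a short helper can report which ones are still missing. This is a hedged sketch: the helper names are hypothetical, and the pattern `<prefix>_<train|val>_split_<n>_rawframes.txt` with three splits is read directly off the tree above.

```python
from pathlib import Path
from typing import List


def expected_split_files(prefix: str, n_splits: int = 3) -> List[str]:
    """List the split file names for a dataset prefix such as 'ucf101'."""
    return [
        f"{prefix}_{subset}_split_{i}_rawframes.txt"
        for subset in ("train", "val")
        for i in range(1, n_splits + 1)
    ]


def missing_split_files(root: Path, prefix: str) -> List[str]:
    """Return the expected split files that do not exist under `root`."""
    return [name for name in expected_split_files(prefix) if not (root / name).is_file()]


if __name__ == "__main__":
    print(missing_split_files(Path.home() / "data" / "ucf101", "ucf101"))
```

The prefix is a parameter because other datasets in this document use the same naming with a different prefix.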
Please refer to the gluoncv tutorials to prepare this dataset. Suppose the videos are stored at ~/data/hmdb51.
- The final directory structure should look like:
hmdb51/
├── rawframes/
│ ├── brush_hair/
│ │ ├──April_09_brush_hair_u_nm_np1_ba_goo_0/
│ │ │ ├──img_00001.jpg
│ │ │ ├──...
│ │ ├──...
│ ├──...
├── annotations/
│ ├── classInd.txt
│ ├── brush_hair_test_split1.txt
│ ├── ...
├── hmdb51_train_split_1_rawframes.txt
├── hmdb51_train_split_2_rawframes.txt
├── hmdb51_train_split_3_rawframes.txt
├── hmdb51_val_split_1_rawframes.txt
├── hmdb51_val_split_2_rawframes.txt
├── hmdb51_val_split_3_rawframes.txt
Please refer to the gluoncv tutorials to prepare this dataset. Suppose the videos are stored at ~/data/sthv2.
- The final directory structure should look like:
sthv2/
├── rawframes/
│ ├──...
│ ├── 14876/
│ │ ├──img_00001.jpg
│ │ ├──...
│ ├──...
├── annotations/
│ ├── something-something-v2-labels.json
│ ├── something-something-v2-test.json
│ ├── something-something-v2-train.json
│ ├── something-something-v2-validation.json
├── sthv2_train_list_rawframes.txt
├── sthv2_val_list_rawframes.txt
- The final directory structure should look like:
ActivityNet/
├── videos/
│ ├──...
│ ├── 00018--lj-VovhJcPA.mp4
│ ├── 00018--lk2niPrG3y8.webm
│ ├──...
├── rawframes/
│ ├──...
│ ├── 00015--lnHdEtuXU8w/
│ │ ├──img_00000.jpg
│ │ ├──...
│ ├──...
├── anet_anno_train.json
├── anet_anno_val.json
├── anet_train_clip.txt
├── anet_train_video.txt
├── anet_val_clip.txt
├── anet_val_video.txt
- step 1: Download the raw videos using the official crawler. Suppose the videos are stored at ~/data/ActivityNet.
- step 2: Extract frames
  - install denseflow: please refer to install.md in the mmaction2 tutorials
    - add environment variables

          echo 'export ZZROOT=$HOME/app' >> ~/.bashrc
          echo 'export PATH=$ZZROOT/bin:$PATH' >> ~/.bashrc
          echo 'export LD_LIBRARY_PATH=$ZZROOT/lib:$ZZROOT/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

    - install denseflow using https://github.com/innerlee/setup. Make sure each shell script runs correctly; a failure may mean that some dependencies are missing.

          cd ~
          git clone https://github.com/innerlee/setup.git
          cd setup
          sudo apt-get install autoconf cmake yasm
          ./zznasm.sh
          ./zzyasm.sh
          ./zzlibx264.sh
          ./zzlibx265.sh
          ./zzlibvpx.sh
          ./zzffmpeg.sh
          ./zzopencv.sh
          export OpenCV_DIR=$ZZROOT
          ./zzboost.sh
          ./zzdenseflow.sh

  - extract frames
    - make sure mmaction2 is installed; refer to its README.md
    - use the bash script below to extract frames

          cd mmaction2/tools/data/activitynet/
          bash extract_rgb_frames.sh

      Note: you need to change the raw videos path and the target frames path in the extract_rgb_frames.sh file accordingly.
- step 3: Prepare the annotation files:

    cd ~/data/ActivityNet
    wget https://yzaws-data-log.s3.amazonaws.com/data/activitynet/anet_anno.zip
    unzip anet_anno.zip