From 6e0f7f66823b120e12ec77b6131e759f9e951979 Mon Sep 17 00:00:00 2001
From: innat
Date: Fri, 6 Oct 2023 11:31:13 +0600
Subject: [PATCH] update:model zoo info

---
 MODEL_ZOO.md | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md
index fe8f014..ded7442 100644
--- a/MODEL_ZOO.md
+++ b/MODEL_ZOO.md
@@ -10,15 +10,23 @@
 The official results of torch VideoMAE finetuned with I3D dense sampling on Kinetics400 and TSN uniform sampling on Something-Something V2, respectively.
 
+The name of the weight is followed as `model_{size}_{input_frame}x{input_size}_{mode}`, where `size` can be `S/B/L/H`, and `input_frame` for videomae is `16` for both `Fine Tuned (FT)` and `Pre Trained (PT)` models.
+
+```
+TFVideoMAE_B_16x224_FT
+TFVideoMAE_B_16x224_PT
+```
+
 ### Kinetics-400
 
-For Kinetrics-400, VideoMAE is trained around **1600** epoch without **any extra data**.
+For Kinetrics-400, VideoMAE is trained around **1600** epoch without **any extra data**. The following checkpoints are available in both tensorflow [`SavedModel`](https://www.tensorflow.org/guide/saved_model) and [`h5`](https://keras.io/api/saving/weights_saving_and_loading/#save_weights-method) format.
 
-| Backbone | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
-| :------: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---: | :---: |
-| ViT-S | 16x5x3 | checkpoint | checkpoint | 79.0 | 93.8 |
-| ViT-B | 16x5x3 | checkpoint | checkpoint | 81.5 | 95.1 |
-| ViT-L | 16x5x3 | checkpoint | checkpoint | 85.2 | 96.8 |
+
+| Backbone | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
+| :--: | :--: | :--: | :--: | :---: | :---: |
+| ViT-S | 16x5x3 | [savedmodel]() | [h5](https://github.com/innat/VideoMAE/releases/download/v1.0/TFVideoMAE_S_16x224_FT.h5) | 79.0 | 93.8 |
+| ViT-B | 16x5x3 | [savedmodel]() | [h5](https://github.com/innat/VideoMAE/releases/download/v1.0/TFVideoMAE_B_16x224_FT.h5) | 81.5 | 95.1 |
+| ViT-L | 16x5x3 | checkpoint | [h5](https://github.com/innat/VideoMAE/releases/download/v1.0/TFVideoMAE_L_16x224_FT.h5) | 85.2 | 96.8 |
 | ViT-H | 16x5x3 | checkpoint | checkpoint | 86.6 | 97.1 |
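
Reviewer note: the naming scheme added by this patch can be checked mechanically. The sketch below is a minimal illustration, not code from the repository. The helpers `weight_name` and `h5_url` are hypothetical. The `TFVideoMAE` prefix and the `224` input size are taken from the two example names in the patch, and the release URL is copied from the `h5` links in the table. The patch only links fine-tuned `h5` files for ViT-S, ViT-B, and ViT-L, so a URL built for any other combination is an assumption and may not resolve.

```python
# Hypothetical helpers illustrating the weight naming convention described
# in the patch: model_{size}_{input_frame}x{input_size}_{mode}.

# Base URL copied from the h5 links in the patched table.
RELEASE_URL = "https://github.com/innat/VideoMAE/releases/download/v1.0"

VALID_SIZES = ("S", "B", "L", "H")  # sizes listed in the patch
VALID_MODES = ("FT", "PT")          # Fine Tuned / Pre Trained


def weight_name(size, mode, input_frame=16, input_size=224, prefix="TFVideoMAE"):
    """Compose a weight name such as TFVideoMAE_B_16x224_FT."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"{prefix}_{size}_{input_frame}x{input_size}_{mode}"


def h5_url(size, mode):
    """Build an h5 download URL; only FT S/B/L appear in the patch itself."""
    return f"{RELEASE_URL}/{weight_name(size, mode)}.h5"


if __name__ == "__main__":
    print(weight_name("B", "FT"))
    print(h5_url("S", "FT"))
```

Validating `size` and `mode` up front keeps a mistyped argument from silently producing a name that matches no released file.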