update:model zoo info

innat · Oct 6, 2023 · 6e0f7f6
1 parent 5a6b500
Showing 1 changed file with 14 additions and 6 deletions.
diff --git a/MODEL_ZOO.md b/MODEL_ZOO.md
@@ -10,15 +10,23 @@
 The official results of torch VideoMAE finetuned with I3D dense sampling on Kinetics400 and TSN uniform sampling on Something-Something V2, respectively.
 
 
+Weight names follow the pattern `model_{size}_{input_frame}x{input_size}_{mode}`, where `size` is one of `S/B/L/H`, `input_frame` is `16` for VideoMAE, and `mode` is either `FT` (fine-tuned) or `PT` (pre-trained).
+
+```
+TFVideoMAE_B_16x224_FT
+TFVideoMAE_B_16x224_PT
+```
+
 ### Kinetics-400
 
-For Kinetics-400, VideoMAE is trained for around **1600** epochs without **any extra data**.
+For Kinetics-400, VideoMAE is trained for around **1600** epochs without **any extra data**. The following checkpoints are available in both TensorFlow [`SavedModel`](https://www.tensorflow.org/guide/saved_model) and [`h5`](https://keras.io/api/saving/weights_saving_and_loading/#save_weights-method) formats.
 
-| Backbone | \#Frame |                          Pre-train                           |                          Fine-tune                           | Top-1 | Top-5 |
- | :------: | :-----: | :----------------------------------------------------------: | :----------------------------------------------------------: | :---: | :---: |
-  ViT-S    | 16x5x3  | checkpoint | checkpoint | 79.0 | 93.8   |
-  ViT-B    | 16x5x3  | checkpoint | checkpoint | 81.5  | 95.1  |
-  ViT-L    | 16x5x3  | checkpoint | checkpoint | 85.2  | 96.8  |
+
+| Backbone | \#Frame | Pre-train | Fine-tune | Top-1 | Top-5 |
+ | :--: | :--: | :--: | :--: | :---: | :---: |
+  ViT-S    | 16x5x3  | [savedmodel]() | [h5](https://github.com/innat/VideoMAE/releases/download/v1.0/TFVideoMAE_S_16x224_FT.h5) | 79.0 | 93.8   |
+  ViT-B    | 16x5x3  | [savedmodel]() | [h5](https://github.com/innat/VideoMAE/releases/download/v1.0/TFVideoMAE_B_16x224_FT.h5) | 81.5  | 95.1  |
+  ViT-L    | 16x5x3  | checkpoint | [h5](https://github.com/innat/VideoMAE/releases/download/v1.0/TFVideoMAE_L_16x224_FT.h5) | 85.2  | 96.8  |
   ViT-H    | 16x5x3  | checkpoint | checkpoint | 86.6 | 97.1   |
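
Review note: the naming pattern and the release links added in this diff could be generated rather than typed by hand. The following is a minimal sketch, not code from the repository. The helper names `weight_name` and `h5_url` are hypothetical, the `TFVideoMAE` prefix and the `224` input size are taken from the two example names, and the base URL is copied from the table links. Only the `FT` links for `S`, `B`, and `L` appear in the diff, so any other URL this helper builds is an assumption and may not resolve.

```python
# Hypothetical helpers illustrating the naming pattern described in MODEL_ZOO.md.
# Nothing here is part of the repository's API.

# Base URL copied from the h5 links in the Kinetics-400 table.
RELEASE_BASE = "https://github.com/innat/VideoMAE/releases/download/v1.0"

VALID_SIZES = ("S", "B", "L", "H")
VALID_MODES = ("FT", "PT")  # fine-tuned, pre-trained


def weight_name(size, mode, input_frame=16, input_size=224, prefix="TFVideoMAE"):
    """Build a name such as 'TFVideoMAE_B_16x224_FT'."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"{prefix}_{size}_{input_frame}x{input_size}_{mode}"


def h5_url(size, mode):
    """Build the release URL for an h5 file (assumes the v1.0 link layout)."""
    return f"{RELEASE_BASE}/{weight_name(size, mode)}.h5"


if __name__ == "__main__":
    print(weight_name("B", "FT"))
    print(h5_url("S", "FT"))
```

Validating `size` and `mode` up front keeps a typo from silently producing a link to a file that does not exist.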