The TF-Vision modeling library for computer vision provides a collection of
baselines and checkpoints for image classification, object detection, and
segmentation.
ResNet models trained with vanilla settings
Models are trained from scratch with a batch size of 4096 and an initial
learning rate of 1.6.
Linear warmup is applied for the first 5 epochs.
Models are trained with L2 weight regularization and ReLU activation.
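
As a rough illustration of the warmup described above, the following sketch ramps the learning rate linearly over the first 5 epochs. The dataset size (ImageNet-1k, 1,281,167 training images), the zero starting value, and the constant rate after warmup are assumptions made for illustration; the actual decay schedule is defined in the configs and is not reproduced here.

```python
# Hypothetical sketch of linear warmup; not the library's implementation.
IMAGENET_TRAIN_SIZE = 1_281_167  # assumption: ImageNet-1k training set size
BATCH_SIZE = 4096                # from the settings above
INITIAL_LR = 1.6                 # from the settings above
WARMUP_EPOCHS = 5                # from the settings above

STEPS_PER_EPOCH = IMAGENET_TRAIN_SIZE // BATCH_SIZE
WARMUP_STEPS = WARMUP_EPOCHS * STEPS_PER_EPOCH


def warmup_learning_rate(step: int) -> float:
    """Return the learning rate at a given global step during warmup.

    The rate grows linearly from 0 (assumed starting point) to INITIAL_LR.
    After warmup this sketch simply holds INITIAL_LR; a real schedule would
    hand over to whatever decay the config specifies.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step >= WARMUP_STEPS:
        return INITIAL_LR
    return INITIAL_LR * step / WARMUP_STEPS
```

Under these assumptions one epoch is 312 steps, so warmup ends at step 1560.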
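| Model      | Resolution | Epochs | Top-1 | Top-5 | Download |
| ---------- | ---------- | ------ | ----- | ----- | -------- |
| ResNet-50  | 224x224    | 90     | 76.1  | 92.9  | config   |
| ResNet-50  | 224x224    | 200    | 77.1  | 93.5  | config   |
| ResNet-101 | 224x224    | 200    | 78.3  | 94.2  | config   |
| ResNet-152 | 224x224    | 200    | 78.7  | 94.3  | config   |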
ResNet-RS models trained with various settings
We support state-of-the-art ResNet-RS image classification models with the
following features:
- ResNet-RS architectural changes and Swish activation. (Note that ResNet-RS
  adopts ReLU activation in the paper.)
- Regularization methods including RandAugment, 4e-5 weight decay, stochastic
  depth, label smoothing, and dropout.
- New training methods including a 350-epoch schedule, a cosine learning rate
  schedule, and EMA (exponential moving average of the weights).

Configs are in this directory.
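
To make the training methods listed above more concrete, here is a minimal pure-Python sketch of a cosine learning rate schedule, an EMA weight update, and label smoothing. The function names and the default values (`decay=0.9999`, `smoothing=0.1`) are illustrative assumptions, not values taken from the configs.

```python
import math


def cosine_learning_rate(step: int, total_steps: int, base_lr: float) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def ema_update(shadow, weights, decay=0.9999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights.

    `decay` is a hypothetical default chosen for illustration.
    """
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


def smooth_labels(one_hot, smoothing=0.1):
    """Mix a one-hot target with a uniform distribution over the classes.

    `smoothing` is a hypothetical default chosen for illustration.
    """
    num_classes = len(one_hot)
    return [(1.0 - smoothing) * y + smoothing / num_classes for y in one_hot]
```

In a training loop, the EMA copy of the weights would typically be updated after every optimizer step and used for evaluation.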
| Model         | Resolution | Params (M) | Top-1 | Top-5 | Download     |
| ------------- | ---------- | ---------- | ----- | ----- | ------------ |
| ResNet-RS-50  | 160x160    | 35.7       | 79.1  | 94.5  | config, ckpt |
| ResNet-RS-101 | 160x160    | 63.7       | 80.2  | 94.9  | config, ckpt |
| ResNet-RS-101 | 192x192    | 63.7       | 81.3  | 95.6  | config, ckpt |
| ResNet-RS-152 | 192x192    | 86.8       | 81.9  | 95.8  | config, ckpt |
| ResNet-RS-152 | 224x224    | 86.8       | 82.5  | 96.1  | config, ckpt |
| ResNet-RS-152 | 256x256    | 86.8       | 83.1  | 96.3  | config, ckpt |
| ResNet-RS-200 | 256x256    | 93.4       | 83.5  | 96.6  | config, ckpt |
| ResNet-RS-270 | 256x256    | 130.1      | 83.6  | 96.6  | config, ckpt |
| ResNet-RS-350 | 256x256    | 164.3      | 83.7  | 96.7  | config, ckpt |
| ResNet-RS-350 | 320x320    | 164.3      | 84.2  | 96.9  | config, ckpt |
Object Detection and Instance Segmentation
Common Settings and Notes
COCO Object Detection Baselines
RetinaNet (ImageNet pretrained)
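| Backbone | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download     |
| -------- | ---------- | ------ | --------- | ---------- | ------ | ------------ |
| R50-FPN  | 640x640    | 12     | 97.0      | 34.0       | 34.3   | config       |
| R50-FPN  | 640x640    | 72     | 97.0      | 34.0       | 36.8   | config, ckpt |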
RetinaNet (Trained from scratch) with training features including:
- Stochastic depth with drop rate 0.2.
- Swish activation.
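
The two training features above can be sketched in plain Python as follows. This is an illustrative sketch, not the library's implementation: the helper names are hypothetical, the branch is dropped per example, and a single drop rate of 0.2 is applied to the block (whether the rate varies across blocks is not stated here).

```python
import math
import random


def swish(x: float) -> float:
    """Swish (SiLU) activation: x * sigmoid(x), written to avoid overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def stochastic_depth(residual, drop_rate, training, rng=random):
    """Randomly drop a residual branch during training.

    With probability `drop_rate` the branch output is zeroed; otherwise it is
    divided by the keep probability so its expected value is unchanged.
    At inference time the branch passes through untouched.
    """
    if not training or drop_rate == 0.0:
        return list(residual)
    keep_prob = 1.0 - drop_rate
    if rng.random() >= keep_prob:
        return [0.0 for _ in residual]
    return [r / keep_prob for r in residual]


def residual_block(x, branch, drop_rate=0.2, training=True, rng=random):
    """Compute x + stochastic_depth(branch(x)) for a toy list-valued input."""
    residual = stochastic_depth(branch(x), drop_rate, training, rng)
    return [xi + ri for xi, ri in zip(x, residual)]
```

For example, `residual_block([1.0, 2.0], lambda v: [swish(t) for t in v], training=False)` adds the Swish-activated branch to the input with no dropping.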
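| Backbone     | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download       |
| ------------ | ---------- | ------ | --------- | ---------- | ------ | -------------- |
| SpineNet-49  | 640x640    | 500    | 85.4      | 28.5       | 44.2   | config, TB.dev |
| SpineNet-96  | 1024x1024  | 500    | 265.4     | 43.0       | 48.5   | config, TB.dev |
| SpineNet-143 | 1280x1280  | 500    | 524.0     | 67.0       | 50.0   | config, TB.dev |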
Mobile-size RetinaNet (Trained from scratch):
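| Backbone           | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Download     |
| ------------------ | ---------- | ------ | --------- | ---------- | ------ | ------------ |
| MobileNetV2        | 256x256    | 600    | -         | 2.27       | 23.5   | config       |
| Mobile SpineNet-49 | 384x384    | 600    | 1.0       | 2.32       | 28.1   | config, ckpt |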
Instance Segmentation Baselines
Mask R-CNN (Trained from scratch)
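| Backbone     | Resolution | Epochs | FLOPs (B) | Params (M) | Box AP | Mask AP | Download |
| ------------ | ---------- | ------ | --------- | ---------- | ------ | ------- | -------- |
| ResNet50-FPN | 640x640    | 350    | 227.7     | 46.3       | 42.3   | 37.6    | config   |
| SpineNet-49  | 640x640    | 350    | 215.7     | 40.8       | 42.6   | 37.9    | config   |
| SpineNet-96  | 1024x1024  | 500    | 315.0     | 55.2       | 48.1   | 42.4    | config   |
| SpineNet-143 | 1280x1280  | 500    | 498.8     | 79.2       | 49.3   | 43.4    | config   |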
Cascade RCNN-RS (Trained from scratch)
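| Backbone     | Resolution | Epochs | Params (M) | Box AP | Mask AP | Download |
| ------------ | ---------- | ------ | ---------- | ------ | ------- | -------- |
| SpineNet-49  | 640x640    | 500    | 56.4       | 46.4   | 40.0    | config   |
| SpineNet-143 | 1280x1280  | 500    | 94.9       | 51.9   | 45.0    | config   |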
We support DeepLabV3 and
DeepLabV3+ architectures, with
Dilated ResNet backbones.
Backbones are pre-trained on ImageNet.
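| Model      | Backbone           | Resolution | Steps | mIoU | Download |
| ---------- | ------------------ | ---------- | ----- | ---- | -------- |
| DeepLabV3  | Dilated ResNet-101 | 512x512    | 30k   | 78.7 |          |
| DeepLabV3+ | Dilated ResNet-101 | 512x512    | 30k   | 79.2 |          |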
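| Model      | Backbone           | Resolution | Steps | mIoU  | Download |
| ---------- | ------------------ | ---------- | ----- | ----- | -------- |
| DeepLabV3+ | Dilated ResNet-101 | 1024x2048  | 90k   | 78.79 |          |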
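
For readers unfamiliar with the mIoU column in the segmentation tables, the sketch below computes mean intersection-over-union from a confusion matrix. It is an illustrative helper with a hypothetical name, not the evaluation code used to produce the numbers above; skipping classes that never appear is an assumed convention. The tables report the value as a percentage, whereas this function returns a fraction.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Classes with no true or predicted pixels are skipped
    (an assumed convention for this sketch).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        denom = true_pos + false_pos + false_neg
        if denom > 0:
            ious.append(true_pos / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

For instance, a two-class matrix `[[3, 1], [1, 3]]` gives an IoU of 3/5 for each class.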
Common Settings and Notes
Kinetics-400 Action Recognition Baselines
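| Model      | Input (frame x stride) | Top-1 | Top-5 | Download |
| ---------- | ---------------------- | ----- | ----- | -------- |
| SlowOnly   | 8 x 8                  | 74.1  | 91.4  | config   |
| SlowOnly   | 16 x 4                 | 75.6  | 92.1  | config   |
| R3D-50     | 32 x 2                 | 77.0  | 93.0  | config   |
| R3D-RS-50  | 32 x 2                 | 78.2  | 93.7  | config   |
| R3D-RS-101 | 32 x 2                 | 79.5  | 94.2  | -        |
| R3D-RS-152 | 32 x 2                 | 79.9  | 94.3  | -        |
| R3D-RS-200 | 32 x 2                 | 80.4  | 94.4  | -        |
| R3D-RS-200 | 48 x 2                 | 81.0  | -     | -        |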
Kinetics-600 Action Recognition Baselines
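| Model      | Input (frame x stride) | Top-1 | Top-5 | Download |
| ---------- | ---------------------- | ----- | ----- | -------- |
| SlowOnly   | 8 x 8                  | 77.3  | 93.6  | config   |
| R3D-50     | 32 x 2                 | 79.5  | 94.8  | config   |
| R3D-RS-200 | 32 x 2                 | 83.1  | -     | -        |
| R3D-RS-200 | 48 x 2                 | 83.8  | -     | -        |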
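
The "frame x stride" inputs in the Kinetics tables can be read as the number of sampled frames and the temporal stride between them. The sketch below shows that reading; the helper names and the choice of starting frame are hypothetical, and the actual sampling logic lives in the input pipeline rather than here.

```python
def sample_frame_indices(num_frames: int, stride: int, start: int = 0):
    """Indices of the raw video frames that make up one input clip."""
    return [start + i * stride for i in range(num_frames)]


def clip_span(num_frames: int, stride: int) -> int:
    """Raw frames covered from the first to the last sampled frame, inclusive."""
    return (num_frames - 1) * stride + 1
```

Under this reading, an "8 x 8" clip picks frames 0, 8, ..., 56, while a "32 x 2" clip picks frames 0, 2, ..., 62, so both cover a similar temporal window at different sampling densities.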