Dataset: ImageNet
Model: ResNet-50
Metric: accuracy

Model (ResNet-50) | GPUs per node | Batch size per node | Samples/s | Top-1 | Top-5 | Inference speedup
---|---|---|---|---|---|---
OneFlow (unquantized) | 1 | 256 | 482.22 | 0.7732 | 0.9357 | 1.0x
OneFlow (unquantized) | 1 | 350 (max) | 483.12 | 0.7732 | 0.9357 | 1.0x
TensorRT online INT8 calibration | 1 | 256 | 1357.99 | 0.7731 | 0.9356 | 2.8x
TensorRT offline INT8 calibration | 1 | 256 | 1319.04 | 0.7721 | 0.9347 | 2.7x
TensorRT offline INT8 calibration | 1 | 350 | 1443.31 | 0.7722 | 0.9348 | 3.0x
TensorRT FP32 | 1 | 256 | 780.61 | 0.7731 | 0.9356 | 1.6x
TensorRT FP32 | 1 | 350 | 785.00 | 0.7732 | 0.9357 | 1.6x
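
The speedup column appears to be the throughput ratio against the unquantized OneFlow run at the same batch size. The sketch below recomputes it from the Samples/s values in the table. The dictionary layout and the `speedup` helper are illustrative names, not part of the benchmark code, and the same-batch-size baseline is an assumption that happens to reproduce the listed figures.

```python
# Throughput in samples/s, copied from the table above:
# {configuration: {batch_size: samples_per_second}}
throughput = {
    "OneFlow (unquantized)": {256: 482.22, 350: 483.12},
    "TensorRT online INT8 calibration": {256: 1357.99},
    "TensorRT offline INT8 calibration": {256: 1319.04, 350: 1443.31},
    "TensorRT FP32": {256: 780.61, 350: 785.00},
}
BASELINE = "OneFlow (unquantized)"


def speedup(config, batch_size):
    """Throughput ratio against the baseline at the same batch size (assumed definition)."""
    return throughput[config][batch_size] / throughput[BASELINE][batch_size]


for config, by_batch in throughput.items():
    for batch_size in by_batch:
        print(f"{config:36s} bs={batch_size}: {speedup(config, batch_size):.1f}x")
```

Rounded to one decimal place, each ratio matches the corresponding entry in the table.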

Dataset: CIFAR-10
Models: AlexNet, LeNet
Settings: pruning rates of 0.5 and 0.7

Model - pruning operator | Runs | Acc | Pruning rate | Compression ratio | Inference throughput (samples/s)
---|---|---|---|---|---
AlexNet - unpruned | 5 | 94.89% | - | 1x | 5409
AlexNet - bn | 5 | 98.81% | 50% | 1.4x | 5968
AlexNet - conv_all | 5 | 93.95% | 50% | 1.3x | 5969
AlexNet - conv_avg | 5 | 98.56% | 50% | 1.3x | 5865
AlexNet - conv_max | 5 | 97.44% | 50% | 1.3x | 5555
AlexNet - random | 5 | 97.32% | 50% | 1.3x | 5580
AlexNet - conv_threshold | 5 | 98.03% | 50% | 1.3x | 5567
LeNet - unpruned | 5 | 75.72% | - | 1x | 5821
LeNet - bn | 5 | 64.89% | 70% | 3x | 1923

Dataset: SST-2
Environment: single 2080Ti GPU
Settings: maximum sequence length is 128 for BERT-based models and 32 for LSTM-based models; vocabulary size is 10000

Model | Runs | Acc | Layers | Hidden size / feed-forward size | Model size | Compression ratio | Inference time | Inference speedup
---|---|---|---|---|---|---|---|---|
BERT_base(Teacher) | 5 | 92.2% | 12 | 768/3072 | 110M | 1x | 4.04s | 1x |
KD | 5 | 80.5% | 3 | 312/1200 | 14.5M | 7.5x | 0.81s | 5.0x |
BiLSTM | 5 | 80.4% | 1 | 300/400 | 15.3M | 7.2x | 0.83s | 4.8x |
Distilled-BiLSTM | 5 | 82.9% | 1 | 300/400 | 15.3M | 7.2x | 0.83s | 4.8x |
BERT-PKD(from scratch) | 5 | 81.5% | 3 | 768/3072 | 45.7M | 2.4x | 1.69s | 2.4x |
BERT-PKD | 5 | 88.4% | 3 | 768/3072 | 45.7M | 2.4x | 1.69s | 2.4x |
TinyBERT | 5 | 91.3% | 4 | 312/1200 | 14.5M | 7.5x | 0.65s | 6.2x |
BERT-of-Theseus | 5 | 87.2% | 4 | 768/3072 | 53.7M | 2.05x | 2.05s | 2.0x |

Note: the layer count excludes the embedding and prediction layers.
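
As a rough cross-check of the last columns of the table, the sketch below derives the compression ratio and the inference speedup of each student model from its size and inference time relative to BERT_base. It assumes both ratios are defined as teacher value divided by student value; the helper name `ratios_vs_teacher` is illustrative.

```python
# Parameter count (millions) and inference time (seconds), copied from the table above.
TEACHER_PARAMS_M, TEACHER_TIME_S = 110.0, 4.04  # BERT_base (Teacher)

students = {
    "KD": (14.5, 0.81),
    "BiLSTM": (15.3, 0.83),
    "Distilled-BiLSTM": (15.3, 0.83),
    "BERT-PKD (from scratch)": (45.7, 1.69),
    "BERT-PKD": (45.7, 1.69),
    "TinyBERT": (14.5, 0.65),
    "BERT-of-Theseus": (53.7, 2.05),
}


def ratios_vs_teacher(params_m, time_s):
    """Return (compression ratio, inference speedup) relative to the teacher (assumed definitions)."""
    return TEACHER_PARAMS_M / params_m, TEACHER_TIME_S / time_s


for name, (params_m, time_s) in students.items():
    compression, speedup = ratios_vs_teacher(params_m, time_s)
    print(f"{name:24s} compression={compression:.2f}x  speedup={speedup:.2f}x")
```

Because the sizes and times in the table are themselves rounded, the recomputed values can differ from the listed ones in the last digit (for example, 110 / 14.5 ≈ 7.59 against the listed 7.5x).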