model quantization #7

iishiishii · 2024-05-07T01:24:50Z

LiteMedSAM encoder ops:
{'Slice', 'Softmax', 'Pad', 'Erf', 'Cast', 'MatMul', 'Constant', 'Sub', 'Mul', 'Pow', 'Concat', 'Reshape', 'Div', 'Transpose', 'Split', 'Conv', 'LayerNormalization', 'ConstantOfShape', 'Add', 'Shape', 'Sqrt', 'ReduceMean'}

LiteMedSAM decoder ops:
{'Cos', 'Slice', 'Softmax', 'Gather', 'Gemm', 'Cast', 'Erf', 'MatMul', 'Not', 'Expand', 'Relu', 'Where', 'Constant', 'Sub', 'Resize', 'Mul', 'Reciprocal', 'Pow', 'Concat', 'Reshape', 'Unsqueeze', 'OneHot', 'ArgMax', 'Floor', 'Div', 'Transpose', 'Range', 'Flatten', 'Tile', 'ConvTranspose', 'Conv', 'LayerNormalization', 'ReduceMax', 'ConstantOfShape', 'Add', 'Shape', 'Equal', 'Sqrt', 'ReduceMean', 'Sin'}
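
The two op sets above can be compared directly to see which op types a quantization pass would meet in only one of the graphs. This is a minimal sketch using plain Python sets; the op names are copied from the lists above, and the variable names are only illustrative.

```python
# Op types copied from the encoder and decoder listings in this comment.
encoder_ops = {
    'Slice', 'Softmax', 'Pad', 'Erf', 'Cast', 'MatMul', 'Constant', 'Sub',
    'Mul', 'Pow', 'Concat', 'Reshape', 'Div', 'Transpose', 'Split', 'Conv',
    'LayerNormalization', 'ConstantOfShape', 'Add', 'Shape', 'Sqrt',
    'ReduceMean',
}
decoder_ops = {
    'Cos', 'Slice', 'Softmax', 'Gather', 'Gemm', 'Cast', 'Erf', 'MatMul',
    'Not', 'Expand', 'Relu', 'Where', 'Constant', 'Sub', 'Resize', 'Mul',
    'Reciprocal', 'Pow', 'Concat', 'Reshape', 'Unsqueeze', 'OneHot',
    'ArgMax', 'Floor', 'Div', 'Transpose', 'Range', 'Flatten', 'Tile',
    'ConvTranspose', 'Conv', 'LayerNormalization', 'ReduceMax',
    'ConstantOfShape', 'Add', 'Shape', 'Equal', 'Sqrt', 'ReduceMean', 'Sin',
}

shared = encoder_ops & decoder_ops        # op types present in both graphs
encoder_only = encoder_ops - decoder_ops  # op types only the encoder uses
decoder_only = decoder_ops - encoder_ops  # op types only the decoder uses

print(sorted(encoder_only))  # ['Pad', 'Split']
print(sorted(decoder_only))
```

Gemm, ConvTranspose and Resize fall in the decoder-only group, so any per-op quantization settings for them would only affect the decoder.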

According to the literature, MatMul, Conv, LayerNormalization, and Gemm are the most computationally intensive operations (Gemm appears only in the decoder). It might be worth profiling them during inference.
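
One way to check this is to sum the profiled kernel time per op type. The sketch below assumes the models run under ONNX Runtime with profiling enabled, and that the resulting JSON trace is a list of events where node events have `cat == "Node"`, a name ending in `_kernel_time`, a `dur` in microseconds, and the op type under `args["op_name"]`. That layout should be verified against the installed version. The helper name and the sample events are hypothetical, not measurements.

```python
import json
from collections import defaultdict


def time_per_op_type(events):
    """Sum kernel time (microseconds) per op type, slowest first."""
    totals = defaultdict(int)
    for event in events:
        # Assumed trace layout: only node kernel events carry the timing
        # we want; session-level and fence events are skipped.
        if event.get("cat") != "Node":
            continue
        if not event.get("name", "").endswith("_kernel_time"):
            continue
        op_type = event.get("args", {}).get("op_name")
        if op_type is None:
            continue
        totals[op_type] += event.get("dur", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def load_profile(path):
    """Read a JSON trace file into a list of event dicts."""
    with open(path) as f:
        return json.load(f)


# Synthetic events for illustration only; these are NOT real measurements.
sample_events = [
    {"cat": "Session", "name": "model_run", "dur": 900},
    {"cat": "Node", "name": "matmul_0_kernel_time", "dur": 300,
     "args": {"op_name": "MatMul"}},
    {"cat": "Node", "name": "conv_0_kernel_time", "dur": 250,
     "args": {"op_name": "Conv"}},
    {"cat": "Node", "name": "matmul_1_kernel_time", "dur": 200,
     "args": {"op_name": "MatMul"}},
    {"cat": "Node", "name": "add_0_fence_before", "dur": 5,
     "args": {"op_name": "Add"}},
]

ranking = time_per_op_type(sample_events)
print(ranking)
```

With ONNX Runtime, the trace file would typically come from setting `enable_profiling = True` on `SessionOptions`, running inference, and calling `end_profiling()` on the session, which returns the file path to pass to `load_profile`.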

To avoid a large accuracy drop, I set reduce_range=True.
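
For context, here is a rough sketch of where that flag would go, assuming the quantization is done with ONNX Runtime's dynamic quantization (`quantize_dynamic`). The comment does not say which quantization mode or tool was used, so treat that choice, the function name, and the file names as assumptions.

```python
# Keyword arguments kept in one place so the setting is easy to audit.
QUANT_KWARGS = {
    # Setting taken from this comment; in ONNX Runtime it quantizes
    # weights with 7 bits instead of 8.
    "reduce_range": True,
}


def quantize_with_reduced_range(fp32_path, int8_path):
    """Quantize an ONNX model file, writing the result to int8_path."""
    # Imported lazily so this module loads even without onnxruntime.
    from onnxruntime.quantization import quantize_dynamic

    quantize_dynamic(
        model_input=fp32_path,
        model_output=int8_path,
        **QUANT_KWARGS,
    )


# Hypothetical file names, shown for illustration only:
# quantize_with_reduced_range("encoder_fp32.onnx", "encoder_int8.onnx")
```

The accuracy benefit depends on the hardware and the model, so it is worth comparing outputs with and without the flag.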

@nanthan987
