Previously in #206 we got PyTorch post-training static quantization to work, but the model was not faster in inference, probably due to some missing quantized ops on CPU/GPU in the PyTorch runtime.
However, we are currently using ONNX for inference, and ONNX Runtime has its own quantization system: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html
We should also try quantization via ONNX Runtime and see whether it is faster in CMSSW.
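
For reference, a minimal sketch of what post-training static quantization could look like with the ONNX Runtime quantization API (QDQ format). The model paths, input name, and input shape below are placeholders; the calibration batches should of course come from real data rather than random numbers:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader,
    QuantFormat,
    QuantType,
    quantize_static,
)


class SimpleCalibrationReader(CalibrationDataReader):
    """Feeds a fixed set of calibration batches to the quantizer."""

    def __init__(self, input_name, batches):
        # One dict per calibration batch, keyed by the model's input name.
        self._data = iter([{input_name: batch} for batch in batches])

    def get_next(self):
        return next(self._data, None)


# "input" and the shape (1, 25, 8) are placeholders; check the real ones
# with e.g. Netron or onnx.load("model.onnx").graph.input.
calib_batches = [np.random.rand(1, 25, 8).astype(np.float32) for _ in range(10)]
reader = SimpleCalibrationReader("input", calib_batches)

quantize_static(
    "model.onnx",           # float32 input model (placeholder path)
    "model.int8.onnx",      # quantized output model (placeholder path)
    reader,
    quant_format=QuantFormat.QDQ,  # inserts QuantizeLinear/DequantizeLinear pairs
    weight_type=QuantType.QInt8,
)
```

Dynamic quantization (`quantize_dynamic`) might also be worth a quick first check, since it skips the calibration step entirely.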