
ONNX post-training static quantization #362

Open
jpata opened this issue Nov 1, 2024 · 0 comments

jpata commented Nov 1, 2024

Previously, in #206, we got PyTorch post-training static quantization to work, but the quantized model was not faster at inference, likely because some quantized ops are missing on CPU/GPU in the PyTorch runtime.

However, we are currently using ONNX Runtime for inference, and it has its own quantization system: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html

We should also try quantizing via ONNX Runtime and check whether the result is faster in CMSSW.
