[Feature Request] #20010
Comments
@tianleiwu Can you please review it?
@inisis, thanks for creating a helpful tool for the ONNX community. ONNX Runtime performs graph optimizations during session creation. They are implemented in C++, as listed in https://github.com/microsoft/onnxruntime/blob/06fe4f31131a6873a295ba47ed60f4cb16584296/orttraining/orttraining/core/optimizer/graph_transformer_utils.cc There is also a Python-based offline optimization tool for transformers; the related doc can be found here: For LLMs, we have started using the torch dynamo exporter, whose fusion patterns can differ from those of the torchscript-based ONNX exporter. I took a quick look at onnxslim; some of its fusion patterns might be worth adding to the C++ optimizer, which would require porting some code from Python to C++.
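To make the idea of a "fusion pattern" concrete, here is a toy sketch of a MatMul + Add → Gemm fusion pass over a simplified graph representation. This is purely illustrative and is not ONNX Runtime's actual C++ graph transformer; real passes also check shapes, dtypes, attribute compatibility, and consumer counts before rewriting.

```python
# Toy MatMul+Add -> Gemm fusion pass over a simplified node list.
# Hypothetical sketch, not ONNX Runtime's real implementation.

def fuse_matmul_add(nodes):
    """nodes: list of dicts with 'op', 'inputs', 'outputs' keys."""
    fused = []
    skip = set()
    for i, node in enumerate(nodes):
        if i in skip:
            continue
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (node["op"] == "MatMul" and nxt is not None
                and nxt["op"] == "Add"
                and node["outputs"][0] in nxt["inputs"]):
            # The Add input that is not the MatMul output is the bias.
            bias = [x for x in nxt["inputs"] if x != node["outputs"][0]]
            fused.append({
                "op": "Gemm",
                "inputs": node["inputs"] + bias,
                "outputs": nxt["outputs"],
            })
            skip.add(i + 1)  # consume the Add node
        else:
            fused.append(node)
    return fused

graph = [
    {"op": "MatMul", "inputs": ["x", "w"], "outputs": ["mm_out"]},
    {"op": "Add", "inputs": ["mm_out", "b"], "outputs": ["y"]},
]
result = fuse_matmul_add(graph)
```

Porting a pattern like this to the C++ optimizer means expressing the same match-and-rewrite logic against ONNX Runtime's internal graph classes.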
The reason I wrote onnxslim is that I feel a C++-based project is hard for beginners, whereas onnxslim is pure Python. onnxslim also aims at more generalized optimization techniques rather than platform-targeted ones. I'm also working on applying onnxslim to torch dynamo-exported ONNX models; I hope to hear more details from you. Thanks!
Hi @tianleiwu, is there any doc about the torch dynamo fusion you mentioned before?
Please take a look at the model builder for LLMs: LLMs usually need some form of quantization and special handling of the kv-cache, which makes them difficult to export and fuse. Our current approach is to directly generate the optimized ONNX graph, though it only supports popular LLM models.
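To sketch why the kv-cache complicates export: during autoregressive decoding, each step appends new key/value entries to a growing cache, so the effective graph inputs change shape across steps. The following is a hypothetical, heavily simplified illustration (names like `KVCache` are made up here; real LLM caches hold per-layer tensors of shape `[batch, heads, seq_len, head_dim]`):

```python
# Minimal sketch of kv-cache growth during decoding.
# Hypothetical illustration only, not the model builder's code.

class KVCache:
    def __init__(self):
        self.keys = []    # one entry per generated token
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

cache = KVCache()
for step in range(3):
    # In a real model these would be tensors from the attention layers.
    cache.append(f"k{step}", f"v{step}")
```

Because the cache length differs at every step, a naive trace captures only one fixed shape, which is part of why directly generating the optimized graph can be easier than export-then-fuse.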
Describe the feature request
Hi, onnx and onnxruntime are great. I have built a tool named onnxslim that helps optimize ONNX models, especially large language models. Is there any chance this tool could be used in the onnxruntime repo? Thanks.
Describe scenario use case
Example showing how onnxslim can slim Qwen-1.8B from Alibaba.
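As a conceptual sketch of the kind of "slimming" described above, one generalized, platform-agnostic simplification is removing Identity nodes and rewiring their consumers. This toy pass is a hypothetical illustration, not onnxslim's actual code:

```python
# Toy Identity-elimination pass: drop Identity nodes and rewire
# downstream inputs to the Identity's source tensor.
# Hypothetical sketch, not onnxslim's real implementation
# (which operates on real ONNX protobuf graphs and handles
# chained identities, graph outputs, and initializers).

def eliminate_identity(nodes):
    rewire = {}
    for node in nodes:
        if node["op"] == "Identity":
            rewire[node["outputs"][0]] = node["inputs"][0]
    out = []
    for node in nodes:
        if node["op"] == "Identity":
            continue
        node = dict(node)
        node["inputs"] = [rewire.get(i, i) for i in node["inputs"]]
        out.append(node)
    return out

graph = [
    {"op": "Identity", "inputs": ["x"], "outputs": ["x_id"]},
    {"op": "Relu", "inputs": ["x_id"], "outputs": ["y"]},
]
slimmed = eliminate_identity(graph)
```

Passes like this are platform-independent, which matches the stated goal of generalized rather than runtime-specific optimization.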