
Call onnx-rewriter when possible in onnxruntime.InferenceSession #19348

Draft: wants to merge 1 commit into base: main

Conversation

@wschin (Contributor) commented Jan 31, 2024

No description provided.

try:
    from onnxrewriter.rewriter.transformers import rewrite
    from onnxrewriter.optimizer import optimize
except:

Check notice (Code scanning / CodeQL): Except block handles 'BaseException'. Except block directly handles BaseException.
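
A minimal sketch of an import guard that avoids the bare except flagged here, catching only ImportError (module paths taken from the diff above; everything else is illustrative):

HAS_ONNX_REWRITTER = False
try:
    from onnxrewriter.rewriter.transformers import rewrite
    from onnxrewriter.optimizer import optimize

    HAS_ONNX_REWRITTER = True
except ImportError:
    # Only swallow a missing package; let real errors (e.g. SyntaxError,
    # KeyboardInterrupt) propagate instead of handling BaseException.
    pass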
    onnx_model = rewrite_and_optimize_model_bytes(self._model_bytes)
    sess = C.InferenceSession(session_options, onnx_model, False, self._read_config_from_model)
else:
    sess = C.InferenceSession(session_options, onnx_model.SerializeToString(), False, self._read_config_from_model)

Check failure (Code scanning / CodeQL): Potentially uninitialized local variable. Local variable 'onnx_model' may be used before it is initialized.
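
One way to keep the variable defined on every path is to assign in both branches and create the session once afterwards. A sketch against the control flow shown in the diff, assuming rewrite_and_optimize_model_bytes returns serialized bytes and this runs inside the InferenceSession method that owns self._model_bytes:

if HAS_ONNX_REWRITTER:
    # Assumed to return serialized model bytes, so no SerializeToString() here.
    model_bytes = rewrite_and_optimize_model_bytes(self._model_bytes)
else:
    # No rewriter available: fall back to the unmodified bytes.
    model_bytes = self._model_bytes
sess = C.InferenceSession(session_options, model_bytes, False, self._read_config_from_model)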
try:
    from onnxrewriter.rewriter.transformers import rewrite
    from onnxrewriter.optimizer import optimize
except:

Check warning (Code scanning / lintrunner): RUFF/E722. Do not use bare except. See https://docs.astral.sh/ruff/rules/bare-except
def rewrite_and_optimize_model_bytes(model):
    assert HAS_ONNX_REWRITTER
    onnx_model = onnx.ModelProto()
    onnx_model.ParseFromString(self._model_bytes)

Check failure (Code scanning / lintrunner): RUFF/F821. Undefined name 'self'.
def rewrite_and_optimize_model_path(model_path):
    assert HAS_ONNX_REWRITTER
    onnx_model = onnx.load(self._model_path)

Check failure (Code scanning / lintrunner): RUFF/F821. Undefined name 'self'.

HAS_ONNX_REWRITTER = True
try:
    from onnxrewriter.rewriter.transformers import rewrite
    from onnxrewriter.optimizer import optimize
@tianleiwu (Contributor) commented Jan 31, 2024

I can see a few problems:
(1) onnxrewriter is not available in the other language APIs (C/C++/Nuget). It makes the results inconsistent across languages.
(2) If onnxrewriter has a bug, users have to uninstall it, since there is no explicit option to disable it.
(3) It might not work well with large models (>2GB), since .SerializeToString() is used.

If onnxrewriter is generic enough, why not implement it inside onnxruntime in C++?
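
On point (3), a standard workaround for the 2GB protobuf limit is to serialize with external data and hand ORT a path instead of bytes. A sketch using the public onnx API, not part of this PR (onnx_model is assumed to be a ModelProto already in scope; the file names are illustrative):

import onnx
import onnxruntime

# Store tensors outside the protobuf so the serialized graph stays under 2 GB.
onnx.save_model(
    onnx_model,
    "rewritten_model.onnx",
    save_as_external_data=True,
    all_tensors_to_one_file=True,
)
sess = onnxruntime.InferenceSession(
    "rewritten_model.onnx", providers=["CPUExecutionProvider"]
)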

@wschin (Contributor, Author) replied:

I am OK with whatever way onnx-rewriter should be called. This is just the only available way I have to unblock llama with the onnxrt dynamo backend (without this change, ORT is many times slower than inductor, and we will get no market share from PyTorch 2 features). I can add a flag to turn it on/off (see the sketch below). Does that make sense?

For how this should ultimately be implemented, please talk to @thiagocrepaldi. I guess the exporter (i.e., onnxscript) will eventually include this optimization pass once it matures.
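
A sketch of what such an opt-out flag could look like; the environment variable name is hypothetical and not an existing onnxruntime option:

import os

# Hypothetical escape hatch: lets users disable the rewriter without
# uninstalling the package. The variable name is made up for illustration.
_DISABLE_REWRITER = os.environ.get("ORT_DISABLE_ONNX_REWRITER", "0") == "1"

if HAS_ONNX_REWRITTER and not _DISABLE_REWRITER:
    model_bytes = rewrite_and_optimize_model_bytes(self._model_bytes)
else:
    model_bytes = self._model_bytes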

Contributor replied:

I've discussed this with Thiago. The current plan is that onnx-rewriter will be invoked by onnxscript and will be part of the exporter workflow. Having a flag is, again, not consistent across the different language bindings: we don't want users to think that the Python bindings can give better perf via a flag that is not available for the others. Can the llama model be unblocked by calling the rewriter separately?
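
For reference, calling the rewriter separately would look roughly like this, using the imports from this diff; the rewrite()/optimize() call signatures and file paths are assumptions for illustration:

import onnx
import onnxruntime
from onnxrewriter.rewriter.transformers import rewrite
from onnxrewriter.optimizer import optimize

# Rewrite and optimize offline, then create a plain InferenceSession.
model = onnx.load("llama.onnx")  # hypothetical path
model = rewrite(model)   # assumed signature; only the imports appear in this PR
model = optimize(model)  # assumed signature
onnx.save_model(model, "llama_rewritten.onnx")
sess = onnxruntime.InferenceSession(
    "llama_rewritten.onnx", providers=["CPUExecutionProvider"]
)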

@wschin (Contributor, Author) commented Feb 2, 2024

Nope. The only alternative solution I have in mind is to support a custom post-processing pass in InferenceSession or DORT.

@pranavsharma (Contributor) commented:

We don't want onnx-rewriter to be called from within InferenceSession; this leads to inconsistencies between different language bindings. This script should be invoked as part of the exporter workflow and enabled with an optional parameter that indicates the ORT version, since the rewriter will be tied to the ORT version (due to the fusions and the availability of the relevant ops in that version of ORT).

@pranavsharma (Contributor) left a review comment:

Please see my comment.
