onnxruntime-web is 11-17x slower than native inference #11181
Comments
Hello community, any help on this? Thanks a lot.
For a single function, it is true that WebAssembly can give you near-native speed on the web.
(Added) The current ort-web implementation consists of several components. I think components 2-3 should be merged into the wasm part to get more performance, but there would be some possible issues. See the sketch below for where the JS/wasm boundary sits today.
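As a rough illustration only (not ort-web internals; the model path and input shape are placeholder assumptions): from the public API, every `run()` call hands JS-side typed arrays to the wasm backend, which copies them into the wasm heap before the kernels execute, so the JS-side overhead per run is visible even when all kernels are already in wasm.

```ts
import * as ort from 'onnxruntime-web';

async function runOnce() {
  // Create the session once and reuse it; session creation initializes the wasm
  // side and is far more expensive than a single inference.
  const session = await ort.InferenceSession.create('./model.onnx', {
    executionProviders: ['wasm'],
  });

  // Reuse the same backing buffer across runs so only the unavoidable
  // JS -> wasm copy happens per inference, not extra allocations.
  const data = new Float32Array(1 * 3 * 224 * 224);
  const input = new ort.Tensor('float32', data, [1, 3, 224, 224]);

  const results = await session.run({ [session.inputNames[0]]: input });
  return results[session.outputNames[0]];
}
```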
Same problem here. I would like to know your final decision or solution, please.
The tfjs demo may have a bug: if you select multi-thread, the speed becomes very slow and does not recover even after unselecting it. I wonder whether your wasm test result is correct; tfjs may have used WebGL because of this bug.
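To rule out this kind of ambiguity, one option is to pin the ort-web wasm flags explicitly before creating the session, so a benchmark run is not silently falling back to a single thread or a non-SIMD build. This is a hedged sketch using the public `ort.env.wasm` flags; defaults and flag availability may differ between onnxruntime-web versions, and the model URL is a placeholder.

```ts
import * as ort from 'onnxruntime-web';

async function createWasmSession(modelUrl: string) {
  // Pin backend flags before session creation so the benchmark conditions are explicit.
  ort.env.wasm.numThreads = Math.min(4, navigator.hardwareConcurrency || 1);
  ort.env.wasm.simd = true;   // request the SIMD build (assumption: flag present in this version)
  ort.env.wasm.proxy = false; // keep inference on the main thread for simpler timing

  return ort.InferenceSession.create(modelUrl, {
    executionProviders: ['wasm'],
    graphOptimizationLevel: 'all',
  });
}
```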
Hey @CanyonWind, thanks.
I am also running into this issue: I find that onnxruntime-web is ~10x slower for inference than onnxruntime-node and onnxruntime in Python (which both have comparable performance) when using the same model and input data. The web profiler indicates that all of the time consumed during the onnxruntime-web inference is in wasm functions. This issue appears to have been around for a while; is it simply an accepted performance limitation when using onnxruntime-web? It's not obvious why the performance of the onnxruntime-web implementation should be so slow, particularly compared to the node implementation.
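Since the browser profiler only shows opaque wasm frames, ONNX Runtime's own profiler can be more informative because it attributes time to individual operators. A minimal sketch, assuming the `enableProfiling` session option and `endProfiling()` are exposed in your onnxruntime-web version (check the SessionOptions typings if not); the model URL and feeds are placeholders.

```ts
import * as ort from 'onnxruntime-web';

async function profileOneRun(modelUrl: string, feeds: Record<string, ort.Tensor>) {
  const session = await ort.InferenceSession.create(modelUrl, {
    executionProviders: ['wasm'],
    enableProfiling: true,  // assumption: option available in this build
  });

  await session.run(feeds); // the profiled inference
  session.endProfiling();   // assumption: dumps per-operator timings (e.g. to the console)
}
```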
Any progress on this?
I am trying to run web inference on a transformer model, with the modules separately exported to ONNX. I can confirm that this issue still exists for ONNX Runtime Web: web inference is much slower than Python inference using the same ONNX modules.
@kabyanil There are multiple reasons that may cause a perf difference from native, and it's case by case.
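One common, easy-to-miss cause (a hedged suggestion, not the commenter's list) is that the page is not cross-origin isolated, so SharedArrayBuffer is unavailable and the wasm backend cannot use worker threads at all. A quick check:

```ts
function reportWasmThreadingSupport(): void {
  const isolated = typeof crossOriginIsolated !== 'undefined' && crossOriginIsolated;
  const hasSharedArrayBuffer = typeof SharedArrayBuffer !== 'undefined';
  console.log(`crossOriginIsolated=${isolated}, SharedArrayBuffer=${hasSharedArrayBuffer}`);
  if (!isolated || !hasSharedArrayBuffer) {
    // Without COOP/COEP headers (Cross-Origin-Opener-Policy: same-origin,
    // Cross-Origin-Embedder-Policy: require-corp) the wasm backend falls back
    // to a single thread, which alone can explain a large gap vs. native.
    console.warn('Multithreaded wasm unavailable; inference will run single-threaded.');
  }
}
```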
Describe the bug
Hi, the onnxruntime-web blog claims near-native speed on the web. I tested mobilenetv2 as a benchmark, as well as our own panoptic segmentation model. They run 11x and 17x slower than native inference, respectively. I wonder whether this is expected, or whether some inference configs are messed up on our side?
Just for reference, tensorflow-js with SIMD and multi-threading enabled runs mobilenetv2 in 12 ms, while onnxruntime-web takes about 45 ms. Native inference with onnxruntime takes 4 ms on my 2019 MacBook Pro. We would like to use onnxruntime-web as the inference engine because of the easy portability of our existing ONNX models, but the speed difference compared to tf-js is quite significant. Help would be appreciated.
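For context, a minimal benchmark sketch of how such a number would typically be measured with onnxruntime-web (the model URL, input shape, thread count, and run count are assumptions, and the input is dummy data): warm up once, then average the steady-state `run()` latency.

```ts
import * as ort from 'onnxruntime-web';

async function benchMobilenetV2(): Promise<number> {
  ort.env.wasm.numThreads = navigator.hardwareConcurrency || 1;

  const session = await ort.InferenceSession.create('./mobilenetv2.onnx', {
    executionProviders: ['wasm'],
    graphOptimizationLevel: 'all',
  });

  // Standard mobilenetv2 input: NCHW float32 [1, 3, 224, 224] (dummy data here).
  const input = new ort.Tensor('float32', new Float32Array(1 * 3 * 224 * 224), [1, 3, 224, 224]);
  const feeds = { [session.inputNames[0]]: input };

  await session.run(feeds); // warm-up, excluded from timing
  const runs = 100;
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    await session.run(feeds);
  }
  return (performance.now() - start) / runs; // average latency in milliseconds
}
```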
Urgency
High
System information
To Reproduce
For mobilenetv2:
For our model: