onnxruntime-web is 11-17x slower than native inference #11181

Open
CanyonWind opened this issue Apr 12, 2022 · 11 comments
Labels
platform:web issues related to ONNX Runtime web; typically submitted using template

Comments

@CanyonWind

CanyonWind commented Apr 12, 2022

Describe the bug
Hi, in the onnxruntime-web blog, it claims near-native speed on the web. I tested mobilenetv2 as a benchmark and our own panoptic segmentation model as well. It runs 11 and 17 times slower than native inference for mobilenet v2 and our model. Wonder whether this is expected or if some inference configs are messed up on our side?

Just for reference, tensorflow-js with SIMD and multi-thread enabled runs 12ms for mobilenetv2, and onnxruntime-web takes about 45ms. Native inference with onnxruntime takes 4ms on my 2019 MacBook pro.

We would like to use onnxruntime-web as the inference engine because of the easy portability for our existing onnx models. But the speed difference between tf-js is quite significant. Help would be appreciated.

Urgency
High

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 12.0.1
  • ONNX Runtime web installed from (source or binary): binary from npm
  • ONNX Runtime web version: 1.11 (latest from https://www.npmjs.com/package/onnxruntime-web)

To Reproduce
For mobilenetv2:

  • onnxruntime-web: used this repo, with the latest onnxruntime-web version (a minimal timing sketch is included below)
  • tf-js: used this demo

For our model:

  • cannot share because of confidentiality.
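
For reference, the timing comparison can be set up roughly like this against the public onnxruntime-web API. This is only a minimal sketch, not the exact benchmark code from the repo above; the model path, input shape, and run count are placeholders:

```js
// Minimal sketch: time the onnxruntime-web wasm backend.
// "mobilenetv2-12.onnx" and the 1x3x224x224 shape are placeholders.
import * as ort from 'onnxruntime-web';

async function benchmark() {
  const session = await ort.InferenceSession.create('mobilenetv2-12.onnx', {
    executionProviders: ['wasm'],
  });

  const data = new Float32Array(1 * 3 * 224 * 224);
  const feeds = { [session.inputNames[0]]: new ort.Tensor('float32', data, [1, 3, 224, 224]) };

  await session.run(feeds); // warm-up so session/backend initialization is not counted

  const runs = 50;
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    await session.run(feeds);
  }
  console.log(`avg latency: ${((performance.now() - start) / runs).toFixed(2)} ms`);
}

benchmark();
```
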
@CanyonWind
Author

Hello community, any help on this? Thanks a lot

@ncianeo
Contributor

ncianeo commented Apr 14, 2022

For a single function, it is true that WebAssembly can give you near-native speed on the web.
However, inference for a DNN model calls a long sequence of functions, one for each layer that composes the network. Since JavaScript function-call overhead is not negligible, it adds up to a noticeable penalty in the overall inference time.
You can profile your JS inference with devtools (Ctrl + Shift + I), in the Performance tab.
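
To make individual runs easy to spot in the Performance tab, each call to session.run can be wrapped with User Timing marks. A small sketch using the standard performance.mark/measure API, assuming a `session` and `feeds` created as in the reproduction code above, running inside an async function:

```js
// Sketch: surface each inference as a named entry in the devtools
// Performance tab via the User Timing API (assumes `session` and `feeds`
// already exist, e.g. from the benchmark sketch earlier in this issue).
performance.mark('ort-run-start');
await session.run(feeds);
performance.mark('ort-run-end');
performance.measure('ort-web inference', 'ort-run-start', 'ort-run-end');
```
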

@ncianeo
Contributor

ncianeo commented Apr 14, 2022

(Addendum) The current ort-web implementation consists of:

  1. ONNX model parsing & weight loading: JavaScript (TypeScript)
  2. per-layer inference dispatch (one wasm function call per layer): JavaScript (TypeScript)
  3. actual computation of each layer: wasm (WebAssembly)

I think steps 2-3 should be merged into the wasm part to get more performance, but that raises possible issues such as:

  1. model weights would have to be copied into the wasm buffer up front (memory-usage issue)
  2. it would not be compatible with the webgl backend (the whole webgl backend is written in JavaScript/TypeScript + GLSL). At this stage of development, that would reduce the productivity of the library, because the webgl backend is not even complete yet.

@sophies927 added the platform:web label and removed the component:ort-web label on Aug 12, 2022
@vacing

vacing commented Sep 28, 2022

I have the same problem. Could you share your final decision or solution, please?

@vacing

vacing commented Sep 29, 2022

The tfjs demo may have a bug: if you select multi-thread, the speed becomes very slow and does not recover even after you unselect it.

I suspect your wasm test result may not be correct, and that tfjs actually used webgl because of this bug.
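
If the goal is an apples-to-apples comparison, it may also help to pin ort-web's wasm flags explicitly before creating the session. A sketch, assuming a recent onnxruntime-web build where these ort.env.wasm flags are available (note that multi-threading additionally requires the page to be cross-origin isolated):

```js
import * as ort from 'onnxruntime-web';

// Pin the wasm backend configuration so both engines are benchmarked
// under the same SIMD / threading conditions.
ort.env.wasm.simd = true;                                // use the SIMD build if available
ort.env.wasm.numThreads = navigator.hardwareConcurrency; // needs cross-origin isolation (COOP/COEP)

const session = await ort.InferenceSession.create('mobilenetv2-12.onnx', {
  executionProviders: ['wasm'],
});
```
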

@francis2tm

Hey @CanyonWind,
Did you manage to find any solution to make onnxruntime-web at least on par with tensorflow.js?

Thanks

@sebastian-east

I am also running into this issue: I find that onnxruntime-web is ~10x slower for inference than onnxruntime-node and onnxruntime in Python (which both have comparable performance) when using the same model and input data. The web profiler indicates that all of the time consumed during the onnxruntime-web inference is in wasm functions. This issue appears to have been around for a while; is it simply an accepted performance limitation when using onnxruntime-web? It's not obvious why the onnxruntime-web implementation should be so slow, particularly compared to the node implementation.
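
For an apples-to-apples native baseline, essentially the same code can be run through onnxruntime-node. A rough sketch (model path and input shape are placeholders):

```js
// Rough sketch of an onnxruntime-node baseline with the same timing loop.
const ort = require('onnxruntime-node');

async function main() {
  const session = await ort.InferenceSession.create('model.onnx');
  const data = new Float32Array(1 * 3 * 224 * 224);
  const feeds = { [session.inputNames[0]]: new ort.Tensor('float32', data, [1, 3, 224, 224]) };

  await session.run(feeds); // warm-up
  const runs = 50;
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) await session.run(feeds);
  const avgMs = Number(process.hrtime.bigint() - start) / 1e6 / runs;
  console.log(`avg latency: ${avgMs.toFixed(2)} ms`);
}

main();
```
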

@chinmayakcv

Any progress on this?

@kabyanil

I am trying to run web inference on a transformer model, with its modules exported separately to ONNX. I can confirm that this issue still exists in onnxruntime-web: web inference is much slower than Python inference using the same ONNX modules.

@gyagp

gyagp commented Jul 29, 2024

@kabyanil There are multiple possible reasons for a perf difference from native, and it's case by case.
Could you please share your model and how you run it?

@kabyanil

@gyagp Please refer to the issue I opened yesterday for more info: #21535
