
[Feature Request] Assess performance capability before a model is loaded #20998

Open
beaufortfrancois opened this issue Jun 11, 2024 · 7 comments
Labels: ep:WebGPU (ort-web webgpu provider) · feature request (request for unsupported feature or enhancement) · platform:web (issues related to ONNX Runtime web; typically submitted using template)

Comments

@beaufortfrancois

Describe the feature request

Assess performance capability without downloading the full model.

Describe scenario use case

For some models, performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to determine their machine's performance class for running a model without first downloading it completely.

I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would let the web app catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it is much smaller than the model weights.
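As a rough sketch of the allocation half of such a probe (not ONNX Runtime code; the weight size and chunking below are assumptions), WebGPU error scopes can already tell a page whether weight-sized buffers would fit before any download starts:

```js
// Hypothetical probe: can this device hold ~1.5 GB of weights? (size is an assumption)
const MODEL_WEIGHT_BYTES = 1.5 * 1024 ** 3;
const CHUNK_BYTES = 128 * 1024 * 1024; // stay under typical maxBufferSize limits

const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

device.pushErrorScope('out-of-memory');
const probes = [];
for (let allocated = 0; allocated < MODEL_WEIGHT_BYTES; allocated += CHUNK_BYTES) {
  probes.push(device.createBuffer({ size: CHUNK_BYTES, usage: GPUBufferUsage.STORAGE }));
}
const oom = await device.popErrorScope(); // non-null if any allocation failed
probes.forEach((b) => b.destroy()); // release the probe allocations

if (oom) {
  console.warn('Weights are unlikely to fit on this device; skip the download.');
}
```

This only probes allocation, though; running the real graph with zeroed weights would also exercise shader generation and execution, which is why support inside ONNX Runtime would be more faithful.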

cc @xenova @guschmue

Originally posted at huggingface/transformers.js#545 (comment)

beaufortfrancois added the feature request label on Jun 11, 2024
guschmue added the ep:WebGPU label on Jun 11, 2024
sophies927 added the platform:web label on Jun 13, 2024
@xhcao
Contributor

xhcao commented Jun 24, 2024

I have an implementation in https://github.com/xhcao/onnx_model_external_data_web_test; the draft code is in onnxruntime.diff.
The repo also has a demo to verify the implementation. I cannot upload a big model to that repo, but you can create a custom model with an external data file by following https://github.com/onnx/onnx/blob/main/docs/ExternalData.md.
Any comments are welcome.
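For anyone trying the demo, this is roughly what loading such a model looks like from the app side, assuming ort-web's externalData session option (available in recent releases; the file names here are placeholders):

```js
import * as ort from 'onnxruntime-web';

// Fetch the external weight file referenced by model.onnx (placeholder names).
const weights = new Uint8Array(await (await fetch('weights.bin')).arrayBuffer());

const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
  // Map the path stored in the model's external_data fields to the fetched bytes.
  externalData: [{ path: 'weights.bin', data: weights }],
});
```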

@beaufortfrancois
Author

Thank you @xhcao for bootstrapping this effort!

I believe your code in onnxruntime.diff may benefit from the following modification. What do you think?

diff --git a/onnxruntime/core/framework/tensorprotoutils.cc b/onnxruntime/core/framework/tensorprotoutils.cc
index 6af78f18fb..f32d3253e6 100644
--- a/onnxruntime/core/framework/tensorprotoutils.cc
+++ b/onnxruntime/core/framework/tensorprotoutils.cc
@@ -956,24 +956,38 @@ Status GetExtDataFromTensorProto(const Env& env, const ORTCHAR_T* model_path,
                                  if (fileName.startsWith('./')) {
                                    fileName = fileName.substring(2);
                                  }
+
+                                 if (Module.MountedFiles.get('fakefakefake_' + fileName)) {
+                                   const length = $2 >>> 0;
+                                   const buffer = $3 >>> 0;
+                                   try {
+                                     // Probe entry: zero-fill the destination buffer instead of copying real weights.
+                                     const dummyData = new Uint8Array(length);
+                                     HEAPU8.set(dummyData, buffer);
+                                     return 0;
+                                   } catch {
+                                     return 4;  // Writing to WebAssembly memory failed.
+                                   }
+                                 }
+
                                  const fileData = Module.MountedFiles.get(fileName);
                                  if (!fileData) {
                                    return 2;  // File not found in preloaded files.
                                  }
                                  const offset = $1 >>> 0;
                                  const length = $2 >>> 0;
                                  const buffer = $3 >>> 0;

                                  if (offset + length > fileData.byteLength) {
                                    return 3;  // Out of bounds.
                                  }

                                  try {
                                    // Copy the file data (fileData,offset,length) into WebAssembly memory (HEAPU8,buffer,length).
                                    HEAPU8.set(fileData.subarray(offset, offset + length), buffer);
                                    return 0;
                                  } catch {
                                    return 4;
                                  }
                                }),
                                external_data_file_path.c_str(),
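
To spell out the intent: external-data entries registered under the 'fakefakefake_' prefix act as probe placeholders, so the loader zero-fills the tensor's WebAssembly buffer instead of copying real weights. Caller-side, that could look something like the sketch below, assuming the patched build above and that ort-web's externalData option is what feeds Module.MountedFiles (the empty payload and file name are illustrative):

```js
// Hypothetical probe session: register the external-data path under the marker
// prefix with an empty payload so the patched loader zero-fills the weights.
const probeSession = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
  externalData: [{ path: 'fakefakefake_weights.bin', data: new Uint8Array(0) }],
});
// If create() resolves, buffer allocation and shader generation succeeded without
// downloading weights.bin; an out-of-memory failure surfaces as a rejected promise.
```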

@beaufortfrancois
Author

@guschmue Any update on this by any chance?

@beaufortfrancois
Author

@guschmue gentle ping

@guschmue
Contributor

Sorry, still on the wish list. We have some high-priority tasks for the next few weeks and need to put this one in the queue.

@beaufortfrancois
Author

That's good to hear it's still in the queue. Thanks for keeping us up to date, @guschmue!

@beaufortfrancois
Author

@guschmue Just wanted to check if there's been any movement on the priority for this.
