
[Feature Request] Assess performance capability before a model is loaded #20998

Open
beaufortfrancois opened this issue Jun 11, 2024 · 7 comments
Labels: ep:WebGPU (ort-web webgpu provider) · feature request (request for unsupported feature or enhancement) · platform:web (issues related to ONNX Runtime web; typically submitted using template)

Comments

@beaufortfrancois

Describe the feature request

Assess performance capability without downloading the full model.

Describe scenario use case

For some models, performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to determine their machine's performance class for running a model without first downloading it completely.

I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would let the web app catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it is much smaller than the model weights.
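As a rough sketch of the allocation half of such a probe (not ONNX Runtime code; the weight size and chunking below are assumptions), WebGPU error scopes can already tell a page whether weight-sized buffers would fit before any download starts:

```js
// Hypothetical probe: can this device hold ~1.5 GB of weights? (size is an assumption)
const MODEL_WEIGHT_BYTES = 1.5 * 1024 ** 3;
const CHUNK_BYTES = 128 * 1024 * 1024; // stay under typical maxBufferSize limits

const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

device.pushErrorScope('out-of-memory');
const probes = [];
for (let allocated = 0; allocated < MODEL_WEIGHT_BYTES; allocated += CHUNK_BYTES) {
  probes.push(device.createBuffer({ size: CHUNK_BYTES, usage: GPUBufferUsage.STORAGE }));
}
const oom = await device.popErrorScope(); // non-null if any allocation failed
probes.forEach((b) => b.destroy()); // release the probe allocations

if (oom) {
  console.warn('Weights are unlikely to fit on this device; skip the download.');
}
```

This only probes allocation, though; running the real graph with zeroed weights would also exercise shader generation and execution, which is why support inside ONNX Runtime would be more faithful.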

cc @xenova @guschmue

Originally posted at huggingface/transformers.js#545 (comment)

beaufortfrancois added the feature request label on Jun 11, 2024
guschmue added the ep:WebGPU label on Jun 11, 2024
sophies927 added the platform:web label on Jun 13, 2024
@xhcao
Contributor

xhcao commented Jun 24, 2024

I have an implementation in https://github.com/xhcao/onnx_model_external_data_web_test; the draft code is in onnxruntime.diff.
The repo also has a demo to verify the implementation. I cannot upload a big model to that repo, but you can create a custom model with an external data file by following https://github.com/onnx/onnx/blob/main/docs/ExternalData.md.
Any comments are welcome.
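For anyone trying the demo, this is roughly what loading such a model looks like from the app side, assuming ort-web's externalData session option (available in recent releases; the file names here are placeholders):

```js
import * as ort from 'onnxruntime-web';

// Fetch the external weight file referenced by model.onnx (placeholder names).
const weights = new Uint8Array(await (await fetch('weights.bin')).arrayBuffer());

const session = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
  // Map the path stored in the model's external_data fields to the fetched bytes.
  externalData: [{ path: 'weights.bin', data: weights }],
});
```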

@beaufortfrancois
Author

Thank you @xhcao for bootstrapping this effort!

I believe your code in onnxruntime.diff may benefit from the following modification. What do you think?

diff --git a/onnxruntime/core/framework/tensorprotoutils.cc b/onnxruntime/core/framework/tensorprotoutils.cc
index 6af78f18fb..f32d3253e6 100644
--- a/onnxruntime/core/framework/tensorprotoutils.cc
+++ b/onnxruntime/core/framework/tensorprotoutils.cc
@@ -956,24 +956,38 @@ Status GetExtDataFromTensorProto(const Env& env, const ORTCHAR_T* model_path,
                                  if (fileName.startsWith('./')) {
                                    fileName = fileName.substring(2);
                                  }
+
+                                 if (Module.MountedFiles.get('fakefakefake_' + fileName)) {
+                                   const length = $2 >>> 0;
+                                   const buffer = $3 >>> 0;
+                                   try {
+                                     // Probe entry: zero-fill the destination buffer instead of copying real weights.
+                                     const dummyData = new Uint8Array(length);
+                                     HEAPU8.set(dummyData, buffer);
+                                     return 0;
+                                   } catch {
+                                     return 4;  // Writing to WebAssembly memory failed.
+                                   }
+                                 }
+
                                  const fileData = Module.MountedFiles.get(fileName);
                                  if (!fileData) {
                                    return 2;  // File not found in preloaded files.
                                  }
                                  const offset = $1 >>> 0;
                                  const length = $2 >>> 0;
                                  const buffer = $3 >>> 0;

                                  if (offset + length > fileData.byteLength) {
                                    return 3;  // Out of bounds.
                                  }

                                  try {
                                    // Copy the file data (fileData,offset,length) into WebAssembly memory (HEAPU8,buffer,length).
                                    HEAPU8.set(fileData.subarray(offset, offset + length), buffer);
                                    return 0;
                                  } catch {
                                    return 4;
                                  }
                                }),
                                external_data_file_path.c_str(),
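
To spell out the intent: external-data entries registered under the 'fakefakefake_' prefix act as probe placeholders, so the loader zero-fills the tensor's WebAssembly buffer instead of copying real weights. Caller-side, that could look something like the sketch below, assuming the patched build above and that ort-web's externalData option is what feeds Module.MountedFiles (the empty payload and file name are illustrative):

```js
// Hypothetical probe session: register the external-data path under the marker
// prefix with an empty payload so the patched loader zero-fills the weights.
const probeSession = await ort.InferenceSession.create('model.onnx', {
  executionProviders: ['webgpu'],
  externalData: [{ path: 'fakefakefake_weights.bin', data: new Uint8Array(0) }],
});
// If create() resolves, buffer allocation and shader generation succeeded without
// downloading weights.bin; an out-of-memory failure surfaces as a rejected promise.
```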

@beaufortfrancois
Author

@guschmue Any update on this by any chance?

@beaufortfrancois
Author

@guschmue gentle ping

@guschmue
Contributor

Sorry, still on the wish list. We have some high-priority tasks for the next few weeks and need to put this one in the queue.

@beaufortfrancois
Author

That's good to hear it's still in the queue. Thanks for keeping us up to date, @guschmue!

@beaufortfrancois
Author

@guschmue Just wanted to check if there's been any movement on the priority for this.
