[Feature Request] Assess performance capability before a model is loaded #20998
I have an implementation in https://github.com/xhcao/onnx_model_external_data_web_test; the draft code is in onnxruntime.diff.
Thank you @xhcao for bootstrapping this effort! I believe your code in onnxruntime.diff may benefit from the following modification. What do you think?

```diff
diff --git a/onnxruntime/core/framework/tensorprotoutils.cc b/onnxruntime/core/framework/tensorprotoutils.cc
index 6af78f18fb..f32d3253e6 100644
--- a/onnxruntime/core/framework/tensorprotoutils.cc
+++ b/onnxruntime/core/framework/tensorprotoutils.cc
@@ -956,24 +956,38 @@ Status GetExtDataFromTensorProto(const Env& env, const ORTCHAR_T* model_path,
         if (fileName.startsWith('./')) {
           fileName = fileName.substring(2);
         }
+
+        if (Module.MountedFiles.get('fakefakefake_' + fileName)) {
+          const length = $2 >>> 0;
+          const buffer = $3 >>> 0;
+          try {
+            // Set fake zero data to buffer.
+            const dummyData = new Uint8Array(length);
+            HEAPU8.set(dummyData, buffer);
+            return 0;
+          } catch {
+            return 4;
+          }
+        }
+
         const fileData = Module.MountedFiles.get(fileName);
         if (!fileData) {
           return 2;  // File not found in preloaded files.
         }
         const offset = $1 >>> 0;
         const length = $2 >>> 0;
         const buffer = $3 >>> 0;
         if (offset + length > fileData.byteLength) {
           return 3;  // Out of bounds.
         }
         try {
           // Copy the file data (fileData, offset, length) into WebAssembly memory (HEAPU8, buffer, length).
           HEAPU8.set(fileData.subarray(offset, offset + length), buffer);
           return 0;
         } catch {
           return 4;
         }
       }),
       external_data_file_path.c_str(),
```
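To make the control flow of the patched callback above easier to follow, here is a minimal sketch of the same logic rewritten as plain JavaScript. This is only an illustration: `mountedFiles` and `heap` stand in for `Module.MountedFiles` and `HEAPU8`, the `$1`/`$2`/`$3` Emscripten arguments become ordinary parameters, and the numeric status codes (0, 2, 3, 4) mirror the diff. The `fakefakefake_` key prefix is taken directly from the patch.

```javascript
// Sketch of the external-data read callback from the diff above.
// Status codes: 0 = ok, 2 = file not found, 3 = out of bounds, 4 = copy failed.
function readExternalData(mountedFiles, heap, fileName, offset, length, buffer) {
  if (fileName.startsWith('./')) {
    fileName = fileName.substring(2);
  }
  // Fake-data branch: an entry keyed 'fakefakefake_<name>' signals that
  // zeroed weights should be written instead of real file contents, so the
  // model can run without the weights ever being downloaded.
  if (mountedFiles.get('fakefakefake_' + fileName)) {
    try {
      heap.set(new Uint8Array(length), buffer); // zero-filled dummy data
      return 0;
    } catch {
      return 4;
    }
  }
  const fileData = mountedFiles.get(fileName);
  if (!fileData) {
    return 2; // File not found in preloaded files.
  }
  if (offset + length > fileData.byteLength) {
    return 3; // Out of bounds.
  }
  try {
    // Copy (fileData, offset, length) into the destination at `buffer`.
    heap.set(fileData.subarray(offset, offset + length), buffer);
    return 0;
  } catch {
    return 4;
  }
}
```

Note that the fake branch runs before the real lookup, so registering a `fakefakefake_`-prefixed placeholder is enough to short-circuit the download requirement for that file.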
@guschmue Any update on this by any chance?
@guschmue gentle ping
Sorry, still on the wish list. We have some high-priority tasks for the next few weeks and need to put this one into the queue.
That's good to hear it's still in the queue. Thanks for keeping us up to date @guschmue!
@guschmue Just wanted to check if there's been any movement on the priority for this. |
Describe the feature request
Assess performance capability without downloading the full model.
Describe scenario use case
For some models, the performance may be a blocker. Since model downloads can be quite large, I wonder if there should be a way for web developers to know their machine performance class for running a model without downloading it completely first.
I believe this would involve running the model code with zeroed-out weights, which would still require buffer allocations but would allow the web app to catch out-of-memory errors and the like. The model architecture would still be needed to generate shaders, but it would be much smaller than the model weights.
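As a rough illustration of how a web app might consume such a capability probe, here is a hypothetical sketch: run inference once against a zero-weight model, time it, and map the outcome to a coarse performance class. Everything here is an assumption, not an existing API: `runZeroWeightInference` is a placeholder for whatever the feature would expose, and the millisecond thresholds are purely illustrative.

```javascript
// Hypothetical capability probe (not an existing ONNX Runtime Web API).
// `runZeroWeightInference` is assumed to run the model with zeroed weights;
// `nowFn` is injectable so the timing source can be swapped out.
async function classifyPerformance(runZeroWeightInference, nowFn = () => performance.now()) {
  try {
    const start = nowFn();
    await runZeroWeightInference();
    const elapsedMs = nowFn() - start;
    // Illustrative thresholds only; a real heuristic would be model-specific.
    if (elapsedMs < 100) return 'fast';
    if (elapsedMs < 1000) return 'moderate';
    return 'slow';
  } catch (e) {
    // Buffer allocation failures (e.g. out-of-memory) surface here even
    // though the full weights were never downloaded.
    return 'unsupported';
  }
}
```

The key point is the catch branch: because shader generation and buffer allocation still happen with zeroed weights, out-of-memory and similar failures can be detected before the app commits to a multi-gigabyte download.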
cc @xenova @guschmue
Originally posted at huggingface/transformers.js#545 (comment)