[JS/Web] External weights load #18535
Conversation
Thanks very much for the PR. External weights support is one of the most important features for ort-web to support large models. However, it seems that this change still cannot break the 4GB memory limit of wasm32. @guschmue is trying to add support in ORT core for loading models > 4GB into wasm32 by not reading the data into WebAssembly memory at all (still WIP). For WebGPU, the 4GB memory size is a hard limit for wasm32; to break this limit, a change in ORT core is required to skip loading the weights in the model and instead create GPU buffers directly.
Well, since it mmaps chunks of the weights file into memory for each layer and then frees them, it might be possible to fit more than 4 GB with 32-bit. I've loaded a 2.2 GB latent consistency model and run it without any out-of-memory issues with a wasm32 build. However, I got NaN as a result, but that's most likely because the new code is not compatible with the Tensor class from transformers.js. Will do some more tests closer to the weekend.
Supporting the external data format is super high on our list.
Should add - we are thinking about whether we can add some way to pass the weights by reference so they don't get copied into the wasm heap until needed. ORT would call the EP to put them into the right place, so WebGPU can copy them directly from JS to the GPU.
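A rough sketch of that "pass by reference" idea, for illustration only: the weights stay in a plain JS-side buffer and are copied straight into a `GPUBuffer`, never entering the 32-bit wasm heap. The WebGPU calls below are standard API; the function name and flow are invented, not anything from this PR.

```ts
// Hypothetical sketch: copy weights from a JS-side Uint8Array directly into
// GPU memory, bypassing WebAssembly.Memory entirely.
function uploadWeightsToGpu(device: GPUDevice, weights: Uint8Array): GPUBuffer {
  const buffer = device.createBuffer({
    size: Math.ceil(weights.byteLength / 4) * 4, // pad size to 4-byte multiple
    usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_DST,
    mappedAtCreation: true,
  });
  // Direct JS -> GPU copy; the data never touches the wasm heap.
  new Uint8Array(buffer.getMappedRange()).set(weights);
  buffer.unmap();
  return buffer;
}
```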
As per our discussion in the team meeting, using MEMFS is an option. However, we need a build flag to allow this feature to be enabled/disabled. Please add a build-time flag: if the flag is ON, ort-web should be able to work with external data via MEMFS; otherwise, ort-web should not change (not including unnecessary FS modules or extra unused JS code). I will review the rest of the changes in this PR later.
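For illustration, a minimal sketch of how the MEMFS route looks from the JS side, assuming a build where Emscripten's FS runtime is included (which is what the proposed flag would control). `FS.writeFile` is real Emscripten API; `wasmModule` is a hypothetical handle to the initialized Emscripten module, not something ort-web currently exposes.

```ts
import * as ort from 'onnxruntime-web';

// Fetch the external data file into a JS-side buffer.
const weights = new Uint8Array(
  await (await fetch('./model.onnx.data')).arrayBuffer(),
);

// Pre-load it into the in-memory file system. After this, ORT's synchronous
// file reads of '/model.onnx.data' succeed inside wasm.
// (`wasmModule` is an assumed/hypothetical module handle.)
wasmModule.FS.writeFile('/model.onnx.data', weights);

const session = await ort.InferenceSession.create('./model.onnx');
```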
Hey @fs-eire, do you mean MEMFS will be enabled in a future release to support large models?
Before answering this question, let's take a look at the problems we are trying to resolve.

Problem 1: Large models (> 2GB) do not work in ONNX Runtime. This is because of the 2GB hard limit of protobuf, the format that ONNX models use. To resolve this, the ONNX Runtime team introduced the external data feature: a raw data file containing the model weights, which can be very large, plus the corresponding ONNX model referring to those weights by offset and length into the raw data. This works in ONNX Runtime, but when it comes to WebAssembly, 2 new problems arise:

Problem 2: An incompatible file system API is used by the external data feature. ONNX Runtime uses a synchronous file I/O API to read the external data when initializing the model. However, on the web we don't have any sync I/O APIs. This is the technical blocker for ONNX Runtime Web to use the external data feature without modifying anything. Emscripten, the WebAssembly C++ compiler, offers the MEMFS utility to simulate an in-memory file system so that synchronous file read APIs work with pre-loaded data. This is what this PR resolves.

Problem 3: The 4GB hard limit of the wasm32 memory space. Because wasm uses 32-bit pointers, the memory space is no more than 4GB. There are 2 ways to resolve this problem: building with wasm64, or not loading the weight data into WebAssembly memory at all (e.g. copying it from JS directly into GPU buffers, as discussed above).
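To make Problem 1's mechanism concrete: each weight tensor in the .onnx file stores only a file location, offset, and length, so resolving one initializer on the web amounts to slicing the raw data file. A minimal sketch, with the helper name and signature invented for illustration (not ORT API):

```ts
// Resolve one initializer from the raw external-data file. `offset` and
// `length` are what the .onnx model stores for the tensor instead of the
// bytes themselves.
async function readExternalTensor(
  dataFile: Blob,
  offset: number,
  length: number,
): Promise<Uint8Array> {
  const slice = dataFile.slice(offset, offset + length);
  return new Uint8Array(await slice.arrayBuffer());
}
```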
Now let's answer your question. We are not sure whether MEMFS will be enabled by default in the future, but we will keep the ability to build from source with this feature enabled or disabled. It resolves problem 2, which extends the model size supported by ort-web from 2GB to ~4GB, but it still does not work with models > 4GB. Personally, I don't want to add too many things into ort-web, as the artifacts are already very large and our CI pipeline takes more and more time to build. So we will think carefully about whether features like MEMFS and WASM64 should be enabled. We will try to figure out the answer eventually when a real usage of a huge model in the browser is born (instead of a very "cool" demo). Otherwise we keep looking and keep things under control.
Thank you for such a detailed explanation! Being able to support models between 2GB and 4GB can unblock the landing of many stable diffusion models on the web, I believe (e.g. https://huggingface.co/runwayml/stable-diffusion-v1-5). I think this is a "cool" and real huge model that is worth attention. Thank you anyway; if there are plans to support it, I'm happy to offer help by building an SD demo (currently blocked by the large model size issue; the unet model is about 3.4GB, which falls into the 2-4GB range perfectly).
It is already done here: https://islamov.ai/diffusers.js/. I think the best way would be to support wasm32 with some way to load weights directly into WebGPU memory, because
Yeah, I also noticed that project before (just found out you're the author), but it is based on a modified version of onnxruntime-js. Looking forward to seeing official support for large models ;)
We totally agree that we need the external data format and, after that, a way to deal with models > 4GB. Using FS is an easy way to add support for the external data format, but longer term we want to be able to pass in a dictionary with the external data. The reason is that with FS the data goes through the wasm heap, while with the dictionary we think we can add some method to have an EP copy the data directly from the JS heap to the device without going through the wasm heap. We are thinking of merging this PR but making it a build option, and in the very near term adding support for the dictionary, followed by the wasm heap bypass. The latter needs some changes in onnxruntime, but we think we can make that work.
External data is implemented in #19087 and merged into the main branch as a replacement for this PR.
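For reference, the dictionary-style approach described above maps external data file paths (as referenced by the model) to bytes fetched in JS, handed to the session at creation time. A usage sketch follows; the exact option name and shape here are an assumption on my part, so check the current ort-web docs for the authoritative API.

```ts
import * as ort from 'onnxruntime-web';

// Fetch the raw weights file on the JS side.
const externalWeights = new Uint8Array(
  await (await fetch('./model.onnx.data')).arrayBuffer(),
);

// Assumed option shape: map the path the model refers to onto the bytes.
const session = await ort.InferenceSession.create('./model.onnx', {
  externalData: [{ path: 'model.onnx.data', data: externalWeights }],
});
```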
Description
A much cleaner approach to loading external weights for the wasm/webgpu providers via ExecutionProviderOption.
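A hypothetical sketch of what that could look like at the call site. The `externalWeights` key is invented purely for illustration (this PR was ultimately superseded by the session-level option from #19087); only `executionProviders` and `InferenceSession.create` are standard ort-web API.

```ts
import * as ort from 'onnxruntime-web';

const session = await ort.InferenceSession.create('./model.onnx', {
  executionProviders: [
    {
      name: 'webgpu',
      // 'externalWeights' is an invented provider-option name for this sketch
      externalWeights: await (await fetch('./weights.bin')).arrayBuffer(),
    } as any, // cast: not part of the published provider-option types
  ],
});
```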