CoreML - Writing CoreML Model on every inference session creation #21761
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Any updates on this?
I'm having the same issue here, and disk usage is also a problem. It seems onnxruntime converts the .onnx model to a CoreML model and then loads it again from disk: onnxruntime/onnxruntime/core/providers/coreml/builders/model_builder.cc, lines 996 to 1008 at 0f1f3b7
@skottmckay Could you please take a look? Using cached models can't be that big of a deal to implement.
The basics are simple. Making it robust is more work. For example, if the source ONNX model changes, how do you detect that? E.g. an app could download a new version of the ONNX model in the background, invalidating the cached version. An app could have multiple ONNX models, so the caching logic would need a way to ensure that a specific input model is cached in a unique location. An inference session could be created with model bytes instead of a path. Should the solution handle this as well? If not, we need to explain to users how/when/why they can or can't cache the CoreML model via log messages and documentation. I've added the request to the backlog.
I suggest creating CRC32-based hashes, as SHA and even MD5 are quite slow for larger models. Save the hash alongside the model. Let me share the FaceFusion code: https://github.com/facefusion/facefusion/blob/master/facefusion/hash_helper.py Both path and bytes inputs can be supported easily with that approach.
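A minimal sketch of that suggestion (the cache layout and file name "model.onnx" are hypothetical placeholders, not an onnxruntime API):

import zlib
from pathlib import Path
from typing import Union

def crc32_of(model: Union[str, bytes], chunk_size: int = 1 << 20) -> str:
    """Hex CRC32 checksum of a model given as a file path or as raw bytes."""
    if isinstance(model, bytes):
        crc = zlib.crc32(model)
    else:
        crc = 0
        with open(model, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")

# Hypothetical cache layout keyed by the checksum of the source ONNX model.
cache_dir = Path.home() / ".cache" / "coreml_models" / crc32_of("model.onnx")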
@skottmckay Could you please add this to the roadmap? I have many macOS users suffering from the performance of CoreML.
We'll look at adding it in the 1.21 release, which will be early next year. We need to handle a mix of scenarios where a simple checksum may not be possible. We don't necessarily have a nice path to the model in the CoreML EP to checksum a file; e.g. an InferenceSession can be created with bytes instead of a path, and those bytes are converted to an onnxruntime Graph instance well before the CoreML EP is involved. That means the most consistent thing the CoreML EP has to work with is the Graph instance, and that's not one contiguous block of memory. On the performance side of things, even if there's a cached model we need to run the logic in CoreMLExecutionProvider::GetCapability to figure out how many partitions there are (there's one CoreML model per partition) and which parts of the ONNX model map to which cached CoreML models, so we know how and when to execute the CoreML model.
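For readers landing here later: recent CoreML EP documentation lists a ModelCacheDirectory provider option for exactly this purpose. A minimal sketch, assuming an onnxruntime build that accepts these option names (they are not available in 1.18):

import onnxruntime as ort

# Assumes a recent onnxruntime whose CoreML EP accepts dictionary provider options;
# ModelCacheDirectory lets converted CoreML models be reused across sessions.
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("CoreMLExecutionProvider", {
            "ModelFormat": "MLProgram",
            "ModelCacheDirectory": "/tmp/coreml_cache",
        }),
        "CPUExecutionProvider",
    ],
)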
I am debugging the issue and have some findings about this initialization costing too much time.
Could you please give it a try with either 1. setting the model format to MLProgram instead of NeuralNetwork, or 2. setting the compute units to CPUOnly, and check whether the CoreML provider is as slow as before?
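A hedged sketch of those two experiments from Python; the option names follow the current CoreML EP documentation and may differ on older releases:

import onnxruntime as ort

# Experiment 1: ask the CoreML EP to build an MLProgram instead of a NeuralNetwork.
mlprogram_session = ort.InferenceSession(
    "model.onnx",
    providers=[("CoreMLExecutionProvider", {"ModelFormat": "MLProgram"})],
)

# Experiment 2: restrict the CoreML compute units to CPU only.
cpu_only_session = ort.InferenceSession(
    "model.onnx",
    providers=[("CoreMLExecutionProvider", {"MLComputeUnits": "CPUOnly"})],
)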
Besides, the CoreML EP supports float16 models directly if you select MLProgram as the model format.
AmusementClub/vs-mlrt#116 (comment)
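For context, one common way to produce a float16 copy of an existing float32 ONNX model is onnxconverter-common; this is a hedged sketch, and operator coverage should be verified for your model:

import onnx
from onnxconverter_common import float16

# Convert a float32 ONNX model to float16 so the MLProgram path can use it directly.
model = onnx.load("model.onnx")
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "model_fp16.onnx")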
Hi @yuygfgg, thanks for the update. More operators will be supported once
With a caching mechanism we could of course save the cost of writing to disk, but I don't think that is costing too much time. Please let me know if it's still a problem in any production scenario.
There is no way to avoid reloading the CoreML models from disk into the system framework. Based on that, I did a lot of profiling to measure the time cost of
Is there something we can do nowadays, like monkey-patching onnxruntime?
Describe the issue
I set onnxruntime logging to verbose to understand why the CoreML provider is so slow. It turns out that it converts the ONNX model to an mlmodel on every inference session creation, without a cache.
To reproduce
Just use set_default_logger_severity(0). I use an ONNX model with opset 17, but the issue is not related to a single model (a minimal sketch follows below).
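A minimal reproduction sketch, with "model.onnx" as a placeholder for any ONNX model; the verbose log then shows the CoreML model being rebuilt for every session:

import onnxruntime as ort

# 0 = VERBOSE logging, which shows the CoreML model being written out on each session creation.
ort.set_default_logger_severity(0)

for _ in range(2):
    ort.InferenceSession(
        "model.onnx",
        providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
    )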
Urgency
No response
Platform
Mac
OS Version
14.5
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.18.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
CoreML
Execution Provider Library Version
No response