
CoreML - Writing CoreML Model on every inference session creation #21761

Open
henryruhs opened this issue Aug 15, 2024 · 15 comments · May be fixed by #23065
Labels
contributions welcome (lower priority issues for the core ORT teams), ep:CoreML (issues related to CoreML execution provider)

Comments

@henryruhs

henryruhs commented Aug 15, 2024

Describe the issue

I set the onnxruntime log level to verbose to understand why the CoreML provider is so slow. It turns out the ONNX model is converted to an mlmodel on every inference session creation, without any caching.

2024-08-15 23:01:33.924908 [I:onnxruntime:, model_builder.cc:926 SaveModel] Writing CoreML Model to /var/folders/nz/t7blntb55gv6k4hzbd91kl5h0000gn/T/onnxruntime-72B02264-8413-46DF-8067-45E9808043DF-16638-000199F2EF486AAE.model.mlmodel

To reproduce

Just use set_default_logger_severity(0).

I use an ONNX model with opset 17, but the issue is not specific to a single model.
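
A minimal sketch of what I do (the model path is a placeholder; any ONNX model shows the same log line):

import onnxruntime

# Severity 0 = VERBOSE, which surfaces the CoreML EP log lines.
onnxruntime.set_default_logger_severity(0)

# Placeholder model path; the issue is not specific to a single model.
session = onnxruntime.InferenceSession(
    'model.onnx',
    providers = ['CoreMLExecutionProvider', 'CPUExecutionProvider'],
)
# Every session creation logs "Writing CoreML Model to /var/folders/...".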

Urgency

No response

Platform

Mac

OS Version

14.5

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

CoreML

Execution Provider Library Version

No response

@github-actions github-actions bot added the ep:CoreML (issues related to CoreML execution provider) label Aug 15, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale (issues that have not been addressed in a while; categorized by a bot) label Sep 15, 2024
@henryruhs
Author

Any updates on this?

@github-actions github-actions bot removed the stale (issues that have not been addressed in a while; categorized by a bot) label Sep 16, 2024
@yuygfgg

yuygfgg commented Sep 19, 2024

I'm having the same issue here. And disk usage is also a problem.

AmusementClub/vs-mlrt#107

It seems onnxruntime converts the .onnx model to CoreML models and then loads them back from disk.

Status ModelBuilder::Build(const GraphViewer& graph_viewer, const logging::Logger& logger,
                           int32_t coreml_version, uint32_t coreml_flags,
                           std::vector<std::string>&& onnx_input_names,
                           std::vector<std::string>&& onnx_output_names,
                           std::unique_ptr<Model>& model) {
  ModelBuilder builder(graph_viewer, logger, coreml_version, coreml_flags,
                       std::move(onnx_input_names), std::move(onnx_output_names));
  ORT_RETURN_IF_ERROR(builder.CreateModel());
  ORT_RETURN_IF_ERROR(builder.SaveModel());  // writes the converted model to a temp .mlmodel on every session creation
  return builder.LoadModel(model);
}

@henryruhs
Author

henryruhs commented Oct 2, 2024

@skottmckay Could you please take a look? Caching the converted models can't be that big of a deal to implement.

@skottmckay skottmckay added the contributions welcome (lower priority issues for the core ORT teams) label Oct 3, 2024
@skottmckay
Contributor

The basics are simple. Making it robust is more work.

For example, if the source ONNX model changes, how do you detect that? e.g. an app could download a new version of the ONNX model in the background, invalidating the cached version.

An app could have multiple onnx models so the caching logic would need a way to ensure that a specific input model is cached in a unique location.

An inference session could be created with model bytes instead of a path. Should the solution handle this as well? If not, we need to explain to users how/when/why they can/can't cache the CoreML model via log messages and documentation.
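
For illustration, both ways of creating a session exist in the Python API (the model path below is just a placeholder):

import onnxruntime

# From a file path - there is a file on disk that could be checksummed.
session_from_path = onnxruntime.InferenceSession(
    'model.onnx',
    providers = ['CoreMLExecutionProvider'],
)

# From in-memory bytes - no file is available by the time the EP runs.
with open('model.onnx', 'rb') as f:
    model_bytes = f.read()
session_from_bytes = onnxruntime.InferenceSession(
    model_bytes,
    providers = ['CoreMLExecutionProvider'],
)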

I've added the request to the backlog.

@henryruhs
Author

henryruhs commented Oct 3, 2024

I suggest creating CRC32-based hashes, as SHA and even MD5 are quite slow for larger models.

import zlib

def create_hash(content: bytes) -> str:
    return format(zlib.crc32(content), '08x')

Save the converted model as HASH.mlmodel once an ONNX model has been converted. Later you can perform a simple file lookup to either reuse an existing mlmodel or convert a new one.

Let me share the FaceFusion code.

https://github.com/facefusion/facefusion/blob/master/facefusion/hash_helper.py

Both path and bytes inputs can be supported easily with that approach.
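
A rough sketch of the lookup I have in mind, reusing create_hash from above (the cache directory and convert_onnx_to_mlmodel are made up for illustration):

import os

CACHE_DIR = '/tmp/coreml_cache'  # made-up cache location, purely for illustration

def resolve_cached_model(onnx_source) -> str:
    # Accept either a path or raw bytes, mirroring the two InferenceSession inputs.
    if isinstance(onnx_source, (bytes, bytearray)):
        content = bytes(onnx_source)
    else:
        with open(onnx_source, 'rb') as f:
            content = f.read()
    model_path = os.path.join(CACHE_DIR, create_hash(content) + '.mlmodel')
    if not os.path.exists(model_path):
        convert_onnx_to_mlmodel(content, model_path)  # hypothetical conversion step inside the EP
    return model_path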

@henryruhs
Author

@skottmckay Could you please add this to the roadmap? I have many macOS users suffering from the performance of CoreML.

@skottmckay
Contributor

We'll look at adding it in the 1.21 release which will be early next year.

We need to handle a mix of scenarios where a simple checksum may not be possible. We don't necessarily have a nice path to the model in the CoreML EP to checksum a file. e.g. an InferenceSession can be created with bytes instead of a path and those bytes are converted to an onnxruntime Graph instance way before the CoreML EP is involved. That means the most consistent thing the CoreML EP has to work with is the Graph instance, and that's not one contiguous block of memory.
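
(Illustration only: the checksum itself could be computed incrementally over byte chunks, e.g. with zlib.crc32's running value; the hard part is deciding which bytes of the Graph to feed it consistently.)

import zlib

def crc32_of_chunks(chunks) -> str:
    # Running CRC32 over an iterable of byte chunks, e.g. serialized pieces
    # of a graph that do not live in one contiguous buffer.
    value = 0
    for chunk in chunks:
        value = zlib.crc32(chunk, value)
    return format(value, '08x')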

On the performance side of things, even if there's a cached model we still need to run the logic in CoreMLExecutionProvider::GetCapability to figure out how many partitions there are (there's one CoreML model per partition) and which parts of the ONNX model map to which cached CoreML models, in order to know how and when to execute them.

@wejoncy
Contributor

wejoncy commented Nov 28, 2024

Hi @henryruhs @yuygfgg

I am debugging the issue and have some findings about why this initialization takes so much time.

  1. When I use the default backend "NeuralNetwork", it looks like ORT spends a few minutes converting the model, but the time is actually spent in the Apple SDK's model compilation. That step tries to dispatch different ops to different hardware (CPU/GPU/ANE), especially for the ANE. The ANE uses float16 by default, so Apple has to convert the model from float32 to float16 at runtime.
  2. When I set the compute unit to CPUOnly, it works well.

Could you please give it a try with either 1. setting the backend to MLProgram instead of NeuralNetwork, or 2. setting the compute unit to CPUOnly, and check whether the CoreML provider is still as slow as before?
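
A sketch of how that could look from Python, assuming the string-based CoreML provider options ("ModelFormat", "MLComputeUnits") available in recent onnxruntime releases; older builds expose equivalent COREML_FLAG_* values through the C/C++ API instead:

import onnxruntime

session = onnxruntime.InferenceSession(
    'model.onnx',  # placeholder path
    providers = [
        ('CoreMLExecutionProvider', {
            'ModelFormat': 'MLProgram',     # option 1: use the ML Program backend
            # 'MLComputeUnits': 'CPUOnly',  # option 2: restrict to CPU only
        }),
        'CPUExecutionProvider',
    ],
)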

@wejoncy
Contributor

wejoncy commented Nov 28, 2024

Besides, the CoreML EP supports float16 models directly if you select MLProgram as the model format.

@yuygfgg

yuygfgg commented Nov 29, 2024

Could you please give it a try with either 1. setting the backend to MLProgram instead of NeuralNetwork, or 2. setting the compute unit to CPUOnly, and check whether the CoreML provider is still as slow as before?

AmusementClub/vs-mlrt#116 (comment)

ml_program=1 + fp16=True: ANE 120% usage, 10.52fps
ml_program=1 + fp16=False: GPU 97% usage, 5.40fps
ml_program=0 + fp16=True: CPU 100% usage, I have no patience to wait
ml_program=0 + fp16=False: ANE 114% usage, 10.23fps

@wejoncy
Contributor

wejoncy commented Nov 29, 2024

Hi @yuygfgg, thanks for the update.

More operators are supported once ml_program is enabled. Does this resolve the issue caused by writing the CoreML model to disk? ORT will get the subgraphs that can be captured by the CoreML EP, and it shouldn't take too much time since there is no FP16 conversion at runtime.

@wejoncy
Contributor

wejoncy commented Nov 29, 2024

With a caching mechanism we could of course save the cost of writing to disk, but I don't think it costs much time.

Please let me know if it's still a problem in any production scenario.

@wejoncy
Contributor

wejoncy commented Nov 29, 2024

I'm having the same issue here. And disk usage is also a problem.
ModelBuilder builder(graph_viewer, logger, coreml_version, coreml_flags,
std::move(onnx_input_names), std::move(onnx_output_names));

ORT_RETURN_IF_ERROR(builder.CreateModel());
ORT_RETURN_IF_ERROR(builder.SaveModel());

return builder.LoadModel(model);
}

There is no way to avoid loading the CoreML models from disk into the system framework. Given that, I did a lot of profiling to measure the time cost of builder.SaveModel(). This step costs at most 100ms even when the model size is more than 300MB.
And once a session is initialized, it can be re-used for as long as you want, so this shouldn't be a bottleneck.

LoadModel is indeed a bottleneck and can cost a lot of time, but the Apple framework doesn't expose any API to cache it.

@henryruhs
Author

henryruhs commented Dec 4, 2024

Is there something we can do nowadays, like monkey-patching onnxruntime?

@wejoncy wejoncy linked a pull request Dec 10, 2024 that will close this issue