Slow model loading time for CoreML quantized model #5718
Comments
Could you please clarify what's the goal of this issue? Is the 3.10 ms cold-start time too much?
Thanks @YifanShenSZ, I'd like to correct the above numbers. The goal of this issue is to resolve the long load time for quantized models using the CoreML delegate. Load times:
Sorry, the description is not clear; David's response is clearer, and it was originally from him.
Thanks David. So the issue is that using the ExecuTorch CoreML delegate has a much longer loading time than directly using the Core ML runtime? Handing it over to @cymbalrush to investigate where the overhead comes from.
Thanks @YifanShenSZ @cymbalrush. Both models are using the ExecuTorch CoreML delegate. The quantized model takes much longer.
@d-findlay how are you getting the load time? Is it from the devtools?
I just asked @d-findlay, and he said both the devtools and Xcode Instruments showed the long load time.
@cymbalrush, we are using devtools while specifying profile so we can inspect it with Instruments. We can see that the quantized model takes 1.3 seconds to Load (prepare and cache) the model on CoreML, of which 1.14 seconds is spent on the Neural Engine Compile. By comparison, the unquantized model takes 464 ms to Load (prepare and cache) the model on CoreML, of which 297 ms is spent on the Neural Engine Compile.
@cymbalrush It's also worth noting that when we try to use MODEL_TYPE.COMPILED_MODEL, we get a failure. However, this is unrelated to the above concern: with the default MODEL_TYPE we still get longer load times for quantized models.
Thanks @d-findlay! Could you try iOS 18? There is an optimization that was part of the iOS 18 release that improves Neural Engine compile time. I am seeing a load improvement when I test locally on iOS 18, but it would be great if you could confirm it. This is a one-time cost, as you know; the subsequent loads should be faster.
Investigating
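The "one-time cost" point can be illustrated with a cached loader: only the first load pays the compile step, and subsequent loads reuse the cached artifact. This is a minimal pure-Python sketch of the pattern, not the ExecuTorch or Core ML API; `compile_model`, `load_model`, and the `"model.pte"` name are all hypothetical stand-ins.

```python
import time

COMPILE_COUNT = 0  # tracks how often the expensive step actually runs


def compile_model(source: str) -> str:
    """Stand-in for the expensive Neural Engine compile step (hypothetical)."""
    global COMPILE_COUNT
    COMPILE_COUNT += 1
    time.sleep(0.05)  # simulate compile latency
    return source.upper()  # pretend this is the compiled artifact


def load_model(source: str, cache: dict) -> str:
    """Compile on the first load, then serve the cached artifact."""
    if source not in cache:
        cache[source] = compile_model(source)
    return cache[source]


cache: dict = {}

t0 = time.perf_counter()
load_model("model.pte", cache)  # cold load: pays the compile cost
cold = time.perf_counter() - t0

t0 = time.perf_counter()
load_model("model.pte", cache)  # warm load: cache hit, no compile
warm = time.perf_counter() - t0

print(COMPILE_COUNT)   # the compile ran exactly once
print(warm < cold)     # warm loads skip the compile latency
```

The same shape applies here: the Neural Engine compile dominates the cold load, while warm loads should hit the on-device cache.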
Thank you very much @cymbalrush. Do you recommend the
It won't improve the Neural Engine compile time, but it could improve the model load time. If the type is
🐛 Describe the bug
Get #5710 and run
The FP32 model runs fully resident on the ANE at 0.9 ms on average and 11.13 ms cold-start (first inference).
The int8 quantized model also runs fully resident on the ANE, at 0.54 ms on average and 3.10 ms cold-start. Looking at the layers, there appear to be many quantize ops followed immediately by dequantize ops.
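A back-to-back quantize → dequantize pair is an identity up to rounding error, which is why seeing many such pairs in the layer list suggests a missed fusion opportunity: a graph optimizer can fold each pair away instead of executing both ops. Here is a small sketch of symmetric int8 quantization showing the roundtrip is near-identity; the scale choice is illustrative only, not what Core ML actually uses.

```python
def quantize(xs, scale):
    """Affine-quantize floats to int8 with a symmetric scale."""
    return [max(-128, min(127, round(x / scale))) for x in xs]


def dequantize(qs, scale):
    """Map int8 values back to floats."""
    return [q * scale for q in qs]


xs = [0.10, -0.25, 0.73, 1.00]
scale = 1.0 / 127  # symmetric scale covering roughly [-1, 1]

roundtrip = dequantize(quantize(xs, scale), scale)

# Each quantize->dequantize pair only introduces rounding error of at
# most scale/2 per element, so adjacent pairs in a graph are effectively
# no-ops and can be fused away.
max_err = max(abs(a - b) for a, b in zip(xs, roundtrip))
print(max_err <= scale / 2)  # → True
```

If the pairs survive into the compiled graph, each one still costs ANE work at runtime and adds ops for the Neural Engine compiler to process, which may contribute to the longer compile time observed above.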
Versions