[Performance] CoreML not being used to its fullest capacity - custom transformer #19887
Comments
#17654 is related.
Note, my actual model is more complicated. It uses rotary embeddings, XL-recurrence, KV memories, and a few other things. I've stripped things back massively to produce a minimal example.
Thanks @pfeatherstone! Are you able to attach the output from the usability checker?
Hi @natke, have you had a chance to look at this?
Hi @pfeatherstone, we are looking into it!
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Any news? I am facing a similar performance issue. LayerNorm and MultiHeadAttention seem not to be implemented as operators in CoreML. Any plans to support them?
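One way to anticipate this kind of fallback before running anything is to compare a model's op histogram against the EP's supported-op list. A minimal sketch in pure Python, where both the op list and the supported set are hypothetical stand-ins (the CoreML EP's real coverage is documented by ONNX Runtime, not by this snippet):

```python
from collections import Counter

# Hypothetical op types for one transformer block; in practice you would
# read these from the ONNX graph (e.g. via the `onnx` package).
model_ops = [
    "MatMul", "Add", "LayerNormalization",
    "MultiHeadAttention", "Softmax", "MatMul", "Add",
]

# Illustrative supported set only - NOT the CoreML EP's actual list.
coreml_supported = {"MatMul", "Add", "Softmax"}

# Count the op types that would force a fallback to the CPU EP.
unsupported = Counter(op for op in model_ops if op not in coreml_supported)
print(dict(unsupported))  # → {'LayerNormalization': 1, 'MultiHeadAttention': 1}
```

Each unsupported op splits the graph into extra partitions, and every partition boundary adds a CPU↔CoreML handoff, which is why a handful of missing ops can erase the EP's speedup.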
I just checked with onnxruntime 1.18 and it's exactly the same.
Any updates?
Describe the issue
I am converting a Pytorch model to ONNX and running it with ONNXRUNTIME on a MacBook Pro using CoreML EP.
My model is a custom transformer model.
Only 25% of the nodes can run on CoreML, so performance is about the same as running on the CPU.
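A figure like this can be derived from ONNX Runtime's node-placement output (visible with verbose logging). A minimal sketch in pure Python, using a hypothetical four-node placement map in place of the real graph's assignments:

```python
from collections import Counter

def coreml_coverage(assignments):
    """Return the fraction of graph nodes assigned to the CoreML EP.

    `assignments` maps node name -> execution provider name. The data
    below is hypothetical, standing in for onnxruntime's real placement.
    """
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return counts.get("CoreMLExecutionProvider", 0) / total

# Hypothetical placements: only the MatMul lands on CoreML.
placements = {
    "MatMul_0": "CoreMLExecutionProvider",
    "LayerNorm_0": "CPUExecutionProvider",
    "MultiHeadAttention_0": "CPUExecutionProvider",
    "Softmax_0": "CPUExecutionProvider",
}
print(coreml_coverage(placements))  # → 0.25
```

When coverage is this low, the time spent in the CPU partitions plus the cost of crossing partition boundaries dominates, so the end-to-end latency ends up close to CPU-only execution.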
To reproduce
Then run on the terminal:
You will see something like:
Urgency
Not super urgent but this would be a massive win for me if I could get the performance on CoreML to be within 25% of my NVIDIA card.
Platform
Linux
OS Version
Ubuntu 22
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CoreML
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
No