I noticed that there are PagedAttention implementations for the CPU backend. However, these ops are never used in ipex.llm.optimize. Is there any plan to support a PagedAttention version of compiled models?
Hi @YYue000
Which op are you referring to here? Could you share the implementation?
By "PagedAttention version of compiled models", do you mean with torch.compile?