Supporting PagedAttention in optimized models #735

YYue000 · 2024-11-21T15:51:04Z

Describe the issue

I noticed that there are PagedAttention implementations for the CPU backend. However, these ops are never used in ipex.llm.optimize. Is there any plan to support a PagedAttention version of the compiled models?

devpramod · 2024-12-12T14:53:38Z

Hi @YYue000,
Which op are you referring to here? Could you share a link to the implementation?
By "PagedAttention version of compiled models", do you mean models compiled with torch.compile?

devpramod self-assigned this Nov 26, 2024
