LlamaVision: Move xattn cache generation to text prefill forward #15056

cglagovichTT · 2024-11-14T18:14:31Z

Ticket

Problem description

From the ticket:

Currently, xattn caches are generated before running text prefill. This is an issue for vLLM integration since vLLM will provide page tables for the xattn caches which we must respect. If xattn caches are generated before text prefill, then prefill attention SDPA will need to support paged KV.

To allow vLLM integration without changing prefill SDPA to support paged KV, we will modify the model to compute xattn caches during text prefill. Text prefill will then generate an unpaged cache, use it locally in attention, and finish with a paged fill-cache operation that stores the cache for decode iterations.
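As a rough illustration of the flow described above, here is a minimal pure-Python sketch. It is not the tt-metal or vLLM API: `prefill_step`, `paged_fill_cache`, `read_paged`, the identity "projection", and the nested-list cache layout are all hypothetical stand-ins chosen to show the ordering (build unpaged cache, use it locally, then scatter it into the blocks named by the page table).

```python
from typing import List

Row = List[float]


def paged_fill_cache(paged_cache: List[List[Row]], kv_rows: List[Row],
                     page_table: List[int], block_size: int) -> None:
    """Copy contiguous (unpaged) rows into the physical blocks named by page_table.

    Hypothetical helper: logical position `pos` lands in physical block
    page_table[pos // block_size], at offset pos % block_size.
    """
    needed_blocks = -(-len(kv_rows) // block_size)  # ceiling division
    if needed_blocks > len(page_table):
        raise ValueError("page table is too short for this sequence")
    for pos, row in enumerate(kv_rows):
        block = page_table[pos // block_size]
        paged_cache[block][pos % block_size] = list(row)


def read_paged(paged_cache: List[List[Row]], page_table: List[int],
               seq_len: int, block_size: int) -> List[Row]:
    """Gather rows back in logical order, as a decode step would need them."""
    return [paged_cache[page_table[p // block_size]][p % block_size]
            for p in range(seq_len)]


def prefill_step(vision_tokens: List[Row], paged_cache: List[List[Row]],
                 page_table: List[int], block_size: int) -> List[Row]:
    # 1. Build the unpaged cache during text prefill.
    #    Assumption: an identity copy stands in for the real K/V projection.
    unpaged_kv = [list(tok) for tok in vision_tokens]
    # 2. Prefill attention would consume `unpaged_kv` here, so prefill SDPA
    #    never needs to understand the page table.
    # 3. Paged fill: store the cache where the page table says, for decode.
    paged_fill_cache(paged_cache, unpaged_kv, page_table, block_size)
    return unpaged_kv
```

For example, with a block size of 2 and a page table of `[3, 1]`, a three-token sequence would place its first two rows in physical block 3 and its third row in physical block 1, leaving the other blocks untouched.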

What's changed

I moved xattn cache generation to text model prefill. This is a change to the plumbing, nothing drastic.

Checklist

- Post commit CI: https://github.com/tenstorrent/tt-metal/actions/runs/11843122221
  - Note: My changes should have no effect on the post commit tests.
- T3K unit, frequent, demo: https://github.com/tenstorrent/tt-metal/actions/runs/11843107464

(cherry picked from commit d0f78cb)

#15008: Move xattn cache generation to text prefill forward

5cde377

(cherry picked from commit d0f78cb)

cglagovichTT marked this pull request as ready for review November 14, 2024 18:18

cglagovichTT requested review from yieldthought, mtairum and uaydonat as code owners November 14, 2024 18:18

mtairum approved these changes Nov 14, 2024


cglagovichTT merged commit d7d7b3c into main Nov 14, 2024
145 of 155 checks passed

cglagovichTT deleted the cglagovich/15008_rebase branch November 14, 2024 21:40
