KV Cache Improved Flexibility #4668

cmikeh2 · 2023-11-11T04:50:11Z

This PR lays the foundation for two key KV-cache improvements:

1. Delineation between local and dense KV caches/models at the cache level, in addition to the attention-module level.
2. Support for multiple types of disjoint KV caches within one model (such as GPT-Neo, which alternates local and dense attention).
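To make the two items above concrete, here is a minimal sketch of how layers might be partitioned into disjoint cache groups, and how many blocks each group would need for one sequence. Everything in it (`CacheGroupSpec`, `alternating_groups`, `blocks_needed`, the even/odd layer split) is hypothetical and is not taken from the DeepSpeed code in this PR. The block count for the local group is a simplification that ignores block-boundary alignment of the sliding window.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CacheGroupSpec:
    """Hypothetical description of one disjoint KV cache."""
    kind: str                     # "dense" or "local"
    layers: Tuple[int, ...]       # layer indices served by this cache
    window: Optional[int] = None  # sliding-window size; None for dense


def alternating_groups(n_layers: int, window: int) -> List[CacheGroupSpec]:
    # Assumption: even layers use dense attention, odd layers use local.
    dense = tuple(range(0, n_layers, 2))
    local = tuple(range(1, n_layers, 2))
    return [
        CacheGroupSpec("dense", dense),
        CacheGroupSpec("local", local, window),
    ]


def blocks_needed(group: CacheGroupSpec, seq_len: int, block_size: int) -> int:
    # A dense cache keeps every token; a local cache keeps at most `window`.
    tokens = seq_len if group.window is None else min(seq_len, group.window)
    return math.ceil(tokens / block_size)


groups = alternating_groups(n_layers=4, window=256)
for g in groups:
    print(g.kind, g.layers, blocks_needed(g, seq_len=1000, block_size=64))
```

Because the two groups grow at different rates with sequence length, their block pools cannot simply be sized equally, which is what the follow-up item on block ratios is about.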

Follow-up item: determine appropriate statistics for weighting the ratio of local to dense KV blocks when both are present.


jeffra · 2023-11-16T01:21:47Z

deepspeed/inference/v2/ragged/ragged_manager.py

@@ -124,7 +116,9 @@ def flush_sequence(self, uid: int) -> None:
            return

        seq = self._seqs[uid]
-        self._kv_cache.free(seq.all_block_ids)
+        for i in range(self.n_kv_caches):


self.n_kv_caches doesn't exist :(
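The loop in the diff only works if the manager actually exposes a cache count. One possible shape for that, as a toy sketch rather than the fix that was actually committed: derive the count from the cache object through a property, then free each group's blocks separately. `ToyBlockedCache`, `ToyStateManager`, `num_caches`, `cache_group`, and `allocate` are all hypothetical names used for illustration only.

```python
from typing import Dict, List


class ToyBlockedCache:
    """Hypothetical stand-in for a KV cache with several block pools."""

    def __init__(self, blocks_per_group: List[int]) -> None:
        self._free = [set(range(n)) for n in blocks_per_group]

    @property
    def num_caches(self) -> int:
        return len(self._free)

    def free_blocks(self, cache_group: int) -> int:
        return len(self._free[cache_group])

    def reserve(self, n: int, cache_group: int = 0) -> List[int]:
        pool = self._free[cache_group]
        if n > len(pool):
            raise ValueError("not enough free blocks")
        ids = sorted(pool)[:n]
        pool.difference_update(ids)
        return ids

    def free(self, ids: List[int], cache_group: int = 0) -> None:
        self._free[cache_group].update(ids)


class ToyStateManager:
    """Hypothetical manager that tracks per-sequence blocks per group."""

    def __init__(self, kv_cache: ToyBlockedCache) -> None:
        self._kv_cache = kv_cache
        self._seqs: Dict[int, List[List[int]]] = {}

    @property
    def n_kv_caches(self) -> int:
        # Defined explicitly so that flush_sequence can rely on it.
        return self._kv_cache.num_caches

    def allocate(self, uid: int, blocks_per_group: List[int]) -> None:
        self._seqs[uid] = [
            self._kv_cache.reserve(n, cache_group=i)
            for i, n in enumerate(blocks_per_group)
        ]

    def flush_sequence(self, uid: int) -> None:
        if uid not in self._seqs:
            return
        seq = self._seqs.pop(uid)
        for i in range(self.n_kv_caches):
            self._kv_cache.free(seq[i], cache_group=i)
```

A unit test that calls `flush_sequence` on a tracked sequence would catch a missing attribute like this one immediately, since the `AttributeError` is raised on the first iteration of the loop.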

This KV-cache adds the foundation for appropriately supporting two key KV-cache improvements:

1. Delineation between local/dense KV caches/models at the cache level in addition to the attention module level.
2. Support for multiple types of disjoint KV caches (such as alternating local + dense attention GPT-Neo).

Follow up item: Determine appropriate statistics for weighting local + dense KV block ratios when both are present.

---------

Co-authored-by: Olatunji Ruwase

cmikeh2 added 3 commits November 11, 2023 04:12

Initial implementation

8450156

Passing unit tests

f191146

Formatting

58cf675

cmikeh2 requested review from RezaYazdaniAminabadi, jeffra, mrwyattii, awan-10, arashb and tjruwase as code owners November 11, 2023 04:50

cmikeh2 mentioned this pull request Nov 11, 2023

Compatibility with DS Inference KV-cache flexibility PR microsoft/DeepSpeed-MII#284

Merged

tjruwase and others added 2 commits November 13, 2023 13:31

Merge branch 'master' into cholmes/kv-cache-flexibility

21b2b02

Merge remote-tracking branch 'origin/master' into cholmes/kv-cache-fl…

2de9ef5

…exibility

awan-10 approved these changes Nov 14, 2023

awan-10 added this pull request to the merge queue Nov 14, 2023

Merged via the queue into master with commit 901d807 Nov 14, 2023
16 checks passed

jeffra reviewed Nov 16, 2023

jeffra added a commit that referenced this pull request Nov 16, 2023

fixes issues introduced in #4668

5acef9e

mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024

fixes issues introduced in microsoft#4668

85d878d
