Fix for random sampler recompilations for incomplete batches #663
base: habana_main
Conversation
@@ -1228,10 +1228,18 @@ def prepare_input_tensors(
        batch_size_padded = self.bucketing_ctx.get_padded_batch_size(
            real_batch_size, is_prompt)
        batch_size_padding = batch_size_padded - real_batch_size
        if all([
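The hunk is truncated here, so as a self-contained illustration of the idea (this is a sketch, not the actual PR code: `SamplingParams` below is a stand-in dataclass, and `temperature == 0` meaning greedy decoding follows the discussion in this thread):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SamplingParams:
    temperature: float = 1.0  # 0.0 => greedy decoding, >0 => random sampling

def padding_temperature(batch: List[SamplingParams]) -> float:
    """Pick the temperature for dummy padding sequences (hedged sketch).

    If every real sequence in the batch uses the random sampler
    (temperature > 0), pad with random-sampler dummies so the padded
    batch keeps a single sampler type and avoids a sampler
    recompilation. Otherwise fall back to greedy (temperature 0)
    dummies, the previous default.
    """
    if all(p.temperature > 0 for p in batch):
        return 1.0  # random-sampler dummy
    return 0.0      # greedy dummy

# An all-random batch gets random dummies; anything else gets greedy ones.
assert padding_temperature([SamplingParams(0.7), SamplingParams(1.0)]) == 1.0
assert padding_temperature([SamplingParams(0.0), SamplingParams(1.0)]) == 0.0
```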
Does it have to be 'all'?
I mean, wouldn't a single sample with random sampling be sufficient?
We can only be sure that changing the sampler type for padded sequences will prevent (rather than cause) sampler recompilations if all the sequences in a batch use the same type of sampler. For example, a batch before padding could contain 1 sequence with a greedy sampler and 2 with a random sampler. Say the closest warmed-up batch size is 4: we can add 1 random-sampler sequence or 1 greedy-sampler sequence, giving either 2 greedy/2 random or 1 greedy/3 random groups. If a bucket with batch size 2 was warmed up, then I believe the sampler with 2 greedy samplings or 2 random samplings will also be warmed up, which prevents the recompilation.
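To make the grouping argument concrete, a small sketch of the scenario above (the group sizes are illustrative, not taken from the code):

```python
# 1 greedy + 2 random sequences, padded to the nearest warmed-up batch size.
greedy, random_ = 1, 2
padded_batch_size = 4
dummies = padded_batch_size - (greedy + random_)

# The single dummy can join either group, so the sampler sees one of:
pad_with_greedy = (greedy + dummies, random_)  # (2, 2)
pad_with_random = (greedy, random_ + dummies)  # (1, 3)

# If size-2 sampler shapes were already warmed up (e.g. via a
# batch-size-2 bucket), only the (2, 2) split reuses warm shapes;
# (1, 3) may trigger a sampler recompilation.
print(pad_with_greedy, pad_with_random)
```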
I'd say it's pretty situational. If we flip the situation you described, i.e. 1x random sampling + 2x greedy, then it might be better to add another random sample, as 2x random sampling is more likely to be already warmed up than 3x greedy. Anyway, handling mixed batches is a huge PITA, as this might accidentally create a recompilation because sampler shapes are not padded, afaik.
Ideally we should warm up the sampler separately, with a similar yet independent bucketing alongside the current batch-size bucketing, since sampling params are aggregated before running. That is out of scope for this PR, I'm afraid.
For now I'm thinking about optimizing the most common case, which is greedy. What if we flipped the logic like this:
"if there's at least one sample with temperature=0, set temperature=0 for all dummy samples"? 'any' can be faster than 'all', as it doesn't need to traverse all samples. This means that for all-temperature>0 batches we'll behave exactly the same as your original code, but in the optimistic scenario, which is all greedy, we can at least reduce the impact of the check. If the batch is mixed then, well... we might as well flip a coin ;)
This PR changes the sampler used by dummy (padding) sequences to random when all other sequences in the batch are using the random sampler, preventing sampler recompilations.