[Draft]: Add flash attention to whisper (unsure how correct this is) #2722

Draft · Murad-Awad wants to merge 2 commits into main
Conversation

Murad-Awad

Whisper should be able to support flash attention, and this PR attempts to add it. I don't know much about flash attention; a lot of this was pattern-matched from other examples in the repo, so I need to take a more detailed look later.

```rust
// flash_attn expects (batch, seq_len, n_head, head_dim) inputs.
let k = k.transpose(1, 2)?;
let v = v.transpose(1, 2)?;
// Scale by 1/sqrt(head_dim) (k's last dim after the transpose), not 1/sqrt(n_head).
let softmax_scale = 1f32 / (k.dim(3)? as f32).sqrt();
flash_attn(&q, &k, &v, softmax_scale, true)?.transpose(1, 2)
```
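
For context, other models in this repo gate the flash path behind a cargo feature so the build still works without the CUDA kernels. A sketch of that pattern, assuming the same `flash-attn` feature name and the `candle_flash_attn` crate used elsewhere:

```rust
use candle::{Result, Tensor};

#[cfg(feature = "flash-attn")]
fn flash_attn(q: &Tensor, k: &Tensor, v: &Tensor, softmax_scale: f32, causal: bool) -> Result<Tensor> {
    candle_flash_attn::flash_attn(q, k, v, softmax_scale, causal)
}

// Fallback stub so the model still compiles when the feature is off.
#[cfg(not(feature = "flash-attn"))]
fn flash_attn(_: &Tensor, _: &Tensor, _: &Tensor, _: f32, _: bool) -> Result<Tensor> {
    unimplemented!("compile with '--features flash-attn'")
}
```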
Murad-Awad (Author)

I think all attention in Whisper is technically causal, given that it predicts tokens sequentially. I could be wrong on this, though; I have only a cursory understanding of flash attention.
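
For what it's worth, in Whisper only the decoder's self-attention is causal; the encoder's self-attention and the decoder's cross-attention attend over the full sequence. Rather than hard-coding `true`, the flag could be threaded in from each call site. A sketch, with `qkv_flash_attention` as a hypothetical helper name:

```rust
use candle::{Result, Tensor};

// Hypothetical helper: the caller decides causality instead of it being
// hard-coded. Inputs are assumed to be (batch, n_head, seq_len, head_dim).
fn qkv_flash_attention(q: &Tensor, k: &Tensor, v: &Tensor, causal: bool) -> Result<Tensor> {
    // flash_attn expects (batch, seq_len, n_head, head_dim).
    let q = q.transpose(1, 2)?;
    let k = k.transpose(1, 2)?;
    let v = v.transpose(1, 2)?;
    let softmax_scale = 1f32 / (q.dim(3)? as f32).sqrt();
    candle_flash_attn::flash_attn(&q, &k, &v, softmax_scale, causal)?.transpose(1, 2)
}
```

The decoder's self-attention would then pass `causal = true`, and the encoder self-attention and cross-attention would pass `causal = false`.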
