fixed masked flash attention
l-k-11235 committed May 17, 2024
1 parent 0920080 commit 2c4eded
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions onmt/modules/multi_headed_attn.py
@@ -483,6 +483,12 @@ def forward(
     if sliding_window > 0 and key.size(2) > sliding_window:
         key = key[:, :, 1:, :]
         value = value[:, :, 1:, :]
+    if key_pad_mask is not None and step == 0:
+        x = key_pad_mask
+        x = x.expand(-1, self.head_count // self.parallel_gpu, -1)
+        x = x.unsqueeze(3)
+        x = x.expand(-1, -1, -1, 128)
+        value = value.masked_fill(x, 0)

     self.layer_cache[1]["keys"] = key
     self.layer_cache[1]["values"] = value
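
The added hunk zeroes out entries of the value cache at padded positions on the first decoding step (step == 0), presumably because the flash-attention kernel path does not consume key_pad_mask directly. Below is a minimal standalone sketch of the same tensor manipulation, outside the repository's classes, under assumptions inferred from the diff: key_pad_mask has shape (batch, 1, seq_len) with True marking padded positions, value has shape (batch, heads, seq_len, dim_per_head), and the hard-coded 128 is the per-head dimension. The names batch, heads, seq_len, and dim_per_head are illustrative, not from the source.

    import torch

    # Assumed shapes, mirroring the diff above.
    batch, heads, seq_len, dim_per_head = 2, 8, 5, 128
    value = torch.randn(batch, heads, seq_len, dim_per_head)

    # True marks a padded key position; here the last two positions.
    key_pad_mask = torch.zeros(batch, 1, seq_len, dtype=torch.bool)
    key_pad_mask[:, :, -2:] = True

    # Broadcast the (batch, 1, seq_len) mask to the full value shape.
    x = key_pad_mask.expand(-1, heads, -1)   # (batch, heads, seq_len)
    x = x.unsqueeze(3)                       # (batch, heads, seq_len, 1)
    x = x.expand(-1, -1, -1, dim_per_head)   # (batch, heads, seq_len, dim_per_head)

    # Zero the value vectors at padded positions before caching them.
    value = value.masked_fill(x, 0)

    assert value[:, :, -2:, :].abs().sum() == 0  # padded slots are now zero

Zeroing the cached values means a padded slot contributes only a zero vector to the attention-weighted sum: the softmax may still assign weight to that position, but the weight multiplies zeros, so padding cannot leak into the output.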
