Hello Volodymyr,
I noticed that at this line, the window_size passed to flash_attn_func is (-window_size, 0). Since flash-attn treats a negative left window as unbounded, this configuration effectively implements global (causal) attention rather than sliding-window attention. I believe the correct implementation should resemble the one found in Mistral.
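To illustrate the difference, here is a minimal sketch, assuming flash-attn's convention that window_size is a (left, right) pair of token offsets and that negative entries mean "unbounded on that side". The tensor shapes, the window width, and the corrected (window - 1, 0) value are illustrative only and are not taken from this repository:

```python
# Minimal sketch (assumes flash-attn >= 2.3 with sliding-window support and a CUDA device).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 1, 4096, 8, 64
window = 1024  # hypothetical sliding-window width

q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Reported behaviour: a negative left window is treated as unbounded,
# so this call is effectively plain causal (global) attention.
out_global = flash_attn_func(q, k, v, causal=True, window_size=(-window, 0))

# Sliding-window attention: each query attends to itself and the
# previous `window - 1` positions (the left bound must be non-negative).
out_sliding = flash_attn_func(q, k, v, causal=True, window_size=(window - 1, 0))

# The two outputs diverge once the sequence is longer than the window.
print(torch.allclose(out_global, out_sliding, atol=1e-3))  # expected: False
```

Whether the left bound should be window or window - 1 depends on whether the window is meant to include the current token; the essential point is that it must be non-negative for the sliding window to take effect at all.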