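My understanding is that the purpose of this whole function is to load the KV cache from global memory into rmem. If I remove this line, the program crashes; if I instead initialize rmem[s][c] with an arbitrary value, the result still appears to be correct.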
Answered by lzhangzz on Nov 13, 2024
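This zeroes the out-of-bounds values ahead of time; otherwise they may contain NaN or INF. Setting an arbitrary value probably also works because those positions get masked out by the attention score mask.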
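To illustrate the point, here is a minimal host-side sketch in plain Python, not the actual kernel code. The names `attend` and `run_with_padding` are hypothetical, and it assumes the mask replaces out-of-bounds scores with `-inf` before the softmax. It shows that a masked slot gets probability 0, so any finite padding value drops out, whereas NaN or INF padding survives because `0 * NaN` and `0 * INF` are both NaN.

```python
import math

def attend(q, keys, values, valid_len):
    """Single-query attention over a fixed-size tile.

    Slots with index >= valid_len are treated as out of bounds and their
    scores are replaced by -inf (assumed masking scheme).
    """
    scores = []
    for i, k in enumerate(keys):
        s = sum(a * b for a, b in zip(q, k))
        scores.append(s if i < valid_len else -math.inf)

    # Numerically stable softmax; masked slots end up with probability 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Weighted sum of values. A masked slot still contributes 0.0 * v,
    # which is NaN when v is NaN or INF.
    dim = len(values[0])
    return [sum(p * v[d] for p, v in zip(probs, values)) for d in range(dim)]

def run_with_padding(pad):
    """Fill the single out-of-bounds slot (index 3) with `pad` for both K and V."""
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], list(pad)]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], list(pad)]
    return attend(q, keys, values, valid_len=3)

out_zero = run_with_padding([0.0, 0.0])            # cleared slot
out_arbitrary = run_with_padding([123.0, -7.0])    # any finite value
out_garbage = run_with_padding([math.nan, math.inf])  # NaN/INF left in the slot

print(out_zero == out_arbitrary)
print(out_garbage)
```

The first two calls produce the same output, which matches the observation that an arbitrary finite initializer gives the right result. The third call models what can happen if the slot is left holding NaN or INF.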
Answer selected by vicety