Hi, I just found this repository and I really like the idea. I wanted to try it out for a different model than the one in this repo's README, for example Llama 3 8B with TP2.
But I'm a novice and am struggling to understand how the inputs in the given example must be modified:
1.) From looking around, I assume this is a prefill kernel for a context length of 4096 with an input size of 256. Am I correct?
2.) For each of the Q, K, and V tensors, why is the first entry of the input dims tuple 2 * batch size?
3.) What is the 64 in the input dims tuple supposed to be? The README says this is a kernel for Llama 70B TP4...
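For context, here is a minimal sketch of the per-rank attention head counts I would expect under tensor parallelism. It assumes the publicly documented model configs (Llama 3 8B: 32 query heads, 8 KV heads, head dim 128; Llama 70B: 64 query heads, 8 KV heads, head dim 128) and an even split of heads across ranks. `AttnConfig` and `per_rank_heads` are hypothetical names for illustration, not part of this repo, and the sketch does not tell me how these numbers map onto the repo's input dims tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    """Attention shape parameters taken from the public model configs."""
    num_q_heads: int
    num_kv_heads: int
    head_dim: int


def per_rank_heads(cfg: AttnConfig, tp: int) -> tuple[int, int]:
    """Return (query heads, KV heads) per rank, assuming an even split."""
    if cfg.num_q_heads % tp or cfg.num_kv_heads % tp:
        raise ValueError("head counts must be divisible by the TP degree")
    return cfg.num_q_heads // tp, cfg.num_kv_heads // tp


# Assumed configs (from the published model cards, not from this repo).
LLAMA3_8B = AttnConfig(num_q_heads=32, num_kv_heads=8, head_dim=128)
LLAMA_70B = AttnConfig(num_q_heads=64, num_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    print("Llama 3 8B, TP2:", per_rank_heads(LLAMA3_8B, 2))
    print("Llama 70B, TP4:", per_rank_heads(LLAMA_70B, 4))
```

If the even-split assumption holds, the README's Llama 70B TP4 case would have 16 query heads and 2 KV heads per rank, so I don't see where a 64 would come from other than the total query head count.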