I have the same question. It seems that this project primarily focuses on translating the described computation graph into code, and all the examples provided illustrate only the forward (inference) pass. For the backward pass, you would need to describe the corresponding backward graph's computation yourself. This is just my understanding, so it may not be correct.
Even if you use Triton directly (rather than through torch.compile), you will need to write your own backward. torch.compile doesn't work for complex operators with transposes etc., such as Flash Attention.
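For what it's worth, here is a minimal sketch of how a hand-written backward is usually hooked into PyTorch training, using `torch.autograd.Function`. This is not this project's API: `forward_kernel` is a hypothetical stand-in (plain torch ops here) for whatever generated forward kernel you have, and the operator `y = x @ w^T` is only chosen for illustration because its backward involves a transpose.

```python
import torch


def forward_kernel(x, w):
    # Hypothetical placeholder for a generated forward kernel.
    # Assumption: it computes y = x @ w^T.
    return x @ w.t()


class LinearNoBias(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        # Save the inputs that the backward pass needs.
        ctx.save_for_backward(x, w)
        return forward_kernel(x, w)

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # Hand-derived gradients for y = x @ w^T.
        grad_x = grad_out @ w        # (N, out) @ (out, in) -> (N, in)
        grad_w = grad_out.t() @ x    # (out, N) @ (N, in)  -> (out, in)
        return grad_x, grad_w


torch.manual_seed(0)
x = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
w = torch.randn(5, 3, dtype=torch.double, requires_grad=True)

# Compare the hand-written backward against numerical gradients.
ok = torch.autograd.gradcheck(LinearNoBias.apply, (x, w))

# The custom op then behaves like any other differentiable torch op.
loss = LinearNoBias.apply(x, w).sum()
loss.backward()
```

Once the backward is wired in this way, the op can be called inside an `nn.Module` and trained with a regular optimizer. Running `gradcheck` in double precision first is a cheap way to catch mistakes in a hand-derived backward.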
I would like to know whether it can be combined with PyTorch for training.