Bump. Can someone explain the usage of `self.E = torch.randn([self.max_seq, int(self.dh)], requires_grad=False)`
when computing relative attention? Also, this tensor isn't registered as a parameter or buffer, so it isn't saved in the checkpoint, which prevents reproducibility when the model is reloaded.
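For reference, here is a minimal single-head sketch of how the relative-attention term from the Music Transformer paper could be implemented with `E` registered as a learnable `nn.Parameter`, so it is trained with the model and included in `state_dict` for reproducible reloads. The class and method names are illustrative, not this repo's actual API:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelativeGlobalAttention(nn.Module):
    """Illustrative sketch of the relative-attention logits from the
    Music Transformer paper (Huang et al., 2018). Names are hypothetical."""

    def __init__(self, dh: int, max_seq: int):
        super().__init__()
        self.dh = dh
        self.max_seq = max_seq
        # Registered as a learnable parameter: it is trained jointly with
        # the model and saved in state_dict, unlike a bare torch.randn(...)
        # attribute with requires_grad=False.
        self.E = nn.Parameter(torch.randn(max_seq, dh))

    def _skew(self, qe: torch.Tensor) -> torch.Tensor:
        # qe: (batch, L, L). The paper's "skewing" trick: pad one column
        # on the left, reshape, and drop the first row so that column j
        # lines up with relative distance (j - i).
        b, L, _ = qe.shape
        padded = F.pad(qe, (1, 0))             # (batch, L, L + 1)
        reshaped = padded.reshape(b, L + 1, L)  # (batch, L + 1, L)
        return reshaped[:, 1:, :]              # (batch, L, L)

    def forward(self, q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
        # q, k: (batch, L, dh). Returns attention logits with the
        # relative-position term S_rel added, before softmax.
        L = q.size(1)
        e = self.E[self.max_seq - L:]                    # last L embeddings
        qe = torch.einsum('bld,md->blm', q, e)           # (batch, L, L)
        s_rel = self._skew(qe)
        logits = (q @ k.transpose(-2, -1) + s_rel) / math.sqrt(self.dh)
        return logits
```

With this, reloading via `load_state_dict` restores the same `E` that was used during training, which addresses the reproducibility concern above.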
Hello.
I was reading the code alongside the Music Transformer paper and have a question.
In `self.E = torch.randn([self.max_seq, int(self.dh)], requires_grad=False)`, you use a fixed random distribution.
Is this an intentional departure from the paper?
Thank you.