`CombineLayer` and others (`Data.get_common_data`) should use `broadcast_matches=False` #666

`CombineLayer` and others rely on `Data.get_common_data`. `Data.get_common_data` uses `broadcast_matches=True` for the `DimensionTag.is_equal` opts, such that you can match a tensor [B,1,T] with [B,4,T].

However, this relies on heuristics and is not really needed. There is never a reason in RETURNN to create such a broadcast dim [1] in the first place. Instead of [B,1,T], you would just have a tensor [B,T]. The broadcasting in RETURNN would automatically happen on the non-existing dims (`Data.copy_compatible_to` usually does that internally). This would be much more consistent and reliable.

This needs a new behavior version (#508) because older configs rely on the current behavior.
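To make the shape mechanics concrete, here is a plain-NumPy sketch (NumPy stands in for RETURNN's internals here; `B`, `H`, `T` and the arrays are made up for illustration). The point: a stored size-1 axis carries no information over simply omitting the axis, and only the former forces heuristic dim matching.

```python
import numpy as np

B, H, T = 3, 4, 7
x = np.random.randn(B, H, T)  # multi-head tensor, [B,4,T]
y = np.random.randn(B, T)     # head-independent tensor, just [B,T]

# Variant 1: y stored with an explicit broadcast dim, [B,1,T].
# Matching that size-1 dim against the head dim of x is exactly what
# broadcast_matches=True has to guess for named dim tags.
y_stored = y[:, None, :]
out_heuristic = x + y_stored

# Variant 2: no dummy dim at all; the missing axis is inserted only at
# combine time. In RETURNN, Data.copy_compatible_to aligns by dim tags
# rather than axis positions, so this is unambiguous.
out_clean = x + y[:, None, :]

assert np.allclose(out_heuristic, out_clean)  # identical, no [1] dim needed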
Comments
I am running into this issue in several places now while updating my config to use dim tags. E.g. I have an "attention head" spatial dim tag, which I just want to set to dimension 1 if I want to use single-head attention. That breaks things all over the place for me.
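For context, a minimal sketch of the kind of dim-tag definition meant here (the names are illustrative, not taken from the issue; `SpatialDim` is RETURNN's helper for creating spatial dim tags):

```python
from returnn.tf.util.data import SpatialDim

# User-defined attention-head dim tag. With num_heads = 1 (single-head
# attention) this becomes a static dim of size 1, which
# Data.get_common_data with broadcast_matches=True may then match
# against arbitrary other dims, i.e. silently treat it as a broadcast dim.
num_heads = 1
att_heads_dim = SpatialDim("att-heads", num_heads)
```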
As another solution, maybe we can disallow the broadcasting if this is an explicit dim tag by the user? To implement that, we would need to go through all places in RETURNN where a static dim tag is created and add a new flag there. This would not require a new behavior version. But not sure if it is really needed... Maybe a new behavior version is cleaner and easier.

Edit: Actually, this was also implemented in #1388. Such a flag exists now (see the quoted description and the sketch below).
It shouldn't be too difficult for a user to update their config to not use broadcast dims, so I'd say let's just add a new behavior version.
Yea ok. Are you doing this?
Jup, sure.
Auto-generated dims are those via the legacy `n_out` or all other internal dims. Any dim tags created explicitly by the user should never be treated as broadcast dims. See also #666.
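A rough sketch of what that rule could look like. This is not the actual code from #1388; the flag name `auto_generated` and the `DimStub` type are inferred from the wording above and purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class DimStub:
    """Stand-in for RETURNN's dim tag; only the fields used here."""
    dimension: int
    auto_generated: bool = False  # True for dims RETURNN created internally, e.g. via legacy n_out

def dims_equal(a: DimStub, b: DimStub, *, broadcast_matches: bool = False) -> bool:
    """Hypothetical, simplified dim-tag comparison."""
    if a is b:
        return True
    if broadcast_matches:
        # A size-1 dim may only match via broadcasting if RETURNN itself
        # created it; explicit user dim tags never act as broadcast dims.
        for dim in (a, b):
            if dim.dimension == 1 and dim.auto_generated:
                return True
    return False

# A user-defined single-head dim no longer matches by broadcasting:
assert not dims_equal(DimStub(1), DimStub(4), broadcast_matches=True)
assert dims_equal(DimStub(1, auto_generated=True), DimStub(4), broadcast_matches=True)
```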