TF WindowLayer, use opt dim order only with new behavior version
We had a case where this change of dim order
broke an existing setup.
This was because the setup was also using an old behavior version,
where the dim order had more influence.
It would not have happened with a more recent behavior version.
However, we do not want to break even such old setups.
albertz committed Sep 2, 2023
1 parent e215b19 commit 7a80a2a
Showing 3 changed files with 59 additions and 25 deletions.
38 changes: 27 additions & 11 deletions docs/configuration_reference/behavior_version.rst
@@ -22,10 +22,20 @@ and not listing legacy/deprecated parameters.
 Version History
 ---------------

+Behavior version 18 (2023-09-02)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+TF ``WindowLayer`` returns an optimized dimension order by default.
+This is the dimension order which is used internally anyway.
+The old behavior was to reshuffle the dim order back to the original input order.
+There should not be any reason to use the old behavior
+(please report it if you think otherwise),
+so the flag to control this is considered internal (``_use_opt_dim_order``).
+
 Behavior version 17 (2023-04-19)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-``ZoneoutLSTMCell`` used the wrong output,
+TF ``ZoneoutLSTMCell`` used the wrong output,
 which was different from ``h``
 (it was actually the original output without zoneout),
 so it was not as specified in the Zoneout paper,
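To make the new behavior version 18 entry above concrete, here is a minimal config sketch. The ``window`` layer class and the ``behavior_version`` setting are real RETURNN config elements; the layer names and sizes are hypothetical:

```python
behavior_version = 18  # WindowLayer keeps the internally optimized dim order

network = {
    # Input is e.g. [B, T, F]; "window" adds a window dim of size 5 over the time axis.
    # With behavior version >= 18 the output is not reshuffled back to the input dim order.
    "win": {"class": "window", "window_size": 5, "axis": "T", "from": "data"},
    "output": {"class": "copy", "from": "win"},
}
```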
@@ -51,6 +61,8 @@ Behavior version 16 (2022-11-11)
 for different user-generated tags,
 or also when comparing user-generated to auto-generated tags.
 This should rarely have an effect for you.
+
+For TF layers:
 It might break when you mix ``n_out`` and then later also have a different
 own dim tag for the same dim.
 In that case, they will not match because the tag is different.
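A sketch of the ``n_out``/dim-tag mix described above. The layer names are hypothetical; the :class:`Dim` constructor follows the usage visible in the code diff further below, and the import path is an assumption:

```python
from returnn.tf.util.data import Dim  # assumed import path for the TF backend

feat = Dim(kind=Dim.Types.Feature, description="my-feat", dimension=512)
network = {
    "a": {"class": "linear", "n_out": 512, "from": "data"},    # auto-generated feature dim tag
    "b": {"class": "linear", "out_dim": feat, "from": "data"},  # user-defined tag of the same size
    # Under behavior version >= 16, the auto-generated 512-dim of "a" and `feat`
    # no longer match implicitly, because the tags differ.
}
```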
@@ -75,7 +87,7 @@ See issue `#1205 <https://github.com/rwth-i6/returnn/issues/1205>`__.
 Behavior version 14 (2022-10-19)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The dim matching in :class:`DotLayer` is now more strict
+The dim matching in TF :class:`DotLayer` is now more strict
 for the case that ``var1`` and ``var2`` are not provided,
 to figure out the common dims.

@@ -88,7 +100,7 @@ See issue `#1154 <https://github.com/rwth-i6/returnn/issues/1154>`__.
 Behavior version 13 (2022-10-13)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-This enables some extra checks in the :class:`RecLayer` which break some old configs,
+This enables some extra checks in the TF :class:`RecLayer` which break some old configs,
 where the old configs were actually broken,
 but those broken parts did not play a role for the training
 and thus it did not matter.
@@ -106,7 +118,7 @@ See issue `#1140 <https://github.com/rwth-i6/returnn/issues/1140>`__.
 Behavior version 12 (2022-01-06)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The batch norm default settings have been changed.
+The TF batch norm default settings have been changed.
 The old settings did not make much sense
 and almost always led to unwanted behavior.

@@ -123,7 +135,7 @@ See issue `#522 <https://github.com/rwth-i6/returnn/issues/522>`__.
 Behavior version 11 (2021-12-16)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Broadcasting dims no longer match in :class:`CombineLayer` and others.
+Broadcasting dims no longer match in TF :class:`CombineLayer` and others.
 This was never needed; instead, broadcasting happens automatically in RETURNN to non-existing dims.
 To fix this, do not add any broadcasting dims.

@@ -132,14 +144,14 @@ See issue `#666 <https://github.com/rwth-i6/returnn/issues/666>`__.
 Behavior version 10 (2021-12-07)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-:class:`ConvLayer` uses ``with_bias=True`` by default.
+TF :class:`ConvLayer` uses ``with_bias=True`` by default.

 See issue `#787 <https://github.com/rwth-i6/returnn/issues/787>`__.

 Behavior version 9 (2021-12-03)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-:class:`ConvLayer`, :class:`PoolLayer` use ``auto_use_channel_first=True`` by default.
+TF :class:`ConvLayer`, :class:`PoolLayer` use ``auto_use_channel_first=True`` by default.

 In principle, nothing should ever change due to this
 when a config is correct in that nothing depends on the order of axes.
@@ -152,7 +164,7 @@ this should be safe.
 Behavior version 8 (2021-11-30)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-:class:`ConvLayer`, :class:`PoolLayer` and :class:`TransposedConvLayer`
+TF :class:`ConvLayer`, :class:`PoolLayer` and :class:`TransposedConvLayer`
 require ``in_spatial_dims`` to be specified
 when the input has more than one spatial dimension
 (which implies that you perform 2D or 3D convolution or pooling).
@@ -161,7 +173,7 @@ This is required to make the order of the spatial axes well defined
 because the input axes could have been reordered in any way before.
 See issue `#594 <https://github.com/rwth-i6/returnn/issues/594>`__.

-Usually, you would use :class:`DimensionTag` to specify ``in_spatial_dims``.
+Usually, you would use :class:`Dim` to specify ``in_spatial_dims``.
 However, to make the transition easier for this specific new behavior,
 you can also use a string description for a dimension.
 So example usages look like:
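The actual docs example is elided in this diff view; as a hedged illustration of such a usage (hypothetical layer name and sizes, with the string descriptions as transition aid):

```python
network = {
    "conv": {
        "class": "conv", "from": "data",
        "filter_size": (3, 3), "padding": "same", "n_out": 32,
        # Spatial dims by description: the time axis plus a static dim of size 40.
        "in_spatial_dims": ["T", "dim:40"],
    },
}
```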
@@ -178,6 +190,7 @@ So example usages look like:
 Behavior version 7 (2021-11-29)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+For TF layers:
 Do not allow specifying ``axes`` or ``axis`` arguments in a way that depends on the order of the axes.
 E.g. things like ``axis="spatial:1"`` would not be allowed.

@@ -189,7 +202,7 @@ or ``"dim:<static-dim>"``.
 Behavior version 6 (2021-11-27)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-:class:`MergeDimsLayer` uses ``keep_order=True`` and does not allow ``keep_order=False``.
+TF :class:`MergeDimsLayer` uses ``keep_order=True`` and does not allow ``keep_order=False``.
 There should never be a reason to use ``keep_order=False`` anyway.
 If you have that, just remove it.
 If that causes any problems, there is probably some other issue in your config.
@@ -199,6 +212,7 @@ See issue `#654 <https://github.com/rwth-i6/returnn/issues/654>`__.
 Behavior version 5 (2021-11-26)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+For TF layers:
 Any ``axis`` or ``axes`` argument in layers does not allow int values anymore.
 Instead, use either a str like ``"F"`` or ``"stag:..."``
 or use a :class:`DimensionTag` instance.
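A sketch of the migration for version 5 above (and likewise for the order-independence rule of version 7); layer names are hypothetical:

```python
network = {
    # Before: {"class": "reduce", "mode": "max", "axis": 1, ...}  -- int, order-dependent, disallowed.
    "red": {"class": "reduce", "mode": "max", "axis": "T", "from": "data"},
    # Axis by description instead of position:
    "feat_mean": {"class": "reduce", "mode": "mean", "axis": "F", "from": "data"},
}
```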
@@ -208,6 +222,7 @@ See issue `#773 <https://github.com/rwth-i6/returnn/issues/773>`__.
 Behavior version 4 (2021-11-23)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+For TF layers:
 Broadcasting in all inputs simultaneously in layers and other ops
 is not allowed anymore by default.
 "In all inputs simultaneously" means that there is no single input which has all common dimensions.
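To see what "broadcasting in all inputs simultaneously" means, a NumPy analogy may help (NumPy allows this; under behavior version >= 4, RETURNN rejects the analogous layer combination by default):

```python
import numpy as np

a = np.ones((3, 5, 1))  # e.g. [B, T, 1]
b = np.ones((3, 1, 7))  # e.g. [B, 1, F]
# No single input carries all common dims [B, T, F]; both must broadcast at once.
print((a + b).shape)  # (3, 5, 7)
```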
@@ -221,7 +236,7 @@ and issue `#691 <https://github.com/rwth-i6/returnn/issues/691>`__.
 Behavior version 3 (2021-11-08)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-``DotLayer``: disallow ``int`` axes descriptions, remove and change defaults.
+TF ``DotLayer``: disallow ``int`` axes descriptions, remove and change defaults.

 Change ``-1`` to e.g. ``"static:-1"`` or ``"F"``.
 Change ``-2`` to e.g. ``"dynamic:0"`` or ``"T"`` or ``"stag:..."`` or ``dim_tag``.
@@ -239,6 +254,7 @@ See issue `#512 <https://github.com/rwth-i6/returnn/issues/514>`__.
 Behavior version 1 (2021-05-28)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+For TF layers:
 Disallow not specifying ``"from"`` in layer definition dictionaries,
 thus making use of the hidden default ``"data"`` as layer input.

38 changes: 27 additions & 11 deletions returnn/tf/layers/basic.py
@@ -3754,6 +3754,7 @@ def __init__(
         out_spatial_dim=None,
         padding="same",
         stride=1,
+        _use_opt_dim_order=None,
         **kwargs,
     ):
         """
@@ -3765,7 +3766,7 @@ def __init__(
         :param Dim|None out_spatial_dim:
         :param str padding: "same" or "valid"
         :param int stride: return only each Nth window
-        :param kwargs:
+        :param bool|None _use_opt_dim_order:
         """
         out_spatial_dim  # noqa  # via get_out_data_from_opts
         super(WindowLayer, self).__init__(**kwargs)
@@ -3774,6 +3775,10 @@ def __init__(
         window_size = window_dim.dimension
         data = self.input_data.copy_as_batch_major()
         from returnn.tf.util.basic import is_axis_from_description_recurrent
+        from returnn.util.basic import BehaviorVersion
+
+        if _use_opt_dim_order is None:
+            _use_opt_dim_order = BehaviorVersion.get() >= 18

         if is_axis_from_description_recurrent(axis=axis, network=self.network, data=data):
             # Inside RecLayer.
@@ -3795,18 +3800,20 @@

         else:
             axis = data.get_axis_from_description(axis)
-            data = data.copy_move_axis(axis, 0)  # move to front, more efficient, see windowed_nd
-
-        from returnn.tf.util.basic import windowed_nd
+            new_dim_axis = axis + 1  # add new axis right after
+            if _use_opt_dim_order:
+                data = data.copy_move_axis(axis, 0)  # move to front, more efficient, see windowed_nd
+                axis = 0
+                new_dim_axis = 0 if stride == 1 else 1  # see windowed_nd

-        self.output.placeholder = windowed_nd(
+        self.output.placeholder = tf_util.windowed_nd(
             data.placeholder,
             window_size=window_size,
             window_left=window_left,
             window_right=window_right,
             padding=padding,
-            time_axis=0,
-            new_window_axis=1,
+            time_axis=axis,
+            new_window_axis=new_dim_axis,
             stride=stride,
         )
         self.output.placeholder.set_shape(tf.TensorShape(self.output.batch_shape))
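For intuition, here is a naive, self-contained NumPy sketch of the windowing operation that ``windowed_nd`` performs in batched form. This is an illustration only, not RETURNN's implementation; there, the window-axis placement is configurable via ``time_axis``/``new_window_axis`` as seen above:

```python
import numpy as np


def window_time_major(x, window_size):
    """Naive sliding window over axis 0 with "valid" padding; illustration only."""
    out_t = x.shape[0] - window_size + 1
    # The window axis is inserted right after the (front) time axis.
    return np.stack([x[i : i + window_size] for i in range(out_t)], axis=0)  # [T', W, ...]


x = np.arange(12.0).reshape(6, 2)  # [T=6, F=2]
print(window_time_major(x, 3).shape)  # (4, 3, 2): [T'=4, W=3, F=2]
```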
@@ -3823,6 +3830,7 @@ def get_out_data_from_opts(
         out_spatial_dim=None,
         padding="same",
         stride=1,
+        _use_opt_dim_order=None,
         **kwargs,
     ):
         """
@@ -3835,6 +3843,7 @@
         :param Dim|None out_spatial_dim:
         :param str padding:
         :param int stride:
+        :param bool|None _use_opt_dim_order:
         :rtype: Data
         """
         if not window_size:
@@ -3844,23 +3853,30 @@
         data = data.copy_template(name="%s_output" % name)
         data = data.copy_as_batch_major()
         from returnn.tf.util.basic import is_axis_from_description_recurrent
+        from returnn.util.basic import BehaviorVersion
+
+        if _use_opt_dim_order is None:
+            _use_opt_dim_order = BehaviorVersion.get() >= 18

         if is_axis_from_description_recurrent(axis=axis, network=network, data=data):
             # Inside RecLayer.
             assert not out_spatial_dim
             new_dim_axis = 1  # after batch
         else:
             axis = data.get_axis_from_description(axis)
-            data = data.copy_move_axis(axis, 0)  # move to front, more efficient, see windowed_nd
-            in_spatial_dim = data.dim_tags[0]
+            new_dim_axis = axis + 1  # add new axis right after
+            if _use_opt_dim_order:
+                data = data.copy_move_axis(axis, 0)  # move to front, more efficient, see windowed_nd
+                axis = 0
+                new_dim_axis = 0 if stride == 1 else 1  # see windowed_nd
+            in_spatial_dim = data.dim_tags[axis]
             out_spatial_dim_ = ConvLayer.calc_out_dim(
                 in_dim=in_spatial_dim, filter_size=window_size, stride=stride, dilation_rate=1, padding=padding
             )
             assert isinstance(out_spatial_dim_, Dim)
             if out_spatial_dim:
                 out_spatial_dim_.declare_same_as(out_spatial_dim)
-            data = data.copy_template_replace_dim_tag(axis=0, new_dim_tag=out_spatial_dim_)
-            new_dim_axis = 1  # add new axis right after
+            data = data.copy_template_replace_dim_tag(axis=axis, new_dim_tag=out_spatial_dim_)
         window_dim_ = Dim(
             kind=Dim.Types.Spatial, description="%s:window" % name, dimension=window_size, auto_generated=True
         )
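The output spatial size computed via ``ConvLayer.calc_out_dim`` here should follow the standard convolution output-length arithmetic; a sketch under that assumption:

```python
import math


def out_size(in_size, filter_size, stride, padding):
    """Standard conv/window output-length arithmetic (assumed to match calc_out_dim for these cases)."""
    if padding == "same":
        return math.ceil(in_size / stride)
    if padding == "valid":
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("invalid padding %r" % padding)


assert out_size(10, 3, 1, "same") == 10
assert out_size(10, 3, 1, "valid") == 8
assert out_size(10, 3, 2, "valid") == 4
```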
8 changes: 5 additions & 3 deletions returnn/util/basic.py
@@ -192,12 +192,14 @@ def is_torch_selected(cls):

 class BehaviorVersion:
     """
-    Stores the global behavior_version
+    Stores the global behavior_version.

-    The version will be set after the config is defined at __main__.init_config() or Engine.__init__()
+    The version will be set after the config is defined at __main__.init_config() or Engine.__init__().
+
+    See :ref:`behavior_version`.
     """

-    _latest_behavior_version = 17
+    _latest_behavior_version = 18
     _behavior_version = None  # type: typing.Optional[int]
     _min_behavior_version = 0  # type: int
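The ``_use_opt_dim_order`` handling above follows a simple behavior-gated-default pattern; distilled as a sketch (not RETURNN API, the function name is hypothetical):

```python
def resolve_gated_default(explicit_value, min_behavior_version, current_version):
    """An explicit setting always wins; otherwise the behavior version decides the default."""
    if explicit_value is not None:
        return explicit_value
    return current_version >= min_behavior_version


assert resolve_gated_default(None, 18, 18) is True    # new default kicks in
assert resolve_gated_default(None, 18, 17) is False   # old setups keep the old behavior
assert resolve_gated_default(False, 18, 18) is False  # explicit override still honored
```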
