Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change warning for placeholder size to exception #1143

Open
wants to merge 27 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
30e8568
Raise exception for different size placeholders
Dark-Rock Oct 13, 2022
f5df789
fixed typo
Dark-Rock Oct 13, 2022
5374abb
fixed typo for pep8
Dark-Rock Oct 13, 2022
b191839
removed dedicated exception
Dark-Rock Oct 13, 2022
d5acc2e
removed check for dimension
Dark-Rock Oct 13, 2022
a14f1df
removed unused import
Dark-Rock Oct 13, 2022
d18cb30
Merge remote-tracking branch 'origin' into warning-to-exception
Dark-Rock Oct 13, 2022
2105a37
Merge branch 'rwth-i6:master' into warning-to-exception
Dark-Rock Oct 16, 2022
c6b5023
implemented behaviorversion exception
Dark-Rock Oct 16, 2022
28b3985
Revert "implemented behaviorversion exception"
Dark-Rock Oct 16, 2022
6ea93f1
Behavioral implementation for exception
Dark-Rock Oct 16, 2022
da2fed7
added documentation for behavior
Dark-Rock Oct 17, 2022
59bae84
formating contribution
Dark-Rock Oct 17, 2022
49d9960
Update returnn/tf/util/data.py
albertz Oct 17, 2022
3820bc7
Update docs/configuration_reference/behavior_version.rst
albertz Oct 17, 2022
8ae0cbd
Update returnn/tf/util/data.py
albertz Oct 17, 2022
2ebc9fc
removed auto_create_placeholders
Dark-Rock Oct 18, 2022
51850fc
merged master into warning-to-exception
Dark-Rock Oct 19, 2022
ac94057
corrected documentation for behavior
Dark-Rock Oct 19, 2022
b6883e2
moved dim tags to behavior version 15
Dark-Rock Oct 19, 2022
d283da7
Merge remote-tracking branch 'origin' into warning-to-exception
Dark-Rock Oct 19, 2022
33d7ebf
tflayernetwork test compatibility warning-to-exception
Dark-Rock Oct 20, 2022
41608db
changed the order of the if statement
Dark-Rock Oct 20, 2022
7aa2303
restored comment to it's rightful place
Dark-Rock Oct 20, 2022
c407835
Update returnn/tf/util/data.py
albertz Oct 20, 2022
3538108
Update returnn/tf/util/data.py
albertz Oct 20, 2022
7df2a58
Update returnn/tf/util/data.py
albertz Oct 20, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions docs/configuration_reference/behavior_version.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,17 @@ and not listing legacy/deprecated parameters.
Version History
---------------

Behavior version 15 (2022-10-19)
albertz marked this conversation as resolved.
Show resolved Hide resolved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dimension tags with different dynamic size tensors
can not be merged anymore via `declare_same_as`.
This can happen when the user did not set the dim tags
correctly in `extern_data`.
Otherwise it is likely a bug.

See issue `#1141 <https://github.com/rwth-i6/returnn/issues/1141>`__.

Behavior version 14 (2022-10-19)
albertz marked this conversation as resolved.
Show resolved Hide resolved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
13 changes: 6 additions & 7 deletions returnn/tf/util/data.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
import traceback

import returnn.util.basic as util
from returnn.util.basic import NotSpecified, Entity
from returnn.util.basic import BehaviorVersion, NotSpecified, Entity
import returnn.tf.compat as tf_compat


Expand Down Expand Up @@ -998,12 +998,11 @@ def declare_same_as(self, other):
if self.dyn_size is not None and other_same_base.dyn_size is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this whole block of code (if self.dyn_size is not None and other_same_base.dyn_size is not None: ...) a few lines up, just before the line if self_same_as is not self: ...?

This should make the error message and stacktrace more clear because then you still see the dim tags separated and not yet merged.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace is pretty big but I think this is the most relevant part:

Finished training in epoch 2.
Run 3: Train results:
{1: EpochData(learningRate=0.01, error={
'dev_error': 0.6000000089406967,
'dev_score': 1.0964271708790534,
'train_error': 0.700000010430813,
'train_score': 1.0993032428518419,
}),
 2: EpochData(learningRate=0.01, error={
'dev_error': 0.6000000089406967,
'dev_score': 1.0963989421188671,
'train_error': 0.6600000098347665,
'train_score': 1.0985237666744554,
})}
Run 3: Forward cv seq 0:
[1 2 0 1 2]
Epoch 1, error key 'dev_error', current value 0.600000 vs prev value 0.600000, equal?
Epoch 1, error key 'dev_score', current value 1.096427 vs prev value 1.096427, equal?
Epoch 1, error key 'train_error', current value 0.700000 vs prev value 0.680000, equal?
EXCEPTION
Traceback (most recent call last):
  File "/Users/darkrock/returnn/tests/test_TFEngine.py", line 4406, in <module>
    line: v()
    locals:
      v = <local> <function test_rec_subnet_auto_optimize at 0x1658a9120>
  File "/Users/darkrock/returnn/tests/test_TFEngine.py", line 2603, in test_rec_subnet_auto_optimize
    line: run(run_idx=3, optimize_move_layers_out=True)
    locals:
      run = <local> <function test_rec_subnet_auto_optimize.<locals>.run at 0x168c6df30>
      run_idx = <not found>
      optimize_move_layers_out = <not found>
  File "/Users/darkrock/returnn/tests/test_TFEngine.py", line 2594, in test_rec_subnet_auto_optimize.<locals>.run
    line: numpy.testing.assert_almost_equal(error_value, prev_error_value, decimal=3)
    locals:
      numpy = <global> <module 'numpy' from '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/__init__.py'>
      numpy.testing = <global> <module 'numpy.testing' from '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/testing/__init__.py'>
      numpy.testing.assert_almost_equal = <global> <function assert_almost_equal at 0x165877ac0>
      error_value = <local> 0.700000010430813
      prev_error_value = <local> 0.6800000101327897
      decimal = <not found>
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 599, in assert_almost_equal
    line: raise AssertionError(_build_err_msg())
    locals:
      AssertionError = <builtin> <class 'AssertionError'>
      _build_err_msg = <local> <function assert_almost_equal.<locals>._build_err_msg at 0x168c6de10>
AssertionError: 
Arrays are not almost equal to 3 decimals
 ACTUAL: 0.700000010430813
 DESIRED: 0.6800000101327897

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't exactly understand your comment. Did you do this change? I don't see it here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the change, didn't commit it yet. I'll push it so you can see the stacktrace in the pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't see this stacktrace/error you posted here in the pipeline?

if self.dyn_size is not other_same_base.dyn_size:
if self.batch == other_same_base.batch and self.control_flow_ctx == other_same_base.control_flow_ctx:
# Note: Instead of making this a warning, we could also enforce this at some point.
# The user should be able to fix `extern_data` in the config such that this is correct in the first place.
# Also, in addition to this warning, we might want to add some runtime check on the eq of the dyn sizes.
print(
"Warning: assuming dim tags are same with different size placeholders: %r vs %r" % (
self.dyn_size, other_same_base.dyn_size))
BehaviorVersion.require(
False,
"Dim tags are same with different size placeholders (%r vs %r), please check external_data" % (
self.dyn_size, other_same_base.dyn_size),
15)
# If we have a defined source, and this is a dynamic spatial axis, and it was undefined before,
albertz marked this conversation as resolved.
Show resolved Hide resolved
# maybe we can overtake the size_placeholder now.
if other_same_base.dyn_size is not None and self.src_data:
Expand Down
2 changes: 1 addition & 1 deletion returnn/util/basic.py
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ class BehaviorVersion:
The version will be set after the config is defined at __main__.init_config() or Engine.__init__()
"""

_latest_behavior_version = 14
_latest_behavior_version = 15
_behavior_version = None # type: typing.Optional[int]

@classmethod
Expand Down
18 changes: 6 additions & 12 deletions tests/test_TFNetworkLayer.py
Original file line number Diff line number Diff line change
Expand Up @@ -1178,15 +1178,13 @@ def test_CombineLayer_two_time_dims():
name="in0", shape=(None, None, n_dim), batch_dim_axis=1, auto_create_placeholders=True)
in1 = Data(
# same time as first in in0
name="in1", shape=(None, n_dim), auto_create_placeholders=True)
name="in1", dim_tags=[in0.dim_tags[i] for i in (1, 0, 3)], auto_create_placeholders=True)
in2 = Data(
# same time as in second in in0
name="in2", shape=(None, n_dim), batch_dim_axis=1, auto_create_placeholders=True)
name="in2", dim_tags=[in0.dim_tags[i] for i in (2, 1, 3)], auto_create_placeholders=True)
extern_data.register_data(in0)
extern_data.register_data(in1)
extern_data.register_data(in2)
in1.get_size_dim_tag(0).declare_same_as(in0.get_size_dim_tag(0))
in2.get_size_dim_tag(0).declare_same_as(in0.get_size_dim_tag(1))
print("ExternData all dimension tags (allow_same_feature_dim=True):")
pprint(extern_data.get_all_dimension_tags(allow_same_feature_dim=True))
network = TFNetwork(config=config, extern_data=extern_data, train_flag=True)
Expand Down Expand Up @@ -1232,15 +1230,13 @@ def test_CombineLayer_two_time_dims_first_not_most_generic():
name="in0", shape=(None, None, n_dim), batch_dim_axis=1, auto_create_placeholders=True)
in1 = Data(
# same time as first in in0
name="in1", shape=(None, n_dim), auto_create_placeholders=True)
name="in1", dim_tags=[in0.dim_tags[i] for i in (1, 0, 3)], auto_create_placeholders=True)
in2 = Data(
# same time as in second in in0
name="in2", shape=(None, n_dim), batch_dim_axis=1, auto_create_placeholders=True)
name="in2", dim_tags=[in0.dim_tags[i] for i in (2, 1, 3)], auto_create_placeholders=True)
extern_data.register_data(in0)
extern_data.register_data(in1)
extern_data.register_data(in2)
in1.get_size_dim_tag(0).declare_same_as(in0.get_size_dim_tag(0))
in2.get_size_dim_tag(0).declare_same_as(in0.get_size_dim_tag(1))
print("ExternData all dimension tags (allow_same_feature_dim=True):")
pprint(extern_data.get_all_dimension_tags(allow_same_feature_dim=True))
network = TFNetwork(config=config, extern_data=extern_data, train_flag=True)
Expand Down Expand Up @@ -1286,15 +1282,13 @@ def test_CombineLayer_two_time_dims_first_not_most_generic_with_n_out():
name="in0", shape=(None, None, n_dim), batch_dim_axis=1, auto_create_placeholders=True)
in1 = Data(
# same time as first in in0
name="in1", shape=(None, n_dim), auto_create_placeholders=True)
name="in1", dim_tags=[in0.dim_tags[i] for i in (1, 0, 3)], auto_create_placeholders=True)
in2 = Data(
# same time as in second in in0
name="in2", shape=(None, n_dim), batch_dim_axis=1, auto_create_placeholders=True)
name="in2", dim_tags=[in0.dim_tags[i] for i in (2, 1, 3)], auto_create_placeholders=True)
extern_data.register_data(in0)
extern_data.register_data(in1)
extern_data.register_data(in2)
in1.get_size_dim_tag(0).declare_same_as(in0.get_size_dim_tag(0))
in2.get_size_dim_tag(0).declare_same_as(in0.get_size_dim_tag(1))
print("ExternData all dimension tags (allow_same_feature_dim=True):")
pprint(extern_data.get_all_dimension_tags(allow_same_feature_dim=True))
network = TFNetwork(config=config, extern_data=extern_data, train_flag=True)
Expand Down
8 changes: 4 additions & 4 deletions tests/test_TFUtil.py
Original file line number Diff line number Diff line change
Expand Up @@ -663,8 +663,8 @@ def test_Data_copy_compatible_to_bias_to_batch_time_spatial_feature():


def test_Data_get_common_data_extra_static_spatial():
d1 = Data(name='t', shape=(None, 32, 128), dtype='float32', auto_create_placeholders=True)
d2 = Data(name='r', shape=(None, 32, 128), dtype='float32', auto_create_placeholders=True)
d1 = Data(name='t', shape=(None, 32, 128), dtype='float32')
d2 = Data(name='r', shape=(None, 32, 128), dtype='float32')
d2.get_size_dim_tag(0).declare_same_as(d1.get_size_dim_tag(0))
common = Data.get_common_data([d1, d2])
assert d1.shape == common.shape
Expand All @@ -678,8 +678,8 @@ def test_Data_get_common_data_broadcast_multiple():


def test_Data_get_common_data_extra2_static_spatial():
d1 = Data(name='t', shape=(None, 32, 32, 128), dtype='float32', auto_create_placeholders=True)
d2 = Data(name='r', shape=(None, 32, 32, 128), dtype='float32', auto_create_placeholders=True)
d1 = Data(name='t', shape=(None, 32, 32, 128), dtype='float32')
d2 = Data(name='r', shape=(None, 32, 32, 128), dtype='float32')
d2.get_size_dim_tag(0).declare_same_as(d1.get_size_dim_tag(0))
common = Data.get_common_data([d1, d2])
assert d1.shape == common.shape
Expand Down