
KeyError occurring during conversion #116

Open
lucasoethe opened this issue Mar 24, 2022 · 2 comments

@lucasoethe
Contributor

While trying to convert this code, I get the following error during the third run of the converter, while it performs a view on the tensor (full log):

Traceback (most recent call last):
  File "main.py", line 32, in <module>
    out = verify_torch_and_convert_to_returnn(
- snip -
  File "/u/soethe/pytorch-to-returnn-cn/pytorch_to_returnn/naming/namespace.py", line 265, in name_in_ctx
    raise KeyError(f"namespace {self!r}: {_src_tensor or possible_sub_names!r} not found")
KeyError: "namespace <RegisteredName 'primary_capsules' <ModuleEntry <CapsuleLayer>> -> ...>: <TensorEntry name:? tensor:(B(100),F'feature:data'(1)(1),'time:data'[B](28),'spatial1:data'[B](28)) returnn_data:'data' [B,F|F'feature:data'(1),T|'time:data'[B],'spatial1:data'[B]] axes id> not found"

The code I'm using to convert can be found here; it is adapted from the MNIST example already in the converter.
Unfortunately I haven't managed to reproduce the error in a separate test case; the current test case can be found here. The torch layers are identical between the test case and the failing code, as are the shapes (including the metadata of which axis is B/T/F), yet in the test case the error does not occur and the call to view converts fine.
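
For reference, a minimal illustrative sketch of the shape and the kind of view call involved, with the shape taken from the error message above (this is not the actual model code):

# Illustrative sketch only; shape assumed from the KeyError message above.
import torch

x = torch.randn(100, 1, 28, 28)   # (B, feature, time, spatial1), as reported in the error
y = x.view(x.size(0), -1, 1)      # the kind of view call performed inside primary_capsules
print(y.shape)                    # torch.Size([100, 784, 1])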

However, in the test case another error occurs, related to a torch.cat call. I'm not sure whether this is a separate error in itself (full log):

File "/u/soethe/pythonpackages/returnn/returnn/tf/util/data.py", line 4964, in Data.set_dynamic_size
    line: assert sizes_tag, "%s: assign dyn sizes %s without defined dim tag" % (self, sizes)
    locals:
      sizes_tag = <local> None
      self = <local> Data{'Cat_ReturnnReinterpretSameSizeAs_output', [B,F'Conv2d_2:channel*Conv2d_2:conv:s0*Conv2d_2:conv:s1'[B],F|F'Unflatten_1_split_dims1'(1)]}
      sizes = <local> <tf.Tensor 'Flatten/mul_1:0' shape=(?,) dtype=int32>
AssertionError: Data{'Cat_ReturnnReinterpretSameSizeAs_output', [B,F'Conv2d_2:channel*Conv2d_2:conv:s0*Conv2d_2:conv:s1'[B],F|F'Unflatten_1_split_dims1'(1)]}: assign dyn sizes Tensor("Flatten/mul_1:0", shape=(?,), dtype=int32) without defined dim tag
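
As a rough sketch of the pattern this seems to involve (assumed from the layer names Conv2d_2 / Flatten / Cat in the assertion, not taken from the actual test code), in plain PyTorch it would look roughly like:

# Assumed sketch; the concrete dims are guesses based on the layer names in the assertion.
import torch

a = torch.randn(100, 1152, 1)   # e.g. a Conv2d output flattened to (B, C*H*W, 1)
b = torch.randn(100, 1152, 1)
c = torch.cat([a, b], dim=-1)   # the torch.cat call that maps to the failing RETURNN op
print(c.shape)                  # torch.Size([100, 1152, 2])
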
@christophmluscher

@albertz do you have an idea where this is coming from?

@lucasoethe
Contributor Author

The error seems to occur when there are nested calls using nn.Module; here is a small test case:

# Imports assumed to come from the converter's test setup:
import numpy
import torch
from pytorch_to_returnn.converter import verify_torch_and_convert_to_returnn


def test_view_in_nested_module():
  def model_func(wrapped_import, inputs: torch.Tensor):
    if wrapped_import:
      nn = wrapped_import("torch.nn")
      F = wrapped_import("torch.nn.functional")
    else:
      import torch.nn.functional as F
      import torch.nn as nn

    class CapsuleLayer(nn.Module):
      def __init__(self):
        super(CapsuleLayer, self).__init__()

      def forward(self, x):
        return x.view(x.size(0), -1, 1)

    class CapsuleNet(nn.Module):
      def __init__(self):
        super(CapsuleNet, self).__init__()
        self.primary_capsules = CapsuleLayer()

      def forward(self, x):
        x = F.relu(x)
        x = self.primary_capsules(x)
        return x

    net = CapsuleNet()
    return net(inputs)
  rnd = numpy.random.RandomState(42)
  x = rnd.normal(0., 1., (100, 256, 20, 20)).astype("float32")
  verify_torch_and_convert_to_returnn(model_func, inputs=x, returnn_dummy_input_shape=x.shape)

The output during the second run seems to indicate that the converter is not finding the call to view because it's not recursively going through the module:

>>>> Module naming hierarchy:
.tmp_root: (hidden, empty)
primary_capsules: <ModuleEntry CapsuleLayer()> -> ...
>>>> Root module calls:
{
  'primary_capsules': <CallEntry 'primary_capsules' <ModuleEntry CapsuleLayer()> (depth 1)>
}
>>>> Modules with params:
{}
>>>> Looks good!

However, the F.relu is required to reproduce the error: if x is passed directly into primary_capsules, the error doesn't occur, which indicates there is more to it than just the nested nn.Module.
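
For comparison, a sketch of a variant forward pass that converts without the KeyError (it reuses CapsuleLayer from the test case above; the class name here is made up):

class CapsuleNetNoRelu(nn.Module):
  def __init__(self):
    super(CapsuleNetNoRelu, self).__init__()
    self.primary_capsules = CapsuleLayer()

  def forward(self, x):
    # Without the preceding F.relu(x), the nested view converts fine and no KeyError is raised.
    return self.primary_capsules(x)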
