
Extensions for Sisyphus serialization of net dict and extern data #104

Closed

JackTemaki opened this issue Feb 7, 2022 · 21 comments

JackTemaki commented Feb 7, 2022

When Sisyphus stores a job, its whole content is pickled before the tasks themselves are executed.
Pickling ReturnnDimTagsProxy instances works, but calling their __repr__ fails after unpickling:

~/experiments/tts_asr_2021/recipe/returnn_common/nn/naming.py in __repr__(self=<class 'returnn_common.nn.naming.ReturnnDimTagsProxy.SetProxy'> instance)
    627 
    628     def __repr__(self):
--> 629       return f"{{{', '.join(map(repr, self.values))}}}"
    630 
    631   def collect_dim_tags_and_transform_config(self, config: Dict[str, Any]) -> Dict[str, Any]:

~/experiments/tts_asr_2021/recipe/returnn_common/nn/naming.py in __repr__(self=<class 'returnn_common.nn.naming.ReturnnDimTagsProxy.DimRefProxy'> instance)
    589 
    590     def __repr__(self):
--> 591       return self.ref_repr()
        self.ref_repr = undefined
    592 
    593     def ref_repr(self) -> str:

~/experiments/tts_asr_2021/recipe/returnn_common/nn/naming.py in ref_repr(self=<class 'returnn_common.nn.naming.ReturnnDimTagsProxy.DimRefProxy'> instance)
    593     def ref_repr(self) -> str:
    594       """ref repr"""
--> 595       return self.parent.dim_ref_repr(self.dim)
        self.parent.dim_ref_repr = <bound method ReturnnDimTagsProxy.dim_ref_repr of {
  'extern_data.data.dim_tags.1.time': Dim(kind=spatial, description='time', dimension=None),
  'extern_data.data.dim_tags.2.input': Dim(kind=feature, description='input', dimension=50),
  'network.linear.subnetwork.weight.shape.1.ctc_out_dim': Dim(kind=feature, description='ctc_out_dim', dimension=139),
  'extern_data.data.dim_tags.2.input': Dim(kind=feature, description='input', dimension=50),
  'network.linear.subnetwork.weight.shape.1.ctc_out_dim': Dim(kind=feature, description='ctc_out_dim', dimension=139),
}>
        self.dim = Dim{'global batch'[?]}
    596 
    597     def py_id_name(self) -> str:

~/experiments/tts_asr_2021/recipe/returnn_common/nn/naming.py in dim_ref_repr(self={
  'extern_data.data.dim_tags.1.time': Dim(kind...ure, description='ctc_out_dim', dimension=139),
}, dim=Dim{'global batch'[?]})
    574       op_str = {"add": "+", "mul": "*", "truediv_right": "//"}[dim.derived_from_op.kind]
    575       return f" {op_str} ".join(self.dim_ref_repr(in_) for in_ in dim.derived_from_op.inputs)
--> 576     ref = self.dim_tags_to_ref[dim]
        ref = undefined
        self.dim_tags_to_ref = {Dim{'time'[?]}: extern_data_data_dim_tags_1_time_dim, Dim{'input'(50)}: extern_data_data_dim_tags_2_input_dim, Dim{'ctc_out_dim'(139)}: network_linear_subnetwork_weight_shape_1_ctc_out_dim_dim}
        dim = Dim{'global batch'[?]}
    577     return ref.py_id_name()
    578 

KeyError: Dim{'global batch'[?]}

While I might find a solution for my workflow that does not need pickling, I still wanted to raise this issue. Maybe there is a very simple solution for this.

The config was created using:

    config = name_ctx.get_returnn_config()
    dim_tags_proxy = nn.ReturnnDimTagsProxy()
    config = dim_tags_proxy.collect_dim_tags_and_transform_config(config)
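The KeyError arises because pickle creates new objects on load, while the proxy's dim-to-ref mapping is keyed by object identity, and the module-level batch dim is not part of the pickled state. A minimal sketch of the mechanism, using a hypothetical `FakeDim` stand-in rather than RETURNN's real `Dim` class:

```python
import pickle

class FakeDim:
    """Hypothetical stand-in for a dim tag; no custom __eq__/__hash__,
    so dict lookup is by object identity, as with RETURNN's Dim."""
    def __init__(self, name):
        self.name = name

batch_dim = FakeDim("global batch")   # module-level singleton
refs = {batch_dim: "batch_dim"}       # identity-keyed mapping

# Unpickling creates a *new* key object, not the original singleton:
restored = pickle.loads(pickle.dumps(refs))

# Lookup with the original singleton now fails -- the KeyError analogue:
assert batch_dim not in restored
```

This is only an illustration of the identity-keying problem, not the actual RETURNN code path.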
@JackTemaki JackTemaki changed the title KeyError when pickling ReturnnDimTagsProxy KeyError after un-pickling ReturnnDimTagsProxy Feb 7, 2022
@albertz (Member)

albertz commented Feb 7, 2022

Pickling of ReturnnDimTagsProxy is not supported. This involves pickling of Dim instances, which is not supported. That is what ReturnnDimTagsProxy is actually for: to serialize Dim instances.

@albertz (Member)

albertz commented Feb 7, 2022

Of course, we could add logic to make pickling of ReturnnDimTagsProxy and its contained dim tags possible. But then, when the net dict is pickled separately, the two would not match each other.

Basically, you don't want to pickle ReturnnDimTagsProxy at all; you just want to pickle the net dict. However, pickling the net dict will not work directly, because it contains Dim instances, and pickling those is not supported.

You also want to pickle the net dict jointly with extern_data because that shares some of the dim tags. If you don't do that jointly, it will also not work.
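Pickle only preserves shared references within a single dump. A minimal sketch of why the net dict and extern_data must be pickled jointly, using a hypothetical `Tag` stand-in for a shared dim tag:

```python
import pickle

class Tag:
    """Hypothetical stand-in for a dim tag shared between structures."""
    pass

tag = Tag()
net_dict = {"in_dim": tag}
extern_data = {"data_dim": tag}

# Pickled jointly, the shared reference survives the round trip:
net2, extern2 = pickle.loads(pickle.dumps((net_dict, extern_data)))
assert net2["in_dim"] is extern2["data_dim"]

# Pickled separately, each load creates its own copy; sharing is lost:
net3 = pickle.loads(pickle.dumps(net_dict))
extern3 = pickle.loads(pickle.dumps(extern_data))
assert net3["in_dim"] is not extern3["data_dim"]
```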

@albertz (Member)

albertz commented Feb 7, 2022

Why do you want to pickle ReturnnDimTagsProxy? I assume you don't really need this, or do you? This doesn't really make sense. ReturnnDimTagsProxy is a helper for the serialization of the config, net dict, extern data to a Python code string.

So, if this issue here is actually about the question/issue on how to pickle the config, let's rephrase it that way, and discuss this, instead of discussing how to pickle ReturnnDimTagsProxy, which is not really relevant.

@JackTemaki (Contributor, Author)

Okay, so this is not easily solvable; then let's close it here and continue at some other point. The Sisyphus connection things are anyway better kept outside of returnn_common, as this is something that should be implemented on the recipe side instead (which is the usual case: we have to adapt the pipeline logic to the software, not the software to the pipeline logic).

@albertz (Member)

albertz commented Feb 7, 2022

Okay so this is not easily solvable

What do you refer to? Pickling the config (net dict, extern data), or pickling ReturnnDimTagsProxy? The latter should not be relevant for you.

The former, why do you think this is not easily solvable?

For whatever Sisyphus needs for serialization, we might need to extend sth in returnn-common, or maybe also in RETURNN itself. So it makes sense to keep an open issue about this until this is resolved.

@albertz albertz reopened this Feb 7, 2022
@JackTemaki (Contributor, Author)

JackTemaki commented Feb 7, 2022

The former, why do you think this is not easily solvable?

This was about the latter. And yes, we agreed it is not relevant, so I wanted to close the issue.

So it makes sense to keep an open issue about this until this is resolved.

We can rename the issue and continue on the serialization if you want.

@albertz albertz changed the title KeyError after un-pickling ReturnnDimTagsProxy Extensions for Sisyphus serialization of net dict and extern data Feb 7, 2022
@JackTemaki (Contributor, Author)

I just wrote a small piece of code that replaces every instance of ReturnnDimTagsProxy or its subclasses with a Sisyphus CodeWrapper object. That was the simplest solution I could think of, and it works and gives the correct results, at least for the network. Handling extern_data needs more changes, as this was so far managed by helpers in the Sisyphus code.

So pickling dim tags might not be needed.

@albertz (Member)

albertz commented Feb 7, 2022

Why do you need ReturnnDimTagsProxy at all? Why do you call collect_dim_tags_and_transform_config? This is only useful for direct serialization to Python code.

@albertz (Member)

albertz commented Feb 7, 2022

collect_dim_tags_and_transform_config replaces dim tags with such proxy objects. And now you again replace the proxy objects with something else. Instead, you could simply replace the dim tags with something else directly.

@JackTemaki (Contributor, Author)

Because of course the config needs to be serialized at some point (so the dim tags get actual variable names). And from looking at the code, I got the impression that this is exactly the task of ReturnnDimTagsProxy (to give dim tags real unique names). All I wanted is to keep the nested structure, which name_ctx.get_returnn_config_serialized() does not do.

@JackTemaki (Contributor, Author)

This is the current result for a single network:
https://gist.github.com/JackTemaki/21a48df026f7847211a62c9df1f41bd3

Which looks as expected to me.

@albertz (Member)

albertz commented Feb 7, 2022

Because of course the config needs to be serialized at some point (so the DimTags get actual variable names).

So you need to serialize the net dict and extern data. You don't need to serialize ReturnnDimTagsProxy.

And from looking at the code I got the impression that this is exactly the task of the ReturnnDimTagProxy (to give DimTags real unique names).

ReturnnDimTagsProxy is a helper for serializing to Python code. You don't want to serialize ReturnnDimTagsProxy itself.

When you want to use pickling for serialization (it sounds so), then you don't need to care about unique names. The pickle logic does handle that already.

All I wanted is that the nested structure is kept, which the name_ctx.get_returnn_config_serialized() is not doing.

Why is this a requirement?

The generated content (especially the net dict) can be fairly arbitrary. I don't think it is a good idea to hard-code any implicit or explicit assumptions about how this net dict looks, so depending on anything in this nested structure is probably not a good idea.

@JackTemaki (Contributor, Author)

When you want to use pickling for serialization

Pickle is not used for serialization; it is just used to store the job information when passing it from the Sisyphus manager to the Sisyphus worker. But as the config is an input of the job, it tries to pickle whatever is in there, e.g. Dims or the proxies.

Of course, if Dim were picklable, I could just use the config as is, pass it to the job, and then let returnn_common do the serialization. But I am not sure if I like that idea: first, I do not want any returnn_common dependency in the jobs, and second, the (at least current) serialization creates a full config instead of the network only, which causes problems if you use multiple networks (like dynamic stages). So ideally I would like a network representation that has no special classes anywhere in its structure when passing it to the jobs...

I will play around a little bit more to see what seems to be okay to do.

@albertz (Member)

albertz commented Feb 7, 2022

When you want to use pickling for serialization

Pickle is not used for serialization, it is just used to store the job information when passing it from the Sisyphus manager to the Sisyphus worker. But as the config is an input of the job, it tries to pickle whatever is in there, e.g. Dims or the Proxies.

I don't understand. So, what you say means: Pickle is used for serialization? This is what pickle does. It serializes.

So, what do you mean? You don't need the serialization? I thought you need the serialization? The job information must be serialized, or not?

Of course if Dim is picklable I could just use the config as is and pass it to the job, and then let returnn_common do the serialization.

I don't understand. If it is picklable, why not use pickle for the serialization? You don't need returnn_common for serialization when you use pickle for serialization.

But I am not sure if like that idea, as first I do not want any returnn_common dependency in the Jobs, and secondly the (at least current) serialization creates a full config instead of the network only, which gives problems if you use multiple networks (like dynamic stages). So ideally I would like to have a network representation that has no special classes or anything within its structure when passing it to the Jobs...

I don't understand. You don't have any of that. Neither with the current returnn_common serialization (see this example generated config) nor with pickle.

@JackTemaki (Contributor, Author)

The job information must be serialized, or not?

I am sorry, I mixed terms here. Yes, the job information must be serialized, and this is what pickle does. But serializing as in "writing the config into a .py file" is a separate thing (maybe "serializing" should not be used as a term here, as this is non-revertible).

Of course I could write a new Sisyphus job that (if Dim is picklable) just uses the config as is and calls get_returnn_config_serialized() later on. But this is a bad idea, as some of the config parameters strictly need to be controlled by the Sisyphus job. One way would be to pass the py_code_str, extern_data, and network, each already converted to a string, separately. And yes, you said one should never alter the network afterwards, but especially for testing purposes, or for simply concatenating legacy code without having to pack it as a module, this is quite an important feature.

@albertz (Member)

albertz commented Feb 7, 2022

The job information must be serialized, or not?

I am sorry I mixed terms here. Yes, the job information must be serialized, and this is what pickle does. But for serializing as in "writing the config into a .py file", this is a separate thing

Ah yes. For the latter, i.e. generating Python code, you can use ReturnnDimTagsProxy. That is its intended use case, but only for that step. Serializing ReturnnDimTagsProxy itself is not really what you want. You can use it to serialize the config (net dict and extern data) as Python code.

(maybe "serializing" should not be used as term here, as this is non-revertable).

We should just say "generate the config.py" or "generating Python code" to be specific here.

Although this is also one special kind of serialization. And it should also be revertible: you can simply eval the config and get back the objects.

Of course I can write a new Sisyphus Jobs that (if Dim is pickleable) just uses the config as is, and uses the get_returnn_config_serialized() later on.

Note that pickling Dim is implemented here: rwth-i6/returnn#937

But this is a bad idea, as some of the config parameters strictly need to be controlled by the sisyphus job.

But you can still combine this with the other part of the config? This is only about network (the net dict), extern_data, and maybe some other related things. Everything else can be as before.

One way would be to pass the py_code_str, extern_data and network converted already to a string separately.

Yes.

But yes, you said one should never alter the network afterwards, but especially for testing purposes, or simply concatenating legacy code without having to pack it as a module, this is quite an important feature.

For combination with legacy code, this is what I mentioned in #98. I don't see any problem there.

For testing purpose, I'm also not really seeing any problem? You can just edit the file. Or do whatever you want.

Can you be more specific?

@JackTemaki (Contributor, Author)

Okay now we are on the same page.

So the options are (assuming there is some get_network call somewhere in the pipeline that returns the network config):

  1. Convert the resulting network using collect_dim_tags_and_transform_config and replace all occurrences of proxies with CodeWrapper, using some function defined in the pipeline helpers, to just store the variable names. The resulting network is editable as before and will be serialized correctly.

     Advantages:
       • Nearly no changes on the Sisyphus side, except for the needed "network prolog" support to write the dim tag definition code.

     Disadvantages:
       • No custom hashing supported unless CodeWrapper is extended or a derived class is created.

  2. Use network and extern_data as output from name_ctx.get_returnn_config(), and pass them directly to the Sisyphus job (which is now supported because dim tags are serializable). The name resolving will happen inside the create_config job task.

     Advantages:
       • Allows custom hashes for dim tags (which is good if the naming changes with some update).

     Disadvantages:
       • Any RETURNN-related Sisyphus job needs to know about returnn_common in order to use the proxies to resolve the dim tag naming, and more code changes than 1- or 2-line edits are needed.

It is obvious that I would prefer option 1, as fewer code changes in heavily used code always seem the better idea (in addition, this code is not really under our ownership, so changes will be more difficult). Also, we can switch to variant 2 at any point.

Actually, with variant 1 things are simpler than they first seemed, so all I would suggest is adding some kind of interface that returns not the finished config as one string but the components in a more fine-grained way. Also, we could add something like CodeWrapper (which is really only a class that wraps a string so that __repr__ returns it without quotation for writing into a file) directly on the returnn_common side, which could be extended with custom hashing.
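The CodeWrapper idea described above can be sketched in a few lines (this is an illustrative stand-in, not the actual Sisyphus class):

```python
class CodeWrapper:
    """Sketch of a Sisyphus-style CodeWrapper: __repr__ returns the
    wrapped string without quotation, so repr() of a config dict
    containing it emits a bare variable reference as Python code."""

    def __init__(self, code: str):
        self.code = code

    def __repr__(self):
        return self.code

# A dim tag proxy replaced by its variable name; repr() of the dict
# then yields code that refers to a dim tag defined in the prolog:
net = {"out_dim": CodeWrapper("ctc_out_dim")}
assert repr(net) == "{'out_dim': ctc_out_dim}"
```

A hashing-aware variant would additionally define a Sisyphus hash method, which is the extension point mentioned above.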

@JackTemaki (Contributor, Author)

JackTemaki commented Feb 7, 2022

Ok, for now I added some rather ugly hacks (storing the prolog code as a network dict entry and retrieving it within the job later on), but the general idea (option 1 from above) does work. So I only had to change 2 lines within i6_core to enable writing the dimension tags for each specific network. If this is not needed (no dynamic training), it works out of the box; additional code is only needed on the pipeline side (not within core jobs) to correctly prepare the network for Sisyphus usage. We can then discuss which functionality of this should be moved into returnn_common (if any). I guess we can further discuss this here.

@albertz (Member)

albertz commented Feb 7, 2022

Hashing the dim tag is by far not the only thing where hashing is relevant. There is issue #51 specifically about this topic. We better should move any discussion regarding hashing over there.

In any case, I don't understand why the type of serialization matters for the hashing. I thought the Sisyphus hashing happens on the Job and its arguments, e.g. such a config object. How we define and handle the hashes should be totally independent of how we serialize and/or how we generate the Python code for the config.py.

Hashing is very relevant though. I still think that the Sis hash should not just use the net dict but better the module hierarchy. The net dict will still change a lot while the underlying behavior of most modules (e.g. Linear, LSTM, Transformer, Conformer) and functions is already stable now. But again, this is exactly the discussion in #51.

Allows custom hashes for DimTags (this is good if the naming changes at some update)

What naming are you referring to? Dim.description? Or the local variable name by ReturnnDimTagProxy? None of that is stable. This is actually not so clear, how we should define the hash. But this is #51 again.

@albertz (Member)

albertz commented Feb 8, 2022

The Sis hash for the dim tags doesn't have to be complicated though. We can iterate through them and assign them indices, ordered by first occurrence, including and starting with extern data. This index can be the hash, perhaps together with the dim's kind. This should be enough.
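That index-by-first-occurrence scheme could look roughly like this (a sketch with a hypothetical `FakeDim` stand-in; a real implementation would walk RETURNN's actual config structures):

```python
class FakeDim:
    """Hypothetical stand-in for RETURNN's Dim."""
    def __init__(self, kind):
        self.kind = kind

def assign_dim_indices(*structures):
    """Assign each distinct dim a (index, kind) pair by first occurrence.
    Pass extern_data first so its dims get the lowest indices."""
    indices = {}

    def visit(obj):
        if isinstance(obj, FakeDim):
            # setdefault: a dim seen again keeps its first index
            indices.setdefault(id(obj), (len(indices), obj.kind))
        elif isinstance(obj, dict):
            for value in obj.values():
                visit(value)
        elif isinstance(obj, (list, tuple)):
            for value in obj:
                visit(value)

    for structure in structures:
        visit(structure)
    return indices

time_dim, feat_dim = FakeDim("spatial"), FakeDim("feature")
extern_data = {"data": [time_dim, feat_dim]}
net_dict = {"linear": {"out": feat_dim}}  # feat_dim reused, no new index
indices = assign_dim_indices(extern_data, net_dict)
assert sorted(indices.values()) == [(0, "spatial"), (1, "feature")]
```

The (index, kind) pair would then feed into the Sisyphus hash in place of the unstable description or variable name.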

@albertz (Member)

albertz commented Feb 8, 2022

Except for the aspect of hashing (which is a separate issue: #51), from our discussion today it sounds like we have everything (mostly) ready here in returnn-common. Nothing is really missing. So I'm closing this now.
