Extensions for Sisyphus serialization of net dict and extern data #104
Comments
Pickling of |
Of course, we could add logic which makes pickling of Basically, you don't want to pickle You also want to pickle the net dict jointly with |
Why do you want to pickle So, if this issue here is actually about the question/issue on how to pickle the config, let's rephrase it that way, and discuss this, instead of discussing how to pickle |
Okay, so this is not easily solvable; then let's close it here and continue at some other point. The Sisyphus connection things should better be kept outside of |
What do you refer to? Pickling the config (net dict, extern data), or pickling The former, why do you think this is not easily solvable? For whatever Sisyphus needs for serialization, we might need to extend sth in returnn-common, or maybe also in RETURNN itself. So it makes sense to keep an open issue about this until this is resolved. |
This was about the latter. And yes, we agreed it is not relevant, so I wanted to close the issue.
We can rename the issue and continue on the serialization if you want |
I just wrote a small code part that replaces every instance of So pickling dim tags might not be needed. |
Why do you need |
Because, of course, the config needs to be serialized at some point (so the DimTags get actual variable names). And from looking at the code, I got the impression that this is exactly the task of the ReturnnDimTagsProxy (to give DimTags real unique names). All I wanted is that the nested structure is kept, which the |
This is the current result for a single network: Which looks as expected to me. |
So you need to serialize the net dict and extern data. You don't need to serialize
When you want to use pickling for serialization (it sounds so), then you don't need to care about unique names. The pickle logic does handle that already.
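To illustrate the point about unique names: pickle memoizes objects within a single dump, so a dim tag that is referenced from both extern data and the net dict comes back as one shared object. This is a minimal sketch; `types.SimpleNamespace` is only a hypothetical stand-in for a dim tag, not the actual RETURNN class, and the config layout is made up for the example.

```python
import pickle
import types

# Hypothetical stand-in for a dim tag (NOT the real RETURNN Dim class).
time_dim = types.SimpleNamespace(description="time", kind="spatial")

# The same object is referenced from extern data and from the net dict.
config = {
    "extern_data": {"data": {"dim_tags": [time_dim]}},
    "network": {"output": {"class": "copy", "from": "data", "out_dim": time_dim}},
}

# One dump/load round trip of the whole config.
restored = pickle.loads(pickle.dumps(config))

restored_extern_dim = restored["extern_data"]["data"]["dim_tags"][0]
restored_net_dim = restored["network"]["output"]["out_dim"]

# Identity is preserved because both references were pickled jointly.
shared_after_roundtrip = restored_extern_dim is restored_net_dim
```

Note that this only holds when net dict and extern data are pickled together in one call; two separate dumps would yield two independent copies.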
Why is this a requirement? The generated content (esp. the net dict) can be kind of arbitrary. I don't think it is a good idea to hard-code any implicit or explicit assumptions about what this net dict looks like, so depending on anything of this nested structure is probably not a good idea. |
Pickle is not used for serialization, it is just used to store the job information when passing it from the Sisyphus manager to the Sisyphus worker. But as the config is an input of the job, it tries to pickle whatever is in there, e.g. Dims or the Proxies. Of course if I will play around a little bit more to see what seems to be okay to do. |
I don't understand. So, what you say means: Pickle is used for serialization? This is what pickle does. It serializes. So, what do you mean? You don't need the serialization? I thought you need the serialization? The job information must be serialized, or not?
I don't understand. If it is picklable, why not use pickle for the serialization? You don't need
I don't understand. You don't have any of that. Neither with the current |
I am sorry, I mixed terms here. Yes, the job information must be serialized, and this is what pickle does. But serializing as in "writing the config into a .py file" is a separate thing (maybe "serializing" should not be used as a term here, as this is non-reversible). Of course I can write a new Sisyphus Job that (if Dim is pickleable) just uses the config as is, and uses the |
Ah yes. For the latter, i.e. generating Python code, there you can use
We should just say "generate the config.py" or "generating Python code" to be specific here, although this is also one special kind of serialization. And it should also be reversible. You simply can
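As a rough sketch of why generated Python code can count as a reversible serialization: for a config made only of plain literals, one can emit assignments and recover the values by executing the code again. The helper names `generate_config_code` and `load_config_code` are hypothetical and not part of returnn-common or Sisyphus; real configs with dim tags would need the dedicated code generation discussed in this thread.

```python
import pprint


def generate_config_code(config):
    """Emit one `name = literal` assignment per top-level config entry."""
    lines = [f"{key} = {pprint.pformat(value)}" for key, value in config.items()]
    return "\n".join(lines) + "\n"


def load_config_code(code):
    """Execute generated code and collect the top-level names it defines."""
    namespace = {}
    exec(code, namespace)
    # exec adds __builtins__ to the namespace; drop such dunder entries.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


# Made-up example config containing only plain Python literals.
config = {
    "extern_data": {"data": {"dim": 40}},
    "network": {"output": {"class": "softmax", "from": "data", "n_out": 10}},
}

code = generate_config_code(config)
restored = load_config_code(code)
```

Only execute code generated by yourself this way, since `exec` runs arbitrary Python.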
Note that pickling
But you can still combine this with the other part of the config? This is only about
Yes.
For combination with legacy code, this is what I mentioned in #98. I don't see any problem there. For testing purpose, I'm also not really seeing any problem? You can just edit the file. Or do whatever you want. Can you be more specific? |
Okay now we are on the same page. So the options are (assuming there is some
Advantages:
Disadvantages:
Advantages:
Disadvantages:
It is obvious that I would prefer option 1, as fewer changes in heavily used code always seem the better idea (in addition, this code is not really under our ownership, so changes will be more difficult). Also, we can switch to variant 2 at any point. Actually, with variant 1 things are simpler than they first seemed to be, so all I would suggest is adding some kind of interface that does not return the finished config directly as a string but exposes its components in a more fine-grained way. Also, we could add something like CodeWrapper (which is really only a class which wraps a string as |
Ok for now I added some rather ugly hacks (storing the prolog code as network dict entry, and retrieving this within the job later on), but the general idea (1. from above) does work. So I only had to change 2 lines within |
Hashing the dim tag is by far not the only thing where hashing is relevant. There is issue #51 specifically about this topic. We should move any discussion regarding hashing over there. In any case, I don't understand why the type of serialization matters for the hashing. I thought the Sisyphus hashing happens on the Job and its arguments, e.g. such a config object. How we define and handle the hashes should be totally independent of the question of how we serialize and/or how we generate the Python code for the config.py. Hashing is very relevant though. I still think that the Sis hash should not just use the net dict but better the module hierarchy. The net dict will still change a lot while the underlying behavior of most modules (e.g.
What naming are you referring to? |
The Sis hash for the dim tags doesn't have to be complicated though. We can iterate through them and assign them indices, just ordered by first occurrence, including and starting with extern data. This index can be the hash, maybe together with its kind. This should be enough. |
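The index scheme described above could look roughly like this. It is only a sketch under stated assumptions: `Dim` is a hypothetical stand-in class, `assign_dim_indices` is a made-up helper, and the traversal only covers nested dicts, lists and tuples, with extern data visited first.

```python
class Dim:
    """Hypothetical stand-in for a dim tag (NOT the real RETURNN class)."""

    def __init__(self, kind, description):
        self.kind = kind
        self.description = description


def assign_dim_indices(extern_data, net_dict):
    """Map each dim tag to (index, kind), indexed by order of first occurrence."""
    indices = {}  # id(dim) -> (index, kind)

    def visit(obj):
        if isinstance(obj, Dim):
            # setdefault keeps the index from the first occurrence.
            indices.setdefault(id(obj), (len(indices), obj.kind))
        elif isinstance(obj, dict):
            for value in obj.values():
                visit(value)
        elif isinstance(obj, (list, tuple)):
            for item in obj:
                visit(item)

    visit(extern_data)  # extern data comes first, as proposed above
    visit(net_dict)
    return indices


time_dim = Dim("spatial", "time")
feat_dim = Dim("feature", "input-feature")
hidden_dim = Dim("feature", "hidden")

extern_data = {"data": {"dim_tags": [time_dim, feat_dim]}}
net_dict = {
    "linear": {"class": "linear", "from": "data", "out_dim": hidden_dim},
    "output": {"class": "reduce", "from": "linear", "axis": time_dim},
}

dim_hashes = assign_dim_indices(extern_data, net_dict)
```

The resulting `(index, kind)` pair does not depend on the dim description or on any generated variable name, so renaming a dim would leave such a hash unchanged.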
Except for the aspect of hashing (which is a separate issue: #51), from our discussion today, it sounds like we have everything (mostly) ready here in returnn-common. Nothing is really missing. So I'm closing this now. |
When Sisyphus stores a job, its whole content is pickled before the tasks themselves are executed.
Pickling ReturnnDimTagsProxy instances works, but their representation call stops working after unpickling them:
While I might find a solution to my workflow that does not need pickling, I still wanted to raise this issue. Maybe there is a very simple solution for this.
The config was created using: