Direct corresponding helpers for RETURNN Datasets #231
Comments
Actually But you are right, then it goes further and also mixes this up a bit with actual dataset instances (corpora such as Librispeech). It also combines train/dev/eval already right in the base class. I think its use is somewhat rare. I have some demos which still use the RETURNN Otherwise, I currently started to use the base class ( I'm not sure who else uses it actively. I assume no-one. If this is the case, we can safely move and rename this code to sth like In case someone uses the current code, and we don't want to break it, we can also put your code as |
For specific corpora, or instances of the datasets, or helper functions to get such instances, I would still put those under |
For discussions on the code, such as review comments, we can keep using this issue here. I agree, I think we don't need PR-based development at the moment while this is new and no-one else has really started to use it. Only later, once some people have started to use it, can we switch over to PR-based development. |
If we go with your |
Yes, this is what I imagined. I think no one is using the current dataset code except you. Just move/rename it how you want it, and then I will start moving my code to where you want to have it. |
And then afterwards we can check how to adapt the rest of the code. I will probably not use |
Ok, I moved it to |
@JackTemaki What's the state here? |
There were other tasks with a higher priority, so I postponed this. How should I deal with the Sisyphus dependency for paths? Like this?

    try:
        from sisyphus import tk
        FilePathType = Union[tk.Path, str]
    except:
        FilePathType = str

Then I still have the problem that I want to disallow `str` when working in a Sisyphus environment. An assert that figures out if the code is run within a Sisyphus manager would be nice, but I am not sure how to determine this. |
Obviously never
We can easily introduce such a thing in Sisyphus. E.g. in Sisyphus |
    in_sisyphus_config = False
    try:
        from sisyphus import tk
        from sisyphus.loader import config_manager
        FilePathType = Union[tk.Path, str]
        if config_manager.current_config is not None:
            in_sisyphus_config = True
    except ImportError:
        FilePathType = str

Then |
This should not be in the module scope. You should put this into a helper function, and call that. |
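A minimal sketch of what such a helper function could look like, for illustration only. It reuses `sisyphus.loader.config_manager.current_config` exactly as it appears in the snippet above; the function names `in_sisyphus_config` and `check_file_path` are hypothetical and not part of `returnn_common` or Sisyphus. The import happens inside the function, so nothing is evaluated at module scope and the module stays importable without Sisyphus.

```python
from typing import Any


def in_sisyphus_config() -> bool:
    """Return True if a Sisyphus config is currently being loaded.

    Assumption: `config_manager.current_config` behaves as in the snippet
    posted earlier in this thread. Without Sisyphus installed, return False.
    """
    try:
        from sisyphus.loader import config_manager
    except ImportError:
        return False
    return getattr(config_manager, "current_config", None) is not None


def check_file_path(path: Any) -> Any:
    """Hypothetical check: disallow plain `str` paths inside a Sisyphus config."""
    if in_sisyphus_config():
        from sisyphus import tk

        if not isinstance(path, tk.Path):
            raise TypeError(f"expected tk.Path inside a Sisyphus config, got {type(path).__name__}")
    elif not isinstance(path, str):
        raise TypeError(f"expected str outside of Sisyphus, got {type(path).__name__}")
    return path
```

Because the check runs at call time, the same dataset helper code can be used both from a Sisyphus config and from a plain script.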
Can you describe the state here? |
The |
Currently `returnn_common` is lacking any direct interface for defining RETURNN datasets. We just have some generic interface for a task or some very hard-coded settings for Librispeech, which do not fit with the current from-scratch Sisyphus pipelines.

Originally it was my plan to move everything from https://github.com/rwth-i6/i6_experiments/tree/main/users/rossenbach/common_setups/returnn/datasets to `i6_experiments/common`, but I understand that this better belongs here.

Before I start pushing code (I also understand PR-driven development is too slow at this stage), I would like to discuss where to put the code, and use this thread as a discussion starting point.

There is currently the module `datasets`, which contains the `DatasetConfig` interface. Unfortunately, the word "dataset" is used in two senses: once as dataset in the sense of "corpus/task", and once as dataset in the sense of "RETURNN dataset/dataset type". I am not sure how to handle this optimally.

So @albertz, where would you put this and how would you name it?
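To make the second sense ("RETURNN dataset type") concrete, here is a rough sketch of what a direct helper for one RETURNN dataset type might look like. The class name `HdfDatasetHelper` and the method `as_returnn_opts` are hypothetical and only meant as a discussion aid. The sketch assumes the usual RETURNN convention of describing a dataset as a dict with a `"class"` key, here `HDFDataset` with its `files` option and the generic `partition_epoch` and `seq_ordering` options.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HdfDatasetHelper:
    """Hypothetical helper that mirrors one RETURNN dataset type (HDFDataset)."""

    files: List[str]
    partition_epoch: int = 1
    seq_ordering: str = "default"
    # Further RETURNN dataset options, passed through unchanged.
    extra_opts: Dict[str, Any] = field(default_factory=dict)

    def as_returnn_opts(self) -> Dict[str, Any]:
        """Build the dataset dict as it would be placed in a RETURNN config."""
        opts: Dict[str, Any] = {
            "class": "HDFDataset",
            "files": list(self.files),
            "partition_epoch": self.partition_epoch,
            "seq_ordering": self.seq_ordering,
        }
        opts.update(self.extra_opts)
        return opts


# Usage: one helper instance per train/dev/eval split, kept separate from any corpus logic.
train = HdfDatasetHelper(files=["train.hdf"], partition_epoch=20, seq_ordering="random")
train_opts = train.as_returnn_opts()
```

A helper like this only knows about the dataset type, so corpus-specific code (for example for Librispeech) could build on it without being mixed into it.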