problem of data preprocess #18

Open
thinkingmanyangyang opened this issue May 24, 2021 · 1 comment
Comments

@thinkingmanyangyang

How do you preprocess the data, for example for DailyDialog?
I see that you are directly using the file "data/interim/dialog/train_sentences.tsv". How did you get it? Thank you.

@vikigenius
Owner

For training the VAE, you just need all the utterances in the training dataset. Remove the context/turn information and extract every utterance independently. If you are using a different training script, make sure to shuffle them.
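A minimal sketch of that step, not the script actually used for this repo. It assumes each input line holds one dialogue with turns separated by `__eou__` (the layout of the public DailyDialog text release, as far as I know); the function name `extract_utterances` and the `seed` parameter are made up for illustration. How the result should be laid out in `train_sentences.tsv` is not stated here, so writing the file is left out.

```python
import random

# Assumption: turns within a dialogue line are separated by this token.
EOU = "__eou__"


def extract_utterances(dialog_lines, seed=0):
    """Flatten dialogues into a shuffled list of standalone utterances.

    Hypothetical helper: drops all context/turn information and keeps
    only the non-empty utterance strings.
    """
    utterances = []
    for line in dialog_lines:
        for utterance in line.split(EOU):
            utterance = utterance.strip()
            if utterance:  # skip the empty piece after the final delimiter
                utterances.append(utterance)
    # Shuffle with a private RNG so the order is reproducible.
    random.Random(seed).shuffle(utterances)
    return utterances


if __name__ == "__main__":
    sample = [
        "Hi , how are you ? __eou__ Fine , thanks . __eou__",
        "See you later . __eou__",
    ]
    for utterance in extract_utterances(sample):
        print(utterance)
```

To run it on a real file, pass the open file object (an iterable of lines) instead of `sample`.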

For the GAN, the preprocessing just takes all pairs of consecutive utterances from the original dataset and deduplicates them.
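The pairing could look roughly like the sketch below. Again this is an illustration under assumptions, not the original preprocessing code: it assumes one dialogue per line with `__eou__` between turns, and `consecutive_pairs` is a hypothetical name. Pairs are only formed within a dialogue, never across two dialogues.

```python
# Assumption: turns within a dialogue line are separated by this token.
EOU = "__eou__"


def consecutive_pairs(dialog_lines):
    """Return deduplicated (utterance, next_utterance) pairs.

    Hypothetical helper: keeps the first occurrence of each pair and
    preserves the order in which pairs are first seen.
    """
    seen = set()
    pairs = []
    for line in dialog_lines:
        turns = [t.strip() for t in line.split(EOU) if t.strip()]
        # zip(turns, turns[1:]) yields each turn with the one that follows it.
        for pair in zip(turns, turns[1:]):
            if pair not in seen:
                seen.add(pair)
                pairs.append(pair)
    return pairs


if __name__ == "__main__":
    sample = [
        "Hi . __eou__ Hello . __eou__ How are you ? __eou__",
        "Hi . __eou__ Hello . __eou__",
    ]
    for source, target in consecutive_pairs(sample):
        print(source, "\t", target)
```

A dialogue with a single turn contributes no pairs, since there is nothing to pair it with.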
