
seq2seq missing file #3

Open
AngusMonroe opened this issue Jul 22, 2019 · 1 comment

@AngusMonroe

./seq2seq/data/invalid_kg.json and ./seq2seq/outputs/20190511/valid/2019-5-1-transformer-8w_step_9000.pt seem to be missing. Could they be added, or could you tell me how to generate these files?

Moreover, I got an "unrecognized arguments" error when I tried to run preclean.py with the default config:

usage: preclean.py [-h] [--log_file LOG_FILE]
                   [--raw_train_file RAW_TRAIN_FILE]
                   [--raw_dev_file RAW_DEV_FILE] [--save_dir SAVE_DIR]
                   [--raw_test_file RAW_TEST_FILE]
preclean.py: error: unrecognized arguments: --train_sample_file_save data/train_sample.txt --dev_sample_file_save data/dev_sample.txt --test_sample_file_save data/test_sample.txt --train_text_file_save data/train.src --dev_text_file_save data/dev.src --test_text_file_save data/test.src --train_topic_file_save data/train_topic.txt --dev_topic_file_save data/dev_topic.txt --test_topic_file_save data/test_topic.txt --train_tgt_file data/train.tgt --dev_tgt_file data/dev.tgt --test_tgt_file data/test.tgt
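The message format is argparse's: the parser in preclean.py defines only the five options listed in the usage line, so any extra config key passed as a flag is rejected. A minimal sketch reproducing this, assuming the YAML keys are forwarded as `--key value` pairs (the helpers `build_preclean_parser` and `config_to_argv` are hypothetical, not taken from the repository):

```python
import argparse


def build_preclean_parser() -> argparse.ArgumentParser:
    # Options copied from the usage message above; everything else is assumed.
    parser = argparse.ArgumentParser(prog="preclean.py")
    parser.add_argument("--log_file")
    parser.add_argument("--raw_train_file")
    parser.add_argument("--raw_dev_file")
    parser.add_argument("--save_dir")
    parser.add_argument("--raw_test_file")
    return parser


def config_to_argv(config: dict) -> list:
    # Hypothetical helper: turn each config key/value into "--key value".
    argv = []
    for key, value in config.items():
        argv.extend([f"--{key}", str(value)])
    return argv


if __name__ == "__main__":
    config = {
        "raw_train_file": "data/train.txt",
        "save_dir": "data/",
        "train_sample_file_save": "data/train_sample.txt",  # not defined by the parser
    }
    parser = build_preclean_parser()

    # parse_known_args returns the strings it could not match instead of exiting.
    args, unknown = parser.parse_known_args(config_to_argv(config))
    print(args.raw_train_file)  # data/train.txt
    print(unknown)  # ['--train_sample_file_save', 'data/train_sample.txt']

    # parse_args prints the usage and "unrecognized arguments" error, then exits.
    try:
        parser.parse_args(config_to_argv(config))
    except SystemExit as exc:
        print("exit code", exc.code)  # exit code 2
```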

It runs successfully after I change ./seq2seq/config/preclean.yml as follows:

raw_train_file: "data/train.txt"
raw_dev_file: "data/dev.txt"
raw_test_file: "data/test.txt"
log_file: "outputs/log/log.txt"
save_dir: "data/"

Please check whether the configuration is wrong or whether I am missing a step that causes this problem.

Thank you!

@circlePi
Owner

@AngusMonroe

  1. invalid_kg.json contains the knowledge that was filtered out by hand-crafted rules. More specifically, for the training set we filtered out knowledge that is irrelevant to, or not useful for, the current dialogue or response.
  2. ./seq2seq/outputs/20190511/valid/2019-5-1-transformer-8w_step_9000 is the seq2seq model checkpoint trained on the training set. I will upload it soon.
  3. preclean.yml is shared by seq2seq/preclean.py, preclean_baidu.py, and preclean_baidu_aug.py, not just preclean.py. You can find the corresponding parameters in the preclen_opt() function of each script. In our case, the data were mainly produced by preclean_baidu.py and preclean_baidu_aug.py rather than preclean.py, so we modified some config entries that may no longer fit preclean.py. Your change is correct.
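Because one preclean.yml feeds several scripts with different option sets, a script that rejects unknown keys will break whenever the shared file gains a key it does not define. One possible workaround is sketched below, assuming the YAML has already been loaded into a dict and each key is forwarded as a `--key value` flag; `parse_shared_config` is a hypothetical helper, not part of the repository:

```python
import argparse
import warnings


def parse_shared_config(parser: argparse.ArgumentParser, config: dict) -> argparse.Namespace:
    """Parse the config keys this parser defines and warn about the rest.

    Hypothetical helper: lets one shared config dict feed several scripts
    whose argparse option sets differ.
    """
    argv = []
    for key, value in config.items():
        argv.extend([f"--{key}", str(value)])
    # parse_known_args collects unmatched strings instead of exiting.
    args, unknown = parser.parse_known_args(argv)
    # Keep only the flag names (drop their values) for the warning message.
    ignored = [s[2:] for s in unknown if s.startswith("--")]
    if ignored:
        warnings.warn(f"{parser.prog}: ignoring config keys {ignored}")
    return args


if __name__ == "__main__":
    shared = {
        "raw_train_file": "data/train.txt",
        "save_dir": "data/",
        "train_tgt_file": "data/train.tgt",  # used by another script only
    }
    parser = argparse.ArgumentParser(prog="preclean.py")
    parser.add_argument("--raw_train_file")
    parser.add_argument("--save_dir")
    args = parse_shared_config(parser, shared)  # warns about train_tgt_file
    print(args.raw_train_file, args.save_dir)
```

Warning rather than silently dropping keys keeps typos in the config visible; keeping a separate config file per script, as in the fix above, avoids the question entirely.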
