Patch for /trunk/demo-train-big-model-v1.sh #20

Open
GoogleCodeExporter opened this issue Jun 14, 2015 · 4 comments

Comments

@GoogleCodeExporter

Fixed a couple of bugs:
1. Name mismatch with the UMBC-webbase corpus archive
2. Missing download of the phrases dataset

Original issue reported on code.google.com by [email protected] on 15 Sep 2014 at 6:42

Attachments:

@GoogleCodeExporter
Author

Thanks, I fixed the second part (the missing download of
questions-phrases.txt). However, I don't know what the first problem is about;
this part of the script runs OK for me.

Original comment by [email protected] on 15 Sep 2014 at 9:23

@GoogleCodeExporter
Author

1. Is your shell case-insensitive? Or does it implicitly add the .tar.gz
suffix? The script downloads UMBC-webbase-corpus but then extracts
umbc_webbase_corpus.tar.gz.

2. The corpus contains two types of files: plain-text files (.txt) and parsed
files (.possf2). I assume you are only interested in the .txt files, so the
script should iterate over those files only.
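Restricting the processing to the plain-text files could look roughly like this. It is a minimal sketch, assuming the archive has already been extracted into a directory; `concat_txt_only` and its arguments are hypothetical names, not taken from demo-train-big-model-v1.sh.

```bash
# Hypothetical helper (not from demo-train-big-model-v1.sh):
# concatenate only the plain-text (.txt) files of the extracted UMBC corpus,
# skipping the parsed .possf2 files.
concat_txt_only() {
  local corpus_dir="$1"   # directory the archive was extracted into
  local out_file="$2"     # should live outside corpus_dir, or it may be picked up too
  # -name '*.txt' matches only the plain-text files, so .possf2 files are ignored;
  # -exec ... {} + passes many file names to each cat call.
  find "$corpus_dir" -type f -name '*.txt' -exec cat {} + > "$out_file"
}
```

Using `find` rather than a shell glob also picks up .txt files in subdirectories, in case the archive nests them.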

Original comment by [email protected] on 16 Sep 2014 at 8:30

@GoogleCodeExporter
Author

I just noticed that when I download
http://ebiquity.umbc.edu/redirect/to/resource/id/351/UMBC-webbase-corpus
through my browser, I also get umbc_webbase_corpus.tar.gz, as the script
expects. However, when I download it with wget, I get a file named
UMBC-webbase-corpus. This might explain the difference. I also noticed that the
script already processes only the .txt files, so that part is fine.
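If the script should work whichever name the download ends up with, one option is to look for both names reported in this thread before extracting. A minimal sketch, assuming both names refer to the same gzip-compressed tarball; `extract_umbc_corpus` is a hypothetical helper, not part of the script.

```bash
# Hypothetical helper: extract the UMBC archive under either file name seen in
# this thread (umbc_webbase_corpus.tar.gz or UMBC-webbase-corpus).
extract_umbc_corpus() {
  local dest="${1:-.}"    # extraction target directory (must already exist)
  local archive
  for archive in umbc_webbase_corpus.tar.gz UMBC-webbase-corpus; do
    if [ -f "$archive" ]; then
      # Assumes both names are the same gzip-compressed tarball, so -z fits either.
      tar -xzf "$archive" -C "$dest"
      return
    fi
  done
  echo "UMBC archive not found under either expected name" >&2
  return 1
}
```

This tolerates the mismatch rather than removing it; pinning the download name at the source is the more direct fix.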

Original comment by [email protected] on 17 Sep 2014 at 8:25

@GoogleCodeExporter
Author

I get umbc_webbase_corpus.tar.gz when using wget, so the cause must be
something else. If more people run into the same problem, I may have to update
the script to give the downloaded file an explicit name.
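Giving the output file an exact name could look roughly like the sketch below. `wget -O FILE` writes the downloaded data to FILE regardless of the URL or any redirect; the variable and function names are hypothetical. wget's `--trust-server-names` and `--content-disposition` options also influence the chosen name, but whether either reproduces umbc_webbase_corpus.tar.gz for this URL is an untested assumption.

```bash
# Hypothetical names; only the URL comes from this thread.
UMBC_URL="http://ebiquity.umbc.edu/redirect/to/resource/id/351/UMBC-webbase-corpus"
UMBC_ARCHIVE="umbc_webbase_corpus.tar.gz"

download_umbc_corpus() {
  local out="${1:-$UMBC_ARCHIVE}"
  # -O fixes the local file name, so it no longer depends on the wget version
  # or on how the redirect is resolved. On failure, remove the (possibly empty
  # or partial) file that -O leaves behind.
  wget -O "$out" "$UMBC_URL" || { rm -f "$out"; return 1; }
}
```

With the name pinned, the extraction step can refer to `$UMBC_ARCHIVE` directly.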

Original comment by [email protected] on 17 Sep 2014 at 5:48
