Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None. #96
Comments
I am having the same problem. It seems that http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz is no longer available. The maintainer of this repository, https://github.com/PetrochukM/PyTorch-NLP/blob/master/torchnlp/datasets/multi30k.py, writes: "Host Hope this offers some insight into the problem.
Thank you for the info, @aambrioso1.
@tiassap I ran into the same problem you described. Did you find a workaround to access those files?
I was able to get the code to work by using another data file. The basic idea is that the training, validation, and test sets are all lists of tuples, and each tuple holds one sentence pair: the same sentence in each of the two languages. This is convenient because it makes it easy to build a dataset for any language pairing you like. Here is my implementation in Colab, along with lots of notes: https://colab.research.google.com/drive/131hohvAKRqzHg4K3_68UGL4oi4SGOB45?usp=sharing
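To make the "lists of tuples" idea concrete, here is a minimal sketch of how such a dataset could be built by hand. It is not the code from the Colab notebook; the helper names `make_pairs` and `split_pairs`, the split fractions, and the sample sentences are all made up for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def make_pairs(src_lines: List[str], tgt_lines: List[str]) -> List[Pair]:
    """Zip two parallel lists of sentences into (source, target) tuples."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    return [(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)]


def split_pairs(pairs: List[Pair], valid_frac: float = 0.2, test_frac: float = 0.2):
    """Split the pairs, in order, into train / validation / test lists."""
    n = len(pairs)
    n_valid = int(n * valid_frac)
    n_test = int(n * test_frac)
    n_train = n - n_valid - n_test
    train = pairs[:n_train]
    valid = pairs[n_train:n_train + n_valid]
    test = pairs[n_train + n_valid:]
    return train, valid, test


# Hypothetical sample data; in practice these would be read from two
# line-aligned text files, one per language.
german = ["Guten Morgen.", "Wie geht es dir?", "Ich habe Hunger.",
          "Das ist ein Hund.", "Danke schön."]
english = ["Good morning.", "How are you?", "I am hungry.",
           "That is a dog.", "Thank you very much."]

train, valid, test = split_pairs(make_pairs(german, english))
```

Each of `train`, `valid`, and `test` is then a plain list of `(German, English)` tuples, so swapping in a different language pair only means supplying different parallel sentence lists.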
Thank you @aambrioso1. It is very helpful. So we can use other datasets as well with data format Just for information, @youbinaa: it seems that Multi30k can also be downloaded from this repo: https://github.com/multi30k/dataset. The problem is because the URL source of
How can I download it in Colab? I mean, what change do I need to make in the code so that this call downloads the data?
train, val, test = datasets.Multi30k('data', language_pair=("de", "en"))
from torchtext.datasets import multi30k
multi30k.URL["train"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"
multi30k.MD5["train"] = "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e"
https://discuss.pytorch.org/t/build-vocab-from-iterator-does-not-work-in-notebook/153575/16
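A side note on the checksum in the override above: although the dictionary is called `MD5`, the value assigned to it is 64 hexadecimal characters long, which is the length of a SHA-256 digest (an MD5 digest has 32). If you download the archive yourself and want to check it against that value, something like the following sketch could be used. It assumes the value is a SHA-256 digest, which this thread does not confirm, and the helper names `sha256_of_file` and `matches_expected` are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with an expected hex string."""
    # Assumption: the expected value is a SHA-256 digest (64 hex characters).
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("training.tar.gz", multi30k.MD5["train"])` would report whether a locally downloaded archive matches the overridden value, provided the SHA-256 assumption holds.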
Thanks! It works!
I ran the code on Google Colab.
When building the German vocabulary here:
This error showed up:
Is this a problem with torchtext?
I found that this error occurred when calling this line:
Thank you in advance.