Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None. #96

tiassap · 2022-06-14T05:23:15Z

I ran the code on Google Colab.

When building the German vocabulary here:

if is_interactive_notebook():
    # global variables used later in the script
    spacy_de, spacy_en = show_example(load_tokenizers)
    vocab_src, vocab_tgt = show_example(load_vocab, args=[spacy_de, spacy_en])

This error showed up:

Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.

Is this a problem with torchtext?
I found that the error occurs when this line is called:

vocab_src = build_vocab_from_iterator(
    yield_tokens(train + val + test, tokenize_de, index=0),
    min_freq=2,
    specials=["<s>", "</s>", "<blank>", "<unk>"],
)

Thank you in advance.


aambrioso1 · 2022-06-15T15:15:27Z

I am having the same problem. It seems that the site

http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz

is no longer available.

The maintainer of this repository:

https://github.com/PetrochukM/PyTorch-NLP/blob/master/torchnlp/datasets/multi30k.py

writes:

"Host www.quest.dcs.shef.ac.uk forgot to update their SSL certificate; therefore, this dataset does not download securely."

Hope this offers some insight into the problem.

tiassap · 2022-06-21T04:56:29Z

Thank you for the info @aambrioso1

youbinaa · 2022-06-21T10:52:03Z

@tiassap I ran into the same problem you described. Did you find a workaround to access those files?

aambrioso1 · 2022-06-23T15:15:10Z

I was able to get the code to work by using another data file. The basic idea is that the training, validation, and test sets are all lists of tuples, where each tuple is a sentence pair with one sentence per language. This insight is nice because it makes it easy to create any language pairing you like. Here is my implementation in Colab, along with lots of notes:

https://colab.research.google.com/drive/131hohvAKRqzHg4K3_68UGL4oi4SGOB45?usp=sharing
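A minimal sketch of that idea, assuming two line-aligned plain-text files (one sentence per line, same number of lines in each). The function names `load_parallel_pairs` and `split_pairs`, the file paths, and the 80/10/10 split are hypothetical and are not taken from the linked notebook.

```python
def load_parallel_pairs(src_path, tgt_path):
    """Read two line-aligned files into a list of (source, target) tuples."""
    with open(src_path, encoding="utf-8") as f_src, \
            open(tgt_path, encoding="utf-8") as f_tgt:
        src_lines = f_src.readlines()
        tgt_lines = f_tgt.readlines()
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target files must have the same number of lines")
    # Keep only pairs where both sides are non-empty after stripping whitespace.
    return [
        (s.strip(), t.strip())
        for s, t in zip(src_lines, tgt_lines)
        if s.strip() and t.strip()
    ]


def split_pairs(pairs, val_fraction=0.1, test_fraction=0.1):
    """Split pairs into (train, val, test) lists, keeping the original order."""
    n_val = int(len(pairs) * val_fraction)
    n_test = int(len(pairs) * test_fraction)
    n_train = len(pairs) - n_val - n_test
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]
    return train, val, test
```

With hypothetical files `corpus.de` and `corpus.en`, `train, val, test = split_pairs(load_parallel_pairs("corpus.de", "corpus.en"))` would give three lists in the same shape the rest of the notebook expects.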

tiassap · 2022-06-24T07:21:05Z

Thank you @aambrioso1. It is very helpful.

So we can use other datasets as well, as long as the data is in the format [(de_1, en_1), ..., (de_n, en_n)]. You are using the German/English dataset from the European Parliament Proceedings Parallel Corpus 1996-2011: https://www.statmt.org/europarl/.
Also, the train, val, and test datasets are declared as global variables.
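To illustrate why that format is sufficient: the vocabulary step only iterates over the tuples and tokenizes one side, selected by `index`. The sketch below mimics this with a whitespace tokenizer and `collections.Counter` standing in for spaCy and `build_vocab_from_iterator`. The `yield_tokens` body is written to match the call in the original post, while `count_tokens` and the sample sentences are made up for illustration.

```python
from collections import Counter


def yield_tokens(data_iter, tokenizer, index):
    """Yield the token list for one side (0 = German, 1 = English) of each pair."""
    for pair in data_iter:
        yield tokenizer(pair[index])


def count_tokens(pairs, tokenizer, index, min_freq=1):
    """Count tokens on one side and drop those seen fewer than min_freq times."""
    counts = Counter()
    for tokens in yield_tokens(pairs, tokenizer, index):
        counts.update(tokens)
    return {tok: n for tok, n in counts.items() if n >= min_freq}


# Toy data in the [(de_1, en_1), ..., (de_n, en_n)] format.
train = [("ein Hund läuft", "a dog runs"), ("ein Mann läuft", "a man runs")]
val = [("ein Hund", "a dog")]
test = []

de_counts = count_tokens(train + val + test, str.split, index=0, min_freq=2)
```

Nothing here depends on where the pairs came from, which is why swapping Multi30k for another parallel corpus works.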

Just for information, @youbinaa: it seems that Multi30k can also be downloaded from this repo: https://github.com/multi30k/dataset.

The problem is that the source URL used by torchtext.datasets.Multi30k() is not accessible. Let's hope it will be fixed soon.

EsmaeilChitgar · 2023-10-27T07:04:22Z

How can I download the dataset in Colab? I mean, what do I need to change in the code so that this line downloads it?

train, val, test = datasets.Multi30k('data', language_pair=("de", "en"))

g-i-o-r-g-i-o · 2023-11-27T18:45:47Z

from torchtext.datasets import multi30k

# Point torchtext at a mirror of the archives before calling datasets.Multi30k(...).
multi30k.URL["train"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/training.tar.gz"
multi30k.URL["valid"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/validation.tar.gz"
multi30k.URL["test"] = "https://raw.githubusercontent.com/neychev/small_DL_repo/master/datasets/Multi30k/mmt16_task1_test.tar.gz"

# Expected checksums for the mirrored archives.
# The dict is named MD5, but each value is 64 hex characters long (SHA-256 length).
multi30k.MD5["train"] = "20140d013d05dd9a72dfde46478663ba05737ce983f478f960c1123c6671be5e"
multi30k.MD5["valid"] = "a7aa20e9ebd5ba5adce7909498b94410996040857154dab029851af3a866da8c"
multi30k.MD5["test"] = "6d1ca1dba99e2c5dd54cae1226ff11c2551e6ce63527ebb072a1f70f72a5cd36"

https://discuss.pytorch.org/t/build-vocab-from-iterator-does-not-work-in-notebook/153575/16
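Before relying on a mirror, it may be worth checking a downloaded archive against the expected digest by hand. The values above are 64 hexadecimal characters long, which is the length of a SHA-256 digest rather than an MD5 one, so this sketch assumes SHA-256. The helpers `sha256_of_file` and `matches_expected` and any local path passed to them are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring letter case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the training archive manually, `matches_expected("training.tar.gz", multi30k.MD5["train"])` should return True if the mirror serves the same bytes.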

minsuk-sung · 2024-03-31T08:57:55Z

Thanks! It works!
