Error happened with load data #314
Comments
The problem seems to be that the downloaded .tab file is empty (zinc.tab in my case).
Hi,
If you just want to download the data, you can download it directly from here.
Hi, I am seeing the same (misleading) "TDC is hosted in Harvard Dataverse and it is currently under maintenance" message. The underlying cause (in my environment at least) is a 202 response instead of 200 when sending a GET request. The download helper does not check the status code:

    def dataverse_download(url, path, name, types, id=None):
        """dataverse download helper with progress bar

        Args:
            url (str): the url of the dataset
            path (str): the path to save the dataset
            name (str): the dataset name
            types (dict): a dictionary mapping from the dataset name to the file format
        """
        if id is None:
            save_path = os.path.join(path, name + "." + types[name])
        else:
            save_path = os.path.join(path, name + "-" + str(id) + "." + types[name])
        response = requests.get(url, stream=True)
        total_size_in_bytes = int(response.headers.get("content-length", 0))
        block_size = 1024
        progress_bar = tqdm(total=total_size_in_bytes, unit="iB", unit_scale=True)
        with open(save_path, "wb") as file:
            for data in response.iter_content(block_size):
                progress_bar.update(len(data))
                file.write(data)
        progress_bar.close()

The 202 (Accepted) status means that the server has accepted the request but has not finished processing it. The status code can be checked with:

    import requests
    r = requests.get("https://dataverse.harvard.edu/api/access/datafile/4267146")
    print(r.status_code)

Strangely, the same behaviour is not observed when running in a Google Colab environment (I haven't figured out why yet!).

Kind regards,
James
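One possible way to guard against the 202 case described above is to check the status code before writing anything to disk and to retry a few times. The following is only a minimal sketch, not TDC's actual code: the function name `download_when_ready`, the `get` parameter, and the retry settings are all hypothetical, and whether Dataverse eventually answers 200 on a retry is an assumption that has not been verified here.

```python
import time


def download_when_ready(url, save_path, get=None, max_attempts=5, wait_seconds=2.0):
    """Download url to save_path, writing the file only on a 200 response.

    Hypothetical helper: a 202 (Accepted) response is retried up to
    max_attempts times, any other status raises an error.
    """
    if get is None:
        # Imported lazily so that a stand-in for requests.get can be passed in.
        import requests
        get = requests.get

    for _ in range(max_attempts):
        response = get(url, stream=True)
        if response.status_code == 200:
            with open(save_path, "wb") as file:
                for chunk in response.iter_content(1024):
                    file.write(chunk)
            return save_path
        if response.status_code != 202:
            # Raises for 4xx/5xx; other unexpected codes fall through below.
            response.raise_for_status()
            raise RuntimeError(f"Unexpected status {response.status_code} for {url}")
        # 202: the server has not finished processing, so wait and retry.
        time.sleep(wait_seconds)

    raise RuntimeError(f"{url} still returned 202 after {max_attempts} attempts")
```

With this shape, an empty .tab file is never left on disk when the server answers 202, and the caller gets an error that names the status instead of a maintenance message. A call might look like `download_when_ready("https://dataverse.harvard.edu/api/access/datafile/4267146", "zinc.tab")`.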
Describe the bug
The error occurs while loading the data.
To Reproduce
Steps to reproduce the behavior:
from tdc.single_pred import Yields
data = Yields(name = 'Buchwald-Hartwig')
split = data.get_split()
Expected behavior
A dataframe is returned.