Cannot download using wget or gcp #145

Open
mahela97 opened this issue Sep 23, 2022 · 7 comments

Comments

@mahela97

When I tried to download the dataset using the wget command, it only created subdirectories containing index.html files. When I tried using GCP, I got this error: "BadRequestException: 400 Bucket is a requester pays bucket but no user project provided."

@tompollard
Member

To download from Google Cloud, you'll need to specify a project ID for covering any download costs. See: https://stackoverflow.com/questions/47739741/bucket-is-requester-pays-bucket-but-no-user-project-provided

If you have a project ID then you can specify it in the download command (see: https://cloud.google.com/storage/docs/using-requester-pays#using). e.g. for gsutil:

gsutil -u PROJECT_IDENTIFIER cp gs://BUCKET_NAME/OBJECT_NAME OBJECT_DESTINATION
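
As a rough sketch of the command above, a small wrapper can make sure the billing project is never forgotten. The function name `requester_pays_cp` and the `BILLING_PROJECT` and `DRY_RUN` variables are made up for illustration; only `gsutil -u PROJECT cp SRC DST` comes from the requester-pays documentation.

```shell
# Hypothetical helper: always pass a billing project to gsutil cp.
# BILLING_PROJECT and DRY_RUN are illustrative names, not gsutil settings.
requester_pays_cp() {
  # Abort with a message if no billing project has been set.
  : "${BILLING_PROJECT:?set BILLING_PROJECT to your Google Cloud project ID}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, so it can be checked first.
    echo "gsutil -u $BILLING_PROJECT cp $*"
  else
    gsutil -u "$BILLING_PROJECT" cp "$@"
  fi
}
```

With `BILLING_PROJECT` exported, calling `DRY_RUN=1 requester_pays_cp gs://BUCKET_NAME/OBJECT_NAME OBJECT_DESTINATION` prints the command that would run; dropping `DRY_RUN` runs it and bills the download to that project.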

If you want to avoid download fees, you can download the data from the PhysioNet servers using the suggested wget command. This will be slower!

@mahela97
Author

@tompollard thank you for the reply. I am trying with wget, but the subdirectories contain only an index.html file instead of images. Could you please help me fix that?

@tompollard
Member

@mahela97 I'm not sure which dataset you are trying to download, but essentially I think you'll need to be patient! I believe wget fetches the directory structure before the files, so you may not immediately see data within the directories.
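
One way to check whether anything besides the directory listings has arrived is to count the files that are not named index.html. This is only a sketch: the function name `count_data_files` is made up, and it assumes the listings wget saves all start with `index.html`.

```shell
# Hypothetical helper: count downloaded files, ignoring wget's saved
# directory listings (assumed to be named index.html or index.html.*).
count_data_files() {
  # $1 is the root of the download tree created by wget.
  find "$1" -type f ! -name 'index.html*' | wc -l | tr -d ' '
}
```

Running `count_data_files DOWNLOAD_DIRECTORY` a few minutes apart should show a growing number once wget has moved on from the listings to the actual files.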

@mahela97
Author

@tompollard I am trying with mimic-cxr. Thank you for the help. I will check again after a few hours.

@tompollard
Member

@mahela97 makes sense, it's a large dataset! I think this is the same issue described at:

I'd be interested in hearing how long the download takes to complete with wget.

@edeiana23

Was anybody able to download the dataset with the wget command? I left my computer on the whole weekend but nothing happened. I still only see the folders and the index.html file, but no images in DICOM format. Asking @mahela97 and @ayhyap specifically, since they raised the issue previously.

@ayhyap

ayhyap commented Apr 26, 2023

> Was anybody able to download the dataset with the wget command? I left my computer on the whole weekend but nothing happened. I still only see the folders and the index.html file, but no images in DICOM format. Asking @mahela97 and @ayhyap specifically, since they raised the issue previously.

Yes, the files eventually transferred; it just took a while.
It is possible that an error interrupted your wget command.
The command provided on the mimic-cxr page includes the relevant options to resume the transfer if it stopped prematurely.
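
For reference, here is a sketch of what a resumable recursive wget call can look like. The flags are standard wget options, but this exact combination is an assumption about what the dataset page suggests, so copy the real command from that page. The function name `resume_download`, the `DRY_RUN` variable, and the `USERNAME` and `DATASET_URL` arguments are placeholders.

```shell
# Hypothetical wrapper around a resumable recursive download.
# $1 = PhysioNet username, $2 = dataset URL (both are placeholders here).
resume_download() {
  # -r   recurse into subdirectories
  # -N   skip files that are not newer than the local copy
  # -c   continue partially downloaded files
  # -np  never ascend to the parent directory
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it.
    echo "wget -r -N -c -np --user $1 --ask-password $2"
  else
    wget -r -N -c -np --user "$1" --ask-password "$2"
  fi
}
```

Re-running `resume_download USERNAME DATASET_URL` in the same directory after an interruption should pick up where the previous run stopped rather than starting over.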
