Downloading fails for files with no Content-Disposition #1659

henrykironde · 2022-07-29T13:35:15Z

Example packages:
1: Package file: https://github.com/weecology/retriever-recipes/blob/main/scripts/usda_agriculture_plants_database.py
Sample URL: https://plants.sc.egov.usda.gov/csvdownload?plantLst=plantCompleteList

2: Package file: https://github.com/weecology/retriever-recipes/blob/main/scripts/aquatic_animal_excretion.py
Sample URL: https://esajournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1002%2Fecy.1792&file=ecy1792-sup-0001-DataS1.zip

ethanwhite · 2022-08-03T17:22:38Z

The second one is fixed by spoofing a browser user agent, i.e., Wiley (the publisher) is trying to block automated downloads. I tested this with wget, but we should be able to do the same thing in Python.
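For reference, here is a rough sketch of what the Python side could look like using only the standard library. It is not taken from the retriever code base: the names `build_request`, `guess_filename`, `download`, and the `BROWSER_UA` string are all made up for illustration, and reading the file name from a `file=` query parameter is an assumption based on the Wiley URL above. The idea is to send a browser-like User-Agent and, when the response has no Content-Disposition header, fall back to a name derived from the URL.

```python
import os
import posixpath
import shutil
import urllib.request
from email.message import Message
from urllib.parse import parse_qs, unquote, urlparse

# Hypothetical browser-like User-Agent string; any current browser UA should do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def guess_filename(url, content_disposition=None, default="download"):
    """Pick a file name, preferring Content-Disposition when it is present."""
    if content_disposition:
        # Let the stdlib email parser handle the header parameters.
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            return posixpath.basename(name)
    parsed = urlparse(url)
    # Assumption: some servers (e.g. the Wiley URL) carry the name in "file=".
    query = parse_qs(parsed.query)
    if "file" in query:
        return posixpath.basename(query["file"][0])
    # Otherwise use the last path segment, or a fixed default if it is empty.
    return posixpath.basename(unquote(parsed.path)) or default


def download(url, dest_dir="."):
    """Download url into dest_dir and return the path of the saved file."""
    with urllib.request.urlopen(build_request(url), timeout=60) as resp:
        name = guess_filename(url, resp.headers.get("Content-Disposition"))
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
    return path
```

Calling `download(url)` on the Wiley link would be the way to try it; I haven't verified that this particular User-Agent string is enough to get past their check, so treat it as a starting point.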

As you mentioned earlier, the first one is a mess. Not only does it render as HTML, but the data itself isn't in the HTML; it's rendered by JavaScript, so I think you'd basically have to cut and paste the text out of the browser. I don't have any good ideas for this one other than emailing the data providers and asking them to provide a better option. We might be able to scrape it out somehow, but I don't think it's worth it for one dataset.

henrykironde added the getting-started label Jan 30, 2023
