Podcast enclosure URLs are unencoded before being downloaded #227

Open
ribbons opened this issue Nov 19, 2018 · 1 comment

ribbons commented Nov 19, 2018

Now that #226 is fixed, another URL encoding issue has been discovered by @cjpcjpindre: podcast enclosure URLs have their URL-encoded characters replaced by literal ones, which causes a problem if the server expects those characters to stay URL-encoded.

An example original enclosure URL from the feed https://anchor.fm/s/7368c04/podcast/rss is:

https://anchor.fm/s/7368c04/podcast/play/1722642/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a
ribbons added the bug label Nov 19, 2018
ribbons self-assigned this Nov 19, 2018

ribbons commented Nov 27, 2018

I'm really struggling with this one. The URL is unescaped when it is passed to the .NET Framework Uri class, which can't be avoided when using WebClient for downloads. This means that the URL above is changed into the following:

https://anchor.fm/s/7368c04/podcast/play/1722642/https://d3ctxlq1ktw2nl.cloudfront.net/staging/2018-10-13/John-Kearns-782f1f393cd0f.m4a
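To make the effect concrete, here is a minimal Python sketch of the transformation. It is not the application's code: `urllib.parse.unquote` is only an assumed stand-in for the unescaping that the Uri class performs. It shows that decoding turns what was a single opaque path segment into several segments, which is presumably why the server no longer recognises the request.

```python
from urllib.parse import unquote, urlsplit

# The enclosure URL exactly as it appears in the feed.
original = (
    "https://anchor.fm/s/7368c04/podcast/play/1722642/"
    "https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2F"
    "John-Kearns-782f1f393cd0f.m4a"
)

# Assumption: unquote() approximates the full unescaping described above.
decoded = unquote(original)

# Split each path into segments to see how the structure changes.
original_segments = urlsplit(original).path.split("/")[1:]
decoded_segments = urlsplit(decoded).path.split("/")[1:]

print(decoded)
print(len(original_segments), "path segments before decoding")
print(len(decoded_segments), "path segments after decoding")
```

In the original URL the embedded cloudfront address is one segment; after decoding, its slashes become real path separators.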

After some digging, it looks like this behaviour is partially fixed in .NET Framework 4.5, and the same behaviour can be enabled in .NET 2.0 via some slightly nasty reflection (courtesy of the code at https://mikehadlow.blogspot.com/2011/08/how-to-stop-systemuri-un-escaping.html). However, this doesn't prevent the colon from being unescaped, so the URL ends up as:

https://anchor.fm/s/7368c04/podcast/play/1722642/https:%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a

This unfortunately still causes a 404 error to be returned from anchor.fm.
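The partially fixed behaviour can be mimicked with a short Python sketch. The helper `unescape_except_slashes` is hypothetical and only models the outcome described above (escaped slashes preserved, everything else decoded); it is not how the Uri class is implemented.

```python
import re
from urllib.parse import unquote

original = (
    "https://anchor.fm/s/7368c04/podcast/play/1722642/"
    "https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2F"
    "John-Kearns-782f1f393cd0f.m4a"
)


def unescape_except_slashes(url: str) -> str:
    """Decode every percent-escape except %2F (hypothetical helper).

    Escapes are decoded one at a time, so this is only suitable for
    single-byte ASCII escapes such as the ones in this URL.
    """
    def keep_or_decode(match: re.Match) -> str:
        escape = match.group(0)
        # Leave escaped slashes alone; decode anything else (e.g. %3A).
        return escape if escape.upper() == "%2F" else unquote(escape)

    return re.sub(r"%[0-9A-Fa-f]{2}", keep_or_decode, url)


partially_decoded = unescape_except_slashes(original)
print(partially_decoded)
# The result differs from the feed's URL only where %3A became ":".
print(partially_decoded == original.replace("%3A", ":"))
```

The single remaining difference from the feed's URL is the unescaped colon, which matches the observation that the request still fails.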

Suggestions appreciated!
