Undesirable changing of URLs by Addressable #35
Comments
Hi Jan, thanks for the report 😃 I would consider this a bug in the Addressable gem. Normalizing the URL shouldn't change it, especially not in the way shown in your example. |
That makes it a bug in |
I wonder, is it also possible it's inappropriate to use Addressable#normalize on URLs before fetching them? Why is Down normalizing at all, instead of fetching the URL that was specified? What is the use case where we want to fetch something other than what the calling code specified (a "normalized" version)? Even without any bugs in Addressable, could there be cases where an HTTP server responds to a request for a "non-normalized" path, but does not respond with the same thing to the "normalized" version of the path? I don't entirely understand what "normalized" means here or what it is intended to do... Do other HTTP fetching gems (say http-rb) request a "normalized" version of an argument, instead of the argument as supplied? |
It's quite likely that the answer from Edit: having scanned the RFC, it does look like @janko you might still remember why |
Some more context is here: shrinerb/shrine#132 Yes, there was a reason we added this normalization, but I don’t remember it now. Feel free to look at the git blame. Basically, it seems users would sometimes submit URLs that are not encoded properly, so I believe that's what the normalization fixed. |
Maybe this gets acknowledged and resolved upstream with sporkmonger/addressable#366. I'll keep an eye on it. |
I'm still thinking it may be inappropriate to insist on normalization here. There are HTTP servers which will respond to an un-normalized GET request differently than to the normalized version of that same request, using the addressable normalization algorithm. (Is this a violation of standards? I don't know, probably? But I have seen it happen). The normalization makes it impossible to make the request with the un-normalized URI/path. May be a violation of standards, but it happens in the real world, and I don't think (could be wrong?) any other Ruby HTTP client implementation insists on normalizing for you. If Down really is the only Ruby HTTP client implementation that does this normalization, to me it adds evidence that it might be an unhelpful thing to do. |
@jrochkind Have you read the history behind the use of Addressable in Down? It was added because the URI standard library would fail parsing some URLs submitted by the user, which browsers would know how to handle (we look at it in terms of Shrine's
Down is not an HTTP client implementation, it's more high level than that. It was designed primarily for Shrine's remote URL feature. We can see for example that CarrierWave and Dragonfly do a similar thing. Paperclip doesn't as it requires you to parse the URL yourself and give it a However, Dragonfly's implementation actually sounds like a good workaround to me. What it does is it first tries to This would make sense to me. The user should already pass a correctly-encoded URL, but if they don't, we make a best effort to "fix" that. And that was the reason we introduced Addressable in the first place, to fix those cases, so it makes sense to activate it only when needed. @janklimo that approach sounds like it would fix your particular issue. If you agree with this idea, would you like to submit a PR? |
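For illustration, the "normalize only when the URL does not parse" idea could look roughly like the sketch below. This is not Down's or Dragonfly's actual code: `parse_with_fallback` and `ESCAPE_UNSAFE` are hypothetical names, and the stand-in normalizer only percent-encodes spaces and square brackets so that the example runs with the standard library alone. A real setup might call Addressable's normalization in its place.

```ruby
require "uri"

# Stand-in normalizer (an assumption for this sketch): percent-encodes only
# spaces and square brackets. A real setup might instead call
# Addressable::URI.parse(url).normalize.to_s here.
ESCAPE_UNSAFE = ->(url) { url.gsub(/[\[\] ]/) { |c| format("%%%02X", c.ord) } }

# Hypothetical helper: use the URL exactly as given when the strict stdlib
# parser accepts it, and normalize only when parsing fails.
def parse_with_fallback(url, normalizer: ESCAPE_UNSAFE)
  URI.parse(url)
rescue URI::InvalidURIError
  URI.parse(normalizer.call(url))
end

valid   = parse_with_fallback("http://example.com/2ELk8hUpTC2wqJ%2BZ%25GfTFA.jpg")
invalid = parse_with_fallback("https://movies.com/matrix[1991].mp4")

puts valid    # left untouched, since it parsed without normalization
puts invalid  # brackets percent-encoded by the fallback
```

With this shape, a correctly encoded URL such as the `%2B` one reported in this thread is requested exactly as supplied, and only URLs the strict parser rejects get rewritten.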
Yeah, as with all escaping issues, it gets really confusing. An earlier version of your comment I got in email suggested that http.rb and net-http worked differently here... you edited it, so maybe they don't? I am curious if http.rb does forcible normalization. I like where you ended up though, normalizing only ones that don't parse -- while it seems hacky, if it's what Dragonfly does, I like that others have agreed it's a good compromise. And I like that you remembered now why you introduced normalization in the first place. :) |
@jrochkind I apologize for editing the comment, it was true, but for some reason I thought it wasn't relevant to the discussion. By "true" I mean that Addressable::URI is more lax when it comes to parsing, so it will take even URLs that are not properly encoded, e.g.:
URI.parse("https://movies.com/matrix[1991].mp4") # raises exception
Addressable::URI.parse("https://movies.com/matrix[1991].mp4") # succeeds
I don't think this does any normalization internally, but I haven't checked the source code. In that case http.rb backend should not have this problem.
Great, I'm glad we've reached a consensus 👌 |
OK, I'm confused what that has to do with Searching in the project, it uses Addressable to I just always like looking at what others are doing. But it may not matter in this case, and it may be too confusing to figure out, at least for me! |
Aha, it looks like http.rb does normalize most of the URI components... but NOT the URI query component! (the part after the https://github.com/httprb/http/blob/6240672bc84b23339fc9a9878040fcb45db78fb5/lib/http/uri.rb#L34-L38 It is normalizing the URI query that is particularly causing the problem mentioned in this issue. I wonder if the bug in Addressable is it is applying a normalization routine meant for "path" and other URI components (perhaps in a scheme-agnostic way) -- to the URI query component. URI escaping gets really confusing, but I believe maybe escaping an HTTP URI query component in the same way as you would the path component is wrong. |
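A small standard-library illustration of why the query component is sensitive to this kind of rewriting: under the common form-encoding convention (which many, though not all, servers apply to query strings), a literal `+` decodes to a space while `%2B` decodes to a plus sign. The parameter name `q` is made up for the example.

```ruby
require "uri"

# Under form decoding, "+" means a space...
plus_form    = URI.decode_www_form("q=1+2")
# ...while "%2B" means a literal plus sign.
escaped_form = URI.decode_www_form("q=1%2B2")

p plus_form     # [["q", "1 2"]]
p escaped_form  # [["q", "1+2"]]
```

So a normalizer that turns `%2B` into `+` inside a query can change what the server receives, even though the same rewrite is harmless in some other URI components.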
We've also hit this issue today. The URL works fine in a browser, curl or wget but we get 404s with Down. After debugging it, we found it's because Addressable::URI is changing the
Addressable::URI.parse("http://example.com/2ELk8hUpTC2wqJ%2BZ%25GfTFA.jpg").normalize.to_s
=> "http://example.com/2ELk8hUpTC2wqJ+Z%25GfTFA.jpg"
Since nothing happened here in the last month I opened a PR (#37) where the normalization is only applied when it's needed, as was proposed here at some point. |
I've released 5.0.1 with #37, but I would like to discuss improving addressable normalization. First, I would like to address @jrochkind's find in http.rb, which normalizes everything but the query parameters. This change was done in httprb/http@8c8486c to address httprb/http#246. However, @coding-chimp presented an example of invalid URI normalization in query path, which http.rb would handle incorrectly as well (as it does normalize the query path), so that doesn't sound like a permanent solution. As of carrierwaveuploader/carrierwave@57a4a3b, CarrierWave uses Addressable as well, and it does so in the same way – normalizing everything but the query parameters. However, this behaviour seems to be there for backwards compatibility, as query parameters were being split from the rest of the URI before introducing Addressable. The main reason for introducing Addressable in CarrierWave was to support non-ASCII domain names, which according to carrierwaveuploader/carrierwave#2086 is important. I was thinking about what the correct normalization would be. Originally, I wanted to support non-encoded URLs, such as ones containing spaces and square brackets. The reason for not using
Since However, as @jrochkind suggested in #37 (comment), I would like to add a configuration option for disabling or overriding the normalization (http.rb added one as well in httprb/http#533). That's the only change I would make for now. @jrochkind would you like to make that PR? |
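As a rough sketch of what such an option could look like (not the actual Down API: the `Fetcher` class and the `uri_normalizer:` keyword are hypothetical names for illustration), the normalizer can be any callable that takes the URL string and returns the string to request, with a default that leaves the URL unchanged:

```ruby
require "uri"

# Hypothetical wrapper class, for illustration only.
class Fetcher
  # Default normalizer: return the URL unchanged.
  IDENTITY = ->(url) { url }

  def initialize(uri_normalizer: IDENTITY)
    @uri_normalizer = uri_normalizer
  end

  # Returns the URI that would actually be requested.
  def request_uri(url)
    URI.parse(@uri_normalizer.call(url))
  end
end

# No normalization: the signed URL is requested exactly as supplied.
untouched = Fetcher.new.request_uri("http://example.com/a?fit=bottom%2Cleft")

# Caller-supplied normalization: here, only spaces are percent-encoded.
space_fixer = ->(url) { url.gsub(" ", "%20") }
fixed = Fetcher.new(uri_normalizer: space_fixer).request_uri("http://example.com/my file.jpg")
```

Passing a callable keeps the decision with the caller, who knows whether the URLs come from end users (and may need fixing) or from a signing service (and must not be touched).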
Your analysis is thorough and makes sense to me. (Escaping is always a tricky issue, especially in URLs/URIs, in part because of a long history of inconsistent/varying/technically-invalid/changing-standards legacy approaches). |
Hi Janko,
I came across this odd problem today. The following URL works fine in the browser, curl, HTTParty, etc., but always returns a 401 when using this gem (via the remote URL plugin in Shrine), i.e.:
I thought this was strange because the following works:
I've narrowed it down to how the URL is encoded:
down/lib/down/net_http.rb
Lines 287 to 290 in 68754ed
This changes the comma in the URL (from bottom%2Cleft to bottom,left), making the signature in the s param invalid (works like secure URLs in Imgix, as it seems). Would you call it a bug or is this the desired behavior?
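To make the failure mode concrete, here is a minimal sketch of a signed URL check. The signing scheme (an MD5 digest over a secret token plus the exact path and query), the token, and the path are all assumptions for illustration, not the real service's algorithm. The point is only that the signature is computed over the exact bytes of the URL, so decoding %2C to a comma invalidates it.

```ruby
require "digest"

# Hypothetical signing scheme (an assumption for this sketch): digest of a
# secret token concatenated with the exact path and query string.
def sign(token, path_and_query)
  Digest::MD5.hexdigest(token + path_and_query)
end

token    = "secret-token"                        # made-up value
original = "/image.jpg?align=bottom%2Cleft"      # as issued by the service
altered  = "/image.jpg?align=bottom,left"        # after normalization

signature = sign(token, original)

original_ok = sign(token, original) == signature
altered_ok  = sign(token, altered) == signature

puts original_ok  # true
puts altered_ok   # false: the server would reject this request
```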