Normalization of path segments should probably happen before normalization of percent escaping #8
Comments
This issue probably requires a check-in with the IETF URI mailing list before deciding one way or the other.
I understand it's been a long time, but I wanted to check in on the status of this issue. We've hit this bug in a slightly different context and aren't sure how to deal with it. Is there any chance it's going to be fixed?
Could you elaborate on the issue you're hitting? A test case would be awesome.
Actually, now I'm not sure our issue is related to this one. Here is our problem:
irb(main):001:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f"))
=> #<Addressable::URI:0x5648890 URI:http://foo.com/blah？>
irb(main):002:0> Addressable::URI.parse(PostRank::URI.unescape("http://foo.com/blah%ef%bc%9f")).normalize!
=> #<Addressable::URI:0x564ed08 URI:http://foo.com/blah%3F>
The normalize! call mangles a perfectly valid (AFAIU) Unicode character and replaces it with a plain ASCII question mark.
It's actually doing the right thing. IRIs (Unicode-friendly URIs) use Unicode Normalization Form KC (NFKC) to limit phishing. NFKC tends to perform perceptual code point conversions, such as converting '？' (U+FF1F FULLWIDTH QUESTION MARK) to '?'. If this causes a problem, the solution is either not to normalize the URI at all or to normalize components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.
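To see why normalize! ends up with %3F, here is a stdlib-only Ruby sketch of the steps described above. It assumes, rather than reproduces, what Addressable does internally: percent-decode the path as UTF-8, fold it with NFKC via Ruby's built-in String#unicode_normalize, then re-escape a literal "?", which in a path would otherwise start the query component. The helpers percent_decode and escape_path_delims are hypothetical and are not Addressable methods.

```ruby
# Stdlib-only sketch; helper names are hypothetical, not Addressable's API.

# Decode every %XX escape into a raw byte, then read the bytes as UTF-8.
def percent_decode(str)
  str.b.gsub(/%(\h\h)/) { Regexp.last_match(1).hex.chr }.force_encoding(Encoding::UTF_8)
end

# Simplified re-escaping: only "?" and "#" are handled, because either one
# appearing literally in a path would be parsed as a component delimiter.
def escape_path_delims(path)
  path.gsub(/[?#]/) { |c| format("%%%02X", c.ord) }
end

decoded = percent_decode("blah%ef%bc%9f")   # "blah" + U+FF1F FULLWIDTH QUESTION MARK
folded  = decoded.unicode_normalize(:nfkc)  # NFKC folds U+FF1F to ASCII "?"
puts escape_path_delims(folded)             # prints blah%3F
```

The folding step is lossy: nothing in %3F records that the original character was the fullwidth form, which is why normalizing only the components you actually need may be preferable here.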
Some more context:
irb(main):038:0> CGI.unescapeURIComponent "%2E"
=> "."
I'm not sure why this should be true. If you want to compare URIs, shouldn't you normalize both before comparing? Hmm, from https://www.rfc-editor.org/rfc/rfc3986#section-2.3
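RFC 3986 §2.3 treats a percent-encoded unreserved character (ALPHA, DIGIT, "-", ".", "_", "~") as equivalent to the character itself, and §6.2.2.2 has normalizers decode such escapes; §6.2.2.1 also uppercases the hex digits of the escapes that remain. A minimal Ruby sketch of that step, assuming ASCII input; normalize_percent_encoding is a hypothetical helper, not part of Addressable or CGI.

```ruby
# Hypothetical helper sketching RFC 3986 percent-encoding normalization.
UNRESERVED = /\A[A-Za-z0-9\-._~]\z/

def normalize_percent_encoding(str)
  str.gsub(/%(\h\h)/) do
    hex  = Regexp.last_match(1)
    char = hex.hex.chr
    # Unreserved characters are decoded; every other escape is kept,
    # with its hex digits uppercased (e.g. %2f -> %2F).
    char.match?(UNRESERVED) ? char : "%#{hex.upcase}"
  end
end

puts normalize_percent_encoding("/%2E/")          # prints /./
puts normalize_percent_encoding("%7euser%2fdocs") # prints ~user%2Fdocs
```

This is the sense in which %2E and "." are equivalent. The same RFC section also notes that comparison implementations do not always normalize before comparing, which is the concern raised above.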
Does this mean that normalization removes the dot and the trailing slash?
irb(main):042:0> Addressable::URI.parse("/%2E/").normalize.to_s
=> "/"
irb(main):044:0> Addressable::URI.parse("/./").normalize.to_s
=> "/"
That would go against what's suggested in #477.
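The issue title is about ordering, and the "/%2E/" example shows why the order matters. The Ruby sketch below implements remove_dot_segments following the steps in RFC 3986 §5.2.4, plus a hypothetical decode_unreserved helper (decode unreserved escapes, uppercase the rest), and runs them in both orders. It is self-contained and does not reflect Addressable's actual implementation.

```ruby
# Sketch only: remove_dot_segments follows RFC 3986 section 5.2.4;
# decode_unreserved is a hypothetical helper, not Addressable's API.

# Decode %XX escapes of unreserved characters; uppercase all other escapes.
def decode_unreserved(path)
  path.gsub(/%(\h\h)/) do
    hex  = Regexp.last_match(1)
    char = hex.hex.chr
    char.match?(/\A[A-Za-z0-9\-._~]\z/) ? char : "%#{hex.upcase}"
  end
end

# Drop the last segment (and the "/" before it, if any) from the output buffer.
def drop_last_segment(output)
  output.sub(%r{/?[^/]*\z}, "")
end

def remove_dot_segments(path)
  input  = path.dup
  output = +""
  until input.empty?
    if input.start_with?("../")
      input = input[3..]                 # rule A
    elsif input.start_with?("./")
      input = input[2..]                 # rule A
    elsif input.start_with?("/./")
      input = input[2..]                 # rule B: "/./x" -> "/x"
    elsif input == "/."
      input = +"/"                       # rule B
    elsif input.start_with?("/../")
      input  = input[3..]                # rule C: "/../x" -> "/x"
      output = drop_last_segment(output)
    elsif input == "/.."
      input  = +"/"                      # rule C
      output = drop_last_segment(output)
    elsif input == "." || input == ".."
      input = +""                        # rule D
    else
      segment = input[%r{\A/?[^/]*}]     # rule E: move the first segment
      output << segment
      input = input[segment.length..]
    end
  end
  output
end

def decode_then_remove(path)
  remove_dot_segments(decode_unreserved(path))
end

def remove_then_decode(path)
  decode_unreserved(remove_dot_segments(path))
end

puts decode_then_remove("/%2E/")  # prints /
puts remove_then_decode("/%2E/")  # prints /./
puts decode_then_remove("/./")    # prints /
```

With segments first, "/%2E/" passes through dot-segment removal untouched and only becomes "/./" afterwards, so running the same normalization a second time would change it again, to "/". With decoding first, the result is "/", the same output Addressable returns in the irb session above.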