-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes url encoding for paths. #151
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,6 +7,7 @@ | |
import java.net.URI; | ||
import java.net.URL; | ||
import java.net.URLDecoder; | ||
import java.net.URLEncoder; | ||
import java.nio.file.LinkOption; | ||
import java.nio.file.Path; | ||
import java.nio.file.WatchEvent; | ||
|
@@ -735,11 +736,20 @@ public int hashCode() | |
*/ | ||
private String encode(String uri) | ||
{ | ||
// remove special case URI starting with // | ||
uri = uri.replace("//", "/"); | ||
uri = uri.replaceAll(" ", "%20"); | ||
|
||
return uri; | ||
try | ||
{ | ||
// URL encode all characters, but then convert known allowed characters back | ||
return URLEncoder.encode(uri, "UTF-8") | ||
.replace("%3A", ":") | ||
.replace("%2F", "/") | ||
.replace("+", "%20"); | ||
} | ||
catch (UnsupportedEncodingException e) | ||
{ | ||
// This should never happen unless there is something | ||
// fundamentally broken with the running JVM. | ||
throw new RuntimeException(e); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should definitely add a short message here and maybe change the type to Error? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this is code that is actually copied from Amazon's SDK, from what I saw. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. However you do it I think you should be very explicit about the behavior in the docs. Valid S3 Paths are not always going to be fully compatible with file paths encoded as URIs. You could follow the lead of FSx for Lustre which, when backed by S3, effectively hides some valid S3 paths which cannot be converted into POSIX file paths. |
||
} | ||
} | ||
|
||
/** | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -58,6 +58,26 @@ void toUri() | |
assertEquals(path, pathActual); | ||
} | ||
|
||
@Test | ||
public void toUriSpecialChars() | ||
{ | ||
Path path = getPath("/bucket/([fol! @#$%der])"); | ||
Comment on lines
+62
to
+64
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From what I understand, this test should actually be called Could you please add a test case where there are actual characters that need to be decoded? |
||
URI uri = path.toUri(); | ||
|
||
// the scheme is s3 | ||
assertEquals("s3", uri.getScheme()); | ||
|
||
// could get the correct fileSystem | ||
S3FileSystem fs = s3fsProvider.getFileSystem(uri); | ||
// the host is the endpoint specified in fileSystem | ||
assertEquals(fs.getEndpoint(), uri.getHost()); | ||
|
||
// bucket name as first path | ||
Path pathActual = fs.provider().getPath(uri); | ||
|
||
assertEquals(path, pathActual); | ||
} | ||
|
||
@Test | ||
void toUriWithEndSlash() | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the feeling that there is a better way to do this. For
URL
-s there isjava.net.URLDecoder
. Perhaps we could do something similar? I mean -- in your code above, it would only catch three of the cases that should be decoded.@ptirador , @steve-todorov , @BjoernAkAManf : Any better ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Years back when I originally submitted that PR, I took that code directly from the AWS SDK. The replacement back for : and / was because URLEncoder actually produces
application/x-www-form-urlencoded
which isn't exactly a URI but was the closest at the time. And I think there were times when S3Path.encode was called with the full absolute URI that contained the scheme (s3:) and everything else. So undoing that was necessary.You are right though. There is an easier way. Spring has a UriUtils class that does just this. In fact, section 5 here says its a common mistake to encode more than just the path and that is where I got the suggestion.
Is it OK to add a dependency on Spring? If so, I'll redo this PR. It would effectively look like:
private String encode(String uri) { return UriUtils.encode(uri, "UTF-8"); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a look at the code that you've linked back to in the AWS SDK and I see what you mean, but I think it will be quite a bit limited. Sure, it will work for the cases you're currently catching, but while we're doing this, we might as well do it right for all the other cases that need encoding/decoding as well.
We would like to keep this library as minimal, as possible. Adding a dependency to Spring would make it more complicated. We do intend to have Spring-specific code later on, but in a separate "add-on" module (we'll have a "core" one as well).
If we could find a more suitable lightweight library that can handle this sort of encoding/decoding, this would be great!
We do have
com.google.guava:guava:jar:30.1-jre:compile
on the classpath and, as far as I have checked, they do have some URL escaping. I'm not sure if this would be useful. Might be worth a closer look...?If there really are no lightweight libraries that we can use, we might as well copy the
UriUtils
class to this project, or simply re-implement it ourseves. (Spring is under an Apache 2.0 license and so is this library and as long as we preserve the original licensing information and credit the authors, we should be okay). It would be better to find a simpler alternative, though, and I'm sure we can find some, if we dig around. :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elerch , @markschreiber : Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I analyzed the code paths in UriUtils eventually settling on HierarchicalUriComponents.encodeUriComponent() as the thing doing the work. Looks fairly straightforward. I'm happy to do the work, just want to make sure owners agree before I go down that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you ask me, this fix should ultimately be applied to the Amazon SDK (v2), so that users of the library (like us) don't have to do this sort of wizardry on our side. I am open to having this method copied over with a disclaimer comment that it was taken from the Spring SDK. The Guava library also contains a similar class (UrlEscapers), which, from what I can tell, seems more sophisticated. We are already using that library and perhaps it would be neater to use it?
@amarcionek, please, feel free to join our chat channel where we could further discuss this, if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AWS SDK should be here right: https://github.com/aws/aws-sdk-java
Maybe open an Issue there with a potential fix aswell, while applying a Bandage in the Project here, throught the Guava class. Atleast for me, it seems that is easier to contain. However it might be a problem, considering that the UrlEscapers seems to require more manual work, while the Spring version is harder to misuse. Might be wrong though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link you're pointing to is Amazon SDK v1 and we've migrated to v2.
BTW, @amarcionek, Amazon cut releases of their SDK-s every day, (and we upgrade to these releases almost every day). If you were to fix this on the side of Amazon's SDK, we would get this fix into our codebase as soon as they cut a release that includes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the correct link to the repository of Amazon's SDK (v2).