Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes url encoding for paths. #151

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 15 additions & 5 deletions src/main/java/org/carlspring/cloud/storage/s3fs/S3Path.java
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
import java.net.URI;
import java.net.URL;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
Expand Down Expand Up @@ -735,11 +736,20 @@ public int hashCode()
*/
private String encode(String uri)
{
// remove special case URI starting with //
uri = uri.replace("//", "/");
uri = uri.replaceAll(" ", "%20");

return uri;
try
{
// URL encode all characters, but then convert known allowed characters back
return URLEncoder.encode(uri, "UTF-8")
.replace("%3A", ":")
.replace("%2F", "/")
.replace("+", "%20");
Comment on lines +742 to +745
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the feeling that there is a better way to do this. For URL-s there is java.net.URLDecoder. Perhaps we could do something similar? I mean -- in your code above, it would only catch three of the cases that should be decoded.

@ptirador , @steve-todorov , @BjoernAkAManf : Any better ideas?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Years back when I originally submitted that PR, I took that code directly from the AWS SDK. The replacement back for : and / was because URLEncoder actually produces application/x-www-form-urlencoded which isn't exactly a URI but was the closest at the time. And I think there were times when S3Path.encode was called with the full absolute URI that contained the scheme (s3:) and everything else. So undoing that was necessary.

You are right though. There is an easier way. Spring has a UriUtils class that does just this. In fact, section 5 here says its a common mistake to encode more than just the path and that is where I got the suggestion.

Is it OK to add a dependency on Spring? If so, I'll redo this PR. It would effectively look like:

private String encode(String uri) { return UriUtils.encode(uri, "UTF-8"); }

Copy link
Owner

@carlspring carlspring Dec 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look at the code that you've linked back to in the AWS SDK and I see what you mean, but I think it will be quite a bit limited. Sure, it will work for the cases you're currently catching, but while we're doing this, we might as well do it right for all the other cases that need encoding/decoding as well.

We would like to keep this library as minimal, as possible. Adding a dependency to Spring would make it more complicated. We do intend to have Spring-specific code later on, but in a separate "add-on" module (we'll have a "core" one as well).

If we could find a more suitable lightweight library that can handle this sort of encoding/decoding, this would be great!

We do have com.google.guava:guava:jar:30.1-jre:compile on the classpath and, as far as I have checked, they do have some URL escaping. I'm not sure if this would be useful. Might be worth a closer look...?

If there really are no lightweight libraries that we can use, we might as well copy the UriUtils class to this project, or simply re-implement it ourseves. (Spring is under an Apache 2.0 license and so is this library and as long as we preserve the original licensing information and credit the authors, we should be okay). It would be better to find a simpler alternative, though, and I'm sure we can find some, if we dig around. :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elerch , @markschreiber : Any thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I analyzed the code paths in UriUtils eventually settling on HierarchicalUriComponents.encodeUriComponent() as the thing doing the work. Looks fairly straightforward. I'm happy to do the work, just want to make sure owners agree before I go down that path.

Copy link
Owner

@carlspring carlspring Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you ask me, this fix should ultimately be applied to the Amazon SDK (v2), so that users of the library (like us) don't have to do this sort of wizardry on our side. I am open to having this method copied over with a disclaimer comment that it was taken from the Spring SDK. The Guava library also contains a similar class (UrlEscapers), which, from what I can tell, seems more sophisticated. We are already using that library and perhaps it would be neater to use it?

@amarcionek, please, feel free to join our chat channel where we could further discuss this, if you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AWS SDK should be here right: https://github.com/aws/aws-sdk-java
Maybe open an Issue there with a potential fix aswell, while applying a Bandage in the Project here, throught the Guava class. Atleast for me, it seems that is easier to contain. However it might be a problem, considering that the UrlEscapers seems to require more manual work, while the Spring version is harder to misuse. Might be wrong though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link you're pointing to is Amazon SDK v1 and we've migrated to v2.

BTW, @amarcionek, Amazon cut releases of their SDK-s every day, (and we upgrade to these releases almost every day). If you were to fix this on the side of Amazon's SDK, we would get this fix into our codebase as soon as they cut a release that includes it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the correct link to the repository of Amazon's SDK (v2).

}
catch (UnsupportedEncodingException e)
{
// This should never happen unless there is something
// fundamentally broken with the running JVM.
throw new RuntimeException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely add a short message here and maybe change the type to Error?
A message on the other hand should definitely say something along the Line "URI could not be encoded into an URL. JVM is broken". Not sure i agree with the assesment though, but the URI and URL Standard are broken anyway.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is code that is actually copied from Amazon's SDK, from what I saw.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However you do it I think you should be very explicit about the behavior in the docs. Valid S3 Paths are not always going to be fully compatible with file paths encoded as URIs. You could follow the lead of FSx for Lustre which, when backed by S3, effectively hides some valid S3 paths which cannot be converted into POSIX file paths.

}
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,26 @@ void toUri()
assertEquals(path, pathActual);
}

@Test
public void toUriSpecialChars()
{
Path path = getPath("/bucket/([fol! @#$%der])");
Comment on lines +62 to +64
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, this test should actually be called toUriWithSpecialCharactersThatShouldNodBeDecoded instead, right?

Could you please add a test case where there are actual characters that need to be decoded?

URI uri = path.toUri();

// the scheme is s3
assertEquals("s3", uri.getScheme());

// could get the correct fileSystem
S3FileSystem fs = s3fsProvider.getFileSystem(uri);
// the host is the endpoint specified in fileSystem
assertEquals(fs.getEndpoint(), uri.getHost());

// bucket name as first path
Path pathActual = fs.provider().getPath(uri);

assertEquals(path, pathActual);
}

@Test
void toUriWithEndSlash()
{
Expand Down