Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLVS-1436 Escape rfc 3986 reserved chars in file names. #5668

Conversation

gabriela-trutan-sonarsource
Copy link
Contributor

@gabriela-trutan-sonarsource gabriela-trutan-sonarsource commented Sep 3, 2024


public FileUri(string uriString)
{
uri = new Uri(uriString);
var unescapedUri = Uri.UnescapeDataString(uriString);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of afraid of this method. Have you tested this end to end with weird filenames (reserved chars, other escape-able chars, whitespaces?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested it with reserved chars and it works, yes. I don't think there is any reason to be suspicious about it. it just unescapes all encoded characters. The Uri.EscapeDataString was the one what we didn't want to use, due to escaping characters that we might not want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you could also go ahead and test it. Maybe I missed some cases. :)
I just tested that having the reserved chars in javascript file name leads to analysis working for those files (issues are shown)

return !charsToEscape.Any()
? stringToEscape
: charsToEscape.Aggregate(stringToEscape, (current, charToEscape) => current.Replace(charToEscape.ToString(), Uri.HexEscape(charToEscape)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use k:=Rfc3986ReservedCharsToEncoding.Length, l:=charsToEscape.Length, n:=stringToEscape.Length

Line 59 already gives you O(n*k) complexity
The fact that you optimize k down to l on line 63 does not matter, especially since 0<=l<=k=5 and string replace not returning new string if there were no replacements made
Just create a StringBuilder and run replace in foreach(Rfc3986ReservedCharsToEncoding). That would be a lot simpler :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, line 59 gives you O(n) complexity, which will probably be 99% of the cases, because I don't expected that the Rfc3986ReservedChars to exist in file names too often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing, you could just remove that line and leave the rest of the code as is (charToEscape would be equal to all reserved characters), and it would actually reduce the number of iterations for the same input. Using a string builder is optional. If you think that more often than not it's going to be returning the same string, then we can use string replace aggregate to save on the StringBuilder creation. Though, it takes nanoseconds for new StringBuilder(string).ToString() to execute

[DataRow("]", "%5D")]
[DataRow("#", "%2523")]
[DataRow("@", "%40")]
public void ToString_PercentEncodesReservedRfc3986Characters(string reservedChar, string expectedEncoding)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add similar tests, but in reverse: from file://... to FileUri.LocalPath, rather than from filePath to uri string

@@ -152,4 +165,15 @@ public void Deserialize_ProducesCorrectUri()
fileUri.ToString().Should().Be("file:///C:/file%20with%20%204%20spaces%20and%20a%20back%60tick");
fileUri.LocalPath.Should().Be(@"C:\file with 4 spaces and a back`tick");
}

[TestMethod]
public void Deserialize_ReservedRfc3986Characters_ProducesCorrectUri()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace this test with a similar testcase based test I mentioned above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't replace it, I will just add one more test. This calls the FileUriConverter, which is also important.

- remove check if the filename contains reserved char for performance reason
- remove ? char from reserved list, because it is not supported in file names in windows and it also increases performance
- use string builder to improve performance
- add more unit tests
Copy link

sonarqubecloud bot commented Sep 4, 2024

@gabriela-trutan-sonarsource
Copy link
Contributor Author

gabriela-trutan-sonarsource commented Sep 4, 2024

Summary:

  • remove check if the filename contains reserved char for performance reason
  • remove ? char from reserved list, because it is not supported in file names in windows and it also increases performance
  • use stringBuilder to improve performance
  • add more unit tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants