Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SLVS-1436 Escape rfc 3986 reserved chars in file names. #5668

Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 65 additions & 0 deletions src/SLCore.UnitTests/Common/Models/FileUriTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -114,13 +114,67 @@ public void ToString_PercentEncodesBackticks()
new FileUri(@"C:\filewithbacktick`1").ToString().Should().Be("file:///C:/filewithbacktick%601");
}

[TestMethod]
[DataRow("[", "%5B")]
[DataRow("]", "%5D")]
[DataRow("#", "%2523")]
[DataRow("@", "%40")]
public void ToString_PercentEncodesReservedRfc3986Characters(string reservedChar, string expectedEncoding)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add similar tests, but in reverse: from file://... to FileUri.LocalPath, rather than from filePath to uri string

{
var actualString = @$"C:\filewithRfc3986ReservedChar{reservedChar}.cs";
var expectedString = @$"file:///C:/filewithRfc3986ReservedChar{expectedEncoding}.cs";

new FileUri(actualString).ToString().Should().Be(expectedString);
}

[TestMethod]
public void LocalPath_ReturnsCorrectPath()
{
var filePath = @"C:\file\path\with some spaces\and with some backticks`1`2`3";
new FileUri(filePath).LocalPath.Should().Be(filePath);
}

[TestMethod]
[DataRow("[", "%5B")]
[DataRow("]", "%5D")]
[DataRow("#", "%2523")]
[DataRow("@", "%40")]
[DataRow(" ", "%20")]
[DataRow("`", "%60")]
public void LocalPath_UnescapesEncodesCharacters(string reservedChar, string expectedEncoding)
{
var expectedFilePath = @$"C:\filewithRfc3986ReservedChar{reservedChar}.cs";
var encodedFilePath = @$"file:///C:/filewithRfc3986ReservedChar{expectedEncoding}.cs";

new FileUri(encodedFilePath).LocalPath.Should().Be(expectedFilePath);
}

[TestMethod]
[DataRow("[")]
[DataRow("]")]
[DataRow("@")]
[DataRow(" ")]
[DataRow("`")]
public void LocalPath_PathDoesNotHaveEncodedChars_ReturnsCorrectLocalPath(string reservedChar)
{
var notEncodedFilePath = @$"file:///C:/filewithRfc3986ReservedChar{reservedChar}.cs";
var expectedFilePath = @$"C:\filewithRfc3986ReservedChar{reservedChar}.cs";

new FileUri(notEncodedFilePath).LocalPath.Should().Be(expectedFilePath);
}

/// <summary>
/// The # character as the beginning of a fragment, so <see cref="Uri.LocalPath"/> will return the path without the fragment.
/// </summary>
[TestMethod]
public void LocalPath_PathWithHashCharacter_ReturnsLocalPathWithoutHash()
{
var notEncodedFilePath = @$"file:///C:/filewithRfc3986ReservedChar#.cs";
var expectedFilePath = @$"C:\filewithRfc3986ReservedChar";

new FileUri(notEncodedFilePath).LocalPath.Should().Be(expectedFilePath);
}

[TestMethod]
public void SerializeDeserializeToEqualObject()
{
Expand Down Expand Up @@ -152,4 +206,15 @@ public void Deserialize_ProducesCorrectUri()
fileUri.ToString().Should().Be("file:///C:/file%20with%20%204%20spaces%20and%20a%20back%60tick");
fileUri.LocalPath.Should().Be(@"C:\file with 4 spaces and a back`tick");
}

[TestMethod]
public void Deserialize_ReservedRfc3986Characters_ProducesCorrectUri()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace this test with a similar testcase based test I mentioned above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't replace it, I will just add one more test. This calls the FileUriConverter, which is also important.

{
var serialized = @"""file:///C:/file%5B%5Dand%2523and%40""";

var fileUri = JsonConvert.DeserializeObject<FileUri>(serialized);

fileUri.ToString().Should().Be("file:///C:/file%5B%5Dand%2523and%40");
fileUri.LocalPath.Should().Be(@"C:\file[]and#and@");
}
}
28 changes: 26 additions & 2 deletions src/SLCore/Common/Models/FileUri.cs
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@

using System.ComponentModel;
using System.Diagnostics.CodeAnalysis;
using System.Text;
using SonarLint.VisualStudio.SLCore.Protocol;

namespace SonarLint.VisualStudio.SLCore.Common.Models;
Expand All @@ -28,15 +29,38 @@ namespace SonarLint.VisualStudio.SLCore.Common.Models;
public sealed class FileUri
{
private readonly Uri uri;
private static readonly char[] Rfc3986ReservedCharsToEncode = ['#', '[', ']', '@'];

public FileUri(string uriString)
{
uri = new Uri(uriString);
var unescapedUri = Uri.UnescapeDataString(uriString);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of afraid of this method. Have you tested this end to end with weird filenames (reserved chars, other escape-able chars, whitespaces?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested it with reserved chars and it works, yes. I don't think there is any reason to be suspicious about it. it just unescapes all encoded characters. The Uri.EscapeDataString was the one what we didn't want to use, due to escaping characters that we might not want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you could also go ahead and test it. Maybe I missed some cases. :)
I just tested that having the reserved chars in javascript file name leads to analysis working for those files (issues are shown)

uri = new Uri(unescapedUri);
}

public string LocalPath => uri.LocalPath;

public override string ToString() => Uri.EscapeUriString(uri.ToString());
public override string ToString()
{
var escapedUri = Uri.EscapeUriString(uri.ToString());

return EscapeRfc3986ReservedCharacters(escapedUri);
}

/// <summary>
/// The backend (SlCore) uses java, in which the Uri follows the RFC 3986 protocol.
/// The <see cref="Uri.EscapeUriString"/> does not escape the reserved characters, that's why they are escaped here.
/// See https://learn.microsoft.com/en-us/dotnet/api/system.uri.escapeuristring?view=netframework-4.7.2
/// </summary>
/// <param name="stringToEscape"></param>
/// <returns></returns>
private static string EscapeRfc3986ReservedCharacters(string stringToEscape)
{
var stringBuilderToEscape = new StringBuilder(stringToEscape);

return Rfc3986ReservedCharsToEncode.Aggregate(stringBuilderToEscape,
(current, charToEscape) => current.Replace(charToEscape.ToString(), Uri.HexEscape(charToEscape)))
.ToString();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use k:=Rfc3986ReservedCharsToEncoding.Length, l:=charsToEscape.Length, n:=stringToEscape.Length

Line 59 already gives you O(n*k) complexity
The fact that you optimize k down to l on line 63 does not matter, especially since 0<=l<=k=5 and string replace not returning new string if there were no replacements made
Just create a StringBuilder and run replace in foreach(Rfc3986ReservedCharsToEncoding). That would be a lot simpler :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, line 59 gives you O(n) complexity, which will probably be 99% of the cases, because I don't expected that the Rfc3986ReservedChars to exist in file names too often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing, you could just remove that line and leave the rest of the code as is (charToEscape would be equal to all reserved characters), and it would actually reduce the number of iterations for the same input. Using a string builder is optional. If you think that more often than not it's going to be returning the same string, then we can use string replace aggregate to save on the StringBuilder creation. Though, it takes nanoseconds for new StringBuilder(string).ToString() to execute


[ExcludeFromCodeCoverage]
public override bool Equals(object obj)
Expand Down