Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise allocations in AssertEx.NormalizeWhitespace and OperationTreeVerifier.Verify #75202

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hamarb123
Copy link

They should now be amortised allocation free (in the assert succeed case). I have also removed pointless allocations for indentation I noticed in OperationTreeVerifier.

Works by using char[] buffers from ArrayPool<char>.Shared (along with int[] ones in some parts too), and checking the test condition manually (instead of via the Assert type), then calling into Assert if it fails with a string from .ToString. Buffers are returned after the assertion condition does not fail.

Please let me know if there's any parts which don't follow style rules that I missed, or need more comments, etc. :)

/cc @jaredpar who wanted these optimised for allocations.

@hamarb123 hamarb123 requested a review from a team as a code owner September 21, 2024 22:35
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Sep 21, 2024
@dotnet-policy-service dotnet-policy-service bot added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label Sep 21, 2024
while (true)
{
// Go to next preceding \r or \n (if any)
i = buffer.AsSpan().Slice(0, i).LastIndexOfAny("\r\n".AsSpan());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: AsSpan(0, i) (and other places)

expectedOperationTree = expectedOperationTree.Trim(newLineChars);
expectedOperationTree = expectedOperationTree.Replace("\r\n", "\n").Replace(" \n", "\n").Replace("\n", Environment.NewLine);
// Finds spaces before newlines and places them in the specified toRemove buffer in reverse index order
static void FindSpacesBeforeNewlineHelper(char[] buffer, int bufferSize, ref int[] toRemove, ref int toRemoveCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really wonder whether the complex dance is measurably better than doing manipulation in StringBuilder.

Copy link
Author

@hamarb123 hamarb123 Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaredpar does this PR measurably improve your scenario?

What would the allocations for StringBuilder.Replace on .NET Framework be like @huoyaoyuan (which @jaredpar was concerned about iirc)? I'm not familiar with how it performs, as I usually am just .Appending on StringBuilders when I use them.

It seems like you'd at least have ~2x total string length in total, to store StringBuilder buffer & result string, whereas the current one is 3x for actual, and 4x for expected - so it seems like it could be at most ~1.8x better in the best scenario (scenario being the impl of .Replace). Whereas this should be amortised allocation free (but obviously has a bit more complexity in implementation as a result). Also, if @jaredpar wanted to optimise other methods in the future for allocations too, a large amount of the code here could be extracted & re-used / minorly adjusted to make them amortised alloc-free too.

Edit: I suppose using StringBuilder could actually theoretically be more like ~3.5x better if you copy into an ArrayPool buffer instead of ToString (again, depending on impl. of Replace).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaredpar does this PR measurably improve your scenario?

Going to be profiling this probably Tuesday. Monday tends to be pretty meeting heavy for me. Hoping to have time to profile this out tomorrow when I'm back working on tests.

@jaredpar jaredpar added this to the 17.13 milestone Sep 23, 2024
Comment on lines +659 to +662
if (pooled1 != null)
{
ArrayPool<char>.Shared.Return(pooled1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this pattern but it's also feels a bit much to me. Too much manual work. I'm wondering if we should instead approach it with something like the following:

struct ArrayPoolValue<T> : IDisposable
{
  private T[] _array;
  public T[] T Array
  {
    get => _array;
    set
    {
       if (_array is not null)
        ArrayPool<T>.Shared.Return(_array);
       _array = value;
    }
  }
  public void Dispose()
  {
    if (_array is not null)
    {
        ArrayPool<T>.Shared.Return(_array);
        _array = null;
    }
  }
}

Then we could change the signatures to take ref ArrayPoolValue<T> pool. Consumption becomes simply:

using var pooled1 = new ArrayPool<char>;

@333fred for thoughts

Copy link
Member

@huoyaoyuan huoyaoyuan Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MemoryPool<T> and IMemoryOwner<T> does exactly that, but with overhead of extra abstraction through Memory<T>.
Maybe we should introduce such helper in BCL because ArrayPool usage with finally is more and more now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I care that much. These helpers are presumably not touched much, and I don't think we care about ensuring it's always returned?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I don't think we care about ensuring it's always returned

No, it's not a correctness issue to not return it. Unless assertion failure is expected (e.g., if you have a bunch of tests like bool didError = false; try { ... calls Assert(); } catch { didError = true; } Assert.True(didError);), there's not really any reason to worry about ensuring it imo.

Then we could change the signatures to take ref ArrayPoolValue<T> pool. Consumption becomes simply:
using var pooled1 = new ArrayPool<char>;

You can't pass using variables by reference, we will very quickly run into this issue I think.

}

internal static string NormalizeWhitespace(string input)
internal static ReadOnlyMemory<char> NormalizeWhitespaceROM(ReadOnlySpan<char> input, out char[] pooledAlloc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do suffix on the name we'd prefer the longer form vs. the abbreviation. Overall though I think I'd prefer we just have the one overload that returns ReadOnlyMemory<char> and then let the consumers do .Span as needed. That is a bit of ceremony but it's one that we've already done in a few other places.

}

return output.ToString();
// Return the valid part of our allocation
if (pooledAlloc == null) return default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the style in roslyn requires that this be put on separate lines. There are a few instances of this in the PR that need attention.

int prev = 0;
int next = input.IndexOfAny("\r\n".AsSpan());
if (next < 0) next = input.Length;
while (true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this I'm wondering if it should be broken up a bit. Basically should we have one method that iterates through a ReadOnlySpan<char> and gives us back ReadOnlyMemory<char> for the lines within it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what you mean by this?

Comment on lines +702 to +705
char[] newArr = ArrayPool<char>.Shared.Rent(allocSize + newChars);
pooledAlloc.AsSpan().Slice(0, allocSize).CopyTo(newArr.AsSpan().Slice(0, allocSize));
ArrayPool<char>.Shared.Return(pooledAlloc);
pooledAlloc = newArr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern of basically increasing the size of the pooled array is repeated a few times in the PR. Think we should factor this out into a helper method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean another inline method, or in some helper type? If you mean in a helper type, which one, or where would I create it?

}

// Copy new text into buffer
pooledAlloc.AsSpan().Slice(allocSize, spacesToAppend).Fill(' ');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pooledAlloc.AsSpan().Slice(allocSize, spacesToAppend).Fill(' ');
pooledAlloc.AsSpan(allocSize, spacesToAppend).Fill(' ');

actual = actual.Replace(" \n", "\n").Replace(" \r", "\r");
expectedOperationTree = expectedOperationTree.Trim(newLineChars);
expectedOperationTree = expectedOperationTree.Replace("\r\n", "\n").Replace(" \n", "\n").Replace("\n", Environment.NewLine);
// Finds spaces before newlines and places them in the specified toRemove buffer in reverse index order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local functions go at the end of the method for our style.

}
}

// Emulate actualOperationTree.Trim(newLineChars).Replace(" \n", "\n").Replace(" \r", "\r") into actualBuffer with actualBufferSize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I have the brain power to review this today, but I'm going to insist that we actually spell out the general algorithm going on here in more plain english.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee. untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants