Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: use a faster vendored regexp/syntax/Regexp.String #753

Merged
merged 1 commit into from
Apr 2, 2024

Conversation

keegancsmith
Copy link
Member

@keegancsmith keegancsmith commented Mar 28, 2024

We replace all calls to Regexp.String with a vendored version which is faster.

go1.22 introduced a commit which "minimizes" the string returned by Regexp.String(). Part of what it does is enumerate through literals runes in to calculate flags related to unicode and case sensitivity. This can be quite slow, but is made worse by the fact we call it per shard per regexp in your query.Q to construct the matchtree.

Currently Regexp.String() represents 40% of CPU time on sourcegraph.com. Before go1.22 it was ~0%.

Note: This is a temporary change to resolve the issue. I have a deeper change to make this less clumsy.

Note: In one place we remove the use of string by relying on Regexp.Equal instead.

Test Plan: go test

Part of https://github.com/sourcegraph/sourcegraph/issues/61462

We replace all calls to Regexp.String with a vendored version which is
faster.

go1.22 introduced a commit which "minimizes" the string returned by
Regexp.String(). Part of what it does is run enumerate through literals
runes in your string to see calculate flags related to unicode and case
sensitivity. This can be quite slow, but is made worse by the fact we
call it per shard per regexp in your query.Q to construct the matchtree.

Currently Regexp.String() represents 40% of CPU time on sourcegraph.com.
Before go1.22 it was ~0%.

Note: This is a temporary change to resolve the issue. I have a deeper
change to make this less clumsy.

Note: In one place we remove the use of string by relying on
Regexp.Equal instead.

Test Plan: go test
@keegancsmith keegancsmith requested a review from a team March 28, 2024 14:51
## Vendored commit

```
commit 2e1003e2f7e42efc5771812b9ee6ed264803796c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions

  1. Should we file this as a perf bug on the go package itself also?
  2. What functionality do we lose rolling back that commit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth filing an issue just for the discussion. But I suspect they will close it as wontfix since apparently String() is only meant to be used for debug. However, there is a non-debug case where you wan't to take a syntax.Regexp and get a compiled Regexp (which is our use case). So that could atleast spark discussion for a new API to use.

The functionality lost is honestly not important. The "debug" strings are nicer for a human to look at... but not that different. In practice for us no change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keegancsmith Thanks and great tactical change

Copy link
Member

@jtibshirani jtibshirani Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to filing an issue when you have the time, always nice to engage with upstream projects and maybe it will help us avoid the need for a future fork!

@keegancsmith
Copy link
Member Author

I am gonna go ahead and merge since I want to get an idea of how this performs in production. I've done manual testing + lots of extra code review myself. So will rely a bit on post-merge review.

@keegancsmith keegancsmith merged commit c39011a into main Apr 2, 2024
7 of 8 checks passed
@keegancsmith keegancsmith deleted the k/regexp-string branch April 2, 2024 07:12
@jtibshirani
Copy link
Member

Sorry for the slow review here! This makes sense to me as a tactical change, and great we got it into the release this week. I see we have https://github.com/sourcegraph/sourcegraph/issues/61524 to track a more solid fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants