-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
all: use a faster vendored regexp/syntax/Regexp.String #753
Conversation
We replace all calls to Regexp.String with a vendored version which is faster. go1.22 introduced a commit which "minimizes" the string returned by Regexp.String(). Part of what it does is run enumerate through literals runes in your string to see calculate flags related to unicode and case sensitivity. This can be quite slow, but is made worse by the fact we call it per shard per regexp in your query.Q to construct the matchtree. Currently Regexp.String() represents 40% of CPU time on sourcegraph.com. Before go1.22 it was ~0%. Note: This is a temporary change to resolve the issue. I have a deeper change to make this less clumsy. Note: In one place we remove the use of string by relying on Regexp.Equal instead. Test Plan: go test
## Vendored commit | ||
|
||
``` | ||
commit 2e1003e2f7e42efc5771812b9ee6ed264803796c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions
- Should we file this as a perf bug on the go package itself also?
- What functionality do we lose rolling back that commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is worth filing an issue just for the discussion. But I suspect they will close it as wontfix since apparently String() is only meant to be used for debug. However, there is a non-debug case where you wan't to take a syntax.Regexp and get a compiled Regexp (which is our use case). So that could atleast spark discussion for a new API to use.
The functionality lost is honestly not important. The "debug" strings are nicer for a human to look at... but not that different. In practice for us no change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@keegancsmith Thanks and great tactical change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to filing an issue when you have the time, always nice to engage with upstream projects and maybe it will help us avoid the need for a future fork!
I am gonna go ahead and merge since I want to get an idea of how this performs in production. I've done manual testing + lots of extra code review myself. So will rely a bit on post-merge review. |
Sorry for the slow review here! This makes sense to me as a tactical change, and great we got it into the release this week. I see we have https://github.com/sourcegraph/sourcegraph/issues/61524 to track a more solid fix. |
We replace all calls to Regexp.String with a vendored version which is faster.
go1.22 introduced a commit which "minimizes" the string returned by Regexp.String(). Part of what it does is enumerate through literals runes in to calculate flags related to unicode and case sensitivity. This can be quite slow, but is made worse by the fact we call it per shard per regexp in your query.Q to construct the matchtree.
Currently Regexp.String() represents 40% of CPU time on sourcegraph.com. Before go1.22 it was ~0%.
Note: This is a temporary change to resolve the issue. I have a deeper change to make this less clumsy.
Note: In one place we remove the use of string by relying on Regexp.Equal instead.
Test Plan: go test
Part of https://github.com/sourcegraph/sourcegraph/issues/61462