all: use a faster vendored regexp/syntax/Regexp.String (#753)
We replace all calls to Regexp.String with a vendored version which is faster.

go1.22 introduced a commit which "minimizes" the string returned by Regexp.String(). Part of what it does is enumerate the runes of each literal in the regexp to calculate flags related to Unicode and case sensitivity. This can be quite slow, and it is made worse by the fact that we call it per shard per regexp in your query.Q to construct the matchtree.

Currently Regexp.String() represents 40% of CPU time on sourcegraph.com. Before go1.22 it was ~0%.

Note: This is a temporary change to resolve the issue. I have a deeper change to make this less clumsy.

Note: In one place we remove the use of String by relying on Regexp.Equal instead.

Test Plan: go test
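The second note, comparing regexps with Regexp.Equal rather than by their formatted strings, can be sketched as below. This is a minimal illustration and not the code from this commit: `sameRegexp` is a hypothetical helper, and it assumes both patterns are parsed with `syntax.Perl`. Only `syntax.Parse` and `(*syntax.Regexp).Equal` come from the standard library.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// sameRegexp is a hypothetical helper. It reports whether two patterns parse
// to structurally identical syntax trees. It compares the trees directly with
// Regexp.Equal and never formats them, so it does not pay for Regexp.String.
func sameRegexp(a, b string) (bool, error) {
	ra, err := syntax.Parse(a, syntax.Perl)
	if err != nil {
		return false, err
	}
	rb, err := syntax.Parse(b, syntax.Perl)
	if err != nil {
		return false, err
	}
	return ra.Equal(rb), nil
}

func main() {
	same, err := sameRegexp(`foo.*bar`, `foo.*bar`)
	fmt.Println(same, err) // true <nil>

	same, err = sameRegexp(`foo.*bar`, `foo.*baz`)
	fmt.Println(same, err) // false <nil>
}
```

Equal is a structural comparison, so it answers "are these the same tree", not "do these match the same inputs". For a check that previously compared `a.String() == b.String()`, that is usually the same question.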
1 parent 8cf8887, commit c39011a
Showing 9 changed files with 715 additions and 11 deletions.
@@ -0,0 +1,58 @@
# vendored std regexp/syntax

This package contains a vendored copy of std regexp/syntax. However, it only
contains the code for converting a syntax.Regexp into a string. It is the
code as of a recent Go commit, with one commit that introduced a significant
performance regression reverted.

At the time of writing, Regexp.String on go1.22 takes 40% of CPU at
Sourcegraph. This should return to ~0% with this vendored code.

https://github.com/sourcegraph/sourcegraph/issues/61462

## Vendored commit

```
commit 2e1003e2f7e42efc5771812b9ee6ed264803796c
Author: Daniel Martí <[email protected]>
Date:   Tue Mar 26 22:59:41 2024 +0200

    cmd/go: replace reflect.DeepEqual with slices.Equal and maps.Equal

    All of these maps and slices are made up of comparable types,
    so we can avoid the overhead of reflection entirely.

    Change-Id: If77dbe648a336ba729c171e84c9ff3f7e160297d
    Reviewed-on: https://go-review.googlesource.com/c/go/+/574597
    Reviewed-by: Than McIntosh <[email protected]>
    LUCI-TryBot-Result: Go LUCI <[email protected]>
    Reviewed-by: Ian Lance Taylor <[email protected]>
```

## Reverted commit

```
commit 98c9f271d67b501ecf2ce995539abd2cdc81d505
Author: Russ Cox <[email protected]>
Date:   Wed Jun 28 17:45:26 2023 -0400

    regexp/syntax: use more compact Regexp.String output

    Compact the Regexp.String output. It was only ever intended for debugging,
    but there are at least some uses in the wild where regexps are built up
    using regexp/syntax and then formatted using the String method.
    Compact the output to help that use case. Specifically:

     - Compact 2-element character class ranges: [a-b] -> [ab].
     - Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E).

    Fixes #57950.

    Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb
    Reviewed-on: https://go-review.googlesource.com/c/go/+/507015
    TryBot-Result: Gopher Robot <[email protected]>
    Reviewed-by: Than McIntosh <[email protected]>
    Run-TryBot: Russ Cox <[email protected]>
    Reviewed-by: Rob Pike <[email protected]>
    Auto-Submit: Russ Cox <[email protected]>
```
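To see what the reverted commit changes, the two examples from its message can be fed through the standard library's Regexp.String. The sketch below is illustrative only: `render` and `matchesAfterRoundTrip` are hypothetical helpers, the patterns are parsed with `syntax.Perl` as an assumption, and the exact text printed depends on the Go version used to build it, so no expected output is shown.

```go
package main

import (
	"fmt"
	"regexp"
	"regexp/syntax"
)

// render parses pattern and formats it back with the standard library's
// Regexp.String. The returned text differs between Go versions that include
// the compaction commit and those that do not.
func render(pattern string) (string, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "", err
	}
	return re.String(), nil
}

// matchesAfterRoundTrip formats pattern with render, compiles the result, and
// reports whether it matches the whole of input. Either output form should
// describe the same language, so this check does not depend on the Go version.
func matchesAfterRoundTrip(pattern, input string) (bool, error) {
	s, err := render(pattern)
	if err != nil {
		return false, err
	}
	re, err := regexp.Compile(`^(?:` + s + `)$`)
	if err != nil {
		return false, err
	}
	return re.MatchString(input), nil
}

func main() {
	patterns := []string{
		`[a-b]`,
		`(?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E)`,
	}
	for _, p := range patterns {
		s, err := render(p)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", p, s)
	}
}
```

The compact form is nicer to read, but producing it requires the extra pass over the regexp that the commit message above identifies as the hot spot. The vendored copy keeps the older, more verbose output in exchange for skipping that pass.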
@@ -0,0 +1,51 @@
package syntaxutil

import "regexp/syntax"

// A bunch of aliases to avoid needing to modify parse_test.go too much.

type Regexp = syntax.Regexp

type Op = syntax.Op

const (
	OpNoMatch        = syntax.OpNoMatch
	OpEmptyMatch     = syntax.OpEmptyMatch
	OpLiteral        = syntax.OpLiteral
	OpCharClass      = syntax.OpCharClass
	OpAnyCharNotNL   = syntax.OpAnyCharNotNL
	OpAnyChar        = syntax.OpAnyChar
	OpBeginLine      = syntax.OpBeginLine
	OpEndLine        = syntax.OpEndLine
	OpBeginText      = syntax.OpBeginText
	OpEndText        = syntax.OpEndText
	OpWordBoundary   = syntax.OpWordBoundary
	OpNoWordBoundary = syntax.OpNoWordBoundary
	OpCapture        = syntax.OpCapture
	OpStar           = syntax.OpStar
	OpPlus           = syntax.OpPlus
	OpQuest          = syntax.OpQuest
	OpRepeat         = syntax.OpRepeat
	OpConcat         = syntax.OpConcat
	OpAlternate      = syntax.OpAlternate
)

type Flags = syntax.Flags

const (
	FoldCase      = syntax.FoldCase
	Literal       = syntax.Literal
	ClassNL       = syntax.ClassNL
	DotNL         = syntax.DotNL
	OneLine       = syntax.OneLine
	NonGreedy     = syntax.NonGreedy
	PerlX         = syntax.PerlX
	UnicodeGroups = syntax.UnicodeGroups
	WasDollar     = syntax.WasDollar
	Simple        = syntax.Simple
	MatchNL       = syntax.MatchNL
	Perl          = syntax.Perl
	POSIX         = syntax.POSIX
)

var Parse = syntax.Parse
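
The declarations above use `=`, which makes each name an alias for the standard library type and not a new distinct type. A small sketch of why that matters for reusing upstream test code: values flow between the aliased name and `syntax.Regexp` without any conversion. The package layout here is an assumption for illustration (everything lives in `package main`), and `isLiteral` and `parseIsLiteral` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// Aliases in the same style as the file above, redeclared locally so this
// sketch is self-contained.
type Regexp = syntax.Regexp

const OpLiteral = syntax.OpLiteral

var Parse = syntax.Parse

// isLiteral takes the standard library type directly.
func isLiteral(re *syntax.Regexp) bool {
	return re.Op == OpLiteral
}

// parseIsLiteral parses pattern through the aliased Parse and passes the
// result, held in a variable of the aliased type, to a function declared
// against *syntax.Regexp. No conversion is needed because the two names
// denote the same type.
func parseIsLiteral(pattern string) (bool, error) {
	re, err := Parse(pattern, syntax.Perl)
	if err != nil {
		return false, err
	}
	var aliased *Regexp = re
	return isLiteral(aliased), nil
}

func main() {
	for _, p := range []string{`abc`, `a.c`} {
		ok, err := parseIsLiteral(p)
		fmt.Println(p, ok, err)
	}
}
```

Had these been defined types (`type Regexp syntax.Regexp`), the methods of `syntax.Regexp` would not carry over and the assignment above would not compile without an explicit conversion.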