all: use a faster vendored regexp/syntax/Regexp.String
We replace all calls to Regexp.String with a vendored version which is
faster.

go1.22 introduced a commit which "minimizes" the string returned by
Regexp.String(). Part of what it does is enumerate the literal runes in
the regexp to calculate flags related to unicode and case sensitivity.
This can be quite slow, and is made worse by the fact that we call it
per shard per regexp in the query.Q to construct the matchtree.

Currently Regexp.String() represents 40% of CPU time on sourcegraph.com.
Before go1.22 it was ~0%.

Note: This is a temporary change to resolve the issue. I have a deeper
change to make this less clumsy.

Note: In one place we avoid calling String altogether by relying on
Regexp.Equal instead.
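
A minimal sketch of that second note, assuming the goal is only to check
whether two parsed regexps are the same. The helper name sameRegexp is
hypothetical and does not appear in this commit; it only shows that
(*syntax.Regexp).Equal compares the parsed trees directly, so no string
needs to be built:

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// sameRegexp is a hypothetical helper (not from this commit). It parses both
// patterns and compares the resulting trees with Regexp.Equal instead of
// comparing the output of Regexp.String.
func sameRegexp(a, b string) (bool, error) {
	ra, err := syntax.Parse(a, syntax.Perl)
	if err != nil {
		return false, err
	}
	rb, err := syntax.Parse(b, syntax.Perl)
	if err != nil {
		return false, err
	}
	// Equal walks both trees structurally; it never formats the regexp.
	return ra.Equal(rb), nil
}

func main() {
	same, err := sameRegexp("a+b", "a+b")
	fmt.Println(same, err)

	same, err = sameRegexp("a+b", "a+c")
	fmt.Println(same, err)
}
```

Equal is a structural comparison, so two regexps that match the same
language but parse to different trees are reported as different.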

Test Plan: go test
keegancsmith committed Mar 28, 2024
1 parent 8cf8887 commit 4b30df3
Showing 9 changed files with 715 additions and 11 deletions.
58 changes: 58 additions & 0 deletions internal/syntaxutil/README.md
@@ -0,0 +1,58 @@
# vendored std regexp/syntax

This package contains a vendored copy of std regexp/syntax. However, it only
contains the code for converting a syntax.Regexp into a string. The code is
taken from a recent Go commit, with the commit that introduced a significant
performance regression reverted.

At the time of writing, Regexp.String on go1.22 is taking 40% of CPU at
Sourcegraph. This should return to ~0% with this vendored code.

https://github.com/sourcegraph/sourcegraph/issues/61462

## Vendored commit

```
commit 2e1003e2f7e42efc5771812b9ee6ed264803796c
Author: Daniel Martí <[email protected]>
Date:   Tue Mar 26 22:59:41 2024 +0200

    cmd/go: replace reflect.DeepEqual with slices.Equal and maps.Equal

    All of these maps and slices are made up of comparable types,
    so we can avoid the overhead of reflection entirely.

    Change-Id: If77dbe648a336ba729c171e84c9ff3f7e160297d
    Reviewed-on: https://go-review.googlesource.com/c/go/+/574597
    Reviewed-by: Than McIntosh <[email protected]>
    LUCI-TryBot-Result: Go LUCI <[email protected]>
    Reviewed-by: Ian Lance Taylor <[email protected]>
```

## Reverted commit

```
commit 98c9f271d67b501ecf2ce995539abd2cdc81d505
Author: Russ Cox <[email protected]>
Date:   Wed Jun 28 17:45:26 2023 -0400

    regexp/syntax: use more compact Regexp.String output

    Compact the Regexp.String output. It was only ever intended for debugging,
    but there are at least some uses in the wild where regexps are built up
    using regexp/syntax and then formatted using the String method.
    Compact the output to help that use case. Specifically:

     - Compact 2-element character class ranges: [a-b] -> [ab].
     - Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E).

    Fixes #57950.

    Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb
    Reviewed-on: https://go-review.googlesource.com/c/go/+/507015
    TryBot-Result: Gopher Robot <[email protected]>
    Reviewed-by: Than McIntosh <[email protected]>
    Run-TryBot: Russ Cox <[email protected]>
    Reviewed-by: Rob Pike <[email protected]>
    Auto-Submit: Russ Cox <[email protected]>
```
51 changes: 51 additions & 0 deletions internal/syntaxutil/alias_test.go
@@ -0,0 +1,51 @@
package syntaxutil

import "regexp/syntax"

// A bunch of aliases to avoid needing to modify parse_test.go too much.

type Regexp = syntax.Regexp

type Op = syntax.Op

const (
	OpNoMatch        = syntax.OpNoMatch
	OpEmptyMatch     = syntax.OpEmptyMatch
	OpLiteral        = syntax.OpLiteral
	OpCharClass      = syntax.OpCharClass
	OpAnyCharNotNL   = syntax.OpAnyCharNotNL
	OpAnyChar        = syntax.OpAnyChar
	OpBeginLine      = syntax.OpBeginLine
	OpEndLine        = syntax.OpEndLine
	OpBeginText      = syntax.OpBeginText
	OpEndText        = syntax.OpEndText
	OpWordBoundary   = syntax.OpWordBoundary
	OpNoWordBoundary = syntax.OpNoWordBoundary
	OpCapture        = syntax.OpCapture
	OpStar           = syntax.OpStar
	OpPlus           = syntax.OpPlus
	OpQuest          = syntax.OpQuest
	OpRepeat         = syntax.OpRepeat
	OpConcat         = syntax.OpConcat
	OpAlternate      = syntax.OpAlternate
)

type Flags = syntax.Flags

const (
	FoldCase      = syntax.FoldCase
	Literal       = syntax.Literal
	ClassNL       = syntax.ClassNL
	DotNL         = syntax.DotNL
	OneLine       = syntax.OneLine
	NonGreedy     = syntax.NonGreedy
	PerlX         = syntax.PerlX
	UnicodeGroups = syntax.UnicodeGroups
	WasDollar     = syntax.WasDollar
	Simple        = syntax.Simple
	MatchNL       = syntax.MatchNL
	Perl          = syntax.Perl
	POSIX         = syntax.POSIX
)

var Parse = syntax.Parse