Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

all: use a faster vendored regexp/syntax/Regexp.String #753

Merged
merged 1 commit into from
Apr 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 58 additions & 0 deletions internal/syntaxutil/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# vendored std regexp/syntax

This package contains a vendored copy of std regexp/syntax. However, it only
contains the code for converting syntax.Regexp into a String. It is the
version of the code at a recent go commit, but with a commit which introduces
a significant performance regression reverted.

At the time of writing regexp.String on go1.22 is taking 40% of CPU at
Sourcegraph. This should return to ~0% with this vendored code.

https://github.com/sourcegraph/sourcegraph/issues/61462

## Vendored commit

```
commit 2e1003e2f7e42efc5771812b9ee6ed264803796c
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Questions

  1. Should we file this as a perf bug on the go package itself also?
  2. What functionality do we lose rolling back that commit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is worth filing an issue just for the discussion. But I suspect they will close it as wontfix since apparently String() is only meant to be used for debug. However, there is a non-debug case where you wan't to take a syntax.Regexp and get a compiled Regexp (which is our use case). So that could atleast spark discussion for a new API to use.

The functionality lost is honestly not important. The "debug" strings are nicer for a human to look at... but not that different. In practice for us no change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keegancsmith Thanks and great tactical change

Copy link
Member

@jtibshirani jtibshirani Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to filing an issue when you have the time, always nice to engage with upstream projects and maybe it will help us avoid the need for a future fork!

Author: Daniel Martí <[email protected]>
Date: Tue Mar 26 22:59:41 2024 +0200

cmd/go: replace reflect.DeepEqual with slices.Equal and maps.Equal

All of these maps and slices are made up of comparable types,
so we can avoid the overhead of reflection entirely.

Change-Id: If77dbe648a336ba729c171e84c9ff3f7e160297d
Reviewed-on: https://go-review.googlesource.com/c/go/+/574597
Reviewed-by: Than McIntosh <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
```

## Reverted commit

```
commit 98c9f271d67b501ecf2ce995539abd2cdc81d505
Author: Russ Cox <[email protected]>
Date: Wed Jun 28 17:45:26 2023 -0400

regexp/syntax: use more compact Regexp.String output

Compact the Regexp.String output. It was only ever intended for debugging,
but there are at least some uses in the wild where regexps are built up
using regexp/syntax and then formatted using the String method.
Compact the output to help that use case. Specifically:

- Compact 2-element character class ranges: [a-b] -> [ab].
- Aggregate flags: (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E) -> (?i:AB*C|D?E).

Fixes #57950.

Change-Id: I1161d0e3aa6c3ae5a302677032bb7cd55caae5fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/507015
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Than McIntosh <[email protected]>
Run-TryBot: Russ Cox <[email protected]>
Reviewed-by: Rob Pike <[email protected]>
Auto-Submit: Russ Cox <[email protected]>
```
51 changes: 51 additions & 0 deletions internal/syntaxutil/alias_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
package syntaxutil

import "regexp/syntax"

// A bunch of aliases to avoid needing to modify parse_test.go too much.

type Regexp = syntax.Regexp

type Op = syntax.Op

const (
OpNoMatch = syntax.OpNoMatch
OpEmptyMatch = syntax.OpEmptyMatch
OpLiteral = syntax.OpLiteral
OpCharClass = syntax.OpCharClass
OpAnyCharNotNL = syntax.OpAnyCharNotNL
OpAnyChar = syntax.OpAnyChar
OpBeginLine = syntax.OpBeginLine
OpEndLine = syntax.OpEndLine
OpBeginText = syntax.OpBeginText
OpEndText = syntax.OpEndText
OpWordBoundary = syntax.OpWordBoundary
OpNoWordBoundary = syntax.OpNoWordBoundary
OpCapture = syntax.OpCapture
OpStar = syntax.OpStar
OpPlus = syntax.OpPlus
OpQuest = syntax.OpQuest
OpRepeat = syntax.OpRepeat
OpConcat = syntax.OpConcat
OpAlternate = syntax.OpAlternate
)

type Flags = syntax.Flags

const (
FoldCase = syntax.FoldCase
Literal = syntax.Literal
ClassNL = syntax.ClassNL
DotNL = syntax.DotNL
OneLine = syntax.OneLine
NonGreedy = syntax.NonGreedy
PerlX = syntax.PerlX
UnicodeGroups = syntax.UnicodeGroups
WasDollar = syntax.WasDollar
Simple = syntax.Simple
MatchNL = syntax.MatchNL
Perl = syntax.Perl
POSIX = syntax.POSIX
)

var Parse = syntax.Parse
Loading
Loading