
chunkmatches: reuse last calculated column when filling #711

Merged
3 commits merged into main from k/chunk-matches-perf on Jan 10, 2024

Conversation

@keegancsmith (Member) commented on Jan 9, 2024

This change uses the fact that candidate matches should be increasing in byte offset to avoid recounting runes on a line. Before this change, if a line had many matches, we would call utf8.RuneCount for each match, which is an O(nm) algorithm where n is the line length and m is the number of matches. After this change the complexity is O(n).
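
To illustrate the idea, here is a minimal sketch of such a helper (the package, names, and signatures are illustrative, not the exact code added in this PR): it remembers the last offset it resolved on a line, so a later match on the same line only counts the runes between the two offsets.

package sketch

import "unicode/utf8"

// columnHelper caches the last (lineOffset, offset) pair it resolved so that
// later lookups on the same line only count the runes since the previous
// offset instead of re-counting from the start of the line.
type columnHelper struct {
	data []byte

	// state from the previous call to get
	lastLineOffset int
	lastOffset     int
	lastRuneCount  int
}

// get returns the 1-based column (in runes) of offset on the line starting at
// lineOffset. Offsets are expected to be non-decreasing within a line; if they
// are not, we fall back to counting from the start of the line.
func (c *columnHelper) get(lineOffset, offset int) int {
	var runes int
	if lineOffset == c.lastLineOffset && offset >= c.lastOffset {
		// Same line, moving forward: only count the new bytes.
		runes = c.lastRuneCount + utf8.RuneCount(c.data[c.lastOffset:offset])
	} else {
		// New line (or out-of-order offset): count from the line start.
		runes = utf8.RuneCount(c.data[lineOffset:offset])
	}
	c.lastLineOffset = lineOffset
	c.lastOffset = offset
	c.lastRuneCount = runes
	return runes + 1
}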

I came across this while investigating slow performance: searching the string "dev" on s2 took 2s when the match limit was 100k instead of 10k; with 10k it took 0.04s. It turns out that with the larger limit we ended up searching a file where the word dev appeared many times on one line. Running a profiler against the service showed 96% of CPU time in utf8.RuneCount.

This commit adds a benchmark for the helper introduced to reuse RuneCounts. Unsurprisingly the difference is massive between O(nm) and O(n) :)

name             old time/op  new time/op  delta
ColumnHelper-32   299ms ± 2%     0ms ± 2%  -99.97%  (p=0.000 n=10+10)

See details in a comment below for how I obtained the profiles and the information from them.

Test Plan: Added tests and benchmarks.

This doesn't change the logic; it just moves it into a struct so I can make it smarter. Additionally, we add a benchmark and test in this commit. The next commit contains the perf improvement, taking the column calculation from O(nm) to O(n):

  benchstat before.txt after.txt
  name             old time/op  new time/op  delta
  ColumnHelper-32   299ms ± 2%     0ms ± 2%  -99.97%  (p=0.000 n=10+10)
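
For reference, a benchmark along these lines exercises the worst case of many matches packed onto one long line (a sketch against the hypothetical columnHelper above, not the PR's actual BenchmarkColumnHelper):

package sketch

import (
	"bytes"
	"testing"
)

// BenchmarkColumnHelperSketch looks up columns for thousands of matches on a
// single long line. With a naive per-match utf8.RuneCount this is O(nm); with
// the cached counter above it is O(n).
func BenchmarkColumnHelperSketch(b *testing.B) {
	line := bytes.Repeat([]byte("dev "), 10000) // one long line, many matches

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ch := columnHelper{data: line}
		for offset := 0; offset < len(line); offset += 4 {
			_ = ch.get(0, offset)
		}
	}
}
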
@keegancsmith (Member, Author) commented:

Here are my journal notes for the pprof part of this investigation. It contains useful information for future performance debugging.

[2024-01-09 Tue 10:51] Gonna try grab a pprof during the search. It is only 2 seconds, so unsure how useful it will be. Maybe can try adding a loop.

I can just visit https://sourcegraph.sourcegraph.com/-/debug/proxies/indexed-search-0/debug/pprof/profile

I then used devtools to "Copy as cURL" the profile request, and did the same for the search request. For both, I copied the command into a bash function and added the --silent --show-error flags for my sanity. The script looked like this (curl flags removed):

#!/usr/bin/env bash

set -e

function fetch_profile {
    echo "profile start"
    curl 'https://sourcegraph.sourcegraph.com/-/debug/proxies/indexed-search-0/debug/pprof/profile' > /tmp/cpu.pprof
    echo "profile done"
}

function search {
    echo "search start"
    curl 'https://sourcegraph.sourcegraph.com/search/stream?q=context%3Aglobal%20repo%3Agithub.com%2Fsourcegraph%2Fsourcegraph%20%20content%3A%22dev%22%20&v=V3&t=newStandardRC1&sm=0&display=1500&cm=t&trace=1&feat=search-debug' > /dev/null
    echo "search done"
}

# Start the profile fetch in the background, then keep issuing searches while
# that background job is still running so the profile captures the search load.
fetch_profile &

while jobs %%; do
    search
done

Next thing I needed was the zoekt binary used. I did this by getting the version of s2 at https://sourcegraph.sourcegraph.com/__version and then getting the binary from the docker container:

docker pull sourcegraph/indexed-searcher:257084_2024-01-09_5.2-9efa6c7e2efb
docker create sourcegraph/indexed-searcher:257084_2024-01-09_5.2-9efa6c7e2efb
docker cp 74ad840ae6a8f51ddec7ff4a382660ca8a62bb4d47d7730f355d71f9b68cde15:/usr/local/bin/zoekt-webserver /tmp/
docker rm -v 74ad840ae6a8f51ddec7ff4a382660ca8a62bb4d47d7730f355d71f9b68cde15
go tool pprof -http 127.0.0.1:6062 zoekt-webserver cpu.pprof

Then, so I could use the source code listing:

go tool pprof -trim_path external/com_github_sourcegraph_zoekt/ -source_path ~/src/github.com/sourcegraph/zoekt zoekt-webserver cpu.pprof

Turns out we spend 96.50% in utf8.RuneCount inside fillContentChunkMatches. This is an issue only with chunk matches, not the original format. We calculate the column, and it has a hidden O(n^2) algorithm in it! If you have a long line, we basically do an O(n) operation on that line per match, where n is the line length.

Column: uint32(utf8.RuneCount(data[startLineOffset:startOffset]) + 1),

(pprof) list fillContentChunkMatches
Total: 25.15s
ROUTINE ======================== github.com/sourcegraph/zoekt.(*contentProvider).fillContentChunkMatches in contentprovider.go
      10ms     24.36s (flat, cum) 96.86% of Total
         .          .    291:func (p *contentProvider) fillContentChunkMatches(ms []*candidateMatch, numContextLines int) []ChunkMatch {
         .       20ms    292:	newlines := p.newlines()
         .       30ms    293:	chunks := chunkCandidates(ms, newlines, numContextLines)
         .          .    294:	data := p.data(false)
         .          .    295:	chunkMatches := make([]ChunkMatch, 0, len(chunks))
         .          .    296:	for _, chunk := range chunks {
         .       20ms    297:		ranges := make([]Range, 0, len(chunk.candidates))
         .          .    298:		var symbolInfo []*Symbol
         .          .    299:		for i, cm := range chunk.candidates {
         .          .    300:			startOffset := cm.byteOffset
         .          .    301:			endOffset := cm.byteOffset + cm.byteMatchSz
      10ms       20ms    302:			startLine, startLineOffset, _ := newlines.atOffset(startOffset)
         .          .    303:			endLine, endLineOffset, _ := newlines.atOffset(endOffset)
         .          .    304:
         .          .    305:			ranges = append(ranges, Range{
         .          .    306:				Start: Location{
         .          .    307:					ByteOffset: startOffset,
         .          .    308:					LineNumber: uint32(startLine),
         .     12.42s    309:					Column:     uint32(utf8.RuneCount(data[startLineOffset:startOffset]) + 1),
         .          .    310:				},
         .          .    311:				End: Location{
         .          .    312:					ByteOffset: endOffset,
         .          .    313:					LineNumber: uint32(endLine),
         .     11.85s    314:					Column:     uint32(utf8.RuneCount(data[endLineOffset:endOffset]) + 1),
         .          .    315:				},
         .          .    316:			})
         .          .    317:
         .          .    318:			if cm.symbol {
         .          .    319:				if symbolInfo == nil {

@camdencheek (Member) left a comment:
Wow! Great find!

I think that equivalent logic exists in searcher. We should probably implement this same thing there as well.

// columnHelper is a helper struct which caches the number of runes last
// counted. If we naively use utf8.RuneCount for each match on a line, this
// leads to an O(nm) algorithm where m is the number of matches and n is the
// length of the line. Aassuming we our candidates are increasing in offset

Suggested change:
// length of the line. Aassuming we our candidates are increasing in offset
// length of the line. Assuming we our candidates are increasing in offset

Just to check: are we always sure we can assume our candidates are increasing in offset? I can't remember if this is always true.

Oh, reading the implementation, I guess we just fall back to the less performant version.

@jtibshirani (Member) commented Jan 9, 2024:

I believe this invariant is true, because in gatherMatches we always make sure to sort by byteOffset.

Maybe we could update the comments to make it clear this invariant is assumed, and treat the unsorted case as an error rather than being expected? That way if we ever introduce a bug here, we don't silently fall back to an O(n^2) algorithm... much harder to track down than a clear error in testing.

General thought: if invariants are too tricky to reason about, sometimes I just explicitly add a (re)sort! I believe Go's default sort is very fast when the input is already sorted. This bounds the worst case nicely.

@keegancsmith (Member, Author) replied:

I also checked the invariants. We additionally rely on matches not overlapping, which matters since we look up the end column too.

The sorted invariant is quite important for other bits of code like chunkCandidates. So what I did was add a sorted check which loudly complains and then sorts if the invariant is broken.

Initially I pretended I was a Haskell programmer and added a special type which guaranteed this, but TBH it felt quite overengineered. Happy to try it out if there is interest, but for now I'm gonna merge with the extra perf-invariant documentation and sort check.
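
A minimal sketch of that kind of guard, assuming a candidateMatch with a byteOffset field (the real check and its logging live in zoekt; this only shows the shape):

package sketch

import (
	"log"
	"sort"
)

// candidateMatch is a stand-in for zoekt's internal type; only the field the
// check needs is shown.
type candidateMatch struct {
	byteOffset uint32
}

// sortIfNeeded enforces the sorted-by-byteOffset invariant that the column
// caching (and chunkCandidates) rely on. If the invariant is broken we complain
// loudly and repair it rather than silently degrading to O(nm) behaviour.
func sortIfNeeded(ms []*candidateMatch) {
	less := func(i, j int) bool { return ms[i].byteOffset < ms[j].byteOffset }
	if sort.SliceIsSorted(ms, less) {
		return
	}
	log.Printf("BUG: candidate matches are not sorted by byteOffset; sorting them")
	sort.Slice(ms, less)
}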

@jtibshirani (Member) left a comment:
Great to fix this!

@jtibshirani (Member) commented:

I think that equivalent logic exists in searcher. We should probably implement this same thing there as well.

Assigning this to myself! Said differently: please don't work on this concurrently, as it will cause a lot of conflicts with my in-progress searcher optimizations (https://github.com/sourcegraph/sourcegraph/issues/59038) :)

@keegancsmith merged commit 7487a0d into main on Jan 10, 2024
8 checks passed
@keegancsmith deleted the k/chunk-matches-perf branch on January 10, 2024 at 09:45
jtibshirani added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 12, 2024
This PR improves how searcher creates matches, making it more consistent with
how it's done in Zoekt.

Changes:
* Pull chunking logic out of structural search code and into its own file
`chunk.go`
* Remove overlapping ranges (this is what Zoekt does when chunk matches are
enabled)
* Optimize the column calculation using the same strategy from Zoekt ([zoekt#711](sourcegraph/zoekt#711))