
shards: respect scheduler and use smarter synchronization for List #750

Merged: 1 commit merged into main from k/concurrency on Mar 26, 2024

Conversation

keegancsmith (Member)

Previously List would never call proc.Yield, which broke the co-operative scheduler in the case of slow List calls. Additionally, we used a naive concurrency scheme (a large buffered channel), which showed up in the profiler as 30% of CPU under List being spent on chan_-related operations.

This commit follows how Search used to respect proc.Yield; see sched.go in 90ed7bf. We did not copy Search's implementation because it supports streaming and is therefore more complicated than we need; List is still batch only.

We needed to use errgroup to ensure we drain all channels in the case of an error. Previously this was unnecessary because the channels had a buffer size of len(shards), which guaranteed nothing would ever block. Now channels are never larger than the number of workers (<= GOMAXPROCS).

Test Plan: go test covers the no-error cases. For the error case we tested manually by running zoekt-webserver and injecting a random context cancellation. We observed the error being reported and no List goroutines left running.

Stacked on #749
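
For reference, here is a minimal, self-contained sketch of the worker-pool pattern described above: an errgroup plus a feeder channel bounded by the worker count, with every blocking send guarded so the channels drain on error. The names (shard, listOne, listAll) are placeholders rather than zoekt's actual API, and the co-operative scheduler yield is only indicated in a comment.

package main

import (
	"context"
	"fmt"
	"runtime"

	"golang.org/x/sync/errgroup"
)

// shard and listOne stand in for zoekt.Searcher and listOneShard.
type shard string

func listOne(ctx context.Context, s shard) (string, error) {
	// In the real code, this is roughly where the co-operative scheduler
	// would be respected (via proc.Yield) before doing the per-shard work.
	if err := ctx.Err(); err != nil {
		return "", err
	}
	return "listed " + string(s), nil
}

func listAll(ctx context.Context, shards []shard) ([]string, error) {
	g, ctx := errgroup.WithContext(ctx)

	// Bound work by number of CPUs.
	workers := min(runtime.GOMAXPROCS(0), len(shards))

	feeder := make(chan shard, workers)
	results := make(chan string, workers)

	// Feed shards to the workers. Because the buffer holds only `workers`
	// items (not len(shards)), sends can block, so we watch ctx.Done to
	// avoid leaking this goroutine after an error.
	g.Go(func() error {
		defer close(feeder)
		for _, s := range shards {
			select {
			case feeder <- s:
			case <-ctx.Done():
				return ctx.Err()
			}
		}
		return nil
	})

	// Worker pool: each worker drains feeder until it is closed or the
	// errgroup's context is cancelled by the first error.
	for range workers {
		g.Go(func() error {
			for s := range feeder {
				r, err := listOne(ctx, s)
				if err != nil {
					return err
				}
				select {
				case results <- r:
				case <-ctx.Done():
					return ctx.Err()
				}
			}
			return nil
		})
	}

	// Close results once all senders have finished so the collector
	// below terminates.
	go func() {
		_ = g.Wait()
		close(results)
	}()

	var all []string
	for r := range results {
		all = append(all, r)
	}
	return all, g.Wait()
}

func main() {
	out, err := listAll(context.Background(), []shard{"a", "b", "c"})
	fmt.Println(out, err)
}

The important property is that every blocking send has a cancellation escape hatch, so when one worker returns an error the whole group unwinds instead of blocking on a partially drained channel.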

@keegancsmith keegancsmith requested a review from a team March 26, 2024 14:35
@stefanhengl (Member)

Previously List would never call proc.Yield, which broke the co-operative scheduler in the case of slow List calls
😮

Base automatically changed from k/list to main March 26, 2024 15:06
@stefanhengl (Member)

Additionally, we used a naive concurrency scheme (a large buffered channel), which showed up in the profiler as 30% of CPU under List being spent on chan_-related operations.

I see that the new implementation has much better concurrency, but shouldn't we still see similar chan-related operations?

Previously List would never call proc.Yield, which broke the
co-operative scheduler in the case of slow List calls. Additionally, we
used a naive concurrency scheme (a large buffered channel), which showed
up in the profiler as 30% of CPU under List being spent on chan_-related
operations.

This commit follows how Search used to respect proc.Yield; see sched.go
in 90ed7bf. We did not copy Search's implementation because it supports
streaming and is therefore more complicated than we need; List is still
batch only.

We needed to use errgroup to ensure we drain all channels in the case of
an error. Previously this was unnecessary because the channels had a
buffer size of len(shards), which guaranteed nothing would ever block.
Now channels are never larger than the number of workers (<=
GOMAXPROCS).

Test Plan: go test covers the no-error cases. For the error case we
tested manually by running zoekt-webserver and injecting a random
context cancellation. We observed the error being reported and no List
goroutines left running.

Co-authored-by: William Bezuidenhout <[email protected]>
@keegancsmith (Member, Author)

Additionally, we used a naive concurrency scheme (a large buffered channel), which showed up in the profiler as 30% of CPU under List being spent on chan_-related operations.

I see that the new implementation has much better concurrency, but shouldn't we still see similar chan-related operations?

I'm not sure if Go has poorer performance with very large buffered channels vs smaller ones. The fact that we don't see chan_ operations appear for normal search in the flamegraphs makes me think that is the case, but this is totally unconfirmed.

A more likely explanation is that now that we do selectRepoSet (like normal search does), we send much less work over the channels, so the smaller buffered channels themselves aren't actually what helps.

However, the most important part of this change is the proc.Yield. And while I was introducing that I thought we might as well use a similar pattern for concurrency control.
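
To make the buffer-size comparison concrete, here is a toy illustration of the two sizing strategies; the shard count and the result type are made up for the example, only the capacities mirror the discussion above.

package main

import (
	"fmt"
	"runtime"
)

// result is a placeholder for a per-shard List result.
type result struct{}

func main() {
	// Hypothetical instance with a few thousand shards.
	shards := make([]string, 5000)

	// Old approach: one buffer slot per shard, so a send can never block
	// and nothing has to be drained when an error occurs.
	perShard := make(chan result, len(shards))

	// New approach: the buffer is bounded by the worker count, so an
	// errgroup is needed to guarantee the channels are drained on error.
	workers := min(runtime.GOMAXPROCS(0), len(shards))
	perWorker := make(chan result, workers)

	fmt.Println("old capacity:", cap(perShard), "new capacity:", cap(perWorker))
}

Whether the smaller buffer itself saves CPU is, as noted above, unconfirmed; the sketch only shows the structural difference between the two approaches.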

@keegancsmith (Member, Author)

keegancsmith commented Mar 26, 2024

Manual testing results:

zoekt on  k/concurrency [$!] ❄️ zoekt-env took 37s git diff
diff --git a/shards/shards.go b/shards/shards.go
index aa384371..1f3011fa 100644
--- a/shards/shards.go
+++ b/shards/shards.go
@@ -951,7 +951,7 @@ func (ss *shardedSearcher) List(ctx context.Context, r query.Q, opts *zoekt.List
        g, ctx := errgroup.WithContext(ctx)
 
        // Bound work by number of CPUs.
-       workers := min(runtime.GOMAXPROCS(0), len(shards))
+       workers := min(runtime.GOMAXPROCS(0), len(shards), 2)
 
        var (
                feeder = make(chan zoekt.Searcher, workers)
@@ -976,7 +976,12 @@ func (ss *shardedSearcher) List(ctx context.Context, r query.Q, opts *zoekt.List
        // send results down all. If an error is encountered we cancel the errgroup.
        for range workers {
                g.Go(func() error {
+                       i := 0
                        for s := range feeder {
+                               i++
+                               if i > 1 {
+                                       return fmt.Errorf("boom")
+                               }
                                result, err := listOneShard(ctx, s, r, opts)
                                if err != nil {
                                        return err

zoekt on  k/concurrency [$!] ❄️ zoekt-env go run ./cmd/zoekt-webserver -pprof
2024/03/26 17:25:07 loading 5 shard(s): github.com%2Fgolang%2Fgo_v16.00000.zoekt, github.com%2Fsourcegraph%2Fcody_v16.00000.zoekt, github.com%2Fsourcegraph%2Fsourcegraph_v16.00000.zoekt, github.com%2Fsourcegraph%2Fsymf-private_v16.00000.zoekt, github.com%2Fsourcegraph%2Fzoekt_v16.00000.zoekt

[screenshot attached]

@keegancsmith keegancsmith merged commit 931974d into main Mar 26, 2024
7 of 8 checks passed
@keegancsmith keegancsmith deleted the k/concurrency branch March 26, 2024 15:27
keegancsmith added a commit that referenced this pull request Mar 27, 2024
Revert "shards: respect scheduler and use smarter synchronization for List (#750)"

This reverts commit 931974d.

After monitoring production, we are seeing increased latencies in List
calls. Additionally, we are seeing many more scheduler transitions into
interactive queued. The only realistic cause of this was this commit, so
we are reverting for now pending further investigation.

Test Plan: go test
keegancsmith added a commit that referenced this pull request Mar 27, 2024
Revert "shards: respect scheduler and use smarter synchronization for List (#750)" (#752)

This reverts commit 931974d.

After monitoring production, we are seeing increased latencies in List
calls. Additionally, we are seeing many more scheduler transitions into
interactive queued. The only realistic cause of this was this commit, so
we are reverting for now pending further investigation.

Test Plan: go test
keegancsmith added a commit that referenced this pull request Mar 27, 2024
shards: respect scheduler and use smarter synchronization for List (#750)

Previously List would never call proc.Yield, which broke the
co-operative scheduler in the case of slow List calls. Additionally, we
used a naive concurrency scheme (a large buffered channel), which showed
up in the profiler as 30% of CPU under List being spent on chan_-related
operations.

This commit follows how Search used to respect proc.Yield; see sched.go
in 90ed7bf. We did not copy Search's implementation because it supports
streaming and is therefore more complicated than we need; List is still
batch only.

We needed to use errgroup to ensure we drain all channels in the case of
an error. Previously this was unnecessary because the channels had a
buffer size of len(shards), which guaranteed nothing would ever block.
Now channels are never larger than the number of workers (<=
GOMAXPROCS).

Test Plan: go test covers the no-error cases. For the error case we
tested manually by running zoekt-webserver and injecting a random
context cancellation. We observed the error being reported and no List
goroutines left running.

Co-authored-by: William Bezuidenhout <[email protected]>
(cherry picked from commit 931974d)
keegancsmith added a commit that referenced this pull request Mar 27, 2024
Revert "shards: respect scheduler and use smarter synchronization for List (#750)" (#752)

This reverts commit 931974d.

After monitoring production, we are seeing increased latencies in List
calls. Additionally, we are seeing many more scheduler transitions into
interactive queued. The only realistic cause of this was this commit, so
we are reverting for now pending further investigation.

Test Plan: go test

(cherry picked from commit 8cf8887)