ctags: monitor symbol analysis and report stuck documents #678

Merged 1 commit on Nov 6, 2023

Conversation

keegancsmith (Member) opened this pull request:

This adds a monitor which reports the progress of symbol analysis every minute. Additionally, if a document takes too long to analyse (more than 10s), we report it.

At first this just reports via stdlog. However, once we are comfortable with the thresholds, we can likely also include a way to kill analysis for a file.
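
To make the shape of the monitor concrete, here is a minimal sketch of the idea: a struct guarded by a mutex, with one ticker for shard-level progress and one for the per-document threshold. The type and field names (shardMonitor, currentDoc, and so on) are assumptions for illustration, not the code in this PR; only the log format mirrors the output below.

// Package ctagsmonitor is an illustrative sketch, not the zoekt implementation.
package ctagsmonitor

import (
    "log"
    "sync"
    "time"
)

// shardMonitor tracks per-shard symbol analysis progress and the document
// currently being analysed, so a background goroutine can report on both.
type shardMonitor struct {
    mu           sync.Mutex
    start        time.Time
    totalSymbols int
    totalSize    int

    currentDoc      string
    currentDocSize  int
    currentDocStart time.Time
}

// run logs overall progress every reportInterval and warns whenever the
// current document has been under analysis for longer than stuckThreshold.
func (m *shardMonitor) run(done <-chan struct{}, reportInterval, stuckThreshold time.Duration) {
    statusTicker := time.NewTicker(reportInterval)
    stuckTicker := time.NewTicker(stuckThreshold)
    defer statusTicker.Stop()
    defer stuckTicker.Stop()

    for {
        select {
        case <-done:
            return
        case <-statusTicker.C:
            m.mu.Lock()
            log.Printf("DEBUG: symbol analysis still running for shard statistics: duration=%v symbols=%d bytes=%d",
                time.Since(m.start).Truncate(time.Second), m.totalSymbols, m.totalSize)
            m.mu.Unlock()
        case <-stuckTicker.C:
            m.mu.Lock()
            if m.currentDoc != "" && time.Since(m.currentDocStart) > stuckThreshold {
                log.Printf("WARN: symbol analysis for %s (%d bytes) has been running for %v",
                    m.currentDoc, m.currentDocSize, time.Since(m.currentDocStart).Truncate(time.Second))
            }
            m.mu.Unlock()
        }
    }
}

Keeping the per-document check on its own ticker means a stuck file keeps getting reported even while the shard-level counters stop moving.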

Test Plan: Adjusted monitorReportStatus to 1s, then indexed the sourcegraph repo and inspected the output:

$ go run ./cmd/zoekt-git-index -require_ctags ../sourcegraph/
2023/11/03 16:03:10 attempting to index 14533 total files
2023/11/03 16:03:13 DEBUG: symbol analysis still running for shard statistics: duration=1s symbols=15805 bytes=44288971
2023/11/03 16:03:14 DEBUG: symbol analysis still running for shard statistics: duration=2s symbols=26189 bytes=51564417
2023/11/03 16:03:15 DEBUG: symbol analysis still running for shard statistics: duration=3s symbols=55613 bytes=64748084
2023/11/03 16:03:16 DEBUG: symbol analysis still running for shard statistics: duration=4s symbols=86557 bytes=93771404
2023/11/03 16:03:17 DEBUG: symbol analysis still running for shard statistics: duration=5s symbols=125352 bytes=116319453
2023/11/03 16:03:18 symbol analysis finished for shard statistics: duration=5s symbols=142951 bytes=129180023
2023/11/03 16:03:22 finished shard github.com%2Fsourcegraph%2Fsourcegraph_v16.00000.zoekt: 283983298 index bytes (overhead 2.8), 14533 files processed

I then added a random sleep for a minute in a file to see the stuck reporting:

$ go run ./cmd/zoekt-git-index -require_ctags ../sourcegraph/
2023/11/03 16:14:57 attempting to index 14533 total files
2023/11/03 16:15:15 WARN: symbol analysis for README.md (3485 bytes) has been running for 14s
2023/11/03 16:15:25 WARN: symbol analysis for README.md (3485 bytes) has been running for 24s
2023/11/03 16:15:45 WARN: symbol analysis for README.md (3485 bytes) has been running for 44s
2023/11/03 16:16:00 DEBUG: symbol analysis still running for shard statistics: duration=1m0s symbols=958 bytes=624329
2023/11/03 16:16:00 symbol analysis for README.md (size 3485 bytes) is done and found 4 symbols
2023/11/03 16:16:06 symbol analysis finished for shard statistics: duration=1m5s symbols=142951 bytes=129180023
2023/11/03 16:16:10 finished shard github.com%2Fsourcegraph%2Fsourcegraph_v16.00000.zoekt: 283983299 index bytes (overhead 2.8), 14533 files processed

keegancsmith requested a review from a team on November 3, 2023 at 14:23.
jtibshirani (Member) left a comment:

This is great! It will be super useful for debugging slow reindexes.

I noticed we have a lockedParser struct, which has some conceptual overlap. For example, it uses a timer to enforce a parse timeout of 1 minute. I wonder if it'd make sense to fold some of this logic into that wrapper struct.

case <-statusTicker.C:
    now := time.Now()
    m.mu.Lock()
    log.Printf("DEBUG: symbol analysis still running for shard statistics: duration=%v symbols=%d bytes=%d", now.Sub(m.start).Truncate(time.Second), m.totalSymbols, m.totalSize)
jtibshirani (Member) commented on this code:

I guess we don't have the concept of log level, we just use these informal prefixes?

keegancsmith (Member, Author) replied:

We have our structured logger, but it isn't used in our zoekt-git-index binary. Given this is only consumed by humans (at the moment), I stuck with the builtin logger. But zoekt should unify its logging; I would suggest using the new stdlib structured logger with an adapter for sourcegraph/log.
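
For illustration, with the stdlib structured logger (log/slog) the same information could be emitted with real levels instead of the "DEBUG:"/"WARN:" prefixes. This is only a sketch of that suggestion, not code from this PR, and the attribute names are made up:

package main

import (
    "log/slog"
    "os"
    "time"
)

func main() {
    // Use a handler that emits Debug-level records; the default logger
    // filters them out.
    logger := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug}))

    // Shard-level progress, with values copied from the example output above.
    logger.Debug("symbol analysis still running for shard",
        "shard", "statistics",
        "duration", 1*time.Second,
        "symbols", 15805,
        "bytes", 44288971)

    // Per-document stuck warning.
    logger.Warn("symbol analysis running too long",
        "document", "README.md",
        "bytes", 3485,
        "duration", 14*time.Second)
}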

keegancsmith (Member, Author) commented:

> This is great! It will be super useful for debugging slow reindexes.
>
> I noticed we have a lockedParser struct, which has some conceptual overlap. For example, it uses a timer to enforce a parse timeout of 1 minute. I wonder if it'd make sense to fold some of this logic into that wrapper struct.

Nice, that does indeed make sense to use together somehow. Funnily enough, it looks like I implemented the lockedParser and forgot it exists. What is surprising to me is that it errors out the whole build if it comes across parsing that lasts longer than a minute, and yet we still see what look like stuck builds. Maybe the logic is broken somehow? Either way, the logic here will give overall status updates, which should help us understand.

I think I will leave this as independent for now, since this needs a view of the time taken to parse symbols across a full shard, while the lockedParser seems to be created once per process (which I will file a follow-up issue about).
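
For context, a parse-timeout wrapper in the spirit of the lockedParser discussed above could look roughly like the sketch below. The names (timeoutParser, Symbol) and the behaviour on timeout are assumptions for illustration; this is not the actual zoekt implementation.

package ctagsmonitor

import (
    "fmt"
    "sync"
    "time"
)

// Symbol is a placeholder for whatever the parser returns; illustrative only.
type Symbol struct {
    Name string
}

// timeoutParser serialises access to a single parser and gives up if a parse
// takes longer than timeout.
type timeoutParser struct {
    mu      sync.Mutex
    parse   func(name string, content []byte) ([]Symbol, error)
    timeout time.Duration
}

func (p *timeoutParser) Parse(name string, content []byte) ([]Symbol, error) {
    p.mu.Lock()
    defer p.mu.Unlock()

    type result struct {
        syms []Symbol
        err  error
    }
    ch := make(chan result, 1)
    go func() {
        syms, err := p.parse(name, content)
        ch <- result{syms, err}
    }()

    select {
    case r := <-ch:
        return r.syms, r.err
    case <-time.After(p.timeout):
        // Note: the parsing goroutine keeps running; a real implementation
        // would also need to kill or restart the underlying parser process.
        return nil, fmt.Errorf("parsing %s timed out after %v", name, p.timeout)
    }
}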

keegancsmith (Member, Author) commented:

Here is the issue about lockedParser: https://github.com/sourcegraph/sourcegraph/issues/58112
