
zoekt: only one ctags process per build #58112

Closed
Tracked by #58133
keegancsmith opened this issue Nov 6, 2023 · 4 comments

@keegancsmith
Member

I came across this while reading the zoekt source code. Currently we create only one ctags process per build: we call NewParser once at the creation of a builder, and it is then shared amongst the shards via lockedParser. From what I have seen in local testing, symbol parsing is our bottleneck, which means we effectively never index concurrently.
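To make the sharing concrete, here is a minimal standalone sketch of the pattern in Go (illustrative names, not zoekt's actual types):

```go
package ctags

import "sync"

// Symbol is a parsed tag; fields are illustrative.
type Symbol struct {
	Name string
	Line int
}

// Parser abstracts a long-lived ctags subprocess.
type Parser interface {
	Parse(name string, content []byte) ([]Symbol, error)
}

// lockedParser wraps the single Parser created by NewParser and serializes
// access to it. Every shard builder shares this one instance, so even with
// concurrent shard building only one document is parsed at a time.
type lockedParser struct {
	mu sync.Mutex
	p  Parser
}

func (l *lockedParser) Parse(name string, content []byte) ([]Symbol, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.p.Parse(name, content)
}
```

The mutex is the point: if symbol parsing dominates build time, it becomes the effective concurrency limit for the whole build.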

This means some of our advice about reducing concurrency, based on the expectation that ctags is the problem, is likely incorrect. I think before increasing concurrency here we likely need to instrument our ctags code so that we can monitor memory usage better. Additionally, we need to make our use of ctags more robust.
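As a strawman for that instrumentation, something like the following would let us watch process count and parse latency (hypothetical metric names using the Prometheus Go client, not existing zoekt metrics):

```go
package ctags

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical metrics sketching the instrumentation idea above; these
// are not existing zoekt metric names.
var (
	ctagsRunning = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "ctags_processes_running",
		Help: "Number of live ctags subprocesses.",
	})
	ctagsParseDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "ctags_parse_duration_seconds",
		Help: "Wall time to parse a single document with ctags.",
	})
)

func init() {
	prometheus.MustRegister(ctagsRunning, ctagsParseDuration)
}
```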

Additionally, I wondered what happens if we hit a timeout. Luckily we clean up correctly and will restart the ctags parser on the next call. However, we always have CTagsMustSucceed set, so in practice we abort indexing at that point.
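Roughly, that behaviour looks like this (a standalone sketch with made-up names, not the actual implementation):

```go
package ctags

import (
	"fmt"
	"os/exec"
)

// Symbol is a parsed tag, as in the earlier sketch.
type Symbol struct {
	Name string
	Line int
}

// process wraps a ctags subprocess; field and method names are made up
// for this sketch.
type process struct {
	cmd         *exec.Cmd
	mustSucceed bool // mirrors CTagsMustSucceed
}

func (p *process) start() error {
	cmd := exec.Command("universal-ctags", "--_interactive")
	if err := cmd.Start(); err != nil {
		return err
	}
	p.cmd = cmd
	return nil
}

func (p *process) close() {
	if p.cmd != nil {
		p.cmd.Process.Kill()
		p.cmd.Wait()
	}
	p.cmd = nil
}

// request would speak ctags' interactive JSON protocol; stubbed for brevity.
func (p *process) request(name string, content []byte) ([]Symbol, error) {
	return nil, fmt.Errorf("not implemented in this sketch")
}

// Parse lazily (re)starts the subprocess, so a failure or timeout on one
// call only costs a restart on the next. With mustSucceed set, however,
// the error propagates and the build aborts anyway.
func (p *process) Parse(name string, content []byte) ([]Symbol, error) {
	if p.cmd == nil {
		if err := p.start(); err != nil {
			return nil, fmt.Errorf("restarting ctags: %w", err)
		}
	}
	syms, err := p.request(name, content)
	if err != nil {
		p.close() // clean up; the next call starts a fresh process
		if p.mustSucceed {
			return nil, err // CTagsMustSucceed: abort indexing here
		}
		return nil, nil // otherwise index this document without symbols
	}
	return syms, nil
}
```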

cc @sourcegraph/search-platform

@keegancsmith
Member Author

We discussed this a bit in our sync yesterday. Given that disabling symbols has often unblocked issues at customers, we are not sure that blanket-enabling concurrency is safe here. So if we do add concurrency here, we will likely do it carefully.

Meta: @jtibshirani I see you removed this from our project board. Is that intentional?

@jtibshirani
Member

@keegancsmith sorry I missed your comment! I removed it from the board because I'm tracking it under the umbrella issue #58133 and wanted to declutter.

I opened a draft change here: sourcegraph/zoekt#702. Although it's a bit risky changing this, it feels like the right move to me.

  • CPU usage: parallelism is bounded by the available CPUs. With the change, we can have at most 2 * parallelism ctags processes (one for universal-ctags, one for scip-ctags), but only 1 * parallelism of them actively processing a document at a time; see the semaphore sketch after this list. As of #699, we respect this limit even when index_concurrency is set (as for dot com).
  • Memory: from profiling on megarepo, each process takes 20-100 MB of memory. Client-side, parsing and converting tags is also not a big memory consumer.
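For illustration, here is the semaphore sketch mentioned above: a standalone approximation of how active parses could be bounded (assumed names, not the actual change in sourcegraph/zoekt#702):

```go
package ctags

// Symbol and Parser as in the first sketch.
type Symbol struct {
	Name string
	Line int
}

type Parser interface {
	Parse(name string, content []byte) ([]Symbol, error)
}

// boundedParser lets up to 2 * parallelism processes exist (one
// universal-ctags and one scip-ctags per worker) while a shared semaphore
// caps how many are actively parsing a document at once.
type boundedParser struct {
	sem chan struct{} // capacity == parallelism
	p   Parser
}

func newBoundedParser(p Parser, parallelism int) *boundedParser {
	return &boundedParser{sem: make(chan struct{}, parallelism), p: p}
}

func (b *boundedParser) Parse(name string, content []byte) ([]Symbol, error) {
	b.sem <- struct{}{}        // block until an active slot frees up
	defer func() { <-b.sem }() // release the slot when parsing finishes
	return b.p.Parse(name, content)
}
```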

I think before increasing concurrency here we likely need to instrument our ctags code so that we can monitor memory usage better. Additionally, we need to make our use of ctags more robust.

We discussed this a bit in our sync yesterday. Given that disabling symbols has often unblocked issues at customers, we are not sure that blanket-enabling concurrency is safe here. So if we do add concurrency here, we will likely do it carefully.

My best understanding is that disabling ctags helped not because ctags itself was causing problems, but because of other memory issues:

  • We accidentally passed binary/non-source docs that should have been skipped to ctags (fixed in #686)
  • When shard building takes a long time, we can buffer up way too many docs in memory and OOM (now fixed in #689)

How do you feel about it? Are there safeguards or monitoring you'd like to see to make this change less scary? One idea is to introduce an overall parallelism limit of something like 4 to start, so we don't immediately start spawning 64 processes, etc. on some customer machines :)

@keegancsmith
Member Author

How do you feel about it? Are there safeguards or monitoring you'd like to see to make this change less scary? One idea is to introduce an overall parallelism limit of something like 4 to start, so we don't immediately start spawning 64 processes, etc. on some customer machines :)

Yeah I agree, the root cause was more than likely what you identified. I think if we see much better behaviour on megarepo and gigarepo in practice, then we can be confident. It feels good to put some guardrails in, given we will go from 1 process to many and each process can use up to 100 MB.

I think something we may be missing is an easy way to control concurrency. Right now I believe it is a command-line flag, which is hard to adjust. It would be good to have a nicer knob to tune here, like an env var or even a site config setting.
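For example, a hypothetical CTAGS_PARALLELISM environment variable could feed the flag's default, making it tunable without editing process arguments (a sketch; this variable does not exist today):

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// ctagsParallelismDefault lets a hypothetical CTAGS_PARALLELISM env var
// override the flag's built-in default of 4 (the guardrail floated above).
func ctagsParallelismDefault() int {
	if v := os.Getenv("CTAGS_PARALLELISM"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return 4
}

func main() {
	parallelism := flag.Int("ctags_parallelism", ctagsParallelismDefault(),
		"maximum number of concurrently active ctags parses")
	flag.Parse()
	fmt.Println("ctags parallelism:", *parallelism)
}
```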

@jtibshirani
Member

jtibshirani commented Dec 5, 2023
