zoekt: only one ctags process per build #58112
Comments
We discussed this a bit in our sync yesterday. Given that disabling symbols has often unblocked issues at customers, we are not sure that blanket enabling concurrency is safe here. So if we do add concurrency, we will likely do it carefully. Meta: @jtibshirani I see you removed this from our project board. Is that intentional?
@keegancsmith sorry I missed your comment! I removed it from the board because I'm tracking it under the umbrella issue #58133 and wanted to declutter. I opened a draft change here: sourcegraph/zoekt#702. Although changing this is a bit risky, it feels like the right move to me.
My best understanding is that disabling ctags helped not because ctags itself was causing problems, but because of other memory issues:
How do you feel about it? Are there safeguards or monitoring you'd like to see to make this change less scary? One idea is to introduce an overall parallelism limit of something like 4 to start, so we don't immediately start spawning 64 processes on some customer machines :)
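The parallelism cap suggested above is commonly implemented in Go with a buffered channel acting as a counting semaphore. A minimal sketch of that pattern, with illustrative names (this is not zoekt's actual code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runParses simulates n shards each wanting to run a ctags parse, with at
// most limit parses in flight at once. It returns the peak concurrency
// observed, so the cap can be verified.
func runParses(n, limit int) int64 {
	sem := make(chan struct{}, limit) // buffered channel as counting semaphore
	var inFlight, peak int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			cur := atomic.AddInt64(&inFlight, 1)
			for {
				p := atomic.LoadInt64(&peak)
				if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
					break
				}
			}
			// ... a real implementation would exec a ctags process here ...
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", runParses(64, 4))
}
```

With 64 goroutines contending for 4 slots, the observed peak never exceeds the limit.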
Yeah, I agree, the root cause was more than likely what you identified. I think if we see much better behaviour on megarepo and gigarepo in practice, then we can be confident. It feels good to put some guardrails in, given we will go from 1 process to many and that each process can use up to 100 MB. I think something we may be missing is an easy way to control concurrency. Right now I believe it is a command-line flag, which is hard to adjust. It would be good if we had a nicer knob to tune here, like an environment variable or even a site config setting.
I'm closing this, since we implemented parallel ctags parsing in sourcegraph/zoekt#702. Other relevant changes: |
I came across this while reading the zoekt source code. Currently we create one ctags process per build: we call NewParser once at the creation of a builder, and it is then shared among the shards via lockedParser. From what I have seen in local testing, symbol parsing is our bottleneck, which means we effectively never index concurrently.
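The sharing pattern described above (one parser guarded by a mutex) looks roughly like this sketch; the interface and type names are paraphrased from the description, not copied from zoekt:

```go
package main

import (
	"fmt"
	"sync"
)

// Parser stands in for the ctags parser interface (illustrative names).
type Parser interface {
	Parse(name string, content []byte) ([]string, error)
}

// lockedParser wraps a single Parser so concurrent shard builders take
// turns: with only one underlying ctags process, symbol parsing serializes
// and becomes the bottleneck described above.
type lockedParser struct {
	mu sync.Mutex
	p  Parser
}

func (l *lockedParser) Parse(name string, content []byte) ([]string, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.p.Parse(name, content)
}

// fakeParser returns a fixed symbol, standing in for a ctags subprocess.
type fakeParser struct{}

func (fakeParser) Parse(name string, content []byte) ([]string, error) {
	return []string{"sym:" + name}, nil
}

func main() {
	lp := &lockedParser{p: fakeParser{}}
	syms, _ := lp.Parse("main.go", []byte("package main"))
	fmt.Println(syms)
}
```

Because every shard funnels through the one mutex, adding more indexing workers does not speed up symbol extraction; that is what motivates running multiple ctags processes.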
This means some of our advice about reducing concurrency, given on the assumption that ctags is the problem, is likely incorrect. I think before increasing concurrency here we need to instrument our ctags code so that we can monitor memory usage better. Additionally, we need to make our use of ctags more robust.
I also thought there might be issues with what happens if we hit a timeout. Luckily we correctly clean up and will restart the ctags parser on the next call. However, we always have CTagsMustSucceed set, so in practice we abort indexing at that point.
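The timeout behaviour described above can be sketched as follows. This is an illustrative model, not zoekt's actual code: a wedged process is torn down so the next call restarts it, but when the must-succeed flag is set the error propagates and indexing aborts.

```go
package main

import (
	"errors"
	"fmt"
)

var errTimeout = errors.New("ctags: parse timed out")

// proc models the long-lived ctags subprocess.
type proc struct{ alive bool }

// parseOnce sketches the recovery path: on a timeout the wedged process is
// torn down (so the next call restarts it fresh), and with mustSucceed set
// the error propagates, aborting the build.
func parseOnce(p *proc, mustSucceed, timesOut bool) ([]string, error) {
	if !p.alive {
		p.alive = true // (re)start the ctags process lazily
	}
	if timesOut {
		p.alive = false // clean up so the next call gets a fresh process
		if mustSucceed {
			return nil, errTimeout // must-succeed: abort indexing
		}
		return nil, nil // otherwise: skip symbols for this document
	}
	return []string{"sym"}, nil
}

func main() {
	p := &proc{}
	_, err := parseOnce(p, true, true) // timeout with must-succeed set
	fmt.Println("error:", err, "process alive:", p.alive)
}
```

The restart-on-next-call cleanup is sound on its own; it is the combination with the always-on must-succeed flag that turns a single timeout into an aborted index.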
cc @sourcegraph/search-platform