☂️ Search: improve Zoekt indexing #58133
Comments
meta: Nice. Seeing this makes me also want to start using tracking issues for sprints of work. |
Here are some profiling results for CPU
Takeaways:
Memory allocations
Takeaways:
Peak memory usage |
Here's a profile of memory usage after fixing some obvious issues (taken right after we finished building the 10th shard out of ~20).
Peak memory usage
Takeaways:
|
If you want to experiment with removing go-git, or at least avoiding it for the heavy lifting, you can see a few experiments I did in sourcegraph/zoekt#424. That was me, a while ago, experimenting with ideas for getting data off gitserver more efficiently for searching/indexing. |
Documenting the results of profiling universal-ctags versus scip-ctags on
Peak memory usage
Processing time
Takeaway: currently, the main benefit of scip-ctags is its superior symbol quality, not its resource usage. |
There is definitely more we can do here, but I'm closing this out as a "completed" round of work. Highlights:
Will file follow-up issues about better observability in case of OOMs and about trying GOMEMLIMIT. |
Here's a rough formula for calculating the peak memory usage of Zoekt indexserver:
Total: ~400MB * (num_threads) + 2GB |
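To make the rough formula above easier to plug numbers into, here is a minimal sketch in Go. `estimatePeakMemoryMB` is a hypothetical helper, not something that exists in Zoekt, and reading "2GB" as 2048MB is an assumption. Treat the output as a ballpark for sizing, not a guarantee.

```go
package main

import "fmt"

// Constants taken from the rough formula above. Reading "2GB" as 2048MB is an
// assumption; the formula does not say which unit convention it uses.
const (
	perThreadMB = 400  // approximate memory per indexing thread
	baselineMB  = 2048 // approximate fixed overhead (~2GB)
)

// estimatePeakMemoryMB is a hypothetical helper (not part of Zoekt) that
// applies the rough formula: ~400MB * num_threads + 2GB.
// Thread counts below 1 are treated as 1.
func estimatePeakMemoryMB(numThreads int) int {
	if numThreads < 1 {
		numThreads = 1
	}
	return perThreadMB*numThreads + baselineMB
}

func main() {
	// Print the estimate for a few example thread counts.
	for _, n := range []int{1, 4, 8} {
		fmt.Printf("%d threads: ~%d MB\n", n, estimatePeakMemoryMB(n))
	}
}
```

For example, with 4 threads this gives 400 * 4 + 2048 = 3648 MB, so roughly 3.6GB of headroom would be a reasonable starting point under these assumptions.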
Zoekt can sometimes fail to index large repos because of timeouts or memory issues. This can result in missing or out-of-date search results. There’s also little visibility into the indexing process: we don't report progress or surface errors clearly, and we don't have good observability tools for debugging problems. This issue tracks a round of improvements we want to make to search indexing.
Indexing performance
Indexing observability
Squash bugs
/cc @sourcegraph/search-platform