Commit
matchtree: call prepare on symbolRegexpMatchTree subtree (#685)
This was a huge oversight that has lived in our codebase since we introduced symbolRegexpMatchTree. Because we don't call prepare on its subtree, we don't correctly use the index for symbol regex queries. From some local testing this makes a huge difference to performance. Huge shout-out to @camdencheek who spotted this.

Test Plan: validated with some local searches that results remain the same and that the search statistics go up for IndexBytesLoaded, but go down for ContentBytesLoaded, FilesConsidered, FilesLoaded, etc. Added unit tests which assert the index is used. Also perf tested with hyperfine.

Hyperfine results:

    Benchmark 1: ./zoekt-before -sym '^searcher$'
      Time (mean ± σ):      93.0 ms ±   1.2 ms    [User: 142.2 ms, System: 18.9 ms]
      Range (min … max):    90.8 ms …  95.6 ms    31 runs

    Benchmark 2: ./zoekt-after -sym '^searcher$'
      Time (mean ± σ):      52.3 ms ±   0.5 ms    [User: 76.3 ms, System: 13.0 ms]
      Range (min … max):    50.7 ms …  53.4 ms    53 runs

    Summary
      './zoekt-after -sym '^searcher$'' ran
        1.78 ± 0.03 times faster than './zoekt-before -sym '^searcher$''

For that search, a random comparison of the zoekt stats:

| Stat                  | Before    | After    | Delta      |
|-----------------------|-----------|----------|------------|
| ContentBytesLoaded    | 199007382 | 22566033 | -176441349 |
| IndexBytesLoaded      | 3527      | 165645   | 162118     |
| Crashes               | 0         | 0        | 0          |
| Duration              | 57956167  | 17568708 | -40387459  |
| FileCount             | 28        | 28       | 0          |
| ShardFilesConsidered  | 0         | 0        | 0          |
| FilesConsidered       | 28477     | 766      | -27711     |
| FilesLoaded           | 28477     | 766      | -27711     |
| FilesSkipped          | 0         | 0        | 0          |
| ShardsScanned         | 5         | 5        | 0          |
| ShardsSkipped         | 0         | 0        | 0          |
| ShardsSkippedFilter   | 0         | 0        | 0          |
| MatchCount            | 29        | 29       | 0          |
| NgramMatches          | 87        | 4407     | 4320       |
| NgramLookups          | 644       | 644      | 0          |
| Wait                  | 5792      | 11500    | 5708       |
| MatchTreeConstruction | 498042    | 515248   | 17206      |
| MatchTreeSearch       | 97661875  | 23089418 | -74572457  |

Analysis: An absolutely massive reduction in the number of files we consider. This means we are actually using the index properly. For example, look at ContentBytesLoaded, Duration, FilesConsidered and FilesLoaded. You can also see that IndexBytesLoaded has gone up since we now use the index properly. This was on a small corpus, so it will have a huge impact in production. Note that the changes in Wait and MatchTreeConstruction are just noise, but the MatchTreeSearch change is a big deal since that is the time spent searching after analysing a query.