Indexing: use one ctags process per shard #702
// See the License for the specific language governing permissions and
// limitations under the License.

package ctags
I made sure to separate the rename from `parser_map.go` -> `parser_factory.go` into its own commit, to make it easier to see what changed.
continue
}
parsers[parserKind] = parser
defer parser.Close()
This is a bit messy. In a follow-up I plan to:
* Clean up `lockedParser`, removing unneeded synchronization and simplifying timeout logic
* Abstract out this multiplexing logic into a parser wrapper, so it delegates to the right sub-parser process
2cbe5ec to 5d480d8
LGTM
In sourcegraph/zoekt#702, we updated indexserver to parse symbols in parallel by spawning a new ctags process per shard. By default, indexing uses all available CPUs to create shards in parallel, so now it will create many more processes than before. As a safeguard, we're exposing a site config setting to reduce the indexing concurrency. It's not intended to be set by users, but will let us experiment and make sure the defaults are solid. As part of this change, I bumped the Zoekt dependency to pull in the change to IndexOptions.
This change cleans up the Go ctags parser wrapper as a follow-up to #702. Specific changes:
* Remove synchronization in `lockedParser` and rename it to `CTagsParser`
* Push delegation to universal vs. SCIP ctags into parser wrapper
* Simplify document timeout logic
* Rename some files
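The delegation idea in the list above can be illustrated with a small wrapper that holds one sub-parser per kind and forwards each document to the right one. This is only a sketch under stated assumptions: the `parser` interface, `stubParser`, `delegatingParser`, and the extension-based `kindByExtension` rule are hypothetical names made up for illustration and are not taken from the Zoekt code.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// parserKind distinguishes the two ctags flavours mentioned in the PR.
type parserKind int

const (
	universalCTags parserKind = iota
	scipCTags
)

// parser is a hypothetical interface standing in for a ctags process wrapper.
type parser interface {
	Parse(name string, content []byte) ([]string, error)
	Close()
}

// stubParser fakes a ctags process so the sketch runs without any binary.
type stubParser struct {
	label  string
	closed bool
}

func (p *stubParser) Parse(name string, _ []byte) ([]string, error) {
	return []string{p.label + ":" + name}, nil
}

func (p *stubParser) Close() { p.closed = true }

// delegatingParser forwards each document to the sub-parser chosen by kindFor.
type delegatingParser struct {
	parsers map[parserKind]parser
	kindFor func(name string) parserKind
}

func (d *delegatingParser) Parse(name string, content []byte) ([]string, error) {
	p, ok := d.parsers[d.kindFor(name)]
	if !ok {
		return nil, fmt.Errorf("no parser available for %q", name)
	}
	return p.Parse(name, content)
}

// Close shuts down every sub-parser owned by the wrapper.
func (d *delegatingParser) Close() {
	for _, p := range d.parsers {
		p.Close()
	}
}

// kindByExtension is an assumed routing rule used only for this example.
func kindByExtension(name string) parserKind {
	if filepath.Ext(name) == ".go" {
		return scipCTags
	}
	return universalCTags
}

func main() {
	d := &delegatingParser{
		parsers: map[parserKind]parser{
			universalCTags: &stubParser{label: "universal"},
			scipCTags:      &stubParser{label: "scip"},
		},
		kindFor: kindByExtension,
	}
	defer d.Close()

	for _, name := range []string{"main.go", "util.c"} {
		symbols, err := d.Parse(name, nil)
		fmt.Println(symbols, err)
	}
}
```

With a wrapper like this, callers hold a single parser value and no longer need to keep a map of sub-parsers or close each one themselves.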
Currently, we use a single ctags process for indexing an entire repository.
Even though we build shards in parallel, they all share the same
single-threaded ctags process. Since ctags is one of the most expensive parts of
shard building, this creates a bottleneck that can really slow down indexing.
This change proposes to launch a new ctags process per shard. For
`sgtest/megarepo`, this speeds up indexing by almost 2x (with scip-ctags enabled and
`-parallelism=4`).

Addresses https://github.com/sourcegraph/sourcegraph/issues/58112
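The per-shard approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `symbolParser`, `fakeParser`, and `buildShards` are hypothetical names, and the fake parser stands in for a real ctags process. The point it shows is that each shard builder creates and closes its own parser, so parallel builders never wait on a shared one.

```go
package main

import (
	"fmt"
	"sync"
)

// symbolParser is a hypothetical stand-in for a wrapper around one ctags process.
type symbolParser interface {
	Parse(name string, content []byte) ([]string, error)
	Close()
}

// fakeParser pretends to be a single-threaded ctags process.
type fakeParser struct{ id int }

func (p *fakeParser) Parse(name string, _ []byte) ([]string, error) {
	return []string{fmt.Sprintf("parser%d:%s", p.id, name)}, nil
}

func (p *fakeParser) Close() {}

// buildShards processes shards concurrently, at most `parallelism` at a time.
// Each shard gets its own parser, created and closed inside its goroutine.
func buildShards(shards [][]string, parallelism int, newParser func(id int) symbolParser) [][]string {
	if parallelism < 1 {
		parallelism = 1
	}
	results := make([][]string, len(shards))
	sem := make(chan struct{}, parallelism) // bounds the number of live parsers
	var wg sync.WaitGroup

	for i, docs := range shards {
		wg.Add(1)
		go func(i int, docs []string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()

			parser := newParser(i) // one parser (process) per shard
			defer parser.Close()

			for _, doc := range docs {
				symbols, err := parser.Parse(doc, nil)
				if err != nil {
					continue // skip documents that fail to parse
				}
				results[i] = append(results[i], symbols...)
			}
		}(i, docs)
	}
	wg.Wait()
	return results
}

func main() {
	shards := [][]string{{"a.go", "b.go"}, {"c.go"}}
	out := buildShards(shards, 4, func(id int) symbolParser { return &fakeParser{id: id} })
	fmt.Println(out)
}
```

The trade-off is the one noted in the follow-up change: the number of live parser processes now scales with the indexing parallelism, which is why a setting to lower that concurrency is useful as a safeguard.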