Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing: use one ctags process per shard #702

Merged
merged 5 commits into from
Nov 22, 2023
Merged

Conversation

jtibshirani
Copy link
Member

@jtibshirani jtibshirani commented Nov 20, 2023

Currently, we use a single ctags process for indexing an entire repository.
Even though we build shards in parallel, they all share the same (single
threaded) ctags process. Since ctags is one of the most expensive parts of
shard building, this creates a bottleneck that can really slow down indexing.

This change proposes to launch a new ctags process per shard. For
sgtest/megarepo, this speeds up indexing by almost 2x (enabling scip-ctags
and setting -parallelism=4):

  • Before: took 4 min 48 sec to index repo
  • After: took 2 min 30 sec to index repo

Addresses https://github.com/sourcegraph/sourcegraph/issues/58112

// See the License for the specific language governing permissions and
// limitations under the License.

package ctags
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made sure to separate the rename from parser_map.go -> parser_factory.go into its own commit, to make it easier to see what changed.

continue
}
parsers[parserKind] = parser
defer parser.Close()
Copy link
Member Author

@jtibshirani jtibshirani Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit messy. In a follow up I plan to:

  • Clean up lockedParser, removing unneeded synchronization and simplifying timeout logic
  • Abstract out this multiplexing logic into a parser wrapper, so it delegates to the right sub-parser process

@jtibshirani jtibshirani marked this pull request as ready for review November 21, 2023 22:41
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jtibshirani jtibshirani merged commit 5c51794 into main Nov 22, 2023
8 checks passed
@jtibshirani jtibshirani deleted the jtibs/parallel-ctags branch November 22, 2023 22:15
jtibshirani added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Nov 27, 2023
In sourcegraph/zoekt#702, we updated indexserver to parse symbols in parallel
by spawning a new ctags process per shard. By default, indexing uses all
available CPUs to create shards in parallel, so now it will create many more
processes than before.

As a safeguard, we're exposing a site config setting to reduce the indexing
concurrency. It's not intended to be set by users, but will let us experiment
and make sure the defaults are solid.

As part of this change, I bumped the Zoekt dependency to pull in the change to
IndexOptions.
jtibshirani added a commit that referenced this pull request Dec 6, 2023
This change cleans up the Go ctags parser wrapper as a follow-up to #702. Specific changes:
* Remove synchronization in `lockedParser` and rename it to `CTagsParser` 
* Push delegation to universal vs. SCIP ctags into parser wrapper
* Simplify document timeout logic
* Rename some files
vovakulikov pushed a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Dec 12, 2023
In sourcegraph/zoekt#702, we updated indexserver to parse symbols in parallel
by spawning a new ctags process per shard. By default, indexing uses all
available CPUs to create shards in parallel, so now it will create many more
processes than before.

As a safeguard, we're exposing a site config setting to reduce the indexing
concurrency. It's not intended to be set by users, but will let us experiment
and make sure the defaults are solid.

As part of this change, I bumped the Zoekt dependency to pull in the change to
IndexOptions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants