You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, gazelle fix-update runs with a configurable option of an --index. Certain extensions can use the index to properly resolve imports in a repo. The ruleIndex is essentially a large structure containing all of the rules that have an associated Resolver.
If the index is enabled, the rule index must load and store information from the entire repository. If disabled, gazelle can rely (in most cases) on the # gazelle:prefix directive or -go_prefix flag to resolve dependencies.
There are two reasons this does not scale:
With index=true every gazelle run must rely on O(size of repo) rather than O(size of changes) to run because it must have a fully-populated index.
WIth index=false any exceptions to the go_prefix or # gazelle:prefix will require manual # gazelle:resolve statements added, which almost removes the purpose of automatic build file generation. Also, not all plugins or gazelle language extensions might work without an index.
Running gazelle in a repository should almost always be doable by running on just the files changed since a recent run. If a user is only modifying a small project, their gazelle invocation should only use and care about that part of the repository. Otherwise, the incrementally of gazelle is not there.
Gazelle Convention Checker
At Uber, we've solved this problem using a gazelle ConventionChecker interface, which is patched onto our gazelle binary. This convention checker works in the following manner.
A language extension can implement a Convention interface.
// Convention should be implemented by langauge extensions in order to
// register language specific convention logic with the convention checker.
type Convention interface {
// CheckConvention returns whether or not the rule information follows
// a known convention.
CheckConvention(c *config.Config, kind, imp, name, rel string) bool
}
If an import path follows the conventions expected by the Resolver of the interface, then the convention checker will return true. Otherwise, the ConventionChecker will add to list of rules which violate the convention.
Then, on Finish(, the ConventionChecker will write any rules that violate the convention as # gazelle:resolve directives in the top-level BUILD.bazel file. The "automatic resolves" are added to the top-level file to speed up lookup, and are added in a special location to avoid confusion with other parts of this file. They are added under a header "### AUTOMATIC RESOLVES ###" at the bottom of the file.
Optionally, these could be added somewhere else from some other file passed at runtime, but since they are technically part of the build configuration, using the BUILD.bazel file makes the most sense.
Why do this?
This has two clear benefits:
All gazelle runs are incremental, and do not need to have an index. Only the convention violations need to be loaded, and there is no manual intervention required for rules that don't follow convention.
The bottleneck of gazelle performance will either be the size of the change, or the number of # gazelle:resolve statements for convention violations. There can be 10000s of convention violations, but they will still be loaded much faster than walking 10000s of directories to load an index. Once they are loaded once, they exist in memory in a O(1) lookup map, so they can be easily retrieved.
The convention checker can also be enabled/disabled with a flag, so this optimization is optional for all users, and can be avoided if desired.
The text was updated successfully, but these errors were encountered:
This was mentioned in my proposal for lazy indexing (#1891) as an alternative method. I think that this conventions system offers a better way to get faster incremental results, as it allows repos to more easily implement their own custom plugins to form their own conventions. The "automatic resolves" act as a versioned partial index, which solves the cache consistency issue that #1181 faces. Nice idea here!
I have a few questions:
How do you keep the automatic resolves up-to-date? I'm guessing that a CI job runs gazelle with --use_conventions and asserts no changes are made to the index?
It seems like this would work well, as any change to a specific directory would assert that the automatic resolves for that directory are updated in the root-level "automatic resolves" section. Have there been any issues with merge conflicts?
Can you provide some example descriptions of conventions you've implemented and found use for? The Convention interface is somewhat abstract from what an actual import->label convention is, so having an example or two would help.
Specifically, I'm curious about what conventions you use for python. The problem of mapping import statements to target labels without full indexing seems like it would require enforcing a strict style for import statements or considering all python targets as not adhering to any conventions. The latter essentially would make the conventions index have every python target in it.
See implementation: #1900
Why Gazelle's index doesn't scale
Currently,
gazelle fix-update
runs with a configurable option of an--index
. Certain extensions can use the index to properly resolve imports in a repo. TheruleIndex
is essentially a large structure containing all of the rules that have an associated Resolver.If the index is enabled, the rule index must load and store information from the entire repository. If disabled,
gazelle
can rely (in most cases) on the # gazelle:prefix directive or -go_prefix flag to resolve dependencies.There are two reasons this does not scale:
index=true
every gazelle run must rely onO(size of repo)
rather thanO(size of changes)
to run because it must have a fully-populated index.index=false
any exceptions to thego_prefix
or# gazelle:prefix
will require manual# gazelle:resolve
statements added, which almost removes the purpose of automatic build file generation. Also, not all plugins or gazelle language extensions might work without an index.Running
gazelle
in a repository should almost always be doable by running on just the files changed since a recent run. If a user is only modifying a small project, their gazelle invocation should only use and care about that part of the repository. Otherwise, the incrementally ofgazelle
is not there.Gazelle Convention Checker
At Uber, we've solved this problem using a gazelle
ConventionChecker
interface, which is patched onto ourgazelle
binary. This convention checker works in the following manner.A language extension can implement a
Convention
interface.If an import path follows the conventions expected by the
Resolver
of the interface, then the convention checker will returntrue
. Otherwise, theConventionChecker
will add to list of rules which violate the convention.Then, on
Finish(
, theConventionChecker
will write any rules that violate the convention as# gazelle:resolve
directives in the top-level BUILD.bazel file. The "automatic resolves" are added to the top-level file to speed up lookup, and are added in a special location to avoid confusion with other parts of this file. They are added under a header"### AUTOMATIC RESOLVES ###"
at the bottom of the file.Optionally, these could be added somewhere else from some other file passed at runtime, but since they are technically part of the build configuration, using the BUILD.bazel file makes the most sense.
Why do this?
This has two clear benefits:
# gazelle:resolve
statements for convention violations. There can be 10000s of convention violations, but they will still be loaded much faster than walking 10000s of directories to load an index. Once they are loaded once, they exist in memory in a O(1) lookup map, so they can be easily retrieved.The convention checker can also be enabled/disabled with a flag, so this optimization is optional for all users, and can be avoided if desired.
The text was updated successfully, but these errors were encountered: