Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ranking: standardize ctags kind names before scoring #674

Merged
merged 3 commits into from
Oct 27, 2023

Conversation

jtibshirani
Copy link
Member

@jtibshirani jtibshirani commented Oct 25, 2023

SCIP ctags can output different kind names than universal-ctags (for example
typeAlias instead of talias). This change makes sure we handle different
names for the same kind. To do so, it refactors the logic so we first match
strings to standard kinds, then decide how these are scored for each language.
That way, you don't need to remember to cover all the possible kind names each
time you adjust scoring for a new language.

Also added basic tests for Ruby and Python to ensure we don't accidentally
change the scoring.

Relates to https://github.com/sourcegraph/sourcegraph/issues/57659

type SymbolKind uint8

const (
Accessor SymbolKind = iota
Copy link
Member Author

@jtibshirani jtibshirani Oct 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on this PR got me thinking ... it'd be nice to just use SCIP as the exchange format instead of ctags output, and have a tool to map universal-ctags onto SCIP. That would get us closer to an actual spec, unlike the ctags output which has inconsistent naming and an unknown universe of values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree with this. I think this was discussed when scip-ctags was added and that was seen as an part of the end goal.

@@ -815,6 +815,16 @@ func TestScoring(t *testing.T) {
t.Fatal(err)
}

examplePython, err := os.ReadFile("./testdata/example.py")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a follow-up, I'll try to pull in SCIP ctags so we can run the same tests using that binary.

@jtibshirani jtibshirani marked this pull request as ready for review October 25, 2023 21:38
Copy link
Member

@keegancsmith keegancsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

type SymbolKind uint8

const (
Accessor SymbolKind = iota
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% agree with this. I think this was discussed when scip-ctags was added and that was seen as an part of the end goal.

@jtibshirani jtibshirani merged commit c23ed05 into main Oct 27, 2023
8 checks passed
@jtibshirani jtibshirani deleted the jtibs/score-kind branch October 27, 2023 15:37
keegancsmith pushed a commit that referenced this pull request Nov 1, 2023
SCIP ctags can output different kind names than universal-ctags (for example
`typeAlias` instead of `talias`). This change makes sure we handle different
names for the same kind. To do so, it refactors the logic so we first match
strings to standard kinds, then decide how these are scored for each language.
That way, you don't need to remember to cover all the possible kind names each
time you adjust scoring for a new language.

Also added basic tests for Ruby and Python to ensure we don't accidentally
change the scoring.
jtibshirani added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jul 31, 2024
To implement `select:symbol.enum` filters, we look at each symbol's
ctags 'kind' and check if it matches the filter value `enum`. We
accidentally didn't include 'enum' in this match logic, so all these
symbols were filtered away.

This PR fixes that, and adds a few improvements:
* Use a shared map between `symbol.LSPKind` and `symbol.SelectKind`, to
avoid drift between these two conversions.
* Audit the ctags mapping from
[sourcegraph/zoekt#674](sourcegraph/zoekt#674)
and add other missing kinds (besides enum)

Closes SPLF-178
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants