Ranking: standardize ctags kind names before scoring #674

jtibshirani · 2023-10-25T21:16:17Z

SCIP ctags can output different kind names than universal-ctags (for example
typeAlias instead of talias). This change makes sure we handle different
names for the same kind. To do so, it refactors the logic so we first match
strings to standard kinds, then decide how these are scored for each language.
That way, you don't need to remember to cover all the possible kind names each
time you adjust scoring for a new language.
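
As a rough sketch of the two-step approach described above (not the code in this PR): raw kind strings are first normalized to a standard kind, and only then does a per-language function decide the score. The names `normalizeKind` and `scoreKind`, the reduced set of kinds, the alias lists, and the score values are all hypothetical placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// SymbolKind is a standardized symbol kind, independent of which ctags
// implementation produced the raw kind string.
// NOTE: this is a reduced, hypothetical set of kinds for illustration.
type SymbolKind uint8

const (
	Other SymbolKind = iota // fallback for unrecognized kind names
	Class
	Enum
	EnumMember
	Function
	Method
	TypeAlias
)

// normalizeKind (hypothetical name) maps a raw kind name onto a standard
// kind. The alias lists are illustrative and incomplete.
func normalizeKind(raw string) SymbolKind {
	switch strings.ToLower(raw) {
	case "class":
		return Class
	case "enum":
		return Enum
	case "enumerator", "enummember":
		return EnumMember
	case "function", "func":
		return Function
	case "method":
		return Method
	case "typealias", "talias":
		return TypeAlias
	default:
		return Other
	}
}

// scoreKind (hypothetical name) decides the score per language using only
// standard kinds, so it never has to enumerate raw name variants.
// The weights are placeholders, not real ranking values.
func scoreKind(language string, kind SymbolKind) float64 {
	switch language {
	case "python":
		switch kind {
		case Class:
			return 10
		case Function, Method:
			return 8
		}
	case "ruby":
		switch kind {
		case Class:
			return 10
		case Method:
			return 9
		}
	}
	return 1 // default score for everything else
}

func main() {
	// Both spellings normalize to the same standard kind.
	for _, raw := range []string{"typeAlias", "talias"} {
		fmt.Println(raw, normalizeKind(raw) == TypeAlias)
	}
	fmt.Println(scoreKind("python", normalizeKind("function")))
}
```

With this split, supporting a new spelling only touches the normalization step, and adjusting scoring for a language only touches the scoring step.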

Also added basic tests for Ruby and Python to ensure we don't accidentally
change the scoring.

Relates to https://github.com/sourcegraph/sourcegraph/issues/57659

jtibshirani · 2023-10-25T21:28:57Z

ctags/symbol_kind.go

+type SymbolKind uint8
+
+const (
+	Accessor SymbolKind = iota


Working on this PR got me thinking ... it'd be nice to just use SCIP as the exchange format instead of ctags output, and have a tool to map universal-ctags onto SCIP. That would get us closer to an actual spec, unlike the ctags output which has inconsistent naming and an unknown universe of values.

jtibshirani · 2023-10-25T21:33:22Z

build/e2e_test.go

@@ -815,6 +815,16 @@ func TestScoring(t *testing.T) {
 		t.Fatal(err)
 	}

+	examplePython, err := os.ReadFile("./testdata/example.py")


In a follow-up, I'll try to pull in SCIP ctags so we can run the same tests using that binary.

keegancsmith

nice

keegancsmith · 2023-10-27T13:44:32Z

ctags/symbol_kind.go

+type SymbolKind uint8
+
+const (
+	Accessor SymbolKind = iota


100% agree with this. I think this was discussed when scip-ctags was added, and it was seen as a part of the end goal.

To implement `select:symbol.enum` filters, we look at each symbol's ctags 'kind' and check if it matches the filter value `enum`. We accidentally didn't include 'enum' in this match logic, so all these symbols were filtered away. This PR fixes that, and adds a few improvements:

* Use a shared map between `symbol.LSPKind` and `symbol.SelectKind`, to avoid drift between these two conversions.
* Audit the ctags mapping from sourcegraph/zoekt#674 and add other missing kinds (besides enum)

Closes SPLF-178
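
The shared-map idea mentioned above could look roughly like the following sketch, where a single table drives both conversions so that adding a kind in one place updates both. The names `kindInfo`, `kindTable`, `toLSPKind`, and `matchesSelect` are hypothetical and are not the actual `symbol.LSPKind` / `symbol.SelectKind` implementations; the numeric values follow the LSP `SymbolKind` enumeration.

```go
package main

import "fmt"

// kindInfo (hypothetical) holds both representations of one symbol kind.
type kindInfo struct {
	lspKind    int    // numeric LSP SymbolKind value
	selectKind string // value used in a select:symbol.<kind> filter
}

// kindTable is the single source of truth, keyed by ctags kind name.
// Only a few entries are shown; this is not a complete mapping.
var kindTable = map[string]kindInfo{
	"class":    {lspKind: 5, selectKind: "class"},
	"enum":     {lspKind: 10, selectKind: "enum"},
	"function": {lspKind: 12, selectKind: "function"},
}

// toLSPKind looks up the LSP kind for a ctags kind name.
func toLSPKind(ctagsKind string) (int, bool) {
	info, ok := kindTable[ctagsKind]
	return info.lspKind, ok
}

// matchesSelect reports whether a symbol with the given ctags kind
// passes a select:symbol.<filter> filter.
func matchesSelect(ctagsKind, filter string) bool {
	info, ok := kindTable[ctagsKind]
	return ok && info.selectKind == filter
}

func main() {
	fmt.Println(matchesSelect("enum", "enum"))
	kind, ok := toLSPKind("enum")
	fmt.Println(kind, ok)
}
```

Because both lookups read the same table, a kind that is missing from one conversion is necessarily missing from the other, which makes the gap easier to spot.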

Ranking: standardize ctags kind names before scoring

3d052cb

jtibshirani marked this pull request as ready for review October 25, 2023 21:38

jtibshirani requested review from keegancsmith and stefanhengl October 26, 2023 15:00

jtibshirani added 2 commits October 26, 2023 08:08

Capture 'enumerator' as EnumMember, which is common in universal-ctags

7052188

Merge remote-tracking branch 'upstream/main' into jtibs/score-kind

afd8dca

jtibshirani mentioned this pull request Oct 26, 2023

scip-ctags: returns different kinds to universal-ctags sourcegraph/sourcegraph-public-snapshot#57659

Closed

keegancsmith approved these changes Oct 27, 2023

jtibshirani merged commit c23ed05 into main Oct 27, 2023
8 checks passed

jtibshirani deleted the jtibs/score-kind branch October 27, 2023 15:37

jtibshirani mentioned this pull request Oct 30, 2023

Bump Zoekt for ctags ranking fix sourcegraph/sourcegraph-public-snapshot#57991

Merged

jtibshirani mentioned this pull request Jul 31, 2024

fix(search): correctly handle select:symbol.enum sourcegraph/sourcegraph-public-snapshot#64170

Merged
