-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Symbol search: support content-based lang detection #60626
Changes from all commits
211016b
3bed303
55e6ba7
fa699a0
b80f086
9db0d1f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,6 +15,7 @@ import ( | |
|
||
"github.com/google/go-cmp/cmp" | ||
"github.com/sourcegraph/go-ctags" | ||
|
||
"github.com/sourcegraph/sourcegraph/cmd/symbols/fetcher" | ||
"github.com/sourcegraph/sourcegraph/internal/api" | ||
"github.com/sourcegraph/sourcegraph/internal/database/dbtest" | ||
|
@@ -118,7 +119,11 @@ func TestIndex(t *testing.T) { | |
verifyBlobs := func() { | ||
repo := "somerepo" | ||
commit := getHead() | ||
args := search.SymbolsParameters{Repo: api.RepoName(repo), CommitID: api.CommitID(commit), Query: ""} | ||
args := search.SymbolsParameters{ | ||
Repo: api.RepoName(repo), | ||
CommitID: api.CommitID(commit), | ||
Query: "", | ||
IncludeLangs: []string{"Text"}} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought this patch was additive, so I don't quite understand why this part of the test needs an update, since the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oops, I did this too quickly! I just pushed a correction. Here I added a lang filter on "text" to ensure the new codepath in Overall, the rockskip tests are really lacking. If we make further changes, we will invest in new unit testing suites. |
||
symbols, err := service.Search(context.Background(), args) | ||
fatalIfError(err, "Search") | ||
|
||
|
@@ -133,7 +138,10 @@ func TestIndex(t *testing.T) { | |
} | ||
wantPaths := []string{} | ||
for wantPath := range state { | ||
wantPaths = append(wantPaths, wantPath) | ||
// We only want .txt files since we're filtering by lang: text | ||
if strings.Contains(wantPath, ".txt") { | ||
wantPaths = append(wantPaths, wantPath) | ||
} | ||
} | ||
sort.Strings(gotPaths) | ||
sort.Strings(wantPaths) | ||
|
@@ -179,6 +187,9 @@ func TestIndex(t *testing.T) { | |
add("c.txt", "sym1\nsym2") | ||
commit("add another file with 2 symbols") | ||
|
||
add("a.java", "sym1\nsym2") | ||
commit("System.out.println(\"hello, world!\"") | ||
|
||
add("a.txt", "sym1\nsym2") | ||
commit("add a symbol to a.txt") | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to be aware of: the languages stored here are produced by ctags, whereas the lang filters are expected to be normalized languages from go-enry. I looked through
universal-ctags --list-languages
and they seem to match up well, so I don't expect this to be a problem in practice.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong suggestion: Please let's use different types here
EnryLanguage
andCtagsLanguage
. These kinds of problems come up in edge cases. For example,CUDA
is all upper-case in ctags butCuda
in enry.COBOL
is uppercase in enry butCobol
in ctags.MatLab
in ctags isMATLAB
in enry.Protobuf
in ctags isProtocol Buffers
in enryPerl6
andRaku
(which are the same language IIUC) whereas enry only hasPerl 6
ObjectiveC
in ctags isObjective-C
in enryAsm
in ctags isAssembly
in enry (there's alsoUnix Assembly
)Zsh
andSh
whereas enry hasShell
.Some of these are fixable with the case conversion below. But some are not.
I'm not expecting this PR to have a perfect conversion function. But it seems conceptually wrong to conflate these two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, there are indeed more exceptions than I realized. To check I understand your suggestion -- we should have an explicit conversion from go-enry language to ctags language, and update naming to make this distinction clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and specifically for naming, I'm emphasizing that we should have different types (not just variable names) for these two, and a conversion function from one type to another.
We have something similar in the syntax highlighter, albeit more complicated, to translate Sublime grammar names to Tree-sitter language names.
https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8379154d11d98bae47f5a7bb17336485e8b93ecb/-/blob/docker-images/syntax-highlighter/crates/syntax-analysis/src/highlighting.rs?L163-201
Right now, the
langInclude
andlangExclude
in various places are just strings instead of being domain-specific types, and that's understandable, but changing that to be a more specific type would be valuable. (However, I'm not suggesting you do that in this PR specifically.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, that matches my understanding -- I will punt on any new types for this PR.