This repository has been archived by the owner on Sep 30, 2024. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Symbol search: support content-based lang detection #60626
Symbol search: support content-based lang detection #60626
Changes from 3 commits
211016b
3bed303
55e6ba7
fa699a0
b80f086
9db0d1f
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to be aware of: the languages stored here are produced by ctags, whereas the lang filters are expected to be normalized languages from go-enry. I looked through
universal-ctags --list-languages
and they seem to match up well, so I don't expect this to be a problem in practice.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strong suggestion: Please let's use different types here
EnryLanguage
andCtagsLanguage
. These kinds of problems come up in edge cases. For example,CUDA
is all upper-case in ctags butCuda
in enry.COBOL
is uppercase in enry butCobol
in ctags.MatLab
in ctags isMATLAB
in enry.Protobuf
in ctags isProtocol Buffers
in enryPerl6
andRaku
(which are the same language IIUC) whereas enry only hasPerl 6
ObjectiveC
in ctags isObjective-C
in enryAsm
in ctags isAssembly
in enry (there's alsoUnix Assembly
)Zsh
andSh
whereas enry hasShell
.Some of these are fixable with the case conversion below. But some are not.
I'm not expecting this PR to have a perfect conversion function. But it seems conceptually wrong to conflate these two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, there are indeed more exceptions than I realized. To check I understand your suggestion -- we should have an explicit conversion from go-enry language to ctags language, and update naming to make this distinction clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and specifically for naming, I'm emphasizing that we should have different types (not just variable names) for these two, and a conversion function from one type to another.
We have something similar in the syntax highlighter, albeit more complicated, to translate Sublime grammar names to Tree-sitter language names.
https://sourcegraph.com/github.com/sourcegraph/sourcegraph@8379154d11d98bae47f5a7bb17336485e8b93ecb/-/blob/docker-images/syntax-highlighter/crates/syntax-analysis/src/highlighting.rs?L163-201
Right now, the
langInclude
andlangExclude
in various places are just strings instead of being domain-specific types, and that's understandable, but changing that to be a more specific type would be valuable. (However, I'm not suggesting you do that in this PR specifically.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, that matches my understanding -- I will punt on any new types for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this patch was additive, so I don't quite understand why this part of the test needs an update, since the
"Text"
string doesn't appear elsewhere in this patch. Is this test going to continue to pass even if[]string{"Text"}
is omitted/did you add this to make sure we're exercising the 'has language filter' code path?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I did this too quickly! I just pushed a correction. Here I added a lang filter on "text" to ensure the new codepath in
internal/rockskip/search.go
is exercised. If we remove that logic, this test will fail.Overall, the rockskip tests are really lacking. If we make further changes, we will invest in new unit testing suites.