New SCIP fields: kind, enclosing_symbol, signature_documentation, display_name (#677 #707)

* Update Document from upstream scip.proto

This copies the latest additions to the Document message:
 * a new language field and Language enum
 * a new text field, to embed the document content itself.
This is meant for the new SymbolInformation::signature_documentation
field.

This also updates some documentation comments.

* Update SymbolInformation from upstream scip.proto

This copies the latest additions to the SymbolInformation message:
 * the documentation field is explicitly not meant for signature
   documentation anymore, instead a new signature_documentation field
   is added
 * a new display_name field is added
 * a new enclosing_symbol field is added for local symbols
 * a new kind field is added along with a Kind enum to have a
   finer-grained classification than the one provided by descriptor
   suffixes (and is especially useful for local symbols which don't
   have suffixes)

* Forward display_name from SemanticDB to SCIP

The SemanticDB schema already provides a display_name field, forward it
to the SCIP output in scip-semanticdb.
This also adds support to the ScipPrinters testing utility and updates
the tests accordingly.

* Move signature documentation to its new dedicated field

SemanticDB provides a structured version of the signature in the
signature field. Instead of turning it into a markdown-encoded string
for the documentation field, this builds a Document for the
signature_documentation field.

This also updates the ScipPrinters testing utility and the tests
accordingly.
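
As a rough illustration of this change (a sketch, not the scip-semanticdb code): the Python dataclasses below are hand-written stand-ins for the SCIP `Document` and `SymbolInformation` messages, and `legacy_symbol` / `current_symbol` are hypothetical helper names. The first helper renders the signature as a markdown code block inside `documentation`; the second keeps `documentation` for docstrings only and carries the signature in a `Document`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Stand-in for the SCIP Document message (only the fields used here)."""
    language: str = ""
    text: str = ""


@dataclass
class SymbolInformation:
    """Stand-in for the SCIP SymbolInformation message."""
    symbol: str
    documentation: List[str] = field(default_factory=list)
    signature_documentation: Optional[Document] = None


def legacy_symbol(symbol: str, signature: str, docstring: str) -> SymbolInformation:
    """Old shape: the signature is a markdown code block in `documentation`."""
    fence = "`" * 3  # built at runtime to keep this example easy to embed
    rendered = fence + "java\n" + signature + "\n" + fence
    return SymbolInformation(symbol=symbol, documentation=[rendered, docstring])


def current_symbol(symbol: str, signature: str, docstring: str) -> SymbolInformation:
    """New shape: the signature lives in `signature_documentation`."""
    return SymbolInformation(
        symbol=symbol,
        documentation=[docstring],
        signature_documentation=Document(language="java", text=signature),
    )
```

A consumer can then render the signature with syntax highlighting chosen from `language` instead of parsing markdown out of the docstring list.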

* Add back SemanticDB SymbolInformation::owner field

SemanticDB used to have a SymbolInformation::owner field with id 15.
This re-introduces the field with the same semantics under the name
enclosing_symbol.

To be able to re-use the field 15, this moves the out-of-spec
definition_relationships field to id 21.
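
The reason the ids matter: on the protobuf wire a field is identified only by its number, never by its name. The sketch below hand-encodes field tags to show this. It assumes wire type 2 (length-delimited), which is what a `string` field such as `enclosing_symbol` uses; the wire type of `definition_relationships` is not stated here, so treating it as length-delimited is an assumption. The function names are illustrative.

```python
LENGTH_DELIMITED = 2  # protobuf wire type for strings, bytes and nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def field_tag(field_number: int, wire_type: int = LENGTH_DELIMITED) -> bytes:
    """The tag that prefixes every field: (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, value: str) -> bytes:
    """Encode a string field: tag, payload length, UTF-8 payload."""
    payload = value.encode("utf-8")
    return field_tag(field_number) + encode_varint(len(payload)) + payload
```

Under these assumptions id 15 encodes to the single tag byte `0x7a` and id 21 to the two bytes `0xaa 0x01`, so a reader cannot tell two differently named fields apart if they share a number.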

* Forward enclosing_symbol from SemanticDB to SCIP

This also adds support to the ScipPrinters testing utility.

* Populate SymbolInformation::enclosing_symbol in semanticdb-javac

This only populates the enclosing_symbol for local symbols, and updates
the tests accordingly.

* Build SCIP kind from SemanticDB kind and properties

This also updates the ScipPrinters testing utility and the tests
accordingly.

* semanticdb-javac: set kind to Variable for local variables
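
One way such a mapping could look, as a minimal sketch only: the enum and flag below are a hand-picked subset of SemanticDB kinds and properties, and the rules in `scip_kind` are illustrative guesses, not the table implemented by this commit. The returned strings are names from the SCIP `Kind` enum.

```python
from enum import Enum, Flag, auto


class SemanticdbKind(Enum):
    """Subset of SemanticDB symbol kinds, for illustration."""
    LOCAL = auto()
    FIELD = auto()
    METHOD = auto()
    CONSTRUCTOR = auto()
    MACRO = auto()
    CLASS = auto()
    INTERFACE = auto()


class Property(Flag):
    """Subset of SemanticDB properties, combinable with `|`."""
    ABSTRACT = 1
    STATIC = 2
    ENUM = 4


def scip_kind(kind: SemanticdbKind, props: Property = Property(0)) -> str:
    """Pick a SCIP Kind name from a SemanticDB kind plus its properties."""
    if kind is SemanticdbKind.LOCAL:
        return "Variable"  # local variables, per the last item above
    if kind is SemanticdbKind.METHOD:
        if Property.STATIC in props:
            return "StaticMethod"
        if Property.ABSTRACT in props:
            return "AbstractMethod"
        return "Method"
    if kind is SemanticdbKind.FIELD:
        return "StaticField" if Property.STATIC in props else "Field"
    if kind is SemanticdbKind.CLASS:
        return "Enum" if Property.ENUM in props else "Class"
    simple = {
        SemanticdbKind.CONSTRUCTOR: "Constructor",
        SemanticdbKind.INTERFACE: "Interface",
    }
    # Anything this sketch does not cover falls back to the zero value.
    return simple.get(kind, "UnspecifiedKind")
```

The point of combining kind and properties is that SemanticDB's kind alone cannot distinguish, for example, a static method from an instance method, while the SCIP enum has separate entries for them.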

---------

Co-authored-by: Nicolas Guichard <[email protected]>
keynmol and nicolas-guichard authored Jun 19, 2024
1 parent 20b5142 commit 1f17e1d
Showing 143 changed files with 13,603 additions and 3,924 deletions.
357 changes: 350 additions & 7 deletions scip-java-proto/src/main/protobuf/scip.proto
@@ -69,14 +69,38 @@ message ToolInfo {

// Document defines the metadata about a source file on disk.
message Document {
- // (Required) Path to the text document relative to the directory supplied in
- // the associated `Metadata.project_root`. Not URI-encoded. This value should
- // not begin with a directory separator.
// The string ID for the programming language this file is written in.
// The `Language` enum contains the names of most common programming languages.
// This field is typed as a string to permit any programming language, including
// ones that are not specified by the `Language` enum.
string language = 4;
// (Required) Unique path to the text document.
//
// 1. The path must be relative to the directory supplied in the associated
// `Metadata.project_root`.
// 2. The path must not begin with a leading '/'.
// 3. The path must point to a regular file, not a symbolic link.
// 4. The path must use '/' as the separator, including on Windows.
// 5. The path must be canonical; it cannot include empty components ('//'),
// or '.' or '..'.
string relative_path = 1;
// Occurrences that appear in this file.
repeated Occurrence occurrences = 2;
- // Symbols that are defined within this document.
// Symbols that are "defined" within this document.
//
// This should include symbols which technically do not have any definition,
// but have a reference and are defined by some other symbol (see
// Relationship.is_definition).
repeated SymbolInformation symbols = 3;

// (optional) Text contents of this document. Indexers are not expected to
// include the text by default. It's preferable that clients read the text
// contents from the file system by resolving the absolute path from joining
// `Index.metadata.project_root` and `Document.relative_path`. This field was
// introduced to support `SymbolInformation.signature_documentation`, but it
// can be used for other purposes as well, for example testing or when working
// with virtual/in-memory documents.
string text = 5;
}
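
The numbered rules for `relative_path` above can be checked mechanically. The function below is a minimal sketch under stated assumptions: its name is hypothetical, it treats any backslash as a Windows separator (a backslash is a legal filename character on Unix), and it skips rule 3 because telling a regular file from a symbolic link needs filesystem access.

```python
from typing import List


def relative_path_problems(path: str) -> List[str]:
    """Return the rule violations found in a Document.relative_path value."""
    if not path:
        return ["path is empty"]
    problems = []
    if path.startswith("/"):
        problems.append("begins with a leading '/'")
    if "\\" in path:
        problems.append("uses '\\' instead of '/' as the separator")
    components = path.split("/")
    # A leading '/' is already reported above, so skip the empty first component.
    inner = components[1:] if path.startswith("/") else components
    if "" in inner:
        problems.append("contains an empty component")
    if any(part in (".", "..") for part in components):
        problems.append("contains a '.' or '..' component")
    return problems
```

An empty list means the string passes the checks this sketch covers; it does not prove the path exists or is unique within the index.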

// Symbol is similar to a URI, it identifies a class, method, or a local
@@ -145,12 +169,205 @@ message SymbolInformation {
// The string must be formatted according to the grammar in `Symbol`.
string symbol = 1;
// (optional, but strongly recommended) The markdown-formatted documentation
- // for this symbol. This field is repeated to allow different kinds of
- // documentation. For example, it's nice to include both the signature of a
- // method (parameters and return type) along with the accompanying docstring.
// for this symbol. Use `SymbolInformation.signature_documentation` to
// document the method/class/type signature of this symbol.
// Due to historical reasons, indexers may include signature documentation in
// this field by rendering markdown code blocks. New indexers should only
// include non-code documentation in this field, for example docstrings.
repeated string documentation = 3;
// (optional) Relationships to other symbols (e.g., implements, type definition).
repeated Relationship relationships = 4;
// The kind of this symbol. Use this field instead of
// `SymbolDescriptor.Suffix` to determine whether something is, for example, a
// class or a method.
Kind kind = 5;
// (optional) Kind represents the fine-grained category of a symbol, suitable for presenting
// information about the symbol's meaning in the language.
//
// For example:
// - A Java method would have the kind `Method` while a Go function would
// have the kind `Function`, even if the symbols for these use the same
// syntax for the descriptor `SymbolDescriptor.Suffix.Method`.
// - A Go struct has the symbol kind `Struct` while a Java class has
// the symbol kind `Class` even if they both have the same descriptor:
// `SymbolDescriptor.Suffix.Type`.
//
// Since Kind is more fine-grained than Suffix:
// - If two symbols have the same Kind, they should share the same Suffix.
// - If two symbols have different Suffixes, they should have different Kinds.
enum Kind {
UnspecifiedKind = 0;
// A method which may or may not have a body. For Java, Kotlin etc.
AbstractMethod = 66;
// For Ruby's attr_accessor
Accessor = 72;
Array = 1;
// For Alloy
Assertion = 2;
AssociatedType = 3;
// For C++
Attribute = 4;
// For Lean
Axiom = 5;
Boolean = 6;
Class = 7;
Constant = 8;
Constructor = 9;
// For Solidity
Contract = 62;
// For Haskell
DataFamily = 10;
// For C# and F#
Delegate = 73;
Enum = 11;
EnumMember = 12;
Error = 63;
Event = 13;
// For Alloy
Fact = 14;
Field = 15;
File = 16;
Function = 17;
// For 'get' in Swift, 'attr_reader' in Ruby
Getter = 18;
// For Raku
Grammar = 19;
// For Purescript and Lean
Instance = 20;
Interface = 21;
Key = 22;
// For Racket
Lang = 23;
// For Lean
Lemma = 24;
// For Solidity
Library = 64;
Macro = 25;
Method = 26;
// For Ruby
MethodAlias = 74;
// Analogous to 'ThisParameter' and 'SelfParameter', but for languages
// like Go where the receiver doesn't have a conventional name.
MethodReceiver = 27;
// Analogous to 'AbstractMethod', for Go.
MethodSpecification = 67;
// For Protobuf
Message = 28;
// For Solidity
Modifier = 65;
Module = 29;
Namespace = 30;
Null = 31;
Number = 32;
Object = 33;
Operator = 34;
Package = 35;
PackageObject = 36;
Parameter = 37;
ParameterLabel = 38;
// For Haskell's PatternSynonyms
Pattern = 39;
// For Alloy
Predicate = 40;
Property = 41;
// Analogous to 'Trait' and 'TypeClass', for Swift and Objective-C
Protocol = 42;
// Analogous to 'AbstractMethod', for Swift and Objective-C.
ProtocolMethod = 68;
// Analogous to 'AbstractMethod', for C++.
PureVirtualMethod = 69;
// For Haskell
Quasiquoter = 43;
// 'self' in Python, Rust, Swift etc.
SelfParameter = 44;
// For 'set' in Swift, 'attr_writer' in Ruby
Setter = 45;
// For Alloy, analogous to 'Struct'.
Signature = 46;
// For Ruby
SingletonClass = 75;
// Analogous to 'StaticMethod', for Ruby.
SingletonMethod = 76;
// Analogous to 'StaticField', for C++
StaticDataMember = 77;
// For C#
StaticEvent = 78;
// For C#
StaticField = 79;
// For Java, C#, C++ etc.
StaticMethod = 80;
// For C#, TypeScript etc.
StaticProperty = 81;
// For C, C++
StaticVariable = 82;
String = 48;
Struct = 49;
// For Swift
Subscript = 47;
// For Lean
Tactic = 50;
// For Lean
Theorem = 51;
// Method receiver, named 'this' in JavaScript, C++, Java etc.
ThisParameter = 52;
// Analogous to 'Protocol' and 'TypeClass', for Rust, Scala etc.
Trait = 53;
// Analogous to 'AbstractMethod', for Rust, Scala etc.
TraitMethod = 70;
// Data type definition for languages like OCaml which use `type`
// rather than separate keywords like `struct` and `enum`.
Type = 54;
TypeAlias = 55;
// Analogous to 'Trait' and 'Protocol', for Haskell, Purescript etc.
TypeClass = 56;
// Analogous to 'AbstractMethod', for Haskell, Purescript etc.
TypeClassMethod = 71;
// For Haskell
TypeFamily = 57;
TypeParameter = 58;
// For C, C++, Capn Proto
Union = 59;
Value = 60;
Variable = 61;
// Next = 83;
// Feel free to open a PR proposing new language-specific kinds.
}
// (optional) The name of this symbol as it should be displayed to the user.
// For example, the symbol "com/example/MyClass#myMethod(+1)." should have the
// display name "myMethod". The `symbol` field is not a reliable source of
// the display name for several reasons:
//
// - Local symbols don't encode the name.
// - Some languages have case-insensitive names, so the symbol is all-lowercase.
// - The symbol may encode names with special characters that should not be
// displayed to the user.
string display_name = 6;
// (optional) The signature of this symbol as it's displayed in API
// documentation or in hover tooltips. For example, for a Java method that adds
// two numbers this would have `Document.language = "java"` and `Document.text
// = "void add(int a, int b)"`. The `language` and `text` fields are required
// while other fields such as `Document.occurrences` can be optionally
// included to support hyperlinking referenced symbols in the signature.
Document signature_documentation = 7;
// (optional) The enclosing symbol if this is a local symbol. For non-local
// symbols, the enclosing symbol should be parsed from the `symbol` field
// using the `Descriptor` grammar.
//
// The primary use-case for this field is to allow local symbols to be displayed
// in a symbol hierarchy for API documentation. It's OK to leave this field
// empty for local variables since local variables usually don't belong in API
// documentation. However, in the situation that you wish to include a local
// symbol in the hierarchy, then you can use `enclosing_symbol` to locate the
// "parent" or "owner" of this local symbol. For example, a Java indexer may
// choose to use local symbols for private class fields while providing an
// `enclosing_symbol` to reference the enclosing class to allow the field to
// be part of the class documentation hierarchy. From the perspective of an
// author of an indexer, the decision to use a local symbol or global symbol
// should be determined exclusively by whether the symbol is accessible
// outside the document, not by the capability to find the enclosing
// symbol.
string enclosing_symbol = 8;
}
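
To show how `display_name` and `enclosing_symbol` work together, here is a small sketch that groups local symbols under their parents. The dataclass is a hand-written stand-in for the message, `children_by_parent` is a hypothetical helper, and the check for the `local ` prefix relies on the local-symbol form of the `Symbol` grammar.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class SymbolInformation:
    """Stand-in for the SCIP message, limited to the fields used here."""
    symbol: str
    display_name: str = ""
    enclosing_symbol: str = ""


def is_local(symbol: str) -> bool:
    """Local symbols are written as 'local <id>' and carry no name."""
    return symbol.startswith("local ")


def children_by_parent(symbols: Iterable[SymbolInformation]) -> Dict[str, List[str]]:
    """Map each enclosing symbol to the display names of its local children."""
    tree = defaultdict(list)
    for info in symbols:
        # Locals without an enclosing symbol are left out of the hierarchy.
        if is_local(info.symbol) and info.enclosing_symbol:
            # Fall back to the raw symbol when no display name was emitted.
            tree[info.enclosing_symbol].append(info.display_name or info.symbol)
    return dict(tree)
```

Global symbols are skipped on purpose: their parent is meant to be parsed from the `symbol` string itself, which this sketch does not attempt.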

message Relationship {
@@ -382,3 +599,129 @@ enum DiagnosticTag {
Unnecessary = 1;
Deprecated = 2;
}

// Language standardises names of common programming languages that can be used
// for the `Document.language` field. The primary purpose of this enum is to
// prevent a situation where a single programming language ends up with
// multiple string representations. For example, the C++ language uses the name
// "CPP" in this enum and other names such as "cpp" are incompatible.
// Feel free to send a pull-request to add missing programming languages.
enum Language {
UnspecifiedLanguage = 0;
ABAP = 60;
Apex = 96;
APL = 49;
Ada = 39;
Agda = 45;
AsciiDoc = 86;
Assembly = 58;
Awk = 66;
Bat = 68;
BibTeX = 81;
C = 34;
COBOL = 59;
CPP = 35; // C++ (the name "CPP" was chosen for consistency with LSP)
CSS = 26;
CSharp = 1;
Clojure = 8;
Coffeescript = 21;
CommonLisp = 9;
Coq = 47;
CUDA = 97;
Dart = 3;
Delphi = 57;
Diff = 88;
Dockerfile = 80;
Dyalog = 50;
Elixir = 17;
Erlang = 18;
FSharp = 42;
Fish = 65;
Flow = 24;
Fortran = 56;
Git_Commit = 91;
Git_Config = 89;
Git_Rebase = 92;
Go = 33;
GraphQL = 98;
Groovy = 7;
HTML = 30;
Hack = 20;
Handlebars = 90;
Haskell = 44;
Idris = 46;
Ini = 72;
J = 51;
JSON = 75;
Java = 6;
JavaScript = 22;
JavaScriptReact = 93;
Jsonnet = 76;
Julia = 55;
Justfile = 109;
Kotlin = 4;
LaTeX = 83;
Lean = 48;
Less = 27;
Lua = 12;
Luau = 108;
Makefile = 79;
Markdown = 84;
Matlab = 52;
Nickel = 110; // https://nickel-lang.org/
Nix = 77;
OCaml = 41;
Objective_C = 36;
Objective_CPP = 37;
Pascal = 99;
PHP = 19;
PLSQL = 70;
Perl = 13;
PowerShell = 67;
Prolog = 71;
Protobuf = 100;
Python = 15;
R = 54;
Racket = 11;
Raku = 14;
Razor = 62;
Repro = 102; // Internal language for testing SCIP
ReST = 85;
Ruby = 16;
Rust = 40;
SAS = 61;
SCSS = 29;
SML = 43;
SQL = 69;
Sass = 28;
Scala = 5;
Scheme = 10;
ShellScript = 64; // Bash
Skylark = 78;
Slang = 107;
Solidity = 95;
Svelte = 106;
Swift = 2;
Tcl = 101;
TOML = 73;
TeX = 82;
Thrift = 103;
TypeScript = 23;
TypeScriptReact = 94;
Verilog = 104;
VHDL = 105;
VisualBasic = 63;
Vue = 25;
Wolfram = 53;
XML = 31;
XSL = 32;
YAML = 74;
Zig = 38;
// NextLanguage = 111;
// Steps to add a new language:
// 1. Copy-paste the "NextLanguage = N" line above
// 2. Increment "NextLanguage = N" to "NextLanguage = N+1"
// 3. Replace "NextLanguage = N" with the name of the new language.
// 4. Move the new language to the correct line above using alphabetical order
// 5. (optional) Add a brief comment behind the language if the name is not self-explanatory
}
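
Because `Document.language` is a free-form string, an indexer or client may want to map loosely spelled names onto the enum spelling. The sketch below does that for a handful of names copied from the enum above; the function name and the case-insensitive matching policy are assumptions, not part of SCIP.

```python
from typing import Optional

# A small subset of the Language enum names, spelled exactly as in the enum.
CANONICAL_NAMES = frozenset(
    {"CPP", "CSharp", "Go", "Java", "Kotlin", "Python", "Scala", "TypeScript"}
)

# Lookup table from a case-folded name to the enum spelling.
_BY_FOLDED = {name.casefold(): name for name in CANONICAL_NAMES}


def canonical_language(name: str) -> Optional[str]:
    """Return the enum spelling for `name`, or None if it is not in the subset."""
    return _BY_FOLDED.get(name.strip().casefold())
```

A `None` result only means the name is outside this subset; the field is a string precisely so that languages missing from the enum can still be recorded.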