Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Starlark syntax with type annotations #24407

Open
Tracked by #22935
comius opened this issue Nov 20, 2024 · 6 comments
Open
Tracked by #22935

Extend Starlark syntax with type annotations #24407

comius opened this issue Nov 20, 2024 · 6 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel type: feature request

Comments

@comius
Copy link
Contributor

comius commented Nov 20, 2024

Implement a flag guarding the use of types in Starlark and extend Starlark syntax with type annotation.

@comius comius added type: feature request team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel labels Nov 20, 2024
@comius comius self-assigned this Nov 20, 2024
@comius comius added the P2 We'll consider working on this in future. (Assignee optional) label Nov 20, 2024
@comius comius added this to the Bootstrapping Starlark types milestone Nov 20, 2024
@comius comius mentioned this issue Nov 20, 2024
6 tasks
@comius
Copy link
Contributor Author

comius commented Nov 22, 2024

The flag shall be called initially --experimental_starlark_types and may be later renamed into "incompatible".

We need to support gradual rollout, not enabling the types everywhere at once. Gradual rollout is usually done using allowlists, however we don't have any mechanisms to read an allowlist during load. One option is to implement a load time allowlist. That would potentially look similar to the prelude. Anybody modifying it would necessarily trigger reloading of everything.

Another option is to make the flag accept a list of paths, where the types are allowed, for example --experimental_starlark_type=/tools/bazel. The implementation of this is relatively simple. The downside is, to enable the annotations in general, one needs to use something like --experimental_starlark_type=/.

We also need to think about supporting older Bazel versions. New Bazel versions will first support types with the flag, and then when they become stable, without any flags. Similar thing needs to happen in older Bazels - first a flag, then no flag. It's unlikely we'll be able to cherry-pick the complete implementation to older Bazel (if we do, that's a plus). In case we can't and we just ignore the type on older Bazel, there needs to be a warning such code needs to be tested with a Bazel that has a complete implementation.

@comius
Copy link
Contributor Author

comius commented Nov 22, 2024

Initially we're looking to add type annotations only to definitions of functions, for example: def f(a: int) -> bool. Adding types to **kwargs, *args can be done later.

One option to extend Starlark syntax is to follow Python (https://docs.python.org/3/reference/grammar.html), The relevant excerpt is:

function_def_raw:
    | 'def' NAME [type_params] '(' [params] ')' ['->' expression ] ':' block 
param: NAME annotation? 
annotation: ':' expression 

Python allows the annotation to be an arbitrary expression (often called test in the syntax lingo). The benefit is that the parser is very simple to implement.

Converting this to Starlark grammar:

-DefStmt = 'def' identifier '(' [Parameters [',']] ')' ':' Suite .
+DefStmt = 'def' identifier '(' [Parameters [',']] ')' ['->' Test ] ':' Suite .
-Parameter  =  identifier 
-           | identifier '=' Test
+Parameter  = identifier [':' Test]
+           | identifier [':' Test] '=' Test
            | '*'
            | '*' identifier
            | '**' identifier

Most if not all other languages (I'm familiar with) define a strict grammar for their types.

@comius
Copy link
Contributor Author

comius commented Nov 22, 2024

Most if not all other languages (that I'm familiar with) define a strict grammar for their types.

An alternative option is to introduce stricter syntax to Starlark:

-DefStmt = 'def' identifier '(' [Parameters [',']] ')' ':' Suite .
+DefStmt = 'def' identifier '(' [Parameters [',']] ')' ['->' Type ] ':' Suite .
-Parameter  =  identifier 
-           | identifier '=' Test
+Parameter  = identifier [':' Type]
+           | identifier [':' Type] '=' Test
            | '*'
            | '*' identifier
            | '**' identifier
+Type = identifier
+     | identifier '|' Type
+     | identifier '[' [TypeArguments [',']] ']' .
+TypeArguments = Type {',' Type} .

The benefit of stricter syntax is that evaluation of the types becomes simpler. Something that we need, because types need to be evaluated early to do type checking and they potentially need to be evaluated within Java annotations, to specify types of native Java methods exposed to Starlark.

Ellipsis ... in types is a possible future extension.

@comius
Copy link
Contributor Author

comius commented Nov 22, 2024

cc @Wyverald @brandjon

@comius
Copy link
Contributor Author

comius commented Nov 22, 2024

Comparison to Buck2 (I'm not very literate in Rust code):

From the reference in ast:
https://github.com/facebook/buck2/blob/main/starlark-rust/starlark_syntax/src/syntax/ast.rs#L213-L219
it seems that Buck2 would like to restrict the types, but they are at the moment parsed as "tests" (https://github.com/facebook/buck2/blob/main/starlark-rust/starlark_syntax/src/syntax/grammar.lalrpop#L114).

When evaluating the type expression, most of the expressions error out:
https://github.com/facebook/buck2/blob/main/starlark-rust/starlark_syntax/src/syntax/type_expr.rs#L174-L267

Not allowed: calls (because they use regular braces (,)), slices, lambdas, literals, operators (except bit or), ifs, empty list literals, list and dict comprehensions
Allowed: identifiers, index (because index uses square braces [,]), bit or, tuples, dots, non-empty list literals

Identifiers, index, "bit or", and dots make complete sense. My best guess is that tuples and non-empty lists are used as some kind of syntax sugar for tuple and list types.

@comius
Copy link
Contributor Author

comius commented Nov 22, 2024

cc @jin @fmeum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Starlark-Interpreter Issues involving the Starlark interpreter used by Bazel type: feature request
Projects
Status: Todo
Development

No branches or pull requests

1 participant