Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Machine readable syntax specification #3707

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Machine readable syntax specification #3707

wants to merge 2 commits into from

Conversation

jsproz
Copy link

@jsproz jsproz commented Dec 2, 2024

Closes #3686

Description

Added a machine readable syntax specification for the Cadence language and a Syntax Notation document.


  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

@jsproz jsproz added the Documentation Improvements or additions to documentation label Dec 2, 2024
@jsproz jsproz changed the title Sprowes/syntax Machine readable syntax specification Dec 2, 2024
@jsproz jsproz self-assigned this Dec 2, 2024
Copy link

github-actions bot commented Dec 2, 2024

Cadence Benchstat comparison

This branch with compared with the base branch onflow:master commit 24783e3
The command for i in {1..N}; do go test ./... -run=XXX -bench=. -benchmem -shuffle=on; done was used.
Bench tests were run a total of 7 times on each branch.

Collapsed results for better readability

@jsproz jsproz enabled auto-merge December 2, 2024 21:40
Copy link
Member

@turbolent turbolent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!


### Set Rule

Set rules consist of a sequence of intervals separated by a comma (`,`). These intervals can either represent a single Unicode code point or a range of code points, separated by a hyphen (`-`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't alternations with | also sets? Could we maybe just consistently use | to avoid unnecessarily adding a new operator uncommon in other notions

;

<identifierToken>:
"ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz" *identifierFollower
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to express these as ranges? e.g. [A-Z_a-z] like in flex?


*
program:
*declaration( declarationSeparator ) $
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make it more explicit that the notation here means "separated by"? The parentheses look like grouping (like in other notations), and at first glance it looks like they are unnecessary and can be removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This language does not have a grouping operator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, could you please leave a comment above, with an equivalent rule in somehting like EBNF notation? Same for the comments above (adding a comment for ranges)

access contractKeyword interfaceKeyword nonReservedIdentifier *1interfaceInheritance "{" membersAndNestedDeclarations "}"
;

enumInterfaceDeclaration:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to indicate that enum interfaces are illegal, but this rule exists to produce better error messages. We might in general make this syntax definition as strict as possible, so maybe just remove the rule

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we should have two versions of the syntax. A strict grammar that is allowed by the compiler, and a more general, internal syntax that is used to produce useful error messages.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. To begin with we might at least indicate which variant of the syntax this is

Comment on lines +214 to +215
accessPriv
| accessPub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub, pub(set), and priv got removed in Cadence 1.0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax is focused on duplicating the behavior of the interpreter's parser. These would produce errors in the semantic checks. That should also be part of the specification.

Copy link
Member

@turbolent turbolent Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe at least leave a comment documenting this decision, e.g. something like // Invalid since Cadence 1.0, so not part of the strict syntax

;

fullTypeOptional:
. optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The . looks like "any character" here, like in other notations. Maybe leave a comment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The . and _ were chosen to represent adjacent and not-adjacent. The language for the parser section does not match individual characters. It matches tokens.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's OK. It might not be obvious to readers unfamiliar with this custom notation, and given the . notation means something else in other commonly used notations, maybe help the reader here and point out that this does not mean "any character" but "adjacent"

move
;

expression:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The precedence numbers are an implementation detail of the current parser implementation, maybe remove them from the document as they are not necessary and probably just confusing. The rules are already structured in a way where they encode the precedence rules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the intention. They were there only to help define the grammar. I forgot to remove them.

Copy link
Member

@turbolent turbolent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another pass over types as requested. Mainly the precedence rules need to be encoded properly, and the only other thing is the intersection types.

Also did another pass over all declarations, and left a few minor comments, especially regarding what types are allowed.

Will do another pass over expressions tomorrow.

;

functionDeclaration:
access *1viewKeyword funKeyword nonReservedIdentifier parameterList *1functionDeclarationTypeAnnotation *1functionBlock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and above: The functionBlock has an "optional" multiplicity here, but this is only the case for some situations, e.g. in interface declarations. Otherwise it is required. We can leave this as-is, but maybe leave a note

;

parseLabeledParameter:
parameterPublicName parameterLocalName ":" typeAnnotation *1defaultValue
Copy link
Member

@turbolent turbolent Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename parameterPublicName to argumentLabel, and parameterLocalName to just parameterName, to match the terminology we have elsewhere

;

parseLabeledParameter:
parameterPublicName parameterLocalName ":" typeAnnotation *1defaultValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaultValue is only valid in some cases, in resource destroyed event declarations. Maybe leave a note that we parse more than semantically allowed


conformances:
conformancesTypes
| conformancesEmpty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified to just | empty, do we need a separate empty rule conformancesEmpty?

;

structInterfaceDeclaration:
access structKeyword interfaceKeyword nonReservedIdentifier *1interfaceInheritance "{" membersAndNestedDeclarations "}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does interfaceInheritance differ from conformances? Maybe remove *1interfaceInheritance and just use conformances here and for the other interface rules below


escapedCharacter:
escapedUnicodeScalar
| escapedCCharacter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential typo (extra C), maybe rename to escapedCharacter

;

functionType:
funKeyword "(" *1functionTypeParameters ")" *1functionTypeReturn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function types may have a view modifier

Comment on lines +479 to +486
fullType:
fullTypeNormal
| fullTypeNested
;

fullTypeNormal:
*1fullTypeReference innerType *fullTypeOptional
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe encode the precedence rules for types in nested rules, like it is done for expressions, see the typeLeftBindingPower* constants. For example, references bind tighter than optionals

| dictionaryType
;

typeRestrictions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restriction types got removed in Cadence 1.0. Instead, there are only intersection types, which have this syntax, but not as a "suffix" to a type. For example, before a type T could be restricted with T{...}, where ... is a list of zero or more nominal types. Now there is just the syntax for intersection types {...} where ... is a list of one or more nominal types.

;

interfaceInheritance:
":" 1*fullType( "," )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only nominal types are allowed

Copy link
Member

@turbolent turbolent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did another pass reviewing the expression parsing. Looks good, a couple notes.

Note that some expressions need newline differentiation to resolve ambiguities when expressions are used as statements, see the "(* if no line terminator ahead *)" comments in the expression rules of the EBNF grammar, like

| postfixExpression (* if no line terminator ahead *) invocation
.

For example:

foo(bar) // invocation
foo    // identifier expression
(bar)  // parenthesized expression

;

comparisonExpression: // 50
1*2nilCoalescingExpression( comparisonOp )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though type checking might reject nested comparison expressions, parsing still allows them.
So this should not have an upper bound in the multiplicity:

Suggested change
1*2nilCoalescingExpression( comparisonOp )
1*nilCoalescingExpression( comparisonOp )

Comment on lines +898 to +910
primaryExpression:
nullDenotation *leftDenotation
;

nullDenotation:
identifierExpression
;

leftDenotation:
forceUnwrapOp
| accessExpression
| invocationExpression
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"null denotation" and "left denotation" are terminology of Pratt's Top Down Operator Precedence parsing technique, i.e. are an implementation detail of the current Go parser. They shouldn't be used as rule names.

*1prefixOp postfixExpression
;

postfixExpression: // 160
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force unwrapping is on this precedence level

Comment on lines +906 to +915
leftDenotation:
forceUnwrapOp
| accessExpression
| invocationExpression
;

accessExpression: // 170
memberExpression
| indexExpression
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

member expression, indexing expression, and invocation expression are all part of the same precedence and should probably be in the same rule. As noted above, force unwrapping has lower precedence (needs to be in a "higher" rule)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Machine Readable Syntax Specification
2 participants