Skip to content
This repository has been archived by the owner on Feb 1, 2020. It is now read-only.

KAST and KORE

Andrew Miller edited this page Nov 11, 2017 · 15 revisions

This page is more recent than https://github.com/kframework/k/wiki/KAST and https://github.com/kframework/k/wiki/Kore-Design, and it is meant to eventually replace them completely. The old pages may still be useful as a reference and for questions that have not been moved here yet.

Please also consult https://github.com/kframework/k/wiki/Design-of-the-new-compiler-parser if interested in how the translation to KORE is being planned.

See the KORE data structures guide for the implementation of the concepts explained here.

Overview

KAST, abbreviating K abstract syntax tree or K AST, is a generic, abstract syntax notation for the user-defined concrete syntax in K sentences. When writing sentences using KAST, users have to replace concrete language syntax, e.g.

if (B) S1 else S2

with syntax in labelled form, e.g.

`if(_)_else_`(B,S1,S2)

KAST serves two major purposes: (1) it is an escape mechanism to avoid parsing ambiguities in sentences; and (2) it can be parsed using the generic K parser in one pass, avoiding the expensive and complex step of generating parsers for the user-defined syntax combined with the generic K syntax.

KORE is a minimal fragment of K, powerful enough to represent any K definition without any loss. KORE essentially does two things: (1) it uses KAST for user-defined syntax; and (2) it desugars K's derived syntax; e.g., in KORE we write

syntax Sort ::= Production1 [ . ]
syntax Sort ::= Production2 [ attributes ]

instead of

syntax Sort ::= Production1
              | Production2 [ attributes ]

Being minimal, each of KORE's syntactic constructs is absolutely needed. Moreover, for each KORE syntactic construct there is precisely one K API construct, and all the K tools, including the K compiler and K runner operate directly on these data-structures. This allows, in particular, users to develop K definitions directly in Java using the K API, which corresponds to writing KORE definitions but skipping their parsing.

#KORE

The syntax of KORE, defined using K, can be found in two files, HERE and HERE. KORE defines a core subset of K, with the property that any K definition can be mechanically translated to a definition using only KORE. An additional property of KORE is that we do not want to lose anything from the original definition. For example, we do not want to lose the capability to pretty-print terms based on the original syntax, or the capability to report error messages referring to constructs in the original definition.

We designed KORE based on the following three requirements:

  1. KORE is a fragment of K, that is, any KORE definition is also a K definition

  2. Any K definition can be easily translated to KORE (linearly, both in time and space)

  3. It is minimal with the properties above

Additional KORE Requirements

Besides the restrictions given by the reduced syntax of KORE, there are a few additional restrictions that need to be respected in order for a K definition to be considered proper KORE:

  • Each proper (i.e., non-subsort) production should have precisely one klabel attribute.

  • Only klabels defined as above or defined independently can be used in rules.

  • All variables in all rules are sorted (using sort membership side conditions).

  • All "requires" commands and all "imports" statements are explicitly listed. Exception makes the syntax of K/KAST/KORE itself, which for now we will assume is included automatically (this sentence will make more sense when we can define the syntax of K itself, which will be hooked to the K API). The reason we do that is because we do not want it to interfere with the syntax that the program parser will use to parse programs.

Examples

Several examples of languages written in KORE can be found HERE. (Grigore says: change this link after we merge)

From K to KORE

K can be desugared into KORE by applying one of more of the following:

  1. Incrementally translate K constructs into combinations of KORE constructs, until all non-KORE constructs are eliminated; for example, translating syntax Sort ::= Prod1 | Prod2 into syntax Sort ::= Prod1 syntax Sort ::= Prod2.

  2. Use special attributes to give special meaning to certain KLabels. For example, syntax StateCell ::= "<state>" Map "</state> [cell]" defines a configuration cell. See details below for how to translate entire configuration declarations to ordinary KORE syntax (with no syntactic support for configurations).

  3. Use special KLabels to enable special functionality at runtime or to hold certain meta-data in terms. For example, if we want to associate meta-data M to K term T, then we can use a special KLabel, say #KMetaData, and replace T with #KMetaData(T,M).

This translation from K to KORE can be best implemented in K itself, of course, but the general consensus is that K is not ready for defining itself yet, at least performance-wise. We are looking forward to doing it soon, though!

From Outer to Outer KORE

Translating all language-independent K constructs to the KORE subset should be relatively easy. That includes translating syntax declarations with multiple productions for the same sort into separate syntax declarations with only one production each, translating syntax declarations with blocks of priorities into separate syntax declarations and separate priority block declarations, translating list/set/bag/map declarations of the form `List{Sort,"separator"} into corresponding ordinary KORE syntax declarations with assoc/comm/unit attributes as explained below, and so on.

Lists, Sets, Bags and Matching Modulo

All these collections can be defined with just attributes in KORE (like in Maude).

An associative list:

syntax SortList ::= Sort
syntax SortList ::= ".SortList"
syntax SortList ::= SortList "," SortList  [assoc unit(.SortList)]

Or, alternatively, if one prefers KLabels only, then

syntax KLabel ::= ".SortList"
syntax KLabel ::= "_,_"  [assoc unit(.SortList)]  // or `,` or whatever KLabel name we want

A bag also adds comm to the list of attributes above. A set adds idem as well. Internally, the backends with support for associative matching will identify `_,_`(`_,_`(a,b,.SortList(.KList)),`_,_`(c),.SortList(.KList),d) with `_,_`(a,b,c,d).

Default KLabels

KORE expects that all productions have explicit KLabels in their list of attributes, so from KORE's perspective, the particular KLabel name associated to a production is irrelevant. That is frontend's problem. For example, if we decide that we want the various lists over related sorts, e.g., ExpList, IntList, ValList, to all have the same KLabel _,_, like we do now, then the frontend will assign all three comma productions the same KLabel, _,_. If we decide that we want them to have different KLabels, for example for a particular backend, then we can instruct the frontend to associate them distinct KLabels, e.g.,

`_,_:ExpList*ExpList->ExpList`
`_,_:ValList*ValList->ValList`
`_,_:IntList*IntList->IntList`

Anyway, the point is that it is not KORE's problem, and that we should start with whatever default we have now in the frontend, but likely allow more flexibility in it later on, to accommodate backends which do not want to or cannot calculate the least sort of a term dynamically.

Freezers and HOLE

In K, freezers and the HOLE are only naming conventions for certain K labels needed to express evaluation strategies. Consider, for example, the K syntax declaration

syntax Exp ::= Exp "+" Exp  [strict(1)]

This is also valid in KORE (and some kompilation stage will expand it appropriately). Suppose now that one writes the corresponding heating/cooling rules explicitly in K:

rule E1:Exp + E2:Exp => E1 ~> HOLE + E2 requires notBool(isKResult(E1))
rule E1:KResult ~> HOLE + E2:Exp => E1 + E2 requires

Translated to KORE, these become:

syntax KLabel ::= "HOLE+_"
rule `_+_`(E1,E2) => E1 ~> `HOLE+_`(E2) requires isExp(E1) andBool isExp(E2) andBool notBool(isKResult(E1))
rule E1 ~> `HOLE+_`(E2) => `_+_`(E1,E2) requires isExp(E1) andBool isExp(E2) andBool isKResult(E1)

Meta-Data

It may be the case that we sometimes want to associate meta-data to terms in K sentences, for example to store its precise location in the original .k file, to store its inferred sort(s) by the frontend, to store the original fragment of the input definition that it corresponds to, etc. These meta-data can be useful for many reasons, for example for error reporting purposes, for tracing, for debugging, etc. We prefer to not add special syntax for meta-data, for the same reason we prefferred to add no special syntax for K attributes: the syntax of KAST is meant to be powerful enough to abstractly represent the syntax of any language, so it should be able to also represent the syntax of our particular languages for meta-data or for attributes. Many encodings are possible here, but for uniformity with other notations let us assume a special KLabel #KMetaData that takes two KAST term arguments, one which is the term to attach the metadata and the other is the metadata itself. For example, #KMetaData(T,M) represents the meta-data M associated to term T. If a particular backend is not interested in the meta-data then it can simply drop it from its input KORE.

Cells, Configurations, and Configuration Abstraction

Defining configurations, with all the fancy cell attributes and multiplicities, can be easily represented using the basic KORE language. All we need to do is to introduce some syntax attributes and to establish some conventions about what they mean in terms of configuration building. We next discuss our proposal by means of an example configuration. Consider the most complex configuration that we have in the tutorial, that of typed KOOL:

  configuration <T color="red">
                  <threads color="orange">
                    <thread multiplicity="*" color="yellow">
                      <k color="green"> ($PGM:Stmts ~> execute) </k>
                      <br/>
                      <control color="cyan">
                        <fstack color="blue"> .List </fstack>
                        <xstack color="purple"> .List </xstack>
                        <return color="LimeGreen"> void </return>
                        <br/>
                        <crntObj color="Fuchsia">
                           <crntClass> Object </crntClass>
                           <envStack> .List </envStack>
                           <location multiplicity="?"> .K </location>
                        </crntObj>
                      </control>
                      <br/>
                      <env color="violet"> .Map </env>
                      <holds color="black"> .Map </holds>
                      <id color="pink"> 0 </id>
                    </thread>
                  </threads>
                  <br/>
                  <store color="white"> .Map </store>
                  <busy color="cyan">.Set </busy>
                  <terminated color="red"> .Set </terminated>
                  <in color="magenta" stream="stdin"> .List </in>
                  <out color="brown" stream="stdout"> .List </out>
                  <nextLoc color="gray"> 0 </nextLoc>
                  <br/>
                  <classes color="Fuchsia">
                     <class multiplicity="*" color="Fuchsia">
                        <className color="Fuchsia"> Main </className>
                        <extends color="Fuchsia"> Object </extends>
                        <declarations color="Fuchsia"> .K </declarations>
                     </class>
                  </classes>
                </T>

Let us consider the following "builtin" attributes; we may even change their name to start with #:

  1. cell - this states that the current production is defining a cell.

  2. topcell - same as above, but the cell is the top cell of the configuration. Each module has precisely one such top cell, which can be implicit or explicit. Once a module M1 is imported by another module M2, the topcell of M1 is lost and the new top cell is that of M2.

  3. maincell - this is the default cell, typically the k cell; but sometimes it may be called type if we define a type system, or code if we define a code generator, etc. Anyway, the point is that we need such a cell so we can write rules like rule I1 + I2 => I1 +Int I2.

  4. initializer - this is an attribute that can be associated to precisely one constant of a cell sort, and says how that cell is initialized.

With the above, we can mechanically translate the KOOL configuration above into the following KORE code:

syntax TopCell      ::= "<T>" Threads Store Busy Terminated In Out NextLoc Classes "</T>"  [topcell, klabel("<T>"), color="red"]
                      | initTop(Map)  [initializer]
rule initTop(Init:Map) => <T> initThreads(Init) initStore initBusy initTerminated initIn initOut initNextLoc initClasses </T>  [macro]

syntax Threads      ::= "<threads>" ThreadBag "<threads>"  [cell, klabel("<threads>"), color="orange"]
                      | initThreads(Map)  [initializer]
rule initThreads(Init:Map) => <threads> initThread(Init) </threads>  [macro]

syntax ThreadBag    ::= Thread | ".ThreadBag" | ThreadBag ThreadBag  [assoc, comm, unit(.ThreadBag)]

syntax Thread       ::= "<thread>" KCell Control EnvCell Holds ThreadId "</thread>"  [cell, klabel("<thread>"), color="yellow"]
                      | initThread(Map)  [initializer]
rule initThread(Init:Map) => <thread> initKCell(Init) initControl initEnvCell initHolds initThreadId </threads>  [macro]

syntax KCell        ::= "<k>" K "</k>"  [maincell, klabel("<k>"), color="green"]
                      | initKCell(Map)  [initializer]
rule initKCell(Init:Map) => <k> (Init[$PGM]:Stmts ~> execute) </k>

syntax Control      ::= "<control>" FStack XStack Return CrntObj "</contol>"  [cell, klabel("control"), color="cyan"]
                      | "initControl"  [initializer]
rule initControl => <control> initFstack initXStack initReturn initCrntObj </control>  [macro]

syntax FStack       ::= "<fstack>" List "</fstack>"  [cell, klabel("<fstack>"), color="blue"]
                      | "initFStack"  [initializer]
rule initFStack => <fstack> .List </fstack>  [macro]

syntax XStack       ::= "<xstack>" List "</xstack>"  [cell, klabel("<xstack>"), color="purple"]
                      | "initXstack"  [initializer]
rule initXStack => <xstack> .List </xstack>  [macro]

syntax Return       ::= "<return>" Type "</return>"  [cell, klabel("<return>"), color="LimeGreen"]
                      | "initReturn"  [initializer]
rule initReturn => <return> void </return>  [macro]

syntax CrntObj      ::= "<crntObj>" CrntClass EnvStack Location "</crntObj>"  [cell, klabel("<crntObj>"), color="Fuchsia"]
                       | "initCrntObj"  [initializer]
rule initCrntObj => <crntObj> initCrntClass initEnvStack initLocation </crntObj>  [macro]

syntax CrntClass    ::= "<crntClass>" Id "</crntClass>"  [cell, klabel("<crntClass>")]
                      | "initCrntClass"  [initializer]
rule initCrntClass => <crntClass> Object </crntClass>  [macro]

syntax EnvStack     ::= "<envStack>" List "</envStack>"  [cell, klabel("<envStack>")]
                      | "initEnvStack"  [initializer]
rule initEnvStack => <envStack> .List </envStack>  [macro]

syntax Location     ::= "noLocation" | "<location>" K "</location>"     [cell, klabel("<location>")]
                      | "initLocation"  [initializer]
rule initLocation => noLocation  [macro]

syntax EnvCell      ::= "<env>" Map "</env>"  [cell, color="violet", klabel("<env>")]
                      | "initEnvCell"  [initializer]
rule initEnvCell => <env> .Map </env>  [macro]

syntax Holds        ::= "<holds>" Map "</holds>"  [cell, color="black", klabel("<holds>")]
                      | "initHolds"  [initializer]
rule initHolds => <holds> .Map </holds>  [macro]

syntax ThreadId     ::= "<id>" Int "</id>"  [cell, color="pink", klabel("<id>")]
                      | "initThreadId"  [initializer]
rule initThreadId => <id> 0 </id>  [macro]

syntax Store        ::= "<store>" Map "</store>"            [cell, klabel("<store>"), color="white"]
                      | "initStore"  [initializer]
rule initStore => <store> .Map </store>  [macro]

syntax Busy         ::= "<busy>" Set "</busy>"              [cell, klabel("<busy>"), color="cyan"]
                      | "initBusy"  [initializer]
rule initBusy => <busey> .Set </busy>  [macro]

syntax Terminated   ::= "<terminated>" Set "</terminated>"  [cell, klabel("<terminated>"), color="red"]
                      | "initTerminated"  [initializer]
rule initTerminated => <terminated> .Set </terminated>  [macro]

syntax In           ::= "<in>" List "</in>"                 [cell, klabel("<in>"), color="magenta", stream="stdin"]
                      | "initIn"  [initializer]
rule initIn => <in> .List </in>  [macro]

syntax Out          ::= "<out>" List "</out>"               [cell, klabel("<out>"), color="brown", stream="stdout"]
                      | "initIn"  [initializer]
rule initIn => <in> .List </in>  [macro]

syntax NextLoc      ::= "<nextLoc>" Int "</nextLoc>"        [cell, klabel("<nextLoc>"), color="gray"]
                      | "initNextLoc"  [initializer]
rule initNExtLoc => <nextLoc> 0 </nextLoc>  [macro]

syntax Classes      ::= "<classes>" ClassBag "<classes>"         [cell, klabel("<classes>"), color="Fuchsia"]
                      | "initClasses"  [initializer]
rule initClasses => <classes> .ClassBag </classes>  [macro]

syntax ClassBag     ::= Class | ".ClassBag" | ClassBag ClassBag  [assoc, comm, unit(.ClassBag)]

syntax Class        ::= "<class>" ClassName Extends Declarations "</class>"  [cell, klabel("<class>"), color="Fuchsia"]
                      | "initClass"  [initializer]
rule initClass => <class> initClassName initExtends initDeclarations </class>  [macro]

syntax ClassName    ::= "<className>" Id "</className>"          [cell, klabel("<className>"), color="Fuchsia"]
                      | "initClassName"  [initializer]
rule initClassName => <className> Main </className>  [macro]

syntax Extends      ::= "<extends>" Id "</extends>"              [cell, klabel("<extends>"), color="Fuchsia"]
                      | "initExtends"  [initializer]
rule initExtends => <classExtends> Object </extends>  [macro]

syntax Declarations ::= "<declarations>" K "</declarations>"     [cell, klabel("<declarations>"), color="Fuchsia"]
                      | "initDeclarations"  [initializer]
rule initDeclarations => <declarations> .K </declarations>  [macro]

This KORE code has multiple benefits, besides the fact that it uses a subset of the syntax:

  • It makes configurations and cells fall under the same general approach to syntax that we have in K, and in particular allows us to pass cells around, visit them, etc., using the same general machinery.

  • It makes it explicit how each cell is to be initialized by the configuration concretization pass.

  • It gives each cell a name, so we can more modularly built configurations, including configurations with "recursive" cells, as we need for example to define process calculi like CCS, CSP, Pi calculus, etc.

  • It will facilitate adding operations on configurations, as we discussed many times in the past. For example, adding an operation merging two configurations into one, say one for the dynamic semantics and one for type system of a language, will simply be syntax AggregTopCell ::= <aggreg> DynamicTopCell TypeTopCell </aggreg>. Similarly, more complex aggregations of configurations will also be possible; for example, we can define an operation, or make it builtin, #KMerge(Cell1, Cell2) which traverses the two configurations and merges their cells in some smart way.

Note that all the rules above were tagged as macro. That means that they will be applied statically, at kompile time, so krun will still receive one large, flat configuration holding references to configuration variables that can be instantiated by krun. BTW: these configuration variables are in fact constants.

Cell fragments

Cell fragments sorts are implemented as a modified copy of the cell which makes each child optional, which is used in rules to mark when the cell fragment variable appeared in a context that already matched that sort of neighboring sort. It is not necessary to introduce an optional version of multiplicity * cells, because no context can be sure to match every instance of a multiplicity * cell. Attributes indicate the relationship between the new and the original sorts. These are the productions generated to support fragments of the control cell.

syntax ControlCellFragment ::= "<control>-fragment" FStackOpt XStackOpt ReturnOpt CrntObjOpt "</control>-fragment" [cellFragment(ControlCell),klabel(<control>-fragment)]
syntax FStackCellOpt ::= "noFStack"[cellOptAbsent(FStackCell)] | FStack
syntax XStackCellOpt ::= "noXStack"[cellOptAbsent(XStackCell)] | XStack
syntax ReturnCellOpt ::= "noReturn"[cellOptAbsent(ReturnCell)] | Return
syntax CrntObjCellOpt ::= "noCrntObj"[cellOptAbsent(CrntObjCell)] | CrntObj

The compiler uses this extra syntax to implement rules matching cell fragments by expanding cell fragment variables which appear under a parent cell into a series of variables which are placed in the appropriate arguments, and occurances of the cell fragment variable outside any cell become an application of the cell fragment label to the same series of variables, with the cellOptAbsent labels filling the other arguments.

rule <control><xstack>(. => C)...</xstack>C</control>

would expand into

rule <control>C1::FStack <xstack>`.List => `<control>-fragment`(C1,noXStack,C2,C3)`...</xstack>
              C2::Resturn C3::CrntObj</control>

Kuestions

What does the token attribute do?

The token attribute associated to a production for a sort Sort will wrap up the fragment of input parsed by that production, say fragment, as a token of sort Sort in the generated AST, that is, as #token("fragment","Sort"). This has at least two important uses:

  1. It allows us to uniformly regard constants defined for semantic reasons as tokens; for example, syntax Id ::= "main" [token] will tell the parser to write #token("main","Id") instead of `main`(.::Klist) in any AST (both in programs and in semantics), this way relieving the user from treating two different cases in their semantics. See also https://github.com/kframework/k/issues/817.

  2. It will allow us to define/implement multi-stage parsers in K, making use of the #parse command. For example, we can first use a rougher but simpler/faster parser to put the sub-string to be parsed with another parser in a bubble, and then parse the bubble with the intended parser, as we do for parsing K itself with K's parser:

  syntax Bubble ::= List{BubbleItem,""}   [token]
  rule #token(S:KString, "Bubble") => #parse(S, "K", "KAST")

How To Encode Meta-Data and Attributes?

We have many options to encode these and I have no particular preference. One thing is clear, these encodings are going to be ugly. For example,

    syntax Exp ::= Exp "+" Exp  [strict(1,2), plus]

is going to be at least as ugly as

    syntax Exp ::= Exp "+" Exp  [#KAttributes(`strict`(#token("1","Int"), #token("2", "Int)), `plus`(.::KList))]

The point here is that we don't want to invent some new, adhoc syntax for encoding attributes and meta-data, but instead to use the same KAST syntax that we already have. It is likely that the kompiler and backend developers will prefer to work with maps, sets, lists, etc., as attributes and meta-data instead of K terms, but like with anything else, there will be back-and-forth translations from their internal representation and such KAST representations.

Open Issues, Concerns

  • So we required that each proper (i.e., non-subsort) production should have precisely one klabel attribute. Should we also require it for subsort productions, and then have another attribute to state that we want the sort inclusion to be injected?

  • It is not entirely clear whether we want requires and ensures to be constructors for rules, contexts, etc., or for KItems.

Clone this wiki locally