
Design of the new compiler parser


Since the design is a little complex, I decided to write up a wiki page describing the implementation of the compiler and parser once the current set of refactorings is complete. I would particularly like @radumereuta, @cos, and @osa1 to review this design and let me know whether it is satisfactory to them. I also include a section at the end describing two parallel paths of work on the implementation which begin the migration. I would like @osa1 and @radumereuta to work on the first of these paths, which begins the work needed to move forward with the new frontend, and @cos to work on the second, which begins the work needed to have a compiler over the new data structures.

Representations and Components

We can imagine the pipeline of the K framework as four distinct representations of definitions and terms, together with the components that transform each representation into the next.

Grigore: briefly describe the 4 different representations, together with a short description of each

Text -> Concrete Syntax

The first component of the system is the parser. The parser converts from the first representation, the text to be parsed, into the second, the concrete syntax of that text, which corresponds as closely as possible to the original text. This component comes in two forms. The first, over definitions, takes a file, slurps up all included files, and generates a set of parsing tables and metadata used to parse other things with the same table. The second takes such a set of parsing tables and uses them to parse a particular string.

Grigore: you may want to keep the two separate, and have an API for them, because the latter part is useful for implementing the #parse(KModule, KString) command, while the former is not (note that once the meta-level is available, we will be able to build K modules programmatically, in K).
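To make the two forms concrete, here is a minimal sketch of what such a split API might look like; DefinitionParser, StringParser, ParseTable, and the method names are all hypothetical, chosen only to illustrate the separation discussed above:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical API illustrating the two forms of the parsing component;
// none of these names exist yet, they only sketch the proposed separation.
interface DefinitionParser {
    // Form 1: read a definition, slurp all included files, and produce
    // the parsing tables and metadata needed to parse further input.
    ParseTable parseDefinition(File mainFile) throws IOException;
}

interface StringParser {
    // Form 2: reuse a previously built table to parse a single string,
    // e.g. to implement a #parse(KModule, KString) command.
    ConcreteSyntax parseString(ParseTable table, String input);
}

// Opaque placeholders for the representations involved.
final class ParseTable { /* parsing tables + metadata */ }
final class ConcreteSyntax { /* parse tree */ }
```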

The output of this component is concrete syntax. This concrete syntax retains brackets, casts, literate comments, and references to the productions used to parse each subterm. An infix traversal of this syntax should yield text nearly identical to the original text that was parsed.

Grigore: Why not identical? Also, I think we may even want to keep the illiterate comments, for pretty-printing reasons; users in the past also wanted to change K's comment syntax because, for example, they wanted to use # for comments in programs in the language they defined in K.
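For illustration, one possible shape for a concrete-syntax node is sketched below; SyntaxNode and Production are hypothetical names, and the only point is that productions, brackets, casts, and comments all survive parsing:

```java
import com.google.common.collect.ImmutableList;

// Hypothetical concrete-syntax node: every node remembers the production
// used to parse it, and brackets, casts, and comments are ordinary nodes,
// so an infix traversal can reproduce (nearly) the original text.
final class SyntaxNode {
    final Production production;                // production used to parse this subterm
    final ImmutableList<SyntaxNode> children;   // empty for tokens and comments
    final String literal;                       // source text for leaves, null otherwise

    SyntaxNode(Production production, ImmutableList<SyntaxNode> children, String literal) {
        this.production = production;
        this.children = children;
        this.literal = literal;
    }
}

final class Production { /* sort, items, attributes */ }
```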

Concrete Syntax -> KORE

The second component of the system is the flattener. This relatively small component converts the concrete syntax output by the parser into the KORE syntax used by developers who want an AST instead of a parse tree. It removes casts, brackets, and comments, and outputs a KORE representation of the definition, which should contain all the information needed to perform arbitrary computations over the term. In particular, the only information the flattener loses should be the specific textual representation of the original definition.
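A minimal sketch of the flattener follows, assuming hypothetical accessors (isBracket, isCast, isComment, label) on the concrete-syntax nodes:

```java
import com.google.common.collect.ImmutableList;

// Sketch of the flattener: brackets and casts are replaced by their single
// child, comments are dropped, and every remaining node becomes an
// application of its production's KORE label. All names are hypothetical.
final class Flattener {
    KoreTerm flatten(SyntaxNode node) {
        if (node.isBracket() || node.isCast()) {
            return flatten(node.onlyChild());   // keep the content, lose the wrapper
        }
        ImmutableList.Builder<KoreTerm> args = ImmutableList.builder();
        for (SyntaxNode child : node.children()) {
            if (!child.isComment()) {           // comments carry no AST content
                args.add(flatten(child));
            }
        }
        return new KoreTerm(node.label(), args.build());
    }
}

interface SyntaxNode {
    boolean isBracket();
    boolean isCast();
    boolean isComment();
    SyntaxNode onlyChild();
    Iterable<SyntaxNode> children();
    String label();
}

final class KoreTerm {
    final String label;
    final ImmutableList<KoreTerm> args;
    KoreTerm(String label, ImmutableList<KoreTerm> args) {
        this.label = label;
        this.args = args;
    }
}
```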

KORE -> compiled KORE

The third component of the system is the compiler. This is the component which performs transformations over the input and generates all the data krun needs in order to execute programs. The output of this stage is purely binary and has no textual representation; in particular, it may contain any extra data needed to execute the specified functionality.

Input compiled KORE -> output compiled KORE

The fourth component of the system is the executor, or krun backend. It takes the input configuration, which is constructed by the krun frontend, and performs whatever analysis krun specifies (e.g., execution, search, model checking).
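As a rough illustration, the backend interface might look something like the following; KRunBackend, CompiledDefinition, and CompiledTerm are invented names, not decided API:

```java
import java.util.List;

// Hypothetical krun backend interface: the frontend builds the input
// configuration, the backend performs whatever analysis was requested.
interface KRunBackend {
    // Ordinary execution: rewrite the input configuration to completion.
    CompiledTerm execute(CompiledDefinition definition, CompiledTerm inputConfiguration);

    // Search: return every reachable configuration matching the pattern.
    List<CompiledTerm> search(CompiledDefinition definition,
                              CompiledTerm inputConfiguration,
                              CompiledTerm pattern);
}

// Opaque binary representations produced by the compiler.
final class CompiledDefinition { }
final class CompiledTerm { }
```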

compiled KORE -> KORE

This component undoes the transformations of the compiler in order to output data that can be treated as KORE (i.e., it reverses any optimizations in the representation of the definition or term). Its output should be in exactly the same format as the output of the flattener, which is also the output format of the kast tool.

KORE -> Concrete Syntax

This is the unparser. The unparser has the task of attempting to determine the optimal textual representation of a KORE term by mapping terms back to productions, adding brackets and casts, and so on. The output of this phase should be convertible to text by a simple infix traversal.
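One possible bracket-insertion policy, sketched under the assumption that productions carry numeric priorities (every name below is hypothetical): a child is wrapped in brackets whenever the production chosen for it binds more loosely than its argument position requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical unparser sketch; productionFor and bracket would be
// supplied by the definition being unparsed.
abstract class Unparser {
    abstract Production productionFor(String label);  // revert a label to its production
    abstract SyntaxNode bracket(SyntaxNode node);     // wrap in a bracket production

    SyntaxNode unparse(KoreTerm term, int requiredPriority) {
        Production p = productionFor(term.label());
        List<SyntaxNode> children = new ArrayList<>();
        for (int i = 0; i < term.arity(); i++) {
            // Each argument position may demand a minimum priority.
            children.add(unparse(term.arg(i), p.priorityOfArgument(i)));
        }
        SyntaxNode node = new SyntaxNode(p, children);
        // Bracket only when the chosen production binds too loosely.
        return p.priority() < requiredPriority ? bracket(node) : node;
    }
}

interface KoreTerm { String label(); int arity(); KoreTerm arg(int i); }
interface Production { int priority(); int priorityOfArgument(int i); }
final class SyntaxNode {
    final Production production;
    final List<SyntaxNode> children;
    SyntaxNode(Production production, List<SyntaxNode> children) {
        this.production = production;
        this.children = children;
    }
}
```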

Concrete Syntax -> Text

This is the visitor which does the aforementioned infix traversal.
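A minimal sketch of that traversal, with hypothetical node and production accessors; whitespace handling is elided for brevity:

```java
import java.util.List;

// Infix traversal: interleave the terminals of each node's production
// with its rendered children. Tokens and comments print their own text.
final class TextWriter {
    String write(Node node) {
        if (node.isToken()) {
            return node.text();
        }
        StringBuilder sb = new StringBuilder();
        int child = 0;
        for (Item item : node.production().items()) {
            if (item.isTerminal()) {
                sb.append(item.text());         // keyword or symbol from the production
            } else {
                sb.append(write(node.children().get(child++)));  // recurse on non-terminals
            }
        }
        return sb.toString();
    }
}

interface Node {
    boolean isToken();
    String text();
    ProductionItems production();
    List<Node> children();
}
interface ProductionItems { List<Item> items(); }
interface Item { boolean isTerminal(); String text(); }
```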

Design of the new data structures

The data structures needed to implement these three internal representations will not be dealt with in detail in this document. The primary constraints they must obey are as follows (a minimal sketch follows the list):

  • Must have visitors over each representation (see details below)
  • Must use Google Guava immutable data structure types for lists, maps, etc.
  • Must be immutable and have immutable, typed attributes
  • Must be the only source of information about the definition (i.e., no separate Context object)
  • Should not affect in any way the build system of developers who do not wish to work on the data structure classes
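A minimal sketch of these constraints in practice, with hypothetical Module, Sentence, and Attribute classes:

```java
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

// Everything final, collections are Guava immutables, attributes are
// typed and stored on the object itself rather than in a shared Context.
final class Module {
    private final String name;
    private final ImmutableList<Sentence> sentences;
    private final ImmutableMap<String, Attribute<?>> attributes;

    Module(String name,
           ImmutableList<Sentence> sentences,
           ImmutableMap<String, Attribute<?>> attributes) {
        this.name = name;
        this.sentences = sentences;
        this.attributes = attributes;
    }

    String name() { return name; }
    ImmutableList<Sentence> sentences() { return sentences; }

    // "Modification" returns a new Module instead of mutating this one.
    Module addSentence(Sentence s) {
        return new Module(name,
                ImmutableList.<Sentence>builder().addAll(sentences).add(s).build(),
                attributes);
    }
}

interface Sentence { }
interface Attribute<T> { T value(); }
```

Because the classes are immutable, any transformation produces a new object, which is what makes it feasible for the data structures themselves to be the only source of information about the definition.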

Design of the new visitors

Currently, visitors in the implementation take a Context object which is used for a variety of purposes, including constructing new terms. This will be replaced with a mutable visitor state containing information about the path to the node being visited. In particular, when visiting a module we will have access to the definition; when visiting a sentence, to the module; and when visiting the child nodes of a sentence, to the sentence.

The visitor implementation will have the task of tracking this information as it traverses the tree, and will also allow the state to be supplied manually at construction time, so that child nodes can be visited without visiting the entire definition, module, or sentence. The information that is currently in the Context object will be replaced with methods and attributes on the Definition and Module objects, which update incrementally as the definition is transformed.

In particular, it is the responsibility of the transformer to provide an API for adding and deleting sentences; this API internally updates whatever representations are needed to keep this information current. The transformer will call these methods itself when various values are returned from the visit methods.
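A sketch of how this visitor state might look; AbstractKoreVisitor and the inContext seeding method are hypothetical:

```java
// Instead of consulting a global Context, the visitor records the path to
// the node being visited as it descends; the same fields can be seeded
// manually to visit a fragment without its enclosing definition.
abstract class AbstractKoreVisitor {
    private Definition currentDefinition;
    private Module currentModule;
    private Sentence currentSentence;

    // Seed the path manually, e.g. to visit the children of one sentence.
    AbstractKoreVisitor inContext(Definition d, Module m, Sentence s) {
        currentDefinition = d;
        currentModule = m;
        currentSentence = s;
        return this;
    }

    void visit(Definition d) {
        currentDefinition = d;                      // modules below see the definition
        for (Module m : d.modules()) visit(m);
    }

    void visit(Module m) {
        currentModule = m;                          // sentences below see the module
        for (Sentence s : m.sentences()) visit(s);
    }

    void visit(Sentence s) {
        currentSentence = s;                        // children of s see the full path
    }

    protected Definition definition() { return currentDefinition; }
    protected Module module() { return currentModule; }
    protected Sentence sentence() { return currentSentence; }
}

interface Definition { Iterable<Module> modules(); }
interface Module { Iterable<Sentence> sentences(); }
interface Sentence { }
```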

Migration path

Part 1 (Frontend)

The purpose of this path is to unblock Radu's work on the frontend by changing the components of the current implementation which are blocking modular parsing. @radumereuta and @osa1 will work with me to come up with a specific implementation for the relevant features of the design.

  1. Add fields and constructors to the visitors to support constructing a visitor in a particular definition, module, or sentence context, and to support accessing the visitor's current context from a visit method (this context updates as the visitor traverses).
  2. Remove the Context field from AbstractVisitor and replace it with ModuleContext and DefinitionContext fields on the Module and Definition classes (sketched after this list). Compute this information statically at parse time on a per-module basis, combining each module incrementally. Users access module-specific data at the definition level through the relevant information in the definition's main module.
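A hypothetical sketch of what item 2 might look like once complete; none of these classes or fields exist yet:

```java
import com.google.common.collect.ImmutableSet;

// The data currently held by the global Context lives on Module and
// Definition instead, computed per module at parse time and combined
// incrementally across imports.
final class ModuleContext {
    final ImmutableSet<String> declaredSorts;   // this module's declarations only (illustrative)
    ModuleContext(ImmutableSet<String> declaredSorts) {
        this.declaredSorts = declaredSorts;
    }
}

final class DefinitionContext {
    // data that only makes sense definition-wide, combined incrementally
}

final class Module {
    final ModuleContext context;                // computed statically at parse time
    Module(ModuleContext context) { this.context = context; }
}

final class Definition {
    final Module mainModule;
    final DefinitionContext context;

    Definition(Module mainModule, DefinitionContext context) {
        this.mainModule = mainModule;
        this.context = context;
    }

    // Module-specific data at the definition level comes from the main module.
    ModuleContext mainModuleContext() { return mainModule.context; }
}
```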

Part 2 (data structures and visitors)

This path begins in parallel with the frontend path; after the first path is complete, the two merge, with @osa1 helping @cos complete the rest in the optimal way, with whatever parallelism is possible.

  1. Create and merge the new data structures.
  2. Create and merge the visitors for the new data structures in their final form (except for incremental computation of data as the definition changes)
  3. Create converters from the old KIL to the new KORE and from the new KORE to the old KIL (a rough sketch follows this list). This will require determining the form the context information should take in the new KORE, i.e., what information should become attributes, and what should become incrementally updated indices on the data structures themselves.
  4. Attempt to create and merge a hybrid implementation which puts these two transformations at either the beginning or end of the compilation pipeline (to be determined later) and passes all existing tests.
  5. Gradually extend the portion of the pipeline that uses the new KORE until it covers the entire compilation pipeline.
  6. Create a converter from the new KORE to the Java backend's KIL, and from the old frontend representation to the new KORE.
  7. Delete all parts of the old KIL not used by the frontend.
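As a rough illustration of steps 3 and 4, a converter pair and hybrid pipeline might look like the following; all names are invented for this sketch:

```java
// Two converters let old-KIL passes and new-KORE passes coexist while the
// migration proceeds; whether the new-KORE portion sits at the beginning
// or the end of the pipeline is left open, per step 4.
interface KilToKore {
    KoreDefinition convert(KilDefinition oldDefinition);
}

interface KoreToKil {
    KilDefinition convert(KoreDefinition newDefinition);
}

final class HybridCompiler {
    private final KilToKore toKore;
    private final KoreToKil toKil;

    HybridCompiler(KilToKore toKore, KoreToKil toKil) {
        this.toKore = toKore;
        this.toKil = toKil;
    }

    // Here the new-KORE passes run first and the result is converted back
    // so the remaining old-KIL passes continue to work unchanged.
    KilDefinition run(KilDefinition input) {
        KoreDefinition kore = toKore.convert(input);
        // ... new-KORE compilation passes would run here ...
        return toKil.convert(kore);
    }
}

// Opaque placeholders for the two ASTs.
final class KilDefinition { }
final class KoreDefinition { }
```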

Part 3 (Frontend again)

  1. Create a new representation for the frontend concrete syntax, including visitors and a transformation from the new concrete syntax to the new KORE. This representation should share the same outer syntax as the new KORE.
  2. Create a hybrid implementation which puts transformations from the old concrete syntax to the new concrete syntax and from the new concrete syntax to the new KORE at the end of the parser pipeline.
  3. Gradually extend the portion of the pipeline that uses the new concrete syntax until it covers the entire parser pipeline.