Skip to content

An LLVM compiler for Latte, written in Rust.

License

Notifications You must be signed in to change notification settings

kowaalczyk/latte

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

57 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Latte

An LLVM compiler for Latte (imperative & object-oriented programming language, similar to Java or C++), written in Rust.

Development status

All features except garbage collection are implemented and pass provided test cases.

  • Front end (4p)
    • parser
    • syntax tree pre-processing
    • typechecker
    • error handling
  • LLVM backend (8p)
    • code generation
    • runtime
    • optimizations
  • SSA (1p)
  • structs (2p)
  • tables (2p)
  • classes (3p)
  • virtual methods (3p)
  • garbage collection (2p)
  • code refactor
  • generated docs
  • integration with community-created tests

How to use

  1. You need a complete (stable or nightly) rust toolchain to built this project:
    • check if you have it: rustc --version
    • installation instructions are available here
  2. You also need LLVM and clang (both versions >= 8.0.1) to be installed:
    • check if you have it clang --version, lli --version
    • on OSX:
      • use brew install llvm to get the latest version
      • run source .env in your shell to use it (as opposed to the system one)
    • for other systems, installation instructions are available here
  3. Use provided Makefile to build or run the compiler:
    • make to run all tests and build everything
    • make runtime to compile the runtime (built-in functions)
    • make test to run all tests
    • make release to create a compiler executable (latc_llvm)

Running the compiler is simple:

latc_llvm path/to/file.lat

It creates the following files:

path/to/file.ll
path/to/file.bc

The .ll file contains LLVM IR of the compiled program (without Latte runtime), while the .bc file contains LLVM bytecode of the program with runtime.

The compiled program can be executed using LLVM interpreter:

lli path/to/file.bc

It can also be compiled to an executable binary using LLVM + compiler of your choice, for example using GCC:

llc -filetype=obj path/to/file.bc
gcc path/to/file.o

The compiler was tested on following operating systems:

  • OSX 10.15.2
  • PLD Linux 3.0 (Th)

It will likely run without problems on any Unix-like system, running on Windows may require some changes to lib/runtime.c and src/main.rs which calls external processes (llvm-as, llvm-link).

Project structure & implementation details

Compiler is separated into a library and executable, which is just a wrapper containing command line interface. That way, compiler functions can be easily accessed programmatically (eg. during tests or to use in a larger project).

Latte compiler library

The compiler library latte implements all front-end and back-end compiler logic, and although it is located in the same directory as executables, it is 100% ready for standalone use (can be imported as latte = "0.2.0" via Cargo.toml file).

It consists of the following modules:

  • frontend
  • backend
  • meta
  • util

Frontend

This module is responsible for reading source code files, parsing them into an abstract syntax tree, performing optimizations on the parsed tree and validating the program (including type checking).

The program structure processed by the frontend meets all compiler requirements:

  • every function ends with a return statement or a conditional statement with both branches ending in return statements
  • main function exists and has correct signature
  • all variable and function references are valid
  • no variable is defined twice in a single block
  • all tree nodes have attached type information and all types are correct

Currently, the front-end pipeline consists of the following steps:

  • use parser generated from grammar to parse the file into abstract syntax tree
  • optimize constant expressions (implemented here using AstMapper pattern)
  • ensure blocks have return values (implemented here using AstMapper pattern)
  • assign and check types, variable access errors and possible name confilcts using typechecker (high-level interface here, structure defined here, and AstMapper implemented here)

If any step in the front-end pipeline fails, the entire pipeline fails as well. Within a single step (ie. parsing or type checking), the frontend tries to collect as many independent errors as possible to speed up debugging and provide better feedback.

Errors returned from the frontend are already mapped to their locations within the source file and can be formatted or printed (implement Display trait) to provide location and error information. Error implementation can be found here, I also use a CharOffset structure to remember original position of characters in file after the comments are removed (which is a necessary step for a lalropop-generated parser).

Public interface

The 2 public functions exposed by frontend module are:

  • process_code, which attempts to perform all frontend actions and return either compiled program or a vector of errors
  • process_file, a convenience wrapper around process_code which reads a file from the given path

Aside from these 2 functions, frontend exposes all abstract syntax tree structures via frontend::ast. Definition and detailed documentation of these structures can be found here.

Backend

Frontend module handles most of the heavy tasks, and because no backend optimizations are implemented yet the backend module consists of the single backend::compiler submodule.

The compiler is separated into 3 structures, responsible for compiling entire Program, single Class and single Function. Most of the logic (expression and statement compilation) is implemented in FunctionCompiler. The rest of the logic is also implemented in the compiler submodule. Output of the compiler structures uses internal representation format (defined here), which implements the Display trait (implementation defined here) that allows to convert it directly to LLVM IR instructions during string conversion.

To allow easy management of dependencies, such as next number of free llvm register or local environment, I separated this logic into Context classes, defined in context submodule.

The part I found the trickiest and most error-prone is correct generation of LLVM PHI nodes - to help myself with that process I defined BlockBuilder class that is responsible for adding instructions to the block, and keeping track of their numbering in case it needs to be altered - which is also triggered automatically as soon as the block ends. The instructions are re-mapped to use new register numbers after phi nodes are added, and sometimes (esp. in case of loops) the same mapping has to apply to other blocks. To do this, I added MapEntities trait that performs this task. Implementation of this trait, as well as BlockBuilder class can be found here.

Other implementation details are documented in individual files.

Public interface

To compile a code checked by frontend module, use the backend::compile function, which is defined here.

The function assumes its input program is checked by the frontend, and it will panic! if that assumption is broken.

Its result is a single string containing program represented as LLVM IR. Most Latte programs will additionally require a runtime in order to be executed, but the linking is delegated to the caller (in the case of assignment: latc_llvm executable).

Meta

Contains a definition of generic Meta structure, which allows to easy add metadata to any node in abstract syntax tree.

The LocationMeta and TypeMeta are some of the concrete implementations of Meta that are used throughout the entire project, and therefore were also located in the Meta module. They both contain aliases to common Meta methods, so that the calling is more verbose (ie. node.get_type() or node.get_location() instead of node.get_meta()).

Util

Contains generic implementations of AstMapper and AstVisitor patterns, as well as the Env wrapper around HashMap for easy creation and management of environments.

Latte runtime: lib

Runtime consists of a single runtime.c file which implements:

  • standard library functions that can be called from Latte programs
  • built-in functions for string operations and object initialization

Detailed documentation can be found in the file itself

Runtime is compiled to LLVM using make runtime and works on both linux and unix systems. It's automatically re-compiled before running tests or building a release to prevent any accidental errors.

Latte compiler executable

The main Cargo crate, located in the repository root, is just a thin wrapper around the latte library that adds the command line interface and calls necessary external commands (llvm-link, llvm-as) to complete the compilation.

Entire implementation is defined in main.rs.

End-to-end tests: tests, test_e2e.sh

The test programs are located in test folder and include all programs provided with assignment (I am not their author and the repository license does not apply to these files).

File tests/frontend.rs implements tests for entire front-end pipeline, which checks if typechecker and other modules work as intended. It contains one test case per folder (not per source file), so it may be useful to log the entire test command output by providing additional flags to cargo test:

# this will run all of the tests:
cargo test --all -- --exact

The full end-to-end tests can be executed via test_e2e.sh bash script, which compiles latc_llvm and executes it on all the files, saving filename.realout which contains program output, filename.log which contains compiler stderr output for debugging purposes.

Utility files

  • Makefile automates build, test and release processes
  • .env sets up environment variables for OSX

External libraries

The project uses following external libraries:

  • lalrpop: parser generator
  • lalrpop-util: error recovery from lalrpop-generated parser
  • regex: lalrpop dependency used in generated parser
  • codemap: mapping byte offset from lalrpop to (file, line, column)
  • itertools: for Iterator.join(sep) - efficient string concatenation
  • include_dir: for recursively including all files in the test directory during test compilation

Some of them are just compile-time dependencies, for details see Cargo.toml and latte_lib/Cargo.toml.

About

An LLVM compiler for Latte, written in Rust.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages