
Qi Meeting Oct 11 2024

Siddhartha Kasivajhula edited this page Oct 26, 2024 · 5 revisions

Kindergarten Compiler


Adjacent meetings: Previous | Up | Next [None]

Summary

Now that RacketCon is behind us, we discussed next steps on returning to Qi development and some high level goals we could aim for, including a new release, a community event, and targeting end users for Qi and Racket-based applications. We also returned to basics to learn about how to add two numbers together from our resident K-12 educator, Prof. Michael.

Background

Sid has been working on organizing RacketCon, and Michael has been preparing for many recent talks, including one at RacketCon. Dominik has been working on a cryptographic toolkit that has some deadlines coming up. So we've all had other things take up a lot of time of late, but those other things are abating now and we should be able to return to some of the work that's in progress, including:

  • comprehensive deforestation of list APIs
  • resolving edge cases with bindings

It's also a good time to reconnect with some longer term goals.

Doug's Feedback at RacketCon

Douglas Crockford was back at RacketCon this year, and at one point he asked the following question (paraphrased):

How is Racket attempting to appeal to a larger audience?

In the Racket community we think about this a lot, so Sid was surprised to find himself stumped!

Doug suggested:

  1. Find people in industry who could be convinced to evangelize Racket use in their teams and companies.
  2. Write more end user applications (as opposed to internal tools used by Racketeers), like Frosthaven Manager.

Evangelizing Racket

Dominik coincidentally mentioned he is contracting with a company that has asked him to develop a cross-platform GUI that they will themselves maintain after the end of his contract period. He suggested using Racket and they were enthusiastic about the idea, so it sounds like we're already taking steps here!

What is Qi's "killer app"?

While Qi is a general purpose language, there are likely to be certain kinds of users who especially would benefit from its use. We could try to reach them more intentionally, by designing features and applications specifically for them.

One domain that we've talked about often is Qi's potential use in data science. In particular, a data scientist could write a single program that, during development, runs on local test data in a single process on the local machine, and, in production, runs on a Big Data backend like Kafka or AWS Lambda -- with the only change needed being the language declaration at the top of the module, e.g. from #lang qi to #lang qi/aws-lambda. This is a speculative possibility, but there don't seem to be any obvious obstacles.

This could be one application that would be compelling for industry to adopt, and we could potentially find data scientists willing to try it out and then evangelize it within their teams. Once we have a proof-of-concept ready, we could reach out and interview data scientists (e.g. Sid knows someone who would be the perfect guinea pig early adopter) to see if it could solve problems they actually have, and make their lives easier in some way.

Crowdsourcing a Compiler

Years ago, Sid took a class during his undergraduate studies at Georgia Tech, taught by Scott Wills (and also by Linda Wills). This class had a unique structure.

For each assignment (to be implemented in MIPS assembly code and executed in a simulator), the professors would timebox the work, spending a few hours implementing it themselves immediately after devising the assignment. They'd then record their own performance numbers: static program size (number of instructions), dynamic execution time (number of instructions executed over the lifetime of the program), and number of registers (memory) used.

Your grade was determined by how close you got to these reference numbers, with the three categories weighted by importance (dynamic execution time was weighted highest). This meant, for instance, that it could be worth using more memory in your program if doing so reduced execution time. Even more interesting: if you happened to beat the reference numbers, you could get more than 100 on the assignment! This made it possible to redeem past poor grades by doing extra optimization work on subsequent assignments. In fact, funny story: Sid turned in the wrong files on the very first assignment and got almost a zero! So he had to work extra hard on performance on subsequent assignments, and was able to eke out an A in the class after all, even though that would have been impossible under a naive grade breakdown across assignments. It was quite dramatic. And, as all the students would attest, beating (or even attaining) the professor's numbers took far longer than "a few hours"!

This was an incredibly fun class and it made an impression on Sid. Based on this experience, he feels that if there are well-specified parameters, clear measures that provide feedback, and ways to "win," a similar competition to write optimizations for the Qi compiler could be really fun for the community (but let's definitely leave out the grades part 😅).

How to Write a Compiler

One way we could continue working on the compiler is to write optimizations ourselves within the group of Qi core developers, as, indeed, we have been doing.

But another way is to implement the infrastructure that we anticipate could support useful optimizations. That is, utilities that can enrich syntax with additional inferred or collected information that would then serve as the basis for optimizations. After all, the basis of all optimization is knowledge. The more we know about the code, the more equivalences, including equivalent expressions that are more performant, become apparent. So if we could provide infrastructure that would enrich the source syntax with any and all such information we can think of, that would form the basis for any conceivable optimizations.

This infrastructure could include (some of which we've talked about before):

  • a queryable table of known arities of built-in functions
  • a utility to annotate input syntax with arity information in a standard and simple format (i.e. what would typically be called an "IR" or intermediate representation -- though here it would just be a utility for use in compiler passes, and not an IR in the lifecycle of the compiler as such, as discussed below)
  • a utility to annotate input syntax with type information
  • anything else that infers local information about the input syntax from the context

With such tools available, developers would be empowered to write optimizations leveraging such information.
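To make the idea concrete, here is a minimal sketch -- in Python for brevity, with invented names and a toy dict-based AST, since the real infrastructure would operate on Racket syntax objects -- of a queryable arity table together with a utility that annotates call nodes with arity information:

```python
# Hypothetical sketch (invented names, toy AST): a queryable table of
# known arities, plus a pass that enriches syntax nodes with that info.

KNOWN_ARITIES = {
    "car": 1,    # known arities of built-in functions
    "cons": 2,
    "+": None,   # None = variadic / unknown
}

def annotate_arity(node):
    """Recursively enrich an AST node (a dict) with arity info."""
    if node.get("kind") == "call":
        node["arity"] = KNOWN_ARITIES.get(node["op"])
        for arg in node["args"]:
            annotate_arity(arg)
    return node

expr = {"kind": "call", "op": "cons",
        "args": [{"kind": "lit", "value": 1},
                 {"kind": "lit", "value": 2}]}
print(annotate_arity(expr)["arity"])  # → 2
```

An optimization pass could then query the annotation on each node instead of re-deriving the information itself.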

"Common Core" Architecture

Optimizing compilers typically translate a source language to a target language, and may employ any number of intermediate representations (IRs) along the way -- each of which enrich the syntax with some specialized information for the purposes of performing optimizations, or perhaps strip away such information that is no longer needed.

In this traditional architecture, there are no constraints on these IRs, and this in turn imposes constraints on the ordering of compiler passes. If one were to write a pass that utilizes arity information to optimize the code, it must be placed at a stage of compilation where arity information is part of the IR.

We've talked about an alternative for the Qi compiler where the Qi core language is the common contract across all compiler passes, that is, where each pass must satisfy the contract, F: Qi₀ → Qi₀, translating the core language back into the core language (including any fresh optimizations).

This architecture implies that each pass is responsible for enriching the input core language syntax with any information it needs (a quality that prompted us to refer to this architecture as "dockerized," comparing it to Docker images that install all dependencies from scratch even to do a small task), and then stripping away any such annotations before producing its output in the usual Qi core language. It gives us the flexibility to place compiler passes anywhere and reorder them at will, and it keeps concerns localized within passes to avoid any bugs that may be caused by nonlocal interactions between upstream passes and downstream ones.
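As a rough illustration of why this gives us reordering freedom (a Python sketch with invented names and strings standing in for core-language syntax, not the actual Qi implementation), every pass has the same shape, so any sequence of passes is valid:

```python
# Hypothetical sketch: each pass maps core language -> core language
# (Qi0 -> Qi0), so passes compose in any order.

def deforest(core):
    # toy "fusion": rewrite a map-over-filter into one fused operation
    return core.replace("(map f (filter g xs))",
                        "(fused-map-filter f g xs)")

def some_other_pass(core):
    return core  # placeholder: any Qi0 -> Qi0 transformation

PASSES = [deforest, some_other_pass]  # freely reorderable

def compile_core(core):
    for p in PASSES:
        core = p(core)
    return core

print(compile_core("(map f (filter g xs))"))
# → (fused-map-filter f g xs)
```

Because the output of every pass is again the core language, swapping the order of PASSES (or dropping a pass entirely) never breaks the pipeline's contract.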

At the same time, the architecture relies on the core language being rich enough to express the optimized code. This part is easy, however, since the core language includes the esc form which allows us to "escape" and express a computation in Racket, which is, of course, the target language for Qi compilation.

Of course, there are some tradeoffs. Mainly, each pass being responsible for enriching the core language from scratch means that if, say, arity information is used in N passes, it must be computed afresh N times. By careful ordering of passes, the traditional architecture would only need to compute this information once. On the other hand, each "common core" pass would operate on syntax enriched with precisely the information that it needs, so that the implementation would be precise and clean. In the traditional architecture, as subsequent passes are not isolated from the present one, it may be necessary to propagate (and parse through) such irrelevant components of the IR if a downstream pass will end up needing it. The common core approach is unburdened by such "action at a distance."

Finally, another benefit to this "common core" architecture is that the contract being well-specified in terms of the publicly advertised core language is a clean specification for developers to start from. It should make it relatively easy for people to write compiler passes, without their needing to be intimately familiar with Qi compiler internals. In other words, this architecture is conducive to broad, even casual, community participation, and therefore to "crowdsourcing" optimizations.

Are We Ready?

We've talked about such a performance-themed event for a while, but the compiler was too much in flux in the leadup to the Qi 4 release for it to be practical for community members to participate.

But now, things are different. We have a lot more infrastructure and developer documentation, and the architecture of the compiler is somewhat stable as well. Specifically:

  • the compiler begins with a normalization step so that all subsequent optimizations operate on a common form.
  • there is a simple mechanism in place for adding, removing or reordering compiler passes (the registry of compiler passes).
  • we have a nontrivial compiler pass (deforestation).
  • each pass produces the standard Qi core language, so that the contract for each pass is straightforward (i.e. each pass is a transformation of syntax, Qi₀ → Qi₀, where Qi₀ is the Qi core language -- this is already the case for our existing passes, i.e. deforestation).
  • the full cycle of the compiler looks like normalize → pass ... → code generate to Racket → process bindings; and these stages are each decoupled from one another
  • we have comprehensive unit tests validating the semantics of the language
  • we have microbenchmarks that can easily be run at the command line for quick feedback, and more rigorous benchmarks that can also be run at the command line, and which are part of CI on every commit.

In short, it's complex, but it's well-specified and well-supported. We are now in a position where community members, with a little help, could get up to speed on how everything is set up pretty quickly and get rapid automated feedback on changes they make. They should also be able to work on isolated changes in an isolated manner, as the codebase exhibits loose coupling.

But there is still a decent amount of compiler work in flight around deforestation and the architectural improvements necessary to do it in an extensible way.

Once we merge the comprehensive deforestation work, hopefully, we'll be quite close!

Prerequisites

One thing we still need before we can do this is a clear theory of optimization. We made a lot of progress revealing it in discussions a little while ago, including one dedicated meeting that was really insightful and fun. Sid made extensive notes based on that discussion, but he has yet to synthesize them into proper meeting notes (and docs) that, hopefully, will give us more clarity on Qi's theory of optimization. To summarize, these are the prerequisites for the compiler community event:

  1. Complete and merge the deforestation work
  2. Specify Qi's theory of optimization (including: post the relevant meeting notes and update the PR)
  3. Develop utilities to enrich syntax with information (e.g. arity, types, etc.) in an agreed-upon "IR" format

Competition Structure

Submissions could comprise:

  • an optimization
  • a benchmark that demonstrates the optimization
  • unit tests to validate that relevant semantics are unaffected (if applicable)

Some work still needs to be done to quantify tradeoffs (similar to the dynamic vs. static vs. memory weighting in Linda and Scott Wills' approach), in case an optimization gains in some areas but loses in others. Ideally this would be automated so that we produce a summary score relative to baseline numbers, instead of just the raw benchmark performance numbers we currently report -- that way, participants need not do the math each time and can see the effect of their changes as a single computed percentage score, or a set of such scores.
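One possible shape for such an automated summary score -- a Python sketch where the metrics, weights, and formula are all invented for illustration, not anything we've agreed on:

```python
# Hypothetical scoring sketch: fold per-metric results into a single
# weighted percentage relative to baseline numbers (lower raw values
# are better, so ratios above 1.0 mean we beat the baseline).

def summary_score(results, baseline, weights):
    """results/baseline: metric -> measured value (lower is better).
    weights: metric -> relative importance, summing to 1.0."""
    score = 0.0
    for metric, w in weights.items():
        score += w * (baseline[metric] / results[metric])
    return 100.0 * score

weights  = {"run-time": 0.6, "code-size": 0.2, "memory": 0.2}
baseline = {"run-time": 120.0, "code-size": 500, "memory": 64}
results  = {"run-time": 100.0, "code-size": 500, "memory": 80}

print(round(summary_score(results, baseline, weights), 1))
# → 108.0 (beats the baseline overall, despite using more memory)
```

A score above 100 would mean a submission beats the baseline overall, echoing the "more than 100 on the assignment" dynamic from the class.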

Resuming Deforestation for Climate Justice!

When we last left off, we were working on an integration branch to incorporate deforestation into the core language and extend it to a large set of list-oriented operations (mirroring those in racket/list).

One problem to be addressed: if a particular deforestable form is not optimized during compilation, we should still be able to compile it to Racket at the code generation step, on its own, even though the form isn't part of the core language.

To do this, we need the form to be defined in such a way that it encapsulates an independent runtime for the operation. For instance, map and filter would each need to include naive Racket implementations as part of their definitions as Qi macros, so that if either of them does not get fused with other such operations during the deforestation compiler pass, we would still be able to compile it, at the code generation stage, using the included map or filter Racket runtime. Additionally, if any component expressions are Qi flows (i.e. if a subexpression is in a Qi floe nonterminal position rather than a Racket expr nonterminal position -- to be expanded and compiled by Qi before being delegated to Racket), then the compiler needs to know which expressions these are, so that they can be compiled to Racket. We need some way to encode this information in the macro definition.

The initial, temporary, solution we implemented last time was to manually encode the runtime for each known form in the code generation step. That is, code generation is fulfilled by a macro that matches the name of the deforestable operation (e.g. map) and explicitly encodes a known runtime there (e.g. the naive implementation of map). This requires the compiler, and specifically the code generation stage, to know about all such operations that are defined externally to the compiler as Qi macros. It is only a temporary hack to achieve the functionality and unblock further compiler development.

Which brings us to: what is the right solution for this? For a deforestable operation, how do we encode the Racket runtime in the macro definition (the "front" end of compilation) so that the compiler knows what to do with the expression during code generation (the "back" end)?

Michael had proposed using a syntax-phase struct to achieve this. The rest of us didn't have a clue what this meant, so he coded this example to illustrate, and in doing so demonstrated what we all agreed was a very straightforward way to add two numbers together.

Gather 'Round Kids, and Let's Learn to Add

If you want to add two numbers, then first, of course, we need ...? That's right. A function accepting two arguments.

(define-optimizable-op my-+
  (lambda (e1 e2)
    #`(+ #,e1 #,e2)))

We call it define-optimizable-op because it is intended to be analogous to what we want for defining deforestable operations. Here, the lambda supplies the runtime for addition: it generates the code that simply adds the two arguments together.

What's with all the weird marks there? Ssh. Don't ask silly questions, Billy.

OK, next, what else do we need? Why do we need anything else, you ask? That's ten demerits for you, Billy!

Of course, the next thing we need is a syntax specification for a DSL that we are extending to perform addition by this definition above. This DSL would be analogous to Qi, and must contain a dedicated core form that we would be extending using our macro-based mechanism.

(syntax-spec
  (extension-class my-expr-macro)

  (nonterminal my-expr
    #:allow-extension my-expr-macro
    n:number
    (#%optimizable-app op:id e1:my-expr e2:my-expr))
  ...)

Mmhmm, make sense? Now, who can tell me what's next? Yes, Summer. Good. Whenever we have a syntax specification of a DSL, we also need an interface macro to extend the host language syntax to include the newly-defined DSL nonterminal. What do you mean there's no one named Summer in the class? Go to the corner, Billy!

(syntax-spec
 ...
 (host-interface/expression
    (my-lang e:my-expr)
    (compile-my-expr #'e)))

This interface macro embeds our DSL language into the host language -- that's Racket -- by expanding a provided DSL expression using the my-expr expander we defined above, and then compiling it using a compiler for the DSL core language that we'll need to write.

But before we write the compiler, let's review what we need on the user-facing side of things. Typically, we extend a language using a macro-defining form (for Qi, that's something like define-qi-syntax-rule; for Racket, define-syntax-rule, etc.). But here, we need something more -- we need a way to define a DSL macro, but also include a Racket runtime, that is, we must also specify code implementing the operation in Racket. As a DSL macro, it must expand to the DSL core language, and specifically the #%optimizable-app core form we introduced for this purpose. But it must do it in such a way that the Racket runtime is still available at the end of compilation.

And it turns out this is what Michael was talking about when he said "use a syntax-phase struct" -- we can achieve the above by defining a syntax phase data type, an instance of which we can use to store and convey the runtime specification through to the code generation stage.

(begin-for-syntax
  (struct op-info [codegen]))

Now, let's define a macro-defining form that's similar to the usual macro-defining forms like define-qi-syntax-rule, but which does the extra thing we need here (described in the comments).

(define-syntax define-optimizable-op
  (syntax-parser
    [(_ name codegen-f)
     #'(begin

         ;; capture the codegen in an instance of
         ;; the compile time struct
         ;; note it's define-syntax rather than define
         (define-syntax info
           (op-info codegen-f))

         ;; attach that instance as syntax in the
         ;; expansion to the core language form
         (define-dsl-syntax name my-expr-macro
           (syntax-parser
             [(_ e1 e2)
              #'(#%optimizable-app info e1 e2)])))]))

Finally, let's write the DSL compiler which will make use of this information.

(begin-for-syntax
  (define (compile-my-expr e)
    (syntax-parse e
      #:datum-literals (#%optimizable-app)
      [n:number
       #''n]
      [(#%optimizable-app op e1 e2)
       (define e1^ (compile-my-expr #'e1))
       (define e2^ (compile-my-expr #'e2))
       (define info (syntax-local-value #'op))
       (match info
         [(op-info codegen)
          (codegen e1^ e2^)])])))

Note that we already indicated, earlier, via the core language syntax specification, that e1 and e2 are expected to be DSL expressions. So we know in the compiler that these expressions should be independently compiled first before being passed to the included Racket runtime.

This approach seems to suggest that in order to encode Qi floe expressions in arbitrary positions, we will need a handful of variations of the #%deforestable core form that accept one, two, or three floe expressions. Compilation will need to be implemented for each of these cases, and extensions will need to select one of them to expand to.

One concern: can we safely invoke the entire compiler again on the component expressions at the code generation stage without running into circular module dependency issues?

Basic Addition

The full example (from Michael):

#lang racket

(require syntax-spec-v2 (for-syntax racket/match syntax/parse))

(syntax-spec
  (extension-class my-expr-macro)

  (nonterminal my-expr
    #:allow-extension my-expr-macro
    n:number
    (#%optimizable-app op:id e1:my-expr e2:my-expr))

  (host-interface/expression
    (my-lang e:my-expr)
    (compile-my-expr #'e)))

(begin-for-syntax
  (define (compile-my-expr e)
    (syntax-parse e
      #:datum-literals (#%optimizable-app)
      [n:number
       #''n]
      [(#%optimizable-app op e1 e2)
       (define e1^ (compile-my-expr #'e1))
       (define e2^ (compile-my-expr #'e2))
       (define info (syntax-local-value #'op))
       (match info
         [(op-info codegen)
          (codegen e1^ e2^)])]))

  (struct op-info [codegen])

  (define (optimizable-op-transformer info-name)
    (syntax-parser
      [(_ e1 e2)
       #`(#%optimizable-app #,info-name e1 e2)])))

(define-syntax define-optimizable-op
  (syntax-parser
    [(_ name codegen-f)
     #'(begin
         (define-dsl-syntax name my-expr-macro
           (optimizable-op-transformer #'info))
         (define-syntax info
           (op-info codegen-f)))]))

(define-optimizable-op
  my-+
  (lambda (e1 e2)
    #`(+ #,e1 #,e2)))

(my-lang (my-+ 1 2))

Note that, since the expansion to the core form in the macro is a boilerplate transformation, we've written a handy compile-time helper, optimizable-op-transformer, to do that for us, to make it extra easy.

Got all that?

Did someone say something? Oh, Billy! What're you doing over there in the corner! Please take your seat with the rest of the class. It's time for a pop quiz.

Generalizing the Pattern for Extending Qi?

Since #%deforestable allows embedding deep language extensions for deforestation of list-oriented forms, a reasonable question is, can this mechanism be generalized beyond this use case?

We felt that while there might be a useful generalization, we don't currently know what it is, and so we will continue designing it for the specific case of deforestation for the moment, and see if any worthwhile generalizations occur to us as we go -- we'd need at least "n=2" applications before we could consider a generalization.

Certainly, the mechanism could in principle express the entire Qi language, so that it would be the only core form we would need. This clearly feels like an overgeneralization, and it illustrates an upper bound on generality that we would like to avoid. It's possible that somewhere between being totally specialized to deforestation and totally generalized to express any semantics, is a more useful generalization, so we will keep an eye open for any "n=2" use cases. It's also possible that it would make sense to introduce a dedicated form for each distinct such use case. Perhaps the right approach will become clear during the proposed Qi compiler competition, with the benefit of, hopefully, many fresh perspectives.

Continuing with Deforestation

Generally speaking, the different components of the remaining deforestation work -- including implementing fused streams for other interfaces like drop, and rewiring the runtime and code generation for deforestable forms -- are unblocked and decoupled, so we should be able to continue on these in parallel.

We still need to figure out how much of deforestation will be in qi-lib vs. in qi-list, but that will probably become clearer over time.

Qi 5?

It would be fun to try and release Qi 5 before the winter solstice, but we'll need to take stock of the outstanding work to see if that's realistic.

Next Steps

Dominik will be speaking at Linux Days in Prague on Sunday on the subject of cryptography. Next week, Dominik, Michael and Sid will all be out so there will be no meeting.

(Some of these are carried over from last time)

  • Incorporate the above scheme into Qi deforestation.
  • Resolve the issue with bindings being prematurely evaluated.
  • Implement more fusable stream components like append, drop, and member.
  • Provide a runtime and indicate floe positions by enriching the syntax of #%deforestable
  • Implement the generic scheme for promoting position-oriented Qi macro definitions to handle templates
  • Update the benchmarking infrastructure (vlibench and also the basic ones in qi-sdk)
  • Define an extension scheme for deforesting custom list APIs.
  • Fix the bug in using bindings in deforestable forms like range
  • Finalize and document guiding principles for Git commits in the developer wiki
  • Revisit extending deforestation to multi-valued settings.
  • Write a proof-of-concept compiling Qi to another backend (such as threads or futures), and document the recipe for doing this.
  • Ask Ben to review tests from the perspective of Frosthaven Manager.
  • Review Cover's methodology for checking coverage.
  • Document the release model in the user docs and announce the performance regression (and the remedy) to users.
  • Improve unit testing infrastructure for deforestation.
  • Discuss and work out Qi's theory of effects and merge the corresponding PR.
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Deforest other racket/list APIs via qi/list
  • Decide on whether there will be any deforestation in the Qi core, upon (require qi) (without (require qi/list))
  • Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid
