
Qi Meeting Apr 12 2024

Siddhartha Kasivajhula edited this page Apr 30, 2024 · 13 revisions

Cat's Out of the Bag


Summary

We continued implementing a proper way to recover surface syntax in the event of a runtime error, and ran into some difficulties. We reviewed our release management protocols, found that they aren't very well specified, and realized that some code is out "in production" that we didn't quite intend to release. We discussed a proposed compact syntax for indicating a closure.

Background

We've been looking for ways to provide good error messages in terms of user-entered syntax, rather than the optimized syntax produced by the compiler.

As we continue work on racket/list deforestation, we were wondering how to structure its development in terms of Git workflows and releases.

A while ago, Noah proposed a compact syntax for closures resembling no-arguments invocation. At the time, the syntax was already recognized in Qi, albeit ambiguously. As part of Qi 4, the syntax became unused, opening up the possibility to consider it for closures.

Recovering Surface Syntax

Continuing our efforts to recover user-entered source syntax for use in error messages generated in the compiler, we felt we would make faster progress with a simple test case in Syntax Spec rather than testing directly in Qi.

We wrote a test case consisting of a Syntax Spec language specification that includes a form expanding to an error, and attempted to recover the source syntax from the erroring core language expression.

Our initial plan, discussed over recent weeks, was to add a new syntax property during macroexpansion to store the source syntax, but we ran into some difficulties there: phase1-eval, used in tests, did not appear to propagate the syntax property, making it difficult to test.

We then felt we could use recover-source-syntax, which was developed to solve a similar problem in Typed Racket. Given a macro (err-id) that expands to (err id), it provides the following interface:

(define surface #'(flow (thread (err-id))))
(define expanded #'(flow (thread (err id))))
(define f (recover-source-syntax surface expanded))

(f #'(err id)) ;=> #'(err-id)

So if we could just preserve the surface syntax through expansion in Syntax Spec, we should be able to use recover-source-syntax to recover the specific surface subexpression at the point where we want to generate an error in the compiler. This would also save us the need to store and propagate a new syntax property in Syntax Spec, and seemed like a good solution.

Michael gave us a tour of some aspects of the Syntax Spec implementation: how it expands DSL syntax given a syntax specification in the form of a grammar, with binding scope rules specified using a compact syntax for expressing tree structure. In doing this, it delegates many parts of expansion to Racket's existing macro expansion infrastructure, including syntax-parse. These syntax parsing tools are of course very mature, but at the same time, we learned that they haven't always been usable as simple building blocks, requiring some elaborate adapters in the Syntax Spec implementation. As Syntax Spec itself matures, the ideal interface with existing Racket macro infrastructure (including any necessary additional utilities) will likely become more apparent, so that the primitive interfacing layer can be well defined.

A challenge we ran into with propagating source location is that Syntax Spec supports implementing a compiler either as a function or as a macro. The former would be easier to reason about, as the compilation (and the error) would occur in the dynamic extent of the compiler expression that Syntax Spec processes; but the latter is how languages are sometimes already implemented (e.g. as a syntax-parse macro), and it has some advantages in simplifying hygiene. For these reasons, supporting both kinds of compiler implementations has been a design goal of Syntax Spec.

In the present case, passing the surface syntax on to a compiler implemented as a macro is tricky for the above reason, but adding a syntax parameter could potentially allow it. We saw that Typed Racket's implementation of recover-source-syntax itself employs an ordinary parameter used at syntax phase, orig-module-stx, to support something analogous.

Additionally, it looks like it will require some refactoring to store a reference to the original syntax prior to expansion so that it can be propagated to the compiler in all cases.

An Early Release?

Although we recently released Qi 4 to great fanfare, and followed elaborate and carefully specified procedures to do the release, we never actually discussed what these releases mean for users, and what impact a new version, like the "4.0" that may be indicated in info.rkt, has.

It turns out there was some confusion and, long story short, we've already "released" some in-progress work for racket/list deforestation. Hurray! 😅

The confusion was that everyone had an idea of what releases entail, and while each idea was self-consistent, none was quite accurate. Dominik assumed we were using a designated branch (e.g. 4.0) for each release, and that the main branch was for daily development. Sid knew that the main branch was what users actually got, but assumed that if they wanted a specific version rather than the development version, they could just indicate the desired version in their info.rkt.

In fact, neither of these is quite true. We do use main for daily development but this is also the branch that's directly released to users on every commit. For efforts that are likely to involve a lot of coordination and possible backwards incompatibilities, we have sometimes used integration branches like the recent lets-write-a-qi-compiler branch that was used for the 4.0 release. Additionally, we do have release branches, called 3.0-maintenance and 2.0-maintenance (and so on). We also have a 4.0 tag, which uniquely identifies an immutable commit corresponding to the 4.0 release.

With all this support for indicating versions, surely, we haven't accidentally released somewhat backwards-incompatible work into production?

Well, that's exactly what we've done!

Raco vs Git

The heart of the issue is parallel modeling of "versions" in Git and in Raco. When we indicate this in info.rkt:

(define version "4.0")

This tells Raco that the code in the package is the 4.0 version of the code. But the code accompanying this supposed "version" is constantly changing, while the version itself is rarely updated. That is, we have a static designator for a changing thing. How should we balance the need to improve software -- which inherently entails change -- with the need to keep it reliable -- which inherently entails staying the same?
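The mismatch can be seen concretely. Here is a minimal, throwaway sketch (assuming git is installed; the repo and commit messages are made up for illustration):

```shell
# Demo: the version declared in info.rkt is a static label,
# while the git-level "version" keeps moving underneath it.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name demo
git config user.email demo@example.com
echo '(define version "4.0")' > info.rkt
git add info.rkt
git commit -q -m "release 4.0"
v1=$(git rev-parse HEAD)
echo ';; an improvement' >> info.rkt     # the code changes...
git commit -q -am "improve something"
v2=$(git rev-parse HEAD)
grep '"4.0"' info.rkt                    # ...but the declared version does not
test "$v1" != "$v2"                      # two git versions, one raco version
```

The two commit hashes differ while the declared version stays "4.0", which is exactly the "static designator for a changing thing" described above.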

Before we try to answer this, it will be useful to understand some things about Git and its approach to this problem.

Some Insights Into Git

Git has three kinds of names:

  1. Commits
  2. Tags
  3. Branches

All three of these are different ways of naming a specific version, but they differ in their mechanics.

Commits name a specific version with specific contents. A commit is immutable and unique.

Tags also name a specific version with specific contents, but we choose which commit a tag refers to: a tag is an alias for any commit we designate. Once assigned, it too is effectively immutable and unique, unless it is manually reassigned. By convention, tags are never reassigned once assigned.
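This aliasing can be verified directly. A throwaway demo (assuming git is installed; the tag name is arbitrary):

```shell
# Demo: a tag is an alias for a commit; resolving the tag
# yields the same hash as the commit it names.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first commit"
git tag v1.0                       # v1.0 now names the current commit
git rev-parse "v1.0^{commit}"      # prints the same hash as...
git rev-parse HEAD                 # ...the commit HEAD refers to
```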

Before we talk about branches, let's talk about HEAD. What Git refers to as "HEAD" is simply a tag that refers to the currently checked out version. Said another way, HEAD is a tag Git maintains to tell you where you are in the tree-structured space of versions. To convince yourself that HEAD is effectively just a tag, try:

$ git checkout HEAD

or

$ git show HEAD

The first will do nothing, since HEAD is by definition already checked out, and the second shows you the latest commit and is equivalent to git show <latest_commit_id>. That is, HEAD is an alias for that commit.

Whenever we make a commit -- under any circumstances -- Git will:

  1. Refer to the preceding HEAD as "parent" in creating a fresh commit.
  2. Update HEAD to now be an alias for the freshly created commit.

It's just like adding a new element to a linked list, and unlike other tags, HEAD is automatically updated by Git to serve this role.

This always happens when you make a commit. Now, think about what happens if you check out an old commit, make some changes, and then make a fresh commit. The original commit sequence is unchanged and still exists, but the new commit also exists, and points to a parent within the original sequence. We've essentially started a new branch. If Git didn't formally support this concept, we would need to manually maintain references to the original sequence and the new one by noting down commit hashes, or by creating and frequently mutating tags ourselves. Instead, Git does this for us by formally supporting the concept of a branch, which is just another automatically-maintained tag whose mechanics are almost identical to HEAD's. The only difference is that when you make a fresh commit, the branch tag is updated only if it also happens to be HEAD, that is, only if you are on that branch.

If you check out a commit that isn't (the tip of) any branch, Git calls that being in "detached HEAD" state. This sounds quite morbid and scary, but it's just the "state of nature" we were describing above, and is how everything would work if Git did not formally provide a branch abstraction.
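The sequence described above can be walked through in a throwaway repo (assuming git is installed; the branch name "alternate" is made up):

```shell
# Demo: committing from an old commit starts a new line of history,
# and a branch is just an auto-updated name for the tip of that line.
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
root=$(git rev-list --max-parents=0 HEAD)
git checkout -q "$root"               # "detached HEAD": not on any branch
git commit -q --allow-empty -m "alt"  # the original history is untouched
git checkout -q -b alternate          # name the new line of history
git log --format=%s alternate         # prints "alt" then "one"
```

After the final step, HEAD is attached again: fresh commits would update the alternate branch tag automatically.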

The Source of the Confusion

The source of the confusion with regard to how Qi releases happen as far as versions go is that Raco's "version 4.0" functions as a branch but Git's "version 4.0" is a tag. We have two conflicting versioning schemes in operation.

Racket's Superpower is that It Doesn't Need One

One of the great things about the Racket ecosystem is the simplicity, power, and broad usefulness of its tools (e.g. see this talk by Alexis King). A big reason for this is that Racket tools often do more by doing less (partially out of necessity due to the small size of the community, but necessity indeed is the mother of avoiding superfluous invention!). For instance, by transitioning to a Chez Scheme backend, Racket benefits from decades of compiler work, and an extended ecosystem, in the Chez Scheme community. Likewise, the Racket Package Index relies on existing source code hosting platforms such as GitHub, GitLab, or anything else, instead of inventing a custom hosting solution.

Yet another example of this is package management. Packages in Python (for example) involve a large (and growing!) number of configuration files, such as MANIFEST.in, that indicate to the package manager which files should and should not be included in the package. Racket's package manager, Raco (we'll use "Raco" to refer to Racket package management generally, though technically it's just the name of the command line utility interfacing with package management functions), simply includes all files in the indicated package folder. The beauty of this is that it delegates to an existing platform designed for file inclusion and exclusion -- a filesystem. If you don't want a certain file to be part of the package, just put it in another folder. This forces us to organize our code better in a generally useful way, and then leverages that organization to attain the desired feature in a very simple way.

The great advantage of this approach of delegating to existing solutions is that it achieves better results while also avoiding the maintenance overhead of a custom solution.

Versioning is just such a thing, and yet, appears to be an exception to Racket's otherwise minimalist approach, since, as we saw with the release confusion, the Racket package ecosystem maintains its own notion of versions that is independent of any notion of versioning maintained by the source versioning system.

Could we find some way for Racket to delegate versioning to a system like Git? What would that look like?

Leveraging Git Versioning in the Package Catalog

Because Raco's version functions like a "branch," Racket does not have any way to refer to a specific immutable version of a dependency. Perhaps at least partly for this reason, Racket considers backwards-incompatible changes in packages to be undefined.

Yet, technically, any bug fix is backwards-incompatible, and what constitutes a bug is sometimes subjective. Truly, no actively developed software can avoid violating backwards compatibility in some form, and even Racket itself must do so from time to time, despite its austere (and admirable) stance on always upholding backwards compatibility.

When this inevitably happens, it is useful to have support from package management infrastructure to ensure that such transitions are manageable and even routine.

By supporting a means to delegate to the versioning provider, we would get this feature for free, since Git, as we saw, provides many different ways of referring to versions, including dynamic identifiers (branches) but also static identifiers (tags).

Even more impressively, this makes it impossible to break backwards compatibility. Currently, the contract of compatibility between developers and users is a version numbering scheme (such as semantic versioning). Developers make promises on the basis of what these numbers mean and there is general awareness on the part of users as to what to expect. But (as for instance Greg Hendershott talks about) these numbers are superficial and not derivative of the actual functionality labeled by these versions, and thus, the reality is that it's easy, through intent or accident, for developers to break something without the version number changing in the agreed-upon way. This is a hard problem to solve, and it imposes a great inconvenience and cost on the development process.

It is a completely unnecessary problem to solve.

Instead of a version numbering scheme, by allowing direct reliance on Git, developers and users instead agree on a versioning structure (reified and maintained by Git) and on names for interesting nodes and branches in this structure. This decouples development from use, freeing developers and users alike to do whatever they need to do without their actions being coupled on either end, eliminating an unnecessary tax on the package ecosystem. Even if developers don't provide any conveniences to users with respect to useful branches and tags (as they ought to, which seems more the proper province of contractual conventions than of versioning schemes), the underlying versions themselves are immutable. Use of the package by a user at any time is proof that there is a version they can always use. Of course, the developer could provide some standard branches to model conventional assumptions of "backwards compatibility while still getting new improvements" -- but that's a higher-level problem than pure backwards compatibility, which is impossible to break.

Today, Raco does support providing Git names in dependencies, but doing so does not receive first class treatment, since the dependency must be indicated via full URL ("git://github.com/michaelballantyne/syntax-spec.git#v0.1") instead of simply a package name and a version name (like ("syntax-spec" #:version "v0.1")), and also, using such a dependency prevents the docs for the package from being built.
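For reference, the contrast might look something like this in an info.rkt (a sketch: the first form reflects what the text above says works today, while the #:version form is hypothetical and not real Raco syntax):

```racket
;; info.rkt
;; What works today: a full git URL pins the dependency to the v0.1 tag
;; (but, as noted above, docs for the package won't build).
(define deps
  '("base"
    "git://github.com/michaelballantyne/syntax-spec.git#v0.1"))

;; What first-class support might look like (hypothetical syntax):
;; (define deps '("base" ("syntax-spec" #:version "v0.1")))
```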

Adding this kind of first class support could simplify dependency management in the Racket ecosystem and once again, in typical Racket fashion, avoid solving a hard problem in a custom and costly way, when a standard, robust and incredibly powerful way already exists that could be leveraged.

Figuring out a Release Protocol

For now, given the conflicting versions of versions (i.e. Raco vs Git), here are some options on how we could handle releases.

A: The traditional model

  • Maintain a development branch that all in-progress changes are merged to
  • main is only updated when official releases are made

This is the traditional development model.

Benefits: conservative; this ensures that in-progress changes are frequently integrated, without risking consequences for users.

Drawbacks: users get improvements infrequently unless they rely on the development branch. But the development branch in this model is often broken, so relying on it is not practical. This model can also compromise code quality, since the bar for merging into a development branch is lower, and tech debt tends to accumulate that isn't always addressed at release time.

B: Flying by the seat of our pants

  • By default, changes are merged directly into main

Most packages in the Racket ecosystem follow this model.

Benefits: users get improvements immediately, and it encourages a culture of releasing high quality improvements often without accumulating tech debt.

Drawbacks: more risky, as merges into main are immediately available to users. Integrating diverse changes frequently requires more care and effort (case in point: our premature release).

C: Snapshots

  • Same as B, except that the URL on the package index always points to a release tag in git, rather than the default branch. This approximates, for instance, how Racket source versions work: changes are constantly merged to master, but releases are snapshots of the master branch.

This is effectively identical to option A and has the same benefits and drawbacks. There is also the additional overhead of updating tags frequently on the package catalog.

D: Continuous Deployment

  • Same as B, but more carefully
  • Have comprehensive tests that run on every PR, and every commit
  • Ensure that we follow the protocol for integration branches (which function as the development branch from option A, but on an as-needed basis)
  • For users who want to deploy software relying on a specific immutable version of Qi, encourage (e.g. via docs) use of the git protocol in their info.rkt dependencies. Although this doesn't receive first-class treatment (yet), that is irrelevant in deployment settings.

A compelling advantage of this approach is that it offloads the overhead of release management to the more concrete tasks of writing good and comprehensive tests on an ongoing basis. By maintaining 100% coverage, these tests serve as proof of release-worthiness.

Benefits: The best of A and B.

Drawbacks: Same risks as B. Additionally, this requires periodic rebases of the integration branch.

Essentially, what we are talking about here is a release practice called continuous deployment. As that blog post talks about, when done right (which takes care), it can provide significant advantages. As we do have a comprehensive array of tests (and are aiming for 100% coverage), that puts us in a good position to gain the advantages touted. Given the fourth bullet point above, accidentally releasing bugs also wouldn't be the end of the world. That brings us to:

Communicating Release Practices to Users

Finally, regardless of which option we choose: the already-released racket/list work is backwards-incompatible only from a performance perspective, not semantically, so we can accept these changes and take no further action beyond announcing the impact to users. In addition, we agreed that we should document our release practices in the user docs.

Compact Syntax for Closures

Some time ago, Noah proposed a compact syntax for indicating a closure.

At the time, (f) was occupied by ambiguous partial application that would sometimes work and sometimes not. As of Qi 4, that was removed, so (f) is now a syntax error, making this unused syntax available for a more worthy purpose.

The clos form was originally added to the language when Ben noticed a need for it while doing Advent of Code problems. Sid implemented it in response, and while we were struggling to come up with a name for it, Jens-Axel came up with "clos" and it stuck.

We have been discussing the proposed syntax and whether it would provide more value or more confusion.

In the meeting, we felt there was no harm in adding it, but it depends on our plans for closures in general, and on what other syntax, if any, we might want to support for the clos form. If such variations exist and do not translate well to the (f) shorthand, that would be worth considering, to see whether it could be confusing.

One possibility we considered was something like the following:

(~> () (clos string-append "a" "b") (_ "c" "d"))

But in this case, we could just as well have done:

(~> ("a" "b") (string-append "c" "d"))

or even

(~> () (string-append "a" "b" "c" "d"))

That is, there doesn't seem to be a case where we might want to provide more arguments to the closure statically where we couldn't achieve that through partial application, so that supporting more than one argument in clos seems unnecessary.

Another consideration is, with ordinary function application, we have several options for providing arguments ahead of time:

  1. partial application with left or right chirality, (~> (f a b)) or (~>> (f a b))
  2. fine-grained template application, (f a _ b)
  3. blanket template application, (f a __)

But for closures, we currently only support the first of these. For example, these are currently supported:

(~> ("a") (clos string-append) (_ "b")) ;=> "ab"
(~>> ("a") (clos string-append) (_ "b")) ;=> "ba"

To get template behavior, what we are talking about is pre-supplying arguments in certain positions in one runtime context, before (invocation-time) arguments in another runtime context are available. So, something like this:

(~> ("a") (clos string-append _ ^) (_ "b")) ;=> "ba"

where we could use ^ to indicate arguments being closed over, while _ (in the closure form) indicates invocation-time arguments to be received later.

The syntax for clos here is awkward. It would be better if we could write (string-append _ ^), which is clearly a template of the eventual application. The presence of (at least one) ^ distinguishes this syntax from (f a ...), which in general is partial application. This would allow us to use (f) syntax for simple closures and (f ... ^ ...) for templated closures without ambiguity.

But it's possible that there are cases where the similarity of this syntax to partial application syntax could get confusing in actual use.

So a third possibility is to use two parentheses:

(~> ("a") ((string-append _ ^)) (_ "b")) ;=> "ba"

And then, a simple chiral closure would look like:

(~>> ("a") ((string-append)) (_ "b")) ;=> "ba"

One advantage of this is that the presence of two parentheses immediately marks it out as a closure rather than partial application, and, like the single-parenthesis version (f), it would perhaps be more intuitive than using clos, for the same reason that (string-append "a" "b") is more intuitive than if we had to write something like (call string-append "a" "b"). Of course, it is slightly less economical than overloading ().

Compact application syntax?

While we're on the subject, currently Qi has an apply form that is similar to Racket's apply except that it applies a flow to values rather than to a list of values.

It has come up before that this name is ambiguous, since it could be interpreted either as the Qi form apply or, because it is just an identifier (which in Qi is typically interpreted as a named function), as the Racket function apply (which can be used today as (esc apply)). In fact, to avoid this ambiguity, the Qi expander has a "rewrite production rule" that rewrites apply to appleye, the latter of which is the true core form in Qi!

Like closures, function application is also a very simple idea. Perhaps we can find a compact syntax for it that avoids ambiguity. For instance, (), as in, (~> (+ 1 2 3) ()).

Next Steps

(Some of these are carried over from last time)

  • Decide on a release model, document it in the user docs, and announce the performance regression (and the remedy) to users.
  • Improve unit testing infrastructure for deforestation.
  • Discuss and work out Qi's theory of effects and merge the corresponding PR.
  • Schedule a discussion for the language composition proposal and implement a proof of concept.
  • Decide on appropriate reference implementations to use for comparison in the new benchmarks report and add them.
  • Deforest other racket/list APIs via qi/list.
  • Decide whether there will be any deforestation in the Qi core upon (require qi) (without (require qi/list)).
  • Review and merge the fixes for premature termination of compiler passes.
  • Continue investigating options to preserve or synthesize the appropriate source syntax through expansion for blame purposes.

Attendees

Dominik, Michael, Sid
