Update up to end of impl chapter
roccojiang committed Jul 2, 2024
1 parent 95ffb16 commit 35bacc4
Showing 24 changed files with 35 additions and 114 deletions.
Binary file modified main.pdf
4 changes: 2 additions & 2 deletions main.sty
@@ -80,7 +80,7 @@
basicstyle=\ttfamily,
flexiblecolumns=false,
escapechar=\%, % char to escape out of listings and back to LaTeX
% literate={\\}{{$\lambda$}}1 % the lambda looks pretty bad so I've disabled this
% literate={\\}{{$\lambda$}}1 % the lambda looks pretty bad so I have disabled this
}
\fi

@@ -142,7 +142,7 @@
\pgfplotsset{compat=1.18}
\usepgfplotslibrary{fillbetween}
% \setstretch{1.15} % adjust as needed, I like it when there's a bit more vertical space
% \setstretch{1.15} % adjust as needed, I like it when there is a bit more vertical space
\setcounter{secnumdepth}{3} % number subsubsections
Binary file modified src/background/parsers.pdf
15 changes: 5 additions & 10 deletions src/background/parsers.tex
@@ -38,16 +38,11 @@ \subsection{The \texttt{parsley} Library}
\vspace{3ex}
\centering
\begin{align*}
\langle \mathit{digit} \rangle &::= \text{`\texttt{0}'} \ldots \text{`\texttt{9}'} \\
\langle \mathit{number} \rangle &::= \langle \mathit{digit} \rangle+ \\
\langle \mathit{expr} \rangle &::= \langle \mathit{expr} \rangle \; \text{`\texttt{+}'} \; \langle \mathit{term} \rangle \enspace | \enspace
\langle \mathit{expr} \rangle \; \text{`\texttt{-}'} \; \langle \mathit{term} \rangle \enspace | \enspace
\langle \mathit{term} \rangle \\
\langle \mathit{term} \rangle &::= \langle \mathit{term} \rangle \; \text{`\texttt{*}'} \; \langle \mathit{atom} \rangle \enspace | \enspace
\langle \mathit{term} \rangle \; \text{`\texttt{/}'} \; \langle \mathit{atom} \rangle \enspace | \enspace
\langle \mathit{atom} \rangle \\
\langle \mathit{atom} \rangle &::= \text{`\texttt{(}'} \; \langle \mathit{expr} \rangle \; \text{`\texttt{)}'} \enspace | \enspace
\langle \mathit{number} \rangle
\nt{digit} &::= \sym{0} \ldots \sym{9} \\
\nt{number} &::= \nt{digit}+ \\
\nt{expr} &::= \nt{expr} \; \sym{+} \; \nt{term} \alt \nt{expr} \; \sym{-} \; \nt{term} \alt \nt{term} \\
\nt{term} &::= \nt{term} \; \sym{*} \; \nt{atom} \alt \nt{term} \; \sym{/} \; \nt{atom} \alt \nt{atom} \\
\nt{atom} &::= \sym{(} \; \nt{expr} \; \sym{)} \alt \nt{number} \\
\end{align*}
\caption{The grammar in e\textsc{bnf}.}
\label{fig:simple-grammar-ebnf}
Binary file modified src/background/tools.pdf
2 changes: 1 addition & 1 deletion src/background/tools.tex
@@ -185,7 +185,7 @@ \subsection{Choice of Tooling}\label{sec:choice-of-tooling}
However, this approach has several drawbacks.
Firstly, compiler plugins are tightly coupled with the compiler itself, and therefore not portable across major compiler versions.
% https://github.com/mattmoore/scala-compiler-plugins is a good reference, in case I need to revisit this
For instance, plugins written for the Scala 3 compiler, known as \texttt{dotty}, are completely incompatible with Scala 2 plugins~\cite{lampepfl_changes_2022}.
For instance, plugins written for the Scala 3 compiler, known as \emph{Dotty}, are completely incompatible with Scala 2 plugins~\cite{lampepfl_changes_2022}.
Additionally, developing compiler plugins requires a deep understanding of arcane and poorly documented compiler internals.
Exposing the full compiler \textsc{api} permits unsafe operations that may violate undocumented invariants assumed by the compiler, leading to exceptions during compilation or even malformed bytecode~\cite{sherwany_refactoring_2015}.
The lack of higher-level abstractions also makes it difficult to implement even trivial tasks such as renaming a field.
Binary file modified src/body/body.pdf
4 changes: 0 additions & 4 deletions src/body/body.tex
@@ -2,10 +2,6 @@

\begin{document}

% Burmako:
% A metaprogramming framework (also, a metaprogramming library) is a collection of data structures and operations that allow to write metaprograms, i.e. programs that manipulate other programs as data. For example, scala.reflect which is developed and described in this dissertation is a metaprogramming framework, because it contains classes like Tree, Symbol, Type and others that represent Scala programs as data and exposes operations like prettyprinting, name resolution, typechecking and others that can manipulate this data.
% We say that metaprogramming frameworks provide the ability to reflect programs, i.e. reify program elements as data structures that can be introspected by metaprograms. For instance, scala.reflect provides facilities to reflect Scala programs. These facilities include the ability to reify definitions written in the source code as instances of class Symbol. Metaprograms can call methods like Symbol.info to introspect these definitions.

\subfile{simple-rules}%
\subfile{leftrec}%
\subfile{impl}%
Binary file modified src/body/impl.pdf
12 changes: 1 addition & 11 deletions src/body/impl.tex
@@ -7,22 +7,12 @@
This \namecref{sec:impl} takes a step back from linting rules and focuses on ensuring that transformed terms can be pretty-printed in a human-readable form.
The following ideas are explored:
\begin{itemize}
\item First, \cref{sec:simplify-parsers} discusses how parser terms can be simplified via domain-specific optimisations based on parser laws.
\item First, \cref{sec:simplify-parsers} discusses how parsers can be simplified via domain-specific optimisations based on parser laws.
\item Afterwards, \cref{sec:simplify-exprs} discusses how expressions can be partially evaluated to some extent. This is achieved using another intermediate \textsc{ast}, this time based on the $\lambda$-calculus, which unlocks the idea of $\beta$-reduction and normalisation as tools to reduce the complexity of these terms.
\end{itemize}

% Core embedded dsl with shallow embeddings as smart constructors https://link.springer.com/chapter/10.1007/978-3-642-40447-4_2 (folding dsls paper by gibbons and wu cites this as the originator of the "core" language idea)

% DSL optimisation = for speed, linter optimisation = for readability

% The idea is that we're linting and rewriting for the PARSLEY DSL, so really its better to work on a parsley ast
% Since it's embedded it also has access to host language features, which is why we still work on top of the scala ast
% And some of the stuff e.g. expressions can also be treated as a dsl and lifted into an intermediate ast

% https://okmij.org/ftp/Denotational.html
% Parser: syntactic approach to optimisation/simplification, focusing on term rewriting
% Expr: semantic approach to optimisation/simplification, focusing on denotations
% What would be cool for future work is this finally-tagless approach so Scala typechecks transformations for us, but the problem is how to reflect scala.meta.Types into actual types??

\subfile{impl/parser}
\subfile{impl/expr}
Binary file modified src/body/impl/expr.pdf
35 changes: 7 additions & 28 deletions src/body/impl/expr.tex
@@ -4,10 +4,6 @@

\section{Representing and Normalising Expressions}\label{sec:simplify-exprs}

% \Cref{sec:parser-representation} showed that it is useful to lift Scala \textsc{ast} nodes to a specialised \scala{Parser} \textsc{ast}, making it easier to manipulate and inspect parsers.
% Crucially, this allowed us to simplify parsers via term-rewriting rules based on parser laws.
% \Cref{sec:simplify-parsers} demonstrated why this is necessary for \texttt{parsley-garnish}: transformations such as left-recursion factoring~\cref{sec:factor-leftrec} result in complex parser terms that must be simplified to be readable.

The previous \namecref{sec:simplify-parsers} demonstrated the process of simplifying the \scala{Parser} \textsc{ast}, but this is not the only syntactic structure that requires simplification.
So far, parsers such as \scala{pure} and \scala{map} still treat expressions as black boxes in the form of raw \scala{scala.meta.Term} \textsc{ast} nodes.
This is evident from where the example in \cref{sec:simplify-example} left off, where the parser itself is in a simplified form, but the function passed to \scala{map} is not:
@@ -18,7 +14,7 @@ \section{Representing and Normalising Expressions}\label{sec:simplify-exprs}
Therefore, this \namecref{sec:simplify-exprs} explores the following:
\begin{itemize}
\item How expressions can be represented as another intermediate \textsc{ast}, so that they are statically inspectable enough to be simplified.
\item The idea of \emph{normalisation} to reduce expressions into a semantically equivalent but syntactically simpler form.
\item The notion of \emph{normalisation}, in order to reduce expressions into a semantically equivalent but syntactically simpler form.
\end{itemize}

\subsection{The $n$-ary Lambda Calculus}
@@ -29,7 +25,7 @@ \subsection{The $n$-ary Lambda Calculus}
In the standard $\lambda$-calculus, each function only takes one argument, and multi-argument functions are represented as a chain of single-argument functions: this is known as \emph{currying}.
Scala supports curried functions using multiple parameter lists, but uncurried functions are preferred for performance reasons.
Since these functions will be transformed from Scala code and back, it is desirable to maintain a high-level equivalence between these two representations.
Thus, the expression \textsc{ast} will be based on \cref{fig:lambda-calculus}, which extends the $\lambda$-calculus to support proper multi-argument functions using $n$-ary abstraction and application.
As a result, the expression \textsc{ast} will be based on \cref{fig:lambda-calculus}, which extends the $\lambda$-calculus to support proper multi-argument functions using $n$-ary abstraction and application.
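The curried/uncurried distinction motivating the $n$-ary calculus can be seen directly in Scala. The following sketch (illustrative only, not code from the thesis) shows both representations and the conversion between them:

```scala
// Curried: a chain of single-argument functions, as in the standard lambda calculus.
val addCurried: Int => Int => Int = x => y => x + y

// Uncurried: a single n-ary function, matching n-ary abstraction in the extended calculus.
val addUncurried: (Int, Int) => Int = (x, y) => x + y

// Scala's standard library can convert between the two representations.
val recurried: Int => Int => Int = addUncurried.curried
```

Both forms denote the same function, but the uncurried form avoids allocating an intermediate closure per argument, which is why it is preferred in Scala.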

\begin{figure}
% annoying \enspace hack to get operators aligned along the centre
@@ -65,10 +61,10 @@ \subsubsection{$\beta$-Reduction and $\alpha$-Conversion}
(\lamabs{x} \lamabs{y} x\ y)\ y &\betastep& (\lamabs{y} x\ y)\ [y / x] \\
&\lameq& \lamabs{y} y\ y
\end{lambdacalc}
The $y$ that was substituted was originally a free variable, distinct from the $y$ bound in the lambda $\lambda y. x y$.
% some weird spacing hacks because $y$ is weirdly slanted
However, after substitution, it became captured under the lambda, where the two $y\ $ terms are now indistinguishable in the incorrect expression $\lambda y. y y$.
The correct $\beta$-reduction with capture-avoiding substitution would instead proceed as follows:
The $y\ $ that was substituted was originally a free variable, distinct from the $y\ $ bound in the lambda $\lambda y. x y$.
However, after substitution, it became captured under the lambda, where the two $y\ $ terms have become indistinguishable in the incorrect expression $\lambda y. y y$.
Instead, the correct $\beta$-reduction using capture-avoiding substitution would proceed as follows:
\begin{lambdacalc}
(\lamabs{x} \lamabs{y} x\ y)\ y &\betastep& (\lamabs{y} x\ y)\ [y / x] \\
&\alphaequiv& (\lamabs{z} x\ z)\ [y/x] \\
@@ -153,8 +149,6 @@ \subsection{Representing Names}
Therefore, this representation performs substitution through Scala's function application, which makes it extremely fast compared to the other approaches.
However, since lambda abstractions are represented as closures within Scala itself, the function body becomes wrapped under Scala's variable bindings, making them difficult to inspect and work with.
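A minimal sketch of this higher-order abstract syntax (\textsc{hoas}) idea, with hypothetical names rather than the library's actual definitions, illustrates both the benefit and the drawback:

```scala
// HOAS sketch: binders are host-language functions, so capture-avoiding
// substitution comes for free via Scala function application.
sealed trait HOAS
case class Abs(f: HOAS => HOAS) extends HOAS // body is an opaque Scala closure
case class App(fun: HOAS, arg: HOAS) extends HOAS
case class Lit(n: Int) extends HOAS

// Top-level beta-reduction: applying an abstraction just runs the closure.
def betaReduce(t: HOAS): HOAS = t match {
  case App(Abs(f), arg) => f(arg)
  case other            => other
}
```

The field of `Abs` is a Scala closure, so substitution is a single function call; but by the same token the body cannot be pattern-matched or pretty-printed without first applying the closure to some argument, which is exactly the inspection problem described above.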

% \scala{Expr} has one binding construct -- the n-ary abstraction. This introduces variables and the need for capture-avoiding substitution.

\subsection{Normalisation Strategies}\label{sec:normalisation-approach}
One remaining hurdle stands before deciding on an \textsc{adt} representation: how normalisation will be implemented.
The ideas of partial evaluation and normalisation are related concepts -- it is useful to view normalisation as statically evaluating as many terms as possible, but since not all terms have known values, the expression cannot be fully evaluated to a result value.
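This view of normalisation as "evaluate what you can, leave the rest" can be sketched on a toy expression language (hypothetical names, far simpler than the real \scala{Expr} \textsc{ast}):

```scala
// Constant-folding in the spirit of partial evaluation: known subterms are
// evaluated statically, while unknown variables block full evaluation.
sealed trait E
case class Num(n: Int) extends E
case class Vr(name: String) extends E
case class Add(l: E, r: E) extends E

def norm(e: E): E = e match {
  case Add(l, r) => (norm(l), norm(r)) match {
    case (Num(a), Num(b)) => Num(a + b) // both sides known: fold to a value
    case (a, b)           => Add(a, b)  // an unknown remains: keep residual term
  }
  case other => other
}
```

A fully known term normalises to a value, while a term containing a free variable normalises only to a simpler residual term, mirroring why expressions here cannot always be evaluated to a result value.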
@@ -205,18 +199,6 @@ \subsection{The Expression \textsc{adt}}
\item Reifying \scala{Sem} terms back into syntactic \scala{Expr} terms automatically $\alpha$-converts names, granting $\alpha$-equivalence for free.
\end{itemize}
%
% \subsubsection{Defunctionalisation}
% In the same way that representing parsers as datatypes allows them to be statically inspectable, the aforementioned higher-order functions can also be rendered as datatypes to allow them to be analysed in a similar manner.
% This is an example of \emph{defunctionalisation}, a technique to eliminate higher-order functions by transforming them into specialised first-order equivalents~\cite{reynolds_defunc_1972,danvy_defunctionalization_2001}.
%
% Ideally for type safety, this could be represented as a generalised algebraic data type (\textsc{gadt})~\cite{cheney_gadt_2003} with a type parameter documenting the type that the function term represents:
% \begin{minted}{scala}
% trait Defunc[+T]
% case class Identity[A]() extends Defunc[A => A]
% case class Flip[A, B, C]() extends Defunc[(A => B => C) => B => A => C]
% case class Compose[A, B, C]() extends Defunc[(B => C) => (A => B) => A => C]
% \end{minted}
%
\Cref{fig:expr-adt} shows the implementation of the untyped \scala{Expr} \textsc{adt} representing the abstract syntax of $n$-ary $\lambda$-terms, extended with the following:
\begin{itemize}
\item Optional explicit type annotations for variables -- these are not used for type-checking, but are there to preserve Scala type annotations originally written by the user.
@@ -302,21 +284,18 @@ \subsection{The Expression \textsc{adt}}
\label{fig:higher-order-funcs}
\end{figure}

% Ideally for type safety, this could be represented as a generalised algebraic data type (\textsc{gadt})~\cite{cheney_gadt_2003} with a type parameter documenting the type that the function term represents:

\paragraph{Improved type safety}
The originally intended design was to represent \scala{Expr} as a type-parameterised \textsc{gadt} for improved type safety, where it would be based on a \emph{typed} variant of the $\lambda$-calculus.
This would have also allowed \scala{Parser} to be represented as a \textsc{gadt} parameterised by the result type of the parser.
However, attempting to implement this ran into two main hurdles:
\begin{itemize}
\item \scala{Var} and \scala{Translucent} terms would need to be created with concrete type parameters of their inferred types. Scalafix's semantic \textsc{api} is not powerful enough to guarantee that all terms can be queried for their inferred types -- in fact, the built-in Scalafix rule \emph{Explicit Result Types} calls the Scala 2 presentation compiler to extract information like this\footnote{\url{https://github.com/scalacenter/scalafix/issues/1583}}. This solution is complex and brittle due to its reliance on unstable compiler internals, which undermines Scalafix's goal of being a cross-compatible, higher-level abstraction over compiler details.
\item Scala 2's type inference for \textsc{gadt}s is less than ideal, requiring extra type annotations and unsafe casts which ultimately defeat the original purpose of type safety. This situation is improved, although not completely solved, in Dotty~\cite{parreaux_towards_2019} -- but Scalafix does not yet support Scala 3.
\item Scala 2's type inference for \textsc{gadt}s is less than ideal, requiring extra type annotations and unsafe casts which ultimately defeat the original purpose of type safety. This situation is improved, although not completely solved, in \emph{Dotty}~\cite{parreaux_towards_2019} -- but Scalafix does not yet support Scala 3.
\end{itemize}
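For concreteness, the intended \textsc{gadt}-style design might have looked like the following sketch (hypothetical and heavily simplified, not the actual \texttt{parsley-garnish} definitions), where the type parameter tracks the parser's result type:

```scala
// GADT-style Parser sketch: each constructor refines the result type parameter.
sealed trait Parser[+A]
case class Pure[A](x: A) extends Parser[A]
case class Ap[A, B](f: Parser[A => B], x: Parser[A]) extends Parser[B]

// The result type is tracked statically, so ill-typed parsers are rejected
// at compile time -- but constructing Var/Translucent nodes would require
// querying inferred types, which Scalafix cannot reliably provide.
val p: Parser[Int] = Ap(Pure((s: String) => s.toInt), Pure("123"))
```

Even this small example hints at the difficulty: every lifted term must be created at a concrete type, and pattern matches over such a hierarchy are where Scala 2's \textsc{gadt} inference tends to demand extra annotations or casts.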

\subsection{Lifting to the Intermediate Expression \textsc{ast}}\label{sec:lifting-expr}
The \scala{Parser} \textsc{ast} is amended to take \scala{Expr} arguments where they used to take \scala{scala.meta.Term} values.
Take the \scala{Pure} parser as an example:
\begin{minted}{scala}
case class Pure(x: Expr) extends Parser
object Pure {
@@ -486,7 +465,7 @@ \subsection*{Discussion}
\paragraph{Normalisation strategy}
\texttt{parsley} normalises terms to full $\eta\beta$-\textsc{nf}, whereas \texttt{parsley-garnish} only normalises to $\beta$-\textsc{nf}.
This is because $\eta$-reduction in Scala 2 is not as straightforward as in Haskell, and is not always possible -- in most cases the appropriate reduction is instead to convert lambdas to placeholder syntax.
This is left as future work.
Implementing this is left as future work.
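To illustrate the difficulty (an illustrative sketch, not thesis code): in Haskell, \haskell{\x -> f x} $\eta$-reduces cleanly to \haskell{f}, but the corresponding Scala 2 rewrite has several surface forms, and the bare method value is not always valid (for example with overloaded methods or implicit parameters), so placeholder syntax is often the safer target:

```scala
val f: Int => Int = _ + 1
val xs = List(1, 2, 3)

val viaLambda      = xs.map(x => f(x)) // before: explicit lambda
val viaEta         = xs.map(f)         // eta-reduced: the function value itself
val viaPlaceholder = xs.map(f(_))      // placeholder syntax: usually safe to emit
```

All three produce the same result here, but choosing between the latter two mechanically requires knowing more about the callee than the syntax alone reveals.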

In \texttt{parsley}, normalisation is implemented as a reduction-based approach over the \textsc{hoas} \haskell{Lambda} datatype.
Normalisation by $\beta$-reduction with Haskell function application brings this to $\beta$-\textsc{whnf}.
Binary file modified src/body/impl/parser.pdf
3 changes: 1 addition & 2 deletions src/body/impl/parser.tex
@@ -52,7 +52,7 @@ \subsection{Parser Laws}
\text{\scala{empty <*> u = empty}} \label{eqn:alt-empty-absorb} \\
\text{\scala{empty.map(f) = empty}} \label{eqn:alt-fmap-absorb}
\end{gather}
% I've wanted more fine-grained control, so instead of using cleveref I've manually written out the references -- TAKE CARE to keep them in the same order as the equations
% I wanted more fine-grained control, so instead of using \cleveref I have manually written out the references -- TAKE CARE to keep them in the same order as the equations
% \caption{Functor~\cref{eqn:functor-comp}, Applicative~\cref{eqn:app-homomorphism,eqn:app-fmap}, and Alternative~\cref{eqn:alt-left-neutral,eqn:alt-right-neutral,eqn:alt-left-biased-choice,eqn:alt-empty-absorb,eqn:alt-fmap-absorb} laws.}
\caption{Functor~(\ref{eqn:functor-comp}), Applicative~(\ref{eqn:app-homomorphism}, \ref{eqn:app-fmap}), and Alternative~(\ref{eqn:alt-left-neutral}--\ref{eqn:alt-fmap-absorb}) laws.}
\label{fig:parser-laws}
@@ -155,7 +155,6 @@ \subsection{Implementing Rewrites on the Parser \textsc{ast}}\label{sec:parser-r
\paragraph{Extensibility and Safety}
Further design considerations are made to ensure the extensibility of this approach: the \scala{Parser} trait is sealed, which enables compiler warnings if a new \scala{Parser} case is added and the \scala{transform} method is not updated.
Although this formulation of the traversal is inspired by generic traversals, it still manually defines the traversal for each case: a safer approach would be to generically derive this.
% parsley Haskell achieves this with cata
In Scala, this would require the use of an external dependency such as \texttt{shapeless}\footnote{\url{https://github.com/milessabin/shapeless}},
which is overkill given the relative simplicity of the \scala{Parser} \textsc{adt}.
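A minimal sketch of this design (simplified, hypothetical constructors -- the real \scala{Parser} \textsc{adt} has many more cases) shows how sealing plus a manual traversal interact: the match inside \scala{transform} is checked for exhaustiveness, so adding a new case without updating it produces a compiler warning.

```scala
sealed trait Parser
case class Pure(x: Int) extends Parser
case class Choice(p: Parser, q: Parser) extends Parser

// Bottom-up traversal: rewrite children first, then apply the partial
// rewrite function to the rebuilt node, defaulting to the node unchanged.
def transform(pf: PartialFunction[Parser, Parser])(p: Parser): Parser = {
  val t = p match { // sealed trait: a missing case here is a compile-time warning
    case Pure(x)      => Pure(x)
    case Choice(l, r) => Choice(transform(pf)(l), transform(pf)(r))
  }
  pf.applyOrElse(t, identity[Parser])
}
```

A law-style rewrite is then a partial function, e.g. collapsing a left-biased choice whose left branch always succeeds, applied everywhere by the traversal.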

Binary file modified src/body/leftrec.pdf
