asjdfasdjf
roccojiang committed Jun 11, 2024
1 parent eaf52c8 commit af9e1c7
Showing 11 changed files with 68 additions and 30 deletions.
Binary file modified main.pdf
Binary file not shown.
Binary file modified src/background/tools.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion src/background/tools.tex
@@ -340,7 +340,7 @@ \subsubsection{Safer Patches Using Quasiquotes}
However, this safety is not foolproof: quasiquotes are not guaranteed to be well-typed or well-scoped, so the rewritten program still might not be able to compile.
They are also not \emph{hygienic}: generated code will not be able to avoid name clashes with regular code~\cite{burmako_scalameta_2017}.
The lack of hygiene can cause issues with variable capture, allowing a variable to be unintentionally shadowed by a generated variable.
Again, it is the rule author's responsibility to ensure that variable capture does not occur: \cref{sec:function-representation} in part discusses how this is handled in \texttt{parsley-garnish}.
Again, it is the rule author's responsibility to ensure that variable capture does not occur: \cref{sec:simplify-exprs} in part discusses how this is handled in \texttt{parsley-garnish}.
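As a plain-Scala illustration of the hazard (the names here are purely hypothetical, not drawn from any real rule), suppose a rewrite splices the user's expression \scala{x + 1} beneath a binding that the generated code also happens to call \scala{x}:
\begin{minted}{scala}
val x = 1          // the user's binding
val generated = {
  val x = 99       // binding introduced by unhygienic generated code
  x + 1            // the spliced user expression meant the *outer* x
}
// generated == 100, although the user's expression originally meant 1 + 1
\end{minted}
The generated binding silently captures the user's variable, changing the meaning of the spliced expression.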
\subsubsection{Semantic Information}
Semantic rules are implemented in the same manner as \cref{fig:syntactic-rule-ex}, but instead take the more powerful \scala{SemanticDocument} as an implicit parameter.
Binary file modified src/body/complex-rules.pdf
Binary file not shown.
9 changes: 8 additions & 1 deletion src/body/complex-rules.tex
@@ -2,6 +2,13 @@

\begin{document}

\ourchapter{Lint Rules Using the New Parser \textsc{ast}}

\section{Removing Left-Recursion: Revisited}
\TODO{
YEET
}

\section{Simplify Parser}
\TODO{
* Catch cases when user manually writes out a parser that is already defined in the library
@@ -17,4 +24,4 @@ \section{Convert to Parser Bridge}
* indicate limitations that this will only work if the ADT is defined in the same file, in order to extend it
}

\end{document}
\end{document}
Binary file modified src/body/impl.pdf
Binary file not shown.
10 changes: 8 additions & 2 deletions src/body/impl.tex
@@ -8,14 +8,14 @@
The following ideas are explored:
\begin{itemize}
\item First, \cref{sec:simplify-parsers} discusses how parser terms can be simplified via domain-specific optimisations based on parser laws.
\item Afterwards, \cref{sec:function-representation} discusses how expressions can be partially evaluated, to some extent. This is achieved using another intermediate \textsc{ast}, this time based on the $\lambda$-calculus, which unlocks the idea of $\beta$-reduction and normalisation as tools to reduce the complexity of these terms.
\item Afterwards, \cref{sec:simplify-exprs} discusses how expressions can be partially evaluated to some extent. This is achieved using another intermediate \textsc{ast}, this time based on the $\lambda$-calculus, which unlocks the idea of $\beta$-reduction and normalisation as tools to reduce the complexity of these terms.
\end{itemize}

% TODO
% Writing domain-specific lint rules unlocks the potential for more powerful and interesting transformations utilising specialised domain knowledge.
% Desirable:
% * inspectability for analysis (that's what we're here for!) and optimisation
% The purpose of this chapter is to describe the intermediate representations of parsers (\cref{sec:parser-representation}) and functions (\cref{sec:function-representation}).
% The purpose of this chapter is to describe the intermediate representations of parsers (\cref{sec:parser-representation}) and functions (\cref{sec:simplify-exprs}).
% Show that terms must be simplified to a normal form
% Demonstrate equivalence to dsl optimisations in staged metaprogramming
% Scalafix runs at the meta-level, outside of the phase distinction of compile- and run-time.
@@ -24,4 +24,10 @@
\subfile{impl/parser}
\subfile{impl/expr}

\section*{Summary}
This \namecref{sec:impl} introduced the idea of simplifying parsers and normalising expressions by representing both as intermediate \textsc{ast}s, improving their static inspectability.
It also demonstrated how these processes are related to the optimisation techniques used in both \texttt{parsley} Scala and \texttt{parsley} Haskell.

Having achieved promising results by applying these simplifications to the \scala{example} parser from the previous \namecref{sec:factor-leftrec}, the improved \scala{Parser} \textsc{ast} now unlocks the potential for more powerful and interesting transformations utilising specialised domain knowledge of parser combinators.

\end{document}
Binary file modified src/body/impl/expr.pdf
Binary file not shown.
75 changes: 50 additions & 25 deletions src/body/impl/expr.tex
@@ -2,26 +2,24 @@

\begin{document}

\section{Representing and Normalising Expressions}\label{sec:function-representation}
% TODO: This section is about simplifying in the general domain, so really Squid can do all of this? Still an interesting approach ig -- And shows a shortcoming of scalameta quasiquotes
\section{Representing and Normalising Expressions}\label{sec:simplify-exprs}

% \Cref{sec:parser-representation} showed that it is useful to lift Scala \textsc{ast} nodes to a specialised \scala{Parser} \textsc{ast}, making it easier to manipulate and inspect parsers.
% Crucially, this allowed us to simplify parsers via term-rewriting rules based on parser laws.
% \Cref{sec:simplify-parsers} demonstrated why this is necessary for \texttt{parsley-garnish}: transformations such as left-recursion factoring~\cref{sec:factor-leftrec} result in complex parser terms that must be simplified to be readable.

At this point, parsers such as \scala{pure} and \scala{map} still treat expressions as black boxes in the form of raw \scala{scala.meta.Term} \textsc{ast} nodes.
No steps have been taken to improve the static inspectability of these values.
This is evident from where we left off in the example from \cref{sec:simplify-example}:
The previous \namecref{sec:simplify-parsers} demonstrated the process of simplifying the \scala{Parser} \textsc{ast}, but this is not the only syntactic structure that requires simplification.
So far, parsers such as \scala{pure} and \scala{map} still treat expressions as black boxes in the form of raw \scala{scala.meta.Term} \textsc{ast} nodes.
This is evident from where the example in \cref{sec:simplify-example} left off, where the parser itself is in a simplified form but the function passed to \scala{map} is not:
\begin{minted}{scala}
val f = flip(compose(a => b => a + b)(identity))
// f is equivalent to (a => b => b + a)
val f = flip(compose((_ + _).curried)(identity))
\end{minted}
%
This mess is an artefact of the left-recursion factoring transformation -- recombination of unfolded parsers requires using higher-order functions such as \scala{flip} and \scala{compose}.
Yet again, any user would find it unacceptable if \texttt{parsley-garnish} gave this as the output of a transformation.
Therefore, these functions must be \emph{normalised} into a semantically equivalent but syntactically simpler form.

This \namecref{sec:function-representation} explores how function term normalisation can be achieved. % TODO: drawing parallels to...?
Therefore, this \namecref{sec:simplify-exprs} explores the following:
\begin{itemize}
\item How expressions can be represented as another intermediate \textsc{ast}, so that they are statically inspectable enough to be simplified.
\item The notion of \emph{normalisation}, reducing expressions into a semantically equivalent but syntactically simpler form.
\end{itemize}

\subsection{The $n$-ary Lambda Calculus}
Once again, the complexity of manipulating the generic Scalameta \textsc{ast} can be avoided by building a new intermediate \textsc{ast} representation for expression terms.
@@ -78,6 +76,20 @@ \subsubsection{$\beta$-Reduction and $\alpha$-Conversion}
&= \lambda z. y z
\end{align*}

\subsubsection{Simplifying the Example Expression}
The example from the beginning of the \namecref{sec:simplify-exprs} can thus be evaluated by hand via $\beta$-reduction, representing the higher-order functions as $\lambda$-abstractions:
\begin{align*}
\mathtt{flip(compose((\_ + \_).curried)(identity))}\quad &\mathrel{=\quad\,} \mathrm{flip}\ (\mathrm{compose}\ (\lambda a. \lambda b. a + b)\ \mathrm{identity}) \\
&\mathrel{=\quad\,} \mathrm{flip}\ ( (\lambda f. \lambda g. \lambda x. f\ (g\ x)) (\lambda a. \lambda b.\ a + b) (\lambda x. x)) \\
&\rightarrow_{\beta*} \mathrm{flip}\ ((\lambda g. \lambda x. \lambda b.\ g\ x + b)\ (\lambda x. x)) \\
&\rightarrow_{\beta*} \mathrm{flip}\ (\lambda x. \lambda b.\ x + b) \\
&\mathrel{=\quad\,} (\lambda f. \lambda x. \lambda y.\ f\ y\ x) (\lambda x. \lambda b.\ x + b) \\
&\rightarrow_{\beta*} \lambda x. \lambda y.\ y + x
\end{align*}
%
This normalised expression has the same meaning as the original, but is now suitable to be placed in the code rewrite!
The rest of the \namecref{sec:simplify-exprs} now explores how this process can be implemented in \texttt{parsley-garnish}.
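To make this procedure concrete before diving into the implementation, the following self-contained sketch (plain Scala with illustrative names only, \emph{not} \texttt{parsley-garnish}'s actual \scala{Expr} type) implements a named $\lambda$-calculus with a primitive addition node, capture-avoiding substitution, and normal-order $\beta$-reduction:
\begin{minted}{scala}
sealed trait Expr
case class Var(name: String) extends Expr
case class Abs(param: String, body: Expr) extends Expr
case class App(fun: Expr, arg: Expr) extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr

def free(e: Expr): Set[String] = e match {
  case Var(x)    => Set(x)
  case Abs(x, b) => free(b) - x
  case App(f, a) => free(f) ++ free(a)
  case Add(l, r) => free(l) ++ free(r)
}

var counter = 0
def freshName(): String = { counter += 1; s"fresh$counter" }

// capture-avoiding substitution: e[x := v]
def subst(e: Expr, x: String, v: Expr): Expr = e match {
  case Var(`x`)    => v
  case Var(_)      => e
  case App(f, a)   => App(subst(f, x, v), subst(a, x, v))
  case Add(l, r)   => Add(subst(l, x, v), subst(r, x, v))
  case Abs(`x`, _) => e // x is shadowed here, stop
  case Abs(y, b) if free(v).contains(y) => // alpha-convert to avoid capture
    val z = freshName()
    Abs(z, subst(subst(b, y, Var(z)), x, v))
  case Abs(y, b)   => Abs(y, subst(b, x, v))
}

def normalise(e: Expr): Expr = e match {
  case App(f, a) => normalise(f) match {
    case Abs(x, b) => normalise(subst(b, x, a)) // beta-reduction
    case g         => App(g, normalise(a))
  }
  case Abs(x, b) => Abs(x, normalise(b))
  case Add(l, r) => Add(normalise(l), normalise(r))
  case v         => v
}

// flip(compose((_ + _).curried)(identity)), written out as lambdas
val compose = Abs("f", Abs("g", Abs("x", App(Var("f"), App(Var("g"), Var("x"))))))
val flip    = Abs("f", Abs("x", Abs("y", App(App(Var("f"), Var("y")), Var("x")))))
val plus    = Abs("a", Abs("b", Add(Var("a"), Var("b"))))
val id      = Abs("i", Var("i"))
val example = App(flip, App(App(compose, plus), id))
\end{minted}
%
Evaluating \scala{normalise(example)} produces \scala{Abs("x", Abs("y", Add(Var("y"), Var("x"))))}, i.e.\ $\lambda x. \lambda y.\ y + x$, agreeing with the hand derivation.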

\subsection{Representing Names}
There exists a plethora of approaches to implementing the $\lambda$-calculus, mostly differing in how they represent variable names.
This affects how variable capture is handled, and also how $\alpha$-equivalence of two terms can be determined.
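One common approach, illustrated here as a hypothetical sketch rather than \texttt{parsley-garnish}'s actual representation, is de Bruijn indices: a variable is the number of binders between its occurrence and the binder it refers to. Under this scheme, $\alpha$-equivalence collapses into plain structural equality:
\begin{minted}{scala}
sealed trait Named
case class NVar(x: String) extends Named
case class NAbs(x: String, b: Named) extends Named
case class NApp(f: Named, a: Named) extends Named

sealed trait DB
case class Ix(i: Int) extends DB       // 0 refers to the innermost binder
case class DAbs(b: DB) extends DB
case class DApp(f: DB, a: DB) extends DB

// env tracks binders from innermost outwards
// (free variables are out of scope for this sketch)
def toDB(e: Named, env: List[String] = Nil): DB = e match {
  case NVar(x)    => Ix(env.indexOf(x))
  case NAbs(x, b) => DAbs(toDB(b, x :: env))
  case NApp(f, a) => DApp(toDB(f, env), toDB(a, env))
}
\end{minted}
%
For instance, both $\lambda a. \lambda b.\ a$ and $\lambda x. \lambda y.\ x$ convert to \scala{DAbs(DAbs(Ix(1)))}, so comparing them for $\alpha$-equivalence requires no renaming at all.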
@@ -297,9 +309,11 @@ \subsection{The Expression \textsc{adt}}
\end{itemize}

\subsubsection{Evaluating Performance of Normalisation Strategies}
\TODO{
TODO: This is worthy discussion anyway, but is is worth benchmarking the performance and comparing the old and new implementations? Does this count towards evaluation?
}
\texttt{parsley-garnish} originally used a named approach with Barendregt's convention, generating fresh variable names using an atomic counter.
However, this required an extra $\alpha$-conversion pass to clean up variable names before pretty-printing the term, since the generated fresh names were not human-readable.
\TODO{TODO: graphs of benchmarks and comparison (NbE is orders of magnitude faster)}

\subsection{Lifting to the Intermediate Expression \textsc{ast}}\label{sec:lifting-expr}
The \scala{Parser} \textsc{ast} is amended to take \scala{Expr} arguments where they used to take \scala{scala.meta.Term} values.
@@ -316,7 +330,6 @@ \subsection{Lifting to the Intermediate Expression \textsc{ast}}\label{sec:lifting-expr}
%
The \scala{toExpr} extension method on \scala{scala.meta.Term} is used to lift \scala{Term} \textsc{ast} nodes to \scala{Expr} terms.
Expression lifting is invoked whenever a parser expects an expression (whether a function or simple value) as an argument.

This \namecref{sec:lifting-expr} gives a high-level overview of the three cases that \scala{toExpr} handles.

\subsubsection{Lambda Expressions}
@@ -387,6 +400,7 @@ \subsection{Normalising Expression Terms}
}
\end{minted}
%
\rj{Much code in this following section -- is it necessary or should it go in an appendix}
\paragraph{Evaluation}
Evaluation proceeds by carrying an environment mapping bound variables to their semantic representations.
Evaluating a variable looks up its name in the environment, while evaluating a lambda abstraction produces a closure using the current environment -- using \textsc{hoas} allows these closures to be represented as native Scala closures.
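The following is a minimal sketch of this normalisation-by-evaluation structure, with hypothetical names and types condensed from the description above rather than copied from the implementation:
\begin{minted}{scala}
// Syntax: terms with string names
sealed trait Term
case class TVar(x: String) extends Term
case class TAbs(x: String, b: Term) extends Term
case class TApp(f: Term, a: Term) extends Term

// Semantics: functions become native Scala closures (HOAS);
// unbound variables stay "stuck" as neutral values
sealed trait Sem
case class SFun(f: Sem => Sem) extends Sem
case class SVar(x: String) extends Sem
case class SApp(f: Sem, a: Sem) extends Sem

def eval(t: Term, env: Map[String, Sem]): Sem = t match {
  case TVar(x)    => env.getOrElse(x, SVar(x))
  case TAbs(x, b) => SFun(v => eval(b, env + (x -> v))) // closure over env
  case TApp(f, a) => eval(f, env) match {
    case SFun(g) => g(eval(a, env)) // beta-reduction is Scala application
    case stuck   => SApp(stuck, eval(a, env))
  }
}

// Reading a semantic value back into a term; binder depth gives fresh names
def reify(v: Sem, lvl: Int): Term = v match {
  case SFun(f)    => val x = s"v$lvl"; TAbs(x, reify(f(SVar(x)), lvl + 1))
  case SVar(x)    => TVar(x)
  case SApp(f, a) => TApp(reify(f, lvl), reify(a, lvl))
}

def normalise(t: Term): Term = reify(eval(t, Map.empty), 0)
\end{minted}
%
For example, \scala{normalise} applied to $(\lambda x. \lambda y.\ x)\ (\lambda z. z)$ yields $\lambda v_0. \lambda v_1.\ v_1$; no explicit substitution function is ever written, as the host language performs it.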
@@ -450,32 +464,41 @@ \subsection{Lowering Back to the Scalameta \textsc{ast}}

\subsection{Discussion}
\texttt{parsley} Haskell, as a staged parser combinator library, also has the ability to inspect and optimise the code of user-defined functions.
The approach taken by \texttt{parsley-garnish} and \text{parsley} share many similarities, both using the $\lambda$-calculus as a core language for expressions.
In both cases, the need to reduce expression terms is motivated by how parser simplifications involve fusion, resulting in a function application.
The approaches taken by \texttt{parsley-garnish} and \texttt{parsley} share many similarities, both using the $\lambda$-calculus as a core language to normalise expressions.
In both cases, the need to reduce expression terms is motivated by how parser simplifications involve fusion, which results in function applications that can be partially evaluated.

However, the two have different motivations and requirements for normalising expressions, so their approaches differ in some ways.
\Cref{fig:nbe-vs-parsley} illustrates these differences.
However, the two have different motivations and requirements for normalising expressions, so their approaches differ in some ways --
\cref{fig:nbe-vs-parsley} illustrates these differences.

\paragraph{Syntactic representation}
\subsubsection{Syntactic representation}
Unlike \texttt{parsley-garnish}, \texttt{parsley} has a two-level syntactic representation for expressions.
\haskell{Defunc} is akin to a deep embedding of higher-order functions, representing them as a \textsc{gadt}: this process is known as \emph{defunctionalisation}~\cite{reynolds_defunc_1972,danvy_defunctionalization_2001}.
This helps facilitate certain parser law optimisations which require pattern matching on the \scala{identity} function, for example.
After this step, \haskell{Defunc} values are then brought into the lower-level $\lambda$-calculus representation, to be normalised by $\beta$-reduction.
This helps facilitate certain parser law optimisations which require pattern matching on functions as well as parsers, for example:
\begin{equation*}
\text{\scala{pure(identity) <*> u = u}}
\end{equation*}
After this step, \haskell{Defunc} values are then brought into the lower-level $\lambda$-calculus representation \haskell{Lambda}, to be normalised by $\beta$-reduction.

\paragraph{Normalisation strategy}
At the moment, \texttt{parsley-garnish} does not have a need to implement any parser simplifications based on these laws, although this may change in the future.
Adding an extra defunctionalised layer to the expression \textsc{ast} would be fairly straightforward.
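Such a layer might look like the following sketch (hypothetical constructors for illustration; not \texttt{parsley}'s actual \haskell{Defunc} nor any existing \texttt{parsley-garnish} type), where giving \scala{identity} its own constructor makes the law a single pattern match:
\begin{minted}{scala}
sealed trait Defunc
case object Id extends Defunc                  // a known function: identity
case class Opaque(code: String) extends Defunc // arbitrary user code

sealed trait Parser
case class Pure(f: Defunc) extends Parser
case class Ap(pf: Parser, pa: Parser) extends Parser // pf <*> pa
case class NonTerminal(name: String) extends Parser

// law: pure(identity) <*> u == u
def simplify(p: Parser): Parser = p match {
  case Ap(Pure(Id), u) => simplify(u)
  case Ap(pf, pa)      => Ap(simplify(pf), simplify(pa))
  case other           => other
}
\end{minted}
%
With only an opaque expression term, recognising the \scala{Id} case would require fragile syntactic inspection; the dedicated constructor makes the law trivially applicable.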

\subsubsection{Normalisation strategy}
\texttt{parsley} normalises terms to full $\eta\beta$-\textsc{nf}, whereas \texttt{parsley-garnish} only normalises to $\beta$-\textsc{nf}.
This is because $\eta$-reduction in Scala 2 is not as straightforward as in Haskell, and is not always possible -- in most cases the appropriate reduction is instead to convert lambdas to placeholder syntax.
This is left as future work.
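To illustrate why placeholder syntax is the natural target (using a hypothetical function \scala{g}):
\begin{minted}{scala}
def g(x: Int, y: Int): Int = x + 10 * y

// In Haskell, \x -> f x eta-reduces to the bare name f. In Scala 2,
// methods are not values, and a partial application like this one has
// no bare-name form at all; placeholder syntax is the closest analogue.
val explicit: Int => Int    = x => g(x, 1) // lambda wrapping g
val placeholder: Int => Int = g(_, 1)      // idiomatic rewrite target
\end{minted}
%
Both denote the same function; the rewrite changes only the syntax.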

\texttt{parsley} performs reduction-based normalisation on a \textsc{hoas} representation of lambda expressions.
In \texttt{parsley}, normalisation is implemented as a reduction-based approach over the \textsc{hoas} \haskell{Lambda} datatype.
Normalisation by $\beta$-reduction with Haskell function application brings this to $\beta$-\textsc{whnf}.
Generating this into code is where this is brought to $\beta$-\textsc{nf} as desired, as well as an extra step for $\eta$-reduction to put the term into full $\eta\beta$-\textsc{nf}.
Then, code generation brings this further to $\beta$-\textsc{nf} as desired, with an extra $\eta$-reduction step putting the term into full $\eta\beta$-\textsc{nf}.

The main reason why \texttt{parsley-garnish} takes a different normalisation approach is because unlike \texttt{parsley}, there is still a need for $\alpha$-equivalence checking after normalisation.
Whereas in \texttt{parsley} the normalised forms are directly used for code generation, in \texttt{parsley-garnish} these terms continue to be analysed before being pretty-printed as code patches.
In \texttt{parsley}, the normalised forms are immediately utilised for code generation, so they can be kept as \textsc{hoas} the entire time, without representing variables with any names.
Conversely, in \texttt{parsley-garnish}, these normalised terms undergo further analysis before being transformed into code patches for pretty-printing.

% Representation as a lambda calc has allocation overhead, but greatly simplifies function evaluation via beta reduction, instead of having to deal with high-level representations of compose/id (not too bad tbh) and flip (annoying).

% TODO: scala 3 macros, squid quasiquotes?

\begin{figure}[htbp]
\begin{equation*}
% Created here (but modified slightly) https://tikzcd.yichuanshen.de/#N4Igdg9gJgpgziAXAbVABwnAlgFyxMJZARgBoAGAXVJADcBDAGwFcYkQAdDnGADx2AAJAPIBBAMoBfEJNLpMufIRQBmCtTpNW7Lj35CxUgHq6+AsADNps+djwEiZAEwaGLNok7czwAGLMwAGN7MGs5EAw7JSI1Fxo3bU9TfQAZegBbACMoejDbRQcUcnV4rQ8vPQFxGHT6MDxAuDyIhRCiYrjNdx1vfXEAT3r6XmsNGCgAc3giUAsAJwh0pGKQHAgkMi7EirMcARg3aRpGekyYRgAFVujPRhgLHBlw+cWNmjWkNS3y5L2BOZgWAs-SOIBOZ0u10KYPujxsIBeS0QTne60QKwSP16f2AAIsd2CoPB5yuUWhdweT1mCyRKNWaK+mJ6lT2wEgc1qjCwcBgkgABABePm-VkAoEg4UcQJYOaBSUs-aHSXS2Xy3b-e4EnD8kDHU4kqHKEBzLATAAWcOeNKQdI+iE2XLA5RycDN41133YyGQfIAtHzKJQZJRJEA
@@ -497,4 +520,6 @@ \subsection{Discussion}
\label{fig:nbe-vs-parsley}
\end{figure}

% TODO: This section is about simplifying in the general domain, so really Squid can do all of this? Still an interesting approach ig -- And shows a shortcoming of scalameta quasiquotes

\end{document}
Binary file modified src/body/impl/parser.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion src/body/impl/parser.tex
@@ -93,7 +93,7 @@ \subsubsection{Simplifying the Example Parser}\label{sec:simplify-example}
\end{minted}
%
The parser is now expressed in a much simplified form, in a similar style to how it would be written by hand.
The remaining challenge is to simplify the contents of the expression \scala{f}, which is tackled in \cref{sec:function-representation}.
The remaining challenge is to simplify the contents of the expression \scala{f}, which is tackled in \cref{sec:simplify-exprs}.

\subsection{Implementing Rewrites on the Parser \textsc{ast}}
Lawful simplifications are applied by a bottom-up transformation over the recursively defined \scala{Parser} \textsc{ast}.
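The shape of this traversal can be sketched on a toy \scala{Parser} fragment (invented constructors, not the real \textsc{ast}): each node first rewrites its children, then applies law-based rewrites at the root:
\begin{minted}{scala}
sealed trait Parser
case class Pure(x: String) extends Parser
case class PMap(p: Parser, f: String) extends Parser   // p.map(f)
case class Choice(l: Parser, r: Parser) extends Parser // l <|> r
case object Empty extends Parser

def rewrite(p: Parser): Parser = {
  // 1. simplify children first (bottom-up)
  val q = p match {
    case PMap(inner, f) => PMap(rewrite(inner), f)
    case Choice(l, r)   => Choice(rewrite(l), rewrite(r))
    case leaf           => leaf
  }
  // 2. apply parser laws at this node
  q match {
    case Choice(Empty, r) => r                 // empty <|> u == u
    case Choice(l, Empty) => l                 // u <|> empty == u
    case PMap(Pure(x), f) => Pure(s"$f($x)")   // pure(x).map(f) == pure(f(x))
    case other            => other
  }
}
\end{minted}
%
In the full implementation such a pass would be iterated to a fixpoint, since a rewrite at one node can expose further rewrites above it.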
