asjdfasdjf
roccojiang committed Jun 11, 2024
1 parent eaf52c8 commit af9e1c7
Showing 11 changed files with 68 additions and 30 deletions.
Binary file modified main.pdf
Binary file not shown.
Binary file modified src/background/tools.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion src/background/tools.tex
@@ -340,7 +340,7 @@ \subsubsection{Safer Patches Using Quasiquotes}
However, this safety is not foolproof: quasiquotes are not guaranteed to be well-typed or well-scoped, so the rewritten program still might not be able to compile.
They are also not \emph{hygienic}: generated code will not be able to avoid name clashes with regular code~\cite{burmako_scalameta_2017}.
The lack of hygiene can cause issues with variable capture, allowing a variable to be unintentionally shadowed by a generated variable.
Again, it is the rule author's responsibility to ensure that variable capture does not occur: \cref{sec:function-representation} in part discusses how this is handled in \texttt{parsley-garnish}.
Again, it is the rule author's responsibility to ensure that variable capture does not occur: \cref{sec:simplify-exprs} in part discusses how this is handled in \texttt{parsley-garnish}.
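As a plain-Scala illustration of the hazard (the names here are purely hypothetical, not drawn from any real rule), suppose a rewrite splices the user's expression \scala{x + 1} beneath a binding that the generated code also happens to call \scala{x}:
\begin{minted}{scala}
val x = 1          // the user's binding
val generated = {
  val x = 99       // binding introduced by unhygienic generated code
  x + 1            // the spliced user expression meant the *outer* x
}
// generated == 100, although the user's expression originally meant 1 + 1
\end{minted}
The generated binding silently captures the user's variable, changing the meaning of the spliced expression.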
\subsubsection{Semantic Information}
Semantic rules are implemented in the same manner as \cref{fig:syntactic-rule-ex}, but instead take the more powerful \scala{SemanticDocument} as an implicit parameter.
Binary file modified src/body/complex-rules.pdf
Binary file not shown.
9 changes: 8 additions & 1 deletion src/body/complex-rules.tex
@@ -2,6 +2,13 @@

\begin{document}

\ourchapter{Lint Rules Using the New Parser \textsc{ast}}

\section{Removing Left-Recursion: Revisited}
\TODO{
YEET
}

\section{Simplify Parser}
\TODO{
* Catch cases when user manually writes out a parser that is already defined in the library
@@ -17,4 +24,4 @@ \section{Convert to Parser Bridge}
* indicate limitations that this will only work if the ADT is defined in the same file, in order to extend it
}

\end{document}
\end{document}
Binary file modified src/body/impl.pdf
Binary file not shown.
10 changes: 8 additions & 2 deletions src/body/impl.tex
@@ -8,14 +8,14 @@
The following ideas are explored:
\begin{itemize}
\item First, \cref{sec:simplify-parsers} discusses how parser terms can be simplified via domain-specific optimisations based on parser laws.
\item Afterwards, \cref{sec:function-representation} discusses how expressions can be partially evaluated, to some extent. This is achieved using another intermediate \textsc{ast}, this time based on the $\lambda$-calculus, which unlocks the idea of $\beta$-reduction and normalisation as tools to reduce the complexity of these terms.
\item Afterwards, \cref{sec:simplify-exprs} discusses how expressions can be partially evaluated to some extent. This is achieved using another intermediate \textsc{ast}, this time based on the $\lambda$-calculus, which unlocks the idea of $\beta$-reduction and normalisation as tools to reduce the complexity of these terms.
\end{itemize}

% TODO
% Writing domain-specific lint rules unlocks the potential for more powerful and interesting transformations utilising specialised domain knowledge.
% Desirable:
% * inspectability for analysis (that's what we're here for!) and optimisation
% The purpose of this chapter is to describe the intermediate representations of parsers (\cref{sec:parser-representation}) and functions (\cref{sec:function-representation}).
% The purpose of this chapter is to describe the intermediate representations of parsers (\cref{sec:parser-representation}) and functions (\cref{sec:simplify-exprs}).
% Show that terms must be simplified to a normal form
% Demonstrate equivalence to dsl optimisations in staged metaprogramming
% Scalafix runs at the meta-level, outside of the phase distinction of compile- and run-time.
@@ -24,4 +24,10 @@
\subfile{impl/parser}
\subfile{impl/expr}

\section*{Summary}
This \namecref{sec:impl} introduced the idea of simplifying parsers and normalising expressions by representing both as intermediate \textsc{ast}s, improving their static inspectability.
It also demonstrated how these processes are related to the optimisation techniques used in both \texttt{parsley} Scala and \texttt{parsley} Haskell.

Having achieved promising results by applying these simplifications to the \scala{example} parser from the previous \namecref{sec:factor-leftrec}, the improved \scala{Parser} \textsc{ast} now unlocks the potential for more powerful and interesting transformations utilising specialised domain knowledge of parser combinators.

\end{document}
Binary file modified src/body/impl/expr.pdf
Binary file not shown.
75 changes: 50 additions & 25 deletions src/body/impl/expr.tex
@@ -2,26 +2,24 @@

\begin{document}

\section{Representing and Normalising Expressions}\label{sec:function-representation}
% TODO: This section is about simplifying in the general domain, so really Squid can do all of this? Still an interesting approach ig -- And shows a shortcoming of scalameta quasiquotes
\section{Representing and Normalising Expressions}\label{sec:simplify-exprs}

% \Cref{sec:parser-representation} showed that it is useful to lift Scala \textsc{ast} nodes to a specialised \scala{Parser} \textsc{ast}, making it easier to manipulate and inspect parsers.
% Crucially, this allowed us to simplify parsers via term-rewriting rules based on parser laws.
% \Cref{sec:simplify-parsers} demonstrated why this is necessary for \texttt{parsley-garnish}: transformations such as left-recursion factoring~\cref{sec:factor-leftrec} result in complex parser terms that must be simplified to be readable.

At this point, parsers such as \scala{pure} and \scala{map} still treat expressions as black boxes in the form of raw \scala{scala.meta.Term} \textsc{ast} nodes.
No steps have been taken to improve the static inspectability of these values.
This is evident from where we left off in the example from \cref{sec:simplify-example}:
The previous \namecref{sec:simplify-parsers} demonstrated the process of simplifying the \scala{Parser} \textsc{ast}, but this is not the only syntactic structure that requires simplification.
So far, parsers such as \scala{pure} and \scala{map} still treat expressions as black boxes in the form of raw \scala{scala.meta.Term} \textsc{ast} nodes.
This is evident from where the example in \cref{sec:simplify-example} left off, where the parser itself is in a simplified form but the function passed to \scala{map} is not:
\begin{minted}{scala}
val f = flip(compose(a => b => a + b)(identity))
// f is equivalent to (a => b => b + a)
val f = flip(compose((_ + _).curried)(identity))
\end{minted}
%
This mess is an artefact of the left-recursion factoring transformation -- recombination of unfolded parsers requires using higher-order functions such as \scala{flip} and \scala{compose}.
Yet again, any user would find it unacceptable if \texttt{parsley-garnish} gave this as the output of a transformation.
Therefore, these functions must be \emph{normalised} into a semantically equivalent but syntactically simpler form.

This \namecref{sec:function-representation} explores how function term normalisation can be achieved. % TODO: drawing parallels to...?
Therefore, this \namecref{sec:simplify-exprs} explores the following:
\begin{itemize}
\item How expressions can be represented as another intermediate \textsc{ast}, so that they are statically inspectable enough to be simplified.
\item The notion of \emph{normalisation}, reducing expressions into a semantically equivalent but syntactically simpler form.
\end{itemize}

\subsection{The $n$-ary Lambda Calculus}
Once again, the complexity of manipulating the generic Scalameta \textsc{ast} can be avoided by building a new intermediate \textsc{ast} representation for expression terms.
@@ -78,6 +76,20 @@ \subsubsection{$\beta$-Reduction and $\alpha$-Conversion}
&= \lambda z. y z
\end{align*}

\subsubsection{Simplifying the Example Expression}
The example from the beginning of the \namecref{sec:simplify-exprs} can thus be evaluated by hand via $\beta$-reduction, representing the higher-order functions as $\lambda$-abstractions:
\begin{align*}
\mathtt{flip(compose((\_ + \_).curried)(identity))}\quad &\mathrel{=\quad\,} \mathrm{flip}\ (\mathrm{compose}\ (\lambda a. \lambda b. a + b)\ \mathrm{identity}) \\
&\mathrel{=\quad\,} \mathrm{flip}\ ( (\lambda f. \lambda g. \lambda x. f\ (g\ x)) (\lambda a. \lambda b.\ a + b) (\lambda x. x)) \\
&\rightarrow_{\beta*} \mathrm{flip}\ ((\lambda g. \lambda x. \lambda b.\ g\ x + b)\ (\lambda x. x)) \\
&\rightarrow_{\beta*} \mathrm{flip}\ (\lambda x. \lambda b.\ x + b) \\
&\mathrel{=\quad\,} (\lambda f. \lambda x. \lambda y.\ f\ y\ x) (\lambda x. \lambda b.\ x + b) \\
&\rightarrow_{\beta*} \lambda x. \lambda y.\ y + x
\end{align*}
%
This normalised expression has the same meaning as the original, but is now suitable to be placed in the code rewrite!
The rest of the \namecref{sec:simplify-exprs} now explores how this process can be implemented in \texttt{parsley-garnish}.
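To make this procedure concrete before diving into the implementation, the following self-contained sketch (plain Scala with illustrative names only, \emph{not} \texttt{parsley-garnish}'s actual \scala{Expr} type) implements a named $\lambda$-calculus with a primitive addition node, capture-avoiding substitution, and normal-order $\beta$-reduction:
\begin{minted}{scala}
sealed trait Expr
case class Var(name: String) extends Expr
case class Abs(param: String, body: Expr) extends Expr
case class App(fun: Expr, arg: Expr) extends Expr
case class Add(lhs: Expr, rhs: Expr) extends Expr

def free(e: Expr): Set[String] = e match {
  case Var(x)    => Set(x)
  case Abs(x, b) => free(b) - x
  case App(f, a) => free(f) ++ free(a)
  case Add(l, r) => free(l) ++ free(r)
}

var counter = 0
def freshName(): String = { counter += 1; s"fresh$counter" }

// capture-avoiding substitution: e[x := v]
def subst(e: Expr, x: String, v: Expr): Expr = e match {
  case Var(`x`)    => v
  case Var(_)      => e
  case App(f, a)   => App(subst(f, x, v), subst(a, x, v))
  case Add(l, r)   => Add(subst(l, x, v), subst(r, x, v))
  case Abs(`x`, _) => e // x is shadowed here, stop
  case Abs(y, b) if free(v).contains(y) => // alpha-convert to avoid capture
    val z = freshName()
    Abs(z, subst(subst(b, y, Var(z)), x, v))
  case Abs(y, b)   => Abs(y, subst(b, x, v))
}

def normalise(e: Expr): Expr = e match {
  case App(f, a) => normalise(f) match {
    case Abs(x, b) => normalise(subst(b, x, a)) // beta-reduction
    case g         => App(g, normalise(a))
  }
  case Abs(x, b) => Abs(x, normalise(b))
  case Add(l, r) => Add(normalise(l), normalise(r))
  case v         => v
}

// flip(compose((_ + _).curried)(identity)), written out as lambdas
val compose = Abs("f", Abs("g", Abs("x", App(Var("f"), App(Var("g"), Var("x"))))))
val flip    = Abs("f", Abs("x", Abs("y", App(App(Var("f"), Var("y")), Var("x")))))
val plus    = Abs("a", Abs("b", Add(Var("a"), Var("b"))))
val id      = Abs("i", Var("i"))
val example = App(flip, App(App(compose, plus), id))
\end{minted}
%
Evaluating \scala{normalise(example)} produces \scala{Abs("x", Abs("y", Add(Var("y"), Var("x"))))}, i.e.\ $\lambda x. \lambda y.\ y + x$, agreeing with the hand derivation.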

\subsection{Representing Names}
There exists a plethora of approaches to implementing the $\lambda$-calculus, mostly differing in how they represent variable names.
This affects how variable capture is handled, and also how $\alpha$-equivalence of two terms can be determined.
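One common approach, illustrated here as a hypothetical sketch rather than \texttt{parsley-garnish}'s actual representation, is de Bruijn indices: a variable is the number of binders between its occurrence and the binder it refers to. Under this scheme, $\alpha$-equivalence collapses into plain structural equality:
\begin{minted}{scala}
sealed trait Named
case class NVar(x: String) extends Named
case class NAbs(x: String, b: Named) extends Named
case class NApp(f: Named, a: Named) extends Named

sealed trait DB
case class Ix(i: Int) extends DB       // 0 refers to the innermost binder
case class DAbs(b: DB) extends DB
case class DApp(f: DB, a: DB) extends DB

// env tracks binders from innermost outwards
// (free variables are out of scope for this sketch)
def toDB(e: Named, env: List[String] = Nil): DB = e match {
  case NVar(x)    => Ix(env.indexOf(x))
  case NAbs(x, b) => DAbs(toDB(b, x :: env))
  case NApp(f, a) => DApp(toDB(f, env), toDB(a, env))
}
\end{minted}
%
For instance, both $\lambda a. \lambda b.\ a$ and $\lambda x. \lambda y.\ x$ convert to \scala{DAbs(DAbs(Ix(1)))}, so comparing them for $\alpha$-equivalence requires no renaming at all.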
@@ -297,9 +309,11 @@ \subsection{The Expression \textsc{adt}}
\end{itemize}

\subsubsection{Evaluating Performance of Normalisation Strategies}
\TODO{
TODO: This is worthy discussion anyway, but is is worth benchmarking the performance and comparing the old and new implementations? Does this count towards evaluation?
}
\texttt{parsley-garnish} originally used a named approach with Barendregt's convention, generating fresh variable names using an atomic counter.
However, this required an extra $\alpha$-conversion pass to clean up variable names before pretty-printing the term, since the generated fresh names were not human-readable.
\TODO{TODO: graphs of benchmarks and comparison (NbE is orders of magnitude faster)}

\subsection{Lifting to the Intermediate Expression \textsc{ast}}\label{sec:lifting-expr}
The \scala{Parser} \textsc{ast} is amended to take \scala{Expr} arguments where they used to take \scala{scala.meta.Term} values.
@@ -316,7 +330,6 @@ \subsection{Lifting to the Intermediate Expression \textsc{ast}}\label{sec:lifting-expr}
%
The \scala{toExpr} extension method on \scala{scala.meta.Term} is used to lift \scala{Term} \textsc{ast} nodes to \scala{Expr} terms.
Expression lifting is invoked whenever a parser expects an expression (whether a function or simple value) as an argument.

This \namecref{sec:lifting-expr} gives a high-level overview of the three cases that \scala{toExpr} handles.

\subsubsection{Lambda Expressions}
@@ -387,6 +400,7 @@ \subsection{Normalising Expression Terms}
}
\end{minted}
%
\rj{Much code in this following section -- is it necessary or should it go in an appendix}
\paragraph{Evaluation}
Evaluation proceeds by carrying an environment mapping bound variables to their semantic representations.
Evaluating a variable looks up its name in the environment, while evaluating a lambda abstraction produces a closure using the current environment -- using \textsc{hoas} allows these closures to be represented as native Scala closures.
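The following is a minimal sketch of this normalisation-by-evaluation structure, with hypothetical names and types condensed from the description above rather than copied from the implementation:
\begin{minted}{scala}
// Syntax: terms with string names
sealed trait Term
case class TVar(x: String) extends Term
case class TAbs(x: String, b: Term) extends Term
case class TApp(f: Term, a: Term) extends Term

// Semantics: functions become native Scala closures (HOAS);
// unbound variables stay "stuck" as neutral values
sealed trait Sem
case class SFun(f: Sem => Sem) extends Sem
case class SVar(x: String) extends Sem
case class SApp(f: Sem, a: Sem) extends Sem

def eval(t: Term, env: Map[String, Sem]): Sem = t match {
  case TVar(x)    => env.getOrElse(x, SVar(x))
  case TAbs(x, b) => SFun(v => eval(b, env + (x -> v))) // closure over env
  case TApp(f, a) => eval(f, env) match {
    case SFun(g) => g(eval(a, env)) // beta-reduction is Scala application
    case stuck   => SApp(stuck, eval(a, env))
  }
}

// Reading a semantic value back into a term; binder depth gives fresh names
def reify(v: Sem, lvl: Int): Term = v match {
  case SFun(f)    => val x = s"v$lvl"; TAbs(x, reify(f(SVar(x)), lvl + 1))
  case SVar(x)    => TVar(x)
  case SApp(f, a) => TApp(reify(f, lvl), reify(a, lvl))
}

def normalise(t: Term): Term = reify(eval(t, Map.empty), 0)
\end{minted}
%
For example, \scala{normalise} applied to $(\lambda x. \lambda y.\ x)\ (\lambda z. z)$ yields $\lambda v_0. \lambda v_1.\ v_1$; no explicit substitution function is ever written, as the host language performs it.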
@@ -450,32 +464,41 @@ \subsection{Lowering Back to the Scalameta \textsc{ast}}

\subsection{Discussion}
\texttt{parsley} Haskell, as a staged parser combinator library, also has the ability to inspect and optimise the code of user-defined functions.
The approach taken by \texttt{parsley-garnish} and \text{parsley} share many similarities, both using the $\lambda$-calculus as a core language for expressions.
In both cases, the need to reduce expression terms is motivated by how parser simplifications involve fusion, resulting in a function application.
The approaches taken by \texttt{parsley-garnish} and \texttt{parsley} share many similarities, both using the $\lambda$-calculus as a core language to normalise expressions.
In both cases, the need to reduce expression terms is motivated by how parser simplifications involve fusion, which results in function applications that can be partially evaluated.

However, the two have different motivations and requirements for normalising expressions, so their approaches differ in some ways.
\Cref{fig:nbe-vs-parsley} illustrates these differences.
However, the two have different motivations and requirements for normalising expressions, so their approaches differ in some ways --
\cref{fig:nbe-vs-parsley} illustrates these differences.

\paragraph{Syntactic representation}
\subsubsection{Syntactic representation}
Unlike \texttt{parsley-garnish}, \texttt{parsley} has a two-level syntactic representation for expressions.
\haskell{Defunc} is akin to a deep embedding of higher-order functions, representing them as a \textsc{gadt}: this process is known as \emph{defunctionalisation}~\cite{reynolds_defunc_1972,danvy_defunctionalization_2001}.
This helps facilitate certain parser law optimisations which require pattern matching on the \scala{identity} function, for example.
After this step, \haskell{Defunc} values are then brought into the lower-level $\lambda$-calculus representation, to be normalised by $\beta$-reduction.
This helps facilitate certain parser law optimisations which require pattern matching on functions as well as parsers, for example:
\begin{equation*}
\text{\scala{pure(identity) <*> u = u}}
\end{equation*}
After this step, \haskell{Defunc} values are then brought into the lower-level $\lambda$-calculus representation \haskell{Lambda}, to be normalised by $\beta$-reduction.

\paragraph{Normalisation strategy}
At the moment, \texttt{parsley-garnish} does not have a need to implement any parser simplifications based on these laws, although this may change in the future.
Adding an extra defunctionalised layer to the expression \textsc{ast} would be fairly straightforward.
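Such a layer might look like the following sketch (hypothetical constructors for illustration; not \texttt{parsley}'s actual \haskell{Defunc} nor any existing \texttt{parsley-garnish} type), where giving \scala{identity} its own constructor makes the law a single pattern match:
\begin{minted}{scala}
sealed trait Defunc
case object Id extends Defunc                  // a known function: identity
case class Opaque(code: String) extends Defunc // arbitrary user code

sealed trait Parser
case class Pure(f: Defunc) extends Parser
case class Ap(pf: Parser, pa: Parser) extends Parser // pf <*> pa
case class NonTerminal(name: String) extends Parser

// law: pure(identity) <*> u == u
def simplify(p: Parser): Parser = p match {
  case Ap(Pure(Id), u) => simplify(u)
  case Ap(pf, pa)      => Ap(simplify(pf), simplify(pa))
  case other           => other
}
\end{minted}
%
With only an opaque expression term, recognising the \scala{Id} case would require fragile syntactic inspection; the dedicated constructor makes the law trivially applicable.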

\subsubsection{Normalisation strategy}
\texttt{parsley} normalises terms to full $\eta\beta$-\textsc{nf}, whereas \texttt{parsley-garnish} only normalises to $\beta$-\textsc{nf}.
This is because $\eta$-reduction in Scala 2 is not as straightforward as in Haskell, and is not always possible -- in most cases the appropriate reduction is instead to convert lambdas to placeholder syntax.
This is left as future work.
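To illustrate why placeholder syntax is the natural target (using a hypothetical function \scala{g}):
\begin{minted}{scala}
def g(x: Int, y: Int): Int = x + 10 * y

// In Haskell, \x -> f x eta-reduces to the bare name f. In Scala 2,
// methods are not values, and a partial application like this one has
// no bare-name form at all; placeholder syntax is the closest analogue.
val explicit: Int => Int    = x => g(x, 1) // lambda wrapping g
val placeholder: Int => Int = g(_, 1)      // idiomatic rewrite target
\end{minted}
%
Both denote the same function; the rewrite changes only the syntax.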

\texttt{parsley} performs reduction-based normalisation on a \textsc{hoas} representation of lambda expressions.
In \texttt{parsley}, normalisation is implemented as a reduction-based approach over the \textsc{hoas} \haskell{Lambda} datatype.
Normalisation by $\beta$-reduction with Haskell function application brings this to $\beta$-\textsc{whnf}.
Generating this into code is where this is brought to $\beta$-\textsc{nf} as desired, as well as an extra step for $\eta$-reduction to put the term into full $\eta\beta$-\textsc{nf}.
Then, code generation brings this further to $\beta$-\textsc{nf} as desired, with an extra $\eta$-reduction step putting the term into full $\eta\beta$-\textsc{nf}.

The main reason why \texttt{parsley-garnish} takes a different normalisation approach is because unlike \texttt{parsley}, there is still a need for $\alpha$-equivalence checking after normalisation.
Whereas in \texttt{parsley} the normalised forms are directly used for code generation, in \texttt{parsley-garnish} these terms continue to be analysed before being pretty-printed as code patches.
In \texttt{parsley}, the normalised forms are immediately utilised for code generation, so they can be kept as \textsc{hoas} the entire time, without representing variables with any names.
Conversely, in \texttt{parsley-garnish}, these normalised terms undergo further analysis before being transformed into code patches for pretty-printing.

% Representation as a lambda calc has allocation overhead, but greatly simplifies function evaluation via beta reduction, instead of having to deal with high-level representations of compose/id (not too bad tbh) and flip (annoying).

% TODO: scala 3 macros, squid quasiquotes?

\begin{figure}[htbp]
\begin{equation*}
% Created here (but modified slightly) https://tikzcd.yichuanshen.de/#N4Igdg9gJgpgziAXAbVABwnAlgFyxMJZARgBoAGAXVJADcBDAGwFcYkQAdDnGADx2AAJAPIBBAMoBfEJNLpMufIRQBmCtTpNW7Lj35CxUgHq6+AsADNps+djwEiZAEwaGLNok7czwAGLMwAGN7MGs5EAw7JSI1Fxo3bU9TfQAZegBbACMoejDbRQcUcnV4rQ8vPQFxGHT6MDxAuDyIhRCiYrjNdx1vfXEAT3r6XmsNGCgAc3giUAsAJwh0pGKQHAgkMi7EirMcARg3aRpGekyYRgAFVujPRhgLHBlw+cWNmjWkNS3y5L2BOZgWAs-SOIBOZ0u10KYPujxsIBeS0QTne60QKwSP16f2AAIsd2CoPB5yuUWhdweT1mCyRKNWaK+mJ6lT2wEgc1qjCwcBgkgABABePm-VkAoEg4UcQJYOaBSUs-aHSXS2Xy3b-e4EnD8kDHU4kqHKEBzLATAAWcOeNKQdI+iE2XLA5RycDN41133YyGQfIAtHzKJQZJRJEA
@@ -497,4 +520,6 @@ \subsection{Discussion}
\label{fig:nbe-vs-parsley}
\end{figure}

% TODO: This section is about simplifying in the general domain, so really Squid can do all of this? Still an interesting approach ig -- And shows a shortcoming of scalameta quasiquotes

\end{document}
Binary file modified src/body/impl/parser.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion src/body/impl/parser.tex
@@ -93,7 +93,7 @@ \subsubsection{Simplifying the Example Parser}\label{sec:simplify-example}
\end{minted}
%
The parser is now expressed in a much simplified form, in a similar style to how it would be written by hand.
The remaining challenge is to simplify the contents of the expression \scala{f}, which is tackled in \cref{sec:function-representation}.
The remaining challenge is to simplify the contents of the expression \scala{f}, which is tackled in \cref{sec:simplify-exprs}.

\subsection{Implementing Rewrites on the Parser \textsc{ast}}
Lawful simplifications are applied by a bottom-up transformation over the recursively defined \scala{Parser} \textsc{ast}.
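The shape of this traversal can be sketched on a toy \scala{Parser} fragment (invented constructors, not the real \textsc{ast}): each node first rewrites its children, then applies law-based rewrites at the root:
\begin{minted}{scala}
sealed trait Parser
case class Pure(x: String) extends Parser
case class PMap(p: Parser, f: String) extends Parser   // p.map(f)
case class Choice(l: Parser, r: Parser) extends Parser // l <|> r
case object Empty extends Parser

def rewrite(p: Parser): Parser = {
  // 1. simplify children first (bottom-up)
  val q = p match {
    case PMap(inner, f) => PMap(rewrite(inner), f)
    case Choice(l, r)   => Choice(rewrite(l), rewrite(r))
    case leaf           => leaf
  }
  // 2. apply parser laws at this node
  q match {
    case Choice(Empty, r) => r                 // empty <|> u == u
    case Choice(l, Empty) => l                 // u <|> empty == u
    case PMap(Pure(x), f) => Pure(s"$f($x)")   // pure(x).map(f) == pure(f(x))
    case other            => other
  }
}
\end{minted}
%
In the full implementation such a pass would be iterated to a fixpoint, since a rewrite at one node can expose further rewrites above it.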
