Commit

Massage some sections
roccojiang committed Jun 11, 2024
1 parent 7739bfc commit 379d526
Showing 8 changed files with 109 additions and 72 deletions.
Binary file modified src/body/impl/expr.pdf
Binary file not shown.
93 changes: 64 additions & 29 deletions src/body/impl/expr.tex

Large diffs are not rendered by default.

Binary file modified src/body/impl/parser.pdf
Binary file not shown.
39 changes: 28 additions & 11 deletions src/body/impl/parser.tex
@@ -13,6 +13,23 @@ \section{Simplifying Parsers}\label{sec:simplify-parsers}
\item \texttt{parsley-garnish} performs rewrites on the parser \textsc{ast} to produce a more readable \emph{textual representation of code}.
\end{itemize}

\TODO{Place this in the right bit (I think intro is good)}
As noted by \textcite{gibbons_dsls_2014}, a deep-embedded \textsc{dsl} consists of two components:
\begin{itemize}
\item A representation of the language's abstract syntax, in the form of the aforementioned datatype.
\item Some traversals over the datatype, which give \emph{semantics} to that syntax.
\end{itemize}
A deep-embedded \textsc{dsl} and a linter for that \textsc{dsl} can share the same abstract syntax, but differ in the semantic interpretation of that syntax:
% TODO: I'm not really getting my point across...
\begin{itemize}
\item The \textsc{dsl} semantics are evaluation. In this case, \texttt{parsley} interprets its syntax to output a parser.
\item The linter's semantics are pretty-printing. In this case, \texttt{parsley-garnish} interprets the syntax to output a human-readable representation of the parser.
\end{itemize}
% semantics for parsley: evaluate parser
% semantics for parsley-garnish: pretty-print the parser
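To make the distinction concrete, the following is a minimal sketch (not the actual \texttt{parsley} or \texttt{parsley-garnish} datatypes) of a single abstract syntax given two interpretations.
The names \scala{P}, \scala{Str}, \scala{Then}, \scala{Alt}, \scala{run}, and \scala{pretty} are hypothetical, and \scala{run} uses naive backtracking choice purely to keep the example short:
\begin{minted}{scala}
object TwoSemantics {
  // Hypothetical miniature abstract syntax for a parser DSL (an assumption,
  // not the datatype used by parsley or parsley-garnish).
  sealed trait P
  final case class Str(s: String) extends P
  final case class Then(l: P, r: P) extends P
  final case class Alt(l: P, r: P) extends P

  // Semantics 1 (evaluation): run the parser, returning the unconsumed
  // input on success and None on failure.
  def run(p: P, in: String): Option[String] = p match {
    case Str(s)     => if (in.startsWith(s)) Some(in.drop(s.length)) else None
    case Then(l, r) => run(l, in).flatMap(rest => run(r, rest))
    case Alt(l, r)  => run(l, in).orElse(run(r, in))
  }

  // Semantics 2 (pretty-printing): render the same syntax as source text.
  def pretty(p: P): String = p match {
    case Str(s)     => "string(\"" + s + "\")"
    case Then(l, r) => s"(${pretty(l)} ~> ${pretty(r)})"
    case Alt(l, r)  => s"(${pretty(l)} | ${pretty(r)})"
  }

  val ab: P = Alt(Then(Str("a"), Str("b")), Str("c"))
}
\end{minted}
Both functions traverse the same datatype; only the result of the traversal differs. For instance, \scala{run(ab, "abx")} evaluates the term against an input, whereas \scala{pretty(ab)} produces a textual rendering of it.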

Quote parsley Haskell: "parsers of hand-written quality"

% TODO: fix the above "intro" ------------------------------------------------------------------------------

\subsection{Parser Laws}
@@ -21,7 +38,7 @@ \subsection{Parser Laws}
These same principles can be used by \texttt{parsley-garnish} to simplify parser terms to be more human-readable.

\Cref{fig:parser-laws} shows the subset of parser laws utilised by \texttt{parsley-garnish} for parser simplification.
Most of the laws in \cref{fig:parser-laws} have already been shown to hold for Parsley by \textcite{willis_garnishing_2018}; an additional proof for \cref{eqn:alt-fmap-absorb} can be found in \cref{appendix:parser-law-proofs}.
Most of these laws have already been shown to hold for Parsley by \textcite{willis_garnishing_2018}; an additional proof for \cref{eqn:alt-fmap-absorb} can be found in \cref{appendix:parser-law-proofs}.

\begin{figure}[htbp]
\centering
@@ -44,15 +61,15 @@ \subsection{Parser Laws}
\label{fig:parser-laws}
\end{figure}

\subsubsection{Simplifying the Example Parser}
This section provides a worked example of how the parser in \cref{fig:leftrec-example-bad} is simplified using parser laws.
Most of the noise in \cref{fig:leftrec-example-bad} comes from the large number of \scala{empty} combinators.
These can be eliminated using \cref{eqn:alt-left-neutral,eqn:alt-right-neutral,eqn:alt-empty-absorb,eqn:alt-fmap-absorb}:
% lazy val expr: Parsley[String] = chain.postfix(string("b"))(
\subsubsection{Simplifying the Example Parser}\label{sec:simplify-example}
It is useful to illustrate how these laws can be used to simplify a parser term by starting with the parser in \cref{fig:leftrec-example-bad} --
most of the noise in \scala{example} comes from the large number of \scala{empty} combinators.
These can first be eliminated using \cref{eqn:alt-left-neutral,eqn:alt-right-neutral,eqn:alt-empty-absorb,eqn:alt-fmap-absorb}:
% lazy val example: Parsley[String] = chain.postfix(string("b"))(
% (pure(identity).map(compose((_ + _).curried))).map(flip) <*> string("a")
% )
\begin{minted}[escapeinside=\%\%]{scala}
lazy val expr: Parsley[String] = chain.postfix(string("b"))(
lazy val example: Parsley[String] = chain.postfix(string("b"))(
(pure(%\textcolor{gray}{identity}%).map(%\textcolor{gray}{compose((\_ + \_).curried)}%)).map(%\textcolor{gray}{flip}%) <*> string("a")
)
\end{minted}
@@ -112,15 +129,15 @@ \subsection{Implementing Rewrites on the Parser \textsc{ast}}
%
Therefore, any transformation on parsers can be defined without having to worry about recursion boilerplate: the act of traversal itself is fully abstracted away and encapsulated within the \scala{transform} method.
Using \scala{rewrite}, parser simplification can then be expressed in a clean and maintainable manner:
% // p.map(f).map(g) == p.map(g compose f)
% // pure(f) <*> pure(x) == pure(f(x))
% // u <|> empty == u
% // pure(f) <|> u == pure(f)
\begin{minted}{scala}
def simplify: Parser = this.rewrite {
// p.map(f).map(g) == p.map(g compose f)
case FMap(FMap(p, f), g) => FMap(p, compose(g, f))
// pure(f) <*> pure(x) == pure(f(x))
case Pure(f) <*> Pure(x) => Pure(App(f, x))
// u <|> empty == u
case u <|> Empty => u
// pure(f) <|> u == pure(f)
case Pure(f) <|> _ => Pure(f)
...
}
Binary file modified src/body/leftrec.pdf
Binary file not shown.
34 changes: 17 additions & 17 deletions src/body/leftrec.tex
@@ -19,7 +19,7 @@ \subsection{The Need for an Intermediate \textsc{ast}}\label{sec:parser-ast-moti

Since \texttt{parsley-garnish} is a linter, by nature, it has access to an explicit grammar representation in the form of the full \scala{scala.meta.Tree} \textsc{ast} of the source program.
However, this datatype represents general-purpose abstract Scala syntax, rather than the abstract syntax of a specialised parser combinator \textsc{dsl}.
This makes it clumsier to perform domain-specific operations over the \textsc{ast}.
This makes it poorly suited to performing domain-specific operations over the \textsc{ast}.

Take for example the task of combining two \textsc{ast} nodes \scala{Term.Name("p")} and \scala{Term.Name("q")}, representing named parsers \scala{p} and \scala{q}, with the \emph{ap} combinator \scala{<*>}.
This operation can be concisely expressed with Scalameta quasiquotes, rather than manually writing out the full explicit \textsc{ast}:
@@ -94,7 +94,7 @@ \subsubsection{The Parser \textsc{adt}}
This makes working with \scala{Parser} terms feel closer to writing \texttt{parsley} code.
For example, notice how constructing the \emph{code} representation of the \scala{example} parser resembles how the original parser itself would be written:
\begin{minted}{scala}
val exNT = NonTerminal(Sym("path/to/package/ObjectName.example."))
val exNT = NonTerminal(Sym(Term.Name("example").symbol))
// val ex: Parsley[String] = (ex, string("a")).zipped( _ + _ ) | string("b")
val ex: Parser = List(exNT, Str("a")).zipped(q"_ + _") <|> Str("b")
@@ -107,15 +107,14 @@ \subsubsection{The Parser \textsc{adt}}
% This representation also then gives us for free the implementation for lint rules such as \emph{Simplify Complex Parsers} rule, which applies parser laws to simplify parsers.
\subsection{Lifting to the Intermediate Parser \textsc{ast}}
Converting the raw Scala \textsc{ast} to the intermediate \textsc{ast} requires the following basic operations:
Converting the raw Scala \textsc{ast} to this intermediate parser combinator \textsc{ast} requires the following basic operations:
\begin{enumerate}
\item Identifying all named parsers defined in the source program -- these correspond to non-terminal symbols in the grammar.
\item Lifting the definition each parser into the intermediate \textsc{ast}, as a \scala{Parser} object.
\item Collecting these into a map to represent the high-level grammar: the unique symbol of each named parser is mapped to its corresponding \scala{Parser} object, along with some extra meta-information required for the transformation.
\item Lifting the definition of each parser into the intermediate \textsc{ast}, i.e. a \scala{Parser} object.
\item Collecting these into a map to represent the high-level grammar -- the unique symbol of each named parser is mapped to its corresponding \scala{Parser} object, along with extra meta-information required for the transformation.
\end{enumerate}
%
Most importantly, this meta-information includes a reference to a parser's original node in the Scala \textsc{ast}, so that any lint diagnostics or code rewrites can be applied to the correct location in the source file.
This is simply defined as:
Most importantly, this meta-information includes a reference to a parser's original node in the Scala \textsc{ast}, so lint diagnostics or code rewrites can be applied to the correct location in the source file:
\begin{minted}{scala}
case class ParserDefn(name: Term.Name, parser: Parser, tpe: Type.Name, originalTree: Term)
\end{minted}
@@ -141,10 +140,10 @@ \subsubsection{Identifying Named Parsers}
%
% In this case, the type of \scala{example} is explicitly annotated by the user since this is required for recursive definitions.
% However in general, users will not explicitly annotate the types of their parsers, allowing the Scala compiler to infer the type.
Note that the \scala{decltpe} field refers to the syntax of the explicit type annotation, not the semantic information of the inferred type of the variable.
Note that the \scala{decltpe} field refers to the \emph{syntax} of the explicit type annotation, not the \emph{semantic} information of the variable's inferred type.
Therefore, this field will not always be present, so in the general case, the type must be queried via a symbol information lookup like so:
\begin{minted}{scala}
tree match {
exampleTree match {
case Defn.Val(_, List(Pat.Var(varName)), _, body) =>
println(s"qualified symbol = ${varName.symbol}")
varName.symbol.info.get.signature match {
@@ -171,10 +170,10 @@ \subsubsection{Converting Scalameta Terms to the Parser \textsc{adt}}
This involves pattern matching on the \scala{scala.meta.Term} to determine which parser combinator it represents, and then constructing the appropriate \scala{Parser} instance.
Each \scala{Parser} defines a partial function \scala{fromTerm} to instantiate a parser from the appropriate \scala{scala.meta.Term}.
These \scala{fromTerm} methods perform the ugly work of pattern matching on the low-level syntactic constructs of the Scala \textsc{ast}.
These \scala{fromTerm} methods perform the menial work of pattern matching on the low-level syntactic constructs of the Scala \textsc{ast}.
All \scala{fromTerm} methods are combined to define the \scala{toParser} extension method on \scala{scala.meta.Term} -- this is where \textsc{ast} nodes are lifted to their corresponding \scala{Parser} representation.
The pattern matching example from \cref{sec:parser-ast-motivation} makes a reappearance in the definition of \scala{Ap.fromTerm}, where the arguments to the \scala{<*>} combinator are recursively lifted to \scala{Parser} objects:
The pattern matching example from \cref{sec:parser-ast-motivation} makes a reappearance in the definition of \scala{Ap.fromTerm}, where the arguments to the \scala{<*>} combinator are instead recursively lifted to \scala{Parser} objects:
% Use Scalafix's \scala{SymbolMatcher} to match tree nodes that resolve to a specific set of symbols.
% This makes use of semantic information from SemanticDB, so we are sure that a \scala{<*>} is actually within the \scala{parsley.Parsley} package, rather than some other function with the same name.
% This is much more robust compared to HLint, which suffers from false positives due to its reliance on syntactic information only.
@@ -192,7 +191,7 @@ \subsubsection{Converting Scalameta Terms to the Parser \textsc{adt}}
}
\end{minted}
%
Where a combinator takes a non-parser argument, this is treated as a black box and kept as a raw \textsc{ast} node:
Where a combinator takes a non-parser argument, this is treated as a black box and kept as a raw \textsc{ast} node of type \scala{scala.meta.Term}:
\begin{minted}{scala}
// x: A, pure(x): Parsley[A]
case class Pure(x: Term) extends Parser
@@ -226,10 +225,11 @@ \subsubsection{Building the Grammar Map}
}.toMap
\end{minted}
\subsection{Lowering Back to the Scalameta \textsc{ast}}
\subsection{Lowering Back to the Scalameta \textsc{ast}}\label{sec:lowering-parsers}
After all necessary transformations have been applied to parser terms, the final step is to convert them back to a textual representation to be applied as a Scalafix patch.
Parsers can be lowered back to \scala{scala.meta.Term} nodes by the inverse of the original \scala{fromTerm} transformation.
The \scala{Parser} trait defines this transformation as the method \scala{term}, using quasiquotes to simplify the construction of the \scala{scala.meta.Term} nodes.
For example:
\begin{minted}{scala}
case class Zipped(func: Function, parsers: List[Parser]) extends Parser {
val term: Term = q"(..${parsers.map(_.term)}).zipped(${func.term})"
@@ -242,7 +242,7 @@ \subsection{Implementing the Left-Recursion Transformation}
\TODO{TODO \\}
\subsubsection{Success...?}
Thus, running the transformation on the \scala{example} parser yields the output in \cref{fig:leftrec-example-bad}.
Running the transformation on the \scala{example} parser yields the output in \cref{fig:leftrec-example-bad}.
%
\begin{figure}[htbp]
\begin{minted}{scala}
@@ -262,12 +262,12 @@ \subsubsection{Success...?}
\label{fig:leftrec-example-bad}
\end{figure}
%
This is disappointing, to say the least.
This is... disappointing, to say the least.
There are \emph{many} things wrong with the transformed output:
\begin{itemize}
\item This output is horrendously complex and unreadable. The intent of the parser is entirely obfuscated in a sea of combinators.
\item The parser is horrendously complex and unreadable, its intent entirely obfuscated in a sea of combinators.
\item Having to define the \scala{flip} and \scala{compose} functions is not ideal, but inlining them as lambdas would make the code even worse.
\item The parser does not even typecheck -- unlike classical Hindley-Milner-based type systems, Scala only supports local type inference~\cite{cremet_core_2006}. As a result, the compiler is unable to correctly infer correct types for \scala{flip} and also asks for explicit type annotations in the lambda \scala{(_ + _).curried}.
\item Even worse, the parser does not even typecheck -- unlike classical Hindley-Milner-based type systems, Scala only has \emph{local} type inference~\cite{cremet_core_2006}. As a result, the compiler is unable to infer the correct types for \scala{flip} and also asks for explicit type annotations in the lambda \scala{(_ + _).curried}.
\end{itemize}
\end{document}
Binary file modified src/introduction/introduction.pdf
Binary file not shown.
15 changes: 0 additions & 15 deletions src/introduction/introduction.tex
@@ -24,19 +24,4 @@ \section{Project Goals}
Additionally, for certain issues that can be automatically fixed, \texttt{parsley-garnish} will provide automated actions to resolve the issue. % TODO: via code transformations - put this in the background?
The goal of \texttt{parsley-garnish} is to be used as a companion library to \texttt{parsley}, in order to improve its ease of adoption and to help users enforce best practices.

\TODO{Place this in the right bit (I think intro is good)}
As noted by \textcite{gibbons_dsls_2014}, a deep-embedded \textsc{dsl} consists of two components:
\begin{itemize}
\item A representation of the language's abstract syntax, in the form of the aforementioned datatype.
\item Some traversals over the datatype, which gives \emph{semantics} to that syntax.
\end{itemize}
A deep-embedded \textsc{dsl} and a linter for that \textsc{dsl} can share the same abstract syntax, but differ in the semantic interpretation of that syntax:
% TODO: I'm not really getting my point across...
\begin{itemize}
\item The \textsc{dsl} semantics are evaluation. In this case, \texttt{parsley} interprets its syntax to output a parser.
\item The linter's semantics are pretty-printing. In this case, \texttt{parsley-garnish} interprets the syntax to output a human-readable representation of the parser.
\end{itemize}
% semantics for parsley: evaluate parser
% semantics for parsley-garnish: pretty-print the parser

\end{document}
