Massage some sections

roccojiang · Jun 11, 2024 · 379d526
1 parent 7739bfc
commit 379d526

Showing 8 changed files with 109 additions and 72 deletions.
diff --git a/src/body/impl/expr.pdf b/src/body/impl/expr.pdf
diff --git a/src/body/impl/expr.tex b/src/body/impl/expr.tex
diff --git a/src/body/impl/parser.pdf b/src/body/impl/parser.pdf
diff --git a/src/body/impl/parser.tex b/src/body/impl/parser.tex
@@ -13,6 +13,23 @@ \section{Simplifying Parsers}\label{sec:simplify-parsers}
   \item \texttt{parsley-garnish} performs rewrites on the parser \textsc{ast} to produce a more readable \emph{textual representation of code}.
 \end{itemize}
 
+\TODO{Place this in the right bit (I think intro is good)}
+As noted by \textcite{gibbons_dsls_2014}, a deep-embedded \textsc{dsl} consists of two components:
+\begin{itemize}
+  \item A representation of the language's abstract syntax, in the form of the aforementioned datatype.
+  \item Some traversals over the datatype, which give \emph{semantics} to that syntax.
+\end{itemize}
+A deep-embedded \textsc{dsl} and a linter for that \textsc{dsl} can share the same abstract syntax, but differ in the semantic interpretation of that syntax:
+% TODO: I'm not really getting my point across...
+\begin{itemize}
+  \item The \textsc{dsl} semantics are evaluation. In this case, \texttt{parsley} interprets its syntax to output a parser.
+  \item The linter's semantics are pretty-printing. In this case, \texttt{parsley-garnish} interprets the syntax to output a human-readable representation of the parser.
+\end{itemize}
+% semantics for parsley: evaluate parser
+% semantics for parsley-garnish: pretty-print the parser
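The distinction between the two interpretations can be sketched in a few lines of plain Scala. This is only a minimal illustration under stated assumptions, not the datatype used by \texttt{parsley} or \texttt{parsley-garnish}: the names \scala{Syntax}, \scala{Lit}, \scala{Or}, \scala{Then}, \scala{run} and \scala{pretty} are hypothetical, and \scala{Or} always backtracks, which is a deliberate simplification.

```scala
// Hypothetical miniature deep embedding; none of these names come from parsley.
sealed trait Syntax
case class Lit(s: String) extends Syntax             // match a literal string
case class Or(l: Syntax, r: Syntax) extends Syntax   // try l, otherwise r
case class Then(l: Syntax, r: Syntax) extends Syntax // l followed by r

// Semantics 1 (the DSL's view): evaluate the syntax as a parser.
// On success, returns the input left over after parsing.
def run(p: Syntax, in: String): Option[String] = p match {
  case Lit(s)     => if (in.startsWith(s)) Some(in.drop(s.length)) else None
  case Or(l, r)   => run(l, in).orElse(run(r, in))
  case Then(l, r) => run(l, in).flatMap(rest => run(r, rest))
}

// Semantics 2 (the linter's view): pretty-print the same syntax as source text.
def pretty(p: Syntax): String = p match {
  case Lit(s)     => "string(\"" + s + "\")"
  case Or(l, r)   => "(" + pretty(l) + " | " + pretty(r) + ")"
  case Then(l, r) => "(" + pretty(l) + " ~> " + pretty(r) + ")"
}

// One piece of abstract syntax, shared by both interpretations.
val ab: Syntax = Then(Lit("a"), Or(Lit("b"), Lit("c")))
```

Both functions traverse the same \scala{Syntax} value; only the result type of the traversal differs, which is the sense in which the \textsc{dsl} and its linter share abstract syntax but not semantics.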
+
+Quote parsley Haskell: ``parsers of hand-written quality''
+
 % TODO: fix the above "intro" ------------------------------------------------------------------------------
 
 \subsection{Parser Laws}
@@ -21,7 +38,7 @@ \subsection{Parser Laws}
 These same principles can be used by \texttt{parsley-garnish} to simplify parser terms to be more human-readable.
 
 \Cref{fig:parser-laws} shows the subset of parser laws utilised by \texttt{parsley-garnish} for parser simplification.
-Most of the laws in \cref{fig:parser-laws} have already been shown to hold for Parsley by \textcite{willis_garnishing_2018}; an additional proof for \cref{eqn:alt-fmap-absorb} can be found in \cref{appendix:parser-law-proofs}.
+Most of these laws have already been shown to hold for Parsley by \textcite{willis_garnishing_2018}; an additional proof for \cref{eqn:alt-fmap-absorb} can be found in \cref{appendix:parser-law-proofs}.
 
 \begin{figure}[htbp]
 \centering
@@ -44,15 +61,15 @@ \subsection{Parser Laws}
 \label{fig:parser-laws}
 \end{figure}
 
-\subsubsection{Simplifying the Example Parser}
-This section provides a worked example of how the parser in \cref{fig:leftrec-example-bad} is simplified using parser laws.
-Most of the noise in \cref{fig:leftrec-example-bad} comes from the large number of \scala{empty} combinators.
-These can be eliminated using \cref{eqn:alt-left-neutral,eqn:alt-right-neutral,eqn:alt-empty-absorb,eqn:alt-fmap-absorb}:
-% lazy val expr: Parsley[String] = chain.postfix(string("b"))(
+\subsubsection{Simplifying the Example Parser}\label{sec:simplify-example}
+To illustrate how these laws simplify a parser term, consider the parser in \cref{fig:leftrec-example-bad}:
+most of the noise in \scala{example} comes from the large number of \scala{empty} combinators.
+These can first be eliminated using \cref{eqn:alt-left-neutral,eqn:alt-right-neutral,eqn:alt-empty-absorb,eqn:alt-fmap-absorb}:
+% lazy val example: Parsley[String] = chain.postfix(string("b"))(
 %   (pure(identity).map(compose((_ + _).curried))).map(flip) <*> string("a")
 % )
 \begin{minted}[escapeinside=\%\%]{scala}
-lazy val expr: Parsley[String] = chain.postfix(string("b"))(
+lazy val example: Parsley[String] = chain.postfix(string("b"))(
   (pure(%\textcolor{gray}{identity}%).map(%\textcolor{gray}{compose((\_ + \_).curried)}%)).map(%\textcolor{gray}{flip}%) <*> string("a")
 )
 \end{minted}
@@ -112,15 +129,15 @@ \subsection{Implementing Rewrites on the Parser \textsc{ast}}
 %
 Therefore, any transformation on parsers can be defined without having to worry about recursion boilerplate: the act of traversal itself is fully abstracted away and encapsulated within the \scala{transform} method.
 Using \scala{rewrite}, parser simplification can then be expressed in a clean and maintainable manner:
+% // p.map(f).map(g) == p.map(g compose f)
+% // pure(f) <*> pure(x) == pure(f(x))
+% // u <|> empty == u
+% // pure(f) <|> u == pure(f)
 \begin{minted}{scala}
 def simplify: Parser = this.rewrite {
-  // p.map(f).map(g) == p.map(g compose f)
   case FMap(FMap(p, f), g) => FMap(p, compose(g, f))
-  // pure(f) <*> pure(x) == pure(f(x))
   case Pure(f) <*> Pure(x) => Pure(App(f, x))
-  // u <|> empty == u
   case u <|> Empty => u
-  // pure(f) <|> u == pure(f)
   case Pure(f) <|> _ => Pure(f)
   ...
 }
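For readers who want to experiment with this style of rewriting, the following is a self-contained sketch of how \scala{transform} and \scala{rewrite} might be realised on a much smaller \textsc{adt}. It is an assumption-laden illustration rather than the actual \texttt{parsley-garnish} implementation: the four-constructor \scala{Parser} type is hypothetical, functions are kept as opaque strings, and only three of the laws are encoded.

```scala
// Hypothetical miniature Parser ADT; functions are represented as opaque strings.
sealed trait Parser {
  // Apply `pf` bottom-up: rewrite the children first, then the rebuilt node.
  def transform(pf: PartialFunction[Parser, Parser]): Parser = {
    val rebuilt: Parser = this match {
      case FMap(p, f) => FMap(p.transform(pf), f)
      case Alt(l, r)  => Alt(l.transform(pf), r.transform(pf))
      case leaf       => leaf
    }
    pf.applyOrElse(rebuilt, identity[Parser])
  }

  // Repeat `transform` until nothing changes (a fixed point).
  def rewrite(pf: PartialFunction[Parser, Parser]): Parser = {
    val next = transform(pf)
    if (next == this) this else next.rewrite(pf)
  }

  def simplify: Parser = rewrite {
    // p.map(f).map(g) == p.map(g compose f)
    case FMap(FMap(p, f), g) => FMap(p, s"($g compose $f)")
    // u <|> empty == u
    case Alt(u, Empty) => u
    // empty <|> u == u
    case Alt(Empty, u) => u
  }
}
case object Empty extends Parser
case class Str(s: String) extends Parser
case class FMap(p: Parser, f: String) extends Parser
case class Alt(l: Parser, r: Parser) extends Parser

// A deliberately noisy term to simplify.
val noisy: Parser = Alt(Empty, FMap(FMap(Alt(Str("a"), Empty), "f"), "g"))
```

Calling \scala{noisy.simplify} removes both \scala{Empty} alternatives and fuses the two maps into one. The recursion lives entirely in \scala{transform}, so each law remains a single \scala{case} clause.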

diff --git a/src/body/leftrec.pdf b/src/body/leftrec.pdf
diff --git a/src/body/leftrec.tex b/src/body/leftrec.tex
@@ -19,7 +19,7 @@ \subsection{The Need for an Intermediate \textsc{ast}}\label{sec:parser-ast-moti
 
 Since \texttt{parsley-garnish} is a linter, by nature, it has access to an explicit grammar representation in the form of the full \scala{scala.meta.Tree} \textsc{ast} of the source program.
 However, this datatype represents general-purpose abstract Scala syntax, rather than the abstract syntax of a specialised parser combinator \textsc{dsl}.
-This makes it clumsier to perform domain-specific operations over the \textsc{ast}.
+This makes it poorly suited to performing domain-specific operations over the \textsc{ast}.
 
 Take for example the task of combining two \textsc{ast} nodes \scala{Term.Name("p")} and \scala{Term.Name("q")}, representing named parsers \scala{p} and \scala{q}, with the \emph{ap} combinator \scala{<*>}.
 This operation can be concisely expressed with Scalameta quasiquotes, rather than manually writing out the full explicit \textsc{ast}:
@@ -94,7 +94,7 @@ \subsubsection{The Parser \textsc{adt}}
 This makes working with \scala{Parser} terms feel closer to writing \texttt{parsley} code.
 For example, notice how constructing the \emph{code} representation of the \scala{example} parser resembles how the original parser itself would be written:
 \begin{minted}{scala}
-val exNT = NonTerminal(Sym("path/to/package/ObjectName.example."))  
+val exNT = NonTerminal(Sym(Term.Name("example").symbol))  
 
 // val ex: Parsley[String] =     (ex,  string("a")).zipped(  _ + _ )  |  string("b")
    val ex: Parser          = List(exNT,   Str("a")).zipped(q"_ + _") <|>    Str("b")
@@ -107,15 +107,14 @@ \subsubsection{The Parser \textsc{adt}}
 % This representation also then gives us for free the implementation for lint rules such as \emph{Simplify Complex Parsers} rule, which applies parser laws to simplify parsers.
 
 \subsection{Lifting to the Intermediate Parser \textsc{ast}}
-Converting the raw Scala \textsc{ast} to the intermediate \textsc{ast} requires the following basic operations:
+Converting the raw Scala \textsc{ast} to this intermediate parser combinator \textsc{ast} requires the following basic operations:
 \begin{enumerate}
   \item Identifying all named parsers defined in the source program -- these correspond to non-terminal symbols in the grammar.
-  \item Lifting the definition each parser into the intermediate \textsc{ast}, as a \scala{Parser} object.
-  \item Collecting these into a map to represent the high-level grammar: the unique symbol of each named parser is mapped to its corresponding \scala{Parser} object, along with some extra meta-information required for the transformation.
+  \item Lifting the definition of each parser into the intermediate \textsc{ast}, i.e. a \scala{Parser} object.
+  \item Collecting these into a map to represent the high-level grammar -- the unique symbol of each named parser is mapped to its corresponding \scala{Parser} object, along with extra meta-information required for the transformation.
 \end{enumerate}
 %
-Most importantly, this meta-information includes a reference to a parser's original node in the Scala \textsc{ast}, so that any lint diagnostics or code rewrites can be applied to the correct location in the source file.
-This is simply defined as:
+Most importantly, this meta-information includes a reference to a parser's original node in the Scala \textsc{ast}, so lint diagnostics or code rewrites can be applied to the correct location in the source file:
 \begin{minted}{scala}
 case class ParserDefn(name: Term.Name, parser: Parser, tpe: Type.Name, originalTree: Term)
 \end{minted}
@@ -141,10 +140,10 @@ \subsubsection{Identifying Named Parsers}
 %
 % In this case, the type of \scala{example} is explicitly annotated by the user since this is required for recursive definitions.
 % However in general, users will not explicitly annotate the types of their parsers, allowing the Scala compiler to infer the type.
-Note that the \scala{decltpe} field refers to the syntax of the explicit type annotation, not the semantic information of the inferred type of the variable.
+Note that the \scala{decltpe} field refers to the \emph{syntax} of the explicit type annotation, not the \emph{semantic} information of the variable's inferred type.
 Therefore, this field will not always be present, so in the general case, the type must be queried via a symbol information lookup like so:
 \begin{minted}{scala}
-tree match {
+exampleTree match {
   case Defn.Val(_, List(Pat.Var(varName)), _, body) =>
     println(s"qualified symbol = ${varName.symbol}")
     varName.symbol.info.get.signature match {
@@ -171,10 +170,10 @@ \subsubsection{Converting Scalameta Terms to the Parser \textsc{adt}}
 This involves pattern matching on the \scala{scala.meta.Term} to determine which parser combinator it represents, and then constructing the appropriate \scala{Parser} instance.
 
 Each \scala{Parser} defines a partial function \scala{fromTerm} to instantiate a parser from the appropriate \scala{scala.meta.Term}.
-These \scala{fromTerm} methods perform the ugly work of pattern matching on the low-level syntactic constructs of the Scala \textsc{ast}.
+These \scala{fromTerm} methods perform the menial work of pattern matching on the low-level syntactic constructs of the Scala \textsc{ast}.
 All \scala{fromTerm} methods are combined to define the \scala{toParser} extension method on \scala{scala.meta.Term} -- this is where \textsc{ast} nodes are lifted to their corresponding \scala{Parser} representation.
 
-The pattern matching example from \cref{sec:parser-ast-motivation} makes a reappearance in the definition of \scala{Ap.fromTerm}, where the arguments to the \scala{<*>} combinator are recursively lifted to \scala{Parser} objects:
+The pattern matching example from \cref{sec:parser-ast-motivation} makes a reappearance in the definition of \scala{Ap.fromTerm}, where the arguments to the \scala{<*>} combinator are instead recursively lifted to \scala{Parser} objects:
 % Use Scalafix's \scala{SymbolMatcher} to match tree nodes that resolve to a specific set of symbols.
 % This makes use of semantic information from SemanticDB, so we are sure that a \scala{<*>} is actually within the \scala{parsley.Parsley} package, rather than some other function with the same name.
 % This is much more robust compared to HLint, which suffers from false positives due to its reliance on syntactic information only.
@@ -192,7 +191,7 @@ \subsubsection{Converting Scalameta Terms to the Parser \textsc{adt}}
 }
 \end{minted}
 %
-Where a combinator takes a non-parser argument, this is treated as a black box and kept as a raw \textsc{ast} node:
+Where a combinator takes a non-parser argument, this is treated as a black box and kept as a raw \textsc{ast} node of type \scala{scala.meta.Term}:
 \begin{minted}{scala}
 // x: A, pure(x): Parsley[A]
 case class Pure(x: Term) extends Parser
@@ -226,10 +225,11 @@ \subsubsection{Building the Grammar Map}
 }.toMap
 \end{minted}
 
-\subsection{Lowering Back to the Scalameta \textsc{ast}}
+\subsection{Lowering Back to the Scalameta \textsc{ast}}\label{sec:lowering-parsers}
 After all necessary transformations have been applied to parser terms, the final step is to convert them back to a textual representation to be applied as a Scalafix patch.
 Parsers can be lowered back to \scala{scala.meta.Term} nodes by the inverse of the original \scala{fromTerm} transformation.
 The \scala{Parser} trait defines this transformation as the method \scala{term}, using quasiquotes to simplify the construction of the \scala{scala.meta.Term} nodes.
+For example:
 \begin{minted}{scala}
 case class Zipped(func: Function, parsers: List[Parser]) extends Parser {
   val term: Term = q"(..${parsers.map(_.term)}).zipped(${func.term})"
@@ -242,7 +242,7 @@ \subsection{Implementing the Left-Recursion Transformation}
 \TODO{TODO \\}
 
 \subsubsection{Success...?}
-Thus, running the transformation on the \scala{example} parser yields the output in \cref{fig:leftrec-example-bad}.
+Running the transformation on the \scala{example} parser yields the output in \cref{fig:leftrec-example-bad}.
 %
 \begin{figure}[htbp]
 \begin{minted}{scala}
@@ -262,12 +262,12 @@ \subsubsection{Success...?}
 \label{fig:leftrec-example-bad}
 \end{figure}
 %
-This is disappointing, to say the least.
+This is\ldots{} disappointing, to say the least.
 There are \emph{many} things wrong with the transformed output:
 \begin{itemize}
-  \item This output is horrendously complex and unreadable. The intent of the parser is entirely obfuscated in a sea of combinators.
+  \item The parser is horrendously complex and unreadable, its intent entirely obfuscated in a sea of combinators.
   \item Having to define the \scala{flip} and \scala{compose} functions is not ideal, but inlining them as lambdas would make the code even worse.
-  \item The parser does not even typecheck -- unlike classical Hindley-Milner-based type systems, Scala only supports local type inference~\cite{cremet_core_2006}. As a result, the compiler is unable to correctly infer correct types for \scala{flip} and also asks for explicit type annotations in the lambda \scala{(_ + _).curried}.
+  \item Even worse, the parser does not typecheck: unlike classical Hindley-Milner-based type systems, Scala only has \emph{local} type inference~\cite{cremet_core_2006}. As a result, the compiler is unable to infer the correct types for \scala{flip} and also asks for explicit type annotations in the lambda \scala{(_ + _).curried}.
 \end{itemize}
 
 \end{document}
diff --git a/src/introduction/introduction.pdf b/src/introduction/introduction.pdf
diff --git a/src/introduction/introduction.tex b/src/introduction/introduction.tex
@@ -24,19 +24,4 @@ \section{Project Goals}
 Additionally, for certain issues that can be automatically fixed, \texttt{parsley-garnish} will provide automated actions to resolve the issue. % TODO: via code transformations - put this in the background?
 The goal of \texttt{parsley-garnish} is to be used as a companion library to \texttt{parsley}, in order to improve its ease of adoption and to help users enforce best practices.
 
-\TODO{Place this in the right bit (I think intro is good)}
-As noted by \textcite{gibbons_dsls_2014}, a deep-embedded \textsc{dsl} consists of two components:
-\begin{itemize}
-  \item A representation of the language's abstract syntax, in the form of the aforementioned datatype.
-  \item Some traversals over the datatype, which gives \emph{semantics} to that syntax.
-\end{itemize}
-A deep-embedded \textsc{dsl} and a linter for that \textsc{dsl} can share the same abstract syntax, but differ in the semantic interpretation of that syntax:
-% TODO: I'm not really getting my point across...
-\begin{itemize}
-  \item The \textsc{dsl} semantics are evaluation. In this case, \texttt{parsley} interprets its syntax to output a parser.
-  \item The linter's semantics are pretty-printing. In this case, \texttt{parsley-garnish} interprets the syntax to output a human-readable representation of the parser.
-\end{itemize}
-% semantics for parsley: evaluate parser
-% semantics for parsley-garnish: pretty-print the parser
-
 \end{document}